Layout Detection Algorithm#

Introduction#

Layout detection is a fundamental task in document content extraction, aiming to locate different types of regions on a page, such as images, tables, text, and headings, to facilitate high-quality content extraction. For text and heading regions, OCR models can be used for text recognition, while table regions can be converted using table recognition models.

Model Usage#

Layout detection supports the following models:

Model	Description	Characteristics	Model weight	Config file
DocLayout-YOLO	Improved version of YOLO-v10: 1. Generates diverse pre-training data to improve generalization across multiple document types. 2. Improves the model architecture for better perception of scale-varying instances. See DocLayout-YOLO for details.	Speed: Fast, Accuracy: High	doclayout_yolo_ft.pt	layout_detection.yaml
YOLO-v10	Base YOLO-v10 model	Speed: Fast, Accuracy: Moderate	yolov10l_ft.pt	layout_detection_yolo.yaml
LayoutLMv3	Base LayoutLMv3 model	Speed: Slow, Accuracy: High	layoutlmv3_ft	layout_detection_layoutlmv3.yaml

Once the environment is set up, you can run layout detection by executing scripts/layout_detection.py directly.

Run demo

$ python scripts/layout_detection.py --config configs/layout_detection.yaml

Model Configuration#

1. DocLayout-YOLO / YOLO-v10

inputs: assets/demo/layout_detection
outputs: outputs/layout_detection
tasks:
  layout_detection:
    model: layout_detection_yolo
    model_config:
      img_size: 1024
      conf_thres: 0.25
      iou_thres: 0.45
      model_path: path/to/doclayout_yolo_model
      visualize: True

inputs/outputs: Define the input file path and the directory for visualization output.
tasks: Define the task type. Currently, only the layout detection task is included.
model: Specify the model type, e.g., layout_detection_yolo.
model_config: Define the model configuration.
img_size: Define the size of the image's long edge; the short edge is scaled proportionally. The default long edge is 1024.
conf_thres: Define the confidence threshold; only targets with a confidence above this threshold are detected.
iou_thres: Define the IoU threshold; targets whose overlap exceeds this threshold are removed.
model_path: Path to the model weights.
visualize: Whether to visualize the model results; visualized results are saved in the outputs directory.
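
The following sketch illustrates how the three numeric parameters above are typically interpreted. It is not the PDF-Extract-Kit implementation: the names Detection, scaled_size, iou, and filter_detections are hypothetical, and the greedy overlap removal is an assumption about how an IoU threshold is commonly applied.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple  # (x1, y1, x2, y2) in pixels
    score: float  # model confidence in [0, 1]


def scaled_size(width: int, height: int, img_size: int = 1024) -> tuple:
    """Scale so the long edge equals img_size, keeping the aspect ratio."""
    scale = img_size / max(width, height)
    return round(width * scale), round(height * scale)


def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thres: float = 0.25, iou_thres: float = 0.45):
    """Apply the confidence threshold, then drop heavily overlapping boxes."""
    # Keep only detections above the confidence threshold, best first.
    candidates = sorted(
        (d for d in dets if d.score > conf_thres),
        key=lambda d: d.score,
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Assumed behavior: a box is removed when it overlaps an already
        # kept, higher-scoring box by more than iou_thres.
        if all(iou(det.box, k.box) <= iou_thres for k in kept):
            kept.append(det)
    return kept
```

Under these assumptions, an A4 scan of 2480 x 3508 pixels is resized to 724 x 1024 when img_size is 1024, and lowering iou_thres makes overlap removal more aggressive.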

2. LayoutLMv3

Note

LayoutLMv3 cannot run directly by default. Please follow the steps below to modify the configuration:

Detectron2 Environment Setup

# For Linux
pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-linux_x86_64.whl

# For macOS
pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl

# For Windows
pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-win_amd64.whl

Enable LayoutLMv3 Registration Code

Uncomment the lines at the following links:

from pdf_extract_kit.tasks.layout_detection.models.yolo import LayoutDetectionYOLO
from pdf_extract_kit.tasks.layout_detection.models.layoutlmv3 import LayoutDetectionLayoutlmv3
from pdf_extract_kit.registry.registry import MODEL_REGISTRY

__all__ = [
   "LayoutDetectionYOLO",
   "LayoutDetectionLayoutlmv3",
]
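
To show why the import matters, here is a minimal sketch of a decorator-based model registry. This is an assumption about the general pattern, not the actual MODEL_REGISTRY code in PDF-Extract-Kit: the Registry class and its register and get methods are hypothetical. In such a design, a model name used in the config can only be resolved if the module defining the class has been imported.

```python
class Registry:
    """Hypothetical, simplified registry mapping config names to classes."""

    def __init__(self):
        self._models = {}

    def register(self, name):
        # Used as a class decorator: stores the class under the given name.
        def decorator(cls):
            self._models[name] = cls
            return cls
        return decorator

    def get(self, name):
        if name not in self._models:
            raise KeyError(
                f"model '{name}' is not registered; "
                f"available: {sorted(self._models)}"
            )
        return self._models[name]


MODEL_REGISTRY = Registry()


@MODEL_REGISTRY.register("layout_detection_yolo")
class LayoutDetectionYOLO:
    def __init__(self, config):
        self.config = config


# Registration happens when this definition is executed. If the defining
# module is never imported, the name stays unknown to the registry.
@MODEL_REGISTRY.register("layout_detection_layoutlmv3")
class LayoutDetectionLayoutlmv3:
    def __init__(self, config):
        self.config = config
```

With a registry of this shape, a lookup for a name whose class was never imported fails with a KeyError that lists the names that are available.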

inputs: assets/demo/layout_detection
outputs: outputs/layout_detection
tasks:
  layout_detection:
    model: layout_detection_layoutlmv3
    model_config:
      model_path: path/to/layoutlmv3_model

inputs/outputs: Define the input file path and the directory for visualization output.
tasks: Define the task type. Currently, only the layout detection task is included.
model: Specify the model type, e.g., layout_detection_layoutlmv3.
model_config: Define the model configuration.
model_path: Path to the model weights.

Diverse Input Support#

The layout detection script in PDF-Extract-Kit supports input formats such as a single image, a directory containing only image files, a single PDF file, and a directory containing only PDF files.

Note

Modify the path to inputs in configs/layout_detection.yaml according to your actual data format:
- Single image: path/to/image
- Image directory: path/to/images
- Single PDF file: path/to/pdf
- PDF directory: path/to/pdfs

Note

When using PDFs as input, change predict_images to predict_pdfs in scripts/layout_detection.py.

# for image detection
detection_results = model_layout_detection.predict_images(input_data, result_path)

Change to:

# for pdf detection
detection_results = model_layout_detection.predict_pdfs(input_data, result_path)
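
Instead of editing the script by hand, the choice between the two methods could be made from the input path. The helper below is a sketch only: choose_predict_method and IMAGE_SUFFIXES are hypothetical names, and the set of image extensions is an assumption rather than the list the kit actually accepts.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def choose_predict_method(input_path: str) -> str:
    """Return 'predict_images' or 'predict_pdfs' for a file or a directory."""
    path = Path(input_path)
    if path.is_dir():
        # A directory is expected to contain only images or only PDFs.
        files = [p for p in path.iterdir() if p.is_file()]
    else:
        files = [path]
    if not files:
        raise ValueError(f"no input files found in {input_path}")

    suffixes = {p.suffix.lower() for p in files}
    if suffixes == {".pdf"}:
        return "predict_pdfs"
    if suffixes <= IMAGE_SUFFIXES:
        return "predict_images"
    raise ValueError(f"unsupported or mixed input types: {sorted(suffixes)}")
```

The returned name can then be resolved on the model object, for example with getattr(model_layout_detection, choose_predict_method(input_data)), and called with input_data and result_path as in the snippets above.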

Viewing Visualization Results#

When visualize is set to True in the config file, the visualization results will be saved in the outputs directory.

Note

Visualization is helpful for analyzing model results, but for large-scale tasks it is recommended to turn it off (set visualize to False) to reduce memory and disk usage.
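
For batch runs, the flag can also be switched off programmatically once the config has been loaded into a dictionary. The sketch below assumes the config has the same nesting as the YAML shown earlier (tasks, then task name, then model_config); disable_visualization is a hypothetical helper, not part of PDF-Extract-Kit.

```python
import copy


def disable_visualization(config: dict) -> dict:
    """Return a copy of the config with visualize=False where it is set."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for task in updated.get("tasks", {}).values():
        model_config = task.get("model_config", {})
        # Only touch configs that already define the flag.
        if "visualize" in model_config:
            model_config["visualize"] = False
    return updated
```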
