OCR (Optical Character Recognition) Algorithm

Introduction

OCR (Optical Character Recognition) identifies the positions and contents of all text blocks in an image.

Model Usage

Once the environment is set up, run the OCR algorithm by executing scripts/ocr.py with a configuration file:

$ python scripts/ocr.py --config configs/ocr.yaml

Model Configuration

inputs: assets/demo/ocr
outputs: outputs/ocr
visualize: True
tasks:
   ocr:
      model: ocr_ppocr
      model_config:
         lang: ch
         show_log: True
         det_model_dir: models/OCR/PaddleOCR/det/ch_PP-OCRv4_det
         rec_model_dir: models/OCR/PaddleOCR/rec/ch_PP-OCRv4_rec
         det_db_box_thresh: 0.3
  • inputs/outputs: Define the input path and the output path, respectively.

  • visualize: Whether to visualize the model results. Visualized results will be saved in the outputs directory.

  • tasks: Define the task type. Currently, only an OCR task is included.

  • model: Define the specific model type. Currently, only the PaddleOCR model is available.

  • model_config: Define the model configuration.

  • lang: Define the language. The default, ch, supports both English and Chinese.

  • show_log: Whether to print running logs.

  • det_model_dir: Define the path of PaddleOCR's detection model. If the specified path does not exist, the model weights will be automatically downloaded to that path.

  • rec_model_dir: Define the path of PaddleOCR's recognition model. If the specified path does not exist, the model weights will be automatically downloaded to that path.

  • det_db_box_thresh: Confidence threshold for detected boxes. Bounding boxes whose confidence is lower than the threshold are discarded.
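
To make the effect of det_db_box_thresh concrete, here is a minimal sketch of confidence filtering. It is not PaddleOCR's internal code: the detection record layout (corner points plus a score) and the function name filter_by_confidence are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical detection record: four corner points plus a confidence score.
Detection = Tuple[Sequence[Point], float]


def filter_by_confidence(
    detections: List[Detection], box_thresh: float = 0.3
) -> List[Detection]:
    """Keep detections whose score is at least box_thresh; discard the rest."""
    return [det for det in detections if det[1] >= box_thresh]


# Made-up detections, used only to show the behaviour of the threshold.
detections = [
    ([(0, 0), (10, 0), (10, 5), (0, 5)], 0.92),
    ([(0, 8), (10, 8), (10, 12), (0, 12)], 0.25),   # below 0.3, discarded
    ([(0, 15), (10, 15), (10, 20), (0, 20)], 0.30),  # equal to 0.3, kept
]

kept = filter_by_confidence(detections, box_thresh=0.3)
print([score for _, score in kept])  # prints [0.92, 0.3]
```

Raising the threshold tends to remove spurious boxes at the cost of missing faint text, while lowering it does the opposite.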

Diverse Input Support

The OCR script in PDF-Extract-Kit accepts several input forms: a single image or PDF file, or a directory of image and PDF files.
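
One way such input handling could work is sketched below. This is not the kit's actual implementation: the function name collect_inputs and the list of accepted extensions are assumptions chosen for illustration, and the real script may accept other formats.

```python
from pathlib import Path
from typing import List

# Assumed set of accepted extensions; the real script may support others.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def collect_inputs(inputs: str) -> List[Path]:
    """Resolve the `inputs` setting to a sorted list of image/PDF files.

    Hypothetical helper: accepts either a single file or a directory.
    """
    path = Path(inputs)
    if path.is_file():
        # A single file is returned only if its extension is supported.
        return [path] if path.suffix.lower() in SUPPORTED_SUFFIXES else []
    if path.is_dir():
        # A directory is scanned (non-recursively) for supported files.
        return sorted(
            p
            for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"Input path does not exist: {inputs}")
```

With the configuration shown earlier, calling collect_inputs("assets/demo/ocr") would list the supported files found directly in that directory.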

Viewing Visualization Results

When the visualize option in the config file is set to True, visualization results will be saved in the outputs directory.

Note

Visualization facilitates the analysis of model results. However, for large-scale tasks, it is recommended to disable visualization (set visualize to False) to reduce memory and disk usage.
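
For batch runs, the visualize setting can also be switched off programmatically before launching the script. The sketch below is a text-level approach that assumes visualize is a top-level key written on its own line, as in the configuration above. The helper name disable_visualization is hypothetical, and a YAML parser would be more robust for configs laid out differently.

```python
import re


def disable_visualization(config_text: str) -> str:
    """Return the config text with the top-level `visualize` key set to False.

    Assumes `visualize: <value>` sits at the start of a line.
    """
    return re.sub(
        r"^(visualize:[ \t]*)\S+",
        r"\g<1>False",
        config_text,
        count=1,
        flags=re.MULTILINE,
    )


# Fragment of the config shown in this document.
config_text = (
    "inputs: assets/demo/ocr\n"
    "outputs: outputs/ocr\n"
    "visualize: True\n"
)
print(disable_visualization(config_text))
```

The returned text can be written to a separate file and passed to scripts/ocr.py through --config, leaving the original configuration untouched.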