Installation#
In this section, we will demonstrate how to install PDF-Extract-Kit.
Best Practices#
We recommend users follow our best practices for installing PDF-Extract-Kit. It is recommended to use a Python 3.10 conda virtual environment for the installation.
Step 1. Create a Python 3.10 virtual environment using conda.
$ conda create -n pdf-extract-kit-1.0 python=3.10 -y
$ conda activate pdf-extract-kit-1.0
Step 2. Install the dependencies for PDF-Extract-Kit.
$ # For GPU devices
$ pip install -r requirements.txt
$ # For CPU-only devices
$ pip install -r requirements-cpu.txt
Note
For the convenience of user environment configuration, requirements.txt only includes the environment needed for the current best models, which currently include:
Layout Detection: YOLO series (YOLOv10, DocLayout-YOLO)
Formula Detection: YOLO series (YOLOv8)
Formula Recognition: UniMERNet
OCR: PaddleOCR
For other models, such as LayoutLMv3, additional environment setup is required. For details, see Layout Detection Algorithms.