Marker is a fast, accurate tool for converting documents into structured formats such as Markdown, JSON, and HTML. It accepts PDF, image, PPTX, DOCX, XLSX, HTML, and EPUB input. It preserves complex elements such as tables, equations, forms, and code blocks, and it removes page artifacts such as headers and footers.
Key features include:
- Multi-format Support: Handles various document types and languages.
- Accuracy and Speed: Benchmarks favorably against cloud services and open-source alternatives, with a projected throughput of 122 pages/second on an H100 GPU in batch mode.
- Extensibility: Allows customization through user-defined formatting and logic.
- LLM Integration: Optional use of large language models (LLMs) to enhance accuracy, especially for complex layouts and table extraction.
- GPU/CPU Compatibility: Runs on GPU, CPU, or Apple MPS, so a dedicated GPU is not required.
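
To make the output-format and LLM options above concrete, here is a minimal sketch of driving Marker from Python through its command line. It assumes the `marker-pdf` package is installed and that the `marker_single` command accepts `--output_format`, `--use_llm`, and `--output_dir`; confirm the exact flags with `marker_single --help` for your installed version. The helper names `build_marker_command` and `convert` are hypothetical and are not part of Marker itself.

```python
import shlex
import subprocess
from typing import List, Optional

# Output formats named in the overview above.
OUTPUT_FORMATS = ("markdown", "json", "html")


def build_marker_command(
    input_path: str,
    output_format: str = "markdown",
    use_llm: bool = False,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the `marker_single` CLI.

    Hypothetical helper: the flag names are assumptions about the
    installed Marker version and should be checked against --help.
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    cmd = ["marker_single", input_path, "--output_format", output_format]
    if use_llm:
        # Optional LLM pass for complex layouts and tables.
        cmd.append("--use_llm")
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    return cmd


def convert(input_path: str, **options) -> subprocess.CompletedProcess:
    """Run the conversion; raises CalledProcessError if the CLI fails."""
    return subprocess.run(build_marker_command(input_path, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where Marker is not installed.
    print(shlex.join(build_marker_command("paper.pdf", output_format="json", use_llm=True)))
```

Building the argument list separately from running it keeps the command easy to inspect and log before any conversion starts. The LLM option typically needs credentials for the chosen LLM service to be configured first.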
Marker is particularly useful for researchers, developers, and businesses needing reliable document conversion. It offers a hosted API for scalable usage and is licensed under cc-by-nc-sa-4.0, with commercial options available for qualifying organizations.