ContextGem
Effortless LLM extraction from documents
2025-04-24

ContextGem is a free, open-source LLM framework that makes it radically easier to extract structured data and insights from documents, with minimal code.
ContextGem is a free, open-source framework designed to simplify structured data extraction from documents using large language models (LLMs). Traditional frameworks require extensive boilerplate code, while ContextGem reduces development overhead through intuitive abstractions. It generates prompts automatically, handles multilingual input and output, and validates extracted data out of the box, so little manual setup is needed. Reference mapping links each extracted item back to the paragraphs or sentences it came from. ContextGem can extract entities, themes, and anomalies from both text and images, and it offers concurrent processing and usage and cost tracking. It works with both cloud-hosted and local LLMs through LiteLLM, and extraction workflows can be serialized, saved, and reused.
ContextGem focuses on accurate analysis of a single document at a time. Instead of retrieving fragments, it relies on long-context LLMs to process the whole document, which avoids the inconsistencies that retrieval can introduce. Aimed at developers, it includes a converter for DOCX files and comes with detailed documentation. ContextGem is licensed under Apache 2.0 and built by Shcherbak AI, which is now part of Microsoft for Startups.
Open Source
Developer Tools
Artificial Intelligence
GitHub