ScrapeGraphAI is a Python library for web scraping that combines Large Language Models (LLMs) with direct graph logic to build scraping pipelines. These pipelines work on both websites and local documents, including XML, HTML, JSON, and Markdown files. Users specify the information they want to extract in a prompt, and the library handles the rest of the pipeline.
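To make the idea of "graph logic" concrete, here is a minimal, hypothetical sketch of a linear node pipeline in plain Python. Each node reads some keys from a shared state dictionary and writes one new key, loosely mirroring the fetch, parse, and answer-generation stages a scraping pipeline typically has. The `Node` class, `run_graph`, and the stand-in functions `fetch`, `parse`, and `fake_llm` are illustrative assumptions, not ScrapeGraphAI's API; a real pipeline would make an HTTP request and call an actual LLM.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Node:
    """One pipeline step: reads `inputs` from the shared state, writes `output`."""
    name: str
    inputs: list[str]
    output: str
    fn: Callable[..., Any]

    def execute(self, state: dict[str, Any]) -> dict[str, Any]:
        args = [state[key] for key in self.inputs]
        state[self.output] = self.fn(*args)
        return state


def run_graph(nodes: list[Node], state: dict[str, Any]) -> dict[str, Any]:
    """Run nodes in order (a linear graph), passing the state from one to the next."""
    for node in nodes:
        state = node.execute(state)
    return state


def fetch(source: str) -> str:
    # Hypothetical stand-in for an HTTP fetch or a local file read.
    return "<html><body><h1>Acme Corp</h1><p>Founded by Ada Lovelace.</p></body></html>"


def parse(html: str) -> str:
    # Crude tag stripping; a real pipeline would use a proper HTML parser.
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


def fake_llm(prompt: str, text: str) -> dict[str, str]:
    # Hypothetical stand-in for an LLM call that answers `prompt` from `text`.
    match = re.search(r"Founded by ([^.]+)", text)
    return {"founder": match.group(1) if match else "unknown"}


pipeline = [
    Node("fetch", ["source"], "html", fetch),
    Node("parse", ["html"], "text", parse),
    Node("generate_answer", ["prompt", "text"], "answer", fake_llm),
]

final_state = run_graph(
    pipeline, {"source": "https://example.com", "prompt": "Who founded the company?"}
)
print(final_state["answer"])  # {'founder': 'Ada Lovelace'}
```

Keeping every intermediate result in one state dictionary means nodes stay independent: swapping the parser or the model only changes one node, not the whole pipeline.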
Key Features:
- SmartScraperGraph: Extracts information from a single page based on a user prompt and source URL.
- Multi-page Scraping: Supports extraction from multiple pages with parallel LLM calls.
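- Multiple Pipelines: Offers additional pipelines such as SearchGraph (extracts information from the top search engine results), SpeechGraph (extracts information from a page and generates an audio file), and ScriptCreatorGraph (extracts information from a page and generates a Python scraping script).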
- LLM Integration: Compatible with OpenAI, Groq, Azure, Gemini, and local models via Ollama.
- SDKs Available: Official Python and Node.js SDKs make it possible to integrate the ScrapeGraphAI API into existing projects.
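The multi-page feature relies on issuing LLM calls in parallel rather than one after another. The following is a minimal sketch of that fan-out pattern using `asyncio`, assuming a hypothetical async `fake_llm_call` in place of a real model client; it illustrates the concurrency idea only and is not ScrapeGraphAI's implementation.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, page: str) -> str:
    # Hypothetical stand-in for an async LLM request; the sleep simulates latency.
    await asyncio.sleep(0.1)
    return f"{page}: answer to '{prompt}'"


async def scrape_many(prompt: str, pages: list[str]) -> list[str]:
    # One coroutine per page; gather runs them concurrently and returns
    # the results in the same order as `pages`.
    return list(await asyncio.gather(*(fake_llm_call(prompt, p) for p in pages)))


pages = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

start = time.perf_counter()
results = asyncio.run(scrape_many("List the product names", pages))
elapsed = time.perf_counter() - start

for line in results:
    print(line)
# Roughly 0.1 s (the slowest call), not 0.3 s (the sum), because the calls overlap.
print(f"elapsed: {elapsed:.2f} s")
```

With a real provider, rate limits usually apply, so capping concurrency (for example with an `asyncio.Semaphore`) is a common addition to this pattern.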
ScrapeGraphAI is well suited to data exploration and research, where describing the desired data in a prompt can replace hand-written extraction rules. The library is open source under the MIT license and welcomes contributions from the community.