MiniCPM-V 4.5

GPT-4o level vision model on the phone

2025-08-26

MiniCPM-V 4.5 is a new 8B-parameter open-source multimodal large language model (MLLM) that delivers GPT-4o level performance on your phone. It excels at image, video, and document understanding, surpassing many proprietary and larger open-source models on key benchmarks such as OCRBench and OpenCompass.
The model handles high-resolution images, compresses video tokens efficiently to understand video at up to 10 FPS, and supports more than 30 languages. It offers a controllable fast/deep thinking mode for adaptable problem solving, along with strong OCR and document parsing. It is also straightforward to integrate: it supports local inference on iOS and other platforms, fine-tuning, and web demos, making it a versatile tool for developers and researchers.
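As an illustration of local integration, the sketch below runs single-image inference through the Hugging Face transformers library. The checkpoint name openbmb/MiniCPM-V-4_5, the chat() helper, and its message format are assumed to follow the convention of earlier MiniCPM-V releases; check the official model card for the exact API, including the switch that controls the fast/deep thinking mode.

```python
# Minimal local-inference sketch (assumptions: the checkpoint
# "openbmb/MiniCPM-V-4_5" exists on Hugging Face and exposes the
# chat() helper used by earlier MiniCPM-V releases).
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "openbmb/MiniCPM-V-4_5"  # assumed checkpoint name

# trust_remote_code is needed because MiniCPM-V ships custom model code.
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Single-image question answering: the image goes directly into the
# message content alongside the text prompt.
image = Image.open("document_page.png").convert("RGB")
msgs = [{"role": "user", "content": [image, "Extract all text from this page."]}]

answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```

The same chat interface is typically reused for multi-image and video inputs by placing additional frames in the message content; on-device deployment (e.g. the iOS app) goes through separate runtimes rather than this Python path.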