DeepSeek-VL2
MoE vision-language, now easier to access
2025-02-10

DeepSeek-VL2 is a series of open-source vision-language models with strong multimodal understanding, powered by an efficient Mixture-of-Experts (MoE) architecture. Try them out with the new Hugging Face demo.
DeepSeek-VL2 is an open-source series of Mixture-of-Experts (MoE) vision-language models designed for advanced multimodal understanding. It excels at tasks such as visual question answering, OCR, document understanding, and visual grounding. The series includes three variants—Tiny, Small, and Standard—with 1.0B, 2.8B, and 4.5B activated parameters, respectively, delivering competitive performance while activating fewer parameters than many existing models. DeepSeek-VL2 also supports incremental prefilling, which reduces GPU memory usage during inference and makes it practical for diverse research and commercial applications. The models are available on Hugging Face with a user-friendly demo and are licensed for both academic and commercial use.
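For reference, the three variants and their activated-parameter counts from the paragraph above can be captured in a small lookup, along with a helper for picking the largest variant that fits a parameter budget. The Hugging Face repo IDs here are assumptions based on the usual `deepseek-ai` naming convention, not confirmed by this post—check the actual Hub pages before use.

```python
# Activated-parameter counts (in billions) for the DeepSeek-VL2 variants,
# as stated in the announcement. Repo IDs are assumed, not verified.
VARIANTS = {
    "tiny":     {"repo_id": "deepseek-ai/deepseek-vl2-tiny",  "activated_params_b": 1.0},
    "small":    {"repo_id": "deepseek-ai/deepseek-vl2-small", "activated_params_b": 2.8},
    "standard": {"repo_id": "deepseek-ai/deepseek-vl2",       "activated_params_b": 4.5},
}

def pick_variant(budget_b: float) -> str:
    """Return the largest variant whose activated parameters fit the budget (in billions)."""
    fitting = [
        (spec["activated_params_b"], name)
        for name, spec in VARIANTS.items()
        if spec["activated_params_b"] <= budget_b
    ]
    if not fitting:
        raise ValueError(f"No variant fits a {budget_b}B activated-parameter budget")
    return max(fitting)[1]

print(pick_variant(3.0))  # "small": largest variant within a 3B budget
```

Because MoE models activate only a subset of their experts per token, the activated-parameter count (not the total) is the figure that matters for comparing inference cost against dense models.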
Open Source
Artificial Intelligence
GitHub