Ollama v0.7

Run leading vision models locally with the new engine

2025-05-19

Ollama v0.7 introduces a new engine designed to run multimodal models locally as first-class citizens, starting with vision models such as Llama 4 and Gemma 3. The update focuses on improved reliability, accuracy, and memory management when running large language models locally.

The engine improves modularity: each model operates independently, which reduces integration complexity for developers adding new models. Memory usage is also optimized with techniques such as image caching and KV cache optimizations tuned to the hardware's specifications.

Examples accompanying the release show the engine analyzing video frames, recognizing objects across multiple images, and translating documents; a minimal sketch of querying a vision model from Python follows below. Future plans include support for longer context sizes, advanced reasoning, and tool calling. Ollama v0.7 lays the foundation for integrating additional modalities, such as speech and video generation, into local AI inference.
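To make the workflow concrete, here is a minimal sketch of asking a locally running vision model about an image via the ollama Python library. This is an illustration, not part of the release notes: it assumes the `ollama` package is installed (`pip install ollama`), an Ollama server is running locally, and a vision-capable model has already been pulled; the model name and image path are placeholders.

```python
# Minimal sketch: query a local vision model about an image.
# Assumes: `pip install ollama`, a running Ollama server, and a
# vision-capable model already pulled (e.g. `ollama pull gemma3`).
import ollama

response = ollama.chat(
    model="gemma3",  # placeholder: any vision-capable model
    messages=[
        {
            "role": "user",
            "content": "What do you see in this image?",
            # File paths (or raw bytes) of images attached to the message;
            # "./photo.jpg" is a placeholder path.
            "images": ["./photo.jpg"],
        }
    ],
)
print(response["message"]["content"])
```

Because `images` is a list, several paths can be attached to a single message, which is the shape of the multi-image recognition example described above.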
Tags: Open Source, Artificial Intelligence, GitHub, Development