Kimi-K2

State-of-the-art MoE language model for frontier knowledge, reasoning, and coding

2025-07-13

Kimi K2 is a state-of-the-art Mixture-of-Experts (MoE) language model developed by Moonshot AI. With 32 billion activated parameters out of 1 trillion total parameters, Kimi K2 is designed to excel at frontier knowledge, reasoning, and coding tasks. The model is further optimized for agentic capabilities: tool use, multi-step reasoning, and autonomous problem-solving.

Key Features

  • Large-Scale Training: Kimi K2 was pre-trained on 15.5T tokens with zero training instability, despite its massive scale.
  • MuonClip Optimizer: Training scales the Muon optimizer to this unprecedented size via MuonClip, which adds a QK-Clip step that rescales query/key projection weights whenever attention logits grow too large, resolving the instabilities that otherwise appear at scale (see the sketch after this list).
  • Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
  • Two Variants:
    • Kimi-K2-Base: A foundation model ideal for researchers and developers who need full control for fine-tuning and custom solutions.
    • Kimi-K2-Instruct: A post-trained model optimized for general-purpose chat and agentic experiences, offering reflex-grade performance without long thinking.
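
To make the QK-Clip idea concrete, here is a minimal PyTorch sketch. This is an illustrative reading, not Moonshot's code: the function name, the default threshold `tau`, and the even `alpha` split between queries and keys are assumptions for the example.

```python
import torch

@torch.no_grad()
def qk_clip_(W_q: torch.Tensor, W_k: torch.Tensor, max_logit: float,
             tau: float = 100.0, alpha: float = 0.5) -> None:
    """Illustrative QK-Clip step (a sketch, not Moonshot's implementation).

    After an optimizer step, if the largest pre-softmax attention logit
    observed for a head (max_logit) exceeds the threshold tau, rescale
    that head's query/key projection weights in place so future logits
    are pulled back toward the threshold.
    """
    if max_logit > tau:
        gamma = tau / max_logit           # shrink factor in (0, 1)
        W_q.mul_(gamma ** alpha)          # queries absorb gamma**alpha
        W_k.mul_(gamma ** (1 - alpha))    # keys absorb the rest, so q.k logits scale by gamma
```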

Model Architecture

Kimi K2 employs a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters, 32 billion activated parameters, and 384 experts, 8 of which are selected per token. It features a 128K context length, Multi-head Latent Attention (MLA), and the SwiGLU activation function.
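
As a rough illustration of that routing scheme, the toy top-k forward pass below sends each token to its k highest-scoring experts and mixes their outputs by renormalized router weights. The names, shapes, and naive loop are assumptions for the sketch; the real model also includes a shared expert and fused kernels rather than per-expert Python loops.

```python
import torch

def moe_layer(x, router, experts, k=8):
    """Toy mixture-of-experts forward pass (illustrative, not Kimi K2's code).

    x:       [tokens, hidden]  token hidden states
    router:  nn.Linear(hidden, num_experts) producing routing logits
    experts: list of per-expert feed-forward modules
    """
    probs = router(x).softmax(dim=-1)                  # [tokens, num_experts]
    weights, idx = torch.topk(probs, k, dim=-1)        # top-k experts per token
    weights = weights / weights.sum(-1, keepdim=True)  # renormalize the k gate weights
    out = torch.zeros_like(x)
    for slot in range(k):                              # naive loops; real kernels batch this
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e                   # tokens whose slot-th pick is expert e
            if mask.any():
                out[mask] += weights[mask, slot, None] * expert(x[mask])
    return out
```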

Performance

Kimi K2 posts strong results across coding benchmarks (LiveCodeBench, OJBench), tool-use benchmarks (Tau2, AceBench), and math & STEM benchmarks (AIME, MATH-500), and also performs well on general-capability tests such as MMLU and IFEval.

Deployment

Kimi K2 can be deployed using popular inference engines like vLLM, SGLang, KTransformers, and TensorRT-LLM. The model is available via an OpenAI/Anthropic-compatible API on the Moonshot AI platform. Local deployment examples and tool-calling capabilities are extensively documented.
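
Because the hosted API is OpenAI-compatible, the official openai Python client can talk to it directly. The base URL and model identifier below follow Moonshot's platform naming at the time of writing and should be verified against the current docs; the temperature of 0.6 is the value recommended for Kimi-K2-Instruct in the model card.

```python
from openai import OpenAI

# Endpoint and model name are assumptions based on Moonshot's platform
# docs; verify both before use.
client = OpenAI(
    base_url="https://api.moonshot.ai/v1",
    api_key="YOUR_MOONSHOT_API_KEY",
)

response = client.chat.completions.create(
    model="kimi-k2-0711-preview",  # hosted Kimi-K2-Instruct (assumed identifier)
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant."},
        {"role": "user", "content": "Summarize what a Mixture-of-Experts model is."},
    ],
    temperature=0.6,  # sampling temperature recommended for the Instruct model
)
print(response.choices[0].message.content)
```

Tool calling follows the same OpenAI-style function-calling schema, so existing agent frameworks can usually point at this endpoint with little or no change.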

Licensing

Both the code and model weights are released under the Modified MIT License. For inquiries, contact support@moonshot.cn.
