Kimi K2 is an open agentic intelligence platform featuring tools for Excel formulas, document conversion, AI agent deployment, and code assistance, along with research advancements such as Kimi K2.6 and Agent Swarm.
Kimi open-sources the Vendor Verifier (KVV) to help users verify the accuracy of inference implementations of open-source models. It includes six critical benchmarks for detecting common deployment issues and encourages infrastructure providers to fix the root causes of those issues.
KVV includes pre-verification, OCRBench, MMMU Pro, AIME2025, a tool-call test, and SWE-Bench.
It promotes transparency through a public leaderboard of vendor results.
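Verifying a vendor's deployment comes down to comparing its benchmark scores with a reference and flagging meaningful gaps. The sketch below shows that comparison. It assumes the reference is the score of the official implementation and that a gap is "meaningful" beyond a fixed tolerance. This is not KVV's actual code: `flag_deviations`, the 2-point tolerance, and the demo scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    benchmark: str
    reference: float
    vendor: float

    @property
    def gap(self) -> float:
        # Positive gap means the vendor scores below the reference.
        return self.reference - self.vendor


def flag_deviations(reference: dict[str, float],
                    vendor: dict[str, float],
                    tolerance: float = 2.0) -> list[Deviation]:
    """Return benchmarks where the vendor falls more than `tolerance` points below reference.

    Hypothetical helper: a benchmark missing from the vendor results counts as 0.0.
    """
    flagged = []
    for name, ref_score in reference.items():
        dev = Deviation(name, ref_score, vendor.get(name, 0.0))
        if dev.gap > tolerance:
            flagged.append(dev)
    return flagged


if __name__ == "__main__":
    # Hypothetical scores for illustration only; these are not published KVV results.
    reference_scores = {"OCRBench": 85.0, "AIME2025": 90.0, "tool_call": 99.0}
    vendor_scores = {"OCRBench": 84.5, "AIME2025": 81.0, "tool_call": 92.0}
    for d in flag_deviations(reference_scores, vendor_scores):
        print(f"{d.benchmark}: reference {d.reference}, vendor {d.vendor}, gap {d.gap:.1f}")
```

A one-sided check is used here because the concern is degraded vendor deployments. A real verifier might also flag scores that are suspiciously higher than the reference.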
Kimi K2.5 is an open-source multimodal model that delivers state-of-the-art performance in coding and vision tasks. It features a self-directed agent swarm that can orchestrate up to 100 sub-agents in parallel, completing tasks up to 4.5x faster than sequential execution. The model also excels at office productivity work, handling complex documents, spreadsheets, and presentations. Kimi K2.5 is available on multiple platforms and represents a significant step toward AGI for the open-source community.
State-of-the-art open-source model for coding with vision capabilities.
Self-directed agent swarm with up to 100 sub-agents for parallel workflows.
WorldVQA is a new benchmark for evaluating the factual correctness of multimodal large language models (MLLMs) on visual world knowledge. It includes 3,500 high-quality image-question pairs across 9 categories, with attention to the split between head (common) and long-tail (rare) knowledge. Frontier models score below 50% accuracy, revealing overconfidence and gaps in their visual knowledge.
The WorldVQA benchmark tests multimodal LLMs on atomic visual world knowledge with 3,500 high-quality image-question pairs.
Models struggle significantly, with top models scoring below 50% accuracy, especially on long-tail knowledge.
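Reporting accuracy separately for head and long-tail questions is what exposes the long-tail weakness described above. The sketch below shows one way to compute that split. It assumes each result is tagged with a `bucket` label and is scored by case-insensitive exact match. WorldVQA's real record format and grading may differ: the `Result` fields, the `normalize` rule, and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    category: str    # benchmark category (names here are hypothetical)
    bucket: str      # "head" (common knowledge) or "tail" (long-tail knowledge)
    prediction: str
    answer: str


def normalize(text: str) -> str:
    # Assumed exact-match normalization: lowercase and strip surrounding whitespace.
    return text.strip().lower()


def accuracy_by_bucket(results: Iterable[Result]) -> dict[str, float]:
    """Compute exact-match accuracy separately for each head/tail bucket."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for r in results:
        total[r.bucket] += 1
        if normalize(r.prediction) == normalize(r.answer):
            correct[r.bucket] += 1
    return {bucket: correct[bucket] / total[bucket] for bucket in total}


if __name__ == "__main__":
    # Toy records for illustration; not real WorldVQA data.
    demo = [
        Result("landmarks", "head", "Eiffel Tower", "eiffel tower"),
        Result("landmarks", "tail", "cathedral", "basilica"),
        Result("species", "head", "Red fox", "red fox"),
        Result("species", "tail", "cat", "fossa"),
    ]
    print(accuracy_by_bucket(demo))
```

Exact match is the simplest scoring rule. Benchmarks with free-form answers often use alias lists or a model-based judge instead.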
Kimi launches Agent Swarm, a multi-agent architecture that enables up to 100 parallel sub-agents for horizontal scaling. The system self-organizes into roles such as CEO, researcher, and analyst, autonomously decomposing tasks, assigning agents, and synthesizing results. It is up to 4.5x faster than sequential execution and excels at broad research, batch processing, and multi-perspective analysis. It is now available in preview for top-tier subscribers.
Agent Swarm enables horizontal scaling with up to 100 parallel sub-agents and 1,500+ tool calls, achieving up to a 4.5x speedup over sequential execution.
The system self-organizes into agent teams (e.g., CEO, researchers, analysts) without human micromanagement.
Kimi K2.6 is a new open-source model with state-of-the-art coding, long-horizon execution, and agent swarm capabilities. This blog post details its features, benchmarks, and community feedback.
Kimi K2.6 achieves state-of-the-art coding performance with long-horizon execution.
Excels in benchmarks like SWE-Bench, Terminal-Bench, and BrowseComp.