AI News HubLIVE
In-site rewrite13 min read

Kimi K2.5 Tech Blog: Visual Agentic Intelligence

Kimi K2.5 is an open-source multimodal model that delivers state-of-the-art performance in coding and vision tasks. It features a self-directed agent swarm capable of orchestrating up to 100 sub-agents for parallel execution, reducing task completion time by up to 4.5x. The model also excels in office productivity, handling complex documents, spreadsheets, and presentations. Available on multiple platforms, Kimi K2.5 represents a significant step toward AGI for the open-source community.

SourceKimi Blog

Kimi K2.5 Tech Blog: Visual Agentic Intelligence

Features

Sheets

Build Excel formulas, pivots & charts

Docs

Create, convert & review documents

Kimi Claw

Deploy 24/7 AI agents in one click

Kimi Code

AI Code Agent for Terminal & IDE

Kimi WebBridge

A browser extension for AI agents

Research

Kimi K2.6

Advancing Open-Source Coding

Agent Swarm

Scale Out, Not Just Up

WorldVQA

Atomic World Knowledge in MLLMs

Kimi K2.5

Visual Agentic Intelligence

Kimi Vendor Verifier

Rebuilding the Chain of Trust

Kimi K2 Thinking

Open-source thinking model

Kimi K2

Open Agentic Intelligence

Resources

Hermes Agent Overview

Hermes API Integration

OpenClaw SaaS

How to Deploy OpenClaw

How to Install OpenClaw on Mac

How to Install OpenClaw on Windows

AI Tools for Excel

Vibe Coding Guide

How to Vibe Code

How to Build a Website from Scratch

Refactor moonshot.ai with Kimi Code CLI

Pricing

Help Center

Getting Started

Agent Mode

Websites

Docs & Sheets

Slides

Deep Research

Kimi Claw

Membership

Kimi Code

Kimi API

Others

Kimi K2.6

Try Kimi

<Research

Kimi K2.5: Visual Agentic Intelligence ​

Today, we are introducing Kimi K2.5, the most powerful open-source model to date.

Kimi K2.5 builds on Kimi K2 with continued pretraining over approximately 15T mixed visual and text tokens. Built as a native multimodal model, K2.5 delivers state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm.

For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. Compared with a single-agent setup, this reduces execution time by up to 4.5x. The agent swarm is automatically created and orchestrated by Kimi K2.5 without any predefined subagents or workflow.

Kimi K2.5 is available via Kimi.com, the Kimi App, the API, and Kimi Code. Kimi.com & Kimi App now supports 4 modes: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (Beta). Agent Swarm is currently in beta on Kimi.com, with free credits available for high-tier paid users.

Agents

HLE-Full

BrowseComp

DeepSearchQA

Coding

SWE-Bench Verified

SWE-Bench Multilingual

Image

MMMU Pro

MathVision

OmniDocBench 1.5

Video

VideoMMMU

LongVideoBench

Across three agentic benchmarks—HLE, BrowseComp, and SWE-Verified—Kimi K2.5 delivers strong performance at a fraction of the cost.

  1. Coding with Vision ​

Kimi K2.5 is the strongest open-source model to date for coding, with particularly strong capabilities in front-end development.

K2.5 can turn simple conversations into complete front-end interfaces, implementing interactive layouts and rich animations such as scroll-triggered effects. Below are examples generated by K2.5 from a single prompt with image-gen tool:

Beyond text prompts, K2.5 excels at coding with vision. By reasoning over images and video, K2.5 improves image/video-to-code generation and visual debugging, lowering the barrier for users to express intent visually.

Here is an example of K2.5 reconstructing a website from video:

Original Jesko Jet website designed by @thefirstthelast.agency

This capability stems from massive-scale vision-text joint pre-training. At scale, the trade-off between vision and text capabilities disappears — they improve in unison.

Below is an example of K2.5 reasoning over a puzzle and marking the shortest path using code:

Chat

Find the shortest path from the top-left corner (green dot) to the bottom-right corner (red dot), where black represents the road.

Find Shortest Path in Image Using BFS or A*

The user wants me to find the shortest path through a maze from the top-left corner (green dot) to the bottom-right corner (red dot). Black represents the road (passable), and presumably white represents walls.

Let me analyze the image:

It's a complex maze pattern with black and white pixels

The green dot should be at the top-left corner

The red dot should be at the bottom-right corner

I need to find the shortest path through the black areas

This is essentially a pathfinding problem on a grid. I can treat this as a graph where:

Each black pixel is a node

Edges connect adjacent black pixels (up, down, left, right, or possibly diagonally depending on interpretation)

The goal is to find the shortest path from start to end

The standard algorithm for this is A* (A-star) or Dijkstra's algorithm. Given the grid nature, BFS (Breadth-First Search) would also work since all edges have equal weight.

Approach:

Load the image using Python (PIL/Pillow or OpenCV)

Convert to binary/black and white to identify passable areas

Find the start (green) and end (red) points, or assume they are at top-left and bottom-right corners

Implement BFS or A* to find the shortest path

Visualize the path on the image

Let me start by loading and analyzing the image.

I'll help you find the shortest path through this maze. Let me analyze the image and implement a pathfinding algorithm.

Execute Python code

Locate green and red corner dots in image

The corners appear to be white. The problem statement mentions a green dot at top-left and red dot at bottom-right. Let me search for green and red pixels in the image. Green is typically [0, 255, 0] or close to it, and red is [255, 0, 0] or close.

I need to scan the image for green and red pixels. Since the image is large (1503x3003), I should look specifically around the corners first, or check for non-black and non-white colors.

Let me convert to grayscale first and threshold to see the maze structure, then look for colored pixels.

Let me search for the green (start) and red (end) dots in the image:

Execute Python code

Convert image to binary maze and locate nearest black start/end points

It seems there are no pure green or red pixels. The image appears to be grayscale or nearly so, with very dark colors (near black) being the paths. Let me look more carefully at the extreme corners to see if there's a specific pattern, or perhaps I need to look for the darkest/brightest pixels.

Actually, looking at the description again: "Find the shortest path from the top-left corner (green dot) to the bottom-right corner (red dot), where black represents the road."

The user mentioned green and red dots, but they might not be visible in the image provided, or they might be so small they're hard to detect. The statement says "where black represents the road" - so black pixels are passable.

Let me check if the corners have distinct values. I should look for:

Start: Top-left corner (0,0)

End: Bottom-right corner (height-1, width-1)

But I need to check if these are actually on the road (black). Looking at the previous output, the top-left pixel is [255, 255, 255] (white), which is not black. The bottom-right is [255, 254, 255] (almost white). So the actual start and end points might be nearby.

Let me convert to grayscale and threshold to create a binary maze, then find the nearest black pixel to each corner as the actual start/end points.

Let me convert the image to a binary maze and find the nearest road (black) pixels to the corners:

Execute Python code

BFS Shortest Path with Parent Tracking and Reconstruction

Good! I found the start point at (7, 3) - which is near the top-left corner, and the end point at (1495, 2999) - near the bottom-right corner.

Now I need to implement BFS (Breadth-First Search) to find the shortest path. The maze is quite large (1503x3003 ≈ 4.5 million pixels), but BFS should be able to handle it efficiently if I use deque.

The maze appears to be a perfect maze (no cycles, single path), but I should implement BFS anyway to guarantee the shortest path.

Steps:

BFS from start to end

Track the path using a parent dictionary

Reconstruct the path

Visualize the result

Now I'll implement BFS to find the shortest path through this maze:

Execute Python code

Overlay Maze Path on Image with Matplotlib for Visualization

Excellent! BFS found a path of 113,557 steps. Now I need to visualize this path on the maze. Given the maze is very large and the path is long, I'll create a visualization where I overlay the path in red on the original image.

I should:

Create a copy of the original image or the binary maze

Draw the path in a distinct color (red or bright green)

Display it

Given the size, I might want to either:

Show the full maze with the path (might be hard to see details)

Or show a zoomed portion

Or create an animation (but that's complex)

Let me first try to visualize the full path on the maze. Since the image is large, I'll plot it and overlay the path.

Now let me visualize the shortest path on the maze:

Execute Python code

Visualize Shortest Path with Thicker Lines and Gradient Color

The path is quite long (113,557 steps) and wiggles through the entire maze. The red line shows up but is a bit thin. Let me also create a zoomed-in view of the start and end to verify the path is correct, and perhaps create a cleaner visualization with thicker lines.

Also, I should verify the path is actually valid by checking that consecutive points are adjacent (distance 1) and all are on black pixels.

Actually, looking at the visualization, it's hard to see the path clearly because it's so dense. Let me create a better visualization by drawing lines between consecutive points with thicker linewidth, or by creating a gradient color effect to show the progression.

Let me also verify the path length and that it's indeed the shortest (which BFS guarantees for unweighted graphs like this).

Let me create a clearer visualization showing the path progression with color coding:

Execute Python code

Verify End Region for Path Validation and Save Final Result Image

Perfect! The path is valid (all consecutive steps are adjacent). Now let me also show the end region to verify, and save the final result image.

Let me also show the end region and save the final result:

Execute Python code

BFS finds shortest path of 113,557 steps in maze

Perfect! I have successfully found the shortest path through the maze. The results show:

Algorithm: BFS (Breadth-First Search) - guarantees the shortest path in unweighted grids

Path length: 113,557 steps (pixels)

Start point: (7, 3) - near top-left

End point: (1495, 2999) - near bottom-right

Verification: All 113,556 consecutive steps are valid (Manhattan distance = 1 between points)

The visualizations show:

The full maze with the path colored in gradient (green → red → blue) showing progression

Zoomed start region showing the green start point and initial path

Zoomed end region showing the red path leading to the blue end point

The solution has been saved to a file

I should provide the final answer to the user now with a summary and the path coordinates if they need them.

Solution Found!

I successfully found the shortest path through the maze using the BFS (Breadth-First Search) algorithm, which guarantees the optimal solution for unweighted grid mazes.

Results:

Table

Copy

MetricValue

Path Length113,557 steps

Start Position(7, 3) - near top-left corner

End Position(1495, 2999) - near bottom-right corner

AlgorithmBFS (optimal for unweighted graphs)

VerificationAll steps valid (4-directional movement)

Key Observations:

Path Visualization: The solution is visualized with a color gradient:

Green indicates the start region (top-left)

Red/Yellow indicates the middle progression

Blue indicates the end region (bottom-right)

Maze Structure: This is a dense "perfect maze" (essentially a spanning tree) with approximately 50% wall density, meaning the path winds through roughly half of the available space.

Complexity: With nearly 4.5 million pixels to search, the BFS algorithm efficiently explored all possible paths level-by-level to guarantee the shortest route.

Download:

The full solution image has been saved and can be viewed above. The path is highlighted in color against the original black (road) and white (wall) maze structure.

Note: The coordinates (row, column) are 0-indexed from the top-left. The actual path consists of 113,557 connected pixels forming a continuous route from start to finish without any shortcuts possible (as guaranteed by BFS optimality).

K2.5 excels in real-world software engineering tasks. We evaluate it using Kimi Code Bench, our internal coding benchmark covering diverse end-to-end tasks — from building to debugging, refactoring, testing, and scripting — across multiple programming languages. On this benchmark, K2.5 shows consistent and meaningful improvements over K2 across task types.

To try out K2.5's agentic coding capabilities, K2.5 Agent offers a set of preconfigured tools for immediate, hands-on experiences. For software engineering use cases, we recommend pairing Kimi K2.5 with our new coding product, Kimi Code.

Kimi Code works in your terminal and can be integrated with various IDEs including VSCode, Cursor, Zed, etc. Kimi Code is open-sourced and supports images and videos as inputs. It also automatically discovers and migrates existing skills and MCPs into your working environment in Kimi Code.

Here's an example using Kimi Code to translate the aesthetic of Matisse's La Danse into the Kimi App. This demo highlights a breakthrough in autonomous visual debugging. Using visual inputs and documentation lookup, K2.5 visually inspects its own output and iterates on it autonomously. It creates an art-inspired webpage created end to end:

  1. Agent Swarm ​

Scaling Out, Not Just Up. We release K2.5 Agent Swarm as a research preview, marking a shift from single-agent scaling to self-directed, coordinated swarm-like execution.

Trained with Parallel-Agent Reinforcement Learning (PARL), K2.5 learns to self-direct an agent swarm of up to 100 sub-agents, executing parallel workflows across up to 1,500 coordinated steps, without predefined roles or hand-crafted workflows.

PARL uses a trainable orchestrator agent to decompose tasks into parallelizable subtasks, each executed by dynamically instantiated, frozen subagents. Running these subtasks concurrently significantly reduces end-to-end latency compared to sequential agent execution.

Training a reliable parallel orchestrator is challenging due to delayed, sparse, and non-stationary feedback from independently running subagents. A common failure mode is serial collapse, where the orchestrator defaults to single-agent execution despite having parallel capacity. To address this, PARL employs staged reward shaping that encourages parallelism early in training and gradually shifts focus toward task success.

We define the reward as

rPARL(x,y)=λ1⋅rparallel⏟instantiation reward+λ2⋅rfinish⏟sub-agent finish rate+rperf(x,y)⏟task-level outcome.

The performance reward rperf evaluates the overall success and quality of the solution y for a given task x. This is augmented by two auxiliary rewards, each addressing a distinct challenge in learning parallel orchestration. The reward rparallel is introduced to mitigate serial collapse—a local optimum where the orchestrator defaults to single-agent execution. By incentivizing subagent instantiation, this term encourages the exploration of concurrent scheduling spaces. The rfinish reward focuses on the successful completion of assigned subtasks. It is used to prevent spurious parallelism, a reward-hacking behavior in which the orchestrator increases parallel metrics dramatically by spawning many subagents without meaningful task decomposition. By rewarding completed subtasks, rfinish enforces feasibility and guides the policy toward valid and effective decompositions. The hyperparameters λ1 and λ2 are annealed to zero over the course of training.

To further force parallel strategies to emerge, we introduce a computational bottleneck that makes sequential execution impractical. Instead of counting total steps, we evaluate performance using Critical Steps, a latency-oriented metric inspired by the critical path in parallel computation:

CriticalSteps=∑t=1T(Smain(t)+maxiSsub,i(t))

Smain(t)captures orchestration overhead, while maxiSsub,i(t) reflects the slowest subagent at each stage. Under this metric, spawning more subtasks only helps if it shortens the critical path.

An agent swarm has an orchestrator that dynamically creates specialized subagents (e.g., AI Researcher, Physics Researcher, Fact Checker) and decomposes complex tasks into parallelizable subtasks for efficient distributed execution.

In our parallel-agent reinforcement learning environment, the reward increases smoothly as training progresses. At the same time, the level of parallelism during training also gradually increases.

K2.5 Agent Swarm improves performance on complex tasks through parallel, specialized execution. In our internal evaluations, it leads to an 80% reduction in end-to-end runtime while enabling more complex, long-horizon workloads, as shown below.

Agent Swarm reduces the minimum critical steps required to achieve target performance by 3×–4.5× compared to single-agent execution in wide search scenario, with savings scaling as targets rise—translating to up to 4.5× wall-clock time reduction via parallelization.

Here are representative trajectories demonstrating K2.5 Agent Swarm in action:

Parallel at Scale

100 Sub-agents Hunting for Creators

Parallel at Scale

Tool Use at Scale

Output at Scale

Download at Scale

The task is to identify the top three YouTube creators across 100 niche domains. K2.5 Agent Swarm first researches and defines each domain, then autonomously creates 100 sub-agents to conduct parallel searches. Each sub-agent identifies leading creators within its assigned niche, and the results—300 YouTuber profiles—are aggregated into a structured spreadsheet.

  1. Office Productivity ​

Kimi K2.5 brings agentic intelligence into real-world knowledge work.

K2.5 Agent can handle high-density, large-scale office work end to end. It reasons over large, high-density inputs, coordinates multi-step tool use, and delivers expert-level outputs: documents, spreadsheets, PDFs, and slide decks—directly through conversation.

With a focus on real-world professional tasks, we design two internal expert productivity benchmarks. The AI Office Benchmark evaluates end-to-end Office output quality, while the General Agent Benchmark measures multi-step, production-grade workflows against human expert performance. Across both benchmarks, K2.5 shows 59.3% and 24.3% improvements over K2 Thinking, reflecting stronger end-to-end performance on real-world tasks.

Internal Expert Productivity Bench (AI Office)

K2.5 agent supports advanced tasks such as adding annotations in Word, constructing financial models with Pivot Tables, and writing LaTeX equations in PDFs, while scaling to long-form outputs like 10,000-word papers or 100-page documents.

Tasks that once took hours or days now complete in minutes. Here are some examples:

Spreadsheet

100-Shot Storyboard in Spreadsheet with Images

Spreadsheet

Doc

PDF

Slide

  1. Conclusion ​

Grounded in advances in coding with vision, agent swarms, and office productivity, Kimi K2.5 represents a meaningful step toward AGI for the open-source community, demonstrating strong capability on real-world tasks under real-world constraints. Looking ahead, we will push further into the frontier of agentic intelligence, redefining the boundaries of AI in knowledge work.

Appendix ​

Benchmark table ​

BenchmarkKimi K2.5 (Thinking)GPT-5.2 (xhigh)Claude 4.5 Opus (Extend Thinking)Gemini 3 Pro (High Thinking Level)DeepSeek V3.2 (Thinking)Qwen3-VL-235B-A22B (Thinking)

Reasoning & Knowledge

HLE-Full

30.1 34.5 30.8 37.5 25.1* —

HLE-Full w/ tools

50.2 45.5 43.2 45.8 40.8* —

AIME 2025

96.1 100.0 92.8 95.0 93.1 —

HMMT 2025 (Feb)

95.4 99.4 92.9* 97.3* 92.5 —

IMO-AnswerBench

81.8 86.3 78.5* 83.1* 78.3 —

GPQA-Diamond

87.6 92.4 87.0 91.9 82.4 —

MMLU-Pro

87.1 86.7* 89.3* 90.1 85.0 —

Image & Video

MMMU-Pro

78.5 79.5 74.0 81.0 — 69.3

CharXiv (RQ)

77.5 82.1 67.2* 81.4 — 66.1

MathVision

84.2 83.0 77.1* 86.1 — 74.6

MathVista (mini)

90.1 82.8* 80.2* 89.8* — 85.8

ZeroBench

9.0 9.0* 3.0* 8.0* — 4.0*

ZeroBench w/ tools

11.0 7.0* 9.0* 12.0* — 3.0*

OCRBench

92.3 80.7* 86.5* 90.3* — 87.5

OmniDocBench 1.5

88.8 85.7 87.7* 88.5 — 82.0*

InfoVQA (test)

92.6 84.0* 76.9* 57.2* — 89.5

SimpleVQA

71.2 55.8* 69.7* 69.7* — 56.8*

WorldVQA

46.3 28.0 36.8 47.4 — 23.5

VideoMMMU

86.6 85.9 84.4* 87.6 — 80.0

MMVU

80.4 80.8* 77.3 77.5 — 71.1

MotionBench

70.4 64.8 60.3 70.3 — —

VideoMME

87.4 86.0 — 88.4* — 79.0

LongVideoBench

79.8 76.5 67.2 77.7* — 65.6*

LVBench

75.9 — — 73.5* — 63.6

Coding

SWE-Bench Verified

76.8 80.0 80.9 76.2 73.1 —

SWE-Bench Pro

50.7 55.6 55.4* — — —

SWE-Bench Multilingual

73.0 72.0 77.5 65.0 70.2 —

Terminal-Bench 2.0

50.8 54.0 59.3 54.2 46.4 —

PaperBench

63.5 63.7* 72.9* — 47.1 —

CyberGym

41.3 — 50.6 39.9* 17.3* —

SciCode

48.7 52.1 49.5 56.1 38.9 —

OJBench (cpp)

57.4 — 54.6* 68.5* 54.7* —

LiveCodeBench (v6)

85.0 — 82.2* 87.4* 83.3 —

Long Context

Longbench v2

61.0 54.5* 64.4* 68.2* 59.8* —

AA-LCR

70.0 72.3* 71.3* 65.3* 64.3* —

Agentic Search

BrowseComp

60.6 — 37.0 37.8 51.4 —

BrowseComp(w/ctx mgm)

74.9 65.8 57.8 59.2 67.6 —

BrowseComp(Agent Swarm)

78.4 — — — — —

WideSearch (item-f1)

72.7 — 76.2* 57.0 32.5* —

WideSearch (item-f1)(Agent Swarm)

79.0 — — — — —

DeepSearchQA

77.1 71.3* 76.1* 63.2* 60.9* —

FinSearchCompT2&T3

67.8 — 66.2* 49.9 59.1* —

Seal-0

57.4 45.0 47.7* 45.5* 49.5* —

Computer Use

OSWorld-Verified

63.3 8.6* 66.3 20.7* — 38.1

WebArena

58.9 — 63.4* — — 26.4*

To reproduce official Kimi-K2.5 benchmark results, we recommend using the official API. For third-party providers, refer to Kimi Vendor Verifier (KVV) to choose high-accuracy services. Details: https://kimi.com/blog/kimi-vendor-verifier

Footnotes ​

  1. General Testing Details

We report results for Kimi K2.5 and DeepSeek-V3.2 with thinking mode enabled, Claude Opus 4.5 with extended thinking mode, GPT-5.2 with xhigh reasoning effort, and Gemini 3 Pro with a high thinking level. For vision benchmarks, we additionally report results for Qwen3-VL-235B-A22B-Thinking.

Unless otherwise specified, all Kimi K2.5 experiments were conducted with temperature = 1.0, top-p = 0.95, and a context length of 256k tokens.

Benchmarks without publicly available scores were re-evaluated under the same conditions used for Kimi K2.5 and are marked with an asterisk (*).

We could not evaluate GPT-5.2 xhigh on all benchmarks due to service stability issues. For benchmarks that were not tested, we mark them as "-".

  1. Text and Reasoning

HLE, AIME 2025, HMMT 2025 (Feb), GPQA-Diamond and IMO-AnswerBench were evaluated with a maximum completion budget of 96k tokens.

Results for AIME and HMMT are averaged over 32 runs (avg@32); GPQA-Diamond over 8 runs (avg@8).

For HLE, we report scores on the full set (text & image). Kimi K2.5 scores 31.5 (text) and 21.3 (image) without tools, and 51.8 (text) and 39.8 (image) with tools. The DeepSeek-V3.2 score corresponds to its text-only subset (marked with †) . Hugging Face access was blocked to prevent potential data leakage. HLE with tools uses simple context management: once the context exceeds a threshold, only the latest round of tool messages is retained.

  1. Tool-Augmented / Agentic Search

Kimi K2.5 was equipped with search, code-interpreter, and web-browsing tools for HLE with tools and all agentic search benchmarks.

Except for BrowseComp (where K2.5 and DeepSeek-V3.2 used the discard-all strategy), no context management was applied, and tasks exceeding the supported context length were directly counted as failed.

The test system prompts emphasize deep and proactive tool use, instructing models to reason carefully, leverage tools, and verify uncertain information. Full prompts will be provided in the technical report.

Results for Seal-0 and WideSearch are averaged over four runs (avg@4).

  1. Vision Benchmarks

Max-tokens = 64k, averaged over three runs (avg@3).

ZeroBench (w/ tools) uses max-tokens-per-step = 24k and max-steps = 30 for multi-step reasoning.

MMMU-Pro follows the official protocol, preserving input order and prepending images.

GPT-5.2-xhigh had ~10% failure rate (no output despite 3 retries), treated as incorrect; reported scores likely underestimate true performance.

WorldVQA, a benchmark designed to evaluate atomic vision-centric world knowledge. Access WorldVQA at https://github.com/MoonshotAI/WorldVQA.

OmniDocBench Score is computed as (1 − normalized Levenshtein distance) × 100, where a higher score denotes superior accuracy.

  1. Coding Tasks

Terminal-Bench 2.0 scores were obtained with the default agent framework (Terminus-2) and the provided JSON parser. In our implementation, we evaluated Terminal-Bench 2.0 under non-thinking mode. This choice was made because our current context management strategy for the thinking mode is incompatible with Terminus-2.

For the SWE-Bench series of evaluations (including verified, multilingual, and pro), we used an internally developed evaluation framework. This framework includes a minimal set of tools—bash tool, createfile tool, insert tool, view tool, strreplace tool, and submit tool—along with tailored system prompts designed for the tasks. The highest scores were achieved under non-thinking mode.

The score of Claude Opus 4.5 on CyberGym is reported under the non-thinking setting.

All reported scores of coding tasks are averaged over 5 independent runs.

  1. Long-Context Benchmarks

AA-LCR: scores averaged over three runs (avg@3).

LongBench-V2: identical prompts and input contexts standardized to ~128k tokens.

  1. Agent Swarm

BrowseComp (Swarm Mode): main agent max 15 steps; sub-agents max 100 steps.

WideSearch (Swarm Mode): main and sub-agents max 100 steps.

Products

Kimi Open Platform Kimi Code Pricing

Features

AI Agent Agent Swarm AI Website Builder AI Document Agent AI Slides Generator AI Sheets Agent Deep Research

Company

Moonshot AI Terms of Service Privacy Policy