2026-06-13 | Rewritten on site | 5 min read | Updated: 2026-06-15

Kimi K2.7 Code

Kimi K2.7 Code is the latest coding model with breakthroughs in long-horizon tasks, 256K context, and strong reasoning. It supports multimodal tool calling and comes with detailed usage examples and best practices.

Source: Hacker News AI | Author: cmogni1

Overview of Kimi K2.7 Code Model

Kimi K2.7 Code is our most capable coding model to date. It follows instructions more reliably in long contexts and completes coding tasks with higher success rates.

Long-Horizon Coding Capability Breakthrough

K2.7 Code has achieved a breakthrough in long-horizon coding tasks, demonstrating more reliable generalization across diverse programming languages (such as Rust, Go, and Python) and task scenarios (including frontend development, DevOps, and performance optimization).

Ultra-Long Context Support

The kimi-k2.7-code, kimi-k2.6, and kimi-k2.5 models all provide a 256K context window.
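As a rough illustration of how much of that window is left for input, the arithmetic below assumes that "256K" means 256 × 1024 tokens and that generated tokens share the window with the prompt. Neither assumption is confirmed on this page. The 32768 figure is the default max_tokens value from the parameter table later in this document, and the helper name prompt_budget is hypothetical.

```python
# Assumptions (not confirmed by this page): "256K" means 256 * 1024 tokens,
# and completion tokens count against the same window as the prompt.
CONTEXT_WINDOW = 256 * 1024
DEFAULT_MAX_TOKENS = 32768  # default max_tokens listed in the parameter table


def prompt_budget(max_tokens: int = DEFAULT_MAX_TOKENS) -> int:
    """Tokens left for the prompt after reserving room for the completion."""
    if not 0 < max_tokens < CONTEXT_WINDOW:
        raise ValueError("max_tokens must be between 1 and the context window")
    return CONTEXT_WINDOW - max_tokens


print(prompt_budget())  # 229376
```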

Long-Thinking Capabilities

Kimi K2.7 Code retains strong reasoning capabilities. It supports multi-step tool invocation and reasoning, and it excels at complex problems such as logical reasoning, mathematics, and code writing.

Kimi K2.7 Code does not support non-thinking mode.

Example Usage

Here is a complete usage example to help you quickly get started with the Kimi K2.7 Code model.

Install the OpenAI SDK

Kimi API is fully compatible with OpenAI’s API format. You can install the OpenAI SDK as follows:

pip install --upgrade 'openai>=1.0'

Verify the Installation

python -c 'import openai; print("version =", openai.__version__)'

The output may be version = 1.10.0, indicating the OpenAI SDK was installed successfully and your Python environment is using OpenAI SDK v1.10.0.

Quick Start

Try it now: Test model performance in your business scenarios through interactive operations in the Dev Workbench

Apply for API Key: Test via API call immediately
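Once an API key is available, a first text-only request might look like the sketch below. The model name, base URL, and MOONSHOT_API_KEY variable are taken from the multimodal example later on this page. The helper names build_request and ask are hypothetical, and the SDK import is deferred so the payload helper can be inspected without the SDK or network access.

```python
import os


def build_request(prompt: str) -> dict:
    """Build keyword arguments for a single-turn chat completion (hypothetical helper)."""
    return {
        "model": "kimi-k2.7-code",
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    # Imported here so build_request() works even when the SDK is not installed.
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["MOONSHOT_API_KEY"],
        base_url="https://api.moonshot.ai/v1",
    )
    response = client.chat.completions.create(**build_request(prompt))
    return response.choices[0].message.content
```

Sampling parameters are deliberately left out of the payload, since this page recommends keeping the defaults.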

Multimodal Tool Capability Example

The Kimi K2.7 Code model combines multiple capabilities. The following example demonstrates K2.7 Code's visual understanding together with tool calling. First, download the sample video to your local machine, for example to ~/Download/test_video.mp4.

Then run the following code:

import base64
import json
import os
import subprocess
import tempfile
from pathlib import Path

from openai import OpenAI

# Tool schema advertised to the model
tools = [{
    "type": "function",
    "function": {
        "name": "watch_video_clip",
        "description": "Watch a video file or a sub-clip of it. If start_time and end_time are not provided, the entire video will be returned.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "The path to the video file to watch"
                },
                "start_time": {
                    "type": "number",
                    "description": "The start time of the clip in seconds (optional, defaults to 0)"
                },
                "end_time": {
                    "type": "number",
                    "description": "The end time of the clip in seconds (optional, defaults to end of video)"
                }
            },
            "required": ["path"]
        }
    }
}]


def watch_video_clip(path: str, start_time: float | None = None, end_time: float | None = None) -> list[dict]:
    """
    Watch a video file or a sub-clip of it.

    Args:
        path: The path to the video file to watch
        start_time: The start time in seconds (optional, defaults to 0)
        end_time: The end time in seconds (optional, defaults to end of video)

    Returns:
        A list of content blocks in MultiModal Tool API format
    """
    # Expand "~" so that paths such as ~/Download/test_video.mp4 resolve
    video_path = Path(path).expanduser()
    if not video_path.exists():
        raise FileNotFoundError(f"Video file not found: {path}")

    if start_time is None and end_time is None:
        # No clip range requested: return the entire video
        with open(video_path, "rb") as f:
            video_base64 = base64.b64encode(f.read()).decode("utf-8")
        return [
            {"type": "video_url", "video_url": {"url": f"data:video/mp4;base64,{video_base64}"}},
            {"type": "text", "text": f"Full video: {video_path.name}"}
        ]

    # Get video duration for defaults
    probe = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", str(video_path)],
        capture_output=True, text=True
    )
    duration = float(json.loads(probe.stdout)["format"]["duration"])

    start_time = start_time or 0
    end_time = end_time or duration
    clip_duration = end_time - start_time

    # Extract clip into a temporary file
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp_path = tmp.name

    try:
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", str(start_time),
            "-i", str(video_path),
            "-t", str(clip_duration),
            "-c:v", "libx264", "-c:a", "aac",
            "-preset", "fast", "-crf", "23",
            "-movflags", "+faststart",
            "-loglevel", "error",
            tmp_path
        ], check=True)

        with open(tmp_path, "rb") as f:
            video_base64 = base64.b64encode(f.read()).decode("utf-8")

        return [
            {"type": "video_url", "video_url": {"url": f"data:video/mp4;base64,{video_base64}"}},
            {"type": "text", "text": f"Clip from {video_path.name}: {start_time}s - {end_time}s"}
        ]
    finally:
        # Always remove the temporary clip
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)


client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url="https://api.moonshot.ai/v1"
)


def agent_loop(user_message: str):
    """Simple agent loop with multimodal tool support."""
    messages = [
        {"role": "system", "content": "You are a video analysis assistant. Use watch_video_clip to examine specific portions of videos."},
        {"role": "user", "content": user_message}
    ]

    while True:
        response = client.chat.completions.create(
            model="kimi-k2.7-code",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        message = response.choices[0].message
        messages.append(message.model_dump())

        # No tool calls = done
        if not message.tool_calls:
            return message.content

        # Execute tool calls
        for tool_call in message.tool_calls:
            if tool_call.function.name == "watch_video_clip":
                args = json.loads(tool_call.function.arguments)
                result = watch_video_clip(
                    path=args["path"],
                    start_time=args.get("start_time"),
                    end_time=args.get("end_time")
                )

                # Multimodal tool result
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })


# Usage
answer = agent_loop("Analyze what happens between seconds 8-13 in ~/Download/test_video.mp4")
print(answer)

Best Practices

Supported Formats

Images are supported in formats: png, jpeg, webp, gif.

Videos are supported in formats: mp4, mpeg, mov, avi, x-flv, mpg, webm, wmv, 3gpp.
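The format lists above can be turned into a small guard that rejects unsupported media before it is encoded. This is a sketch under assumptions: it treats the listed names as MIME subtypes, it reuses the video_url block shape from the sample code on this page, and it assumes images use the OpenAI-style image_url block. The helper name media_block is hypothetical.

```python
import base64

# Subtype names copied from the lists above.
IMAGE_SUBTYPES = {"png", "jpeg", "webp", "gif"}
VIDEO_SUBTYPES = {"mp4", "mpeg", "mov", "avi", "x-flv", "mpg", "webm", "wmv", "3gpp"}


def media_block(data: bytes, subtype: str) -> dict:
    """Wrap raw bytes as a base64 data-URL content block, or raise if unsupported."""
    encoded = base64.b64encode(data).decode("ascii")
    if subtype in IMAGE_SUBTYPES:
        # Assumed shape: OpenAI-style image_url block.
        return {"type": "image_url", "image_url": {"url": f"data:image/{subtype};base64,{encoded}"}}
    if subtype in VIDEO_SUBTYPES:
        # Shape taken from the watch_video_clip example on this page.
        return {"type": "video_url", "video_url": {"url": f"data:video/{subtype};base64,{encoded}"}}
    raise ValueError(f"Unsupported media subtype: {subtype}")
```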

Token Calculation and Billing

Image and video token usage is calculated dynamically. You can use the token estimation API to check the expected token consumption of a request containing images or video before processing it. In general, the higher the resolution of an image, the more tokens it consumes. For videos, the token count depends on the number of keyframes and their resolution: more keyframes and higher resolution both increase token consumption. The Vision model uses the same billing method as the moonshot-v1 model series, with charges based on the total number of tokens processed. For token pricing details, refer to Model Pricing.

Recommended Resolution

We recommend that image resolution should not exceed 4k (4096×2160), and video resolution should not exceed 2k (2048×1080). Higher resolutions will only increase processing time and will not improve the model’s understanding.
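To stay within these limits, oversized media can be scaled down before encoding. The sketch below only computes target dimensions that preserve the aspect ratio; the actual resizing would be done with whatever image or video tool is already in use. The helper name fit_within is hypothetical, and the limits are the ones quoted above.

```python
IMAGE_LIMIT = (4096, 2160)  # recommended maximum for images
VIDEO_LIMIT = (2048, 1080)  # recommended maximum for videos


def fit_within(width: int, height: int, limit: tuple[int, int]) -> tuple[int, int]:
    """Scale (width, height) down to fit inside limit, keeping the aspect ratio.

    Media that already fits is returned unchanged (it is never upscaled).
    """
    max_width, max_height = limit
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(7680, 4320, IMAGE_LIMIT))  # (3840, 2160)
print(fit_within(3840, 2160, VIDEO_LIMIT))  # (1920, 1080)
print(fit_within(1280, 720, VIDEO_LIMIT))   # (1280, 720)
```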

Upload File or Base64?

Because the overall size of the request body is limited, very large videos must use the file upload method to access vision capabilities. The file upload method is also recommended for images or videos that will be referenced multiple times. For file upload limitations, refer to the File Upload documentation.

Image quantity limit: The Vision model has no limit on the number of images, but the request body size must not exceed 100M.

URL-formatted images: Not supported. Currently only base64-encoded image content is supported.
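Base64 encoding inflates a file by roughly one third, so a file well under the body limit on disk can still exceed it once inlined. The sketch below estimates the encoded size and suggests when to switch to file upload. It reads "100M" as 100 × 1024 × 1024 bytes, which is an assumption since this page does not spell out the unit, and the helper names are hypothetical.

```python
# Assumption: "100M" is read as 100 * 1024 * 1024 bytes.
REQUEST_BODY_LIMIT = 100 * 1024 * 1024


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def should_upload(raw_bytes: int, other_payload_bytes: int = 0) -> bool:
    """True when inlining the file as base64 would exceed the assumed body limit."""
    return base64_size(raw_bytes) + other_payload_bytes > REQUEST_BODY_LIMIT


print(should_upload(10 * 1024 * 1024))  # False
print(should_upload(80 * 1024 * 1024))  # True
```

An 80 MiB video is below the limit on disk but encodes to about 107 MiB, which is why the second call returns True.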

Parameters Differences in Request Body

The full parameter list is in the chat documentation. However, some parameters behave differently in the k2.7-code, k2.6, and k2.5 models. We recommend using the default values instead of configuring these parameters manually. The differences are listed below.

| Field | Required | Description | Type | Values |
| --- | --- | --- | --- | --- |
| max_tokens | optional | The maximum number of tokens to generate for the chat completion. | int | Defaults to 32k (32768). |
| thinking | optional | New! Controls whether thinking is enabled for this request. | object | Defaults to {"type": "enabled"}. The Kimi K2.7 Code model returns an error if thinking mode is disabled. |
| temperature | optional | The sampling temperature to use. | float | The Kimi K2.7 Code model uses a fixed value of 1.0. Any other value results in an error. |
| top_p | optional | A sampling method. | float | The Kimi K2.7 Code model uses a fixed value of 0.95. Any other value results in an error. |
| n | optional | The number of results to generate for each input message. | int | The Kimi K2.7 Code model uses a fixed value of 1. Any other value results in an error. |
| presence_penalty | optional | Penalizes new tokens based on whether they appear in the text. | float | The Kimi K2.7 Code model uses a fixed value of 0.0. Any other value results in an error. |
| frequency_penalty | optional | Penalizes new tokens based on their existing frequency in the text. | float | The Kimi K2.7 Code model uses a fixed value of 0.0. Any other value results in an error. |
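Because most of these parameters are fixed, it can help to check a request locally before sending it. The following is a minimal sketch of such a check, using only the fixed values from the table above; the helper name check_params is hypothetical, and the server's actual error messages will differ.

```python
# Fixed values copied from the table above.
FIXED_PARAMS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def check_params(params: dict) -> list[str]:
    """Return the settings that, per the table, would be rejected."""
    problems = []
    for name, fixed in FIXED_PARAMS.items():
        if name in params and params[name] != fixed:
            problems.append(f"{name} must be {fixed}, got {params[name]}")
    thinking = params.get("thinking")
    if thinking is not None and thinking.get("type") != "enabled":
        problems.append("thinking cannot be disabled for this model")
    return problems


print(check_params({"temperature": 0.2, "thinking": {"type": "disabled"}}))
```

Omitting these keys entirely is the simplest way to pass the check, which matches the recommendation to rely on the defaults.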

Tool Use Compatibility

When using tools, please note the following constraints to ensure model performance:

tool_choice can only be set to "auto" or "none" (the default is "auto"), which avoids conflicts between the reasoning content and the specified tool_choice. Any other value results in an error.

During multi-step tool calling, the reasoning_content from the assistant message that issued the current turn's tool call must be kept in the context. Otherwise, an error is returned.
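Both constraints can be checked on the client before a request is sent. The sketch below is one possible reading of them: it treats the "current turn" as every message after the most recent user message, and it expects reasoning_content as a key on the assistant message dict. Both are assumptions, and the helper name check_tool_request is hypothetical.

```python
def check_tool_request(messages: list[dict], tool_choice: str = "auto") -> list[str]:
    """Return a list of violations of the tool-use constraints described above."""
    problems = []
    if tool_choice not in ("auto", "none"):
        problems.append(f"tool_choice must be 'auto' or 'none', got {tool_choice!r}")

    # Assumption: the current turn starts after the most recent user message.
    last_user = max(
        (i for i, m in enumerate(messages) if m.get("role") == "user"),
        default=-1,
    )
    for i in range(last_user + 1, len(messages)):
        m = messages[i]
        if m.get("role") == "assistant" and m.get("tool_calls") and not m.get("reasoning_content"):
            problems.append(f"assistant message {i} has tool_calls but no reasoning_content")
    return problems
```

In the agent loop shown earlier, the assistant message is appended via model_dump() rather than rebuilt by hand, which is one way to avoid dropping fields such as reasoning_content.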

Model Pricing

For token pricing details, refer to Model Pricing.

Learn More

For benchmark testing with Kimi K2.7 Code, refer to the benchmark best practices.

For the most detailed API usage example of Kimi K2.7 Code, see: How to Use Kimi Vision Model

See How to Use Kimi K2 in Claude Code, Roo Code, and Cline

Learn how to configure and use the Thinking Model

For pricing across all models, see Model Pricing, Billing & Rate Limit details, and Web Search Pricing.
