
MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MiniMax M3 introduces MiniMax Sparse Attention, a 1M-token context window, and native image, video, and computer use support.

Source: MarkTechPost. Author: Asif Razzaq


MiniMax officially released MiniMax M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also natively supports image and video input and desktop computer operation. The API is live now: MiniMax M3 is available today via MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7.

MiniMax positions M3 as an open-weight model combining frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture, which would be a first according to MiniMax. The corresponding model weights and technical report are scheduled for release within 10 days of launch.

MSA: MiniMax Sparse Attention

The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: compute cost grows as the square of the sequence length. MSA is designed to address this. Sparse attention mechanisms generally add a pre-filtering stage before computing attention, which avoids the full quadratic cost. The MiniMax team states that, compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.

At the operator level, MSA uses a “KV outer gather Q” approach: KV blocks serve as the outer loop and aggregate the queries that hit them, so each block is read only once and memory access is contiguous. The MiniMax team reports this is more than 4× faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3’s head configuration.

The result: at a context length of 1 million tokens, MiniMax M3’s per-token compute is 1/20th that of the previous-generation M2 models. The MiniMax team reports a speedup of more than 9× in the prefill stage and more than 15× in the decoding stage at 1M-token context.
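The loop-order idea behind "KV outer gather Q" can be illustrated with a small pure-Python sketch. This is not MiniMax's kernel: the article does not describe how MSA selects blocks, so the pre-filter here (score each block by the query's dot product with the block's mean key, keep the top `top_k`) is an assumption, as are all function names and sizes. The sketch also omits causal masking and multi-head layout. It only shows that iterating over KV blocks in the outer loop, with a running softmax per query, gives the same result as the more familiar query-outer loop while visiting each block once.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(q, block_keys, top_k):
    """Hypothetical pre-filter: score each KV block by q . mean(keys), keep the top_k."""
    scored = []
    for b, keys in enumerate(block_keys):
        mean_key = [sum(col) / len(keys) for col in zip(*keys)]
        scored.append((dot(q, mean_key), b))
    return {b for _, b in sorted(scored, reverse=True)[:top_k]}


def attend_query_outer(queries, block_keys, block_values, chosen):
    """Reference: for each query, re-read every KV block it selected."""
    outputs = []
    for q, blocks in zip(queries, chosen):
        keys = [k for b in sorted(blocks) for k in block_keys[b]]
        vals = [v for b in sorted(blocks) for v in block_values[b]]
        logits = [dot(q, k) for k in keys]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        outputs.append([dot(weights, col) / total for col in zip(*vals)])
    return outputs


def attend_kv_outer(queries, block_keys, block_values, chosen):
    """KV blocks form the outer loop; each block is visited once for all queries that hit it."""
    dim_v = len(block_values[0][0])
    run_max = [-math.inf] * len(queries)    # running max logit per query
    run_sum = [0.0] * len(queries)          # running softmax denominator per query
    acc = [[0.0] * dim_v for _ in queries]  # running weighted sum of values per query
    for b, (keys, vals) in enumerate(zip(block_keys, block_values)):
        hits = [i for i, blocks in enumerate(chosen) if b in blocks]  # gather queries
        for i in hits:
            logits = [dot(queries[i], k) for k in keys]
            new_max = max(run_max[i], max(logits))
            rescale = math.exp(run_max[i] - new_max)  # 0.0 on a query's first hit
            run_sum[i] *= rescale
            acc[i] = [a * rescale for a in acc[i]]
            for logit, v in zip(logits, vals):
                w = math.exp(logit - new_max)
                run_sum[i] += w
                acc[i] = [a + w * x for a, x in zip(acc[i], v)]
            run_max[i] = new_max
    return [[a / s for a in row] for row, s in zip(acc, run_sum)]


rng = random.Random(0)
dim, block_size, n_blocks, n_queries, top_k = 4, 3, 5, 6, 2


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


block_keys = [[rand_vec() for _ in range(block_size)] for _ in range(n_blocks)]
block_values = [[rand_vec() for _ in range(block_size)] for _ in range(n_blocks)]
queries = [rand_vec() for _ in range(n_queries)]
chosen = [select_blocks(q, block_keys, top_k) for q in queries]

reference = attend_query_outer(queries, block_keys, block_values, chosen)
result = attend_kv_outer(queries, block_keys, block_values, chosen)
max_diff = max(abs(a - b) for r, s in zip(reference, result) for a, b in zip(r, s))
print(f"max abs difference between loop orders: {max_diff:.2e}")
```

In the query-outer version a block selected by many queries is re-read once per query. In the KV-outer version it is read once and reused, which is the access pattern the article credits for contiguous memory reads. The price is per-query running state (max, denominator, accumulator) that must be rescaled as new blocks arrive.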
Across multiple ablation studies, MSA matched full attention on the majority of capabilities.

Coding and Agentic Benchmarks

Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by the MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax’s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.

SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)
MCP Atlas: 74.2%
Claw-Eval: highest score among models evaluated (General Task Group, 161 tasks)
SVG-Bench: surpasses Opus 4.7

On OmniDocBench, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On OSWorld-Verified (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).

MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. This is intended to reduce the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.

Native Multimodality

MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the beginning rather than added post-training. The MiniMax team reports that interleaved data (sequences where text and images are naturally intermixed) is more critical to model performance than commonly assumed.
After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens. MiniMax M3 supports image and video input and can operate a desktop computer.

Real-World Task Examples from MiniMax

MiniMax documents three internal tasks in the release post:

Paper reproduction: MiniMax gave MiniMax M3 the ICLR 2025 Outstanding Paper Award-winning paper Learning Dynamics of LLM Finetuning and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. The task required multimodal capability to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding capability to execute the reproduction across a long thread.

CUDA kernel optimization: MiniMax asked MiniMax M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton; no reference implementation was provided. Over approximately 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck diagnosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 improved Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4× speedup. The best solution appeared on the 145th submission. MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.

PostTrainBench (autonomous model training): MiniMax gave MiniMax M3 four base models that had completed pretraining only.
MiniMax M3 autonomously ran the full data synthesis → training → evaluation → iteration cycle over 12 hours with no human intervention. The target was for the base models to acquire capabilities across mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.

Marktechpost’s Visual Explainer

Overview
MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality

MiniMax officially released M3 on June 1, 2026. The API is live now. Model weights and technical report will be open-sourced within 10 days. M3 is the next model in the M-series line after M2.7. MiniMax positions it as the first open-weight model to combine frontier coding, a 1M-token context window, and native multimodality in a single architecture.

Context window: 1M tokens
SWE-Bench Pro score: 59.0%
Architecture: MSA sparse attention
OSWorld-Verified (computer use): 70.06%

Architecture
MSA: MiniMax Sparse Attention

Standard full attention has quadratic computational complexity. As context length grows, compute cost grows as the square of the sequence length. MSA is designed to solve this at the operator level. Compared to approaches like DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage. MSA uses a “KV outer gather Q” approach: each KV block is read only once, memory access is contiguous, and arithmetic intensity is significantly better than common methods.

Prefill speedup at 1M context: >9×
Decoding speedup at 1M context: >15×
Per-token compute vs M2 at 1M context: 1/20
Speed vs Flash-Sparse-Attention: >4× faster

Benchmarks
Coding and Agentic Performance

Results reported by MiniMax. SWE-Bench Verified used Claude Code scaffolding, averaged over 4 runs. SWE-Bench Pro used Claude Code scaffolding, aligned to official evaluation.
SWE-Bench Pro: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)
Terminal-Bench 2.1: 66.0%
SWE-fficiency: 34.8%
KernelBench Hard: 28.8% (evaluated on NVIDIA Blackwell GPUs, sm_120)
MCP Atlas: 74.2%
Claw-Eval: highest score among models evaluated (161 tasks)
SVG-Bench: surpasses Opus 4.7
OmniDocBench: above Gemini 3.1 Pro
OSWorld-Verified: 70.06% (361 samples, Max Steps = 200)

Multimodality
Native Multimodal Training from Step 0

M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the start, not added as a post-training capability. MiniMax reports that interleaved data (sequences where text and images are naturally intermixed) is more critical to model performance than commonly assumed. After rebuilding the entire data pipeline for interleaved formats, training data was scaled to the order of 100 trillion tokens. M3 supports:

Image input
Video input
Desktop computer operation (computer use)

Real-World Tasks
Three Internal Tasks Documented by MiniMax

Paper Reproduction: M3 reproduced the ICLR 2025 paper Learning Dynamics of LLM Finetuning autonomously over ~12 hours, producing 18 commits and 23 experimental figures with no human intervention.

CUDA Kernel Optimization: M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over ~24 hours: 147 benchmark submissions, 1,959 tool calls, 6 landmark optimization rounds. Improved Hopper FP8 peak utilization from 7.6% to 71.3% (9.4× speedup). Best solution appeared on submission 145.

PostTrainBench: M3 autonomously ran data synthesis → training → evaluation → iteration for 4 base models over 12 hours. Scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of other evaluated models. Targets: AIME2025, BFCL, GPQA Main, GSM8K, HumanEval.

MiniMax Code
MiniMax Code: Agent Product Built and Trained with M3

MiniMax Code is an agent product built and trained together with M3. Available at agent.minimaxi.com/download. Works with MiniMax Token Plans.
Agent Teams: multiple agents run concurrent, multi-stage, dynamically adjustable workflows
Producer + Verifier loop: an adversarial harness enables continuous self-correction during execution
Computer use: M3’s native multimodal capability enables cross-application desktop automation
Built on OpenCode and Pi: MiniMax states it plans to open-source MiniMax Code in the future

Example use case
User (on phone): “Open the local ERP client and batch-enter invoice data from this Excel file.”
→ MiniMax Code handles operations across applications, files, and systems on the desktop.

API & Pricing
API Details and Token Plan Tiers

The M3 API is live at platform.minimax.io.

Pricing by input length: calls of up to 512K tokens are billed at the standard rate; calls above 512K tokens are billed at a higher long-context rate.
Thinking mode: can be toggled on or off at request time; both modes share the same pricing.
Service tiers: standard (default) and priority (service_tier=priority). Priority is available via sales and is opening to all users soon.

Plus: ~1.7B tokens/mo, $20/mo
Max: ~5.1B tokens/mo, $50/mo
Ultra: ~9.8B tokens/mo, $120/mo

Text, image, speech, and music usage all draw from the same token pool.

Key Takeaways
What Engineers and Researchers Need to Know

MiniMax M3 launched June 1, 2026. API is live. Open model weights and technical report committed within 10 days.
MSA delivers >9× prefill and >15× decoding speedup at 1M-token context vs M2, at 1/20th the per-token compute.
M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
Natively multimodal from step 0: supports image and video input, and scores 70.06% on OSWorld-Verified for computer use.
Thinking mode toggleable at request time.
Token Plan starts at $20/month (~1.7B M3 tokens).

Key Takeaways

MiniMax M3 launched June 1, 2026; API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.
MSA (MiniMax Sparse Attention) delivers more than 9× prefill and more than 15× decoding speedup at 1M-token context versus M2, at 1/20th the per-token compute.
M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.
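The Token Plan tiers and the input-length pricing rule quoted in the API & Pricing section can be compared with a few lines of arithmetic. The sketch below uses only the figures given in the article. The reading of "512K" as 512,000 tokens is an assumption (the article does not say whether it means 512 × 1024), and the function and constant names are illustrative, not part of any MiniMax API. The per-token figure is an effective price at full quota use, not a published rate.

```python
# Token Plan tiers as quoted in the article: (approximate tokens per month, USD per month).
PLANS = {
    "Plus": (1.7e9, 20.0),
    "Max": (5.1e9, 50.0),
    "Ultra": (9.8e9, 120.0),
}

# Assumption: "512K" is read as 512,000 tokens; the article does not specify 512 * 1024.
LONG_CONTEXT_THRESHOLD = 512_000


def usd_per_million_tokens(tokens_per_month, usd_per_month):
    """Effective price per million tokens if the whole monthly quota is used."""
    return usd_per_month / (tokens_per_month / 1e6)


def rate_tier(input_tokens, threshold=LONG_CONTEXT_THRESHOLD):
    """Return which rate the article says applies to a call with this input length."""
    return "standard" if input_tokens <= threshold else "long-context"


effective = {name: usd_per_million_tokens(*plan) for name, plan in PLANS.items()}
cheapest = min(effective, key=effective.get)

for name, price in effective.items():
    print(f"{name}: ${price:.4f} per million tokens at full quota")
print("lowest effective price:", cheapest)
print(rate_tier(200_000), rate_tier(800_000))
```

With the quoted figures, the Max tier works out to the lowest effective price per token, and Ultra to the highest. Because text, image, speech, and music usage all draw from the same pool, the text-only price a user actually sees depends on how the quota is split.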