AI News HubLIVE
站内改写6 分で読了

翻訳待ち:[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

AI サービスが一時的に利用できないため、復旧後に翻訳を補完します。ソース概要:Jensen scores a huge win.

ソースLatent Space

AI サービスが一時的に利用できないため、復旧後に翻訳を補完します。

Today’s podcast guest was the lead on NVIDIA Cosmos over a year ago, discussing training videogen and world models. Fittingly, Cosmos 3 launched today, unifying language, image, video, audio and action in a Mixture-of-Transformers architecture that pairs an autoregressive reasoner with a diffusion generator in: base Nano (16B: 8B reasoner tower + 8B generator tower) Super (64B: 32B reasoner tower + 32B generator tower) models, and Super finetunes for Text2Image and Image2Video, which are now the new SOTA open weights imagegen and videogen models, just below Nano Banana 2 At Computex in Taiwan, Jensen also brought the heat with Nemotron 3 Ultra, their 550B-A55B, remarkably efficient/fast open weights LLM that is the new US SoTA: Finally, the RTX Spark personal computer 1 petaflop superchip, was previewed with Microsoft and OpenClaw and Hermes Agent as a launch partner (good analysis here) AI News for 5/30/2026-6/1/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies! AI Twitter Recap NVIDIA’s Cosmos 3, Nemotron 3 Ultra, and the Push for Open Physical AI NVIDIA’s open-source week: NVIDIA dominated the open-model conversation with Cosmos 3, an open family of omnimodal world models for physical AI, plus the announcement of Nemotron 3 Ultra, a 550B open-weight model that several posters called the strongest U.S. open model so far. Cosmos 3 was framed as a full-stack release—weights, code, datasets, and fine-tuning recipes—with NVIDIA also launching the Cosmos Coalition alongside partners including Runway to build an open ecosystem for world models @NVIDIAAI ecosystem context, @runwayml coalition announcement, @kimmonismus Cosmos thread, @ClementDelangue on NVIDIA’s HF footprint. Why Cosmos 3 mattered technically: Beyond robotics rhetoric, the more concrete details were that Cosmos 3 unifies language, image, video, audio, and action in a single Mixture-of-Transformers design pairing an autoregressive reasoner with a diffusion generator. Artificial Analysis said Cosmos 3 reached #1 among open-weight models on both their Text-to-Image and Image-to-Video leaderboards, noting the generator uses structured JSON prompts and can be driven either by an external prompt-upsampling harness or its own reasoner branch. Separately, NVIDIA’s hardware + software push extended to adoption of the OpenMDW framework and partner ecosystem integrations on platforms like fal @ArtificialAnlys, @fal. Nemotron 3 Ultra reception: Community reaction to Nemotron 3 Ultra was unusually strong for a fresh open release. Posters highlighted both capability and serving characteristics, including claims that it is already topping some open evals and may be serving at 300+ tok/s in some setups—far faster than large DeepSeek/Kimi-class models @scaling01, @ctnzr, @caspar_br. There was also some technical discussion that Nemotron appears less sparse than peers like Kimi K2 / DeepSeek V4—roughly ~10% active vs ~3%—which could affect both economics and behavior @eliebakouch. MiniMax M3, Qwen3.7-Plus, and JetBrains Mellum2 Expand the Open Agent Model Field MiniMax M3’s launch was the day’s biggest model release: M3 was presented as an open-weight multimodal agent/coding model with 1M context, native multimodality, and competitive agent benchmarks. The headline figures repeated across launch partners were 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, and 74.2% MCP Atlas @MiniMax_AI, @PBDTokenRouter, @kimmonismus. Multiple infra vendors shipped day-0 support—Novita, Vercel AI Gateway, Cloudflare AI Gateway, OpenClaude, Flowith, and others—suggesting unusually fast ecosystem adoption @MiniMax_AI on Novita, @rauchg, @gitlawb. Benchmarks vs practical experience were mixed: M3 earned praise for frontend generation, visual/game tasks, and price-performance, with side-by-side demos showing strong one-shot UI/game outputs and notable benchmark placement for Next.js agent evals @notjazii, @lostinlatencyX, @rauchg. But several evaluators also reported high token consumption, verbose self-check loops, and occasional requirement drift on long tasks, making M3 look more like a “quality first, efficiency later” model @ZhihuFrontier review, @teortaxesTex skepticism. Qwen3.7-Plus: Alibaba launched Qwen3.7-Plus as a multimodal interactive hybrid agent that unifies GUI and CLI operation, visual reasoning, coding, and search-augmented QA. It is API-available via Alibaba Cloud Model Studio and was quickly added to tools like Cline @Alibaba_Qwen launch, @cline. The launch reinforces the trend that open-ish Asian labs are no longer releasing “just chat models,” but full agent-capable multimodal systems. JetBrains Mellum2: JetBrains released Mellum2, a 12B MoE model with 2.5B active parameters, trained on roughly 11T tokens and post-trained with RLVR, shipping base / SFT / RL checkpoints and a technical report @nv_pavlichenko, @jetbrains. The intended niche is especially interesting: ultra-low-latency inference for routing, RAG, sub-agents, and IDE use, and it landed in vLLM immediately @vllm_project. This looks like a serious “small fast open model for developer workflows” play rather than a benchmark-chasing frontier release. Agents, Sandboxes, Memory, and Search Are Becoming the Real Product Surface The stack is shifting from model calls to agent runtimes: Several launches converged on the idea that the main engineering leverage is now in the harness rather than the model. Perplexity’s “Search as Code” is the clearest example: instead of iterative search tool calls, the model writes Python against a search SDK, enabling custom ranking pipelines, map-reduce over indexes, batching, aggregation, and lower token overhead. Perplexity reports a jump on its internal WANDR benchmark from 0.152 to 0.386 with this architecture @perplexity_ai, @AravSrinivas. Managed agents + sandboxes are becoming standard: Google detailed Managed Agents in the Gemini API, where a single API call can spin up an agent that reasons, writes/runs code, manages files, and operates inside a hosted Linux sandbox @_philschmid, @GoogleAIStudio. LangChain pushed similar ideas around Deep Agents, Context Hub, and LangSmith Sandboxes/Engine, emphasizing persistent context, agent lifecycle tooling, and automated failure triage @LangChain, @hwchase17. Memory remains a missing primitive: One recurring complaint was that enormous context windows still don’t solve cross-session memory. A thread on HydraDB argued that “RAG + manual context injection” has been misnamed as memory, while actual persistent session knowledge remains underserved @kimmonismus. Related research threads pointed to reusable context management policies like AdaCoM, which trains a separate LLM via RL to prune/preserve context for frozen agents @dair_ai. Security remains the gating issue for enterprise agents: There was a notable warning from Microsoft Security Intelligence about a major npm supply chain compromise affecting 90+ redhat-cloud-services packages, including a self-propagating worm stealing npm/GitHub/AWS/SSH credentials @MsftSecIntel. At the same time, enterprise agent vendors highlighted sandboxing, runtime isolation, and security stack integration as prerequisites for deployment, including discussion of NVIDIA OpenShell and LangChain’s sandbox keynote @shannholmberg, @LangChain. Codex, Claude Code, and the Competitive Coding-Agent Race OpenAI extended Codex into more places: OpenAI announced that frontier models and Codex are now generally available on AWS / Amazon Bedrock, aimed squarely at enterprises that want OpenAI capabilities inside existing AWS security/compliance workflows @OpenAI, @OpenAIDevs. OpenAI also shipped a Codex Python SDK supporting threads, turns, streaming, resume, images, and sandbox control @reach_vb, plus support for Bedrock-backed Codex workflows @reach_vb on Bedrock config. Claude Code had a real ops incident: Anthropic reset 5-hour and weekly rate limits for Pro and Max users after fixing a bug where some Opus 4.8 sessions spawned too many parallel subagents/tool calls, burning usage unexpectedly @ClaudeDevs, follow-up. That’s a notable reminder that coding-agent product quality is increasingly determined by orchestration behavior, not just raw model IQ. Behavioral differences across coding models remain material: Developers highlighted large qualitative differences between GPT, Claude, and other models on benchmarks like ProgramBench and WeirdML, with Opus sometimes preferring exploration over score-maximization or showing benchmark-specific quirks @OfirPress, @htihle. A separate long thread argued newer Claude Opus 4.6–4.8 variants can fabricate plausible but fictional concepts in non-coding domains, suggesting possible truthfulness/alignment regressions rather than ordinary hallucinations @distributionat. Infra, Hardware, and Local AI Systems NVIDIA is coming for the PC: The most-discussed hardware launch was RTX Spark, an NVIDIA/Microsoft “personal AI computer” built around Grace + Blackwell, with up to 128GB unified memory and claimed 1 PFLOP FP4. The key strategic read: NVIDIA is no longer just selling accelerators, but an end-to-end local AI system that competes with Apple Silicon, x86 PCs, and Qualcomm simultaneously @kimmonismus, @swyx. Cluster/networking updates: On the datacenter side, Lambda said it is first to adopt NVIDIA Quantum-X InfiniBand Photonics Q3450-LD switches, pushing co-packaged optics to reduce network power and failures in large AI clusters @LambdaAPI. OpenAI also announced Stargate Michigan, a planned 1GW data center using closed-loop cooling and paired with workforce/education commitments @OpenAINewsroom. Local open-model tooling is improving fast: The MLX-VLM v0.6.0 release was one of the more substantive local inference/tooling updates, adding speculative decoding, Anthropic-style and responses-style APIs, tool calls, support for many new multimodal models, and image/audio features with the explicit pitch of turning Apple devices into “real local agent machines” @Prince_Canuma. That pairs well with growing DGX Spark + vLLM experimentation for local NVFP4 MoE serving @vllm_project. Top Tweets (by engagement, filtered for technical relevance) Anthropic’s IPO path: Anthropic said it has confidentially submitted a draft S-1 to the SEC, opening the door to an IPO pending review @AnthropicAI. Claude Code usage incident: Anthropic reset user rate limits after an Opus 4.8 parallel subagent/tool-call bug caused excessive quota burn @ClaudeDevs. Qwen3.7-Plus: Alibaba launched a multimodal agent model spanning GUI/CLI operation, coding, and visual tasks @Alibaba_Qwen. OpenAI on Bedrock: OpenAI models and Codex are now available through Amazon Bedrock for enterprise workflows @OpenAI. ARC-AGI-3 movement: Claude Opus 4.8 posted a new SOTA on ARC-AGI-3 at 1.5%, still tiny in absolute terms but a meaningful jump on that benchmark @arcprize. AI Reddit Recap /r/LocalLlama + /r/localLLM Recap 1. New Frontier Model Releases and Early Tests MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal (Activity: 1090): MiniMax M3 is announced as an open-weight frontier model with coding/agentic focus, native multimodality/vision, and MiniMax Sparse Attention for up to 1M tokens of context with a guaranteed 512K minimum (MiniMax M3). Claimed long-horizon agentic results include 12-hour ICLR paper reproduction, Hopper FP8 GEMM CUDA/Triton optimization reaching 9.4× speedup after 147 iterations, and PostTrainBench ranking third behind Opus 4.7 and GPT-5.5; access is currently via API/MiniMax Code, with HuggingFace/GitHub weights/local deployment planned. Commenters are cautiously interested in the combination of cheap/efficient vision plus long-context agentic coding, but skeptical because the announcement calls it “open-weight” w [truncated for AI cost control]