AI News HubLIVE
原文1 min read

NVIDIA Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is a 550 billion parameter (55B active) open model designed for long-running agentic workflows, with 1M token context and NVFP4 optimization, leading in agentic benchmarks and cost efficiency.

NVIDIA Nemotron 3 Ultra

June 4, 2026

NVIDIA Nemotron 3 Ultra is now available on Ollama’s cloud. It’s a 550 billion parameter (55B active) open model from NVIDIA built for long-running, agentic workflows with fast and affordable performance across hundreds of tool calls.

Model highlights

Built for long-running agents: Tuned for agent orchestration, coding agents, deep research, and complex enterprise workflows that run across hundreds of steps.

1M token context: Keep entire codebases, long tool histories, and research trails in context without losing the thread.

Frontier reasoning, high efficiency: 550B total parameters with only 55B active per token, and optimized for NVFP4—NVIDIA’s 4-bit floating point format that packs the model into less memory and runs faster.

Get started

Download Ollama, then run Nemotron 3 Ultra with your tool of choice.

Claude Code

ollama launch claude --model nemotron-3-ultra:cloud

Hermes Agent

ollama launch hermes --model nemotron-3-ultra:cloud

OpenClaw

ollama launch openclaw --model nemotron-3-ultra:cloud

General chat

ollama run nemotron-3-ultra:cloud

See more integrations.

Benchmarks

Nemotron 3 Ultra leads on accuracy across agent productivity, instruction following, and long-context tasks, while delivering leading throughput—saving up to 30% on costs compared to other leading open models.

Figure 1: Nemotron 3 Ultra leads among open models on agentic benchmarks for agent productivity, coding, and instruction following.

Figure 2: Nemotron 3 Ultra is in the most attractive quadrant with leading accuracy and leading throughput among open models.

Figure 3: Nemotron 3 Ultra saves up to 30% in costs and leads on the cost efficiency frontier.

Reference

NVIDIA Nemotron 3 Ultra blog

Ollama model page