At ICRA, NVIDIA Research highlights eight papers on sim-to-real transfer, enabling robots to perceive, reason, plan, and act in dynamic environments. Methods like ScheduleStream, COMPASS, Grasp-MPC, SPARR, and SEAL improve coordination, navigation, grasping, assembly, and task execution, with significant gains in success rates and robustness.
NVIDIA presents 8 papers on sim-to-real transfer at ICRA
Methods include multi-arm coordination, cross-robot navigation, novel object grasping, precision assembly, and vision-language-action models
Jensen Huang announced Nvidia will spend $150 billion annually in Taiwan on AI infrastructure, despite a previous $500 billion US commitment. This highlights Taiwan's critical role in AI chip manufacturing and packaging.
Nvidia will invest $150B per year in Taiwan for AI infrastructure.
Despite a $500B US data center pledge, Taiwan remains the core manufacturing hub.
Nvidia CEO Jensen Huang plans a $150 billion investment in Taiwan for AI infrastructure, despite Trump administration tariffs aimed at bringing chip manufacturing back to the US. Taiwan refuses to relinquish its semiconductor dominance, while US chip manufacturing capacity remains low.
Nvidia announces $150 billion investment in Taiwan to strengthen its position in AI chips.
Trump administration weighs tariffs on semiconductors to boost domestic manufacturing, but the US produces only about 10% of the chips it needs.
NVIDIA CEO Jensen Huang has accepted an invitation to join the Advisory Board of Tsinghua University's School of Economics and Management (SEM). The board, chaired by Apple CEO Tim Cook, includes Elon Musk, Satya Nadella, Mark Zuckerberg, Jack Ma, and other global leaders. Huang also recently received an honorary doctorate from Carnegie Mellon University.
Jensen Huang joins Tsinghua SEM Advisory Board
Board chaired by Apple's Tim Cook, includes top tech and business leaders
NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.
Polar enables RL training on any agent harness via a model API proxy without modifying the harness code
Achieves up to a 22.6-point improvement on SWE-Bench Verified using GRPO on Qwen3.5-4B, with gains under the Codex, Claude Code, and Pi harnesses
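The summary names GRPO as the training algorithm. As a rough illustration of the group-relative advantage at the heart of GRPO, and not of Polar's own code, the sketch below normalizes the rewards of several rollouts sampled for the same task. The function name, the epsilon term, the use of the population standard deviation, and the pass/fail rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Each rollout is scored relative to the group mean, so no learned
    value function is needed. eps avoids division by zero when every
    rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group: four rollouts of one task, reward 1.0 if tests pass.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout fails carries no learning signal.
flat = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

Rollouts that beat the group average get positive advantages and the rest get negative ones, while a group with identical rewards contributes nothing to the update.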
AI factories are a new class of infrastructure that convert energy into tokens—the unit of production for reasoning models, agents, and intelligent systems. As agentic AI scales, performance per watt and cost per token become the critical economics. This article explores how AI factories work, their full-stack optimization, and how NVIDIA's latest hardware drives efficiency.
AI factories convert energy into tokens, serving as the 'power plants' of the AI age.
Agentic AI creates deeper, more complex inference workloads requiring real-time orchestration.
The US government has secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep up with leading AI firms like Anthropic and OpenAI. The funding requires congressional approval, while $800 million has been repurposed for cloud compute. The article covers chip specs, costs, and the escalating AI hardware race.
The US government secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep pace with big AI players.
Each GB10 chip consumes only 140W but delivers 1 petaflop of FP4 performance, enabling fine-tuning of 70-billion-parameter models.
Nvidia CEO Jensen Huang criticized CEOs who blame artificial intelligence for job cuts, calling the reasoning 'lazy' and saying it 'doesn't make any sense.' He noted that generative AI tools only became broadly useful recently, while many layoffs occurred two years prior. Huang urged a balanced narrative about AI, emphasizing both its potential and the need for safe advancement. He also recounted joining President Trump on a last-minute trip to Beijing.
Huang says blaming AI for layoffs is a 'lazy' excuse used to sound smart.
He argues AI only recently became productive, making prior layoff links illogical.
We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and demonstrate effective sim-to-real transfer on physical hardware.
SDPG enables end-to-end training of visual RL policies in hours on a single RTX 4080 GPU.
Uses random perturbations of trajectory rollouts to estimate policy gradients, drastically reducing environment requirements.
NightSight presents a lightweight perception approach combining a monocular event camera, coded aperture lens, and IR dot projector to enable autonomous navigation in complete darkness for small aerial robots. The system uses depth-dependent blur from the coded aperture to train a CNN on synthetic data, achieving zero-shot generalization to real scenes. It runs at 20 Hz on an NVIDIA Jetson Orin Nano with 7.0 cm error up to 2.5 m range.
Combines event camera, coded aperture, and IR projection for depth sensing in darkness
CNN trained solely on synthetic data generalizes zero-shot to complex real-world scenes
The shift to agentic AI creates new CPU requirements for AI factories: fast cores, massive memory bandwidth, and sustained high performance under all-core load. Initial Phoronix benchmarks show NVIDIA's Vera CPU delivers on all three. With 88 custom Olympus cores, 1.2 TB/s memory bandwidth, and an efficient power envelope, Vera outperforms previous-generation Grace by 1.6x and leads the latest x86 processors in code compilation, file compression, video transcoding, and more. Its LPDDR5X memory subsystem achieves 90% of peak bandwidth while consuming under 30 watts, and offers over 4x the memory bandwidth per core of traditional x86. NVIDIA has shipped early Vera CPUs to leading AI companies and cloud providers, with partner availability expected in the second half of the year.
Vera CPU features 88 custom NVIDIA Olympus cores and 1.2 TB/s memory bandwidth, optimized for agentic AI workloads.
Phoronix benchmarks show Vera delivers 1.6x generational performance gain over Grace and outperforms latest x86 processors in many tasks.
Despite 97% of telecom executives adopting AI, most initiatives stall due to 'data debt'—fragmented, ungoverned, and semantically opaque data. NVIDIA's report indicates the bottleneck is data availability, not model quality. Databricks Unity Catalog addresses this with a unified semantic layer and governance, enabling cross-system data federation, fine-grained access control, and rich semantic context to move AI from demo to production.
97% of telecom executives adopt AI, but projects stall due to data debt.
Data fragmentation and lack of semantic context are key barriers.
Learn how to build a multi-agent campaign review system that demonstrates parallel reasoning, context persistence, and traceable execution paths using an integrated architecture combining NVIDIA NIM for GPU-accelerated inference, Amazon Bedrock AgentCore for managed runtime, and Strands Agents for serverless orchestration.
Combines NVIDIA NIM, Amazon Bedrock AgentCore, and Strands Agents for high-performance multi-agent AI.
Enables parallel reasoning, context persistence, and traceable execution.
ModelBest (面壁智能) unveils ForgeTrain, the world's first production-grade LLM pretraining framework entirely written by AI, which outperforms NVIDIA's Megatron by 10%. The framework was used to train MiniCPM5-1B, a compact model that sets new records for intelligence density among sub-2B models.
ForgeTrain is the first production-grade LLM pretraining framework fully generated by AI.
It achieves 10% faster training than NVIDIA Megatron on equivalent hardware.
RED is a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms. It adapts to runtime environmental changes by assigning intermediate sub-deadlines, leveraging MIMONet weight sharing, and reconstructing computation graphs. Implemented on NVIDIA Jetson and Apple M-series platforms, RED consistently outperforms existing methods in throughput, deadline satisfaction, robustness, adaptability, and overhead.
RED assigns intermediate sub-deadlines to accommodate evolving computation graphs and asynchronous inference.
It leverages MIMONet's shared parameters to improve schedulability through workload refinement and graph reconstruction.
This tutorial provides a detailed guide to building an advanced federated learning experiment using NVIDIA FLARE, comparing FedAvg and FedProx on a non-IID CIFAR-10 dataset. Client data is partitioned using a Dirichlet distribution to simulate realistic label imbalance. The NVFlare Job API is used to define and launch federated jobs, while the Client API handles local training and model exchange. Complete code implementation and experimental results visualization are provided.
Build federated learning experiments with NVIDIA FLARE to compare FedAvg and FedProx.
Use Dirichlet distribution (alpha=0.3) to partition CIFAR-10 into 3 non-IID clients.
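The partitioning step can be sketched without any framework. The example below is a minimal, standard-library illustration of Dirichlet label partitioning with alpha=0.3 and 3 clients; it is not the tutorial's code and does not use NVFlare. The function name, the synthetic label list standing in for CIFAR-10 labels, and the rounding of proportions into cut points are assumptions.

```python
import random
from collections import Counter

def dirichlet_partition(labels, num_clients=3, alpha=0.3, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # A Dirichlet sample is a set of independent Gamma(alpha, 1) draws,
        # normalized to sum to 1. Small alpha gives very uneven shares.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = max(sum(gammas), 1e-12)
        props = [g / total for g in gammas]
        # Convert cumulative proportions into cut points over this class.
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += props[c]
            if c == num_clients - 1:
                end = len(idx)  # last client takes whatever remains
            else:
                end = min(len(idx), max(start, int(round(cum * len(idx)))))
            client_indices[c].extend(idx[start:end])
            start = end
    return client_indices

# Synthetic stand-in for CIFAR-10 labels: 10 classes, 100 samples each.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=3, alpha=0.3)

for client_id, idx in enumerate(parts):
    histogram = sorted(Counter(labels[i] for i in idx).items())
    print(client_id, len(idx), histogram)
```

Lower alpha concentrates each class on fewer clients, which is what produces the label imbalance that FedProx is meant to handle better than FedAvg.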
This paper introduces PIMbot, a framework that manipulates outcomes in multi-robot RL via two complementary levers: incentive manipulation of the reward channel and policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers online. Experiments in Gazebo simulation and on an NVIDIA Jetson Orin Nano embedded device demonstrate effectiveness, positioning PIMbot as a stress-test tool for vulnerabilities in multi-robot cooperation.
PIMbot uses two manipulation levers: reward channel incentive manipulation and policy manipulation.
An adaptive multi-objective controller balances the levers online.
The last three weeks marked a phase transition in AI: Google unveiled Gemini Omni and an agent-first platform; Andrej Karpathy joined Anthropic to accelerate pretraining; Anthropic secured a $45B compute lease from xAI's Colossus; Cerebras IPO surged to a ~$95B market cap; and SpaceX, OpenAI, and Anthropic are planning to go public within six months, collectively worth trillions. Research highlights include HRM-Text efficient pretraining, AI reviewer evaluation, NVIDIA's unified AR-diffusion model, and more.
Google I/O introduced Gemini Omni, Gemini 3.5 Flash, Antigravity agent platform, and TPU 8i for a vertically integrated agent pipeline.
Andrej Karpathy joined Anthropic to lead a team using Claude to accelerate pretraining, signaling a practical self-improvement flywheel.
Google's SynthID watermarking system for AI content is being adopted by OpenAI, Nvidia, ElevenLabs, and Kakao, marking a shift toward a shared industry standard for detection of AI-generated media.
SynthID embeds watermarks directly into pixels and audio waveforms, making them harder to remove than metadata.
OpenAI, Nvidia, ElevenLabs, and Kakao are now using SynthID for their image, video, and voice generation tools.
Anthropic will likely keep supplying AI models to the NSA despite being labeled a "supply chain risk." Intelligence agencies lack Nvidia's latest Grace Blackwell chips, and Anthropic's "Mythos" model reportedly runs on older hardware too. The controversial "any lawful use" clause that derailed earlier talks is not part of the deal.
Anthropic likely to continue supplying AI models to NSA despite Pentagon's supply chain risk label.
NVIDIA's Gated DeltaNet-2 is a linear attention layer that decouples memory erasing and writing into channel-wise gates. Trained at 1.3B parameters on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 in language modeling, commonsense reasoning, and long-context retrieval, with the largest gains on RULER benchmarks.
Gated DeltaNet-2 decomposes the scalar gate into a channel-wise erase gate (key axis) and write gate (value axis), enabling independent control of erasing old content and writing new content.
At 1.3B parameters trained on 100B FineWeb-Edu tokens, it achieves best average performance across benchmarks compared to baselines.
Meta launched an internal AI leaderboard called 'Claudeonomics' to track employee token usage, but shut it down after data leaked. The trend of tracking AI usage is growing, with Nvidia's Jensen Huang proposing AI tokens as part of compensation.
Meta's internal AI leaderboard 'Claudeonomics' ranked employees based on token consumption and used gamification badges.
The leaderboard was shut down after internal usage data was shared publicly.
NVIDIA introduces Nemotron-Labs Diffusion language models that achieve up to 6.4x faster inference than autoregressive models while maintaining high accuracy by generating tokens in parallel and refining them iteratively. The models support three modes: autoregressive, diffusion, and self-speculation. The 8B model outperforms Qwen3 8B by 1.2% accuracy.
Nemotron-Labs Diffusion models offer three generation modes: autoregressive, diffusion, and self-speculation.
The 8B model achieves 2.6x TPF in diffusion mode and up to 6.4x with self-speculation.
Mahjax is a fully vectorized Riichi Mahjong environment implemented in JAX, enabling large-scale rollout parallelization on GPUs. It achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under no-red and red rules, respectively. Designed for tabula rasa reinforcement learning, it also includes a visualization tool. Experiments show agents can effectively improve their rank against baseline policies.
Mahjax is a fully vectorized Riichi Mahjong simulator based on JAX for GPU parallelization.
It achieves up to 2 million steps per second on 8 NVIDIA A100 GPUs (no-red rule).
At NVIDIA GTC Taipei at COMPUTEX, the world’s developers, researchers and industry leaders are converging to dive into the latest breakthroughs shaping every industry, covering topics spanning AI factories and scaling infrastructure to agentic and physical AI and more.
NVIDIA wins multiple COMPUTEX 2026 Best Choice Awards for AI factories, robotics, and autonomous vehicles.
Vera Rubin NVL72 achieves 10x inference performance per watt and 10x lower cost per token.
The open-source movement is bringing AI breakthroughs to robotics, lowering barriers to entry. From the ROS framework to models from Nvidia, Hugging Face, and Alibaba, robots' ability to reason, decide, and act is becoming accessible to more people. However, tensions between commercial incentives and academic ideals present new challenges.
Open-source robotics software has evolved over decades; ROS set the infrastructure, and now open-source AI models are driving the evolution of robot 'brains'.
Companies like Nvidia, Hugging Face, and Alibaba have released open-source robotic AI tools and models, significantly lowering the entry barrier.
Nvidia CEO Jensen Huang revealed that the Vera CPU opens a US$200 billion market, with projected revenue of US$20 billion this fiscal year. Despite beating Q1 estimates, supply constraints and competition from custom chips pose challenges.
Nvidia's Vera chip targets AI inference, unlocking a US$200 billion market.
Vera is expected to be the second-largest revenue contributor this fiscal year at US$20 billion.
NVIDIA researchers have released Nemotron-Labs-Diffusion, a language model family that unifies three decoding modes in one architecture: autoregressive (AR) decoding, diffusion-based parallel decoding, and self-speculation decoding. It is available in 3B, 8B, and 14B parameter sizes with base, instruct, and vision-language variants. Self-speculation mode achieves up to 6× tokens per forward over Qwen3-8B while maintaining competitive accuracy. The model is open-source and supports flexible deployment across different concurrency scenarios.
Nemotron-Labs-Diffusion integrates AR, diffusion, and self-speculation decoding in a single model with no architectural changes. Switching modes is done at inference time by changing attention patterns.
At 8B scale, linear self-speculation delivers 5.99× tokens per forward with 62.81% accuracy, outperforming Qwen3-8B in throughput and accuracy.
A white paper reveals that NVIDIA A100 GPUs can draw up to 146.66 watts while reporting 0% utilization, exposing a critical blind spot in GPU telemetry. The author proposes a new energy efficiency benchmark (CEI) and an open-source optimizer to detect such 'GHOST' anomalies.
Reported GPU utilization can be 0% while actual power draw is over 146W, leading to hidden energy waste.
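The mismatch described above can be checked with a simple filter over telemetry samples. The sketch below is a hypothetical illustration, not the white paper's optimizer or its CEI benchmark: the sample records, field names, and the 60 W idle threshold are assumptions, and only the 146.66 W figure comes from the summary.

```python
def find_ghost_samples(samples, idle_power_w=60.0):
    """Return samples that report 0% utilization but draw more than idle_power_w."""
    return [s for s in samples if s["util_pct"] == 0 and s["power_w"] > idle_power_w]

def energy_wh(samples, interval_s):
    """Energy in watt-hours, assuming one sample every interval_s seconds."""
    return sum(s["power_w"] for s in samples) * interval_s / 3600.0

# Hypothetical telemetry, one sample per second.
samples = [
    {"util_pct": 0, "power_w": 146.66},   # reported idle, high draw
    {"util_pct": 0, "power_w": 45.0},     # reported idle, low draw
    {"util_pct": 87, "power_w": 310.0},   # busy
    {"util_pct": 0, "power_w": 146.66},   # reported idle, high draw
]

ghosts = find_ghost_samples(samples)
hidden_wh = energy_wh(ghosts, interval_s=1.0)
print(len(ghosts), "ghost samples,", round(hidden_wh, 4), "Wh unaccounted for")
```

On real hardware, paired readings of this kind could come from a query such as `nvidia-smi --query-gpu=utilization.gpu,power.draw --format=csv`, and the threshold would need tuning per GPU model.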
At Google I/O, NVIDIA and Google Cloud are accelerating work for over 100,000 developers in their joint community, offering curated learning paths, hands-on labs, and events. New additions include a JAX learning path, NVIDIA Dynamo codelab, and monthly livestreams. The collaboration extends to JAX, NVIDIA Dynamo on GKE, and integration of Google DeepMind's Gemma and NVIDIA Nemotron models. NVIDIA is the first industry partner to apply SynthID watermarking to NVIDIA Cosmos models, ensuring content integrity.
Joint developer community surpasses 100,000 members, providing AI skill-building resources.
New learning paths for JAX on NVIDIA GPUs, NVIDIA Dynamo codelab, and monthly developer livestreams.
On May 19, 2026, NVIDIA debuted its standalone Vera CPU, purpose-built for agentic AI workloads, with initial deliveries to Anthropic, OpenAI, Oracle Cloud Infrastructure, and SpaceXAI. The CPU features 88 custom Olympus cores, 1.2 TB/s memory bandwidth, and 50% faster per-core performance. Oracle plans to deploy hundreds of thousands of Vera CPUs starting in 2026.
NVIDIA Vera CPU delivered to Anthropic, OpenAI, Oracle Cloud Infrastructure, and SpaceXAI.
Vera features 88 custom Olympus cores, 1.2 TB/s memory bandwidth, 50% faster per-core performance.
President Trump flew to Beijing, brought Jensen Huang along at the last minute, and left two days later, telling reporters that "something could happen" on chip exports. Nothing did. Not a single Nvidia H200 has shipped to China since Trump first authorised the sales in December 2025, and US Trade Representative Jamieson Greer told Bloomberg that semiconductor controls were not even on the bilateral agenda.
Trump's summit with Xi failed to unblock H200 chip exports to China.
US approved exports but Beijing prevents Chinese firms from taking delivery.
This study conducts comprehensive 10-phase optimization experiments on Apple M3 Ultra (60-core GPU, 512 GB unified memory) to achieve real-time camera img2img transformation. By combining CoreML conversion of the distillation-specialized model SDXS-512 with a 3-thread camera pipeline, it reaches 22.7 FPS at 512x512 resolution. The work demonstrates that CUDA optimization insights do not transfer to Apple Silicon's unified memory architecture, with quantization showing no speedup, parallel inference being ineffective, and the Neural Engine unsuitable for large models, providing practical guidelines for Apple Silicon diffusion model inference.
10-phase systematic optimization on Apple M3 Ultra using techniques like CoreML, quantization, Token Merging, and Neural Engine.
Achieved 22.7 FPS real-time img2img at 512x512 with CoreML-converted SDXS-512 and 3-thread pipeline.
SuperInfer is a high-performance LLM inference system designed for emerging superchips (e.g., NVIDIA GH200). It introduces RotaSched, a proactive SLO-aware rotary scheduler, and DuplexKV, a full-duplex memory engine, achieving up to 74.7% higher time-to-first-token (TTFT) SLO attainment while maintaining comparable time-between-tokens (TBT) and throughput.
Proposes RotaSched, the first proactive SLO-aware rotary scheduler that rotates requests between HBM and DRAM based on latency urgency.
DuplexKV engine enables full-duplex KV cache transfer over NVLink-C2C, overcoming PCIe bandwidth limitations.
At Dell Technologies World, Dell and NVIDIA unveiled new AI infrastructure including the Dell PowerEdge XE9812 based on NVIDIA Vera Rubin NVL72, delivering up to 10x lower cost-per-token for agentic AI inference. Dell CEO Michael Dell projected worldwide AI infrastructure spending could reach $3-4 trillion by 2030, with token consumption growing 3,400%. NVIDIA CEO Jensen Huang emphasized that demand is 'utterly parabolic.' Enterprise AI has moved from pilots to agentic AI and inference at scale. The Dell AI Factory with NVIDIA provides end-to-end solutions from deskside to data center, including confidential computing and support for open models.
Dell and NVIDIA launch new servers based on Vera Rubin NVL72, cutting inference cost 10x.
Dell CEO forecasts AI infrastructure spending to hit trillions by 2030.
Ian Buck hand-delivered the first NVIDIA Vera CPU systems to Anthropic, OpenAI, SpaceXAI, and Oracle Cloud Infrastructure. Vera is purpose-built for agentic AI workloads, featuring 88 custom cores, 1.2 TB/s memory bandwidth, and 50% faster per-core performance.
NVIDIA's first custom CPU for agentic AI, Vera, delivered to leading AI labs.
VP Ian Buck personally handed over systems to Anthropic, OpenAI, SpaceXAI, and Oracle.
This article presents a parameter-efficient fine-tuning approach using LoRA and DoRA to adapt NVIDIA Cosmos Predict 2.5 for robot video generation on a single GPU. It covers data preparation, adapter initialization, training with rectified flow loss, inference, and evaluation metrics.
LoRA and DoRA enable efficient fine-tuning of large world models by injecting small trainable adapters, reducing memory and avoiding catastrophic forgetting.
Training uses 92 robot manipulation videos with rectified flow loss and MSE loss on non-conditioned frames.
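To make the adapter idea concrete, here is a minimal pure-Python sketch of a LoRA-style forward pass on a single linear layer. It is not the article's training code and does not touch Cosmos Predict 2.5; the matrix sizes, rank, scaling value, and the names `down` and `up` are illustrative assumptions. DoRA, which additionally separates a weight's magnitude from its direction, is not shown.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w, down, up, alpha, rank):
    """Compute x @ w + (alpha / rank) * (x @ down) @ up.

    w is the frozen base weight; only down and up would be trained.
    """
    base = matmul(x, w)
    delta = matmul(matmul(x, down), up)
    scale = alpha / rank
    return [[b + scale * d for b, d in zip(brow, drow)]
            for brow, drow in zip(base, delta)]

rng = random.Random(0)
d_in, d_out, rank, alpha = 8, 8, 2, 4.0

w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]       # frozen
down = [[rng.gauss(0, 0.02) for _ in range(rank)] for _ in range(d_in)]  # trainable
up = [[0.0] * d_out for _ in range(rank)]                                # trainable, zero init
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

# With `up` at zero the adapter is a no-op, so training starts from the base model.
y_init = lora_forward(x, w, down, up, alpha, rank)

frozen_params = d_in * d_out
adapter_params = rank * (d_in + d_out)
print(frozen_params, adapter_params)
```

The adapter's parameter count grows with `rank * (d_in + d_out)` rather than `d_in * d_out`, which is why the saving becomes large for the wide layers of a real world model.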
NVIDIA introduces a 4-bit pretraining methodology built around the NVFP4 microscaling format — combining selective BF16 layers, 16×16 Random Hadamard Transforms on Wgrad inputs, 2D weight scaling, and stochastic rounding on gradients — validated on a 12B hybrid Mamba-Transformer trained on 10 trillion tokens, the longest publicly documented 4-bit pretraining run, with downstream accuracy closely tracking the FP8 baseline (62.58% vs 62.62% on MMLU-Pro).
NVFP4 is a 4-bit microscaling format natively supported on Blackwell Tensor Cores, quantizing only linear-layer GEMMs while keeping other components in higher precision.
Trained a 12B hybrid Mamba-Transformer on 10T tokens, achieving 62.58% on MMLU-Pro vs 62.62% for FP8 baseline.
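Stochastic rounding, one of the ingredients listed above, can be illustrated in a few lines. The sketch below rounds values onto the magnitudes representable in a 4-bit E2M1 float and shows that the rounding is unbiased on average. It is not NVIDIA's implementation: the 16-element block, the scale kept as an ordinary Python float instead of a low-precision scale factor, and all function names are assumptions of this example.

```python
import random

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def stochastic_round(x, rng):
    """Round x to a neighboring grid value, picking the upper neighbor with
    probability equal to x's fractional position between the two neighbors."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_GRID[-1])  # clamp to the largest magnitude
    for lo, hi in zip(E2M1_GRID, E2M1_GRID[1:]):
        if lo <= mag <= hi:
            p_up = (mag - lo) / (hi - lo)
            return sign * (hi if rng.random() < p_up else lo)
    return sign * mag  # not reached for clamped inputs

def quantize_block(values, rng):
    """Scale one block so its largest magnitude maps to 6, round, then rescale."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * len(values)
    scale = amax / E2M1_GRID[-1]
    return [stochastic_round(v / scale, rng) * scale for v in values]

rng = random.Random(0)

# Unbiasedness check: 2.4 sits between 2 and 3, so it rounds up 40% of the time.
draws = [stochastic_round(2.4, rng) for _ in range(20000)]
mean_draw = sum(draws) / len(draws)

# Hypothetical 16-element gradient block.
block = [rng.gauss(0, 1) for _ in range(16)]
quantized = quantize_block(block, rng)
```

Round-to-nearest would map 2.4 to 2.0 every time, introducing a systematic error; stochastic rounding trades that bias for noise that averages out across many gradient elements.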
This article follows NVIDIA CEO Jensen Huang's half-day citywalk in Beijing, visiting places like Yin San Douzhi, Gulou Mantou, Huangwa Zengfu Wealth Temple, Mixue Ice Cream & Tea, Ziguangyuan Yogurt, Fangzhuancheng No. 69 Zhajiangmian, Daoxiangcun, a toy store, a Houhai bar, Qingyunlou Restaurant, and Chaofu Linyuan. It records interactions with shop owners and fans, and provides an open-source route guide.
Jensen Huang did a half-day citywalk in Beijing, hitting multiple landmarks and food spots.
His reactions to douzhi (fermented mung bean drink), drinking Mixue, and praying at the wealth temple went viral.
Yum Brands partners with Nvidia to accelerate AI development, deploying tools in about 500 restaurants in Q2 2025 across Pizza Hut, Taco Bell, KFC, and Habit Burger. Focus areas include voice AI for drive-thrus and call centers, computer vision for operations and labor monitoring, and restaurant-level analytics.
Blue Technology's BabyAlpha A3 quadruped robot breaks from NVIDIA's ecosystem with a self-developed heterogeneous computing cluster, delivering 10x efficiency, on-device 7B-parameter models, and human-level perception, aiming to bring embodied AI into homes.
6600MP camera, 140 dB HDR, and 223.2M point clouds/sec are claimed to surpass human vision
NVIDIA's SANA-WM is an open-source world model that generates 60-second 720p video with camera control. It can be trained on 64 H100s and run inference on a single GPU. Its distilled variant generates a full minute of 720p video in 34 seconds on a single RTX 5090.
SANA-WM generates 60-second 720p video from a single image and 6-DoF camera trajectory.
Uses hybrid linear attention (Gated DeltaNet) and dual-branch camera control for efficient long-sequence generation.
Nvidia disclosed the latest compensation for CEO Jensen Huang's children, Madison and Spencer, whose annual salaries of $1.232 million and $1.32 million respectively both increased. The company emphasized that the raises were determined independently of Huang and that the siblings earned their positions through merit.
The US has cleared roughly ten Chinese companies—including Alibaba, Tencent, and ByteDance—to buy up to 75,000 Nvidia H200 chips each. But not a single chip has shipped. According to Commerce Secretary Lutnick, Beijing is blocking the purchases to protect its domestic chip industry.
US approved up to 75,000 Nvidia H200 chips for each of about ten Chinese firms.
No chips have shipped; China blocks purchases to protect domestic industry.
Researchers from the group of theoretical physicist Hans Briegel have collaborated with NVIDIA to develop an AI method that automatically generates efficient quantum circuits, a key bottleneck in making quantum computers practically useful.
Research team collaborates with NVIDIA to auto-generate quantum circuits
Efficient quantum circuits are key to practical quantum computers