AI News Hub

Source Mix

  • Hacker News AI: 9
  • NVIDIA Blog: 7
  • MarkTechPost: 6
  • 量子位 (QbitAI): 5
  • arXiv Robotics: 4
  • AI Business: 3
  • The Decoder: 3
  • Artificial Intelligence News: 2

Topic Mix

  • Chips: 50
  • Agents: 24
  • Research: 14
  • Models: 12
  • Policy: 7
  • Startups: 7

Timeline

  • 2026-05-27: 8
  • 2026-05-18: 7
  • 2026-05-19: 5
  • 2026-05-24: 5
  • 2026-05-26: 5
  • 2026-05-28: 4
  • 2026-05-14: 3
  • 2026-05-20: 3

Latest Updates

NVIDIA Research Advances Robotics From Simulation to the Real World

At ICRA, NVIDIA Research highlights eight papers on sim-to-real transfer, enabling robots to perceive, reason, plan, and act in dynamic environments. Methods like ScheduleStream, COMPASS, Grasp-MPC, SPARR, and SEAL improve coordination, navigation, grasping, assembly, and task execution, with significant gains in success rates and robustness.

  • NVIDIA presents 8 papers on sim-to-real transfer at ICRA
  • Methods include multi-arm coordination, cross-robot navigation, novel object grasping, precision assembly, and vision-language-action models

Nvidia to Spend $150B a Year in Taiwan for AI Infrastructure

Jensen Huang announced Nvidia will spend $150 billion annually in Taiwan on AI infrastructure, despite a previous $500 billion US commitment. This highlights Taiwan's critical role in AI chip manufacturing and packaging.

  • Nvidia will invest $150B per year in Taiwan for AI infrastructure.
  • Despite a $500B US data center pledge, Taiwan remains the core manufacturing hub.

Nvidia bets $150B on Taiwan as Trump's plan to make US an AI hub backfires

Nvidia CEO Jensen Huang plans a $150 billion investment in Taiwan for AI infrastructure, despite Trump administration tariffs aimed at bringing chip manufacturing back to the US. Taiwan refuses to relinquish its semiconductor dominance, while US chip manufacturing capacity remains low.

  • Nvidia announces $150 billion investment in Taiwan to boost AI chip position.
  • The Trump administration is weighing semiconductor tariffs to boost domestic manufacturing, but the US produces only about 10% of its chip needs.

Jensen Huang Joins Tsinghua University's Advisory Board

NVIDIA CEO Jensen Huang has accepted an invitation to join the Advisory Board of Tsinghua University's School of Economics and Management (SEM). The board, chaired by Apple CEO Tim Cook, includes Elon Musk, Satya Nadella, Mark Zuckerberg, Jack Ma, and other global leaders. Huang also recently received an honorary doctorate from Carnegie Mellon University.

  • Jensen Huang joins Tsinghua SEM Advisory Board
  • Board chaired by Apple's Tim Cook, includes top tech and business leaders

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.

  • Polar enables RL training on any agent harness via a model API proxy without modifying the harness code
  • Achieves up to 22.6 point improvement on SWE-Bench Verified using GRPO on Qwen3.5-4B across four coding harnesses
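For readers unfamiliar with GRPO, its central step is a group-relative advantage: several rollouts are sampled for the same task, and each reward is standardized against the mean and spread of its own group. The sketch below illustrates only that step. It is not taken from Polar; the function name, the `eps` constant, the use of the population standard deviation, and the pass/fail rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each rollout's reward against its own group (GRPO-style).

    `eps` is an illustrative constant that avoids division by zero when
    every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts of one task: two pass (1.0) and two fail (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Passing rollouts receive positive advantages and failing ones negative, so the policy update favors whatever distinguished the successes within that group. A group whose rewards are all identical yields zero advantages and contributes no signal.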

AI Factories: The New Infrastructure of Intelligence

AI factories are a new class of infrastructure that convert energy into tokens—the unit of production for reasoning models, agents, and intelligent systems. As agentic AI scales, performance per watt and cost per token become the critical economics. This article explores how AI factories work, their full-stack optimization, and how NVIDIA's latest hardware drives efficiency.

  • AI factories convert energy into tokens, serving as the 'power plants' of the AI age.
  • Agentic AI creates deeper, more complex inference workloads requiring real-time orchestration.

AI is an arms race, and the US wants $9 billion in Nvidia superchips to keep up

The US government has secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep up with leading AI firms like Anthropic and OpenAI. The funding requires congressional approval, while $800 million has been repurposed for cloud compute. The article covers chip specs, costs, and the escalating AI hardware race.

  • The US government secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep pace with big AI players.
  • Each GB10 chip consumes only 140W but delivers 1 petaflop of FP4 performance, enabling fine-tuning of 70-billion-parameter models.

Nvidia Signals $150B Spend in Taiwan

Speaking at a launch event for Nvidia’s upcoming Taiwan headquarters, CEO Jensen Huang deemed the country the “epicenter” of the AI revolution.

  • Nvidia CEO Jensen Huang calls Taiwan the epicenter of the AI revolution
  • Nvidia plans to invest approximately $150 billion in Taiwan

Jensen Huang says CEOs who blame AI for layoffs are giving a 'lazy' excuse

Nvidia CEO Jensen Huang criticized CEOs who blame artificial intelligence for job cuts, calling the reasoning 'lazy' and saying it 'doesn't make any sense.' He noted that generative AI tools only became broadly useful recently, while many layoffs occurred two years prior. Huang urged a balanced narrative about AI, emphasizing both its potential and the need for safe advancement. He also recounted joining President Trump on a last-minute trip to Beijing.

  • Huang says blaming AI for layoffs is a 'lazy' excuse used to sound smart.
  • He argues AI only recently became productive, making prior layoff links illogical.

Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient

We present the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning method that trains diverse visuomotor control policies end-to-end within a few hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients via random perturbations of trajectory rollouts, requiring orders of magnitude fewer batch-rendered environments and substantially reducing compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory usage, and rewards. Finally, we introduce a suite of realistic visual robotics benchmarks spanning dexterous manipulation and challenging locomotion, and demonstrate effective sim-to-real transfer on physical hardware.

  • SDPG enables end-to-end training of visual RL policies in hours on a single RTX 4080 GPU.
  • Uses random perturbations of trajectory rollouts to estimate policy gradients, drastically reducing environment requirements.

NightSight: Passive Computation for Navigation in Dark Using Events

NightSight presents a lightweight perception approach combining a monocular event camera, coded aperture lens, and IR dot projector to enable autonomous navigation in complete darkness for small aerial robots. The system uses depth-dependent blur from the coded aperture to train a CNN on synthetic data, achieving zero-shot generalization to real scenes. It runs at 20 Hz on an NVIDIA Jetson Orin Nano with 7.0 cm error at ranges up to 2.5 m.

  • Combines event camera, coded aperture, and IR projection for passive depth sensing in darkness
  • CNN trained solely on synthetic data generalizes zero-shot to complex real-world scenes

NVIDIA Vera CPU Is ‘Packing a Heavy-Hitting Punch’ Against Competition

The shift to agentic AI creates new CPU requirements for AI factories: fast cores, massive memory bandwidth, and sustained high performance under all-core load. Initial Phoronix benchmarks show NVIDIA's Vera CPU delivers. With 88 custom Olympus cores, 1.2 TB/s memory bandwidth, and an efficient power envelope, Vera outperforms previous-generation Grace by 1.6x and leads the latest x86 processors in code compilation, file compression, video transcoding, and more. Its LPDDR5X memory subsystem achieves 90% of peak bandwidth while consuming under 30 watts, which is over 4x the memory bandwidth per core of traditional x86. NVIDIA has shipped early Vera CPUs to leading AI companies and cloud providers, with partner availability expected in the second half of the year.

  • Vera CPU features 88 custom NVIDIA Olympus cores and 1.2 TB/s memory bandwidth, optimized for agentic AI workloads.
  • Phoronix benchmarks show Vera delivers a 1.6x generational performance gain over Grace and outperforms the latest x86 processors in many tasks.

AI readiness in telecommunications

Despite 97% of telecom executives adopting AI, most initiatives stall due to 'data debt'—fragmented, ungoverned, and semantically opaque data. NVIDIA's report indicates the bottleneck is data availability, not model quality. Databricks Unity Catalog addresses this with a unified semantic layer and governance, enabling cross-system data federation, fine-grained access control, and rich semantic context to move AI from demo to production.

  • 97% of telecom executives adopt AI, but projects stall due to data debt.
  • Data fragmentation and lack of semantic context are key barriers.

Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore

Learn how to build a multi-agent campaign review system that demonstrates parallel reasoning, context persistence, and traceable execution paths using an integrated architecture combining NVIDIA NIM for GPU-accelerated inference, Amazon Bedrock AgentCore for managed runtime, and Strands Agents for serverless orchestration.

  • Combines NVIDIA NIM, Amazon Bedrock AgentCore, and Strands Agents for high-performance multi-agent AI.
  • Enables parallel reasoning, context persistence, and traceable execution.

AI Builds AI: Chinese Company Achieves World First with Self-Written Training Framework

ModelBest (面壁智能) unveils ForgeTrain, the world's first production-grade LLM pretraining framework entirely written by AI, which outperforms NVIDIA's Megatron by 10%. The framework was used to train MiniCPM5-1B, a compact model that sets new records for intelligence density among sub-2B models.

  • ForgeTrain is the first production-grade LLM pretraining framework fully generated by AI.
  • It achieves 10% faster training than NVIDIA Megatron on equivalent hardware.

RED: Adaptive Real-Time DAG Scheduling for Robotic Inference under Environmental Dynamics

RED is a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms. It adapts to runtime environmental changes by assigning intermediate sub-deadlines, leveraging MIMONet weight sharing, and reconstructing computation graphs. Implemented on NVIDIA Jetson and Apple M-series platforms, RED consistently outperforms existing methods in throughput, deadline satisfaction, robustness, adaptability, and overhead.

  • RED assigns intermediate sub-deadlines to accommodate evolving computation graphs and asynchronous inference.
  • It leverages MIMONet's shared parameters to improve schedulability through workload refinement and graph reconstruction.

Step by Step Guide to Build and Compare FedAvg and FedProx Federated Learning on Non-IID CIFAR-10 with NVIDIA FLARE

This tutorial provides a detailed guide to building an advanced federated learning experiment using NVIDIA FLARE, comparing FedAvg and FedProx on a non-IID CIFAR-10 dataset. Client data is partitioned using a Dirichlet distribution to simulate realistic label imbalance. The NVFlare Job API is used to define and launch federated jobs, while the Client API handles local training and model exchange. Complete code and visualizations of the experimental results are provided.

  • Build federated learning experiments with NVIDIA FLARE to compare FedAvg and FedProx.
  • Use Dirichlet distribution (alpha=0.3) to partition CIFAR-10 into 3 non-IID clients.
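The Dirichlet split mentioned above can be sketched without any framework. This is a minimal illustration rather than the tutorial's code: for each class, client proportions are drawn from a Dirichlet distribution (sampled here as normalized Gamma draws), and that class's samples are cut accordingly. The function name, the seed, and the synthetic 10-class labels standing in for CIFAR-10 are assumptions.

```python
import random
from collections import Counter


def dirichlet_partition(labels, num_clients=3, alpha=0.3, seed=0):
    """Split sample indices across clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet(alpha, ..., alpha) sample is a set of independent
        # Gamma(alpha, 1) draws divided by their sum.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += draws[c] / total
            # The last client takes the remainder so no index is dropped.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Synthetic stand-in for CIFAR-10 labels: 10 classes, 100 samples each.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels)
for cid, part in enumerate(parts):
    print(cid, len(part), sorted(Counter(labels[i] for i in part).items()))
```

A small alpha such as 0.3 concentrates each class on one or two clients, which is what produces the label imbalance. Larger alpha values move the split toward a uniform (IID-like) partition.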

PIMbot: A Self-Adaptive Attack Framework for Adversarial Manipulation of Multi-Robot Reinforcement Learning

This paper introduces PIMbot, a framework that manipulates outcomes in multi-robot RL via two complementary levers: incentive manipulation of the reward channel and policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers online. Experiments in Gazebo simulation and on an NVIDIA Jetson Orin Nano embedded device demonstrate effectiveness, positioning PIMbot as a stress-test tool for vulnerabilities in multi-robot cooperation.

  • PIMbot uses two manipulation levers: reward channel incentive manipulation and policy manipulation.
  • An adaptive multi-objective controller balances the levers online.

The Sequence Radar #865: Last Week in AI: Karpathy, Google, Colossus, and the Coming IPO Wave

The last three weeks marked a phase transition in AI: Google unveiled Gemini Omni and an agent-first platform; Andrej Karpathy joined Anthropic to accelerate pretraining; Anthropic secured a $45B compute lease from xAI's Colossus; Cerebras IPO surged to a ~$95B market cap; and SpaceX, OpenAI, and Anthropic are planning to go public within six months, collectively worth trillions. Research highlights include HRM-Text efficient pretraining, AI reviewer evaluation, NVIDIA's unified AR-diffusion model, and more.

  • Google I/O introduced Gemini Omni, Gemini 3.5 Flash, Antigravity agent platform, and TPU 8i for a vertically integrated agent pipeline.
  • Andrej Karpathy joined Anthropic to lead a team using Claude to accelerate pretraining, signaling a practical self-improvement flywheel.

OpenAI and Nvidia Are Using Google's SynthID to Watermark AI Content

Google's SynthID watermarking system for AI content is being adopted by OpenAI, Nvidia, ElevenLabs, and Kakao, marking a shift toward a shared industry standard for detection of AI-generated media.

  • SynthID embeds watermarks directly into pixels and audio waveforms, making them harder to remove than metadata.
  • OpenAI, Nvidia, ElevenLabs, and Kakao are now using SynthID for their image, video, and voice generation tools.

Anthropic may keep supplying Claude to the NSA despite being flagged as a supply chain risk by the Pentagon

Anthropic will likely keep supplying AI models to the NSA despite being labeled a "supply chain risk." Intelligence agencies lack Nvidia's latest Grace Blackwell chips, and Anthropic's "Mythos" model reportedly runs on older hardware too. The controversial "any lawful use" clause that derailed earlier talks is not part of the deal.

  • Anthropic likely to continue supplying AI models to NSA despite Pentagon's supply chain risk label.
  • Intelligence agencies lack Nvidia's latest Grace Blackwell chips.

NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

NVIDIA's Gated DeltaNet-2 is a linear attention layer that decouples memory erasing and writing into channel-wise gates. Trained at 1.3B parameters on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 in language modeling, commonsense reasoning, and long-context retrieval, with the largest gains on RULER benchmarks.

  • Gated DeltaNet-2 decomposes the scalar gate into a channel-wise erase gate (key axis) and write gate (value axis), enabling independent control of erasing old content and writing new content.
  • At 1.3B parameters trained on 100B FineWeb-Edu tokens, it achieves best average performance across benchmarks compared to baselines.

Meta's Claudeonomics leaderboard

Meta launched an internal AI leaderboard called 'Claudeonomics' to track employee token usage, but shut it down after data leaked. The trend of tracking AI usage is growing, with Nvidia's Jensen Huang proposing AI tokens as part of compensation.

  • Meta's internal AI leaderboard 'Claudeonomics' ranked employees based on token consumption and used gamification badges.
  • The leaderboard was shut down after internal usage data was shared publicly.

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

NVIDIA introduces Nemotron-Labs Diffusion language models that achieve up to 6.4x faster inference than autoregressive models while maintaining high accuracy by generating tokens in parallel and refining them iteratively. The models support three modes: autoregressive, diffusion, and self-speculation. The 8B model outperforms Qwen3 8B by 1.2% in accuracy.

  • Nemotron-Labs Diffusion models offer three generation modes: autoregressive, diffusion, and self-speculation.
  • The 8B model achieves 2.6x tokens per forward (TPF) in diffusion mode and up to 6.4x with self-speculation.

Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

Mahjax is a fully vectorized Riichi Mahjong environment implemented in JAX, enabling large-scale rollout parallelization on GPUs. It achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under no-red and red rules, respectively. Designed for tabula rasa reinforcement learning, it also includes a visualization tool. Experiments show agents can effectively improve their rank against baseline policies.

  • Mahjax is a fully vectorized Riichi Mahjong simulator based on JAX for GPU parallelization.
  • It achieves up to 2 million steps per second on 8 NVIDIA A100 GPUs (no-red rule).

NVIDIA GTC Taipei at COMPUTEX: Live Updates on What’s Next in AI

At NVIDIA GTC Taipei at COMPUTEX, the world’s developers, researchers and industry leaders are converging to dive into the latest breakthroughs shaping every industry, covering topics spanning AI factories and scaling infrastructure to agentic and physical AI and more.

  • NVIDIA wins multiple COMPUTEX 2026 Best Choice Awards for AI factories, robotics, and autonomous vehicles.
  • Vera Rubin NVL72 achieves 10x inference performance per watt and 10x lower cost per token.

Open-Source Software Is Starting to Help Robots Think

The open-source movement is bringing AI breakthroughs to robotics, lowering barriers to entry. From the ROS framework to models from Nvidia, Hugging Face, and Alibaba, robots' ability to reason, decide, and act is becoming accessible to more people. However, tensions between commercial incentives and academic ideals present new challenges.

  • Open-source robotics software has evolved over decades; ROS set the infrastructure, and now open-source AI models are driving the evolution of robot 'brains'.
  • Companies like Nvidia, Hugging Face, and Alibaba have released open-source robotic AI tools and models, significantly lowering the entry barrier.

Nvidia’s Vera chip is the US$200 billion bet Jensen Huang doesn’t want you to overlook

Nvidia CEO Jensen Huang revealed that the Vera CPU opens a US$200 billion market, with projected revenue of US$20 billion this fiscal year. Despite beating Q1 estimates, supply constraints and competition from custom chips pose challenges.

  • Nvidia's Vera chip targets AI inference, unlocking a US$200 billion market.
  • Vera is expected to be the second-largest revenue contributor this fiscal year at US$20 billion.

Alibaba Aims for Independence with New AI Chips, Model

Alibaba launches new AI chips and a new model, reducing reliance on Nvidia and advancing its full-stack AI strategy.

  • Alibaba unveils new AI chips, highlighting its full-stack AI strategy
  • Company seeks to reduce dependence on Nvidia AI chips

NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× Tokens Per Forward Over Qwen3-8B

NVIDIA researchers have released Nemotron-Labs-Diffusion, a language model family that unifies three decoding modes in one architecture: autoregressive (AR) decoding, diffusion-based parallel decoding, and self-speculation decoding. It is available in 3B, 8B, and 14B parameter sizes with base, instruct, and vision-language variants. Self-speculation mode achieves up to 6× tokens per forward over Qwen3-8B while maintaining competitive accuracy. The model is open-source and supports flexible deployment across different concurrency scenarios.

  • Nemotron-Labs-Diffusion integrates AR, diffusion, and self-speculation decoding in a single model with no architectural changes. Switching modes is done at inference time by changing attention patterns.
  • At 8B scale, linear self-speculation delivers 5.99× tokens per forward with 62.81% accuracy, outperforming Qwen3-8B in throughput and accuracy.

GPU telemetry anomaly: 146W idle draw on A100 (white paper)

A white paper reveals that NVIDIA A100 GPUs can draw up to 146.66 watts while reporting 0% utilization, exposing a critical blind spot in GPU telemetry. The author proposes a new energy efficiency benchmark (CEI) and an open-source optimizer to detect such 'GHOST' anomalies.

  • Reported GPU utilization can be 0% while actual power draw is over 146W, leading to hidden energy waste.
  • NVIDIA's MIG profiling limitation creates observability gaps in multi-tenant cloud environments.

NVIDIA and Google Cloud Empower the Next Wave of AI Builders

At Google I/O, NVIDIA and Google Cloud are accelerating work for over 100,000 developers in their joint community, offering curated learning paths, hands-on labs, and events. New additions include a JAX learning path, NVIDIA Dynamo codelab, and monthly livestreams. The collaboration extends to JAX, NVIDIA Dynamo on GKE, and integration of Google DeepMind's Gemma and NVIDIA Nemotron models. NVIDIA is the first industry partner to apply SynthID watermarking to NVIDIA Cosmos models, ensuring content integrity.

  • Joint developer community surpasses 100,000 members, providing AI skill-building resources.
  • New learning paths for JAX on NVIDIA GPUs, NVIDIA Dynamo codelab, and monthly developer livestreams.

NVIDIA’s Vera CPU Lands at Leading AI Labs as Agentic AI Demand Grows

On May 19, 2026, NVIDIA debuted its standalone Vera CPU, purpose-built for agentic AI workloads, with initial deliveries to Anthropic, OpenAI, Oracle Cloud Infrastructure, and SpaceXAI. The CPU features 88 custom Olympus cores, 1.2 TB/s memory bandwidth, and 50% faster per-core performance. Oracle plans to deploy hundreds of thousands of Vera CPUs starting in 2026.

  • NVIDIA Vera CPU delivered to Anthropic, OpenAI, Oracle Cloud Infrastructure, and SpaceXAI.
  • Vera features 88 custom Olympus cores, 1.2 TB/s memory bandwidth, 50% faster per-core performance.

The Nvidia H200 China deal survived the Trump-Xi summit–just not in the way anyone expected

President Trump flew to Beijing, brought Jensen Huang along at the last minute, and left two days later, telling reporters that "something could happen" on chip exports. Nothing did. Not a single Nvidia H200 has shipped to China since Trump first authorised the sales in December 2025, and US Trade Representative Jamieson Greer told Bloomberg that semiconductor controls were not even on the bilateral agenda.

  • Trump's summit with Xi failed to unblock H200 chip exports to China.
  • US approved exports but Beijing prevents Chinese firms from taking delivery.

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra

This study conducts comprehensive 10-phase optimization experiments on Apple M3 Ultra (60-core GPU, 512 GB unified memory) to achieve real-time camera img2img transformation. By combining CoreML conversion of the distillation-specialized model SDXS-512 with a 3-thread camera pipeline, it reaches 22.7 FPS at 512x512 resolution. The work demonstrates that CUDA optimization insights do not transfer to Apple Silicon's unified memory architecture, with quantization showing no speedup, parallel inference being ineffective, and the Neural Engine unsuitable for large models, providing practical guidelines for Apple Silicon diffusion model inference.

  • 10-phase systematic optimization on Apple M3 Ultra using techniques like CoreML, quantization, Token Merging, and Neural Engine.
  • Achieved 22.7 FPS real-time img2img at 512x512 with CoreML-converted SDXS-512 and 3-thread pipeline.

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference

SuperInfer is a high-performance LLM inference system designed for emerging superchips (e.g., NVIDIA GH200). It introduces RotaSched, a proactive SLO-aware rotary scheduler, and DuplexKV, a full-duplex memory engine, achieving up to 74.7% higher TTFT SLO attainment while maintaining comparable TBT and throughput.

  • Proposes RotaSched, the first proactive SLO-aware rotary scheduler that rotates requests between HBM and DRAM based on latency urgency.
  • DuplexKV engine enables full-duplex KV cache transfer over NVLink-C2C, overcoming PCIe bandwidth limitations.

NVIDIA CEO Jensen Huang at Dell Technologies World: “Demand Is Going Parabolic, Utterly Parabolic”

At Dell Technologies World, Dell and NVIDIA unveiled new AI infrastructure including the Dell PowerEdge XE9812 based on NVIDIA Vera Rubin NVL72, delivering up to 10x lower cost-per-token for agentic AI inference. Dell CEO Michael Dell projected worldwide AI infrastructure spending could reach $3-4 trillion by 2030, with token consumption growing 3,400%. NVIDIA CEO Jensen Huang emphasized that demand is 'utterly parabolic.' Enterprise AI has moved from pilots to agentic AI and inference at scale. The Dell AI Factory with NVIDIA provides end-to-end solutions from deskside to data center, including confidential computing and support for open models.

  • Dell and NVIDIA launch new servers based on Vera Rubin NVL72, cutting inference cost 10x.
  • Dell CEO forecasts AI infrastructure spending to hit trillions by 2030.

Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs

Ian Buck hand-delivered the first NVIDIA Vera CPU systems to Anthropic, OpenAI, SpaceXAI, and Oracle Cloud Infrastructure. Vera is purpose-built for agentic AI workloads, featuring 88 custom cores, 1.2 TB/s memory bandwidth, and 50% faster per-core performance.

  • NVIDIA's first custom CPU for agentic AI, Vera, delivered to leading AI labs.
  • VP Ian Buck personally handed over systems to Anthropic, OpenAI, SpaceXAI, and Oracle.

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

This article presents a parameter-efficient fine-tuning approach using LoRA and DoRA to adapt NVIDIA Cosmos Predict 2.5 for robot video generation on a single GPU. It covers data preparation, adapter initialization, training with rectified flow loss, inference, and evaluation metrics.

  • LoRA and DoRA enable efficient fine-tuning of large world models by injecting small trainable adapters, reducing memory and avoiding catastrophic forgetting.
  • Training uses 92 robot manipulation videos with rectified flow loss and MSE loss on non-conditioned frames.
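The adapter idea behind LoRA can be shown on a tiny matrix. The sketch below is a generic illustration, not the article's Cosmos Predict 2.5 recipe: the frozen weight W is left untouched and a low-rank product B·A, scaled by alpha / r, is added on top. The helper names and toy values are assumptions. DoRA additionally separates weight magnitude from direction, which is not shown here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)            # A has shape (r, in_features)
    scale = alpha / r
    delta = matmul(b, a)  # (out, r) @ (r, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 2x3 base weight and a rank-1 adapter (toy values).
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
a = [[0.1, 0.2, 0.3]]      # A: 1 x 3
b_zero = [[0.0], [0.0]]    # B starts at zero, so the adapter is initially a no-op
b_trained = [[1.0], [2.0]] # hypothetical values after some training

print(lora_effective_weight(w, a, b_zero, alpha=2.0))
print(lora_effective_weight(w, a, b_trained, alpha=2.0))
```

Only A and B are trained, so the number of trainable values here is 5 instead of 6. For realistic layer sizes the saving is far larger, which is what makes single-GPU fine-tuning feasible.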

NVIDIA Introduces a 4-Bit Pretraining Methodology Using NVFP4, Validated on a 12B Hybrid Mamba-Transformer at 10T Token Horizon

NVIDIA introduces a 4-bit pretraining methodology built around the NVFP4 microscaling format — combining selective BF16 layers, 16×16 Random Hadamard Transforms on Wgrad inputs, 2D weight scaling, and stochastic rounding on gradients — validated on a 12B hybrid Mamba-Transformer trained on 10 trillion tokens, the longest publicly documented 4-bit pretraining run, with downstream accuracy closely tracking the FP8 baseline (62.58% vs 62.62% on MMLU-Pro).

  • NVFP4 is a 4-bit microscaling format natively supported on Blackwell Tensor Cores, quantizing only linear-layer GEMMs while keeping other components in higher precision.
  • Trained a 12B hybrid Mamba-Transformer on 10T tokens, achieving 62.58% on MMLU-Pro vs 62.62% for FP8 baseline.
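Stochastic rounding, one of the ingredients listed above, is easy to demonstrate in isolation. The sketch below rounds to the integer grid as a stand-in for a low-precision grid. It does not model NVFP4's actual value set or block scaling, and the function name and sample count are assumptions. The point is that rounding up with probability equal to the fractional part keeps the result unbiased on average, whereas round-to-nearest would systematically erase small gradient values.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x to a neighboring integer, rounding up with probability frac(x)."""
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)


rng = random.Random(0)
x = 0.3
samples = [stochastic_round(x, rng) for _ in range(100_000)]
mean_sr = sum(samples) / len(samples)

# Round-to-nearest always maps 0.3 to 0; the stochastic average stays near 0.3.
print(round(x), mean_sr)
```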

We Tried All the Spots on Jensen Huang's Beijing Must-Eat List! Houhai Bar Owner: He Promised to Come Every Year

This article follows NVIDIA CEO Jensen Huang's half-day citywalk in Beijing, visiting places like Yin San Douzhi, Gulou Mantou, Huangwa Zengfu Wealth Temple, Mixue Ice Cream & Tea, Ziguangyuan Yogurt, Fangzhuancheng No. 69 Zhajiangmian, Daoxiangcun, a toy store, a Houhai bar, Qingyunlou Restaurant, and Chaofu Linyuan. It records interactions with shop owners and fans, and provides an open-source route guide.

  • Jensen Huang did a half-day citywalk in Beijing, hitting multiple landmarks and food spots.
  • His reactions to douzhi (fermented mung bean drink), drinking Mixue, and praying at the wealth temple went viral.

Yum Brands, Nvidia will deploy new AI at 500 restaurants

Yum Brands partners with Nvidia to accelerate AI development, deploying tools in about 500 restaurants in Q2 2025 across Pizza Hut, Taco Bell, KFC, and Habit Burger. Focus areas include voice AI for drive-thrus and call centers, computer vision for operations and labor monitoring, and restaurant-level analytics.

  • Yum Brands is Nvidia's first restaurant partner.
  • AI deployment in Q2 2025 across 500 locations.

A Robot Dog Overturns NVIDIA's Computing Throne

Blue Technology's BabyAlpha A3 quadruped robot breaks from NVIDIA's ecosystem with a self-developed heterogeneous computing cluster, delivering 10x efficiency, on-device 7B-parameter models, and human-level perception, aiming to bring embodied AI into homes.

  • 6600MP camera, 140 dB HDR, and 223.2M point clouds/sec, claimed to surpass human vision
  • Proprietary 6-chip heterogeneous compute cluster (22 cores) avoids NVIDIA route

NVIDIA Introduces SANA-WM: A 2.6B-Parameter Open-Source World Model That Generates Minute-Scale 720p Video on a Single GPU

NVIDIA's SANA-WM is an open-source world model that generates 60-second 720p video with camera control. It can be trained on 64 H100s and run for inference on a single GPU. Its distilled variant generates a full minute of 720p video in 34 seconds on a single RTX 5090.

  • SANA-WM generates 60-second 720p video from a single image and 6-DoF camera trajectory.
  • Uses hybrid linear attention (Gated DeltaNet) and dual-branch camera control for efficient long-sequence generation.

Ten Chinese firms including ByteDance reportedly get US clearance for AI chips they're not allowed to accept

The US has cleared roughly ten Chinese companies—including Alibaba, Tencent, and ByteDance—to buy up to 75,000 Nvidia H200 chips each. But not a single chip has shipped. According to Commerce Secretary Lutnick, Beijing is blocking the purchases to protect its domestic chip industry.

  • US approved up to 75,000 Nvidia H200 chips for each of about ten Chinese firms.
  • No chips have shipped; China blocks purchases to protect domestic industry.

Designing better quantum circuits with AI

Researchers from the group of theoretical physicist Hans Briegel have collaborated with NVIDIA to develop an AI method that automatically generates efficient quantum circuits, addressing a key bottleneck in making quantum computers practically useful.

  • Research team collaborates with NVIDIA to auto-generate quantum circuits
  • Efficient quantum circuits are key to practical quantum computers
