NVIDIA AI News

Source Mix

Hacker News AI: 15
NVIDIA Blog: 10
MarkTechPost: 9
SiliconANGLE AI: 3
Artificial Intelligence News: 2
AWS Machine Learning Blog: 2
Hugging Face Blog: 2
LangChain Blog: 2

Topic Mix

Chips: 50
Agents: 35
Models: 16
Research: 6
Startups: 5
Policy: 2
Robotics: 1

Timeline

2026-07-08: 10
2026-07-09: 6
2026-07-01: 5
2026-06-30: 4
2026-07-02: 4
2026-07-04: 4
2026-07-06: 4
2026-07-10: 4

Latest Updates

Big Tech piles on $350B in debt to fuel AI data center race

2026-07-12 04:49 UTC

The five largest U.S. tech companies—Alphabet, Amazon, Meta, Microsoft, and Oracle—have doubled their debt to $350 billion over five years to fund AI data centers. While investors have been supportive, Amazon's recent $25 billion bond issuance received a cool reception, signaling limits to market appetite. Oracle was downgraded by S&P due to rising AI spending, and Intel's debt woes serve as a cautionary tale. Hyperscalers plan to spend up to $725 billion this year, primarily on data centers and Nvidia chips.

Big Tech debt has doubled in five years, adding $350 billion
Amazon's $25 billion bond sale met with investor caution

A Coding Guide to NVIDIA’s Tile-Based GPU Programming: From cuTile and Triton Kernels to Flash Attention

2026-07-12 00:01 UTC

This tutorial explores NVIDIA's tile-based GPU programming with TileGym, building a Colab workflow that runs across different hardware. We probe the CUDA environment, try the real cuTile backend, and fall back to Triton when standard Colab GPUs lack the cuTile stack. We learn the core tile idea: operate on whole data tiles instead of single threads, then load, compute, and store them. We implement vector addition, fused GELU, row-wise softmax, tiled matrix multiplication, and flash attention, checking each against PyTorch.

Introduces NVIDIA's tile programming model, operating on data blocks rather than individual threads.
Provides a runnable Colab script that works with both cuTile and Triton backends.
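
The load, compute, store pattern described above can be illustrated without a GPU. The following is a minimal pure-Python sketch of tiled matrix multiplication, checked against a naive reference in the same spirit as the tutorial's PyTorch checks. It is not TileGym, cuTile, or Triton code; the names tiled_matmul and naive_matmul and the tile size are assumptions made for illustration only.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n), processing one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):            # walk output tiles by row block
        for j0 in range(0, n, tile):        # and by column block
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            # Accumulator for a single output tile.
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for p0 in range(0, k, tile):    # sweep along the shared dimension
                p1 = min(p0 + tile, k)
                # "Load": slice one tile out of each input.
                a_tile = [row[p0:p1] for row in a[i0:i1]]
                b_tile = [row[j0:j1] for row in b[p0:p1]]
                # "Compute": multiply-accumulate the two tiles.
                for i, a_row in enumerate(a_tile):
                    for p, a_val in enumerate(a_row):
                        for j, b_val in enumerate(b_tile[p]):
                            acc[i][j] += a_val * b_val
            # "Store": write the finished tile back to the output.
            for i, acc_row in enumerate(acc):
                out[i0 + i][j0:j1] = acc_row
    return out


def naive_matmul(a, b):
    """Reference implementation used only to check the tiled version."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0], [0, 1], [2, 3]]
assert tiled_matmul(a, b, tile=2) == naive_matmul(a, b)
```

On a real GPU each output tile would be handled by one program instance in parallel; here the loops simply run them one after another, and the tile size of 2 does not divide the 3x3 input evenly so that the edge-tile handling is exercised.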

This Week in AI: Chips, Checks, and Changing Jobs

2026-07-10 16:04 UTC

This week, Christina Stathopoulos covers AI hardware breakthroughs (IBM sub-1nm chips, OpenAI/Broadcom Jalapeño, NVIDIA liquid cooling), expanding government oversight (Anthropic model access restored, OpenAI equity stake proposal), workforce evolution (forward-deployed engineers, SAP external hiring vs IKEA retraining), and a hopeful story about AI-powered earthquake alerts.

IBM unveils 0.7nm chip technology with 50% performance boost and 70% lower power consumption.
OpenAI and Broadcom launch Jalapeño, a chip designed specifically for LLM inference.

Fine-tune NVIDIA Nemotron 3 models with Amazon SageMaker AI serverless model customization

2026-07-10 15:35 UTC

This post explores the unique Nemotron 3 architecture, available fine-tuning techniques (SFT, RLVR, RLAIF), and provides a step-by-step guide to getting started with serverless customization using SageMaker Studio.

NVIDIA Nemotron 3 models feature a hybrid Mamba-Transformer Mixture-of-Experts architecture supporting up to 1M-token contexts.
Amazon SageMaker AI now offers serverless model customization for Nemotron 3 Nano and Super, requiring no infrastructure management.

How to shrink the token budget without shrinking the team

2026-07-10 09:34 UTC

Jensen Huang proposes a test for engineers: their annual AI token consumption should be worth at least half their salary. Nvidia is aiming for a $2 billion yearly token bill. Many firms have cut headcount to fund AI, but Gartner finds that 80% saw no ROI improvement. Optimization techniques such as prompt caching, model routing, and RAG can reduce token costs significantly. Retaining and training junior engineers is crucial for long-term success.

Jensen Huang suggests engineers' AI token usage should be at least 50% of their salary.
Companies are cutting jobs to afford AI token costs, often with poor returns.

Can AI Answer the $3T Question?

2026-07-10 06:22 UTC

Three years ago, Sequoia partner David Cahn was one of the first to quantify the financial implications of Silicon Valley's massive AI infrastructure spending. Starting from Nvidia's $50B GPU revenue, he calculated that $200B in revenue would be needed to pay back the upfront investment.

David Cahn first calculated the ROI requirements for AI infrastructure three years ago
He derived a $200B revenue threshold from Nvidia's $50B annual GPU revenue
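
The summary does not spell out how $50B becomes $200B. One commonly cited reading of the original essay, treated here as an assumption rather than something this digest confirms, doubles GPU revenue to approximate total data center cost (GPUs as roughly half the bill) and doubles again so that whoever sells the resulting compute earns a 50% gross margin. A minimal sketch of that arithmetic, with a hypothetical function name and parameters:

```python
def required_revenue(gpu_revenue_b, gpu_share_of_dc_cost=0.5, end_user_margin=0.5):
    """Revenue (in $B) needed to pay back the buildout under the stated assumptions."""
    # Assumption: GPUs are about half of total data center cost.
    total_dc_cost_b = gpu_revenue_b / gpu_share_of_dc_cost
    # Assumption: the end seller of compute needs a 50% gross margin.
    return total_dc_cost_b / (1 - end_user_margin)


print(required_revenue(50))  # 200.0
```

Changing either assumed ratio moves the threshold proportionally, which is why the figure is best read as an order-of-magnitude test rather than a forecast.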

Meet Nemotron Labs 3 Puzzle 75B A9B: A Compressed Hybrid MoE LLM Delivering 2.03x Server Throughput

2026-07-09 19:31 UTC

NVIDIA has released Nemotron-Labs-3-Puzzle-75B-A9B, a compressed variant of Nemotron-3-Super. Using iterative Puzzle compression, it reduces total parameters from 120.7B to 75.3B and active parameters from 12.8B to 9.3B. On a single 8xB200 node, it achieves 2.03x throughput at 100 tok/s per user; on one H100, 1M-token concurrency rises from 1 to 8. The model maintains strong performance on most benchmarks, with minor regressions in instruction following and agentic evaluations.

NVIDIA releases compressed MoE model Nemotron-Labs-3-Puzzle-75B-A9B, reducing parameters by ~38% and active parameters by 27%.
Achieves 2.03x throughput improvement on 8xB200 and 8x concurrency for 1M context on single H100.
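
The percentage reductions quoted above follow directly from the parameter counts in the summary. A short check, with reduction_pct as a hypothetical helper name:

```python
def reduction_pct(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


total_cut = reduction_pct(120.7, 75.3)   # total parameters, in billions
active_cut = reduction_pct(12.8, 9.3)    # active parameters, in billions

print(f"total: {total_cut:.1f}%  active: {active_cut:.1f}%")
# total: 37.6%  active: 27.3%
```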

Fast token generation emerges as the key differentiator as heterogeneous inference takes hold

2026-07-09 19:14 UTC

The race for low-latency token generation is driving a shift from GPU-only inference to heterogeneous architectures. d-Matrix’s Corsair accelerators, paired with NVIDIA GPUs, deliver a commercial-scale solution that increases memory bandwidth by stacking DRAM and logic. This enables premium fast tokens that can be priced up to 10x higher than standard tokens, creating new revenue opportunities for inference providers.

Fast token generation is the new battleground in AI inference, with prices up to 10x higher than standard tokens.
d-Matrix’s Corsair platform uses 3D stacking to combine DRAM and logic, improving memory bandwidth and energy efficiency.

DDN targets GPU efficiency with AI data infrastructure as the make-or-break layer

2026-07-09 18:56 UTC

DDN CEO Alex Bouzari says AI data infrastructure determines whether GPU investments pay off, as organizations split into those that use GPUs efficiently and those that waste capital. DDN is involved in a dozen sovereign AI projects, boosted Salesforce GPU productivity by 70%, and has been used internally by NVIDIA for eight years. DDN's Infinia platform addresses the challenge of connecting distributed edge data centers, monolithic data centers, and multi-cloud environments.

AI data infrastructure is the decisive factor for GPU investment returns, with organizations bifurcating based on GPU utilization efficiency.
Data sovereignty drives nations to build their own AI factories; DDN is involved in a dozen sovereign AI projects.

DeepSeek aims to make its own AI chip

2026-07-09 14:42 UTC

DeepSeek, a Hangzhou-based AI startup, is designing its own inference chip to reduce dependence on Nvidia and Huawei, leveraging its strengths in cost optimization and co-design. This move reflects China's adaptation to US export controls and could intensify the AI pricing war.

DeepSeek is developing its own chip targeting AI inference, not training.
The chip aims to cut serving costs and reduce reliance on Nvidia and Huawei.

NVIDIA Releases Nemotron-Labs-3-Puzzle-75B-A9B: A Compressed Hybrid MoE LLM Delivering 2.03x Server Throughput at Matched User Throughput

2026-07-09 08:47 UTC

NVIDIA has released Nemotron-Labs-3-Puzzle-75B-A9B, a compressed variant of Nemotron-3-Super. Through iterative Puzzle compression, the model reduces total parameters from 120.7B to 75.3B and active parameters from 12.8B to 9.3B. On a single 8xB200 node, it achieves 2.03x total throughput at 100 tok/s per user, and on one H100, 1M-token concurrency rises from 1 to 8 requests.

Compression: 120.7B total / 12.8B active → 75.3B / 9.3B active.
Throughput: 1.60x to 2.14x server throughput boost on 8xB200 at matched per-user throughput.

The OpenClaw Foundation

2026-07-09 06:10 UTC

OpenClaw has grown from a weekend project into a global movement, with 4.5 million new claws born every week and the fastest-growing repository in GitHub history. Today, it announces the formation of a non-profit foundation to steward the project as open and independent. The foundation will provide governance, stable funding, and a full-time team. Partnerships with OpenAI, NVIDIA, Microsoft, and the University of Michigan aim to advance personal AI agents.

OpenClaw evolves from a personal project to a global open-source movement with millions of weekly users.
A new 501(c)(3) non-profit foundation ensures long-term openness and independence.

Nvidia, Hugging Face Collaborate on Open Source Robot Models

2026-07-08 19:35 UTC

The move is seen as supporting accessibility and deployment for physical AI and also boosting Nvidia’s already strong presence in the field.

Nvidia and Hugging Face partner to develop open-source robot models.
The collaboration aims to enhance accessibility and deployment of physical AI.

Data for Agents

2026-07-08 17:16 UTC

NVIDIA emphasizes the importance of open data and synthetic data for building agentic AI, highlighting data inspectability, quality, and trust. The article details Nemotron datasets, the Prompt Atlas visualization tool, and the use of synthetic personas for local diversity.

Synthetic data is crucial for scaling agentic AI while protecting proprietary signals.
NVIDIA's Nemotron open datasets span over 10 trillion pretraining tokens and millions of post-training samples.

LangChain and NVIDIA Launch NemoClaw Deep Agents Blueprint

2026-07-08 15:04 UTC

LangChain and NVIDIA launch the NemoClaw Deep Agents blueprint, combining Deep Agents Code, Nemotron 3 Ultra, and OpenShell for open, governed enterprise agents.

The blueprint integrates LangChain's Deep Agents framework, NVIDIA's Nemotron 3 Ultra model, and NVIDIA OpenShell runtime.
It achieves a 0.86 score on LangChain's agent eval suite at $4.48 cost, roughly 10x lower than competing models.

NVIDIA Nemotron Achieves Benchmark-Leading Performance With LangChain Deep Agents Harness

2026-07-08 15:00 UTC

NVIDIA Nemotron 3 Ultra offers leading performance at lower cost than top closed models when paired with LangChain, which the post describes as the largest and most widely adopted AI agent orchestration platform. LangChain tuned its Deep Agents harness for NVIDIA Nemotron 3 Ultra, achieving the highest accuracy among open models while completing more tasks at higher throughput and running at 10x lower inference cost per run than leading closed models.

LangChain's Deep Agents harness tuned for NVIDIA Nemotron 3 Ultra achieves highest accuracy among open models, with 10x lower inference cost than closed models.
All performance gains come from engineering the environment around the model, not retraining the model itself.

Deep Agents Code on NVIDIA NemoClaw

2026-07-08 15:00 UTC

Run Deep Agents Code on NVIDIA NemoClaw with deny-by-default networking, human approval, and audit logs for sensitive code modernization.

Deep Agents Code (dcode) runs as a governed blueprint on NemoClaw with the open Nemotron 3 Ultra model, giving you control over source, model, and audit trail.
Deny-by-default networking, human approval, and full audit logs provide the controls a regulated team needs.

ZML releases free product to speed inference across AI chips

2026-07-08 08:18 UTC

ZML, a French AI startup endorsed by Turing Award winner Yann LeCun, has released free inference software enabling various open-source LLMs to run on multiple chips including Nvidia, AMD, Google TPU, Apple Metal, and Intel Arc.

ZML, backed by Yann LeCun, launches free inference software
Supports diverse AI chips, challenging Nvidia's dominance

NVIDIA’s Cosmos-Framework Tutorial: Designing a Colab-Friendly Miniature of Cosmos 3 World Models with Omnimodal Mixture-of-Transformers

2026-07-08 07:15 UTC

This tutorial explores NVIDIA's Cosmos framework from a practical Colab angle, honestly assessing the hardware needed for real Cosmos 3 checkpoints. It builds and trains a compact omnimodal Mixture-of-Transformers world model using the framework's real structure, CLI surface, and input schema. Using synthetic physical-world data and autoregressive rollout, it shows how the model predicts future latent states across text, vision, and action modalities.

Starts with hardware probing to explain why standard Colab cannot run full Cosmos 3 16B+ models
Builds a ~4M-parameter miniature omnimodal Mixture-of-Transformers based on the real NVIDIA cosmos-framework

Forget the GPU Shortage: The Real AI Bottleneck Was Diagnosed in 2007

2026-07-08 03:13 UTC

The article argues that the AI bottleneck is memory bandwidth, not GPU compute, referencing a 2007 paper by Ulrich Drepper about the memory wall. Recent moves by AMD, Qualcomm, and Nvidia reflect this. Solutions like FlashAttention and small language models are workarounds that optimize data locality.

The AI memory bottleneck was identified in a 2007 paper by Ulrich Drepper.
GPU compute is outpacing memory bandwidth, making data movement the real constraint.
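
The claim that data movement is the real constraint can be made concrete with a back-of-the-envelope bound. The sketch below assumes single-request decoding of a dense model, where every generated token reads all weights once and costs about 2 FLOPs per parameter, and it ignores KV-cache traffic. The hardware and model numbers are hypothetical round figures, not taken from the article.

```python
def decode_bounds(params, bytes_per_param, mem_bandwidth_bytes_s, peak_flops):
    """Upper bounds on tokens/s from memory traffic and from arithmetic."""
    bytes_per_token = params * bytes_per_param   # all weights read once per token
    flops_per_token = 2 * params                 # one multiply-add per weight
    memory_bound = mem_bandwidth_bytes_s / bytes_per_token
    compute_bound = peak_flops / flops_per_token
    return memory_bound, compute_bound


# Hypothetical accelerator: 1 TB/s memory bandwidth, 100 TFLOP/s peak.
# Hypothetical model: 10B parameters stored at 2 bytes each.
mem_tps, compute_tps = decode_bounds(
    params=10_000_000_000,
    bytes_per_param=2,
    mem_bandwidth_bytes_s=1_000_000_000_000,
    peak_flops=100_000_000_000_000,
)
print(mem_tps, compute_tps)  # 50.0 5000.0
```

Under these assumptions the memory limit is 100 times tighter than the compute limit, which is why techniques that improve data locality or shrink the model help more than extra arithmetic throughput.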

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

2026-07-08 02:20 UTC

This edition of AINews covers a broad range of AI developments from July 6-7, 2026. Highlights include Lilian Weng's deep dive into harness engineering for recursive self-improvement, Meta's launch of Muse Image and preview of Muse Video with agentic generation loops, and major product updates from Anthropic, LangChain, and Google on agent platforms. Other notable items: NVIDIA's Audex audio model, Cohere's Arabic ASR, robotics integrations with Hugging Face and NVIDIA, Liquid AI's Antidoom method to reduce reasoning loop failures, and Anthropic's controversial J-space interpretability work. Also covered: benchmarks for agents and legal AI, research automation, and inference efficiency advances.

Lilian Weng's blog post reframes recursive self-improvement around the harness rather than direct weight modification, emphasizing that harness engineering is critical for specifying goals and context.
Meta's Muse Image and Muse Video showcase agentic generation with planning, tool use, and self-refinement, quickly ranking high on public leaderboards.

NVIDIA Releases Audex (Nemotron-Labs-Audex-30B-A3B): A Unified Audio-Text LLM That Preserves the Text Intelligence of Its Backbone

2026-07-08 00:50 UTC

NVIDIA has released Audex, a unified audio-text large language model using an MoE architecture (30B total, 3B active). It handles audio understanding, speech recognition, translation, TTS, and audio generation, while retaining the text intelligence of its Nemotron-Cascade-2 backbone through multi-stage SFT and text-only RL. It leads open models in speech recognition (6.82 WER on OpenASR) and is capable of general audio generation. It is released under a noncommercial license.

Audex unifies audio understanding, ASR, translation, TTS, and audio generation in a single MoE model with minimal text performance regression.
30B total parameters, 3B activated per token; compatible with Megatron-LM and vLLM.

AI Innovators Adopt NVIDIA Vera — Why Max Single-Threaded CPU at Scale Matters

2026-07-07 15:00 UTC

Max single-threaded CPUs at scale are a new category of CPU built for the agentic AI era. Across the creation and deployment of an agentic system, the CPU is on the critical path for reasoning, response time, and learning. The CPU is the processor that executes the work the AI model commands: tool calling, code execution, data processing, KV-cache, and result analysis.

NVIDIA Vera is a max single-threaded CPU at scale designed for agentic AI.
It features Olympus cores with 50% higher IPC than Grace, and 1.2 TB/s memory bandwidth.

Nvidia GPU Debt Backstop Unleashes the AI Project Trinity: Capital, Offtake

2026-07-07 07:55 UTC

Nvidia has introduced a GPU rental backstop program to address financing bottlenecks in AI compute, aiming to broaden access and support market diversification. By providing minimum revenue guarantees to neoclouds, Nvidia facilitates debt financing, enabling shorter-term rentals and expanding the buyer base. The article forecasts AI capex and debt financing growth, and analyzes Nvidia's strategic move to reshape the GPU market structure.

Nvidia launches a GPU rental backstop program, offering revenue guarantees to neoclouds to ease financing.
AI projects require capital, offtake, and a datacenter; Nvidia's backstop helps assemble this trinity.

NVIDIA and Hugging Face Bring New Models and Frameworks to LeRobot for the Open Robotics Community

2026-07-07 06:00 UTC

NVIDIA and Hugging Face collaborate to integrate the NVIDIA Isaac GR00T 1.7 model and Isaac Teleop framework into LeRobot, with NVIDIA Cosmos 3 planned soon. These integrations provide developers with a more accessible and standardized path for robot development, driving innovation in the open robotics community.

NVIDIA and Hugging Face bring Isaac GR00T 1.7 model and Isaac Teleop framework to LeRobot.
LeRobot gains NVIDIA physical AI capabilities for data collection, model training, and simulation.

When the sovereign AI diagnosis goes prime time

2026-07-06 18:34 UTC

Palantir CEO Alex Karp went on CNBC and delivered a fiery critique of the AI industry, calling it 'insane' and accusing OpenAI and Anthropic of running a 'wealth tax on American business.' However, beneath the theatrics, he articulated the core of the sovereign AI thesis: customers want control over their compute, models, and data. Palantir and Nvidia recently shipped a reference architecture for sovereign AI OS, enabling deployment of Nvidia's Nemotron models in air-gapped environments, which led to a 9% stock jump.

Alex Karp criticized the AI industry on CNBC, calling it 'insane' and accusing AI companies of imposing a 'wealth tax' on American businesses.
Karp emphasized the need for customers to control their own compute, models, and data, aligning with the sovereign AI thesis.

How Open Models Are Driving AI Research

2026-07-06 16:00 UTC

At ICML 2026, over 2,000 papers cite NVIDIA GPUs, and open models like Nemotron, Cosmos, and BioNeMo are foundational to AI research across robotics, life sciences, and synthetic data generation. NVIDIA had 74 papers accepted, highlighting trends in vision, reinforcement learning, and agent training.

Open frontier models and infrastructure are now foundational to AI research.
NVIDIA’s Nemotron family is used as a research stack for reasoning, data curation, and safe inference.

How Nations Are Deploying AI for Strategic Priorities

2026-07-06 15:00 UTC

Nations are investing in domestic AI infrastructure including AI factories, foundation models trained on local data, and workforce development to tailor AI to local needs, driven by generative and agentic AI. Examples from Europe, Asia, and Latin America illustrate societal benefits.

AI is reshaping economies and societies, prompting countries to build domestic AI capabilities.
AI factories—next-gen data centers—are emerging as critical infrastructure for AI production.

AI Data Centers

2026-07-06 13:42 UTC

Epoch AI's independent database covers 67 large AI data centers globally, tracking their construction timelines via satellite imagery and public documents. The largest facility is SpaceXAI's Colossus 2 in Memphis, with 946 MW IT power and compute equivalent to 1,112k H100 GPUs. The US dominates, especially in Texas and Ohio. Total IT power capacity reaches 10.8 GW, with facility power around 14 GW. Hardware mainly features NVIDIA GPUs, with Google and Amazon using custom chips.

Epoch AI database tracks 67 AI data centers; largest is SpaceXAI's Colossus 2.
The US hosts most of the large centers, concentrated especially in Texas and Ohio.

Meituan Trained a 1.6T-Parameter AI Model Without Nvidia GPUs

2026-07-05 04:59 UTC

Meituan released LongCat-2.0, a 1.6-trillion-parameter Mixture-of-Experts model trained and deployed entirely on domestic AI ASIC superpods without Nvidia GPUs. The model quietly appeared on OpenRouter as Owl Alpha, achieving high usage rankings. While not the absolute best in performance, it demonstrates the viability of training trillion-parameter models on China's domestic compute infrastructure, reducing reliance on Nvidia.

LongCat-2.0 has 1.6T total parameters with ~48B activated per token using MoE architecture.
Training and inference ran on a domestic AI ASIC superpod, reportedly ~50,000 Huawei Ascend 910C chips.

China's LongCat-2.0 Becomes the Biggest AI Model Without Nvidia Chips

2026-07-05 04:58 UTC

Meituan unveiled LongCat-2.0, a 1.6 trillion parameter open-source LLM trained entirely on domestic hardware, signaling a major step in China's AI self-reliance.

LongCat-2.0 is the first trillion-parameter model to complete both training and inference on Chinese chips.
It outperforms older models like Gemini 3.1 Pro on some benchmarks but trails GPT-5.5 and Opus 4.8.

Nvidia Has Become the Bank Behind the AI Boom

2026-07-04 23:59 UTC

Nvidia is financing the neoclouds that buy its GPUs, renting back idle capacity and taking a share of their cloud revenue, gradually turning itself into something more than a hardware company.

Nvidia provides financing to neoclouds for GPU purchases
Nvidia retains rights to rent back idle capacity and share cloud revenue

Anthropic Launches Claude Science Beta: A Multi-Agent AI Workbench for Reproducible Genomics, Proteomics, and Cheminformatics Pipelines

2026-07-04 16:21 UTC

Anthropic released Claude Science in beta on June 30, 2026. The app runs on existing Claude models. A coordinating agent delegates to domain specialists, a reviewer agent flags and corrects citations and numbers, and every figure ships with its exact code, environment, and full message history. It manages compute across local machines, HPC over SSH, and Modal, and connects to 60+ databases plus NVIDIA BioNeMo skills.

Claude Science is an AI workbench for scientists, enabling multi-step research with full provenance tracking.
It uses a multi-agent architecture: a coordinator, domain specialists, and a reviewer agent for citation and number accuracy.

NVIDIA HORIZON: A Hands-Free Agent that Evolves Git Worktrees and Hits 100% RTL Benchmark Completion

2026-07-04 16:04 UTC

NVIDIA Research introduces HORIZON, a hands-free agent framework that treats hardware design as repository-level code evolution using Git worktrees. It achieves a 100% pass rate on all evaluated RTL benchmarks, though the team notes that agentic hardware design is not yet solved.

HORIZON hosts design problems as version-controlled Git repositories, evolving code iteratively.
It uses a structured Markdown harness with goal, directions, evaluator, and acceptance predicate.

NVIDIA AI Introduces ASPIRE: A Self-Improving Robotics Framework Reaching 31% Zero-Shot on LIBERO-Pro Long Tasks

2026-07-04 06:32 UTC

NVIDIA's ASPIRE writes and refines robot control programs, then distills validated repairs into a reusable skill library. It gains up to 77 points on LIBERO-Pro and transfers zero-shot to unseen long-horizon tasks.

ASPIRE localizes failures via per-primitive multimodal traces instead of coarse task-level feedback
Validated fixes are stored as reusable skills, enabling cross-task knowledge accumulation

NVCF: Deploy and Route GPU-Accelerated AI Workloads at Scale

2026-07-03 08:18 UTC

NVIDIA Cloud Functions (NVCF) is an open-source platform for deploying, managing, and running GPU-accelerated workloads at scale. It supports long-running functions and asynchronous tasks, leveraging Kubernetes for orchestration, and provides a unified control plane, load-balanced routing, multi-cluster autoscaling, and more. This article covers NVCF's architecture, workload types, core capabilities, and how to build with Bazel.

NVCF is an open-source platform by NVIDIA for GPU workloads, supporting inference, streaming, and batch processing.
Architecture includes control plane, invocation plane, and compute plane, managed via Kubernetes.

DGX station and "frontier" models, my hunt for answers

2026-07-03 03:48 UTC

An investigation into NVIDIA's DGX Station reveals its true capabilities for running large AI models locally, including memory architecture details, real-world benchmarks, and community skepticism about its 748GB coherent memory claim.

DGX Station has 748GB coherent memory split into 252GB HBM3e (fast) and 496GB LPDDR5X (slower), not all GPU-speed.
Priced around $100k, competing with multi-GPU rigs and cloud inference.

Show HN: AI Infrastructure Knowledge Base

2026-07-02 17:11 UTC

A practical, citable knowledge base for deploying, operating, and optimising GPU clusters, from the physical datacentre and the InfiniBand fabric up through Kubernetes, Slurm and Ray, distributed training and reinforcement-learning post-training, and LLM inference serving at scale. Covers the full NVIDIA range: Ampere, Hopper, and Blackwell datacenter GPUs, RTX consumer and workstation cards, and DGX systems (including DGX Spark). Current to mid-2026.

Practical reference for engineers operating GPU clusters across hardware, orchestration, training, and serving.
Covers NVIDIA hardware from Ampere to Blackwell Ultra, with operational differences.

NVIDIA BioNeMo accelerates Anthropic Claude Science

2026-07-02 14:38 UTC

Anthropic's Claude Science integrates NVIDIA BioNeMo Agent Toolkit to accelerate computational life sciences research, enabling scientists to use natural language to execute complex workflows.

Claude Science public beta integrates NVIDIA BioNeMo Agent Toolkit.
Scientists can use natural language to run end-to-end research workflows.

[AINews] not much happened today

2026-07-02 07:10 UTC

This issue covers the relaunch of Anthropic's Fable 5 with safety fallbacks and the ecosystem's multi-model orchestration response. Open models like GLM-5.2 see progress with ZCode and benchmarks. Agent infrastructure gains wiki memory and structured composition patterns, while Devin Security Swarm demonstrates agent-based vulnerability discovery. Architecture advances include NVIDIA TwoTower and on-device inference breakthroughs.

Anthropic relaunched Fable 5 with safety fallbacks, leading to widespread integration and multi-model orchestration strategies.
Open models like GLM-5.2 gain traction with ZCode IDE and competitive benchmarks, while inference optimizations (vLLM, DSpark) improve speed.

NVIDIA Unlocks AI Compute at Scale, Inviting Capital Partners to Power the AI Infrastructure Buildout

2026-07-02 03:34 UTC

As AI shifts from development to production inference, compute demand is growing and moving to continuously operating AI factories. NVIDIA introduces a new strategy to provide large-scale accelerated computing access to startups and enterprises through a revenue-sharing model, with initial deployments by Sharon AI and Firmus.

AI compute demand shifts to production inference requiring large-scale multi-tenant infrastructure
NVIDIA opens compute access via revenue-sharing and credit-support model

Run NVIDIA Nemotron and OpenAI GPT OSS models on Amazon Bedrock in AWS GovCloud (US)

2026-07-01 18:14 UTC

AWS GovCloud (US) now supports OpenAI's open-weight GPT OSS models (120B and 20B) and NVIDIA Nemotron models (Nano 9B v2, Nano 12B v2, Nano 30B, Super 120B) via Amazon Bedrock. Inference runs entirely within the US on infrastructure operated by US citizens, meeting FedRAMP, DoD SRG, and other compliance frameworks.

Amazon Bedrock adds OpenAI GPT OSS (120B/20B) and NVIDIA Nemotron (multiple sizes) models in AWS GovCloud (US).
All inference stays within the AWS GovCloud (US) boundary, with data never leaving the US.

NVIDIA and Partners Build in America, for America

2026-07-01 13:00 UTC

NVIDIA and its partners are investing in American manufacturing, supply chains, energy grids and skilled workforces so the U.S. can produce the infrastructure needed for better healthcare, breakthrough scientific discovery, stronger industrial productivity and global technology leadership.

NVIDIA and partners are building AI infrastructure across 43 states, with plans to produce up to $500 billion of AI infrastructure in the U.S.
In 2026 alone, NVIDIA-driven AI demand will contribute $485 billion to U.S. GDP and support over 100,000 jobs.

NVIDIA Releases Nemotron-Labs-TwoTower: an Open-Weight Diffusion Language Model Built on a Frozen Autoregressive Nemotron-3-Nano-30B-A3B Backbone

2026-07-01 08:10 UTC

NVIDIA has released Nemotron-Labs-TwoTower, a diffusion language model with a two-tower architecture that achieves 2.42× generation throughput while retaining 98.7% of the autoregressive baseline quality. The model is open-weight and supports diffusion, mock-AR, and AR decoding modes.

TwoTower decouples diffusion into a frozen AR context tower and a trained denoiser tower.
It retains 98.7% of AR quality at 2.42× throughput (γ=0.8, S=16, 2×H100).

Serving Local AI on My Jetson Through Durable Streams

2026-07-01 01:00 UTC

The author built StreamTTS, a local text-to-speech app running Kokoro-82M on an NVIDIA Jetson Orin Nano Super. It replaces traditional request-response with durable streams (S2) to make audio generation shareable and live-updating, and to address slow inference, fair scheduling, and deduplication.

Self-hosted TTS on Jetson Orin Nano Super with Kokoro-82M model.
Uses S2 durable streams for persistent, replayable output.

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI

2026-07-01 00:00 UTC

Hugging Face and Cerebras have collaborated to create a real-time voice AI system powered by Gemma 4, achieving dramatically lower latency through an open modular architecture. The pipeline integrates Nvidia's speech recognition, Cerebras's fast inference, and Alibaba's text-to-speech, and is already deployed in over 9,000 Reachy Mini robots.

Hugging Face and Cerebras debut a real-time voice AI demo using Gemma 4 with ultra-low latency.
The system uses an open cascaded architecture: speech input → ASR → model inference → TTS → spoken response.
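
The cascaded architecture above is a chain of swappable stages. The sketch below shows only that composition pattern; make_cascade and the three toy stages are hypothetical stand-ins, not the Hugging Face, Cerebras, Nvidia, or Alibaba components named in the summary.

```python
from typing import Callable


def make_cascade(
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose speech input -> ASR -> model inference -> TTS -> spoken response."""
    def respond(audio_in: bytes) -> bytes:
        transcript = asr(audio_in)   # speech to text
        reply = llm(transcript)      # text to text
        return tts(reply)            # text to speech
    return respond


# Toy stages so the sketch runs end to end; real stages would call models.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_llm(text: str) -> str:
    return f"You said: {text}"


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


respond = make_cascade(toy_asr, toy_llm, toy_tts)
print(respond(b"hello"))  # b'You said: hello'
```

Because each stage is just a callable with a fixed input and output type, any one of them can be replaced without touching the others, which is the modularity the summary attributes to the open design.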

NVIDIA BioNeMo Agent Toolkit Brings Accelerated AI to Life Sciences Researchers in Claude Science

2026-06-30 17:00 UTC

NVIDIA announced the BioNeMo Agent Toolkit, integrated with Anthropic's Claude Science, enabling scientists to use natural language to run accelerated AI workflows in drug discovery, genomics, and more. The toolkit includes GPU-accelerated tools like Parabricks, RAPIDS-singlecell, and nvMolKit, and is used by 18 of the top 20 pharmaceutical companies. Claude Science is now in public beta.

NVIDIA BioNeMo Agent Toolkit integrates with Claude Science for natural language-driven research
Includes accelerated tools: Parabricks (genomics), RAPIDS-singlecell (single-cell analysis), nvMolKit (cheminformatics)

Anthropic launches Claude Science, an AI workbench for scientific research

2026-06-30 17:00 UTC

On Tuesday, Anthropic launched Claude Science, a new application for scientists that can run locally on macOS and Linux, or on a remote machine. It integrates multiple databases and tools into a single workbench, currently in beta and focused on life sciences but planned to expand. Available on Claude's paid plans, it uses standard Claude models with a coordination agent and connects to Nvidia's BioNeMo and HPC/Modal for large computations.

Anthropic launches Claude Science, an AI workbench for scientific research, now in beta.
Integrates databases like PubMed and tools like Jupyter, R, and terminal into one interface.

How NVIDIA’s Inference Software Stack Powers the Lowest Token Cost

2026-06-30 15:00 UTC

NVIDIA's inference software stack, co-designed with GPUs, CPUs, networking, and systems and strengthened by open source, continuously improves hardware performance. On Blackwell, it reduced token costs by up to 5x for DeepSeek V4 in one month. The article details how software optimizations across production operations, application acceleration, and infrastructure access compound to lower cost per token.

NVIDIA's full-stack inference software reduced token costs by up to 5x on Blackwell for DeepSeek V4 within a month.
Companies like Baseten, Cognition, Deep Infra, and Together AI leverage TensorRT-LLM and Dynamo for significant gains.
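
To make the cost-per-token claim concrete, here is a small worked calculation. It is a sketch under stated assumptions, not a figure from the article: the hourly GPU price and throughput numbers are hypothetical, and it assumes the software gain shows up entirely as higher throughput at a fixed hourly hardware price.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


# Hypothetical numbers chosen only to illustrate the ratio.
GPU_HOURLY_USD = 4.0          # assumed fixed hardware price per hour
BASELINE_TOKENS_PER_S = 1000.0
OPTIMIZED_TOKENS_PER_S = 5000.0  # 5x throughput from software changes alone

baseline = cost_per_million_tokens(GPU_HOURLY_USD, BASELINE_TOKENS_PER_S)
optimized = cost_per_million_tokens(GPU_HOURLY_USD, OPTIMIZED_TOKENS_PER_S)

print(f"baseline:  ${baseline:.3f} per 1M tokens")
print(f"optimized: ${optimized:.3f} per 1M tokens")
print(f"reduction: {baseline / optimized:.1f}x")
```

Because the hourly price is held constant, the cost ratio equals the throughput ratio, which is why software-only throughput gains translate directly into a lower cost per token on the same hardware.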

How Jaiveer Singh Is Helping Robots — and Developers — Move Faster

2026-06-30 15:00 UTC

Jaiveer Singh leads NVIDIA's Isaac ROS team, building open-source infrastructure for robotics on ROS 2 with CUDA acceleration. His path runs from LEGO Mindstorms to an NVIDIA intern project that became Isaac ROS, the effort he now leads, and it highlights the power of open source and NVIDIA's long-term vision in physical AI.

Jaiveer Singh leads NVIDIA Isaac ROS, an open-source robotics framework built on ROS 2 with CUDA acceleration.
His intern project at NVIDIA evolved into Isaac ROS, now used for autonomous mobile robots, manipulators, and humanoids.
