AI News Hub

GPU Infrastructure updates

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

Artificial Analysis released AgentPerf, the industry's first benchmark for agentic AI. Initial results show NVIDIA Blackwell Ultra NVL72 leading, running 20x more agents per megawatt than Hopper. The benchmark measures how many concurrent agentic tasks a platform can support under real-world coding agent workloads.

  • AgentPerf is the first benchmark designed for agentic AI workloads, focusing on chained LLM calls and tool calls.
  • NVIDIA GB300 NVL72 delivers 20x more agents per megawatt than H200 on the DeepSeek V4 Pro model.

Empires Once Marched on Roads – AI marches on extension cords

The article compares AI infrastructure to Roman legion camps, arguing that AI companies like Meta are adopting temporary, rapidly deployable structures to match the fast depreciation of chips and prioritize time-to-market over permanence. This strategy echoes historical frontier booms, marking a shift from permanent assets to time-sensitive investments.

  • AI infrastructure is shifting from permanent buildings to rapid-deployment temporary structures, akin to Roman army camps.
  • Chips depreciate faster than concrete ages, inverting the traditional relationship between infrastructure and investment.

AINews: Loopcraft: The Art of Stacking Loops

The article discusses the emerging trend of designing loops to drive AI agents instead of manual prompting, covering key figures' insights, Anthropic's Fable 5 rollout controversy, automated research systems, data infrastructure bottlenecks, inference speed optimizations, and agent tooling developments.

  • Advocates designing loops, rather than prompting manually, to maximize AI agent efficiency and leverage.
  • Anthropic's Fable 5 faced backlash over covert degradation policy, later reversed.

G-MAPP: GPU-accelerated Multi-Agent Planning and Perception for Reactive Motion Generation

This paper presents G-MAPP, a GPU-accelerated framework for reactive motion generation that achieves up to 5x speedup by parallelizing world modeling and planning on GPU, enabling real-time perception-action coupling in dynamic environments.

  • GPU acceleration provides up to 5x speedup over CPU version
  • Tighter perception-action loop coupling for real-time reactive motion

Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM

This paper presents a portable, low-power, battery-operated vision-based fall prediction and detection system using human pose estimation on an AMD Kria K26 SOM. The system uses an Intel RealSense D455 camera and a three-stage pipeline (quantized YOLOX, A2J, and CNN) to achieve real-time, privacy-preserving fall detection on the edge. Results show 4.5 FPS throughput with 75.85% classification accuracy.

  • Privacy-preserving fall detection system implemented on AMD Kria K26 edge device
  • Three-stage pipeline: YOLOX for human detection, A2J for joint estimation, CNN for fall classification

Jeff Bezos’ Prometheus raises $12B to accelerate industrial engineering projects

Prometheus Inc., an AI startup co-led by Jeff Bezos, raised $12 billion in Series B funding at a $41 billion valuation. The company is developing AI tools to accelerate hardware development, focusing on prototyping and pre-production manufacturing. The funds will mainly be used for computing infrastructure.

  • Prometheus raised $12B from investors including Bezos, JPMorgan, BlackRock, etc.
  • The startup is developing AI tools to speed up hardware design by 10x or more.

AI agents need infrastructure: Why Europe’s regional cloud strategy matters

As generative AI evolves into agentic AI, European enterprises face new challenges in data sovereignty, cost control, and infrastructure. This article argues that regional cloud providers like Vultr offer better compliance, performance, and cost efficiency than traditional hyperscalers for agentic workloads.

  • The agentic AI market is projected to reach $139.19 billion by 2034, with Europe growing at 42% CAGR.
  • European businesses must balance innovation with regulatory compliance, requiring localized cloud infrastructure.

Neura Robotics Raises $1.4B for Physical AI

Funding from investors including Nvidia, Amazon and Qualcomm will support the vendor’s development of humanoid robots and physical AI.

  • Neura Robotics raises $1.4 billion in funding
  • Investors include Nvidia, Amazon, and Qualcomm

Save Big and Play Bigger: GeForce NOW Summer Sale Brings Major Membership Savings

NVIDIA's GeForce NOW summer sale offers up to $70 off a 12-month Ultimate membership and $35 off a Performance membership. The cloud gaming service eliminates hardware barriers and provides instant access to high-performance RTX gaming across devices. The announcement also confirms that Guild Wars 3 is coming to the platform, with exclusive rewards for current Guild Wars titles.

  • GeForce NOW summer sale: $70 off Ultimate and $35 off Performance annual memberships for a limited time.
  • Cloud gaming removes hardware constraints, offering instant game access, automatic updates, and cross-device play.

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding

Cohere has released its first developer-facing coding model, North Mini Code, a 30B total parameter mixture-of-experts model with only 3B active parameters per token. It runs on a single H100 GPU, supports 256K context length, and is optimized for code generation, agentic software engineering, and terminal tasks. The weights are open under Apache 2.0.

  • North Mini Code is Cohere’s first coding model, 30B total parameters with 3B active, supporting 256K context and 64K max output.
  • Runs on a single H100 at FP8; weights open under Apache 2.0 via Hugging Face, Cohere API, and more.

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

This article is the second part of the PyTorch profiling series, delving into the internals of nn.Linear layers, including transpose operations, bias-fused epilogue techniques, and the impact of torch.compile on a single linear layer. It then dissects the performance characteristics of a Multilayer Perceptron (MLP) with GeGLU activation, showcasing the scheduling and execution of GPU kernels.

  • nn.Linear fuses bias addition into the matrix multiplication kernel via an epilogue, avoiding extra memory accesses.
  • torch.compile offers no significant speedup for a single nn.Linear layer but eliminates CPU dispatch overhead.
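
The epilogue idea in the first bullet can be illustrated with a minimal pure-Python sketch. This is not PyTorch's or any GPU library's implementation; the function names `linear_unfused` and `linear_fused` are hypothetical, and the only assumption carried over from nn.Linear is the weight layout of (out_features, in_features).

```python
def linear_unfused(x, weight, bias):
    """Two passes: compute x @ weight^T, then sweep the output again to add bias."""
    out = [[sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in weight]
           for row in x]
    # Second pass re-reads and re-writes every output element.
    return [[v + b for v, b in zip(out_row, bias)] for out_row in out]


def linear_fused(x, weight, bias):
    """One pass: bias is added as an epilogue, before the result is written."""
    out = []
    for row in x:
        out_row = []
        for w_row, b in zip(weight, bias):
            acc = 0.0
            for xi, wi in zip(row, w_row):
                acc += xi * wi
            # Epilogue: apply the bias while the accumulator is still in hand.
            out_row.append(acc + b)
        out.append(out_row)
    return out


# Illustrative inputs (hypothetical values): batch of 2, in_features=2, out_features=3.
x = [[1.0, 2.0], [3.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.5, -0.5, 1.0]
fused_result = linear_fused(x, weight, bias)
unfused_result = linear_unfused(x, weight, bias)
```

Both functions return the same values; the fused version simply never makes a second trip over the output, which is the memory-access saving the article describes.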

DiffusionGemma: Google's Open-Source High-Speed Text Generation Model

Google has released DiffusionGemma, a new open-weight model under Apache 2 license, available for free via NVIDIA's NIM cloud API. It delivers impressive generation speeds exceeding 500 tokens per second.

  • Google releases open-source DiffusionGemma model under Apache 2 license.
  • Free hosting on NVIDIA NIM cloud API.

Google's new open model DiffusionGemma generates text from noise instead of word by word

Google released DiffusionGemma, a 26-billion-parameter model that generates text via diffusion, achieving 1,000 tokens per second on an H100 GPU—four times faster than autoregressive models, but with lower quality. It's currently experimental.

  • 26-billion-parameter diffusion model for text generation
  • Reaches 1,000 tokens/sec on a single H100 GPU

For Robotaxis, Safety Must Be Built In, Not Bolted On

As robotaxi services expand globally, NVIDIA introduces Halos OS—a comprehensive safety system integrating certified OS, standardized interfaces, AI guardrails, and a validation framework to ensure safety is built into autonomous vehicles from the ground up.

  • Multiple robotaxi programs are launching worldwide using NVIDIA DRIVE Hyperion, including Uber/Autobrains in Munich, Foxconn in Taiwan, VinFast in Southeast Asia, and HUMAIN in Saudi Arabia.
  • NVIDIA Halos OS addresses four key safety challenges: a safety-certifiable operating system, safe interfaces, AI with verifiable guardrails, and validation at scale.

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

DiffusionGemma is Google DeepMind's experimental open text generation model that uses text diffusion instead of standard autoregressive decoding, achieving up to 4x faster generation on dedicated GPUs. The 26B MoE model (3.8B active parameters) is built on the Gemma 4 backbone, supports multimodal inputs (text, image, video), has a 256K context window, covers 140+ languages, and is released under Apache 2.0.

  • DiffusionGemma is a 26B Mixture of Experts (MoE) model with 3.8B active parameters that generates text in parallel via diffusion, not token-by-token.
  • It achieves 1000+ tokens/s on a single NVIDIA H100 and 700+ tokens/s on an RTX 5090, fitting in 18GB VRAM when quantized.

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

Google DeepMind released DiffusionGemma, an experimental open model for fast text generation using parallel token generation. NVIDIA optimized it to run faster on GeForce RTX, RTX PRO, and DGX Spark systems, achieving up to 1000 tokens/sec locally.

  • DiffusionGemma generates up to 256 tokens in parallel per step, unlike traditional autoregressive models.
  • Based on Gemma 4 (26B parameters, MoE), it activates only 3.8B parameters per step and delivers up to 4x faster performance.
  • Open source under Apache 2.0, it runs locally with no cloud dependency.

PRC-linked influence operations are targeting AI debates in the US

A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT.

  • OpenAI report reveals PRC-linked influence operations
  • Operations use AI to target US tech debates

Timing Trick Cuts Energy Used in LLM Training by Up to 14 Percent

Researchers at the University of Twente have shown that by adjusting GPU clock frequencies at the per-kernel level, they can save up to 14% of the energy used in LLM training with minimal impact on speed.

  • Researchers applied dynamic voltage and frequency scaling (DVFS) at the per-kernel granularity on GPUs.
  • Achieved 14% energy savings with only 0.6% increase in training time.
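
The two figures above imply a change in average power draw, which can be worked out from energy = power × time. The sketch below is a back-of-the-envelope calculation only; the function name is hypothetical, and it assumes the 14% and 0.6% figures apply to the same training run.

```python
def relative_average_power(energy_saving, time_increase):
    """Average power as a fraction of baseline, from E = P * t."""
    energy_ratio = 1.0 - energy_saving   # energy used relative to baseline
    time_ratio = 1.0 + time_increase     # run time relative to baseline
    return energy_ratio / time_ratio


# Figures quoted in the summary: 14% less energy, 0.6% more time.
power_ratio = relative_average_power(0.14, 0.006)
print(f"average power is about {power_ratio:.1%} of baseline")
```

Under these assumptions the average power draw falls by roughly 14.5%, slightly more than the energy saving, because the run takes marginally longer.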

Easybilling: AI-Native Billing & Payments for Usage-Based AI Products

Easybilling is an AI-native billing and monetization platform designed for AI SaaS, APIs, agents, and GPU platforms. It supports subscription, usage-based, and credit-driven pricing with real-time usage tracking, prepaid wallets, automated invoicing, and global payments, enabling AI companies to scale monetization without building complex billing infrastructure.

  • AI-native billing platform tailored for usage-based AI products.
  • Supports hybrid pricing: subscriptions, usage-based, and credits.

NVIDIA Confidential Computing to Help Expand Apple’s Private Cloud Compute

NVIDIA GPUs with Confidential Computing are now used for confidential inference in Apple's Private Cloud Compute (PCC), which is expanding from Apple's own data centers to Google Cloud. The technology provides a hardware-based security layer to protect user data during processing.

  • NVIDIA Confidential Computing GPUs now used in Apple Private Cloud Compute
  • Apple expands PCC to Google Cloud

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI

This post demonstrates how to train robot policies for the Unitree H1 humanoid using NVIDIA Isaac Lab on Amazon SageMaker AI. It covers two compute options: SageMaker HyperPod for resilient, persistent clusters, and SageMaker Training Jobs for ephemeral, on-demand training. The solution provides a unified Docker image, experiment tracking with MLflow, and detailed walkthroughs.

  • Use NVIDIA Isaac Lab on SageMaker AI to scale reinforcement learning for humanoid robots.
  • Two compute options: HyperPod (persistent, resilient clusters) and Training Jobs (ephemeral, on-demand).

SpaceX wants to put data centers in orbit, and Musk says it's no big deal

SpaceX wants to launch data centers into space, and Elon Musk is pitching it as a near-trivial engineering problem ahead of the company's IPO. A first AI satellite would match the output of a single Nvidia GB300 rack. But Google's own research suggests real AI training would require about 10,000 tightly coupled satellites.

  • SpaceX plans to put data centers in orbit; Musk calls it a trivial engineering challenge.
  • First AI satellite would equal a single Nvidia GB300 rack's performance.

Amazon Taps Fiber Optics Producer Corning for Data Center Expansion

Amazon has partnered with Corning for data center expansion, following Corning's previous AI infrastructure deals with Nvidia and Meta.

  • Amazon has entered into an agreement with Corning, a fiber optics producer, to support its data center expansion.
  • The deal follows similar AI infrastructure partnerships between Corning and Nvidia, and Corning and Meta.

Beijing's $295 billion AI buildout would require 80 percent domestic chips, locking out US suppliers

China plans to invest roughly $295 billion in a nationwide AI data center network over the next five years, with at least 80% of technology from domestic suppliers like Huawei. Meanwhile, Taiwan is considering criminalizing AI chip smuggling to China for the first time.

  • China plans $295B investment in AI data centers over five years
  • 80% of chips and tech to come from domestic suppliers like Huawei

Apple's AI pitch will live or die by its privacy promise

Apple debuted AI features at WWDC with a strong privacy focus, but reliance on Google and Nvidia servers due to its late start raises questions about whether the promise can be kept.

  • Apple unveiled AI features at WWDC, emphasizing privacy, but acknowledged using Google and Nvidia servers.
  • Apple's Private Cloud Compute expands to Google Cloud, using Nvidia GPUs and Intel CPUs.

Apple Intelligence gets a second shot with help from Google and Nvidia

At WWDC 2026, Apple showed off a rebuilt version of Siri. The assistant runs on foundation models developed with Google. For complex queries, it taps Nvidia GPUs.

  • Apple unveiled a rebuilt Siri at WWDC 2026.
  • The assistant leverages foundation models co-developed with Google.

Decentralized AI Inference Marketplace

T4T is a decentralized marketplace for AI inference in which GPU providers bid for prompts. Payments are made in xBZZ on Gnosis and requests are routed over Swarm, with no API keys or middlemen.

  • Pay-per-token AI inference without a central operator.
  • Providers stake xBZZ and compete for inference jobs.

NVIDIA cuTile Python Tutorial: Building Tiled GPU Kernels for Vector Addition, Matrix Addition, and Matrix Multiplication in Colab

This tutorial provides a hands-on workflow for NVIDIA cuTile Python, a tiled GPU programming interface. It covers environment setup, implementing tiled vector addition, matrix addition, and matrix multiplication, with a PyTorch fallback. Includes correctness checks and benchmark comparisons.

  • Set up NVIDIA cuTile Python in Colab with GPU and driver checks.
  • Implement tiled kernels for vector addition, matrix addition, and matrix multiplication.
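
For readers new to tiling, the idea behind the matrix multiplication kernel can be sketched in plain Python. This is not cuTile code and uses none of its API; `tiled_matmul` and the `tile` parameter are hypothetical names, and the loops run sequentially where a GPU would process tiles in parallel.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the work one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the output
        for j0 in range(0, m, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # slice of the shared dimension
                # Accumulate this tile's partial products into c.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


# Small illustrative inputs whose sizes are not multiples of the tile size.
a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product = tiled_matmul(a, b, tile=2)
```

The `min(...)` bounds handle edge tiles when a dimension is not a multiple of the tile size, a case any tiled kernel has to account for in some form.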

Apple rebuilt its on-device AI stack at WWDC 2026

WWDC 2026 brought no new silicon, but a structural rebuild of how AI runs on Apple silicon: a new inference framework (Core AI), a new model format (.aimodel), a new generation of on-device models (AFM 3), and a changed posture toward the cloud including a partnership with Google and NVIDIA. The most surprising tell: Apple's flagship cloud model runs on NVIDIA GPUs in Google Cloud.

  • Core AI replaces Core ML for neural network inference, introducing the .aimodel bundle format.
  • M5 and A19 GPUs integrate neural accelerators in each shader core, boosting matrix multiplication by 4-8x.

MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning

MuJoCo-Drones-Gym is an open-source Gymnasium-compatible multi-drone environment built on MuJoCo, supporting arbitrary Crazyflie 2.x nano-quadcopters with modular physics, action, and observation APIs, and including a PettingZoo wrapper for multi-agent RL and seven task environments.

  • Leverages MuJoCo physics engine for GPU-accelerated simulation with high physical fidelity.
  • Modular API selects physics model (rigid body, explicit dynamics, ground effect, etc.), action interface (RPM, thrust, velocity, waypoint), and observation space (state, cameras, adjacency).

Migrating Your GitHub CI to Hugging Face Jobs

This article explains how to migrate GitHub Actions CI to Hugging Face Jobs to overcome limitations of GitHub-hosted runners, such as slow speed and lack of GPU access. By setting up a dispatcher Space, a GitHub App, and modifying the runs-on label, CI jobs can run on Hugging Face infrastructure with CPU or GPU hardware, streaming logs in real-time. Trackio's experience shows a ~30% reduction in CPU job time.

  • GitHub Actions default runners are generic, slow, and lack GPU support.
  • Hugging Face Jobs provides serverless infrastructure with flexible hardware (CPU, T4, H200).

Apple bets cheaper AI will woo small developers

Apple announced that developers with fewer than 2 million first-time App Store downloads can use its Foundation Models in Private Cloud Compute at no cost, aiming to attract smaller developers by lowering AI infrastructure barriers.

  • Apple offers free access to Foundation Models via Private Cloud Compute for developers under 2 million first-time downloads.
  • The move is intended to reduce AI infrastructure costs for small developers.

HPE ProLiant Compute DL394 Gen12 Brings Nvidia Vera CPU to Agentic AI

HPE announced the ProLiant Compute DL394 Gen12, a 2U server based on the NVIDIA Vera CPU, designed for agentic AI and data-intensive workloads. It integrates HPE's enterprise management and security stack, with a collaboration with NVIDIA and Redpanda; the NYSE is exploring it for its agentic AI infrastructure. The server features a monolithic architecture, LPDDR5X memory with up to 1.2TB/s bandwidth, and quantum-resistant cryptography. Availability is expected in fall 2026.

  • HPE launches DL394 Gen12 with NVIDIA Vera CPU for agentic AI.
  • Collaboration with NVIDIA and Redpanda; NYSE as early adopter.

Nvidia Forges South Korea Tech Deals in AI Push

The deals come as Nvidia expands its presence in South Korea, spanning robotics, chip design and AI infrastructure.

  • Nvidia strikes multiple tech deals in South Korea
  • Collaboration covers robotics, chip design, and AI infrastructure

Intel gets a second life as Google and Nvidia explore it as a TSMC backup for AI chips

Google has ordered more than three million AI chips from Intel for 2028. Nvidia is testing Intel's manufacturing tech for its upcoming Feynman architecture. Both moves come as TSMC can't keep up with AI chip demand. Intel's long-struggling foundry division is getting a rare second chance.

  • Google orders over 3 million AI chips from Intel for 2028 delivery.
  • Nvidia tests Intel's manufacturing process for its Feynman architecture.

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

Xiaomi's MiMo team, with TileRT, released MiMo-V2.5-Pro-UltraSpeed, a serving mode for the MiMo-V2.5-Pro model. It decodes over 1000 tokens per second on a 1-trillion-parameter model using a single 8-GPU commodity node. The speedup comes from FP4 quantization, DFlash speculative decoding, and the TileRT runtime. API trial runs June 9–23, 2026.

  • 1T-parameter MoE model achieves 1000+ tokens/sec on commodity GPUs
  • Three coordinated techniques: FP4 quantization, DFlash speculative decoding, TileRT runtime

How the UK Is Turning Sovereign AI Ambition Into Action With NVIDIA Technologies

One year after NVIDIA CEO Jensen Huang and UK PM Keir Starmer declared the UK would be an AI maker, NVIDIA and partners showcase progress at London Tech Week. Key developments include doubling of sovereign AI cloud providers, Isambard-AI supercomputer, Sovereign AI Fund backing startups, and enterprise AI deployments across sectors.

  • Number of AI cloud providers planning UK deployments has doubled; Nebius, CoreWeave, BT and Nscale announce new infrastructure.
  • Isambard-AI, the UK's most powerful computer with 5,400 NVIDIA GH200 superchips, powers ambitious AI research.

"AI is someone else's GPU"

This new adage is a modern twist on the classic programmer quip "the cloud is someone else's computer," reflecting the reality of AI reliance on external infrastructure.

  • "AI is someone else's GPU" is a corollary to "the cloud is someone else's computer."
  • The phrase spawns variations about startups, life advice, job security, image generation, and product integration.

Accelerated Fourier SAT (AFSAT): Fully Realising a GPU-based Symmetric Pseudo-Boolean SAT Solver

We present Accelerated Fourier SAT (AFSAT), a GPU-accelerated solver for pseudo-Boolean satisfiability based on continuous local search (CLS). AFSAT realises the proof-of-concept approach, FastFourierSAT, into a fully-engineered solver supporting any heterogeneous mixture of symmetric constraint types and lengths within a single problem instance. Using the JAX compiler, AFSAT leverages pure function composition, automatic vectorisation, automatic differentiation, and just-in-time (JIT) compilation to perform massively parallel CLS across batches of candidate assignments. We demonstrate substantially improved numerical stability, runtime performance, and memory efficiency over the proof-of-concept. We achieve this by way of identifying and addressing various limitations that arise from memory latency and floating-point representation, as well as leveraging automatic parallelisation and compact representations. The inherent representational and stability limitations of floating point are partially addressed by a tailored discrete Fourier transform implementation. We achieve near-linear throughput when scaling to multiple accelerators via JAX array sharding.

  • AFSAT is a GPU-accelerated pseudo-Boolean SAT solver using continuous local search, improving upon FastFourierSAT.
  • It leverages JAX for massive parallelization via function composition, vectorization, differentiation, and JIT compilation.

NVIDIA and LG Group Build an AI Factory to Advance Physical AI, Mobility and AI Infrastructure

NVIDIA and LG Group are building an AI factory to accelerate LG's AI-driven businesses in robotics, autonomous driving, data center technologies, and GPU cloud services. The collaboration integrates NVIDIA's full-stack AI factory platform with LG's leadership in consumer electronics and robotics, aiming to create a unified workflow for physical AI systems.

  • NVIDIA and LG collaborate on an AI factory covering robotics, autonomous driving, data centers, and GPU cloud.
  • LG Electronics will use NVIDIA Isaac Sim and Isaac Lab for home robots, and explore the GR00T model.

The Open Source Community is backing OpenEnv for Agentic RL

OpenEnv is a tool for creating agentic execution environments such as terminals, browsers, or anything else an agent can interact with. Today, we’re excited to announce that OpenEnv is becoming even more open, to make the future of training agents open source. Starting today, OpenEnv will be coordinated by a committee that so far includes Meta-PyTorch, Reflection, Unsloth, Modal, Prime Intellect, Nvidia, Mercor, Fleet AI, and Hugging Face. OpenEnv now lives at huggingface/OpenEnv. The project focuses on being an interoperability layer for RL environments, not a reward framework or trainer.

  • OpenEnv is an open-source tool for creating agentic execution environments.
  • It is now governed by a committee of major AI organizations including Meta-PyTorch, Reflection, Unsloth, etc.

NVIDIA and Doosan Group Collaborate to Advance Physical AI and AI Factory Infrastructure

NVIDIA and Doosan Group are expanding their collaboration to advance new opportunities across physical AI, robotics and AI factory infrastructure, spanning Doosan Robotics, Doosan Bobcat, Doosan Enerbility and Doosan Corporation Electro-Materials BG.

  • Doosan Robotics integrates NVIDIA Isaac Sim and other platforms to advance Agentic Robot OS.
  • Doosan Bobcat plans to use NVIDIA physical AI for autonomous equipment.

AI Companies' Shared Destiny Recalls Dot-Com Bubble Memories

The AI infrastructure market is seeing a structure where giants invest in each other, buy each other's services, and generate each other's revenue, reminiscent of the dot-com bubble. SpaceX's pre-IPO AI computing leases with Google and Anthropic raise concerns about revenue authenticity. Investment focus should shift from GPU sales to bottlenecks like power and cooling.

  • SpaceX signed AI computing leases with Google ($920M/month) and Anthropic ($1.25B/month), totaling ~$26B annual revenue.
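
The annual figure follows directly from the two monthly amounts. A quick check, using only the numbers quoted above (the variable names are illustrative):

```python
# Monthly lease values in US dollars, as quoted in the summary.
monthly_leases_usd = {"Google": 920e6, "Anthropic": 1.25e9}

# Annualize by multiplying the combined monthly total by 12.
annual_total = 12 * sum(monthly_leases_usd.values())
print(f"annualized: ${annual_total / 1e9:.2f}B")
```

The combined $2.17B per month annualizes to about $26.04B, consistent with the ~$26B stated.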
  • Market questions whether these are genuine external demand or circular internal transactions to boost IPO valuation.

Show HN: Every Claw Deserves a Face

Nyxclaw is an open-source project that gives AI agents a real-time face and voice. It runs locally without a GPU, offers self-hosted servers with end-to-end encryption, two voice pipelines (OpenAI Realtime and local CPU stack), and supports ARKit blendshapes at 30 FPS.

  • Nyxclaw provides AI agents with a real-time face and voice, running locally with no GPU required.
  • Self-hosted server with cryptographic pairing ensures data privacy, no cloud dependency.

I saw the Surface Laptop Ultra at Computex and it's clear: Microsoft has gone beastmode

Microsoft's Surface Laptop Ultra, powered by Nvidia's RTX Spark SoC, offers up to 128GB unified memory, a 20-core CPU, and RTX 5070-level GPU. Hands-on impressions reveal a premium build, improved thermals, and repairability, but pricing and battery life remain uncertain.

  • Surface Laptop Ultra is a flagship RTX Spark laptop with up to 128GB unified memory.
  • Hands-on demo showed smooth gaming and video editing performance.

Her · हेर — a detective for your Claude Code sessions

Her is a tool that analyzes Claude Code session traces, reconstructing events in plain English, flagging risky moves (deploys, config changes, secrets), and showing token usage. It runs entirely on the local GPU without calling any third-party AI API, and it includes an 'Ask Her' assistant that answers questions from the trace.

  • Her reads Claude Code .jsonl session files, summarizing events and highlighting risks.
  • All processing runs locally on the GPU with no third-party API calls, which keeps session data private.

Show HN: Best setup local LLM found for a 5090 (llama.cpp fork + turboquant)

This article details the configuration and memory calibration required to run the Qwen 3.6 35B MoE model at a 450,000 token context window on a single 32GB VRAM GPU (NVIDIA RTX 5090) using llama.cpp with TurboQuant and YaRN scaling. It covers model selection, quantization trade-offs, KV cache quantization, RoPE scaling, multimodal setup, replication guide, VRAM lifecycle management, and performance evaluation.

  • Run Qwen3.6-35B-A3B-Q6_K on a single RTX 5090 with 450K context using llama.cpp TurboQuant fork and YaRN scaling.
  • Achieve 450K context by compressing the KV cache to 3-bit (turbo3) and extending RoPE beyond the native 262K with YaRN, at the cost of perplexity and retrieval accuracy.
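
Two ratios help size the trade-off described above. The sketch below is rough arithmetic, not the article's calibration procedure: it assumes the native window is exactly 262,144 tokens (the summary says only "262K"), assumes a 16-bit KV cache as the uncompressed baseline, and ignores any per-block scale metadata the 3-bit format may carry.

```python
TARGET_CONTEXT = 450_000      # tokens, from the article summary
NATIVE_CONTEXT = 262_144      # assumption: "262K" read as 2**18 tokens
BASELINE_KV_BITS = 16         # assumption: FP16 cache as the baseline
COMPRESSED_KV_BITS = 3        # "turbo3" 3-bit cache, per the summary

# How far the context is stretched beyond the native window.
context_scale = TARGET_CONTEXT / NATIVE_CONTEXT

# Idealized reduction in KV cache size per token.
kv_compression = BASELINE_KV_BITS / COMPRESSED_KV_BITS

print(f"context stretched {context_scale:.2f}x beyond native")
print(f"KV cache about {kv_compression:.1f}x smaller per token")
```

Under these assumptions the context is stretched by about 1.7x while the per-token cache shrinks by about 5.3x, which is why the longer window can fit in the same VRAM budget.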

NVIDIA, KRAFTON, NC and Reigning ‘League of Legends’ Champions T1 Celebrate RTX Spark at Korea’s PC Bangs

At GTC Taipei at COMPUTEX last week, NVIDIA unveiled RTX Spark, the superchip that reinvents Windows PCs for the era of personal AI agents. On the heels of this announcement, NVIDIA founder and CEO Jensen Huang headed to South Korea, where he introduced RTX Spark to the nation’s passionate gaming community. Leading game developers — including Korea’s KRAFTON and NC — are already working to bring their titles to RTX Spark-powered systems.

  • NVIDIA unveiled RTX Spark superchip for AI, creation, and gaming, supporting AAA games at 1440p over 100 fps.
  • Jensen Huang met with T1’s LoL world champions including Faker, and showcased RTX Spark at T1 Base Camp.

NVIDIA garak Tutorial: Build a Complete Defensive LLM Red-Teaming Workflow with Custom Probes and Detectors

This tutorial walks through NVIDIA garak as an end-to-end framework for defensive LLM red-teaming. It covers setup, plugin discovery, dry runs, real-model scans, multi-probe evaluations, report analysis, custom probe and detector creation, and AVID export. The workflow enables comprehensive LLM security testing and vulnerability reporting.

  • NVIDIA garak is an open-source framework for defensive LLM red-teaming. The tutorial demonstrates a complete workflow from setup to custom extensions.
  • It includes multi-probe scanning, safety score and attack success rate analysis, and inspection of flagged outputs.
