xAI is building Colossus 2, a gigawatt-scale training datacenter, leveraging innovative power sourcing across state lines and partnerships with Solaris Energy. The project aims to surpass rivals by Q3 2025, with potential Middle East funding.
xAI completed Colossus 1 in a record 122 days; Colossus 2 is even larger and is being built faster, reaching 200MW of cooling capacity within six months.
Power for Colossus 2 comes from a former Duke Energy plant in Mississippi via gas turbines, bypassing local opposition in Tennessee.
Nvidia announced the Rubin CPX, a solution specifically optimized for the prefill phase, emphasizing compute FLOPS over memory bandwidth. This is a game changer for inference, second only to the March 2024 announcement of the GB200 NVL72 Oberon rack-scale form factor. Specialized hardware for prefill and decode unlocks the full potential of disaggregated serving. The rack-system design gap between Nvidia and its competitors has become canyon-sized, forcing those competitors to reconfigure their roadmaps.
Rubin CPX is a prefill-dedicated GPU with 20 PFLOPS FP4 dense compute and 2 TB/s memory bandwidth, using 128GB GDDR7 for lower cost.
New VR200 NVL144 CPX and dual-rack configurations provide flexible prefill-to-decode ratios for disaggregated inference.
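As a rough illustration of why a prefill-only part can trade memory bandwidth for compute, the sketch below applies a simple roofline model to the Rubin CPX figures quoted above (20 PFLOPS dense FP4, 2 TB/s). The roofline model and the function names are assumptions added for illustration, not something taken from the article, and real kernels will fall short of these ceilings.

```python
# Roofline sketch using the Rubin CPX figures quoted above.
# The roofline model and all names here are illustrative assumptions.

PEAK_FLOPS = 20e15      # 20 PFLOPS dense FP4 (from the summary)
MEM_BANDWIDTH = 2e12    # 2 TB/s memory bandwidth (from the summary)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs per byte at which a kernel stops being bandwidth-bound."""
    return peak_flops / bandwidth


def attainable_flops(
    intensity: float,
    peak_flops: float = PEAK_FLOPS,
    bandwidth: float = MEM_BANDWIDTH,
) -> float:
    """Upper bound on throughput for a kernel with the given
    arithmetic intensity (FLOPs performed per byte moved)."""
    return min(peak_flops, bandwidth * intensity)


if __name__ == "__main__":
    print(ridge_point(PEAK_FLOPS, MEM_BANDWIDTH))  # ridge, FLOPs per byte
    print(attainable_flops(100.0))                 # bandwidth-bound case
    print(attainable_flops(20_000.0))              # compute-bound case
```

Under this model, work below roughly 10,000 FLOPs per byte is limited by bandwidth and work above it by compute. Prefill reuses each weight across many prompt tokens and so tends to sit at high intensity, while decode rereads weights and cache for each generated token and sits much lower, which is the usual argument for pairing a GDDR7 prefill part with HBM-equipped decode parts.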
Huawei is ramping up Ascend AI chip production using a die bank from TSMC and capacity from SMIC. However, HBM (High Bandwidth Memory) shortages will become the primary bottleneck for future production. China's domestic HBM supplier CXMT is ramping quickly but cannot meet demand in the near term. The article also analyzes the impact of export controls and the potential implications of Nvidia H20 chip sales to China.
Huawei shipped 507k Ascend units in 2024 and is on track for 805k in 2025, mostly 910C.
SMIC capacity is no longer the bottleneck, but HBM supply will limit Huawei to under 1 million chips next year.
Two-and-a-half years ago, SemiAnalysis flagged a looming 'cloud crisis' at AWS. Today, evidence mounts as Azure leads quarterly cloud revenue and Google Cloud narrows the gap. Yet SemiAnalysis makes an out-of-consensus call for an AWS AI resurgence, driven by its partnership with Anthropic. Anthropic's revenue surged from $1B to $5B annualized in 2025, and AWS is building over 1.3GW of datacenter capacity for it, hosting nearly a million Trainium2 chips. Despite Trainium2 lagging Nvidia on specs, its memory bandwidth per TCO advantage aligns with Anthropic's reinforcement learning roadmap. The collaboration is evolving into a custom silicon program, poised to boost AWS growth above 20% by end of 2025.
AWS faces AI cloud market share loss, but SemiAnalysis predicts resurgence driven by Anthropic partnership.
This article provides an in-depth analysis of H100 and GB200 NVL72 training benchmarks, covering model FLOPs utilization (MFU), total cost of ownership (TCO), cost per million tokens, energy consumption, and reliability. It reveals that H100 achieved up to 57% throughput improvement over 12 months via software optimization alone. Meanwhile, GB200 NVL72 offers potential performance advantages but faces reliability challenges and has not yet completed large-scale training runs. Detailed benchmarks for models like GPT-3 175B and Llama 3 405B are presented, along with three recommendations for Nvidia: increase benchmark transparency, expand to native PyTorch, and improve GB200 diagnostic tools.
H100 improved BF16 MFU from 34% to 54% and FP8 MFU from 29.5% to 39.5% over one year through software alone.
GB200 NVL72's total cost per GPU is about 1.6x that of H100, requiring at least 1.6x performance for TCO advantage.
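The break-even condition in the last point is simple arithmetic: relative performance per dollar is the performance ratio divided by the cost ratio. The sketch below works that out for the roughly 1.6x cost figure quoted above. The function name and the sample speedups are hypothetical, chosen only for illustration.

```python
# Performance-per-TCO sketch. The 1.6x cost ratio comes from the summary;
# the function name and the sample speedups are illustrative assumptions.

GB200_COST_RATIO = 1.6  # GB200 NVL72 total cost per GPU relative to H100


def perf_per_tco_ratio(
    perf_ratio: float, cost_ratio: float = GB200_COST_RATIO
) -> float:
    """Relative performance per dollar versus the baseline.

    Values above 1.0 favor the newer system, values below 1.0 favor
    the baseline.
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return perf_ratio / cost_ratio


if __name__ == "__main__":
    for speedup in (1.2, 1.6, 2.0):  # hypothetical speedups over H100
        print(speedup, round(perf_per_tco_ratio(speedup), 3))
```

A speedup of exactly 1.6x only matches H100 on cost efficiency; anything below that is a worse deal per dollar, and anything above it is where the TCO advantage begins.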
GPT-5's release disappointed power users, but the real focus is on monetizing ChatGPT's 700M+ free users. The article analyzes how OpenAI's new router technology distinguishes query intent and enables future monetization through agentic purchasing and transaction fees, potentially creating a consumer superapp.
GPT-5 prioritizes free users over power users, using a router to prepare for ad monetization.
The router can differentiate between informational and commercial queries, allocating more compute to high-value ones.
Robots have powered manufacturing for decades, yet they stayed single-purpose and thrived only in perfect settings. Previous attempts at intelligent machines overpromised and underdelivered; they were simply too early. Today, modern AI paradigms convert most robot roadblocks into data problems and push machines toward capabilities once thought impossible. As these models absorb real-world experience, robots will sharpen current skills, gain new ones, and deploy faster, absorbing ever-increasing shares of labor.
The article introduces an industry-first 'Robotics Levels of Autonomy' classification with five levels, from Level 0 to Level 4.
Current general-purpose robots are achieving early production in Level 2 and early pilots in Level 3.
Meta’s shocking purchase of 49% of Scale AI at a ~$30B valuation shows that money is of no concern for the $100B annual cashflow ad machine. Despite seemingly unlimited resources, Meta has been falling behind foundation labs in model performance.
Meta acquires 49% of Scale AI at ~$30B valuation.
Zuckerberg builds a Superintelligence team with massive compensation offers ($200M-$1B).