Self-driving labs combine AI with automated hardware to let the system learn from experiments and autonomously decide what to do next, moving beyond mere automation to true autonomy.
Self-driving labs use AI to close the loop between design, make, test, and learn.
They differ from conventional lab automation by making decisions based on real-time results.
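The design-make-test-learn loop can be sketched as a closed-loop optimizer that chooses its next experiment from the results so far. This is a toy under stated assumptions: `run_experiment` is a hypothetical stand-in for the automated hardware (a noiseless yield curve over one hypothetical "temperature" knob), and the decision rule is a simple shrinking-step local search. Real self-driving labs typically use richer strategies such as Bayesian optimization.

```python
from typing import Callable, List, Tuple


def run_experiment(temperature: float) -> float:
    """Hypothetical stand-in for automated hardware: returns a measured yield.

    The optimum at 70.0 is unknown to the loop; real measurements would be noisy.
    """
    return -((temperature - 70.0) ** 2)


def self_driving_loop(
    experiment: Callable[[float], float],
    start: float,
    step: float = 8.0,
    budget: int = 40,
) -> Tuple[float, List[Tuple[float, float]]]:
    """Design -> make/test -> learn, repeated until the budget runs out."""
    best_x, best_y = start, experiment(start)
    history: List[Tuple[float, float]] = [(best_x, best_y)]
    for _ in range(budget):
        # Design: propose the two neighbours of the current best condition.
        for candidate in (best_x - step, best_x + step):
            # Make + test: run the (simulated) experiment and log the result.
            y = experiment(candidate)
            history.append((candidate, y))
            # Learn: adopt the candidate if it beats the incumbent.
            if y > best_y:
                best_x, best_y = candidate, y
                break
        else:
            # Neither neighbour improved: decide to search more finely.
            step /= 2
    return best_x, history


best, history = self_driving_loop(run_experiment, start=50.0)
print(f"best condition: {best}, experiments run: {len(history)}")
```

Starting from 50.0, the loop walks to the simulated optimum at 70.0 and halves its step size whenever no neighbour improves; that halving is the point where the system itself "decides" the coarse search is exhausted.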
One of the leading frontier model families is gaining embodied AI capabilities: Alibaba's Qwen-Robot Suite aims to bridge the gap between perception and action with three specialized models.
Until now, Qwen models have been confined to software, with no physical interaction.
Alibaba launched Qwen-Robot Suite with three models for navigation, manipulation, and world modeling.
A deep dive into one of the most important techniques in modern AI — distillation — and how it addresses the cost, deployment, and specialization challenges of large-scale models.
Distillation makes AI models more efficient and deployable, addressing scale-induced challenges.
Scale drove AI progress but led to expensive, slow, and difficult-to-specialize models.
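One widely used form of distillation is the soft-target loss popularized by Hinton et al.: a small student is trained to match the temperature-softened output distribution of a large teacher, usually blended with ordinary cross-entropy on the true labels. The pure-Python sketch below assumes plain lists of logits; the temperature of 4.0 and the 50/50 blend (`alpha`) are illustrative defaults, not values from the article.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 4.0,
    alpha: float = 0.5,
) -> float:
    """Blend of soft-target KL (teacher -> student) and hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2 so the
    # gradient magnitude stays comparable as the temperature changes.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft = (temperature ** 2) * kl
    # Ordinary cross-entropy against the ground-truth label at T = 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard
```

With `alpha=1.0` only the soft term remains, so a student whose logits match the teacher's exactly scores zero loss, and a student whose logits drift away scores progressively higher.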
LayerLens launches the Stratix Cup, a soccer tournament where top AI models compete as agents in a simulated environment, testing planning, adaptation, and multi-agent coordination.
LayerLens introduces the Stratix Cup, a soccer tournament for AI models.
The competition tests agentic capabilities: pre-game strategy, real-time gameplay, and halftime adaptation.
A week of unexpected turns in the AI market: SpaceX acquires Cursor for $60B, key researchers leave Google, and Midjourney reveals a full-body medical scanner.
1. SpaceX acquires Cursor for $60B in stock, signaling AI tooling as strategic infrastructure.
2. Noam Shazeer and John Jumper leave Google, highlighting talent consolidation at the AI frontier.
Google DeepMind has released DiffusionGemma, a text-diffusion model that challenges the traditional autoregressive transformer approach: it does not generate text left to right, token by token.
DiffusionGemma is a text-diffusion model from Google DeepMind.
It challenges the conventional autoregressive transformer architecture.
This article concludes the series on alternatives to the Transformer, covering four families: recurrent/linear-recurrent models, state space models, text diffusion models, and liquid/continuous-time models. It also announces a new series on knowledge distillation.
Self-attention's cost scales quadratically with sequence length, and its memory use becomes a bottleneck on long sequences.
Four alternative directions: recurrent (constant memory), state space (linear scaling), text diffusion (parallel generation), liquid (continuous-time dynamics).
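To make the "constant memory" point concrete, here is one member of the recurrent family, causal linear attention rewritten as an RNN, in a simplified form assumed for illustration (no feature map or normalization). Instead of caching every past key and value, each step folds `k_t v_t^T` into a fixed-size `d_k x d_v` state; the second function computes the same outputs by revisiting all earlier tokens, which is what a growing KV cache would require.

```python
from typing import List

Vector = List[float]


def recurrent_linear_attention(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Causal linear attention as an RNN; the state is d_k x d_v whatever the sequence length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v in zip(qs, ks, vs):
        # Update: S_t = S_{t-1} + k_t v_t^T (the new token is folded into the state).
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Read out: y_t = S_t^T q_t.
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def lookback_linear_attention(qs: List[Vector], ks: List[Vector], vs: List[Vector]) -> List[Vector]:
    """Same outputs computed by revisiting every earlier key/value (memory grows with t)."""
    outputs: List[Vector] = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):
            score = sum(qi * ki for qi, ki in zip(q, ks[s]))  # unnormalized q_t . k_s
            for j, vj in enumerate(vs[s]):
                out[j] += score * vj
        outputs.append(out)
    return outputs
```

Both functions produce identical outputs, but only the recurrent one keeps its memory fixed as the sequence grows.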
A major week in AI: Anthropic launches Claude Fable 5 and Mythos 5, Apple debuts Siri AI, SpaceX goes public in record IPO, and Bezos's Prometheus raises $12B to build an 'artificial general engineer'.
Anthropic releases Claude Fable 5 and Mythos 5, decoupling capability from access.
Apple unveils Siri AI, which draws on personal context and is built on a custom 1.2-trillion-parameter Gemini model.
The paper 'Language Models Need Sleep' argues that LLMs suffer from anterograde amnesia, unable to learn after training, and proposes a sleep-like consolidation mechanism.
LLMs are static after pre-training, unable to learn new information.
They exhibit anterograde amnesia, lacking long-term memory formation.
The Transformer is currently the reference architecture for AI due to its scaling properties, but its attention mechanism is expensive. The article questions whether Transformers are the final architecture or just the first scalable one.
Transformers excel thanks to the attention mechanism, which applies to diverse data types.
Attention is computationally expensive and scales poorly with sequence length.
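A minimal single-head self-attention in pure Python shows where the cost comes from: every query is scored against every key, so a sequence of `n` tokens needs `n * n` scores. For brevity this sketch assumes `Q = K = V = x` (no learned projections, no causal mask); the returned counter only makes the quadratic growth visible.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def self_attention(x: Matrix) -> Tuple[Matrix, int]:
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    Returns the outputs and the number of query-key scores computed.
    """
    n, d = len(x), len(x[0])
    scores_computed = 0
    outputs: Matrix = []
    for i in range(n):
        # One score per (query i, key j) pair: n scores for this row alone.
        scores = []
        for j in range(n):
            scores.append(sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d))
            scores_computed += 1
        # Softmax over the row (max subtracted for numerical stability).
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return outputs, scores_computed
```

Doubling the sequence from 4 to 8 tokens raises the score count from 16 to 64, which is the quadratic scaling the summary refers to.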
Jensen Huang's five-layer AI cake seems harmonious, but strategists see a battlefield over margin pools. The key to control is owning the scarce layer and the seam adjacent to it.
Huang's cake metaphor reflects a chip vendor's perspective and emphasizes how the layers reinforce one another.
Strategists view the stack as five stacked margin pools vulnerable to commoditization.
Claude Opus 4.8, released on May 28, 2026, may seem like a minor version bump, but it delivers significant reliability improvements including a 4x reduction in undetected code flaws, fixes for silently skipped tool calls, better compaction recovery for long trajectories, dynamic workflows, adaptive thinking, and a fast mode that is 2.5x faster and 3x cheaper than 4.7. The release focuses on calibration and honesty, making it a critical update for production agent loops.
Opus 4.8 improves calibration and honesty, cutting by about 4x how often the model leaves flaws in its own code unremarked.
It fixes silently skipped tool calls and improves compaction recovery, enhancing long-horizon run reliability.
This article examines the limitations of the Transformer architecture and introduces liquid models as a promising alternative for low-latency, private on-device intelligence.
The Transformer's global attention leads to high memory and compute costs during inference.
Liquid models use dynamics instead of attention, offering efficiency for real-time and edge scenarios.
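One concrete "dynamics instead of attention" formulation is the liquid time-constant (LTC) neuron of Hasani et al., whose state follows `dx/dt = -(1/tau + f) * x + f * A`, where `f` is a learned, input-dependent gate. The sketch below simulates a single scalar neuron with explicit Euler steps; the sigmoid gate, the weights, and the step size `dt` are illustrative assumptions, and it is not presented as the design of any particular commercial liquid model.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ltc_neuron(
    inputs: List[float],
    tau: float = 1.0,    # base time constant
    A: float = 1.0,      # level the gate pulls the state toward
    w_in: float = 2.0,   # illustrative input weight (assumption)
    w_rec: float = 0.5,  # illustrative recurrent weight (assumption)
    dt: float = 0.1,     # Euler step size
) -> List[float]:
    """Simulate one LTC-style neuron: dx/dt = -(1/tau + f) * x + f * A."""
    x = 0.0
    trace: List[float] = []
    for u in inputs:
        # The gate depends on both the current input and the neuron's own state.
        f = sigmoid(w_in * u + w_rec * x)
        dxdt = -(1.0 / tau + f) * x + f * A
        x += dt * dxdt  # explicit Euler integration step
        trace.append(x)
    return trace
```

Because `f` depends on the input, the neuron's effective time constant `1 / (1/tau + f)` shifts as the input changes, and a stronger input settles at a higher equilibrium. The state is a single number per neuron no matter how long the input stream runs, which is what makes this family attractive for streaming and on-device use.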
Anthropic's Claude Opus 4.8 edges closer to operational profitability; OpenRouter and Cognition raise massive rounds; Snowflake inks $6B AWS deal; Pope Leo XIV warns against AI dominance. The industry shifts from model-centric competition to a token-based economy.
Claude Opus 4.8 shows modest gains in coding and reasoning, with new effort control, dynamic workflows, and improved honesty.
OpenRouter raises $113M at $1.3B valuation, processing 25T tokens weekly; Cognition raises $1B, with Devin writing 89% of internal code.
For most of the modern AI era, scaling laws drove progress. But recursion — the ability of models or systems to revisit, revise, search, and simulate — is becoming the new scaling dimension. This shift marks a paradigm change from single forward passes to iterative computation.
Traditional AI progress relied on larger models and more data, but recursion is emerging as the new frontier.
Recursion enables models to iteratively improve answers rather than producing a one-shot output.
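The contrast between a one-shot answer and iterative computation can be illustrated with a generic refinement loop that reapplies the same revision step until the answer stops changing. As a loose analogy only (not any specific model's mechanism), the "revise" operator here is a Heron/Newton update for a square root: each extra pass spends more compute and buys more accuracy, and the loop decides for itself when to stop.

```python
import math
from typing import Callable, List


def refine(
    initial: float,
    revise: Callable[[float], float],
    max_steps: int = 20,
    tol: float = 1e-12,
) -> List[float]:
    """Reapply one revision step, stopping early once the answer stops changing."""
    answers = [initial]
    for _ in range(max_steps):
        answers.append(revise(answers[-1]))
        if abs(answers[-1] - answers[-2]) < tol:
            break  # converged: further passes would not change the answer
    return answers


def heron_step(target: float) -> Callable[[float], float]:
    """Toy revision operator: one Heron/Newton update toward sqrt(target)."""
    return lambda guess: 0.5 * (guess + target / guess)


answers = refine(1.0, heron_step(2.0))
for i, a in enumerate(answers):
    print(i, a, abs(a - math.sqrt(2.0)))
```

Starting from a crude guess of 1.0 for the square root of 2, the error falls from about 0.09 after one pass to below 1e-12 within a handful of passes.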
This article criticizes Chain-of-Thought (CoT) reasoning in LLMs as inefficient, since it forces reasoning to leave the residual stream and become discrete tokens. Sapient Intelligence's HRM-Text addresses this by performing reasoning in latent space, providing variable internal depth for fixed-depth Transformers, thus challenging current reasoning paradigms.
Chain-of-Thought (CoT) is not true reasoning but a workaround that makes models 'rent depth' from output tokens.
Sapient Intelligence's HRM-Text performs reasoning in latent space, not in the token stream.
Text diffusion models challenge the autoregressive paradigm by generating text through iterative denoising, treating generation as editing rather than typing. Three key systems define the field: LLaDA (proof of scaling), Mercury (commercial speed advantage), and Gemini Diffusion (frontier validation), representing the three phases of a new architecture class: scientific proof, industrial deployment, and frontier validation.
Text diffusion models generate text by iterative refinement from noise, using bidirectional context.
LLaDA proved diffusion can scale to a large language model.
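The "editing rather than typing" idea can be sketched as masked-diffusion decoding, in the spirit of LLaDA-style models: start from a fully masked sequence, predict every masked slot in parallel, commit the most confident predictions, and repeat. Everything model-related here is a hypothetical stand-in: `toy_denoiser` simply knows a fixed target sentence, and its confidence rule (short words are "easier", revealed neighbours on either side help) is invented to mimic bidirectional context.

```python
from typing import List, Optional, Tuple

TARGET = "diffusion models edit text rather than type it".split()


def toy_denoiser(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a bidirectional denoising network.

    Returns (predicted token, confidence) for every position. The invented
    confidence rule: short words are 'easier', and each already-revealed
    neighbour (left OR right) raises confidence, mimicking bidirectional context.
    """
    preds = []
    for i, word in enumerate(TARGET):
        known = sum(1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] is not None)
        preds.append((word, 0.5 + 0.2 * known - 0.02 * len(word)))
    return preds


def diffusion_decode(steps: int) -> List[List[Optional[str]]]:
    """Start fully masked (None), then repeatedly predict all masked slots in
    parallel and commit only the most confident ones."""
    seq: List[Optional[str]] = [None] * len(TARGET)
    history = [list(seq)]
    per_step = -(-len(seq) // steps)  # ceiling division: tokens committed per step
    for _ in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is None]
        if not masked:
            break
        preds = toy_denoiser(seq)
        # Highest confidence first; ties keep left-to-right order (stable sort).
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = preds[i][0]
        history.append(list(seq))
    return history


for state in diffusion_decode(steps=4):
    print(" ".join(tok or "[MASK]" for tok in state))
```

With 4 steps for 8 words, two tokens are committed per step, and the first step reveals "edit" and "it" before any of the opening words, something a left-to-right decoder cannot do.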
The last three weeks marked a phase transition in AI: Google unveiled Gemini Omni and an agent-first platform; Andrej Karpathy joined Anthropic to accelerate pretraining; Anthropic secured a $45B compute lease from xAI's Colossus; Cerebras's IPO surged to a ~$95B market cap; and SpaceX, OpenAI, and Anthropic, collectively worth trillions, are planning to go public within six months. Research highlights include efficient pretraining with HRM-Text, AI reviewer evaluation, NVIDIA's unified AR-diffusion model, and more.
Google I/O introduced Gemini Omni, Gemini 3.5 Flash, Antigravity agent platform, and TPU 8i for a vertically integrated agent pipeline.
Andrej Karpathy joined Anthropic to lead a team using Claude to accelerate pretraining, signaling a practical self-improvement flywheel.
The next phase of AI agents will be defined by access to a computer—filesystem, terminal, browser, etc.—not just better models. The market for agentic sandboxes is emerging.
AI agents need a real execution environment, including a filesystem, a terminal, and network access.
An agent that can only emit tokens is a brain in a jar, lacking agency.
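A minimal sketch of giving an agent a computer: a private working directory it can write files into, plus a way to run commands there and read back their output. The `Sandbox` class and its methods are hypothetical, built only on Python's standard `tempfile`, `pathlib`, and `subprocess` modules. A scratch directory is not real isolation; production agent sandboxes add containers or VMs, network policies, and resource limits.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


class Sandbox:
    """Hypothetical minimal execution environment: a private working directory
    plus command execution inside it (NOT a security boundary)."""

    def __init__(self) -> None:
        self._tmp = tempfile.TemporaryDirectory()
        self.root = Path(self._tmp.name)

    def write_file(self, name: str, content: str) -> None:
        path = (self.root / name).resolve()
        # Refuse paths such as "../x" that would land outside the working directory.
        if self.root.resolve() not in path.parents:
            raise ValueError("path escapes the sandbox")
        path.write_text(content)

    def read_file(self, name: str) -> str:
        return (self.root / name).read_text()

    def run(self, argv: List[str], timeout: float = 10.0) -> subprocess.CompletedProcess:
        # Commands run with the sandbox directory as their working directory;
        # stdout/stderr are captured so they can be fed back to the model.
        return subprocess.run(argv, cwd=self.root, capture_output=True, text=True, timeout=timeout)

    def close(self) -> None:
        self._tmp.cleanup()


box = Sandbox()
box.write_file("hello.py", "print(6 * 7)")
result = box.run([sys.executable, "hello.py"])
print(result.returncode, result.stdout.strip())
box.close()
```

An agent loop would translate each tool call into `write_file` or `run` and return `stdout` and `stderr` to the model as its next observation.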
Text diffusion models are emerging as a credible alternative to autoregressive transformer models for language generation, overcoming limitations like generation drift and the reversal curse.
Diffusion models rule visual AI but have been an afterthought in text.
Autoregressive models have inherent flaws: left-to-right generation, no global planning, and cascading errors.
Last week in AI was marked by Cerebras's massive IPO, Thinking Machines' interactive models that embed collaboration into the model itself, Recursive Superintelligence's $650M launch for self-improving AI, and Junyang Lin's new AI lab at ~$2B valuation in China.
Cerebras's IPO surged 68%, reaching a ~$95B market cap and underscoring the physical infrastructure of AI.
Thinking Machines unveiled interaction models where real-time collaboration is built into the model, not the harness.
Evaluations are becoming the fourth pillar of modern AI, alongside compute, data, and models. Every company needs its own dynamic evaluation suite tailored to its workflows, not generic benchmarks.
Evaluations are emerging as the fourth pillar of AI.
Companies require private evaluation systems for their unique workflows.
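A private evaluation suite can start as little more than a list of workflow-specific cases, each pairing an input with a programmatic check, and a runner that reports pass rates per workflow. The `EvalCase` and `run_suite` names, the tags, and the toy model below are hypothetical; the point is that the cases come from the company's own workflows and can be added or retired as those workflows change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """One workflow-specific test: an input plus a check on the model's output."""
    name: str
    prompt: str
    check: Callable[[str], bool]
    tag: str  # workflow area, e.g. "invoices" or "support" (hypothetical tags)


def run_suite(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Run every case and report the pass rate per workflow tag."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for case in cases:
        totals[case.tag] = totals.get(case.tag, 0) + 1
        try:
            ok = case.check(model(case.prompt))
        except Exception:
            ok = False  # a crash counts as a failure, not a skipped case
        passed[case.tag] = passed.get(case.tag, 0) + int(ok)
    return {tag: passed[tag] / totals[tag] for tag in totals}
```

Reporting per tag rather than one global score keeps a regression in a single workflow from being hidden by strong results elsewhere.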
Anthropic's new Natural Language Autoencoders allow researchers to get direct English descriptions of what an LLM is thinking, marking a significant step in interpretability.
Anthropic introduces Natural Language Autoencoders (NLA) that produce unsupervised English explanations of LLM activations.
NLA allows researchers to ask 'what are you thinking?' and get bullet-point answers.
State space models (SSMs) are emerging as a viable alternative to Transformers, offering linear time complexity and constant memory during inference. This article explores the mathematical foundations, recent breakthroughs, and how SSMs now compete on key language tasks.
Transformer self-attention suffers from O(n²) complexity, limiting long-context scalability.
State space models achieve linear complexity with no KV-cache, enabling efficient inference.
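The linear-time, cache-free inference of SSMs follows from their recurrence. The sketch assumes a discrete, time-invariant SSM with a diagonal state matrix, scalar input and output, and discretization already applied: `h_t = a * h_{t-1} + b * x_t` and `y_t = c . h_t`. The same outputs can be computed as a convolution with kernel `K_k = sum_n c_n a_n^k b_n`, which is what enables parallel training; selective SSMs such as Mamba make the parameters input-dependent, giving up the convolutional view in favor of a parallel scan.

```python
from typing import List


def ssm_scan(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Recurrent view of a discrete diagonal SSM: h_t = a * h_{t-1} + b * x_t, y_t = c . h_t.

    The state h has a fixed size N = len(a), so memory per step does not grow
    with sequence length (no KV cache), and total work is O(L * N).
    """
    h = [0.0] * len(a)
    ys = []
    for xt in x:
        h = [ai * hi + bi * xt for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def ssm_convolution(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Equivalent convolutional view for time-invariant parameters: y_t = sum_k K_k * x_{t-k}."""
    L = len(x)
    kernel = [sum(cn * an ** k * bn for an, bn, cn in zip(a, b, c)) for k in range(L)]
    return [sum(kernel[k] * x[t - k] for k in range(t + 1)) for t in range(L)]
```

The naive convolution here is O(L^2) for clarity; practical implementations use FFTs or scans. The recurrent form touches only the fixed-size state `h` at each step, which is why inference needs no growing cache.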
This week's AI developments highlight a shift from a model race to an infrastructure race. Anthropic's natural language autoencoders enable interpretability via language, OpenAI's voice models push conversational interfaces, SubQ claims a 12M-token context window, and Chinese AI labs like DeepSeek and Moonshot see soaring valuations. The editorial underscores that AI is becoming more inspectable, conversational, memory-rich, and institutionally valuable.
Anthropic's natural language autoencoders turn model activations into readable text, opening new interpretability paths.
OpenAI's voice models transform AI from text-based queries to real-time conversational agents.
NVIDIA's Nemotron 3 Nano Omni is a multimodal reasoning model that unifies video, audio, image, and text processing into a single efficient model for agentic workflows, avoiding the lossy pipeline of separate models.
Nemotron 3 Nano Omni integrates video, audio, image, and text into one model.
Designed to replace the fragmented pipeline of separate ASR, VLM, and OCR models.