AI News Hub (LIVE)
Public articles: 17; Collected articles: 21; Trust: 88; Refresh: 720 min
Health: Healthy; Source type: Research; Full-text rights: Full text allowed; Last ingested: 2026-06-23; ID: interconnects; Status: Enabled

Public Substack newsletter by ex-Meta RLHF researcher; free posts allowed.

Latest public articles

GLM-5.2 is the step change for open agents

GLM-5.2, released by Z.ai, represents a significant leap for open-weight models, matching or exceeding closed-source models on agent and coding benchmarks. Its release amid the ban on Claude Fable highlights the economic and geopolitical stakes and has sparked debate over open versus closed models.

  • GLM-5.2 achieves top-tier performance on agent and coding benchmarks, rivaling Anthropic and OpenAI models.
  • Released during U.S. export restrictions on Claude Fable, it underscores open model economics and geopolitical tensions.

Banning Open Source AI Would Be A Mistake

This article argues that banning or over-regulating open source AI would be a grave mistake. Open source software has been crucial for education, innovation, and competition, generating trillions in economic value. In AI, open source models provide a counterweight to monopolies and are more transparent and secure. Concerns about China should not lead to restrictions on open source; instead, support for domestic open source should be strengthened.

  • Open source software underpins over 90% of global software and has generated over $8 trillion in economic benefits.
  • Open source AI promotes education, innovation, and competition, empowering startups and smaller players.

State of the blog, mid-2026

The author reflects on the blog Interconnects three years into weekly writing, discussing its role in their career goals, recent advising roles with Arcee AI and Mercor, and plans to evolve the blog's operations including paywalled comments and more paid articles to maintain a high-quality, niche audience.

  • The blog is an independent, raw voice focused on open science and frontier AI.
  • Recent advising roles with Arcee AI and Mercor support the author's missions.

Frontier post-training recipe review with Finbarr Timbers

This podcast dives into the evolution of post-training recipes, from InstructGPT to the 2026 multi-teacher on-policy distillation (MOPD) era. Nathan Lambert and Finbarr Timbers reflect on challenges in open-source models like OLMo-3 and analyze how frontier labs leverage specialized teachers and distillation to push performance boundaries.

  • Post-training recipes have transformed dramatically, moving from single pipelines to multi-teacher strategies (MOPD).
  • MOPD trains domain-specialist teachers and distills into a general student, solving RL conflict issues.
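
The teacher-to-student routing described above can be sketched roughly as follows. This is a toy, single-token illustration and not the recipe discussed in the podcast: the reverse-KL objective, the function names (`reverse_kl`, `mopd_loss`), and the hard-coded teacher distributions are all assumptions made for the example. A real on-policy setup would score full sequences sampled from the student.

```python
import math

def reverse_kl(student, teacher):
    """KL(student || teacher) over a shared vocabulary of token probabilities."""
    return sum(p * math.log(p / teacher[tok]) for tok, p in student.items() if p > 0)

def mopd_loss(batch, student_probs, teachers):
    """Average reverse KL, routing each prompt to the teacher for its domain.

    batch: list of (domain, prompt) pairs.
    student_probs: callable mapping a prompt to a {token: probability} dict.
    teachers: dict mapping a domain name to a callable with the same shape.
    """
    total = 0.0
    for domain, prompt in batch:
        total += reverse_kl(student_probs(prompt), teachers[domain](prompt))
    return total / len(batch)

# Hypothetical domain-specialist teachers and a general student (toy values).
def math_teacher(prompt):
    return {"a": 0.9, "b": 0.1}

def code_teacher(prompt):
    return {"a": 0.2, "b": 0.8}

def student(prompt):
    return {"a": 0.5, "b": 0.5}

teachers = {"math": math_teacher, "code": code_teacher}
batch = [("math", "2+2="), ("code", "def f():")]
loss = mopd_loss(batch, student, teachers)
print(f"distillation loss: {loss:.4f}")
```

Because each prompt is only ever compared against its own domain's teacher, the student is never pulled toward two specialists on the same example, which is one plausible reading of how the approach sidesteps conflicting training signals.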

Claude Fable 5 and new AI safety fables

One step further into the power politics of frontier AI systems.

  • Claude Fable 5 is the most capable public model with major benchmark improvements.
  • Safety classifiers for cybersecurity, biology, and distillation trigger fallback to Opus 4.8 with user notification.

Farewell Ai2

Nathan Lambert reflects on his time at the Allen Institute for AI (Ai2), where he worked on the Olmo models and led projects like Tülu 3. He emphasizes the importance of open research and shares his journey from a relatively unknown researcher to a prominent voice in AI.

  • Nathan Lambert spent two years at Ai2, leading key open language model initiatives.
  • He highlights the critical role of open research and relationship-building in AI.

Some ideas for what comes next, May 2026

AI progress continues to accelerate in 2026: open models lag in agentic capabilities, Google's Gemini is not yet competitive with Claude Code and Codex, American open models are rising, Anthropic and OpenAI are locked in fierce competition, and power structures are asserting control.

  • Open models are 5-6 months behind in agentic capabilities, and the gap is likely to widen to 12+ months.
  • Google lacks a clear Gemini-based competitor to Claude Code and Codex.

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

An eventful month with one flagship release after another. CAISI assessment shows open models lagging behind the US frontier, but methodology is questioned. Highlights include MiMo-V2.5-Pro, Gemma-4, Kimi-K2.6, Laguna-XS.2, and DeepSeek-V4-Flash.

  • Multiple open model releases from DeepSeek, Google, Moonshot AI, Xiaomi, and others.
  • CAISI evaluation shows large Elo gap, but benchmarks may underestimate real-world performance.

How open model ecosystems compound

The article explains that about 80% of compute for frontier models goes to R&D, not final training. Open ecosystems like China's reduce duplicated R&D costs. Open models lower future development costs but not immediate deployment costs. The author argues for an open model consortium to sustain cost advantages.

  • About 80% of compute goes to R&D, not final model training.
  • China's open ecosystem reduces duplicated R&D effort across labs.
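
The arithmetic behind these two points can be illustrated with a small sketch. It assumes the 80% R&D share from the summary and an idealized ecosystem in which R&D is either fully duplicated by every lab or fully shared; the function names and the four-lab example are hypothetical, and real ecosystems fall somewhere between the two extremes.

```python
def total_compute(final_run: float, rd_share: float = 0.8) -> float:
    """Total compute when `rd_share` of it is R&D and the rest is the final run."""
    return final_run / (1.0 - rd_share)

def ecosystem_compute(n_labs: int, final_run: float,
                      rd_share: float = 0.8, shared_rd: bool = False) -> float:
    """Compute spent by `n_labs` labs that each do one final training run.

    shared_rd=False: every lab repeats the full R&D effort.
    shared_rd=True: R&D is done once and reused by all labs (idealized).
    """
    rd = total_compute(final_run, rd_share) - final_run
    rd_total = rd if shared_rd else rd * n_labs
    return rd_total + n_labs * final_run

# One unit of compute for the final run implies about five units overall.
per_lab = total_compute(1.0)

# Four labs, duplicated versus shared R&D (arbitrary compute units).
duplicated = ecosystem_compute(4, 1.0, shared_rd=False)
shared = ecosystem_compute(4, 1.0, shared_rd=True)
print(f"per lab: {per_lab:.1f}, duplicated: {duplicated:.1f}, shared: {shared:.1f}")
```

Under these assumptions the final runs still cost the same in both cases, so the saving comes entirely from the R&D term.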

Notes from inside China's AI labs

An inside look at Chinese AI labs reveals a culture of humility, practical fast-following, and a focus on building rather than philosophical debates. Chinese researchers, many of them students, excel at meticulous LLM development with less ego. The ecosystem lacks a developed data industry but shows early domestic AI demand.

  • Chinese AI labs cultivate a fast-follower culture with less ego, enabling efficient model building.
  • Students play a core role, bringing fresh perspectives and dedication.

Reading today's open-closed performance gap

The performance gap between open and closed models is nuanced and not captured by a single number. Benchmarks evolve, trust in them diminishes, and frontier labs face economic pressure to constantly innovate. Chinese open models are competitive but may focus more on benchmarks, while real-world robustness still favors closed models.

  • The open-closed gap is dynamic and multi-dimensional, not a single metric.
  • Benchmarks shift over time and correlate less with real-world performance.

Claude Mythos and misguided open-weight fearmongering

This article analyzes the wave of fear surrounding open-weight AI models after the announcement of Claude Mythos. The author argues that the concerns are similar to past overblown fears and calls for nuanced study rather than a general ban.

  • Claude Mythos raises fears about open-weight models enabling cyberattacks.
  • Similar panics accompanied GPT-2 and GPT-4, and the feared harms did not materialize.

Gemma 4 and what makes an open model succeed

The article explores the competitive landscape of open models in 2026, the key factors for their success (performance, provenance, license, tooling, finetunability), and analyzes Google's latest Gemma 4 series. It argues that success depends more on usability and ecosystem support than benchmark scores.

  • The open model market has grown from a few players to many competitors, but still holds huge potential.
  • Evaluating open models requires considering performance, license, tooling, and finetunability.

Latest open artifacts (#20): New orgs! New types of models! With Nemotron Super, Sarvam, Cohere Transcribe, & others

This issue covers a diverse range of open models spanning OCR, RAG search, audio transcription, computer use, code editing, math theorem proving, and more. Models come from a broader set of builders including NVIDIA, Cohere, Sarvam, Mistral, and others, highlighting the industry's push for domain-specific, cost-effective models.

  • NVIDIA releases Nemotron-3-Super, a 120B-parameter model with 12B active parameters and 1M context, the first to use NVFP4 in pretraining.
  • Cohere's Transcribe model, based on the Conformer architecture, supports 14 languages under Apache 2.0.

Lossy self-improvement

The article argues that AI progress, while significant, is better described as 'lossy self-improvement' than as recursive self-improvement. Frictions such as narrow automatable research, diminishing returns from parallel agents, and resource bottlenecks suggest a more linear trajectory than an exponential takeoff.

  • Automatable research is narrow, focusing on single metrics rather than complex trade-offs.
  • Adding more AI agents yields diminishing returns due to human supervision limits and task generation bottlenecks.

GPT 5.4 is a big step for Codex

Despite incremental benchmark gains, GPT 5.4 in Codex offers real improvements in usability, speed, and context management, though Claude still wins on charm.

  • GPT 5.4 feels like a meaningful step in correctness, ease of use, speed, and cost for agentic tasks.
  • OpenAI's agent previously suffered from 'death by a thousand cuts'; GPT 5.4 removes those hard edges.
