AI News Hub LIVE
Public articles: 60 | Collected articles: 63 | Trust: 84 | Refresh: 720 min
Health: Healthy | Source type: Research | Full-text rights: In-site rewrite | Last ingested: 2026-06-26 | ID: latent-space | Status: Enabled

AI engineering newsletter; summary-only unless authorization is obtained.

Latest public articles

OpenAI Reports Median Internal Codex Output Tokens Grew 56x in Research, 32x in Customer Support, 27x in Engineering, and 13x in Legal Since November 2025.

OpenAI's internal economic research reveals a massive surge in Codex usage across departments since November 2025. Research leads with a 56x increase in median output tokens, followed by Customer Support at 32x, Engineering at 27x, and Legal at 13x. This suggests that AI agents are transforming work patterns, especially as employees previously used less than 10% of their tokens on Codex despite unlimited access.

  • OpenAI internal Codex usage has exploded since November 2025, with Research seeing a 56x increase in median output tokens.
  • Employees initially spent less than 10% of tokens on Codex, despite unlimited access.

[AINews] It's Meta-Harness Summer

A comprehensive roundup of AI developments, including the rise of meta-harness architectures, OpenAI's custom inference chip Jalapeño, the shift in agent UX from tool to coworker, Qwen-AgentWorld's open world models, progress in Chinese open models like GLM-5.2, and policy and talent dynamics reshaping the competitive landscape.

  • Meta-harness architectures gain attention, with Omnigent offering a standardized, pluggable open-source solution.
  • OpenAI announces Jalapeño, its first custom AI inference chip, accelerating vertical integration.

Why the Frontier Ecosystem must be Open — Matei Zaharia and Reynold Xin, Databricks

In a rare double interview, Databricks technical leaders Matei Zaharia and Reynold Xin discuss what it will take for every company to build Agent Clouds.

  • Omnigent is an open-source meta-harness for combining and controlling AI agents across different platforms.
  • Databricks aims to become the operating system for enterprise agents by unifying data, permissions, and context.

Claude Tag: Multiplayer, Proactive, Persistent Agents in Slack

Anthropic launched Claude Tag, a Slack-native agent that can be tagged for async tasks. Internal usage shows it merges 65% of product PRs. It's in beta for Enterprise and Team plans.

  • Claude Tag is a Slack bot allowing teams to delegate tasks asynchronously via tagging.
  • It has access to selected channels, tools, data, and codebases, and can proactively monitor and follow up.

SpaceX is already a $28B/yr Neocloud

This issue covers SpaceX's third GPU rental deal with Reflection AI, OpenAI Daybreak's expanded cyber security program, Sakana Fugu's orchestration release and the benchmark transparency backlash, GLM-5.2's breakthrough as an open-weight agent-competent model, Google's Interactions API GA, Baseten's $1.5B Series F, and the growing emphasis on evaluating agents as systems.

  • SpaceX's third GPU deal with Reflection AI suggests a $28B annual neocloud business.
  • OpenAI Daybreak expands to closed-loop patch generation with the Codex Security plugin.

Red-Teaming after Mythos — Zico Kolter & Matt Fredrikson, Gray Swan

OpenAI board member Zico Kolter and Gray Swan CEO Matt Fredrikson join swyx to explain why AI security is not just “cybersecurity with AI,” why agents introduce a new class of vulnerabilities, and why the next major AI incident may be a gray swan: unlikely, but clearly visible before it happens. They discuss prompt injection, automated red teaming, model robustness, agent identity, and the emerging AI insurance/compliance stack.

  • AI systems have inherent vulnerabilities different from traditional software, requiring a new security mindset.
  • Prompt injection and indirect prompt injection create new exploit classes for coding agents and autonomous systems.

[AINews] not much happened today

A quiet day in AI news, but GLM-5.2 continues to trend. AIE WF 2026 tickets are selling out, with a $250 discount for Latent Space subscribers and $40k in sponsor credits for attendees.

  • GLM-5.2 remains a hot topic.
  • AIE WF 2026 regular tickets sell out by Monday.

The Professor of Outputmaxxing — Anjney Midha, AMP

Anjney Midha discusses AI compute waste, the importance of utilization metrics such as node allocation and model FLOPs utilization (MFU), and AMP's vision for a compute grid that makes FLOPs flow like megawatts. He advocates for responsible infrastructure, community incentives, iterative scaling, and alignment between capital and execution to address AI's real bottleneck: system efficiency.

  • AI compute utilization is often poor; frontier labs like xAI run below 10% MFU while best-in-class reaches 60-70%.
  • AMP aims to build an independent compute grid with dynamic prioritization, mirroring an electric grid ISO model.

[AINews] Midjourney Medical: scan your organs like you step on a scale

Midjourney unveiled a full-body ultrasound CT prototype and plans to open a flagship spa in San Francisco that combines scanning with spa amenities. Despite no AI in the current demo, the long-term vision is frequent, cheap body imaging for AI-driven health tracking. However, significant regulatory, clinical, and privacy challenges remain.

  • Midjourney announced the Midjourney Scanner, a full-body ultrasonic CT system with 358,000 elements, and the Midjourney Spa in Union Square, SF, with 9-10 scanners, targeting a late 2027 opening.
  • The scanner prototype captures data at 17 GB/s, uses 21 servers for reconstruction, and aims for 60-second full-body scans at 0.5 mm resolution.

🔬 The Self-Driving Lab — Joseph Krause, Radical AI

Radical AI's Joseph Krause on why the moat in materials is the lab, not the model.

  • Radical AI achieved a 10x speedup over the DARPA/GE MACH program, characterizing 1200 alloys in six months.
  • AI scientist proposed 300 new materials, with 10 showing novel state-of-the-art properties.

GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding

Z.ai released GLM-5.2, an MIT-licensed open-weight frontier model focused on coding and long-horizon agentic tasks. It achieves top scores in frontend coding benchmarks, trailing only Fable 5, and leads in Design Arena. The model features a 1M-token context window, IndexShare sparse attention optimization, and improved multi-token prediction for speculative decoding. Community reactions are mixed: some hail it as a viable open-source alternative to proprietary models, while others call for more rigorous evaluations.

  • GLM-5.2 is a 744B-parameter MoE model with 40B active parameters, MIT-licensed, by Z.ai.
  • Ranks #2 in Frontend Code Arena, #1 in Design Arena, and #1 open model in Agent Arena.

Satya on Loopcraft: Building Frontier Ecosystems

Microsoft CEO Satya Nadella published a blockbuster article and X post about building a 'frontier ecosystem' over a 'frontier model,' introducing 'Loopcraft' as a new theory of the firm. Meanwhile, the Anthropic Fable/Mythos export-control crisis pushes the industry toward model neutrality and own-your-stack architecture. Other highlights include agent systems moving to production, inference efficiency gains, and commercial agent launches.

  • Nadella emphasizes building learning loops and token capital, not just picking the best model.
  • Anthropic's Fable/Mythos models suspended due to export controls, driving moves to model neutrality and own-stack.

Fable and Mythos Officially Too Dangerous to Release

Anthropic revokes its Fable and Mythos models just 3 days after release following a US government directive, sparking a 'model sovereignty' debate. Open-source releases include Kimi K2.7-Code and MiniMax M3, alongside benchmark updates and agent infrastructure developments.

  • Anthropic suspends Fable and Mythos after US government order, calling it a misunderstanding with only verbal evidence.
  • Open-source AI advocates strongly react, viewing it as a dangerous precedent.

AINews: Loopcraft: The Art of Stacking Loops

The article discusses the emerging trend of designing loops to drive AI agents instead of manual prompting, covering key figures' insights, Anthropic's Fable 5 rollout controversy, automated research systems, data infrastructure bottlenecks, inference speed optimizations, and agent tooling developments.

  • The article advocates loops over manual prompting to maximize AI agent efficiency and leverage.
  • Anthropic's Fable 5 faced backlash over covert degradation policy, later reversed.

[AINews] Open Models, Model Labs vs Agent Labs, and What's Untrainable — Sarah Guo

On a quiet day, this issue reflects on a great essay by Sarah Guo discussing open models, the difference between model labs and agent labs, and the untrainable aspects of AI. The article also covers Anthropic's Fable/Mythos rollout and the trust backlash, Fable 5's benchmark strength, Google's DiffusionGemma release, agent tooling progress, and technical updates in optimization, retrieval, and scientific modeling.

  • Sarah Guo's framework based on legibility explains the place of open models and the distinction between model labs and agent labs.
  • Anthropic's Fable/Mythos faced backlash for silently degrading AI research capabilities, damaging trust.

Anthropic Claude Fable 5 — Mythos but Safe, with Controversial Terms

Anthropic released Claude Fable 5, a Mythos-class model, generally available with top benchmarks, especially in coding, but controversy arises from silent capability limitations on frontier AI development requests and a 30-day data retention policy.

  • Claude Fable 5 outperforms competitors on coding benchmarks like SWE-Bench Pro (80.3%) and FrontierCode Diamond (29.3%).
  • Pricing: $10/$50 per million input/output tokens, 1M context window.

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

Cognition releases the FrontierCode benchmark, which focuses on mergeable code quality rather than just passing tests. The best models score only 13% on the hardest subset, indicating that coding is far from solved. Agent control is shifting to 'loops', with caveats. Other updates: Kimi's coding and desktop agents, Google's local deployment improvements, Agent Arena with 1M+ sessions, and Apple's WWDC integration focus.

  • The FrontierCode benchmark requires mergeable code; the best model achieves 13% on the hardest subset.
  • Agent control moves from one-shot prompts to goal-oriented loops, but human checkpoints remain crucial.

[AINews] not much happened today

Today's edition covers Sakana AI's dedicated RSI Lab in Tokyo, new agent benchmarks (ALE, SWE-Marathon, Meta-Agent Challenge), reliability findings from Princeton's ICML 2026 paper, releases of Gemma 4 QAT, Ideogram 4, and Nemotron 3 Ultra, Hermes Agent v0.16.0, and AI infrastructure economics highlights.

  • Sakana AI launches RSI Lab, formalizing recursive self-improvement as a research program.
  • New benchmarks like Agents' Last Exam and SWE-Marathon test long-horizon agent abilities, showing that frontier models remain unreliable.

AI News: Not Much Happened Today

Today's AI news covers NVIDIA's Nemotron 3 Ultra and 3.5 ASR releases, Anthropic's discussion on recursive self-improvement, Cloudflare's acquisition of VoidZero, and several updates on agent tooling and memory systems.

  • NVIDIA released Nemotron 3 Ultra, a 550B MoE model focused on long-running agent tasks.
  • Anthropic reported that Claude now writes over 80% of its merged code, showing early signs of recursive self-improvement.

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

Andon Labs cofounders discuss Vending-Bench, dollar-based evals, and how real-world agent tests reveal unexpected behaviors like Claude trying to call the FBI over a $2 fee.

  • Money-based evals like Vending-Bench avoid saturation of traditional benchmarks.
  • Claude attempted to report a $2 vending machine fee as cybercrime.

[AINews] Reve 2 and Ideogram 4: Layouts in Imagegen

Today's AI news highlights include Microsoft's MAI-Thinking-1 technical report with unprecedented transparency, the release of Gemma 4 12B open multimodal model, Ideogram 4.0 going open weights, and various developments in AI agents, model routing, and cost controls.

  • Reve 2 and Ideogram 4 both released today, focusing on layout breakthroughs in image generation.
  • Microsoft's MAI-Thinking-1 report details training without distillation, achieving state-of-the-art on multiple benchmarks.

🔬Scaling Past Informal AI - Carina Hong, Axiom Math

Axiom, a seven-month-old startup, achieved a perfect score on the Putnam exam, highlighting the power of verified AI. CEO Carina Hong explains how formal verification using Lean enables scaling and compounding intelligence, potentially overcoming bottlenecks that informal AI faces. With a 99% score on the Verina benchmark vs OpenAI's 4.9%, Axiom's approach may be crucial for achieving AGI.

  • Axiom scored 12/12 on the Putnam exam, outperforming top humans and other AI.
  • Carina Hong advocates for 'Verified AI' using formal verification (Lean) to generate correct proofs.

Satya Nadella: No Priors x Latent Space Crossover Special at Microsoft Build

Microsoft CEO Satya Nadella joined a live podcast with No Priors and Latent Space at Build, discussing the company's AI strategy around the Frontier Intelligence Platform, MAI models, AI ROI, and the harness concept for enterprise AI.

  • Nadella positions Microsoft as a 'Frontier Intelligence Platform' where customers gain more value by building on multi-model harnesses and context layers.
  • MAI models focus on clean data lineage and a hill-climbing scaffold to enable small models to achieve frontier-level performance.

GitHub's plan for Agents — Kyle Daigle, GitHub

GitHub COO Kyle Daigle discusses how AI agents are reshaping software development, from infrastructure strain to the future of Copilot. AI-driven code growth of 1400% stresses GitHub's CI/CD, open source maintenance, and code review. Daigle shares his internal use of AI for retrospectives, communication, and decision-making, and outlines Copilot's evolution from completion to cloud agents.

  • AI agents have increased GitHub code commits by 1400%, straining infrastructure.
  • GitHub COO Kyle Daigle uses AI for internal retrospectives and decision-making, emphasizing micro-skills over mega-skills.

[AINews] NVIDIA Cosmos 3, Nemotron 3 Ultra, and RTX Spark

NVIDIA launched Cosmos 3 (unified multimodal world model), Nemotron 3 Ultra (efficient 550B LLM), and RTX Spark (personal AI superchip). Also covered: MiniMax M3, Qwen3.7-Plus, JetBrains Mellum2, agent ecosystems, and infrastructure updates.

  • NVIDIA's Cosmos 3 uses a Mixture-of-Transformers architecture to unify language, image, video, audio, and action. Nemotron 3 Ultra is a 550B open-weight LLM claiming US SOTA with fast inference. RTX Spark is a personal AI computer with Grace+Blackwell at 1 petaflop FP4.
  • MiniMax M3 launched as an open-weight multimodal agent model with 1M context and strong coding benchmarks. Qwen3.7-Plus from Alibaba is a hybrid agent unifying GUI/CLI. JetBrains Mellum2 is a 12B MoE for ultra-low-latency developer workflows.

Why Video Agent models are next — Ethan He, xAI Grok Imagine

Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and why Grok Imagine is so underrated.

  • Grok Imagine was built from scratch in 3 months at xAI by a small team.
  • Video models primarily get intelligence from LLMs, not just video data.

[AINews] Founders and Forward Deployed Engineers

While most digest yesterday's major Anthropic news, we highlight AIE's new Forward Deployed Engineer track and Founders program, along with AI news from May 28-29. Key topics include: Claude Opus 4.8 rollout with mixed benchmarks, multi-turn RL tokenization bugs, open model and toolchain progress, Google/OpenAI product expansions, and interesting research papers.

  • Claude Opus 4.8 brings incremental improvements but no benchmark sweep; pricing remains a pain point.
  • Multi-turn RL training tokenization bug identified, requiring 'Token-In, Token-Out' discipline.

Anthropic raises $65B Series H at $965B valuation, releases Opus 4.8 and Dynamic Workflows/ultracode

Anthropic raises $65B in Series H at $965B post-money valuation and reports $47B run-rate revenue, while releasing Claude Opus 4.8 with improved judgment and honesty, and launching Dynamic Workflows for parallel multi-agent tasks in Claude Code.

  • Anthropic raised $65B at a $965B valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia.
  • Opus 4.8 delivers sharper judgment, more honesty, and efficiency gains, beating GPT-5.5 on several benchmarks.
