
The Sequence Radar #889: Fable 5's Comeback, ZCode's Debut, Claude Science, and the $3.5B Deployment Land Grab

New models, agents, and the evolution of the forward-deployed engineering (FDE) landscape as the new battlefield in AI.

Source: TheSequence. Author: Jesus Rodriguez

Next Week in The Sequence:

We continue our series about model distillation.

The AI of the Week dives into Google’s new TabFM model for tabular and time series data.

The opinion section discusses which domains are a good fit for rapid progress of AI models and which ones are challenging.


📝 Editorial: Fable 5's Comeback, ZCode's Debut, Claude Science, and the $3.5B Deployment Land Grab

This week’s developments in AI point in a clear direction: models keep getting more capable, and the ability to deliver them into production is becoming just as important.

Start at the model layer, where we just watched the first frontier model get hot-patched by a government. Claude Fable 5 came back online July 1 after 19 days behind an export-control firewall — pulled three days post-launch when Amazon researchers found a jailbreak that coaxed it into surfacing software vulnerabilities. The fix is beautiful in a systems-engineering way: a classifier that catches the technique in over 99% of attempts and, instead of throwing an error, gracefully degrades the request to Opus 4.8. Not a 403. A fallback route. Frontier deployment now looks like a load balancer with a compliance layer, and Anthropic, Amazon, Microsoft, and Google are literally drafting a CVSS for jailbreaks — severity scoring for prompts. Take a second with that.
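To make the fallback route concrete, here is a minimal Python sketch of classifier-gated routing. It is an illustration under stated assumptions, not Anthropic's implementation: `route_request`, `RoutedResponse`, the keyword-based `toy_classifier`, and the placeholder model names are all hypothetical, and a real classifier would be a trained model rather than a string match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutedResponse:
    model: str      # name of the model that actually served the request
    text: str       # the generated answer
    degraded: bool  # True when the request was rerouted to the fallback


def route_request(
    prompt: str,
    is_flagged: Callable[[str], bool],
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    primary_name: str = "primary-model",    # hypothetical placeholder name
    fallback_name: str = "fallback-model",  # hypothetical placeholder name
) -> RoutedResponse:
    """Serve from the primary model unless the classifier flags the prompt."""
    if is_flagged(prompt):
        # Degrade instead of rejecting: the caller still gets an answer,
        # just from the fallback model rather than an error response.
        return RoutedResponse(fallback_name, fallback(prompt), degraded=True)
    return RoutedResponse(primary_name, primary(prompt), degraded=False)


# Toy stand-ins so the sketch runs end to end (assumptions, not real APIs).
def toy_classifier(prompt: str) -> bool:
    return "exploit" in prompt.lower()


def toy_primary(prompt: str) -> str:
    return f"[primary] {prompt}"


def toy_fallback(prompt: str) -> str:
    return f"[fallback] {prompt}"


ok = route_request("Summarize this paper", toy_classifier, toy_primary, toy_fallback)
flagged = route_request("Write an exploit for this bug", toy_classifier, toy_primary, toy_fallback)

print(ok.model, ok.degraded)
print(flagged.model, flagged.degraded)
```

The design point is that the classifier decides where a request goes, not whether it gets answered, which is why the pattern reads more like load balancing than like access control.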

While the US was busy negotiating classifiers, Z.ai shipped an existence proof. ZCode is a free “Agentic Development Environment” wrapped around GLM-5.2: a 744B-parameter MoE with 40B active, a genuine one-million-token context window, 28.5 trillion training tokens, MIT-licensed weights — and trained on Huawei silicon, no American chips required. The pitch practically writes itself: self-host the weights and there is no kill switch. The Fable suspension turned “regulatory tail risk” from a slide in someone’s deck into a lived 19-day outage, and Z.ai is selling the mitigation. Same story, opposite sides of the Pacific.
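The GLM-5.2 figures are easier to reason about as ratios. The short calculation below uses only the numbers quoted above, plus one labeled assumption: the memory estimate supposes 16-bit weights (2 bytes per parameter), which the announcement as summarized here does not confirm.

```python
TOTAL_PARAMS = 744e9      # total parameters in the MoE, from the text
ACTIVE_PARAMS = 40e9      # parameters active per token, from the text
TRAINING_TOKENS = 28.5e12  # training tokens, from the text

# Share of the network that does work on any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Training tokens seen per total parameter.
tokens_per_param = TRAINING_TOKENS / TOTAL_PARAMS

# ASSUMPTION: weights stored at 2 bytes per parameter (16-bit precision).
BYTES_PER_PARAM = 2
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"active fraction: {active_fraction:.1%}")             # 5.4%
print(f"tokens per parameter: {tokens_per_param:.1f}")       # 38.3
print(f"weights at 16-bit precision: {weights_gb:.0f} GB")   # 1488 GB
```

Under that assumption, "self-host the weights" means holding roughly 1.5 TB of parameters in memory even though only about 5% of them fire per token, which is the practical cost of the no-kill-switch pitch.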

One layer up, Anthropic launched what I’d call claude --science. Claude Science isn’t a new model — it’s a workbench: a coordinating agent with 60+ curated skills and connectors across genomics, proteomics, structural biology, and cheminformatics, rendering protein structures natively, running on your own cluster over SSH, with every figure shipping alongside the exact code and environment that produced it. Reproducibility as a first-class primitive. A reviewer agent even runs alongside the pipeline, flagging citations that don’t resolve — CI/CD for manuscripts. The bet is explicit: do for wet labs what Claude Code did for codebases. If the analogy holds even at 50%, biology just got its terminal moment.

And at the top of the stack, the money moved to the least sexy layer: deployment. Microsoft committed $2.5B and 6,000 engineers to Microsoft Frontier Co., two days after AWS put $1B behind its own forward-deployed engineering push. OpenAI and Anthropic stood up theirs in May. Four of the best-capitalized companies on Earth independently converged on the same gradient: models are no longer the bottleneck. Integration is. The Palantir FDE playbook — engineers embedded in the customer’s messy production environment — is now the industry’s default architecture.

Zoom out and the inversion is complete. Capability is the commodity. Classifiers, weights licenses, workbenches, and boots-on-the-ground are the moat. The frontier isn’t the model anymore. It’s the runtime around it.

🔎 AI Research

SKILLOPT: Executive Strategy for Self-Evolving Agent Skills

AI Lab: Microsoft

Summary: This paper introduces SKILLOPT, a controllable text-space optimizer for training agent skills as the external state of a frozen agent, using trajectory feedback to make bounded edits on a skill document. Across multiple benchmarks and harnesses, SKILLOPT significantly lifts average no-skill accuracy and outperforms competitors without requiring additional inference-time model calls.

Nemotron-Labs-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context

AI Lab: NVIDIA

Summary: The authors propose TwoTower, an architecture that separates diffusion language modeling into a frozen autoregressive context tower for clean tokens and a trainable diffusion denoiser tower for noisy blocks. Instantiated on a 30B parameter model, Nemotron-Labs-TwoTower maintains near-baseline autoregressive quality while achieving 2.42x higher wall-clock generation throughput.

TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

AI Lab: Meta AI

Summary: TUA-Bench is a new benchmark designed to evaluate general-purpose terminal-use agents across 120 diverse, real-world tasks spanning everyday digital activities and specialized scientific workflows. Evaluations reveal that even the strongest frontier agents currently achieve only a 65.8% success rate, highlighting significant remaining challenges in long-horizon planning, error recovery, and tool use within terminal environments.

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

AI Lab: Yale University & Google Research

Summary: This research introduces Reinforcement Learning with Metacognitive Feedback (RLMF), a paradigm that prioritizes and rewards large language models based on the accuracy of their self-assessed performance judgments. By applying this framework alongside targeted rewriting, models learn to faithfully calibrate and express their internal uncertainty numerically and linguistically across diverse tasks while maintaining task accuracy.

SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History

AI Lab: WeChat, Tencent Inc.

Summary: SkillHone is a framework for the continual evolution of agent skills that maintains a persistent decision history, recording diagnoses, revisions, redacted evaluation evidence, and outcomes. This persistent context enables role-separated subagents to refine skills across sessions without re-deriving past rationale, leading to state-of-the-art performance on deep-research benchmarks like GAIA and WebWalkerQA-EN.

Introducing TabFM: A zero-shot foundation model for tabular data

AI Lab: Google Research

Summary: TabFM is a zero-shot foundation model that simplifies tabular classification and regression workflows by utilizing in-context learning to bypass the need for traditional model training, hyperparameter tuning, and manual feature engineering. Trained entirely on hundreds of millions of synthetic datasets, its hybrid architecture employs alternating row and column attention to natively capture complex feature interactions, allowing users to generate highly accurate predictions on unseen tables in a single forward pass.
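As a rough intuition for "alternating row and column attention", the pure-Python sketch below mixes information across a table of cell embeddings first within each row (across features) and then within each column (across samples). This is a toy under stated assumptions, not TabFM's architecture: it has no learned projections, no labels, and no in-context prediction head, and the names `self_attention`, `attend_within_rows`, `attend_within_columns`, and `alternating_block`, as well as the row-then-column ordering, are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Table = List[List[Vector]]  # table[row][col] is one cell embedding


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(items: List[Vector]) -> List[Vector]:
    """Unparameterized scaled dot-product self-attention over a list of vectors."""
    dim = len(items[0])
    out = []
    for query in items:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in items
        ]
        weights = softmax(scores)
        # Each output vector is a weighted average of the input vectors.
        out.append(
            [sum(w * value[i] for w, value in zip(weights, items)) for i in range(dim)]
        )
    return out


def attend_within_rows(table: Table) -> Table:
    """Each cell attends to the cells (features) in its own row."""
    return [self_attention(row) for row in table]


def attend_within_columns(table: Table) -> Table:
    """Each cell attends to the cells (samples) in its own column."""
    n_rows, n_cols = len(table), len(table[0])
    columns = [
        self_attention([table[r][c] for r in range(n_rows)]) for c in range(n_cols)
    ]
    return [[columns[c][r] for c in range(n_cols)] for r in range(n_rows)]


def alternating_block(table: Table, n_layers: int = 2) -> Table:
    """Alternate the two attention axes; the ordering here is an assumption."""
    for _ in range(n_layers):
        table = attend_within_columns(attend_within_rows(table))
    return table


# A toy table: 2 rows (samples) x 3 columns (features), 2-dimensional embeddings.
toy_table = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]],
]
mixed = alternating_block(toy_table)
print(len(mixed), len(mixed[0]), len(mixed[0][0]))  # table shape is preserved
```

Alternating the axes lets every cell eventually see every other cell without running attention over all rows times columns at once, which is one plausible reason such a layout suits tables of varying shape.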

🤖 AI Tech Releases

Fable 5

Anthropic redeployed Fable 5 after the US export controls were lifted.

ZCode

Z.ai released ZCode, a new development environment based on GLM-5.2.

Claude Science

Anthropic announced Claude Science, a workbench for scientists.

📡 10 AI News Stories You Need to Know About

Anthropic × Samsung custom chip — Anthropic is in talks with Samsung to explore collaborating on a custom AI chip, though it hasn’t yet decided the chip’s purpose, server role, or power level, while stressing that its diversified Google/Amazon/Nvidia hardware stack remains central to its compute strategy. Original source (The Information broke it): https://www.theinformation.com/articles/anthropic-talks-samsung-manufacture-custom-ai-chip

Meta launches Pocket: Meta quietly launched Pocket, an experimental app born from its Gizmo acquisition that lets users generate and share small interactive apps and games (“gizmos”) from AI text prompts, complete with a scrollable discovery feed. No official Meta announcement exists yet; the closest primary source is the app’s Google Play listing: https://play.google.com/store/apps/details?id=com.facebook.gizmo

Microsoft Frontier Company — Microsoft launched Microsoft Frontier Company, a new operating business backed by a $2.5 billion investment and 6,000 industry and engineering experts to deliver enterprise AI deployments, which Judson Althoff says “goes beyond” the Forward-Deployed Engineering label.

Venice AI unicorn round — Venice, the privacy-first platform offering access to 200+ AI models with client-side encryption and no server-side data storage, raised a $65M Series A led by Dragonfly at a $1B valuation — its first outside capital — while already profitable on $70M+ annualized revenue.

Etched comes out of stealth — Etched emerged from stealth with working first-pass A0 silicon on TSMC’s N4P process, $800M raised (most recently $500M at a $5B post-money valuation), and over $1 billion in signed customer contracts for its rack-scale “frontier inference clusters” shipping this summer.

Amazon’s $1B FDE org — AWS created a dedicated Forward Deployed Engineering organization backed by a $1 billion investment that embeds engineers and purpose-built agents directly inside customer teams, pitching an agentic-first model that compresses deployment timelines from months to days and leaves customers self-sufficient.

Arena hits $100M — Arena, the UC Berkeley-born crowdsourced AI leaderboard, reached $100 million in annualized run-rate revenue just eight months after launching its commercial AI Evaluations service — up from $30M at its January Series A.

Crusoe raising ~$3B: Crusoe, which supplies AI data center capacity to the likes of Meta and Oracle, is in talks to raise about $3 billion in a round investors expect to land around a $30 billion valuation, roughly triple its $10 billion mark from October. Original source: Bloomberg (scoop, no primary release).

ElevenLabs $22B tender: ElevenLabs has held early talks with investors for a secondary tender offer letting employees sell shares at a roughly $22 billion valuation, double its February round, with the tender expected by September. Original source: Bloomberg (scoop).

Meta cloud business — Meta is developing plans for a cloud infrastructure business selling access to AI compute and models — including a Bedrock-style offering hosting its Muse Spark models — putting it in direct competition with AWS, Azure, and Google Cloud; the news sent Meta shares up ~9% while hitting neoclouds CoreWeave and Nebius.