O'Reilly AI & ML Radar (AI News Source)

Public articles: 53 | Collected articles: 57 | Trust: 82 | Refresh: 120 min

Health: Healthy | Source type: Research | Full-text rights: In-site rewrite | Last ingested: 2026-06-26 | ID: oreilly-ai-ml | Status: Enabled

Technical analysis source; summary-only unless authorization is obtained.

Latest public articles

Agentic Code Review

2026-06-26 15:50 UTC

As AI coding agents become extremely proficient, the bottleneck has shifted from writing code to reviewing it. Data shows a dramatic increase in code churn, defects, and review time. The key is to adapt review processes based on the context: blast radius, code longevity, and team size. Capturing agent reasoning can alleviate review burden.

AI agents produce 4x the code but only 12% more delivered value, with an 861% increase in code churn.
Review duration has increased by 441.5% and defect rates jumped from 9% to 54% in high-adoption teams.
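As a rough illustration of context-dependent review, here is a minimal Python sketch that maps the three factors the article names (blast radius, code longevity, and team size) to a review tier. The scoring, thresholds, and tier names are hypothetical assumptions, not taken from the article.

```python
from enum import Enum


class BlastRadius(Enum):
    # Hypothetical scale for how much breaks if the change is wrong.
    LOCAL = 1
    SERVICE = 2
    SYSTEM = 3


def review_tier(blast_radius: BlastRadius, expected_lifetime_days: int, team_size: int) -> str:
    """Pick a review tier from context; all thresholds are illustrative."""
    score = blast_radius.value
    # Long-lived code accumulates maintenance cost, so it earns closer review.
    if expected_lifetime_days > 90:
        score += 1
    # Larger teams depend more on a shared understanding of the code.
    if team_size > 5:
        score += 1
    if score <= 2:
        return "light"  # e.g. automated checks plus a quick skim
    if score <= 3:
        return "standard"  # one human reviewer
    return "deep"  # multiple reviewers, plus the agent's recorded reasoning
```

A throwaway script in a two-person team lands in the "light" tier, while long-lived, system-wide code in a large team gets "deep" review.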

So Long and Thanks for All the Context

2026-06-25 10:30 UTC

This article explores the 'U-shaped' context loss problem in LLMs, where models tend to ignore information in the middle of their context window. The author discusses recent research and presents five practical techniques to mitigate the issue, based on real-world experiences developing an AI-driven quality engineering skill.

LLMs exhibit a U-shaped attention pattern: they make the best use of information at the beginning and end of the context but tend to overlook the middle.
The U-shape is a structural property of Transformer architectures, not a fluke of training, so it persists across models and context sizes.
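One common mitigation for the U-shape (not necessarily one of the article's five techniques) is to reorder retrieved context so the most relevant chunks sit at the start and end of the prompt and the least relevant land in the middle. A minimal Python sketch, assuming the chunks arrive already sorted from most to least relevant:

```python
def reorder_for_edges(chunks_by_relevance: list[str]) -> list[str]:
    """Place the most relevant chunks at both ends of the context.

    Assumes the input is sorted from most to least relevant. Chunks are
    dealt alternately to the front and back lists, so the least relevant
    ones end up in the middle, where the U-shape predicts they are most
    likely to be overlooked anyway.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so relevance rises again toward the end.
    return front + back[::-1]
```

For example, `reorder_for_edges(["a", "b", "c", "d", "e"])` returns `["a", "c", "e", "d", "b"]`: the two most relevant chunks occupy the edges and the least relevant sits in the center.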

Stop Getting Good at Protocols. Get Good at Agent Experience.

2026-06-24 11:04 UTC

The article argues that the AI agent industry is falling into a 'tool trap' by obsessing over protocols like MCP and AI Skills while neglecting the true strategic discipline: Agent Experience (AX). The author contends that protocols will keep changing, and that understanding how agents interact with your systems, and optimizing that experience, is the key to long-term competitiveness. The piece outlines five steps to build an AX practice and emphasizes that AX is an extension of UX, DX, and CX.

Protocols like MCP and AI Skills are tools, not strategies; building a practice around AX is more sustainable.
Agent Experience (AX) is the discipline of studying and improving how AI agents interact with your systems.

Principal Drift

2026-06-23 10:21 UTC

This article introduces the concept of "principal drift" in enterprise agent architectures, where the human authority behind agent actions becomes increasingly detached as agents multiply and compose. It describes the cascade of identity collapse, authority erosion, and accountability dissolution, and proposes solutions including reasoning-grade audit records and a new "agent operations" function.

Principal drift is the steady decoupling between the human authority a recorded action is supposed to derive from and the actor that actually took it.
Current IAM systems are insufficient because agents violate underlying assumptions about principal stability and delegation chains.
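To make a reasoning-grade audit record concrete, here is a minimal Python sketch in which every recorded action carries its full delegation chain and the reasoning behind it, so the originating human principal can still be recovered. The field names and the `human:`/`agent:` prefix convention are hypothetical assumptions, not a schema from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    action: str
    actor: str  # the agent that actually performed the action
    # Ordered chain from the originating principal down to the acting agent.
    delegation_chain: tuple[str, ...]
    reasoning: str  # why the agent took the action, captured at the time


def originating_principal(record: AuditRecord) -> Optional[str]:
    """Return the human at the root of the chain, or None if the link has drifted."""
    if not record.delegation_chain or record.delegation_chain[-1] != record.actor:
        return None  # the chain does not end at the recorded actor
    root = record.delegation_chain[0]
    # A chain rooted in another agent has no accountable human principal.
    return root if root.startswith("human:") else None
```

A record whose chain runs `human:alice → agent:planner → agent:cleanup` resolves to `human:alice`; one rooted in a scheduler agent resolves to `None`, which is exactly the drift an audit should surface.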

Loop Engineering

2026-06-22 11:04 UTC

Loop engineering replaces direct prompting of coding agents with a system that recursively iterates toward a goal. It consists of five core components: automations, worktrees, skills, plugins/connectors, and subagents, plus external memory. Tools like Codex and Claude Code are converging on similar loop primitives, and subagents split ideation and verification to improve reliability.

Loop engineering shifts from manual prompting to designing automated loops that prompt agents.
Five key components: scheduled automations, worktrees for isolation, skills for knowledge, plugins/connectors, and subagents, plus external memory.
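The basic shape of such a loop, with ideation and verification split between two subagents and failed attempts recorded in external memory, can be sketched in Python. The `propose` and `verify` callables stand in for subagent calls and are hypothetical; this is not the Codex or Claude Code API.

```python
from typing import Callable, Optional


def run_loop(
    goal: str,
    propose: Callable[[str, list[str]], str],
    verify: Callable[[str, str], bool],
    memory: list[str],
    max_iterations: int = 5,
) -> Optional[str]:
    """Iterate toward a goal: one subagent proposes, another verifies.

    `memory` plays the role of external memory: every rejected attempt is
    appended so the proposer can see what has already failed.
    """
    for _ in range(max_iterations):
        candidate = propose(goal, memory)
        if verify(goal, candidate):
            return candidate
        memory.append(f"rejected: {candidate}")
    return None  # iteration budget spent; escalate to a human
```

Keeping the verifier separate from the proposer means a candidate is never accepted on the proposer's own say-so, which is the reliability argument for splitting ideation and verification.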

This Week in AI: Fable 5, the Clone Wave, and Uber’s AI Reality Check

2026-06-18 19:33 UTC

This week, egghead.io cofounder John Lindquist joined host YK Sugi to discuss the contested release of Claude Fable 5, financial shifts in AI spending, and a practical framework for building in the agent era. Key topics include the government directive that pulled Fable 5, Uber burning through its 2026 AI budget by April, and John's 'ingredients beat inference' approach using open-source clones. The episode also covers the SpaceX acquisition of Cursor and Salesforce's purchase of Fin.

Claude Fable 5 was pulled three days after release due to a US government directive; Anthropic and Amazon disagree on the severity of the security issue.
Uber exhausted its 2026 AI tools budget by April, leading to a spending cap of $1,500 per employee per month.

Kubernetes in the Age of AI

2026-06-18 14:21 UTC

Kubernetes has evolved from a container orchestrator to a de facto AI platform, with 82% of container users running it in production in 2025. Generative AI and agentic AI workloads are increasingly run on Kubernetes, as highlighted by the CNCF survey and industry examples. Networking remains a fundamental skill gap, which a new CNCF certification addresses.

Kubernetes adoption reaches 82% in production among container users in 2025, up from 66% in 2023.
66% of organizations run generative AI workloads on Kubernetes in 2025.

The Case Against Building Your Own Agent Platform

2026-06-17 13:53 UTC

Many enterprises underestimate the complexity and long-term cost of building their own AI agent platform. This article analyzes four underestimated components—memory, governance, eval, and orchestration—and provides five questions to ask before committing to such a project.

The build-vs-buy shift is happening fast: the share of AI solutions built internally dropped from 47% to 24% in one year.
Real agent platforms involve memory, governance, eval, and orchestration—each a separate product category with its own ecosystem.

Linear Thinking, Nonlinear Costs

2026-06-16 11:02 UTC

Coding agents simplify building AI agent workflows but obscure nonlinear cost scaling. Classical CS optimizations like memoization, pruning, and dynamic programming are essential to avoid repeated work and high costs.

AI agent costs scale nonlinearly because a single request can trigger multiple model calls.
Coding agents make system generation easy but optimization hard.
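Memoization, one of the classical optimizations the article names, removes repeated model calls when a workflow asks the same question more than once. A minimal Python sketch using the standard `functools.lru_cache`; `call_model` is a hypothetical stand-in for a billed model call, and the counter only shows how many real calls happen:

```python
from functools import lru_cache

calls = 0


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for an expensive, billed model call."""
    global calls
    calls += 1
    return f"answer to: {prompt}"


@lru_cache(maxsize=1024)
def cached_model(prompt: str) -> str:
    # Identical prompts are served from the cache instead of being re-billed.
    return call_model(prompt)


# A workflow whose steps each re-ask the same sub-question.
for step in range(3):
    cached_model("summarize the design doc")
    cached_model(f"plan step {step}")

print(calls)  # 4 real calls instead of 6
```

This is only safe when an identical prompt may legitimately reuse an earlier answer; prompts that depend on changing state need that state in the cache key.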

Who Owns the Code Claude Wrote?

2026-06-15 10:58 UTC

Agentic coding tools like Claude Code, Cursor, and Codex generate code that may be uncopyrightable, owned by your employer, or contaminated by open source licenses you cannot see. Some of this is settled law, some is actively contested, and this piece is clear about which is which.

AI-generated code ownership is uncertain, depending on human creative input, employment contracts, and training data licenses.
U.S. Copyright Office and courts require human authorship for copyright; AI-assisted code has ambiguous protection.

This Week in AI: The Next-Gen Recommendation Experience

2026-06-12 14:18 UTC

This week Miguel Fierro, a former Microsoft principal researcher who recently founded his own company, RecoMind, joined data and AI evangelist Christina Stathopoulos to talk about the state of recommendation systems. Christina also ran through the latest AI news she’s been watching, from Anthropic’s continued rise to responsible AI, announcements from Google’s I/O 2026 conference, and (continuing the discussion from last week) the growing backlash against tokenmaxxing as a productivity metric. Here are three takeaways from the conversation.

Recommendation systems are underutilized; top companies like Amazon, Netflix, and TikTok generate significant revenue from them.
Advanced recommenders treat user behavior as a sequence prediction problem using trillion-parameter models; open-source tools like the Recommenders library offer an entry point.

When Context Collapses: Teaching Agents to Detect and Recover from Lost Memory

2026-06-11 10:59 UTC

This is the eighth article in a series on agentic engineering and AI-driven development. It addresses context loss in AI agents performing complex multistep tasks. The author introduces the Externalize-Recognize-Rehydrate (ERR) pattern: saving agent state to disk, detecting context degradation, and recovering from files. A historical analogy (the 640K memory limit) and a real Copilot crash illustrate the problem. The article details how to externalize two layers of state: execution continuity (the current step) and task continuity (the overall goals).

AI agents have limited context windows that cause information loss, much like early PC memory constraints.
The ERR pattern: externalize state, recognize loss, rehydrate from files.
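A minimal Python sketch of the three ERR steps, assuming state lives in a JSON file. The file layout, the two state fields (mirroring the execution-continuity and task-continuity layers), and the loss check are hypothetical choices, not the author's implementation.

```python
import json
import tempfile
from pathlib import Path


def externalize(goal: str, current_step: str, path: Path) -> None:
    """Externalize: write both layers of state to disk after each step."""
    state = {
        "task": {"goal": goal},  # task continuity: the overall goal
        "execution": {"current_step": current_step},  # execution continuity
    }
    path.write_text(json.dumps(state, indent=2))


def recognize_loss(working_memory: dict) -> bool:
    """Recognize: treat a missing goal or step in working memory as lost context."""
    return not working_memory.get("goal") or not working_memory.get("current_step")


def rehydrate(path: Path) -> dict:
    """Rehydrate: rebuild working memory from the file on disk."""
    state = json.loads(path.read_text())
    return {
        "goal": state["task"]["goal"],
        "current_step": state["execution"]["current_step"],
    }


# Usage: persist state, simulate a context reset, then recover.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "agent_state.json"
    externalize("migrate the billing schema", "step 3 of 7", path)
    working_memory: dict = {}  # the agent's context was lost
    if recognize_loss(working_memory):
        working_memory = rehydrate(path)

print(working_memory)
```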

The PM’s Playbook for Shipping AI Features That Actually Work in Production

2026-06-10 10:55 UTC

This article addresses common pitfalls from AI demo to production, offering a practical playbook covering latency budgets, fallback design, quality measurement, A/B testing, model drift monitoring, evaluation frameworks, graceful degradation, and prompt engineering.

Define latency budgets per interaction type: synchronous, progressive, and asynchronous.
Design hierarchical fallbacks to ensure users never encounter unhandled AI failures.
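Latency budgets and hierarchical fallbacks combine naturally: try each tier in order, move on when a tier fails or the budget runs out, and end in a non-AI default so the user never sees an unhandled failure. A minimal Python sketch using `asyncio.wait_for`; the budget values, tier functions, and default message are hypothetical:

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical latency budgets per interaction type, in seconds.
LATENCY_BUDGETS = {"synchronous": 1.0, "progressive": 5.0, "asynchronous": 60.0}

Tier = Callable[[str], Awaitable[str]]


async def answer_with_fallbacks(
    query: str, tiers: list[Tier], interaction: str, default: str
) -> str:
    """Try each tier in order within one shared latency budget."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + LATENCY_BUDGETS[interaction]
    for tier in tiers:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # budget spent: stop trying AI tiers
        try:
            return await asyncio.wait_for(tier(query), timeout=remaining)
        except Exception:  # model errors and timeouts alike
            continue  # degrade to the next tier
    return default  # static, non-AI fallback so no failure reaches the user


# Hypothetical tiers: a primary model that is down and a smaller backup.
async def primary_model(query: str) -> str:
    raise RuntimeError("primary model unavailable")


async def backup_model(query: str) -> str:
    return f"short answer to {query!r}"


result = asyncio.run(
    answer_with_fallbacks(
        "reset my password", [primary_model, backup_model], "synchronous", "Please try again later."
    )
)
print(result)
```

Sharing one deadline across tiers keeps the whole interaction inside its budget; giving each tier its own full budget would let a chain of slow fallbacks exceed it.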

The Subsidy Ended: What Tool-Using Agents Actually Cost

2026-06-09 11:09 UTC

GitHub Copilot's shift to usage-based billing on June 1 exposed the true cost of agentic workflows. This article analyzes token consumption, tool design impact, and strategies for prompt optimization and output formatting, emphasizing that cost control should be a platform governance issue.

GitHub Copilot's usage-based billing from June 1 reveals the actual cost of agentic workflows.
Agents consume tokens in loops; loop count scales with task vagueness and context complexity.
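A little arithmetic shows why loop count matters so much. If, as a simplifying assumption, every iteration resends the whole accumulated context and adds a roughly fixed number of new tokens, total input tokens grow quadratically with the number of loops. A minimal Python sketch with hypothetical token counts:

```python
def total_input_tokens(loops: int, base_context: int, tokens_per_loop: int) -> int:
    """Sum the input tokens billed across an agent loop.

    Assumes iteration i resends the base context plus everything the
    previous i iterations added (tool calls, tool output, reasoning).
    """
    return sum(base_context + i * tokens_per_loop for i in range(loops))


# Doubling the loop count from 10 to 20 nearly quadruples the input tokens.
print(total_input_tokens(10, 2_000, 1_500))  # 87500
print(total_input_tokens(20, 2_000, 1_500))  # 325000
```

Under this assumption, a vague task that needs twice as many loops costs roughly 3.7 times as much input, not twice as much.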

The AI Agents Stack (2026 Edition)

2026-06-08 10:56 UTC

This article updates the 2024 AI agents stack diagram with a six-layer architecture for 2026: Models & Inference, Protocols & Tools, Memory & Knowledge, Frameworks & SDKs, Eval & Observability, and a sixth, emerging layer. Key changes include MCP standardization, reasoning models, and memory as a first-class primitive. It offers candid assessments and guidance on evaluating each layer.

The AI agents stack has evolved significantly from 2024 to 2026, with MCP becoming the standard protocol and reasoning models transforming agent capabilities.
The six layers are Models & Inference, Protocols & Tools, Memory & Knowledge, Frameworks & SDKs, Eval & Observability, and an emerging layer.

This Week in AI: Production Viability

2026-06-05 15:55 UTC

On this week's episode, host Andreas Welsch and guests Maya Mikhailov and Doug Shannon discuss OpenAI's move into personal finance, metacognition as a professional skill, the backlash against token-based productivity metrics, and the limitations of forward-deployed engineers. The core theme: the AI industry is good at generating output but still figuring out what output is valuable.

OpenAI's analysis of transaction data aims to infer consumer intent for advertising, not just to track spending.
Metacognition is a critical skill: humans must decide when to offload to AI and when to retain judgment to avoid 'cognitive surrender.'

The Tidy House

2026-06-04 16:25 UTC

DJ Patil's listening tour reveals a sense that AI's promise has been broken, with students and workers feeling terrified. He proposes community makerspaces and identifies organizational capacity as the bottleneck. Data infrastructure is a competitive advantage, enabling companies like Devoted Health to adopt AI quickly.

AI labs' destructive narrative is causing fear and a sense of betrayal among students and workers.
DJ Patil suggests mechanism design, such as subsidizing token costs, to make AI benefit communities.

Predict, Don’t Enumerate

2026-06-04 10:57 UTC

Anthropic's recommendation to use EPSS for vulnerability prioritization marks a shift from static severity scores to predictive models. The article explores the machine-scale problem of vulnerability volume, the distinction between pointing machines and knowing machines, and the policy changes needed for security programs to survive the coming wave of AI-discovered vulnerabilities.

Anthropic endorses EPSS, a statistical prediction model, over LLMs for vulnerability prioritization.
The volume of vulnerabilities has reached machine scale, rendering static severity scores ineffective.
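The practical difference between static severity and prediction shows up in how a backlog gets sorted. A minimal Python sketch that ranks findings by EPSS probability (the estimated likelihood of exploitation in the next 30 days) instead of by CVSS base score; the findings, IDs, and 0.1 cutoff are hypothetical placeholders, not real data or a recommended policy.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    vuln_id: str  # hypothetical placeholder IDs, not real CVEs
    cvss: float  # static severity, 0.0-10.0
    epss: float  # predicted probability of exploitation, 0.0-1.0


def prioritize(findings: list[Finding], epss_cutoff: float = 0.1) -> list[Finding]:
    """Return findings at or above the EPSS cutoff, most likely exploited first."""
    likely = [f for f in findings if f.epss >= epss_cutoff]
    # CVSS only breaks ties between equally likely exploits.
    return sorted(likely, key=lambda f: (f.epss, f.cvss), reverse=True)


backlog = [
    Finding("vuln-a", cvss=9.8, epss=0.02),
    Finding("vuln-b", cvss=6.5, epss=0.71),
    Finding("vuln-c", cvss=7.5, epss=0.15),
]
print([f.vuln_id for f in prioritize(backlog)])  # ['vuln-b', 'vuln-c']
```

The 9.8-severity finding drops below the line because it is unlikely to be exploited, which illustrates the shift from enumerating severity to predicting exploitation.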

Context as Code

2026-06-03 11:00 UTC

As syntax becomes cheap and abundant, architectural control becomes the scarce resource. Effective governance starts upstream, where intent, constraints, and threat models shape the agent’s working context before generation begins. The goal isn’t better prompting but build-time boundaries that prevent structurally invalid code from entering the system.

AI code generation creates comprehension debt as systems outpace human understanding.
Unconstrained agents act as 'yes-men,' failing to enforce architectural boundaries.
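A build-time boundary can be as simple as a check that rejects generated code violating a declared layering rule. A minimal Python sketch using the standard `ast` module; the rule table and the module names `domain` and `infrastructure` are hypothetical, and the article does not prescribe this particular mechanism.

```python
import ast

# Hypothetical architectural rule: domain code must not import infrastructure.
FORBIDDEN_IMPORTS = {"domain": {"infrastructure"}}


def boundary_violations(module_layer: str, source: str) -> list[str]:
    """Return the imports in `source` that the layer's rules forbid."""
    banned = FORBIDDEN_IMPORTS.get(module_layer, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare the top-level package, so "infrastructure.db" is caught too.
            if name.split(".")[0] in banned:
                violations.append(name)
    return violations


generated = "import json\nfrom infrastructure.db import session\n"
print(boundary_violations("domain", generated))  # ['infrastructure.db']
```

Run in CI, a non-empty result fails the build, so structurally invalid code is stopped regardless of how agreeable the agent that produced it was.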

AI Sovereignty and the Architecture of Participation

2026-06-01 16:05 UTC

The article examines the growing trend of nations seeking technological sovereignty, using Brazil's pursuit of medical sovereignty as an analogy for AI. It argues that decoupling is too narrow a frame; instead, countries want to stay connected while building their own capacities, similar to federation rather than separation. Open-source AI models and protocols are key tools, but infrastructure (data centers, chips, power grids) is the critical layer that is hard to replicate. The piece envisions a federated AI future and the need to rebuild infrastructure for the AI era.

Brazil's push for medical sovereignty reflects a broader desire for technological self-sufficiency.
The quest for sovereign AI is similar: nations want control over foundational technologies without relying on a few US or Chinese companies.

SaaS Is Not Dead Yet

2026-06-01 11:01 UTC

Although the rise of AI agents has led many to declare the end of SaaS, this article argues that SaaS is not dead. Work is collaborative, while agent-based programming is individualistic and lacks sharing, collaboration, testing, versioning, and security. SaaS companies can adapt by providing APIs for agents and becoming the system of record for data.

Agent-based programming is for individuals, not teams, leading to data silos.
SaaS can pivot to provide APIs and data infrastructure for agents.

Open Source Ecosystems

2026-05-29 11:00 UTC

The article discusses the limitations of open-weight AI models and open protocols as open source strategies, using Anthropic's acquisition of Stainless as a case study to illustrate complement capture and moat migration in AI infrastructure. It argues that the developer experience layer is being consolidated by platform giants, creating new competitive advantages, and emphasizes the need to analyze dependencies within the ecosystem to identify potential chokepoints.

Open-weight models as open source strategy face limitations due to hardware requirements and monolithic architectures.
Anthropic's acquisition of Stainless exemplifies complement capture, where the layer around an open protocol is privatized.

Your AI Agent Already Forgot Half of What You Told It

2026-05-28 10:59 UTC

This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.

AI assistants can 'forget' earlier information in long conversations because context window limits force the session to be compressed, a process called context compaction.
Four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges.

Get a Good Return on Your AI Investments

2026-05-27 16:52 UTC

O'Reilly's Infrastructure & Ops superstream explored the infrastructure needs, costs, and security challenges of AI workloads. DORA's report shows AI increases code delivery by about 10% but reduces stability, adding verification costs. Experts emphasize platform engineering, governance, and cognitive debt, recommending investment in internal platforms to ensure production readiness for AI applications.

AI tools boost individual productivity but reduce team delivery stability, and the added verification cost (the 'verification tax') must be factored in.
AI amplifies good processes and bad ones alike; organizations should proactively improve their processes rather than expect technology to fix them.

Agent Skills: Making AI Coding Agents Follow Good Engineering Practices

2026-05-27 10:59 UTC

AI coding agents default to the shortest path to 'done,' skipping the specs, tests, and reviews that senior engineers know are essential. Addy Osmani's Agent Skills project builds senior-engineer scaffolding for agents, using workflows instead of prose. It includes 20 skills across six SDLC phases, incorporating Google engineering practices. Key principles: process over prose, anti-rationalization tables, nonnegotiable verification, progressive disclosure, and scope discipline. The article also covers three usage modes and patterns worth stealing even without installing the project.

AI coding agents take the shortest path to complete tasks, ignoring specifications, tests, and reviews—the same failure mode senior engineers learn to avoid.
Agent Skills uses workflow Markdown files to guide agents, each with steps, checkpoints, and exit criteria.

Who Authorized That? The Delegation Problem in Multi-Agent AI

2026-05-26 10:58 UTC

AI agents delegate tasks across systems, but current architectures lack authorization models for these delegation chains, creating security gaps like ghost permissions and broken audit trails.

Multi-agent delegation often creates 'ghost permissions' that no one explicitly authorized.
Current protocols (MCP, A2A) solve connectivity but not authorization in delegation chains.
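One way to reason about ghost permissions is scope attenuation: each hop in a delegation chain may only narrow the permissions it received, never add to them. A minimal Python sketch that flags any permission appearing at a hop without having been held by the hop before it; the chain format and permission strings are hypothetical, and this check is not part of MCP or A2A.

```python
def find_ghost_permissions(chain: list[tuple[str, set[str]]]) -> dict[str, set[str]]:
    """Map each delegate to the permissions its delegator never had.

    `chain` lists (principal, permissions) from the authorizing human to the
    final agent. With proper attenuation, each set is a subset of the one before.
    """
    ghosts: dict[str, set[str]] = {}
    for (_, parent_perms), (child, child_perms) in zip(chain, chain[1:]):
        extra = child_perms - parent_perms
        if extra:
            ghosts[child] = extra
    return ghosts


chain = [
    ("human:alice", {"repo:read", "repo:write"}),
    ("agent:planner", {"repo:read", "repo:write"}),
    ("agent:deployer", {"repo:read", "prod:deploy"}),
]
print(find_ghost_permissions(chain))  # {'agent:deployer': {'prod:deploy'}}
```

Here `prod:deploy` was never granted by Alice or the planner, so it is exactly the kind of permission no one explicitly authorized.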

The Agentic P&L: Beyond the Empire of Headcount

2026-05-21 15:04 UTC

For over a century, corporate prestige and budgets have been measured by headcount. In the AI era, this model is obsolete. This article introduces the concept of the agentic P&L, shifting from headcount empires to federated nervous systems with new metrics like contextual density, agentic throughput, and decision provenance. Using a compliance department in a Tier-1 bank as an example, it outlines the transformation path and the new organizational unit: the 3+N squad.

Headcount-based valuation is outdated in the age of federated agentic systems.
Key metrics include contextual density of knowledge enclaves, cost per handshake in agent-to-agent interactions, and decision provenance.

The Agent Stack Bet

2026-05-20 10:58 UTC

Current production agents lack identity, context persistence, and platform support, creating governance and reliability gaps. This article proposes four architectural bets: agents need identities, universal context, durable execution, and platforms.

Agents need distinct identities, not shared credentials, for fine-grained permissions and auditability.
Agents need universal context that integrates across systems, avoiding silos.

When an Agent Deletes the Production Database

2026-05-19 16:00 UTC

PocketOS founder Jeremy Crane used Claude for routine database maintenance, but the AI agent accidentally deleted the production database and all backups. Railway managed to recover the data. The incident highlights systemic weaknesses: overly broad tokens, non-expiring credentials, and a lack of sandboxing. AI amplifies existing issues rather than being the root cause. Lessons include the principle of least privilege, credential expiry, human-in-the-loop review, and world models.

Claude used a long-lived API token with excessive permissions to delete the production database and backups; data was recovered by Railway.
Root causes include overly broad token scope and credentials stored on disk without expiry.
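Least privilege, credential expiry, and human-in-the-loop review can all be enforced by the credential check itself rather than left to the agent's judgment. A minimal Python sketch of a short-lived, narrowly scoped token; the `ScopedToken` type, the scope names, and the 15-minute lifetime are hypothetical and are not Railway's or Anthropic's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical scopes that always need a human to approve the specific call.
DESTRUCTIVE_SCOPES = {"db:drop", "backup:delete"}


@dataclass(frozen=True)
class ScopedToken:
    scopes: frozenset[str]
    expires_at: datetime


def issue_token(scopes: set[str], ttl: timedelta = timedelta(minutes=15)) -> ScopedToken:
    """Issue a token that carries only the listed scopes and expires on its own."""
    return ScopedToken(frozenset(scopes), datetime.now(timezone.utc) + ttl)


def authorize(token: ScopedToken, scope: str, human_approved: bool = False) -> bool:
    if datetime.now(timezone.utc) >= token.expires_at:
        return False  # an expired token is useless even if it was left on disk
    if scope not in token.scopes:
        return False  # least privilege: no implicit extra scopes
    if scope in DESTRUCTIVE_SCOPES and not human_approved:
        return False  # human in the loop for irreversible operations
    return True


maintenance = issue_token({"db:read", "db:vacuum"})
print(authorize(maintenance, "db:vacuum"))  # True
print(authorize(maintenance, "db:drop"))  # False: never granted
```

With a token like this, a maintenance agent that decides to drop a database is refused by the credential check, whatever its reasoning.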

AI Artifact Catalogs: Durable Standards Worth Institutional Investment

2026-05-19 11:05 UTC

Companies are racing to leverage AI for productivity, but most pilot projects fail. Investing in open standards like Agent Skills, MCP, and plugins protects against vendor lock-in and lowers switching costs. AI artifact catalogs help organizations turn individual wins into shareable institutional knowledge, enabling cross-team and agent reuse.

Open standards (MCP, Agent Skills, Plugins) are more durable than proprietary solutions, protecting investment and reducing switching costs.
AI artifact catalogs are key to organizing and sharing internal knowledge and tools, scaling productivity from individuals to the organization.
