The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

The article explores the shift from tightly coupled local developer workflows to asynchronous background agents in AI coding, highlighting the December 2025 model inflection that made spec-to-PR workflows practical, and delving into the architecture, security, testing, memory, and multi-agent orchestration behind Devin and OpenInspect.

Key points

  • Background agents are becoming mainstream; Devin's share of commits grew from 16% to 80% on Cognition repos.
  • The December 2025 model upgrades (Opus 4.5/GPT 5.2) enabled agents to autonomously go from specification to a complete pull request.
  • Devin separates brain from machine using full VMs for security and real application testing.
  • Memory management, multi-agent orchestration, and preventing 'vibe coding' codebase degradation remain key challenges.

The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend, get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!

One of the central tensions in the agents industry is that even while there are major decacorn agent labs like Sierra, Decagon, Notion and Cursor being built up, it is also true that it has never been easier to DIY agents, with a plethora of agent frameworks like LangGraph and Pydantic and Flue, and managed agents from Anthropic and Gemini and Amazon. There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition’s friends Ramp have built their own coding agent with other friend Modal.

You’d think Cognition might feel a bit threatened, but they’re not - even after all this, they were way oversubscribed for the $1B Series D they just announced:

Cognition (@cognition), 2026-05-27: "… @Lux_Capital, @generalcatalyst, and @8vc. Our enterprise usage has grown >10x since the start of this year, and our run-rate revenue grew to $492 M. We launched Devin two years ago as the first AI software engineer. Since …"

Walden Yan, coiner of context engineering and Chief Product Officer/Cofounder of Cognition, invited OpenInspect’s Cole Murray to talk about why the Devin is in the Details.

Full conversation live on the pod today:

In retrospect, async agents were the most AGI-pilled bet you could make in 2024: the models weren’t good enough yet to vibecode, people didn’t trust AI enough to let it rip, and nobody (including early Cognition) was sure about the form factors.

Now it is obvious:

The first wave of AI coding tools made the developer faster but kept them heavily in the loop. Copilot and Cursor’s tab autocomplete are prime examples. The workflow was still centered on, and bottlenecked by, the developer’s local setup: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.

The second wave was local agents such as Claude Code, Windsurf, and Cursor’s agents pane: first one terminal, then increasingly many, all running concurrently.

The current Age of Async Agents points to a different future focused more on agent orchestration which drives end-to-end development.

According to previous guest Steve Yegge, there are eight finer-grained levels of agent adoption, but we have collapsed them into three.

As Cursor’s Michael Truell put it in The third era of AI software development:

Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

The agent should not sit solely inside the developer’s flow. It should be set up to work in the background, so that you can give it a task, a repo, a machine, a shell, a browser, tests, memory, and review loops, and let it go do the work somewhere else.
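As a minimal sketch of that handoff in Python: every name here (`BackgroundTask`, `run_agent`, the field names, the example repos) is hypothetical and chosen for illustration, not taken from Devin, OpenInspect, or any other product. It shows a task bundle being dispatched and the caller only coming back for the results.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BackgroundTask:
    """Hypothetical bundle of what a background agent is handed up front."""
    spec: str                     # the task description
    repo: str                     # repository to work in
    machine: str = "vm"           # where the work runs, away from the developer
    tools: list[str] = field(default_factory=lambda: ["shell", "browser", "tests"])
    memory: dict[str, str] = field(default_factory=dict)


async def run_agent(task: BackgroundTask) -> str:
    """Stand-in for the agent loop; a real one would edit code and run tests."""
    await asyncio.sleep(0)  # yield control, simulating work done elsewhere
    return f"PR ready for review: {task.spec} ({task.repo})"


async def main() -> list[str]:
    tasks = [
        BackgroundTask(spec="fix flaky login test", repo="example/web"),
        BackgroundTask(spec="bump dependency versions", repo="example/api"),
    ]
    # Dispatch everything at once; the developer is free until review time.
    jobs = [asyncio.create_task(run_agent(t)) for t in tasks]
    return await asyncio.gather(*jobs)


if __name__ == "__main__":
    for result in asyncio.run(main()):
        print(result)
```

The design point is that the developer's involvement is limited to writing the spec and reviewing the result; everything between those two steps happens outside their session.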

In less than a year, the sentiment has shifted from avoiding multi-agent systems to suggesting approaches that actually work.

From coining “context engineering” to building the infrastructure behind Devin’s 7x PR growth and jump from 16% to 80% of commits across Cognition repos, Walden Yan has had a front-row seat to the background-agent shift. In this episode, Cognition co-founder and CPO Walden Yan joins swyx alongside Cole Murray, creator of OpenInspect, to unpack why everyone is building their own Devin, what changed after the December 2025 model inflection, and why “spec to pull request” is now becoming a real production workflow.

We go deep on the architecture of background agents: harness-in-the-box vs out-of-the-box, why Devin separates the “brain” from the machine, why repo setup is still one of the hardest problems, why Docker is not always enough, and how full VMs, snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. Walden and Cole also dig into memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, PMs shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.
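One way to picture the brain/machine split and scoped secrets is the Python sketch below. It is an assumption-laden illustration, not Devin's or OpenInspect's actual design: the `Machine` and `Brain` classes, the `REPO_TOKEN` and `MODEL_API_KEY` names, and the use of a local subprocess in place of a real VM are all hypothetical, and it assumes a POSIX-like host.

```python
import subprocess
import sys


class Machine:
    """Hypothetical sandbox: runs commands, sees only the secrets scoped to it."""

    def __init__(self, scoped_secrets: dict[str, str]):
        self._env = dict(scoped_secrets)  # nothing inherited from the host

    def run(self, argv: list[str]) -> str:
        done = subprocess.run(
            argv, env=self._env, capture_output=True, text=True, check=True
        )
        return done.stdout.strip()


class Brain:
    """Hypothetical planner: keeps the model credential on its own side."""

    def __init__(self, model_api_key: str, machine: Machine):
        self._model_api_key = model_api_key  # never passed to the machine
        self._machine = machine

    def act(self, argv: list[str]) -> str:
        # A real brain would ask a model what to run; here the command is given.
        return self._machine.run(argv)


if __name__ == "__main__":
    machine = Machine({"REPO_TOKEN": "example-token"})
    brain = Brain(model_api_key="example-model-key", machine=machine)
    probe = (
        "import os; "
        "print(os.environ.get('MODEL_API_KEY', 'absent'), os.environ.get('REPO_TOKEN'))"
    )
    print(brain.act([sys.executable, "-c", probe]))
```

Code running on the machine can read the repo token it was scoped, but the model credential never enters its environment, so a compromised or misbehaving command has less to leak.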

And as agents eat software… and software eats the world… you can draw the conclusion on what is next:

We discuss:

Why the engineering world is waking up to background agents and cloud agents

The December 2025 model inflection that made spec-to-PR workflows practical

Devin’s 7x merged PR growth and rise from 16% to 80% of commits

Why Cole built OpenInspect as an open-source background-agent system

The economics of $20/seat agent products and why monetization is tricky

What Cognition actually sells beyond Devin: infra, onboarding, integrations, and adoption

Harness in the box vs out of the box, and why architecture matters

Why Devin separates the brain from the machine for security and permissions

Repo setup, scoped secrets, Docker Compose, and agent-ready dev environments

Why full VMs matter when agents need to run real applications and test them

Android, macOS, Windows, nested virtualization, and machine-specific agent work

Why testing is much harder than “computer use”

Screenshots, video verification, and the “I know it works” merge moment

GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments

Why MCP alone is not enough for first-class Slack and enterprise integrations

Memory, Knowledge, skills, Claude.md, and why retrieval is still unsolved

Devin’s auto-generated memories and the challenge of memory pruning

Always-on agents as permanent PMs for issues, tickets, and product areas

Sub-agents, meta-Devin management, and what multi-agent systems actually add

Why pure auto-merge vibe coding breaks down after about two weeks

AI code smells, lint rules, reward hacking, and Semgrep for agent-written code

GitAI, inline context, and preserving the “why” behind code changes

Local testing, mock servers, older codebases, and preparing companies for agents

Windsurf 2.0 and the handoff between local foreground agents and cloud background agents

SRE auto-triage, support workflows, and agents as first responders

PMs, marketing, and non-engineers creating pull requests from Slack

AI agent budgets, $1k-$5k per engineer spend, and hybrid frontier/sub-frontier systems

The rise of autonomous coding factories and who Cognition is hiring

Walden Yan

X: https://x.com/walden_yan

LinkedIn: https://www.linkedin.com/in/waldenyan/

Cole Murray

X: https://x.com/_colemurray

LinkedIn: https://www.linkedin.com/in/colemurray/

OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents

Timestamps

00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin’s 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro

Transcript

Introduction: Walden Yan, Cole Murray, and Context Engineering

Swyx [00:00:00]: All right, we’re in the studio with Walden Yan, co-founder of Cognition, CPO.

Walden [00:00:08]: Happy to be here.

Swyx [00:00:09]: Which is a cool title. And coiner of context engineering.

Walden [00:00:15]: Although I think there are many people who’d used the term in various ways beforehand, I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering or model wrapping into maybe a more thoughtful way to build agents.

Swyx [00:00:33]: For those who haven’t caught up on that, I have on screen the Don’t Build Multi-Agents post, which you should go read and which we might refer to. And we have Cole Murray, who created OpenInspect.

Cole [00:00:43]: Great to be here.

Swyx [00:00:43]: So let’s talk about it. Everyone is building their own Devins. What’s going on?

The December Shift: From Handholding Models to Autonomous PRs

Cole [00:00:51]: So I think the engineering world is waking up to this idea of background agents, cloud agents, whatever you’d like to call it. And I think we saw a shift around December 2025, where the models, Opus 4.5 and GPT 5.2, reached a capability where we moved away from handholding the model and were able to drive it more or less autonomously. What I mean by that is that we could pretty much go from a specification to a completed pull request, assuming the spec was good enough, with very little friction. And that paradigm alone, I think, changed a lot of how we interact with agents, and opened this world where background agents became more practical.

Swyx [00:01:41]: I think for Cole, everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was this moment, which was, I think, Sonnet 3.7, where you guys rewrote Devin in one night or something. So describe 2025 and how it felt from your side.

Walden [00:02:01]: In retrospect, we always thought it was ramping up, but even now, over the last three or four months, it’s been ramping up even faster. So it’s almost funny to be talking about how big of a leap Sonnet 3.7 was, and honestly, a lot of it was stripping out parts of Devin that were no longer needed with that jump in intelligence. But I also just think that with a lot of the recent leaps, especially if you look at models like Opus and the latest GPT models, they are reaching levels of autonomy where people are finding that they actually can just be hands-off. And for people who were once debating, “Oh, do I need to be in the weeds with my model in the IDE? Can I just completely move it off into the cloud?”, that’s a more serious conversation, and we’ve seen that in all of our growth charts. Internally there’s this funny graph where our usage, our merged PRs, has grown 7X since, I forget what it was called.

Swyx [00:02:57]: I think Dev maybe tweeted that. Yes.

Walden [00:03:01]: It grew like 7X over the last, I think it was, two or three months, something like that. And then you see our engineering headcount growth. It’s gone up by 10% or something.

Swyx [00:03:11]: We were afraid to release this. So this is Devin’s commit percentage on all Devin repos, which was 16% in January

[truncated for AI cost control]