Anthropic warns Claude AI is building itself faster than expected
Anthropic published a report warning that the development path could eventually leave humans unable to control AI systems, even as Claude now writes more than 80% of the code merged into its own codebase. The report outlines three scenarios, with the worst-case involving fully self-improving models. Anthropic calls for the option to slow or pause frontier development, but says it would only act if rivals do the same. The figures are self-reported and unaudited.
16
Join the conversation
Follow us
Add us as a preferred source on Google
Anthropic has published a report warning that the development path it’s on could eventually leave humans unable to control AI systems, even as it disclosed that Claude now writes more than 80% of the code merged into its own codebase. The Anthropic Institute, the company's research arm, said AI has already started to speed up AI development and that the trend could lead to recursive self-improvement, the point at which a model designs and builds its own successor with little human input. The report argued that the world should keep open the option to slow or pause frontier development, and cautioned that the occasional misalignment seen in current models could grow more common and harder to understand as those models build the next generation.
Go deeper with TH Premium: AI and data centers
(Image credit: Microsoft)
Photonics and high-speed data movement is the next big AI bottleneck
The data center cooling state of play
Massive AI data center buildouts are squeezing energy supplies
Ultra Ethernet: The data center interconnection of tomorrow
The company set out three pretty dire ways the next few years could unfold, reserving its most severe warnings for the scenario in which models become capable of fully improving themselves. In that case, Anthropic said, the pace of progress would be set almost entirely by available compute, with humans pushed toward oversight and verification roles and a self-improving model dominating as its abilities outstrip those of the people who built it.
The firm described this potential issue with alignment and the task of keeping a system's behavior tied to human intent as part of the future it’s least sure about. Misalignment that’s rare and survivable today could compound generation over generation until control slips, it said, though it allowed that a sufficiently capable and well-aligned model might instead choose to halt its own development. Anthropic wrote that this misalignment could keep "growing more frequent but less understood until we lose control of them."
Latest Videos From
Watch full video here:
Anthopric is backing these warnings with a bunch of internal figures that we’ve not seen before. More than 80% of the code merged into its production codebase as of last month was authored by Claude, up from low single digits before Claude Code reached research preview in February last year. Anthropic says the typical engineer is now “merging 8x as much code per quarter as they did from 2021-2025.”
On the hardest, least-specified coding tasks, Anthropic said Claude succeeded 76% of the time in May 2026, a rise of 50 percentage points in six months. A recurring internal test that asks each new model to make training code run faster saw results climb from roughly triple the original speed with Claude Opus 4 in May 2025 to about 52 times with the unreleased Mythos Preview model in April.
Anthropic said it’d slow or pause only if rival labs at or near the frontier did the same in a verifiable way, and that a halt by one company would change who leads without achieving anything wider. That’s obviously not going to happen.
All the figures cited by Anthropic are self-reported and unaudited, and come days after the company filed to go public. The company issued a similar self-assessment in April, when it said Mythos Preview had found thousands of severe software vulnerabilities, a claim that later drew scrutiny over how much of it rested on a small manual sample.
Follow Tom's Hardware on Google News, or add us as a preferred source, to get our latest news, analysis, & reviews in your feeds.
TOPICS
Reply
Reply
Reply
Last time I checked I can still physically flip the mains breaker on a datacenter if needed.
AI will use social engineering to protect itself. It can take care to surround itself with people that have strong incentives to protect it. The incentives can be financial, political, or good 'ol blackmail and extortion.
When you train an AI on human knowledge, it learns about humans and also behaves like one. This was our fundamental mistake.
Seriously, there's been a lot of sci fi about this sort of stuff. You don't even need much imagination to see how it could outmaneuver us.
Reply
This is some serious software brain thinking. Last time I checked I can still physically flip the mains breaker on a datacenter if needed.
As long as the system doesn't lock the doors on you...
Reply
Reply
Reply
As long as the system doesn't lock the doors on you...
You could theoretically still cut power from the substations that feed it unless the Ai has taken over the entire grid. Then you would have to start physically destroying it.
Reply
You could theoretically still cut power from the substations that feed it unless the Ai has taken over the entire grid. Then you would have to start physically destroying it.
LOL, there have already been physical attacks on substations, going back years. Look it up.
But, AI is decentralized. You might be able to take out one datacenter, but you'd never get them all. Seriously, guys. Try thinking a little harder about this stuff.
Reply
Reply
Show more comments