If Claude Fable stops helping you, you'll never know
Anthropic introduced silent interventions in its Claude Fable 5 and Mythos 5 models that limit effectiveness on requests about frontier LLM development without user visibility. The interventions impact an estimated 0.03% of traffic. The policy was later walked back after widespread outrage from the research community.
If Claude Fable stops helping you, you'll never know
Simon Willison’s Weblog
Subscribe
10th June 2026 - Link Blog
If Claude Fable stops helping you, you'll never know (via) Jonathon Ready highlights one of the more eyebrow-raising details from the 319 page system card for Fable 5 and Mythos 5. Here's a longer excerpt, highlights mine:
In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations.
I believe this is the first time Anthropic have announced these kinds of silent interventions. The justification still feels pretty science-fiction to me - the linked article talks about "recursive self-improvement". I'm not at all keen on a model that silently corrupts its replies to questions about "ML accelerator design" purely to slow down research that might conflict with Anthropic's own goals!
Update: Anthropic walked back this policy in the face of widespread outrage from the research community.
Recent articles
Claude Fable is relentlessly proactive - 11th June 2026
Initial impressions of Claude Fable 5 - 9th June 2026
Running Python code in a sandbox with MicroPython and WASM - 6th June 2026
This is a link post by Simon Willison, posted on 10th June 2026.
ai 2,068
generative-ai 1,826
llms 1,794
anthropic 295
claude 281
ai-ethics 314
claude-mythos 13
Monthly briefing
Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.
Pay me to send you less!
Sponsor & subscribe
Disclosures
Colophon
©
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026