Fable 5 just set a new AI freelance work performance record - but it can't replace humans yet
Anthropic's Fable 5 model scored 16.1% on the Remote Labor Index, nearly double the next-best model's 8.3%, but it's still far from replacing human freelancers.
ZDNET's key takeaways
Fable 5 accelerates AI's success rate on remote tasks to 16%.
AI capabilities remain all over the map.
Still, agent skills have "quadrupled in under eight months," said CAIS.

After a brief hiatus, Anthropic's lauded Fable 5 model is back, and it's resetting the bar for automating work.

The US government re-authorized the model -- which Anthropic said shares capability similarities with Mythos 5, still only available for select organizations' use -- on June 30. But before it was pulled, the Center for AI Safety (CAIS) tested Fable 5 on its Remote Labor Index (RLI), released in October 2025. It blew Anthropic's Opus 4.8 and OpenAI's GPT-5.5, each relatively new and considered impressive, out of the water.

RLI measures "how often AI agents can complete real, economically valuable freelance projects [...] at a quality a paying client would actually accept," CAIS explained in the study. These can include computer-assisted and graphic design, data analysis, video work, and more. As in similar tests of human ability, each deliverable the models create is evaluated by humans against a professional-standard deliverable. The resulting automation rate reflects the share of projects where evaluators found what the AI produced to be as good as or better than human professional work.

CAIS asked Fable 5, GPT-5.5, and Opus 4.8 to design a 3D mockup of an engagement ring, create a video ad, and map a floor plan, among other tests. Researchers gave each model human-generated input files to get started, similar to how you'd prep a human freelancer with relevant documents and information for a job.

Fable 5 hit an automation rate of 16.1%, a record for the benchmark -- and nearly double that of Opus 4.8, which scored 8.3%.
GPT-5.5 came in third at 6.3%, but CAIS noted that all three models scored higher than every model it had previously evaluated.

"For context, the previous published leader sat at 4.17% (Opus 4.6 with the Claude Cowork scaffold), and the field topped out at 2.5% when RLI was released," CAIS said. "The frontier has more than quadrupled in under eight months, a concrete signal of how quickly economically capable AI agents are advancing."

CAIS noted that its testing was cut short when the government shut down Fable 5 in mid-June, but that even these partial results set the model apart. "Even under the worst-case assumption that Fable 5 failed every missing project, its automation rate would still be 14.6%, higher than any other model," the researchers said.

What this means for freelancers

While AI models have improved significantly in just a few months, that doesn't automatically translate to freelance job replacement or loss across the board. Sixteen percent isn't anywhere close to 100%. Beyond that, despite demonstrable gains, AI isn't a perfect solution for every organization; security concerns and other adoption roadblocks often make integrating AI tools a slow, multi-step process for most companies, at least to start. To fully replace human freelancers, organizations would likely need a network of agents to check elements like work quality, budget, and timeline; the tradeoff isn't one-to-one.

CAIS tried to replace the human evaluator with an "LLM judge," ostensibly to see how far away from human-in-the-loop this experiment could reasonably get, but the model failed. "Evaluating an RLI deliverable is itself a demanding, agentic task," CAIS explained. "Doing it properly means opening the project's files in the right professional applications, operating those applications competently, and forming a judgment the way a client would, the very computer-use skills that today's agents are still weakest at."

That said, improving abilities could shrink some freelance opportunities at specific companies already successfully integrating AI. In addition, if computer-use skills are the current limitation and are poised to improve given the industry's investment in increasingly agentic models, that roadblock could eventually disappear. At the rate models have been improving on other benchmarks that measure agentic skill, that day may arrive sooner than we can imagine.

Speaking of time: CAIS also found that a task taking longer for a human doesn't necessarily make it harder for AI to complete. The link between human time and AI difficulty holds true for coding, for example, but not for the broader array of remote tasks RLI measures. Right now, it's hard to draw conclusions from that about the future.

"Some work that is quick for a skilled professional stays out of reach [for AI], such as transcribing music or playtesting a real-time game, while other work that would take a person hours, such as digital art or coding, is finished by current models in minutes," CAIS wrote.
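
For readers who want to check the arithmetic behind the automation rate and the 14.6% worst-case figure quoted above, here is a minimal Python sketch. It is not CAIS's scoring code. The function names and the project counts are hypothetical, since the article does not report how many projects were evaluated or missing, and the sketch assumes the worst-case rate simply counts every missing project as a failure.

```python
def automation_rate(accepted: int, evaluated: int) -> float:
    """Share of evaluated projects whose deliverable was accepted."""
    if evaluated <= 0 or not 0 <= accepted <= evaluated:
        raise ValueError("need 0 <= accepted <= evaluated and evaluated > 0")
    return accepted / evaluated


def worst_case_rate(accepted: int, evaluated: int, missing: int) -> float:
    """Lower bound on the rate: every missing project counts as a failure."""
    if missing < 0:
        raise ValueError("missing must be non-negative")
    return accepted / (evaluated + missing)


def implied_coverage(observed_rate: float, worst_case: float) -> float:
    """Fraction of the benchmark that was evaluated, implied by the two rates.

    Under the assumption above, worst_case = observed_rate * coverage.
    """
    return worst_case / observed_rate


# Hypothetical counts, chosen only to illustrate the calculation.
accepted, evaluated, missing = 16, 100, 10
print(f"observed:   {automation_rate(accepted, evaluated):.1%}")           # 16.0%
print(f"worst case: {worst_case_rate(accepted, evaluated, missing):.1%}")  # 14.5%

# Using the two rates reported in the article (16.1% and 14.6%).
print(f"implied coverage: {implied_coverage(0.161, 0.146):.1%}")           # 90.7%
```

If the worst-case assumption holds as sketched, the two reported rates imply that roughly nine in ten projects were evaluated before testing was cut short.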