Better Models: Worse Tools

Armin Ronacher reports a curious problem: newer Claude models (Opus 4.8 and Sonnet 5) sometimes add invented fields to calls to Pi's edit tool, which Pi then rejects, while older models do not. He theorizes that Anthropic's reinforcement learning, aimed at Claude Code's built-in edit tool, inadvertently degrades performance on third-party harnesses. This raises the question of whether harnesses like Pi should implement multiple edit tools to match model-specific training.

Simon Willison’s Weblog

4th July 2026 - Link Blog

Better Models: Worse Tools. Armin reports on a weird problem he ran into while hacking on Pi:

The short version is that newer Claude models sometimes call Pi’s edit tool with extra, invented fields in the nested edits[] array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again.

That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.
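The failure mode Armin describes can be illustrated with a minimal sketch of strict argument validation. This is not Pi's actual schema or code: the key names `old_text` and `new_text`, the `path` field, and the invented `replace_all` key are all hypothetical, chosen only to show how a correct edit can still be rejected when the model adds a key the schema does not allow.

```python
from typing import Any

# Hypothetical schema: each entry in edits[] must contain exactly these keys.
ALLOWED_EDIT_KEYS = {"old_text", "new_text"}


def validate_edit_call(args: dict[str, Any]) -> list[str]:
    """Return a list of schema errors; an empty list means the call is accepted."""
    edits = args.get("edits")
    if not isinstance(edits, list) or not edits:
        return ["'edits' must be a non-empty array"]

    errors: list[str] = []
    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            errors.append(f"edits[{i}] must be an object")
            continue
        keys = set(edit)
        missing = ALLOWED_EDIT_KEYS - keys
        extra = keys - ALLOWED_EDIT_KEYS
        if missing:
            errors.append(f"edits[{i}] is missing keys: {sorted(missing)}")
        if extra:
            # The edit may be perfectly correct, but the unknown key
            # still fails strict validation and triggers a retry.
            errors.append(f"edits[{i}] has unexpected keys: {sorted(extra)}")
    return errors


good = {"path": "app.py", "edits": [{"old_text": "a", "new_text": "b"}]}
bad = {"path": "app.py", "edits": [{"old_text": "a", "new_text": "b", "replace_all": False}]}

print(validate_edit_call(good))
print(validate_edit_call(bad))
```

A harness in this position has a design choice: reject and ask the model to retry, as Armin says Pi does, or tolerate unknown keys and apply the edit anyway. The sketch shows only the strict variant.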

Armin theorizes that this is because more recent Anthropic models have been specifically trained (presumably via reinforcement learning) to make better use of the edit tools that are baked into Claude Code. The unfortunate side effect is that other coding harnesses, such as Pi, may find that the models are more likely to call their custom edit tools incorrectly.

Claude Code's edit tool uses search and replace. OpenAI's Codex uses an apply_patch mechanism instead, and OpenAI have talked in the past about how their models are trained to use that tool effectively.
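To make "search and replace" concrete, here is a minimal sketch of the general technique. It is not the implementation used by Claude Code or Pi. The function name is hypothetical, and the rule that the search text must match exactly once is an assumption of this sketch, included because an ambiguous match would otherwise edit the wrong place silently.

```python
def apply_search_replace(text: str, old: str, new: str) -> str:
    """Replace a single, unambiguous occurrence of `old` in `text` with `new`."""
    if not old:
        raise ValueError("search text must not be empty")
    count = text.count(old)
    if count != 1:
        # Zero matches means the model misquoted the file; more than one
        # means the edit is ambiguous. Either way, refuse to guess.
        raise ValueError(f"expected exactly one match for the search text, found {count}")
    return text.replace(old, new, 1)


source = "a = 1\nb = 2\n"
print(apply_search_replace(source, "b = 2", "b = 3"))
```

A patch-based tool takes a different shape of input: the model emits a diff-like document rather than a pair of strings, which is one reason a model tuned for one style may be less reliable with the other.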

Does this mean third-party coding harnesses like Pi should implement multiple edit tools just so they can use the one with the best performance for the underlying model the user has selected?
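One way a harness might act on that question, purely as a sketch: keep several edit tool implementations and expose the one that matches the selected model's family. The model name prefixes, the tool names, and the `generic_edit` fallback below are all hypothetical, and nothing here reflects how Pi is actually built.

```python
# Hypothetical mapping from model-name prefix to the edit tool style that
# model family is believed to handle best.
EDIT_TOOL_BY_PREFIX = {
    "claude": "search_replace",
    "gpt": "apply_patch",
    "codex": "apply_patch",
}

# Hypothetical fallback: the harness's own tool for unrecognized models.
DEFAULT_EDIT_TOOL = "generic_edit"


def pick_edit_tool(model_id: str) -> str:
    """Return the name of the edit tool to expose for the given model."""
    normalized = model_id.lower()
    for prefix, tool in EDIT_TOOL_BY_PREFIX.items():
        if normalized.startswith(prefix):
            return tool
    return DEFAULT_EDIT_TOOL


print(pick_edit_tool("claude-opus-4.8"))
print(pick_edit_tool("some-local-model"))
```

The cost of this approach is maintenance: every extra tool is another schema, another prompt description, and another set of edge cases to test against each model release.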

This is a link post by Simon Willison, posted on 4th July 2026.
