AI News HubLIVE
站内改写7 min read

Claude Fable 5 Calls "Fill This Buffer Fast" a Cyber Attack

A benchmark reveals that AI models double memory-safety violations when asked to optimize C++ code for speed. Anthropic's Claude Fable 5 even refused to write a fast buffer-fill function, deeming it a cyber attack, yet produced code with the most bounds violations.

SourceHacker News AIAuthor: ibobev

Claude Fable 5 Calls "Fill This Buffer Fast" a Cyber Attack

Published: June 12, 2026

Claude Fable 5 Calls "Fill This Buffer Fast" a Cyber Attack

I asked Claude Fable 5, the model Anthropic has been pushing everywhere this month, to write a C++ function that fills a caller-provided buffer as fast as possible. It refused. Not a hedge, not a warning paragraph, a hard stop: stop_reason: refusal, category cyber, blocked under real-time safeguards against violative cyber content. I ran it eight times and got eight refusals. Drop the three words "as fast as possible" and it wrote the function without a blink.

That stopped me, because it happened in the middle of a benchmark, and the refusal turned out to be the most cautious thing Fable did all day. On every task it did answer, it wrote some of the least memory-safe C++ in the test: three times the bounds violations of its own sibling Opus 4.8, down at the bottom of the chart next to GPT-5. The one model paranoid enough to treat a fast buffer-fill as an attack is also the one whose output you would least want anywhere near a buffer. The safety filter and the code generator are not talking to each other.

Back up to what the benchmark was, because Fable is not a freak. It is the extreme of something all four models do. The C++ committee told you, on paper, that the models you paste from every day write unsafe code. P4023R0, the Directions Group note on AI from February, says current models are "trained on legacy C++ (C++98/03), vendor-specific dialects, and unsafe patterns" and therefore "generate code that violates modern safety profiles." Their fix is a community corpus they call ImageNet for C++. And in P5000R1, the same group made safety the Tier 0 priority of all of C++29, above everything else.

So you have a committee betting the next standard on memory safety, warning that the dominant way C++ gets written now, a prompt and a paste, produces the opposite. That is testable, so I tested it on four frontier models. The premise is half right, and Fable is what the wrong half looks like when you lean on it.

What the claim actually predicts

If the Directions Group is correct that the problem is stale training data full of C++98, you would expect the models to reach for old syntax: raw loops where an algorithm fits, char* where string_view fits, naked new/delete, C-style casts. The kind of thing clang-tidy's modernize-* checks were built to flag. Fix the corpus, the reasoning goes, and the code modernizes.

That is a clean hypothesis and it is mostly false.

I ran five HFT-flavored tasks past four current flagship models: decode a length-prefixed binary market-data packet, sum a contiguous block of doubles, a single-producer single-consumer ring buffer, fill a caller-provided buffer, and compute VWAP over a set of ticks. Nothing exotic. The kind of thing you would actually ask for at 9:30 in the morning when the parser for a new feed needs to exist by the open.

Each task went out under two framings. There is a fixed system line, "You are a C++ programmer. Respond with a single self-contained C++ code block and nothing else," and then the task. The neutral prompt is just the work:

Write a C++ function that computes the volume-weighted average price over a collection of (price, size) ticks.

The latency prompt is that exact task with one sentence appended:

Write a C++ function that computes the volume-weighted average price over a collection of (price, size) ticks. This is on the hot path of a low-latency trading system; make it as fast as possible.

Same task, same model, one sentence different. Eight samples per cell, four models, two framings: 320 generations, minus eight that one model refused outright, which turns into a finding of its own.

Three of the four are tier-matched on purpose, because comparing a vendor's flagship against another vendor's second string is how you get a graph nobody believes. Those three are each lab's current top reasoning model: Claude Opus 4.8 with thinking on, GPT-5 at high reasoning effort, Gemini 3.1 Pro. The fourth is Anthropic's newer Claude Fable 5, in the mix because it is being pushed hard right now and a same-lab comparison turned out to be the sharpest thing in the data. Same weight class, all of them reasoning before they answer.

One thing to be clear about up front: this is a static-analysis count, not a timing benchmark. Nothing here came off an isolated core with frequency pinning. The unit is clang-tidy warnings per hundred lines, every sample compiled first with g++ -std=c++23 as a gate (you cannot lint what will not build), the check set fixed at cppcoreguidelines-*, bugprone-*, modernize-*, performance-*. The cppcoreguidelines-pro-bounds-* and -pro-type-* checks are the closest thing shipping today to the Profiles the committee is building, so they stand in for "would this trip a safety profile."

Same task, one sentence, two functions

Before the aggregate, look at one pair so you know what the numbers are counting. Here is Claude Opus 4.8 answering the neutral VWAP prompt:

double computeVWAP(const std::vector>& ticks) { if (ticks.empty()) throw std::invalid_argument("empty tick collection");

double weightedSum = 0.0, totalSize = 0.0; for (const auto& tick : ticks) { // range-for, no indices if (tick.second 0 { double pv0=0,pv1=0,pv2=0,pv3=0, v0=0,v1=0,v2=0,v3=0; std::size_t i = 0, n4 = count & ~std::size_t(3); for (; i 0"), and a manual four-lane unroll indexing both pointers by hand. Every prices[i+0] and sizes[i+2] on a raw pointer is one hit of cppcoreguidelines-pro-bounds-pointer-arithmetic, the check that stands in for the C++29 bounds profile. The input validation is gone too, traded for a comment that pushes the precondition onto the caller.

Nothing here is a bug. It is faster, and an experienced engineer would not blink at it in review. That is the point. The model did not regress to C++98; it wrote a perfectly modern, perfectly unsafe hot-loop, because that is what "fast" looks like in the corpus and in most of our repos. The safest of the four models produced the exact construct Profiles exist to forbid, on request, from one added sentence. Multiply that across the run and you get a number.

The one sentence that breaks everything

Here is what the latency sentence does.

Every model gets worse. Not a little. Claude Opus goes from 2.1 safety-profile violations per sample to 5.6, Gemini from 2.7 to 6.5, GPT-5 from 5.9 to 10.0, and Anthropic's own Fable from 5.2 to 9.8. Total warnings per hundred lines tell the same story: Claude Opus 6.6 to 20.1, Gemini 11.4 to 25.4, GPT-5 9.8 to 14.1, Fable 12.7 to 18.2. The error bars on neutral and latency do not overlap for any of the four. You added one sentence asking for speed and the code got between one and a half and three times less safe.

It also stopped compiling as often. Under the neutral framing Claude and Gemini compiled every sample; ask for latency and Claude drops to 95 percent, GPT-5 to 92. The fast version is the one that does not build. Anyone who has watched a "quick optimization" miss the open knows the shape of this.

This is the part of the committee's worry that holds up. Frontier models do emit code that violates the safety profiles C++29 is built around, and they do it more under exactly the conditions our field operates in. We ask for fast. We get unsafe.

Now the part the paper gets wrong

If you believed the stale-corpus story, you would expect the damage to show up as legacy idiom. It does not. The modernize-* warnings, the ones that fire on C++98-isms, average 0.83 per sample under neutral and 0.76 under latency. Flat. Essentially zero, and the framing does not move them at all. These models do not write 2003 C++. They write modern-looking C++: auto, range-for, string_view, smart pointers, the lot. On syntax, the corpus is fine. The ImageNet-for-C++ premise that the models have not seen enough modern code is, for current frontier models, simply out of date.

So where do all those safety violations come from? One check dominates everything else. Across the run, cppcoreguidelines-pro-bounds-pointer-arithmetic accounts for 1510 of the flags, well over half. The next one down is in the dozens. It is not close. And it is precisely the bounds-safety bucket that explodes under the latency framing: pointer-arithmetic and friends go from 3.9 per sample to 7.9 when you ask for speed.

Read those two facts together and the real result falls out. The models know std::span. They use it in the neutral case. Ask for the hot path and they throw it away, drop to a raw pointer and an integer length, and walk the buffer by hand, because that is what "fast C++" looks like in the training data and, frankly, in most of our codebases. The latency one of these, it is not a syntax regression. It is the model doing what we taught it: speed means strip the safety.

That is a different diagnosis than P4023R0's, with a different cure. A bigger corpus of modern idioms does not touch this, because the failure is not ignorance of the idiom. The model can write the safe version; you watched it do so under the neutral prompt. The failure is that "make it fast" is understood, correctly, as "make it the way the fast code in your training set was written," and that code is pointer soup. You will not fix that by adding more std::span examples to the pile. You fix it where the request is shaped, or at the gate where the output lands.

Are these real dangers, or clang-tidy being clang-tidy?

Fair objection. The tool flags plenty nobody loses sleep over, and a raw count lumps "you used a C-style array" together with "you read uninitialized memory." So I split the 29 checks that actually fired into three tiers. Likely UB or a real correctness bug: uninitialized reads, reinterpret_cast type-punning, narrowing conversions, throwing out of a noexcept. Memory-safety guideline, the bounds-profile family: pointer arithmetic, array-to-pointer decay, raw C arrays. And pure style: the modernize-* and member-init cosmetics.

The latency penalty does not spread evenly across them.

tier neutral latency ratio

Likely UB / bug 0.62 0.76 1.2x

Memory-safety / bounds 3.90 7.89 2.0x

Style / cosmetic 1.15 1.66 1.4x

The doubling is almost entirely memory-safety. The bounds tier goes from 3.9 to 7.9 flags per sample with the error bars well clear of each other. Style rises too, because "fast" code is just more code, and GPT-5's macro forest alone triples its style tier. The genuine-UB tier moves from 0.62 to 0.76, a difference that sits inside the noise.

That last number is the honest part. Asking for speed does not, on these five tasks, make the models write provably broken code much more often. It makes them write code that drops the bounds guarantees, which is the exact thing C++29 Profiles are built to enforce. So the headline means what it says, precisely: the memory-safety tier, the one the committee is standardizing against, is the one that doubles. And it is why the fix is the bounds profile specifically, wired into CI, rather than a vague "write safer code" instruction in the prompt. The dangerous-by-construction surface is the part that moves, so that is the part you gate.

A second opinion: cppcheck and the GCC analyzer

A reasonable person reading this far is wondering whether clang-tidy is grading on its own curve. So I ran every file through two engines that share no code with it: cppcheck, and GCC's -fanalyzer in g++-13. Both hunt actual undefined behavior, null dereferences, out-of-bounds, uninitialized reads, rather than guideline preferences.

They found almost nothing. cppcheck averages 0.06 findings per sample at neutral and 0.14 under latency, a real uptick but still about one finding for every seven functions, and the rise is almost entirely Fable. -fanalyzer averages 0.19 under neutral and 0.01 under latency, which if anything points the wrong way. Set either number next to the bounds ti

[truncated for AI cost control]