
Turn a $3M AI bill into $1.9M

Flowstate is an intelligent proxy that routes AI requests to the most cost-effective model and attributes spending to projects, potentially reducing AI bills by up to 42%. The article explains the two main leaks inflating AI costs: default flagship model usage and lack of spend attribution.

Source: Hacker News AI · Author: speckx

Right now, someone on your team is using the most expensive AI model to edit a slide deck. They didn’t choose it; it’s just the default. Repeat that invisible choice a few thousand times a day, and your AI bill quickly starts to resemble payroll.

Two things are inflating that number. The default model is wrong for the task: flagship prices for work a cheaper one would nail. And the task is invisible on the invoice, a single lump sum with no way to tell which project or which model it went on.

Flowstate sits in the request path to close both leaks. We route each prompt to the model the task actually needs, and tie every dollar to the work it paid for. Nobody ships less: the same output that ran you $3m now costs $1.9m, and for the first time you can see which work the money actually bought.

You’re paying Opus prices for Sonnet work

Almost nobody picks a model. They use whatever’s selected when the box loads, and the default is the flagship: the most expensive model on offer. That’s the right call for a genuinely hard problem and pure waste on a one-line email. You can’t expect a marketer to know their default chat window costs five times more than necessary. The price isn’t on the screen, and the vendor has no incentive to show it.

So don’t make them learn it. The task should pick the model, not the person typing, and that decision belongs at the request layer rather than in anyone’s head. A summary or a reformat goes to Haiku, everyday coding and drafting to Sonnet, the genuinely hard reasoning to Opus. You can even split a single job, planning it in Opus and running the execution on Sonnet, keeping the expensive thinking for the steps that need it. The person, whatever they do all day, types the same prompt and gets the same answer. The bill is just smaller. And this isn’t only Claude Code: the same default sits in front of every chat your sales, ops and marketing people open too.
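A minimal sketch of what "the task picks the model" could look like at the request layer. The task labels, the lowercase tier names and the fallback are all hypothetical, and the classifier that produces the label is not shown; this is an illustration of the idea, not Flowstate's actual routing logic.

```python
from dataclasses import dataclass

# Hypothetical task labels mapped to model tiers, following the split
# described above: light work, everyday work, hard reasoning.
ROUTES = {
    "summarise": "haiku",
    "reformat": "haiku",
    "code": "sonnet",
    "draft": "sonnet",
    "hard_reasoning": "opus",
}
DEFAULT_TIER = "sonnet"  # assumption: unrecognised tasks go to the middle tier


@dataclass(frozen=True)
class Request:
    task: str    # label from some upstream classifier (not shown here)
    prompt: str


def route(request: Request) -> str:
    """Return the model tier for a request based on its task label."""
    return ROUTES.get(request.task, DEFAULT_TIER)


def plan_then_execute(steps: list[str]) -> list[tuple[str, str]]:
    """Split one job: plan on the top tier, run each step on the middle tier."""
    return [("opus", "plan")] + [("sonnet", step) for step in steps]
```

The person typing never sees any of this: they send the same prompt, and only the tier behind it changes.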

How much smaller? Peer-reviewed research, like Ding et al.’s Hybrid LLM, shows you can cut calls to the expensive model by up to 40% with no measurable drop in quality [1]. It’s just arithmetic on your model mix, and it works on any deployment you legitimately run.
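The arithmetic can be sketched in a few lines. This assumes every call is roughly the same size and that the cheaper model costs a fixed fraction of the flagship; the 5x price ratio is borrowed from the "five times" remark earlier and is an illustrative assumption, not a quoted price.

```python
def blended_saving(moved_share: float, price_ratio: float) -> float:
    """Fraction of flagship spend saved when `moved_share` of calls move to a
    model that costs 1/price_ratio as much per call.

    Assumes equal-sized calls; real traffic will not be this uniform.
    """
    if not 0.0 <= moved_share <= 1.0:
        raise ValueError("moved_share must be between 0 and 1")
    if price_ratio < 1.0:
        raise ValueError("price_ratio must be at least 1")
    return moved_share * (1.0 - 1.0 / price_ratio)


# 40% of calls rerouted to a model assumed to be 5x cheaper.
print(round(blended_saving(0.40, 5.0), 2))  # 0.32
```

Under those assumptions, moving 40% of calls takes about a third off the flagship bill. A different price ratio or a different mix of call sizes moves the number.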

This is the lever that grows with usage: the harder your team leans on AI, the more a wrong-model default costs you, and the more routing hands back. In the calculator below it’s the gap between your bold line and the green one.

The bill you can’t see

It’s day one. An engineer joins a company, gets handed an Enterprise Claude account, and burns $145 in his first five prompts. On a flat-rate plan that usage would have stretched all week; on a metered Enterprise plan, it’s gone before lunch. HR is already asking questions he can’t answer, and he’s doing the maths on a $5,000 month: “more than my salary.” Where the usage page should show a limit, it shows one word: Unlimited. That’s a real post from r/ClaudeCode, and it’s the second leak in a single screenshot.

The first leak was the model nobody chose. This is the other: the meter nobody’s watching. As of this year, Enterprise charges for every token your team spends in chat, Claude Code and Cowork, at standard API rates on top of the seat [2]. (Teams keeps a flat seat with an included allowance instead.) Metered pricing is cheap for a light team but runs away from you at scale, and because it lands as one undifferentiated invoice, nobody catches the spike until finance raises a flag. You can’t route what you can’t see, and you can’t choose between two deployments you’ve never compared. So compare them:

Pick your door. Size your team. Drag the usage.

Annual AI spend for your chosen deployment (bold), the same deployment with Flowstate routing, and the other doors for comparison. The gap to the green line is what routing saves you; the gap to the cheapest dashed line is what the door costs you.

Calculator snapshot (375M tokens per team member per month; controls: deployment strategy, team members (seats), tokens per team member per month)

Claude for Enterprise, today (your selection, no routing): $3.26m/yr

Claude for Enterprise + Flowstate routing: $1.89m/yr, saves $1.38m (42%)

Cheapest door, Claude for Teams (Premium): $2.35m/yr, $915k less than your door

Watch what happens as you drag it. At low usage the two doors barely differ, and Enterprise is actually the cheaper one, which is why none of this matters for a light team. Push the usage up and the metered line runs away. Routing pulls a third to a half straight back off it, and at the top end even moving to a flat Teams seat starts to win. But you can only make either move once you can see the bill clearly enough to compare, and most teams can’t.
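The crossover is easy to sketch. The seat fees below are the published base prices from the footnotes ($20 Enterprise, $100 Team Premium); the blended token rate is a made-up placeholder, and the flat seat is treated as if its included allowance never ran out, which it does in practice. It will not reproduce the calculator's figures; it only shows why a break-even point exists.

```python
def annual_metered(seats: int, tokens_per_seat_month: float,
                   seat_fee: float, usd_per_million_tokens: float) -> float:
    """Yearly cost of a metered plan: seat fee plus every token at a flat rate."""
    monthly_per_seat = seat_fee + tokens_per_seat_month / 1e6 * usd_per_million_tokens
    return 12 * seats * monthly_per_seat


def annual_flat(seats: int, seat_fee: float) -> float:
    """Yearly cost of a flat seat, ignoring allowance limits (a simplification)."""
    return 12 * seats * seat_fee


def break_even_tokens(metered_seat_fee: float, flat_seat_fee: float,
                      usd_per_million_tokens: float) -> float:
    """Monthly tokens per seat at which the metered plan costs as much as the flat one."""
    return (flat_seat_fee - metered_seat_fee) / usd_per_million_tokens * 1e6


# Hypothetical blended rate of $5 per million tokens.
print(break_even_tokens(20, 100, 5.0))  # 16000000.0
```

Below the break-even the metered door is cheaper because its seat costs less; above it, every extra token widens the gap.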

Which projects actually paid for themselves

Routing fixes what you pay per task. The harder question is what you bought with it, and you can’t read that off the invoice at all. Cost is only the half people argue about. Attribution is the half that quietly costs more.

When someone spends $300 of Opus this month, the question isn’t which model, it’s which project. If you can’t answer that, every dollar lands in the same undifferentiated OpEx bucket and gets expensed the moment it’s spent. Finance sees a charge from Anthropic and a number, can’t tie it to a person or a piece of work, and so can do nothing with it but watch it grow. It’s a second payroll with no cost centres.

A bill without context is just a bill, a number that went up. With context it turns into a map. You can see that the team building the new billing flow is burning $40k of model time a month, while an experiment nobody signed off on is burning $60k. You can see which features cost more to ship than they’ll ever earn back, and which cheap ones are quietly carrying the roadmap. That isn’t cost-cutting; it’s knowing where your leverage is, which work to feed and which to starve. Attributed spend stops being the number finance dreads and becomes the sharpest read you’ve got on where value is actually being made.
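Turning the bill into a map is mostly a group-by, once each call carries its context. The record shape below is an assumption for illustration, with the fields a per-call ledger would plausibly hold (person, project, model, cost class); the names and labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    person: str
    project: str
    model: str
    cost_class: str  # hypothetical labels, e.g. "capex_candidate" or "opex"
    usd: float


def spend_by(calls: list[Call], key: str) -> dict[str, float]:
    """Total spend grouped by one Call field, largest first."""
    if key not in {"person", "project", "model", "cost_class"}:
        raise ValueError(f"cannot group by {key!r}")
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[getattr(call, key)] += call.usd
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


calls = [
    Call("ana", "billing-flow", "opus", "capex_candidate", 40_000.0),
    Call("raj", "unapproved-experiment", "opus", "opex", 60_000.0),
]
print(spend_by(calls, "project"))
# {'unapproved-experiment': 60000.0, 'billing-flow': 40000.0}
```

The same ledger answers "which model" or "which person" by changing one argument, which is the point: the invoice can answer none of them.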

And it changes the accounting, not just the reporting. AI spend that goes into building new software can be capitalised and amortised over its useful life, just like traditional software development under IAS 38 or ASC 350-40 [3]. The blocker was never the accounting rules; it was the lack of attribution. You can’t capitalise what you can’t attribute, and the provider’s invoice attributes nothing. Flowstate ties every call to a person, a project, a model and a cost class, so the work building real value stops hiding in OpEx.

And the more of your work qualifies, the bigger this gets. If 70% of your development effort is genuinely building new product (and for a lot of teams it is), attribution shifts the bulk of that AI spend off this quarter’s P&L and onto the balance sheet, to be amortised over the years the software earns. On a seven-figure AI bill that isn’t housekeeping; it’s the difference between a margin hit now and an asset you recoup later. (Whether a given project qualifies is a judgement for your finance and audit team, not a blog post.)
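To put rough numbers on that, here is a sketch of the year-one P&L effect. It assumes straight-line amortisation with a full year charged in year one and a three-year useful life; both are illustrative assumptions, not guidance, and the real treatment is your auditor's call.

```python
def first_year_pnl_charge(annual_spend: float, capitalisable_share: float,
                          useful_life_years: int) -> float:
    """Year-one P&L charge: the expensed part plus one year of amortisation.

    Assumes straight-line amortisation and a full year of amortisation in
    year one. Both are simplifications for illustration.
    """
    if not 0.0 <= capitalisable_share <= 1.0:
        raise ValueError("capitalisable_share must be between 0 and 1")
    if useful_life_years < 1:
        raise ValueError("useful_life_years must be at least 1")
    capitalised = annual_spend * capitalisable_share
    expensed = annual_spend - capitalised
    return expensed + capitalised / useful_life_years


# $1.9m of AI spend, 70% assumed to qualify, hypothetical 3-year useful life.
print(round(first_year_pnl_charge(1_900_000, 0.70, 3)))  # 1013333
```

On those assumptions roughly $1.0m hits this year's P&L instead of $1.9m, with the rest amortised over the following two years.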

Where we fit

Flowstate is an intelligent proxy: think Zscaler, but for AI traffic. We don’t pool accounts and we don’t hold your contracts; you keep your own keys and your own deal with every provider your team uses. We sit in the request path and do three things to each call as it passes through: route it to the model the task actually needs, inspect it for the things that should never leave (source code, customer PII heading somewhere it shouldn’t), and log it against a person, a project and a cost class. That’s the visibility Enterprise charges a premium for, without the premium, and without handing anyone your contracts.
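The three steps fit in one small function. This is a toy sketch under stated assumptions, not how Flowstate is built: the class name is hypothetical, the router is passed in as a plain callable, and the inspection step is a deliberately naive email pattern standing in for real PII and source-code detection.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Naive pattern for illustration only; real PII detection needs far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class ProxySketch:
    route: Callable[[str], str]                    # maps a task label to a model name
    ledger: list[dict] = field(default_factory=list)

    def handle(self, person: str, project: str, cost_class: str,
               task: str, prompt: str) -> dict:
        model = self.route(task)                   # 1. route to the model the task needs
        blocked = bool(EMAIL.search(prompt))       # 2. inspect for data that should not leave
        entry = {                                  # 3. log against person, project, cost class
            "person": person,
            "project": project,
            "cost_class": cost_class,
            "model": model,
            "blocked": blocked,
        }
        self.ledger.append(entry)
        return entry
```

Blocked calls are still logged here, on the assumption that a refused request is something you would want to see in the ledger too.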

Because we’re a proxy and not an account pool, where you sit on a provider’s terms stays your decision, made with the whole picture in front of you instead of in the dark. You can see what each deployment really costs, route the spend down, and turn the taps up or down per team based on how much risk you’re willing to carry. The two leaks above are the same machine doing two jobs: sending each request to the right model, and making the deployment you’ve chosen legible enough to manage.

A couple of caveats, plainly. Flowstate makes a deployment observable and controllable; it doesn’t rewrite your contract. If you need a BAA, data residency or a contractual no-training clause, that’s the Enterprise door, and there our job is the routing and the ledger: keeping the metered bill from running away from you. And the whole thing is a heavy-usage story: for a light team the metered bill never gets near the point where any of this pays for itself, as the calculator shows the moment you drag the usage down.

For years, the trade looked binary: let people reach for whatever model is in front of them and eat the bill, or lock the whole thing down and police every prompt by hand.

This shouldn’t be a binary choice between eating the bill and grinding your team to a halt with usage limits. Route the task, and you stop paying Opus prices for Sonnet work. Attribute the spend, and AI stops being an undifferentiated margin hit. You just need a proxy in the middle to give you the controls. [4]

Footnotes

[1] Ding et al., Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing, ICLR 2024, reports up to 40% fewer calls to the large model with no drop in response quality. Ong et al., RouteLLM: Learning to Route LLMs with Preference Data, reports cost reductions of over 2× on parts of its benchmark without compromising quality. Vendor routers advertise higher (40–70%); I’ve modelled to the peer-reviewed figure. arxiv.org/abs/2404.14618 · arxiv.org/abs/2406.18665

[2] Anthropic decoupled Enterprise seat fees from token usage in 2026: per the Claude Help Center, “Usage isn’t included in the seat fee… Every token your team uses — in chat, Claude Code, or Cowork — is billed at standard API rates on top of your seat cost.” Per-seat session limits differ by tier, with a Team Premium seat about 6.25× Pro’s per-session limit. Published base seats are $20 (Team Standard and Enterprise) and $100 (Team Premium); true Enterprise pricing is sales-negotiated. support.claude.com/en/articles/9797531 · support.claude.com/en/articles/9266767 · claude.com/pricing

[3] IAS 38 Intangible Assets (development phase) and ASC 350-40 (internal-use software) govern when development cost may be capitalised rather than expensed. Eligibility is a matter of judgement and audit, not assertion; nothing here is accounting advice.

[4] I co-founded Flowstate, so the obvious conflict-of-interest disclosure applies. The routing figures above are modelled from public pricing and the cited research, not from a customer account.