Why AI tokens will send your enterprise cloud bill sky-high again

AI usage is moving to token-based pricing, which is far more expensive than the previous flat-fee model. Measuring the value derived from AI remains an unsolved problem. At FinOps X 2026, experts declared tokens the atomic unit of AI, and while per-token costs have fallen, total spending is skyrocketing due to soaring usage—a Jevons paradox. Enterprises are scrambling to build new FinOps frameworks to manage AI token costs and tie them to business value.

Source: ZDNet AI

SAN DIEGO -- A few months ago, most people paid a flat fee for their AI access. That was then. This is now. The days of AI pricing as a loss-leader are over. As everyone here at FinOps X 2026 has been discussing, AI's token-based pricing model is becoming the foundation of the entire generative AI economy, and it's far more expensive than the flat-fee models that preceded it. Just ask Copilot users, who are having fits over the new token-based pricing.

For many enterprise customers, this recalls the early days of cloud pricing, when they had to deal with volatile invoices and business models shifting under their feet. Underneath the confusion, tokens are quietly standardizing how labs translate scarce GPU capacity into billable units, how enterprises measure AI usage, and how software vendors reprice their products.

Tokens: The atomic units of AI

In this new world, the token is the basic unit of AI work. J.R. Storment, executive director of the FinOps Foundation, calls it "the atomic unit of AI." In his FinOps X keynote, Storment said that "tokens serve more roles in the modern economy than almost any other commodity has in modern history, maybe, maybe oil in the 20th century." Tokens, he told the audience, are simultaneously "the unit of output from all of the hardware and compute and data centers," "how the labs price their outputs and inputs," and "the value unit that enterprises are looking to monetize."

That abstraction is precisely why labs and hyperscalers like it. Instead of charging for GPU types, memory, and power directly, they can expose a single billable unit, typically priced per million tokens, across a bewildering mix of architectures and deployment topologies.
OpenAI, Anthropic, Google, and others now publish per-model rate cards with separate prices for input tokens (everything you send the model) and output tokens (everything it generates back), usually quoted in dollars per million tokens.

So what are tokens anyway? An AI token, said Storment, "is the smallest unit a word or phrase can be broken down into when being processed by a large language model (LLM)." Before a model can work with text, it breaks the text into fragments, a process called tokenization. For English, a common rule of thumb is that "one token is roughly four characters, or about three-quarters of a word," so "100 tokens ≈ 75 words."

The token hides enormous complexity. As SAP's FinOps team put it in their session, "You pay per token, and this little token hides an enormous complexity underneath predictability," from model choice and quantization to how aggressively you use caching or agents. That complexity is exactly what FinOps teams are now being asked to decode.

The all-you-can-eat token era is over

If 2023 through early 2025 was the era of cheap experiments, the last 18 months have been a rude awakening. Storment describes three distinct phases: the "old days of AI" before ChatGPT; the "good old days of AI," when chatbots "could write some decent code"; and the post-November-2025 world, when major model releases "took AI from pretty good to really good."

The good old days were the era of all-you-can-eat tokens and subscriptions, which brought a brief period of "token maxing," when everybody was excited about token leaderboards showing who had the most usage. Today, those leaderboards are painfully obsolete because no one can afford to waste tokens. As Amazon senior vice president Dave Treadwell begged, "Please don't use AI just for the sake of using AI."
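To make the rate-card arithmetic concrete, here is a minimal Python sketch. It applies the four-characters (or three-quarters-of-a-word) per token rule of thumb to estimate token counts, then prices a single call from separate input and output rates quoted in dollars per million tokens. The function names and the $3/$15 per-million rates in the example are hypothetical, chosen only for illustration; they are not any provider's published prices, and a real tokenizer will produce different counts.

```python
# Rough arithmetic behind token-based pricing (illustrative sketch only).
# Assumption: ~4 characters or ~0.75 words per English token, per the common
# rule of thumb; real tokenizers vary by model and by language.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def tokens_from_chars(text: str) -> int:
    """Estimate a token count from character length (~4 chars per token)."""
    return round(len(text) / CHARS_PER_TOKEN)


def tokens_from_words(word_count: int) -> int:
    """Estimate a token count from a word count (~0.75 words per token)."""
    return round(word_count / WORDS_PER_TOKEN)


def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Cost of one call when input and output tokens are billed at
    separate rates quoted in dollars per million tokens."""
    return (input_tokens * input_usd_per_m
            + output_tokens * output_usd_per_m) / 1_000_000


if __name__ == "__main__":
    prompt = "Summarize this quarter's cloud invoice by business unit. " * 20
    est_input = tokens_from_chars(prompt)
    # Hypothetical rates: $3 per million input tokens, $15 per million output.
    cost = call_cost_usd(est_input, 500, 3.0, 15.0)
    print(f"~{est_input} input tokens, ~${cost:.4f} for this call")
```

With those illustrative rates, 2,000 input tokens cost $0.006 while 500 output tokens cost $0.0075, so the input/output split on a rate card can matter as much as the raw token count.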
Between June and November last year, Storment said, global token usage grew along a "nice linear path." Then those new models and agentic patterns landed. Context windows "went from a few thousand or tens of thousands or hundreds of thousands up to millions of tokens in a single conversation," and "agentic hit the scene and exploded," adding "loops and retries and corrections and all this insanity."

Companies had happily subsidized that behavior… until they saw the bills. Storment recounted how some "$200-a-month" power users actually cost "upwards of tens of thousands of dollars a month when you were running everything on the latest model." For example, SemiAnalysis, an AI analytics company, recently estimated that a $200 Anthropic plan used to provide $8,000 worth of Claude tokens, while a comparable OpenAI plan provided $14,000 worth of Codex tokens. Those days, and those prices, are over. Going forward, companies will have to pay the real cost of AI tokens.

"So now what matters more than anything is AI value," Storment told the room. "We've got to bring value back to what we're doing… We're in an era where tokens are the main measurement. We're in an era where tokens are in everything in software, and they're driving a lot of the global token economy."

Scarcity keeps token prices from collapsing

If Moore's law and hyperscale competition were the only forces at work, you'd expect token prices to keep falling. To some extent, they have. "Since 2023, token prices have fallen dramatically," Storment acknowledged. SAP's internal telemetry tells a similar story. "This is our cost per token over the same time period," said SAP data scientist Maida Nazifi, showing an internal chart. "It's clearly trending down, even with a bit of flattening at the end. And honestly, it matches the narrative that everyone wants to believe, right?
Token prices keep on falling."

But both Storment and SAP stress the caveat: The floor may be in sight. Storment notes that if "you look at the top labs and their pricing, you go back to the Wayback Machine. Token prices have been pretty flat since November 2025," which he links directly to hardware and power constraints: "We can't get enough hardware, we can't get enough power… we're seeing backlogs, we're seeing long commitment periods, and we're seeing shortages."

He cited Intel's CEO as saying he doesn't expect real relief in GPU and related component supply "until 2028." Nazifi and SAP VP Frederik Pohl see the same pattern at SAP. As Pohl warned, "We have supply chain constraints, we have hardware prices that are rising, and the prices of new frontier models are growing ever more expensive."

The net result is a classic Jevons paradox: Falling unit cost, exploding total spend. "Even with falling token prices, we see that our spend is still rising, and that's the famous paradox," Pohl said. "At our scale, we had unit costs falling, but we saw in some months that spend was doubling."

Storment thinks the paradox is just beginning. Goldman Sachs, he said, estimates global usage rising from "6 quadrillion tokens" today to "120 quadrillion forecasted tokens" within about 3.5 years, a 20-fold increase. Even if token prices drop further once supply loosens, they are unlikely to fall fast enough to offset that much growth in volume.

FinOps discovers token economics

For the FinOps community, which cut its teeth on cloud right-sizing and reserved instances, token pricing is both familiar and completely alien. The familiar part is that it's usage-based, the invoices are big, and forecasting is hard. The alien part?
The unit is tied to language, not infrastructure, and it changes as fast as model releases, not as slowly as server depreciation schedules.

Pohl asserted that "AI does not just stretch the cloud playbook, it breaks it; AI is more different from the cloud than cloud was to the data center." Contrasting models with CPUs, he said, "AI models are nothing like that… they have their unique strengths and weaknesses… They have different cost profiles, and swapping out an LLM is not just a pricing decision. It's also a quality-of-output decision."

SAP's experience is a case study in how enterprises are retooling. Its Business AI platform, Pohl explained, runs across "multiple different LLMs," including "ChatGPT, Anthropic, Gemini… other open source models," layered on "different hyperscalers."

When SAP first went looking for AI cost data, "we immediately hit a wall," Nazifi recalled. "The existing [cloud] tools were very blind to the nuance of LLMs, so they could tell us we spent this amount on [a provider], but not really which model, or how much the model. It really was like trying to optimize your gold mining operation by looking at the total weight of ore."

So they did it the hard way: "We pulled data manually, we merged data across tables, and then we had this first picture by hand." Once that picture reached SAP's global infrastructure lead and then the CTO, it transformed the conversation. "Within days, it went from like, 'OK, this is interesting, keep me posted,' to… 'I need this regularly, I need more,'" Nazifi said.
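Nazifi's complaint, spend visible per provider but not per model, is at heart an aggregation problem. The minimal Python sketch below assumes a hypothetical per-call usage log: the `UsageRecord` fields, the provider and model names, and the dollar figures are all invented for illustration and are not SAP's schema or any provider's export format. It rolls the same records up by provider or by model, and computes a blended cost per million tokens for each model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One billed AI call. Hypothetical schema for illustration only."""
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


def spend_by(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Total cost grouped by a record attribute, e.g. 'provider' or 'model'."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[getattr(r, key)] += r.cost_usd
    return dict(totals)


def blended_usd_per_m(records: list[UsageRecord]) -> dict[str, float]:
    """Dollars per million tokens for each model, lumping input and output
    tokens together (a simplification; real rate cards price them separately)."""
    cost: dict[str, float] = defaultdict(float)
    tokens: dict[str, int] = defaultdict(int)
    for r in records:
        cost[r.model] += r.cost_usd
        tokens[r.model] += r.input_tokens + r.output_tokens
    return {m: cost[m] * 1_000_000 / tokens[m] for m in cost if tokens[m] > 0}


if __name__ == "__main__":
    log = [
        UsageRecord("provider-a", "large-model", 800_000, 200_000, 9.00),
        UsageRecord("provider-a", "small-model", 4_000_000, 1_000_000, 2.50),
    ]
    print(spend_by(log, "provider"))  # the single total a provider-level view shows
    print(spend_by(log, "model"))     # which model actually drives the bill
    print(blended_usd_per_m(log))     # effective price per million tokens, by model
```

In this toy log, the provider-level total of $11.50 hides that one model accounts for $9.00 of it on one-sixth of the tokens, at 18 times the other model's blended rate per million.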
Pohl summed up the FinOps lesson: "If you have a CTO asking for a number, that's not a question, it's a mandate."

That demand forced SAP to formalize an internal AI FinOps framework built around three pillars:

Spend visibility: "What we consume, how we consume it, and where we consume it," across models, platforms, business units, and regions.
Economics: "How efficiently are you leveraging AI," measured with token-level metrics such as input/output ratios, cached-token ratios, and "token to spend drift," which shows whether costs are rising because of volume or because of a mix shift to pricier models.
Value: Connecting AI spend to business outcomes with "cost per use case" and "inference cost by revenue," so teams can tell "which AI features are economically viable" and whether "your AI product margins actually work."

"Every token needs to earn its cost," Pohl said, echoing Nvidia CEO Jensen Huang's phrase "token factory effectiveness." That factory spans everything from silicon and data center leases to model routing and prompt design.

Tokenomics: Beyond just counting tokens

If FinOps is about cost control and accountability, tokenomics, at least as the Linux Foundation is positioning it, is about the full lifecycle of tokens as an economic good. Storment defines it as "the emerging discipline of converting energy and capital into AI tokens and resources, consuming those tokens and all the related technology to drive efficient intelligence, and then ultimately drive value on the backend."

In his view, that breaks into three buckets:

Production: "Take energy and capital and create tokens," whether in cloud data centers, colos, edge devices, or, as Elon Musk likes to imagine, "data centers in space."
Consumption: "All the allocation, forecasting, and optimization, which kind of sounds a lot like FinOps for AI," spanning model routing, quantization choices, agent limits, and cache strategies.
Value: "How do we monetize those tokens? How do we adjust our pricing based on the cost of those tokens?
What are the labor implications in our entire company based on the cost of that AI?"

That last piece is where token pricing collides directly with software-as-a-service (SaaS) business models. As Storment told me in an interview, "Tokenomics is getting over to the price of the tokens and how effectively we manage this production and consumption of them is changing pricing models for Fortune 100 companies."

He points to Microsoft's moves at GitHub, shifting Copilot toward more explicit usage-based charging, as an early example. Developers "who love the unlimited tokens" are now "really just angry at Microsoft," because their implicit subsidy has vanished.

The labs themselves are also tightening the screws in ways that are i
