BLOG SERIES — POST 5 OF 7

Why Cheaper Tokens Mean Higher Bills.

The Jevons Paradox, AI Inference Economics, and Why Every CFO Instinct About Falling Prices Points the Wrong Way

This is Post 5 of 7: The AI Inference Cost Crisis. Post 4 covered the agentic token multiplier and margin math. Post 6 covers the regulatory calendar — what CFOs must act on before enforcement begins.

Every CFO who has engaged with the AI cost problem has heard the same counterargument from technology leadership: token prices are falling rapidly, so the cost problem will sort itself out. It is a reasonable instinct — and it is precisely wrong. The economic dynamic at work in AI inference markets is one of the most thoroughly documented patterns in industrial history. Its lesson is the opposite of what the falling-prices argument implies.

Understanding why requires going back to 1865 and a British economist named William Stanley Jevons — and then coming forward to January 2025 and a post from the CEO of Microsoft.

The Deflation Is Real

Begin with what is true. AI token prices have fallen at a rate without precedent in enterprise technology. Stanford’s AI Index 2025¹ tracks the price of querying a model that performs at GPT-3.5 level: $20 per million tokens in November 2022, and $0.07 per million tokens by October 2024, a 280-fold reduction in under two years. Andreessen Horowitz’s ‘LLMflation’ analysis² finds that inference costs for equivalent performance have fallen roughly tenfold per year, with GPT-3-quality inference dropping from $60 per million tokens in 2021 to $0.06 in 2024: a thousand-fold reduction in three years. Epoch AI’s research³ finds that the rate of decline has accelerated, from roughly 50 times per year before January 2024 to roughly 200 times per year after. The deflation is real, accelerating, and remarkable.
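The cumulative figures above are easier to compare once they are converted to annual rates. A minimal sketch, assuming a constant (geometric) rate of decline within each period and treating November 2022 to October 2024 as 23 months, since the sources do not give exact days. The raw ratio $20 / $0.07 is about 286; the "280-fold" figure is the source's rounding.

```python
# Illustrative arithmetic only: converts the cumulative price declines cited
# above into equivalent annual rates, assuming a constant geometric decline.

def annual_factor(total_fold: float, months: float) -> float:
    """Constant yearly fold-reduction that compounds to total_fold over `months`."""
    return total_fold ** (12.0 / months)

# (label, cumulative fold reduction, months elapsed), using the figures cited in the text.
# The month counts are assumptions based on the stated start and end dates.
declines = [
    ("Stanford AI Index, GPT-3.5 level ($20 -> $0.07, Nov 2022 -> Oct 2024)", 20 / 0.07, 23),
    ("a16z LLMflation, GPT-3 quality ($60 -> $0.06, 2021 -> 2024)", 60 / 0.06, 36),
]

for label, fold, months in declines:
    print(f"{label}: {fold:,.0f}x total, ~{annual_factor(fold, months):.1f}x per year")
```

Under these assumptions the Stanford series works out to roughly 19x per year and the a16z series to 10x per year, in line with the "roughly tenfold per year" headline. Epoch AI's 50x and 200x figures are already annual rates, so they need no conversion.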

The question is not whether token prices are falling. They are. The question is what happens to total AI spend when they do — and the answer, consistently and predictably, is that it goes up.

What Jevons Observed in 1865

William Stanley Jevons was a British economist who, in 1865, published a book called “The Coal Question.” His central observation ran against every intuition of the industrialists of his day: improvements in the efficiency of coal-burning steam engines had not reduced England’s consumption of coal. They had dramatically expanded it. The more efficient the engine, the more applications of steam power became economically viable. New factories, new railways, and new mines all became feasible under the new economics. Total coal consumption surged even as the coal burned per unit of useful work fell.

Jevons stated this as a general principle: when the efficiency of using a resource increases, total consumption of that resource increases rather than decreases, because the lower effective cost opens new uses that were previously uneconomical. This is the Jevons Paradox, and the same pattern has recurred across major technology cost reductions since: semiconductor performance per dollar, internet bandwidth, cloud computing, and now AI inference.
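Economists usually express the condition for the paradox through the price elasticity of demand. A minimal sketch, assuming a textbook constant-elasticity demand curve, Q = A·P^(−ε). This is an illustrative model, not one drawn from the sources cited here. Total spend P·Q = A·P^(1−ε) rises as price falls whenever ε > 1, and falls when ε < 1.

```python
# Illustrative only: constant-elasticity demand, Q = A * P**(-elasticity).
# Spend = P * Q = A * P**(1 - elasticity), so spend rises as P falls iff elasticity > 1.

def tokens_demanded(price: float, elasticity: float, scale: float = 1.0) -> float:
    """Token volume demanded at a given per-token price (hypothetical demand curve)."""
    return scale * price ** (-elasticity)

def total_spend(price: float, elasticity: float, scale: float = 1.0) -> float:
    """Total spend = price x volume."""
    return price * tokens_demanded(price, elasticity, scale)

old_price, new_price = 1.0, 0.1  # a 90% price cut, in arbitrary units

for elasticity in (0.5, 1.0, 1.5):
    ratio = total_spend(new_price, elasticity) / total_spend(old_price, elasticity)
    print(f"elasticity {elasticity}: spend changes by a factor of {ratio:.2f}")
```

With a 90% price cut, spend falls to about 0.32x its old level at ε = 0.5, is unchanged at ε = 1, and rises to about 3.16x at ε = 1.5. The Jevons case is the elastic one, ε > 1.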

The AI Market Is Living the Paradox Right Now

When DeepSeek released its low-cost model in January 2025, Nvidia’s stock fell sharply. The market’s logic: cheaper inference means less GPU demand. Then Satya Nadella posted: “Jevons paradox strikes again!”⁴ As AI gets more efficient and accessible, he wrote, we will see its use skyrocket, turning it into a commodity we just cannot get enough of. Nvidia stock recovered. The market had misread the economics.

Sam Altman’s February 2025 blog post⁵ made the same point: the cost to use a given level of AI falls about tenfold every twelve months, and lower prices lead to much more use. Every time a new tier of model capability becomes affordable, a new tier of use cases becomes viable. Every new tier of use cases adds to total token consumption.

The market data is unambiguous. Menlo Ventures’ 2025 enterprise AI report⁶ shows enterprise generative AI spending growing from $1.7 billion in 2023 to $11.5 billion in 2024 to $37 billion in 2025: a 22-fold increase in two years, occurring precisely while per-token prices were falling more than 90 percent. CloudZero’s May 2025 survey⁷ found average monthly enterprise AI spend up 36 percent year over year, even as per-token prices continued declining. The cheaper tokens became, the more organizations spent on them.

Enterprise GenAI spending grew 22x in two years. Per-token prices fell more than 90% over the same period. This is the Jevons Paradox running at technology speed. The CFO who is planning for AI costs to fall as token prices fall is planning for the wrong scenario.
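Because spend equals price times volume, the two headline figures together imply how much consumption must have grown. A back-of-envelope sketch, assuming (as a simplification the sources do not make) that all of the spend is billed per token and that the 90% price decline applies uniformly across it:

```python
# Back-of-envelope only: spend = price x volume, so
# volume multiple = spend multiple / price multiple.
# Assumes all enterprise GenAI spend is per-token and that the price decline
# applies uniformly; neither assumption comes from the cited sources.

spend_2023_bn = 1.7    # Menlo Ventures, enterprise GenAI spend, 2023 ($B)
spend_2025_bn = 37.0   # Menlo Ventures, enterprise GenAI spend, 2025 ($B)
price_multiple = 0.10  # price after a 90% decline; "more than 90%" makes this an upper bound

spend_multiple = spend_2025_bn / spend_2023_bn
implied_volume_multiple = spend_multiple / price_multiple

print(f"Spend multiple: {spend_multiple:.1f}x")
print(f"Implied token-volume multiple: at least {implied_volume_multiple:.0f}x")
```

Under those assumptions, a 21.8x rise in spend combined with a 90% price cut implies token volume grew by at least roughly 218x. The deeper the actual price decline, the larger the implied volume growth.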

The Frontier Scarcity Counterforce

Gartner’s March 2026 research⁸ adds a dimension to the Jevons dynamic that makes the planning problem more complex: the deflation is not uniform across capability levels. Commodity inference (running a two-year-old model at commodity performance) is approaching zero cost. Frontier inference (running the latest reasoning models with the longest context windows) is not on the same curve. The compute infrastructure required for frontier AI remains constrained, and frontier model providers are already rationing access. Anthropic introduced weekly usage rate limits in August 2025, and Cursor and OpenAI made similar usage adjustments in the same period.

This creates a two-tier planning problem. Workloads that can run on commodity inference will see continued price deflation and the associated Jevons consumption expansion. Workloads that require frontier reasoning capability will find that frontier prices are declining more slowly than the headline numbers suggest, and that access is gated by rate limits that were not present twelve months ago. A budget model that applies the commodity deflation rate to frontier inference workloads will systematically underestimate cost.
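The two-tier problem can be made concrete with a toy budget model. A minimal sketch; every workload volume, price, and deflation rate below is a hypothetical placeholder chosen for illustration, not a figure from Gartner or any other source.

```python
# Toy two-tier budget model. All numbers are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    tokens_m_per_year: float     # current annual volume, millions of tokens
    price_per_m: float           # current price, $ per million tokens
    annual_price_factor: float   # next year's price as a fraction of this year's
    annual_volume_growth: float  # next year's volume as a multiple of this year's

def next_year_cost(tier: Tier, price_factor: float) -> float:
    """Projected cost next year, given an assumed price factor for this tier."""
    volume = tier.tokens_m_per_year * tier.annual_volume_growth
    return volume * tier.price_per_m * price_factor

commodity = Tier("commodity", 50_000, 0.50, 0.10, 8.0)  # steep deflation, fast growth
frontier = Tier("frontier", 5_000, 15.00, 0.70, 3.0)    # slower deflation

# Tier-aware model: each tier follows its own price trajectory.
actual = sum(next_year_cost(t, t.annual_price_factor) for t in (commodity, frontier))
# Flawed model: the commodity deflation rate is applied to both tiers.
naive = sum(next_year_cost(t, commodity.annual_price_factor) for t in (commodity, frontier))

print(f"Tier-aware projection:  ${actual:,.0f}")
print(f"Single-rate projection: ${naive:,.0f}")
print(f"Underestimate:          {1 - naive / actual:.0%}")
```

With these placeholder inputs, the single-rate model projects $42,500 against a tier-aware $177,500, an underestimate of about 76%. The size of the gap depends entirely on the assumed inputs; the direction does not, as long as frontier prices fall more slowly than commodity prices.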

What This Means for CFO Planning

Goldman Sachs equity research⁹, cited by The Stack, found that inference budgets in engineering at some firms were already approaching 10 percent of headcount cost, with projections placing them on par with headcount within several quarters. That finding was documented before the full agentic deployment wave, before Anthropic’s enterprise pricing unbundling, and before the multiplier effect of agentic workflows became a production reality at most organizations.
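Moving from 10 percent of headcount cost to parity is a tenfold increase, which shows how steep that projected trajectory is. A minimal sketch of the implied growth rate, assuming headcount cost stays flat; the horizons of 4, 6, and 8 quarters are hypothetical readings of "several quarters", not figures from the Goldman Sachs research.

```python
# Illustrative arithmetic: what constant quarterly growth in inference spend would
# take it from ~10% of headcount cost to parity (100%) over a given horizon?
# Assumes headcount cost stays flat; the horizons are hypothetical.

start_share = 0.10   # inference budget as a share of headcount cost
target_share = 1.00  # parity with headcount cost

def required_quarterly_growth(quarters: int) -> float:
    """Constant quarter-over-quarter growth rate needed to reach parity."""
    return (target_share / start_share) ** (1 / quarters) - 1

for quarters in (4, 6, 8):
    print(f"{quarters} quarters: ~{required_quarterly_growth(quarters):.0%} growth per quarter")
```

Reaching parity in four quarters requires roughly 78% growth per quarter; in eight quarters, roughly 33%.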

The planning implication is the inverse of the conventional instinct. Waiting for AI costs to fall before building governance means waiting for the condition that will accelerate the spend problem, not resolve it. Lower token prices create more viable use cases. More viable use cases create more token consumption. More token consumption creates higher total bills. The compounding does not stop because the unit price is lower — it accelerates.

Three Planning Principles for the Jevons Environment

Plan for consumption growth, not consumption stability. A falling per-token price in an AI budget model is not a cost reduction. It is the condition that will drive usage expansion. Any AI budget that holds total spend flat against a forecast of falling token prices is underforecasting consumption.

Govern the workload, not the price. The per-token price is a vendor variable the CFO cannot control. The workload — what gets run, how often, at what token volume, whether it is generating returns — is a governance variable the CFO can control. Organizations that contain AI cost growth in a Jevons environment govern at the workload level.
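Governing at the workload level means tracking the variables the organization controls: which workloads run, how often, and at what token volume. A minimal sketch of a monthly workload ledger; the workload names, volumes, prices, and budgets are hypothetical, and a real implementation would pull usage from billing or gateway logs rather than hard-coded values.

```python
# Hypothetical workload-level cost ledger. All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    runs_per_month: int
    tokens_per_run: int
    price_per_m_tokens: float  # vendor price: an input the CFO does not control
    monthly_budget: float      # governance limit the organization sets

    def monthly_cost(self) -> float:
        """Runs x tokens per run, converted to millions of tokens, x price."""
        return self.runs_per_month * self.tokens_per_run / 1_000_000 * self.price_per_m_tokens

def over_budget(workloads: list[Workload]) -> list[str]:
    """Names of workloads whose projected monthly cost exceeds their budget."""
    return [w.name for w in workloads if w.monthly_cost() > w.monthly_budget]

workloads = [
    Workload("contract-summaries", 20_000, 8_000, 3.00, 600.0),
    Workload("agentic-research", 2_000, 400_000, 3.00, 1_500.0),
]

for w in workloads:
    print(f"{w.name}: ${w.monthly_cost():,.2f} vs budget ${w.monthly_budget:,.2f}")
print("Over budget:", over_budget(workloads))
```

In this example, halving the vendor price would bring the agentic workload to $1,200, inside its budget, but doubling its run count at the lower price would put it back at $2,400. The workload, not the price, is the lever.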

Distinguish between commodity inference and frontier inference in the cost model. The 280x price decline Stanford documented applies to commodity-level capability. Frontier reasoning models generating enterprise value in complex analytical workflows are not on the same price curve, so their cost lines need their own, slower deflation assumptions.

Next in the series: Post 6 covers the regulatory imperative — the EU AI Act, the EU Data Act, Texas TRAIGA, California SB 53, SR 11-7 for financial institutions, and the compliance calendar that is already running.

Start the conversation.

GUUT helps enterprise organizations govern AI inference spend at the delivery layer — eliminating the token multiplier for structured, repeatable intelligence outputs.

Eric Ford  |  Chief Data and Analytics Officer  |  GUUT

eric.ford@guutit.com
guutit.com

Sources & Citations

  1. Stanford HAI, “AI Index 2025,” April 2025. Token price: ~$20/M (Nov 2022) → ~$0.07/M (Oct 2024) — 280x reduction in ~18 months. https://hai.stanford.edu/news/ai-index-2025-state-of-ai-in-10-charts
  2. Andreessen Horowitz, “LLMflation,” November 2024. Inference cost falls ~10x/year. GPT-3 quality: $60/M (2021) → $0.06/M (2024). https://a16z.com/llmflation-llm-inference-cost/
  3. Epoch AI, “LLM Inference Price Trends,” 2024. Rate accelerated from 50x/yr pre-Jan 2024 to 200x/yr post-Jan 2024. https://epoch.ai/data-insights/llm-inference-price-trends
  4. Satya Nadella on X, January 27, 2025. https://x.com/satyanadella/status/1883753899255046301
  5. Sam Altman blog, “Three Observations,” February 9, 2025. https://blog.samaltman.com/
  6. Menlo Ventures, “2025: The State of Generative AI in the Enterprise,” December 2025. Enterprise GenAI spend: $1.7B → $11.5B → $37B. https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/
  7. CloudZero, “The State of AI Costs,” May 2025. Avg monthly AI spend $85,521, up 36% YoY. https://www.cloudzero.com/state-of-ai-costs/
  8. Gartner, “By 2030, LLM Inference Will Cost 90% Less,” March 25, 2026. Total inference spend projected to increase despite falling per-token rates. https://www.gartner.com/en/newsroom/press-releases/2026-03-25-gartner-predicts-that-by-2030-performing-inference-on-an-llm-with-1-trillion-parameters-will-cost-genai-providers-over-90-percent-less-than-in-2025
  9. Goldman Sachs Equity Research (via The Stack, 2025). Inference budgets approaching 10% of headcount cost. https://www.thestack.technology/inference-budgets-are-breaking-the-bank-what-now/
