The Death of the Flat-Rate Promise
This is Post 2 of 7: The AI Inference Cost Crisis. Post 1 introduced the series. Post 3 covers the cloud and data platform governance failures that set the stage.
For most of the last decade, the enterprise software procurement model rested on a single economic premise: once code is written, it costs almost nothing to serve one more user. This is the zero-marginal-cost logic that made SaaS work as a business model and made multi-year subscription contracts rational for buyers. Bessemer Venture Partners’ research [1] documents the financial expression of that premise: mature cloud companies sustain median gross margins of 65 to 70 percent, with top-quartile performers clearing 80 to 85 percent. Those margins justified the economics on both sides of the table. Buyers got predictable costs. Vendors got compounding revenue on an essentially fixed cost base. The model was elegant and stable, and, as it turns out, dependent on conditions that AI inference does not meet.
The Premise That Broke First
The first crack appeared not in AI but in traditional SaaS. By February 2023, OpenView’s research [2] found that 61 percent of SaaS companies had already adopted some form of usage-based pricing, with another 21 percent testing it. The driver was not ideology; it was cost structure. Vendors with genuine variable costs could not sustain flat-rate contracts without either losing money on heavy users or overcharging light ones. Usage-based pricing was the market acknowledging that the zero-marginal-cost premise had limits.
AI products now launch with the same pricing logic built in: Salesforce’s Agentforce at $2 per conversation, Intercom’s Fin AI Agent at $0.99 per resolution, and Microsoft Security Copilot at $4 per security compute unit per hour. These are not pricing experiments. They are admissions that the costs are real and variable, and that flat-rate contracts cannot honestly represent them. AI inference did not introduce this shift. It accelerated it past the point of no return.
The Genuine Cost of Frontier Inference
Every query sent to a large language model consumes GPU cycles, electricity, and cooling, and wears down hardware that depreciates with use. These costs are not theoretical. SaaStr’s December 2025 analysis [3] puts numbers on them: OpenAI’s compute margin was roughly 35 percent in January 2024 and rose to approximately 70 percent by October 2025, an improvement that required sustained, costly engineering investment. GitHub Copilot, priced at $10 per user per month at launch, was reportedly losing more than $20 per user per month on average. Anthropic posted gross margins of negative 94 to negative 109 percent through 2024. These are not growing pains. They are the genuine unit economics of frontier AI inference at commercial scale.
The vendors had three options: raise prices, restrict usage, or wait for efficiency gains to close the gap. In practice, they did all three. But usage has expanded faster than per-token prices have fallen, so the economics that required a subsidy at launch still require a managed cost-recovery strategy at scale.
GitHub Copilot lost more than $20 per user per month at a $10 price point. The margin on AI delivery has never been what the subscription price implied.
How the Unbundling Is Happening
The clearest example of the structural shift is Anthropic’s November 2025 enterprise pricing change. The Register reported [4] in April 2026 that Anthropic removed token bundles from its enterprise seat fee entirely: every token is now billed at standard API rates on top of the base seat, and the legacy plans were discontinued no later than March 8, 2026. Adrien Laurent, CEO of IntuitionLabs, described what this means in practice: for clients whose seat fee was already only 20 percent of the total bill, the shift is a formality. For those who had benefited from the bundled subsidy, it is a structural cost increase that arrives at renewal, without the ability to renegotiate from the prior year’s baseline.
Anthropic’s move is representative, not an outlier. Microsoft’s Copilot Credit model introduced consumption-metered billing for agent actions in September 2025. OpenAI’s enterprise tier has moved progressively toward usage-based pricing for API-heavy workloads. Google’s Gemini for Workspace prices advanced AI actions on a per-event basis. The pattern across the major AI providers is consistent: seat-based access for entry-level usage, consumption billing for anything that scales.
The Scale of What Has Already Changed
Menlo Ventures’ 2025 enterprise AI report [5] documents the trajectory: enterprise generative AI spending grew from $1.7 billion in 2023 to $11.5 billion in 2024 and $37 billion in 2025. Seventy-four percent of startup compute workloads are now inference. CloudZero’s May 2025 survey [6] of more than 500 organizations put average monthly enterprise AI spend at $85,521, up 36 percent year over year, with the share of organizations spending more than $100,000 per month doubling in a single year.
Snowflake’s fiscal 2026 results [7] show what consumption pricing looks like when it compounds at enterprise scale: fourth-quarter product revenue grew 30 percent year over year to $1.23 billion, remaining performance obligations reached $9.77 billion (up 42 percent), and net revenue retention held at 125 percent. A 125 percent net revenue retention rate means the customers Snowflake already had a year earlier now spend roughly 25 percent more, net of churn and contraction. Snowflake is growing because existing customers are consuming more. That is the vendor’s financial expression of consumption pricing. From the buyer’s side, it is a cost that scales with usage, compounds without active management, and arrives on an invoice the procurement team did not model when it signed the original contract.
What This Means for the Finance Organization
The death of the flat-rate promise has three concrete implications for how the finance organization needs to operate.
Contract structure now requires a different negotiation model. The relevant question at renewal is no longer price per seat. It is the relationship between the seat fee, the included usage allowance, and the overage rate. Organizations that sign multi-year AI contracts without explicit terms governing all three dimensions are accepting open-ended exposure on a cost that scales with every deployment decision their technology teams make after the contract is signed.
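To make the three dimensions concrete, here is a minimal Python sketch of a monthly bill under a seat-plus-overage contract. The `AIContract` class, the `monthly_bill` function, and every price and volume are hypothetical. The sketch assumes the token allowance is pooled across seats and that overage is billed at one flat per-million-token rate; real contracts may tier rates or allocate allowances per seat. Setting the allowance to zero models the same seat price with the bundle removed.

```python
# Hypothetical model of the three contract dimensions: seat fee,
# included usage allowance, and overage rate. All numbers are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class AIContract:
    seat_fee: float             # assumed monthly fee per seat, in dollars
    included_tokens: int        # assumed monthly token allowance per seat (0 = none)
    overage_per_million: float  # assumed dollars per million tokens beyond the allowance


def monthly_bill(contract: AIContract, seats: int, tokens_used: int) -> float:
    """Seat fees plus overage on tokens above the (assumed pooled) allowance."""
    allowance = contract.included_tokens * seats
    overage_tokens = max(0, tokens_used - allowance)
    overage_cost = overage_tokens / 1_000_000 * contract.overage_per_million
    return seats * contract.seat_fee + overage_cost


# Same seat price and usage; only the bundled allowance differs.
bundled = AIContract(seat_fee=60.0, included_tokens=2_000_000, overage_per_million=15.0)
unbundled = AIContract(seat_fee=60.0, included_tokens=0, overage_per_million=15.0)

seats = 500
tokens = 1_500_000_000  # assumed 1.5 billion tokens consumed in the month

print(f"bundled:   ${monthly_bill(bundled, seats, tokens):,.2f}")    # $37,500.00
print(f"unbundled: ${monthly_bill(unbundled, seats, tokens):,.2f}")  # $52,500.00
```

Under these assumed numbers, removing the bundle raises the bill by $15,000 a month (40 percent) with no change in seat price or usage, which is why the allowance and the overage rate belong in the negotiation alongside the seat fee.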
Budget modeling needs workload-level attribution. Aggregate AI spend figures tell the finance organization almost nothing about whether the spend is efficient, which business outcomes it is generating, or which workloads are driving the growth. A company that cannot attribute AI spend to specific workflows and product lines cannot govern it. It can only watch it.
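Attribution can start as simply as tagging every AI call with the workload that made it and pricing each tag. The Python sketch below is hypothetical throughout: the `UsageRecord` fields, the workload tags, and the blended input and output prices per million tokens are assumptions for illustration, not any vendor's billing export format.

```python
# Hypothetical workload-level attribution: group usage records by a workload
# tag and price each group. Fields, tags, and prices are illustrative only.
from collections import defaultdict
from typing import NamedTuple


class UsageRecord(NamedTuple):
    workload: str       # tag assigned by the team that owns the call
    input_tokens: int
    output_tokens: int


# Assumed blended prices, in dollars per million tokens.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 15.0


def spend_by_workload(records: list[UsageRecord]) -> dict[str, float]:
    """Return total dollar spend per workload tag."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        cost = (r.input_tokens * INPUT_PRICE_PER_M
                + r.output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
        totals[r.workload] += cost
    return dict(totals)


records = [
    UsageRecord("support-triage", 40_000_000, 8_000_000),
    UsageRecord("contract-review", 10_000_000, 2_000_000),
    UsageRecord("support-triage", 20_000_000, 4_000_000),
    UsageRecord("untagged", 5_000_000, 1_000_000),
]

# Print workloads from most to least expensive.
for workload, dollars in sorted(spend_by_workload(records).items(),
                                key=lambda kv: -kv[1]):
    print(f"{workload:16s} ${dollars:,.2f}")
```

The `untagged` bucket is the spend that can be watched but not governed; keeping it small is what makes the rest of the report useful for budget decisions.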
The vendor relationship has structurally changed. AI vendors are not raising prices arbitrarily. They are recovering genuine costs. That means the negotiation is not about pushing back on a price increase — it is about building a governance architecture that keeps the relationship sustainable as usage scales.
Three Questions Every CFO Should Ask Today
1. What is in our current AI contracts — explicitly — about usage allowances and overage rates? Not what the vendor says is included, but what the contract language actually commits them to. If your contract does not specify a token allowance, there may not be one.
2. What are we actually spending per workload, and what is that spend buying? Most finance organizations can answer this in aggregate. Almost none can answer it at the workload level.
3. What happens to our AI cost structure if agentic deployments expand on our current contract terms? Gartner projects that 40 percent of enterprise applications will feature task-specific AI agents by the end of 2026. Agentic workloads consume 5 to 30 times more tokens per task. If that expansion happens inside a consumption-billing contract with no usage cap, the exposure is multiplicative, not linear: more tasks run through agents, and each of those tasks consumes several times more tokens.
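The difference between linear and multiplicative exposure shows up in a few lines of arithmetic. This hypothetical Python sketch uses the 5x and 30x token multipliers cited above; the task volumes, tokens per task, agentic share, and per-million-token price are illustrative assumptions, not benchmarks.

```python
# Hypothetical scenario: monthly task volume doubles and 40% of the new
# volume runs through agents. The 5x and 30x multipliers come from the range
# cited above; every other number is an illustrative assumption.


def monthly_spend(standard_tasks: int, agentic_tasks: int, tokens_per_task: int,
                  agentic_multiplier: int, price_per_million: float) -> float:
    """Dollar spend when agentic tasks use `agentic_multiplier` times the tokens."""
    tokens = (standard_tasks + agentic_tasks * agentic_multiplier) * tokens_per_task
    return tokens / 1_000_000 * price_per_million


PRICE = 10.0               # assumed dollars per million tokens
TOKENS_PER_TASK = 10_000   # assumed tokens for a standard (non-agentic) task

# Baseline: 100,000 standard tasks, no agents.
baseline = monthly_spend(100_000, 0, TOKENS_PER_TASK, 1, PRICE)
print(f"baseline:        ${baseline:,.0f}")

# Expansion: 200,000 tasks, split 120,000 standard and 80,000 agentic.
for multiplier in (5, 30):
    spend = monthly_spend(120_000, 80_000, TOKENS_PER_TASK, multiplier, PRICE)
    print(f"agentic at {multiplier:>2}x: ${spend:,.0f} ({spend / baseline:.1f}x baseline)")
```

Doubling task volume alone would double spend. Under these assumptions, routing 40 percent of the doubled volume through agents raises spend to 5.2 to 25.2 times the baseline, because volume growth and per-task token growth multiply rather than add.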
Next in the series: Post 3 examines the cloud egress and data platform governance failures that set the stage for everything described above — and explains why AI is inheriting a structural problem that was already unresolved.
Start the conversation.
GUUT helps enterprise organizations govern AI inference spend at the delivery layer — eliminating the token multiplier for structured, repeatable intelligence outputs.
Eric Ford | Chief Data and Analytics Officer | GUUT
Sources & Citations
1. Bessemer Venture Partners, “Scaling to $100 Million.” Median gross margin 65–70%; top quartile 80–85%. https://www.bvp.com/atlas/scaling-to-100-million
2. OpenView Partners, “State of Usage-Based Pricing: 2nd Edition,” February 2023. 61% of SaaS companies adopted usage-based pricing. https://openviewpartners.com/blog/state-of-usage-based-pricing/
3. SaaStr / Jason Lemkin, “Have AI Gross Margins Really Turned the Corner?” December 2025. OpenAI compute margin 35% Jan 2024 → 70% Oct 2025. https://www.saastr.com/have-ai-gross-margins-really-turned-the-corner-the-real-math-behind-openais-70-compute-margin-and-why-b2b-startups-are-still-running-on-a-treadmill/
4. Thomas Claburn, The Register, “Anthropic squeezes enterprises by ejecting bundled tokens from seat deal,” April 16, 2026. https://www.theregister.com/2026/04/16/anthropic_ejects_bundled_tokens_enterprise/
5. Menlo Ventures, “2025: The State of Generative AI in the Enterprise,” December 2025. https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/
6. CloudZero, “The State of AI Costs,” May 2025. n=500+. Avg monthly AI spend $85,521, up 36% YoY. https://www.cloudzero.com/state-of-ai-costs/
7. Snowflake Q4 FY2026 Earnings, February 25, 2026. Product revenue $1.23B Q4 (+30% YoY); RPO $9.77B (+42%); NRR 125%. https://www.nasdaq.com/press-release/snowflake-reports-financial-results-fourth-quarter-and-full-year-fiscal-2026-2026-02