The Problem That Was Already There.
This is Post 3 of 7: The AI Inference Cost Crisis. Post 2 covered the death of the flat-rate promise. Post 4 examines the agentic AI token multiplier and its margin consequences.
There is a version of the AI inference cost story that treats it as something new — a problem that arrived with large language models and will be solved when the technology matures. That framing is wrong. The governance failure now compounding inside AI budgets is structurally identical to a failure that was already running inside cloud infrastructure and enterprise data platform budgets years before the first LLM enterprise contract was signed. AI did not introduce consumption-cost volatility to the enterprise technology stack. It inherited it — and added a multiplier.
Cloud Egress: The Original Metered Trap
Cloud egress fees — the charges AWS, Azure, and Google levy whenever data leaves their networks — are the original consumption-cost trap in enterprise technology. AWS charges $0.09 per gigabyte for standard internet egress from US East, with cross-availability-zone transfers adding another $0.01 per gigabyte in each direction, and NAT Gateway processing adding $0.045 per gigabyte on top. These fees accumulate invisibly across billing cycles, denominated in fractions of a cent per gigabyte, almost never appearing as a named negotiation item in enterprise procurement conversations.
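To make the accumulation concrete, here is a minimal sketch that applies the per-gigabyte rates quoted above to a month of traffic. The traffic volumes and the function name are hypothetical, and the model assumes flat rates with no volume tiers or free allowances, so treat the output as an order-of-magnitude illustration rather than a bill estimate.

```python
# Rates quoted in the text above (USD per GB). Assumed flat: no tiers, no free allowance.
INTERNET_EGRESS = 0.09
CROSS_AZ_EACH_DIRECTION = 0.01
NAT_GATEWAY_PROCESSING = 0.045


def monthly_transfer_cost(internet_gb: float, cross_az_gb: float, nat_gb: float) -> float:
    """Sum the three metered data-transfer charges for one month."""
    internet = internet_gb * INTERNET_EGRESS
    # Cross-AZ traffic is charged in each direction, so every GB is counted twice.
    cross_az = cross_az_gb * CROSS_AZ_EACH_DIRECTION * 2
    nat = nat_gb * NAT_GATEWAY_PROCESSING
    return round(internet + cross_az + nat, 2)


# Hypothetical month: 50 TB to the internet, 200 TB between zones, 50 TB through NAT.
cost = monthly_transfer_cost(50_000, 200_000, 50_000)
print(f"Monthly data-transfer charges: ${cost:,.2f}")
```

Under these assumptions the three line items come to roughly $10,750 for the month, with the cross-zone and NAT charges together exceeding the headline internet egress fee.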
The 2025 Backblaze and Dimensional Research survey [1] of 403 IT leaders managing more than 250 terabytes in the public cloud found that 95 percent had encountered unexpected storage charges that disrupted budgets or restricted strategic flexibility. Fifty-eight percent identified data-movement cost as the single biggest barrier to multi-cloud strategy. Ninety-seven percent cited egress costs and technical complexity as their top two obstacles to switching providers.
Flexera’s 2025 State of the Cloud report [2], drawing on 759 respondents, documents that 84 percent cite managing cloud spend as their top challenge. An estimated 27 percent of cloud spend is wasted, a figure that rose to 29 percent in the 2026 edition. Budgets average 17 percent over plan. None of this is a temporary phase: cloud cost optimization has appeared at the top of enterprise IT priority lists for nine consecutive years. The problem is structural. Consumption-based pricing transfers cost risk from vendors to buyers, and organizations that do not govern at the workload level systematically overpay.
Snowflake: When Credits Replace Subscriptions
Snowflake’s credit model brought this dynamic into the enterprise data stack. The official credit consumption table [3] prices compute at $2 to $4 per credit depending on edition: approximately $2.00 for Standard, $3.00 for Enterprise, and $4.00 for Business Critical. Virtual warehouse compute drives 60 to 90 percent of a typical Snowflake bill. A poorly structured query can consume ten times more credits than an optimized one. Neither the query inefficiency nor the resulting cost spike is visible until the invoice arrives.
The billing moves with usage in ways that are not intuitive. Credits per hour double with each warehouse size, so a warehouse sized at XL for an occasional peak workload burns twice the credits of an L, and four times the credits of an M, for every hour it spends running routine queries. An idle warehouse that has not been suspended is still consuming credits. Teams optimizing for query performance rather than credit consumption (which is to say, almost all teams) routinely run workloads that cost far more than they would with basic configuration discipline.
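The sizing and idle-time effects can be sketched with a few lines of arithmetic. The example below assumes Snowflake's standard schedule, in which credits per hour double at each warehouse size, and uses the approximate Enterprise price of $3.00 per credit cited above. The running hours, idle hours, and function name are hypothetical.

```python
# Credits per hour for standard warehouses: each size step doubles the rate.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def warehouse_cost(size: str, hours_running: float, price_per_credit: float) -> float:
    """Cost of keeping a warehouse of the given size running for a number of hours."""
    return CREDITS_PER_HOUR[size] * hours_running * price_per_credit


PRICE = 3.00          # approximate Enterprise price per credit, from the text
busy_hours = 10 * 30  # hypothetical: 10 hours of routine queries a day for 30 days
idle_hours = 4 * 30   # hypothetical: 4 hours a day left running with nothing to do

large = warehouse_cost("L", busy_hours, PRICE)
xlarge = warehouse_cost("XL", busy_hours, PRICE)
idle = warehouse_cost("XL", idle_hours, PRICE)

print(f"L for routine work:   ${large:,.2f}")
print(f"XL for routine work:  ${xlarge:,.2f}")
print(f"XL idle, unsuspended: ${idle:,.2f}")
```

With these assumed hours, the oversized warehouse costs $14,400 a month against $7,200 for the smaller one, and the unsuspended idle time adds a further $5,760 for work that was never done.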
Snowflake’s FY2026 earnings [4] make the downstream financial consequence visible: product revenue grew 30 percent year over year in Q4 to $1.23 billion. Remaining performance obligations reached $9.77 billion, up 42 percent. Net revenue retention held at 125 percent. That 125 percent NRR (meaning the existing customer base is collectively spending 25 percent more than the prior year without any new logos) is what consumption pricing looks like when it compounds at enterprise depth.
Databricks: The Double Billing Problem
Databricks introduces a further layer of complexity through a genuinely dual-billed cost structure. Official Databricks pricing [5] covers only the DBU layer, which is the platform compute charge. Jobs Compute runs at approximately $0.15 per DBU on Standard tier. All-Purpose Compute runs at approximately $0.40 per DBU. Neither figure includes the underlying cloud provider’s cost for the virtual machine running the cluster. The cloud VM, its storage, and any egress it generates are billed separately by the cloud provider, and that additional layer adds 50 to 150 percent on top of the DBU charges, depending on instance type and region.
The practical consequence is consistent: teams that budget using only the Databricks pricing calculator routinely underestimate their total monthly spend by 50 to 200 percent. A cluster listed at $2.64 per hour in DBUs runs on cloud infrastructure that adds another $3.89 per hour — making the real operating rate $6.53 before storage or egress. Finance teams that built their budget model on the calculator figure are carrying unmodeled exposure for every cluster-hour their engineering teams have approved.
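The gap between the calculator figure and the real rate is simple to model. The sketch below uses the two hourly rates from the example above; the 720-hour month and the function names are hypothetical, and storage and egress are deliberately left out.

```python
def total_hourly_rate(dbu_per_hour: float, vm_per_hour: float) -> float:
    """Platform (DBU) charge plus the separately billed cloud VM charge."""
    return round(dbu_per_hour + vm_per_hour, 2)


def underestimate_pct(budgeted: float, actual: float) -> float:
    """How far actual spend exceeds the budgeted figure, as a percentage of the budget."""
    return (actual - budgeted) / budgeted * 100


DBU_RATE = 2.64  # USD per hour, from the example in the text
VM_RATE = 3.89   # USD per hour, from the example in the text
HOURS = 720      # hypothetical: one cluster running for a 30-day month

real_rate = total_hourly_rate(DBU_RATE, VM_RATE)
budgeted = DBU_RATE * HOURS
actual = real_rate * HOURS

print(f"Real hourly rate: ${real_rate:.2f}")
print(f"Budgeted: ${budgeted:,.2f}  Actual: ${actual:,.2f}")
print(f"Underestimate: {underestimate_pct(budgeted, actual):.0f}% of budget")
```

For this single cluster the budget built on DBUs alone comes to about $1,900 for the month, while the combined bill is about $4,700, an overrun of roughly 147 percent before storage or egress is counted.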
The Governance Paradox
CloudZero’s February 2026 report with Benchmarkit [6] captures the governance paradox in a single comparison: formal cloud cost management programs nearly doubled in a single year, growing from 39 to 72 percent of organizations surveyed. Over the same period, the mean Cloud Efficiency Rate fell 15 percentage points, from 80 to 65 percent. At the 25th percentile, it fell from 70 to 45 percent. The programs are maturing. The efficiency is declining. The cause is the same in every case: consumption-based costs accumulating faster than attribution frameworks can catch them.
What Effective Governance Actually Requires
The organizations that manage consumption costs effectively share three structural characteristics.
They govern at the workload level, not the aggregate level. Total cloud spend is not a governance metric. Cost per workload, per product line, and per customer is a governance metric. The difference between organizations that contain consumption costs and those that do not is almost always visible in whether they have built the attribution layer between the invoice and the business outcome.
They treat pricing model changes as a first-order risk. The vendors that moved from subscription to consumption pricing — Snowflake, Databricks, now Anthropic and the major AI providers — did not do so secretly. The signals were available. In a consumption-pricing environment, vendor pricing model changes are a budget risk that belongs in the risk register alongside FX exposure and interest rate sensitivity.
They model the full stack cost, not just the platform cost. Whether it is Databricks DBUs plus cloud VM costs, or an AI model API fee plus the data egress from the system feeding it, the full cost of a workload is always a combination of layers. Organizations that model only the visible platform layer carry structural underestimates on every workload they run.
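The attribution layer described in the first characteristic can be sketched as a roll-up of billing line items by workload rather than by vendor. Everything in this example is hypothetical: the line items, the workload names, and the "UNATTRIBUTED" bucket are illustrative, and a real implementation would read tags from each provider's billing export.

```python
from collections import defaultdict

# Hypothetical billing line items: (vendor, workload tag, cost in USD).
line_items = [
    ("aws", "checkout-api", 1200.0),
    ("snowflake", "checkout-api", 800.0),
    ("databricks", "churn-model", 1500.0),
    ("aws", "churn-model", 2100.0),
    ("aws", None, 400.0),  # untagged spend that no workload owns
]


def cost_by_workload(items):
    """Total cost per workload across all vendors; untagged spend gets its own bucket."""
    totals = defaultdict(float)
    for _vendor, workload, cost in items:
        totals[workload or "UNATTRIBUTED"] += cost
    return dict(totals)


totals = cost_by_workload(line_items)
unattributed_share = totals.get("UNATTRIBUTED", 0.0) / sum(totals.values())

for workload, cost in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{workload:<14} ${cost:,.2f}")
print(f"Unattributed share of spend: {unattributed_share:.1%}")
```

Grouping across vendors is the point of the design: the churn model's Databricks charge and its AWS charge land in the same row, which is the full-stack view the third characteristic calls for, and the unattributed share shows how much spend still sits outside any workload's budget.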
Next in the series: Post 4 examines the token multiplier at the heart of the AI inference cost problem — the agentic AI dynamic that Gartner finds adds 5 to 30 times more tokens per task, and the margin compression math that follows.
GUUT helps enterprise organizations govern AI inference spend at the delivery layer — eliminating the token multiplier for structured, repeatable intelligence outputs.
Eric Ford | Chief Data and Analytics Officer | GUUT
Sources & Citations
1. Backblaze / Dimensional Research, “The Hidden Cost of Cloud Storage,” October 2025. n=403 IT leaders at 250TB+ organizations. https://ir.backblaze.com/news/news-details/2025/Backblaze-and-Dimensional-Research-Reveal-Hidden-Cost-of-Cloud-Object-Storage-as-Enterprises-Confront-AI-Data-Demands/default.aspx
2. Flexera, “2025 State of the Cloud Report,” March 2025. n=759. 84% cite managing cloud spend as top challenge; 27% of cloud spend wasted. https://www.flexera.com/about-us/press-center/new-flexera-report-finds-84-percent-of-organizations-struggle-to-manage-cloud-spend
3. Snowflake Credit Consumption Table (official), 2026. AWS US East on-demand: Standard ~$2.00/credit, Enterprise ~$3.00. Compute drives 60–90% of typical bill. https://www.snowflake.com/legal-files/CreditConsumptionTable.pdf
4. Snowflake Q4 FY2026 Earnings, February 25, 2026. Product revenue $1.23B Q4 (+30% YoY); RPO $9.77B (+42%); NRR 125%. https://www.nasdaq.com/press-release/snowflake-reports-financial-results-fourth-quarter-and-full-year-fiscal-2026-2026-02
5. Databricks pricing, 2026. Jobs Compute ~$0.15/DBU; All-Purpose ~$0.40/DBU. Cloud provider adds 50–150% on top of DBU costs. https://www.databricks.com/product/pricing
6. CloudZero + Benchmarkit, “FinOps in the AI Era,” February 12, 2026. n=475. CER fell 80% to 65% YoY. https://www.cloudzero.com/press-releases/20260212/