Microsoft Cancels Anthropic Licences Internally: The True Cost of Token-Based AI and How to Budget in 2025

A viral, unverified Reddit post claims Microsoft has cancelled internal Anthropic licences. True or not, it highlights the real cost of token-based AI and the case for tighter budgeting in 2025.

24 May 2026 | by Joshua Thompson | 4 min read

Microsoft cancels internal Anthropic licences? What a viral Reddit claim says about token-based AI billing in 2025

A viral Reddit post claims Microsoft has cancelled internal Anthropic licences after a shift to token-based AI billing exhausted annual budgets within months. The post is short on detail and has not been independently verified, but the anxiety it taps into is real: unpredictable token costs are catching teams off guard.

AI has become so expensive that even Microsoft can not afford it. Inflation cancelled AGI.

Source: Reddit thread (no details given beyond the title and the one-liner above).

Token-based AI billing explained – and why budgets blow up

Most modern AI models, including Anthropic’s Claude, bill by “tokens”. A token is a small chunk of text (roughly 3–4 characters in English). You pay for:

Input tokens – your prompt and any retrieved documents you include.
Output tokens – the model’s response.

The “context window” is how much the model can read at once. Bigger windows enable richer tasks but multiply costs if you fill them. Tool use (function calls), retries, and long chats drive token counts up silently. The result: month two looks nothing like month one.
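
To make the arithmetic concrete, here is a minimal sketch of how input and output tokens turn into a bill, and why a long chat costs more than the same number of one-off requests. The rates are illustrative placeholders, not any vendor's actual prices, and the chat model assumes each turn resends the full conversation history as input, which is how stateless chat APIs typically behave. `Rates`, `request_cost` and `chat_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical USD prices per million tokens; check the vendor's pricing page."""
    input_per_mtok: float
    output_per_mtok: float


def request_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Cost in USD of one request; input and output are priced separately."""
    return (input_tokens * rates.input_per_mtok
            + output_tokens * rates.output_per_mtok) / 1_000_000


def chat_cost(turns: int, prompt_tokens: int, reply_tokens: int, rates: Rates) -> float:
    """Cost of a chat, assuming every turn resends the whole history as input."""
    total = 0.0
    history = 0
    for _ in range(turns):
        input_tokens = history + prompt_tokens
        total += request_cost(input_tokens, reply_tokens, rates)
        history = input_tokens + reply_tokens  # the reply joins the history too
    return total


rates = Rates(input_per_mtok=3.0, output_per_mtok=15.0)  # illustrative only
single = request_cost(2_000, 500, rates)
ten_turns = chat_cost(10, 2_000, 500, rates)
print(f"one request:  ${single:.4f}")
print(f"10-turn chat: ${ten_turns:.4f} (vs ${10 * single:.4f} for 10 one-off requests)")
```

Under these assumptions the ten-turn chat costs about 3.5 times as much as ten independent requests, purely because the history is re-read on every turn.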

For current rates, see vendor pricing pages (these change often): Anthropic Claude pricing, Azure OpenAI pricing, OpenAI API pricing.

Did Microsoft really cancel Anthropic internally?

It is not confirmed. The Reddit post provides no evidence beyond the headline. Microsoft partners deeply with OpenAI and offers multiple models via Azure, while Anthropic is available through various platforms. Internal procurement choices change all the time, particularly during cost reviews.

The broader point stands regardless: if a company with sophisticated FinOps can get surprised by token bills, smaller organisations are at risk too. Treat this as a nudge to tighten your own cost controls rather than a confirmed industry pivot.

How UK organisations should budget AI in 2025

Build a unit-cost model first

Define a “unit of value” (per document summarised, per customer chat, per report generated).
Measure average tokens per unit in a staging environment across a realistic workload.
Model low/median/high scenarios (seasonality, retries, long prompts, edge cases).
Apply currency risk if billed in USD – sterling volatility can move your costs by several percent.
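
The four steps above can be sketched as a small unit-cost model. Everything numeric here is an assumption for illustration: the token prices, the FX rate, the 5% adverse sterling move, and the staging samples. Low and high are taken as the 10th and 90th percentiles of per-unit cost, which is one reasonable choice rather than the only one.

```python
from statistics import median, quantiles

# Hypothetical inputs: replace with your measured data and current vendor rates.
USD_PER_MTOK_IN = 3.0     # assumed input price, USD per million tokens
USD_PER_MTOK_OUT = 15.0   # assumed output price, USD per million tokens
GBP_PER_USD = 0.80        # assumed budget FX rate
FX_SHOCK = 0.05           # assumed adverse sterling move, applied to the high case

# (input_tokens, output_tokens) per unit of value, as measured in staging.
samples = [(1800, 400), (2100, 450), (2500, 600), (3000, 500),
           (2200, 700), (9000, 1200), (1900, 350), (2600, 550)]


def unit_cost_usd(tokens_in: int, tokens_out: int) -> float:
    """USD cost of producing one unit of value."""
    return (tokens_in * USD_PER_MTOK_IN + tokens_out * USD_PER_MTOK_OUT) / 1_000_000


costs = sorted(unit_cost_usd(i, o) for i, o in samples)
deciles = quantiles(costs, n=10, method="inclusive")  # 9 cut points: 10th..90th

scenarios_gbp = {
    "low": deciles[0] * GBP_PER_USD,
    "median": median(costs) * GBP_PER_USD,
    "high": deciles[-1] * GBP_PER_USD * (1 + FX_SHOCK),
}


def monthly_budget_gbp(units_per_month: int) -> dict[str, float]:
    """Scale each per-unit scenario to a monthly figure in pounds."""
    return {name: round(cost * units_per_month, 2)
            for name, cost in scenarios_gbp.items()}


budget = monthly_budget_gbp(50_000)
print(budget)
```

Note how a single outlier in the samples (the 9,000-token document) pulls the high scenario well away from the median. That gap is the number to show finance.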

Enforce hard guardrails from day one

Per-user and per-app quotas (daily/weekly token caps, hard timeouts).
Request budgets at the platform level – fail the request (or the build, for automated test workloads) when exceeded.
Disable oversized context windows by default; allowlist exceptions subject to approval.
Log every request with tokens-in/out and a correlation ID; alert on anomalies.
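
As a rough illustration of the first and last guardrails, the sketch below enforces a per-user daily token cap and logs each request with tokens in/out and a correlation ID. It is an in-memory, single-process example. A real deployment would likely keep counters in a shared store and use a proper structured logger. `TokenQuota` and `QuotaExceeded` are hypothetical names.

```python
import json
import time
import uuid


class QuotaExceeded(Exception):
    """Raised when a request would push a user past the daily cap."""


class TokenQuota:
    """In-memory daily token cap per user (single-process sketch only)."""

    def __init__(self, daily_cap: int, clock=time.time):
        self.daily_cap = daily_cap
        self._clock = clock   # injectable so tests can control time
        self._used = {}       # (user_id, day_number) -> tokens used

    def _key(self, user_id: str):
        return (user_id, int(self._clock() // 86_400))  # UTC day number

    def check(self, user_id: str, estimated_tokens: int) -> None:
        """Call before the model request; raises instead of spending."""
        used = self._used.get(self._key(user_id), 0)
        if used + estimated_tokens > self.daily_cap:
            raise QuotaExceeded(f"{user_id}: {used} used, cap {self.daily_cap}")

    def record(self, user_id: str, tokens_in: int, tokens_out: int) -> dict:
        """Call after the response; logs usage with a correlation ID."""
        key = self._key(user_id)
        self._used[key] = self._used.get(key, 0) + tokens_in + tokens_out
        entry = {
            "correlation_id": str(uuid.uuid4()),
            "user_id": user_id,
            "tokens_in": tokens_in,
            "tokens_out": tokens_out,
            "used_today": self._used[key],
        }
        print(json.dumps(entry))  # stand-in for a real structured logger
        return entry
```

A caller would run `check()` with an estimate before sending the request, then `record()` with the actual counts reported in the response. Alerting on anomalies can then be built on top of the logged entries.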

Optimise prompts and architecture for token efficiency

Prompt hygiene – tighten instructions, drop redundant boilerplate, and avoid dumping whole documents.
Retrieval-augmented generation (RAG) – index your content and retrieve only the most relevant chunks.
Tiered models – route simple tasks to cheaper models; reserve premium models for complex reasoning.
Caching and re-use – memoise common answers and embeddings; don’t re-embed unchanged text.
Batching – process items in batches where latency allows to reduce per-call overheads.
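
The tiered-models idea can be sketched with a simple rule-based router. The model names, prices, keyword list and 500-token threshold are all made up for illustration, and the token estimate uses the rough four-characters-per-token heuristic rather than a real tokeniser. In practice you would tune the rules against your own traffic or replace them with a small classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str               # hypothetical model identifier
    usd_per_mtok_in: float  # illustrative price, not a real rate


CHEAP = Tier("small-model", 0.25)
PREMIUM = Tier("large-model", 3.0)

# Words that hint the task needs multi-step reasoning (assumed list).
REASONING_HINTS = {"why", "compare", "analyse", "analyze", "plan", "prove"}


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token in English."""
    return max(1, len(text) // 4)


def route(prompt: str, max_cheap_tokens: int = 500) -> Tier:
    """Send short, simple prompts to the cheap tier; everything else to premium."""
    words = (w.strip("?,.!:;") for w in prompt.lower().split())
    needs_reasoning = any(w in REASONING_HINTS for w in words)
    if needs_reasoning or estimate_tokens(prompt) > max_cheap_tokens:
        return PREMIUM
    return CHEAP


print(route("Translate hello to French").name)
print(route("Compare these two contracts and explain why clause 4 differs").name)
```

Defaulting to the cheap tier and escalating only on explicit signals keeps the expensive model as the exception, which is usually the safer direction for a budget.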

Commercial protections

Price caps and throttling – use platform-level rate controls to prevent runaway spend.
Committed-use discounts – negotiate monthly minimums for lower marginal rates if your demand is stable.
Data egress awareness – mixing vendors can add bandwidth and latency costs across regions.
Exit ramps – multi-vendor abstraction helps you switch if pricing or performance shifts.

A quick checklist for product teams

Lever	Why it matters	How to implement
Token budgets	Prevents runaway dialogues and retries	Max tokens per request/session; reject or summarise
Context control	Bigger prompts = bigger bills	Top-k retrieval, chunking, aggressive pruning
Model routing	Match cost to task difficulty	Rules or small classifiers to choose model tier
Caching	Eliminates duplicate generation	Hash normalised prompts; store responses with TTL
Observability	Find regressions early	Track tokens, latency, errors, and per-unit cost
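
For the caching row, a minimal sketch of "hash normalised prompts; store responses with TTL" might look like the following. The normalisation here (lowercasing and collapsing whitespace) is an assumption that suits FAQ-style prompts. It would be wrong for case-sensitive inputs such as code. `ResponseCache` is a hypothetical name, and `generate` stands in for whatever function actually calls the model.

```python
import hashlib
import time


def normalise(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share a key."""
    return " ".join(prompt.lower().split())


class ResponseCache:
    """In-memory response cache keyed by a hash of model + normalised prompt."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock   # injectable so tests can control time
        self._store = {}      # key -> (expires_at, response)

    def _key(self, model: str, prompt: str) -> str:
        raw = f"{model}\n{normalise(prompt)}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_or_generate(self, model: str, prompt: str, generate) -> str:
        """Return a fresh cached response, or call generate(prompt) and store it."""
        key = self._key(model, prompt)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]     # cache hit: no tokens spent
        response = generate(prompt)
        self._store[key] = (now + self.ttl_seconds, response)
        return response
```

Including the model name in the key prevents an answer generated by one tier from being served as if it came from another.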

UK-specific considerations: data, compliance, and procurement

UK GDPR and confidentiality – check data processing terms, retention, and where prompts/responses are stored. Prefer UK/EU regions when possible.
Public sector and regulated industries – budget predictability often beats cutting-edge capability. Consider fixed-price tiers or hosted models for high-volume workloads.
Currency and tax – many AI services bill in USD. Agree on FX assumptions in budgets and revisit quarterly.
Vendor lock-in – use standardised APIs and modular RAG so you can swap providers without rewriting your stack.

Beyond the invoice: the real resource cost of AI

Tokens don’t just cost money – they consume energy and water through data centre operations. If your AI use is scaling, factor in sustainability reporting and environmental risk alongside pounds and pence.

For a grounded look at water use and cooling cycles in data centres, see my explainer: AI, waste water, and data centre cooling – what the numbers really mean.

What this Reddit post means – and what to do next

Whether or not Microsoft cancelled anything internally, the strategic takeaway is clear: token-based billing shifts AI from a licence mindset to usage economics. That’s powerful and risky.

If you’re planning 2025 budgets, treat tokens like cloud compute: instrument, cap, test with real workloads, and negotiate. The teams that win won’t be those with the flashiest models, but those that control unit costs while maintaining quality and compliance.

If you’ve already had a surprise bill, don’t panic. Start with observability, shrink your prompts, route to cheaper tiers where you can, and put hard limits in place. Then re-baseline your unit economics and iterate.

Last updated

5 July 2026
