cost optimization · 2026-07-01 · 6 min read

How I cut my LLM bill 47% using real usage data

Not with a cheaper model. Not by using AI less. By finally looking at where the tokens actually went.

My stack looked reasonable from the inside: a frontier model for coding (worth it), a mid-tier for writing, a fast tier for chat. The bill said ~$560/month and climbing ~18% quarter over quarter. I assumed that was the price of building with AI.

Then I exported three months of usage CSVs and aggregated them by model × task. One cell ruined my week: my frontier coding model was also running classification — ticket routing, intent tagging, spam triage. Twelve million input tokens a month of “is this a bug report or a feature request?” on a model priced for frontier reasoning.
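A minimal sketch of that aggregation in Python, using only the standard library. It assumes a hypothetical CSV layout (columns date, model, task, input_tokens, cost_usd) where the task tag is something you add yourself. Real provider exports use their own column names and may have no task column at all. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout with a hand-added "task" column; rows are made up.
SAMPLE_CSV = """\
date,model,task,input_tokens,cost_usd
2026-04-01,frontier,coding,200000,10.50
2026-04-01,frontier,classification,400000,2.50
2026-04-02,frontier,classification,410000,2.55
2026-04-02,fast,chat,90000,0.30
"""


def aggregate(rows):
    """Sum input tokens and cost for every (model, task) cell."""
    cells = defaultdict(lambda: {"input_tokens": 0, "cost_usd": 0.0})
    for row in rows:
        cell = cells[(row["model"], row["task"])]
        cell["input_tokens"] += int(row["input_tokens"])
        cell["cost_usd"] += float(row["cost_usd"])
    return dict(cells)


if __name__ == "__main__":
    matrix = aggregate(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    # Print the most expensive cells first.
    for (model, task), cell in sorted(
        matrix.items(), key=lambda kv: kv[1]["cost_usd"], reverse=True
    ):
        print(f"{model:<10} {task:<16} {cell['input_tokens']:>10,} tokens  ${cell['cost_usd']:.2f}")
```

Adding `row["date"]` to the key gives the day × model × task view used in the process below.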

The diagnosis, in numbers

Workload	Before	After	Δ / month
Classification	frontier · $75	nano tier · $1	−$74
Summarization	frontier · $75	fast tier · $16	−$59
Summaries → batch API	realtime	async, 50% off	−$8
Writing output caps	unbounded	concise prompts	−$5
Coding (kept frontier)	$315	$315	±$0

Total: $560 → $296/month, a hair over 47%. Quality impact where I moved traffic: classification accuracy dropped less than a point on my eval set (labels are labels), and summaries needed one prompt revision. Coding stayed exactly where it was, and that's the point. This isn't “use cheaper models.” It's “match each task to the model that earns it.”
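The headline percentage and the batch row, worked from the figures above. The second line assumes the 50% batch discount applies to the full $16 fast-tier summarization spend:

```latex
\begin{aligned}
\text{reduction} &= \frac{\$560 - \$296}{\$560} = \frac{264}{560} \approx 0.471 \quad (\approx 47\%) \\
\text{batch saving} &= 0.50 \times \$16 = \$8 \text{ per month}
\end{aligned}
```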

The process, repeatable

Export & aggregate. Provider dashboards hide the task dimension. Tag usage by task (even roughly) and aggregate by day × model × task. The waste is always in a cell you weren't looking at.
Rank by savings, not by price. A 90% discount on a $2 workload is noise. Sort candidate moves by absolute monthly dollars.
Simulate before migrating. Project the new cost and the quality delta on paper first. If a move survives a what-if and a 50-example eval, ship it.
Re-check monthly. Prices moved twice while I wrote this. Schedule a recurring review of the same matrix.
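A minimal sketch of the ranking and simulation steps: each candidate move carries its current and projected (what-if) monthly cost, moves are sorted by absolute dollars saved, and a small eval gate decides whether a move ships. The dollar figures come from the table above plus the $2 workload from the ranking step. The `Move` and `passes_eval` names and the 2-point accuracy tolerance are hypothetical choices for illustration, not how WeavePrism works internally.

```python
from dataclasses import dataclass


@dataclass
class Move:
    """A candidate migration with its current and projected (what-if) monthly cost."""
    name: str
    current_usd: float
    projected_usd: float

    @property
    def savings_usd(self) -> float:
        return self.current_usd - self.projected_usd

    @property
    def discount(self) -> float:
        return self.savings_usd / self.current_usd


# Dollar figures from the table, plus the "90% off a $2 workload" example.
MOVES = [
    Move("summarization: frontier -> fast tier", 75, 16),
    Move("tiny workload at 90% off", 2, 0.2),
    Move("classification: frontier -> nano tier", 75, 1),
    Move("summaries -> batch API (50% off)", 16, 8),
]


def rank_by_savings(moves):
    """Largest absolute monthly saving first; the percentage discount is ignored."""
    return sorted(moves, key=lambda m: m.savings_usd, reverse=True)


def passes_eval(baseline_correct: int, candidate_correct: int,
                n_examples: int = 50, max_drop: float = 0.02) -> bool:
    """Hypothetical gate: accept if accuracy drops by at most max_drop on n_examples."""
    return (baseline_correct - candidate_correct) / n_examples <= max_drop


if __name__ == "__main__":
    for move in rank_by_savings(MOVES):
        print(f"${move.savings_usd:6.2f}/mo  ({move.discount:.0%} off)  {move.name}")
```

Sorting by `discount` instead would put the $2 workload second, ahead of summarization, even though it saves only about $1.80 a month.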

I got tired of doing this in spreadsheets, so I built WeavePrism — it imports the CSVs, builds the task × model matrix, finds the moves, simulates them, and turns the result into a living decision map. The free tier does everything in this post; the sample dataset is literally the workload above, waste included, if you want to watch the optimizer find it.

Honest caveats: numbers rounded from list pricing; your quality tolerance is yours — run your own evals before moving production traffic.

Find your version of the $74 cell

Import your usage — or just load the sample — and see what the prism finds.
