How I cut my LLM bill 47% using real usage data
Not with a cheaper model. Not by using AI less. By finally looking at where the tokens actually went.
My stack looked reasonable from the inside: a frontier model for coding (worth it), a mid-tier for writing, a fast tier for chat. The bill said ~$560/month and climbing ~18% quarter over quarter. I assumed that was the price of building with AI.
Then I exported three months of usage CSVs and aggregated them by model × task. One cell ruined my week: my frontier coding model was also running classification — ticket routing, intent tagging, spam triage. Twelve million input tokens a month of “is this a bug report or a feature request?” on a model priced for frontier reasoning.
The diagnosis, in numbers
| Workload | Before | After | Δ / month |
|---|---|---|---|
| Classification | frontier · $75 | nano tier · $1 | −$74 |
| Summarization | frontier · $75 | fast tier · $16 | −$59 |
| Summaries → batch API | realtime | async, 50% off | −$8 |
| Writing output caps | unbounded | concise prompts | −$5 |
| Coding (kept frontier) | $315 | $315 | ±$0 |
Total: $560 → $296/month, a hair over 47%. Quality impact where I moved traffic: classification accuracy dropped by less than one percentage point on my eval set (labels are labels), and summaries needed one prompt revision. Coding stayed exactly where it was, and that's the point. This isn't “use cheaper models.” It's “match each task to the model that earns it.”
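The headline percentage is plain arithmetic on the two totals. A quick check in Python, using only the figures quoted in this post (the variable names are mine):

```python
# Monthly totals quoted in this post, in USD.
before = 560
after = 296

saved = before - after          # absolute dollars saved per month
pct = saved / before * 100      # reduction as a percentage of the old bill

print(f"saved ${saved}/month ({pct:.1f}%)")
```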
The process, repeatable
- Export & aggregate. Provider dashboards hide the task dimension. Tag usage by task (even roughly) and aggregate by day × model × task. The waste is usually in a cell you weren't looking at.
- Rank by savings, not by price. A 90% discount on a $2 workload is noise. Sort candidate moves by absolute monthly dollars.
- Simulate before migrating. Project the new cost and the quality delta on paper first. If a move survives a what-if and a 50-example eval, ship it.
- Re-check monthly. Prices moved twice while I wrote this. Set a recurring look at the same matrix.
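The steps above can be sketched in a few lines of standard-library Python. Treat this as a rough illustration, not a finished tool: the column names (`day`, `model`, `task`, `cost_usd`), the sample rows, and the function names are all assumptions, and real provider exports will need remapping and a task tag you add yourself. The candidate moves reuse the rounded monthly figures from the table.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export. Real provider CSVs use different column names and
# rarely include a task column, so tagging is a step you add yourself.
SAMPLE_CSV = """day,model,task,cost_usd
2024-01-01,frontier,coding,10.50
2024-01-01,frontier,classification,2.50
2024-01-02,frontier,classification,2.50
2024-01-02,mid,writing,1.25
"""

def aggregate(csv_text):
    """Sum cost per (model, task) cell of the matrix."""
    cells = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        cells[(row["model"], row["task"])] += float(row["cost_usd"])
    return dict(cells)

def rank_moves(moves):
    """Sort candidate moves by absolute monthly dollars saved, largest first."""
    return sorted(moves, key=lambda m: m["before"] - m["after"], reverse=True)

def what_if(current_spend, moves):
    """Project spend on the affected workloads if every listed move ships."""
    return current_spend - sum(m["before"] - m["after"] for m in moves)

cells = aggregate(SAMPLE_CSV)

# Candidate moves, using rounded monthly figures from the table in this post.
moves = [
    {"task": "summarization", "before": 75, "after": 16},
    {"task": "classification", "before": 75, "after": 1},
]
ranked = rank_moves(moves)

for m in ranked:
    print(f'{m["task"]}: saves ${m["before"] - m["after"]}/month')
```

The projection only covers cost. The quality half of the what-if still needs an eval set of your own before any traffic moves.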
I got tired of doing this in spreadsheets, so I built WeavePrism. It imports the CSVs, builds the task × model matrix, finds the moves, simulates them, and turns the result into a living decision map. The free tier does everything in this post, and the sample dataset is the workload above, waste included, if you want to watch the optimizer find it.
Honest caveats: numbers rounded from list pricing; your quality tolerance is yours — run your own evals before moving production traffic.
Find your version of the $74 cell
Import your usage — or just load the sample — and see what the prism finds.
Weave your path, free