Staring at a $2,000 invoice from Anthropic last quarter, I cursed under my breath – same prompt volume as before, but Sonnet’s price hike hit like a freight train.
AI model pricing? It’s a dumpster fire. Over 100 LLMs are available via APIs, rates shifting – sometimes weekly, sometimes mid-week without so much as a tweet. Providers like OpenAI, Anthropic, and Google drop new models, deprecate old ones, and adjust token rates quietly. You build on Claude 3, feel smart, then three months in you’re bleeding cash while a $0.60-per-million-token upstart matches it.
Why Is AI Model Pricing Such a Total Mess?
Teams don’t track it. They pick one or two models – GPT-4o, maybe Gemini – and revisit quarterly, if that. Hardcode it, ship it, pray. But here’s the kicker: price doesn’t track quality. A cheapo $0.60/M model handles 80% of tasks as well as, or better than, a $15/M behemoth.
Capabilities? Wild west. Vision here, tool-calling there, context windows ballooning to 128K, JSON modes popping up. No clean map from bucks to benchmarks.
And providers? Ten-plus of ‘em, each with wonky pricing pages, bizarre formats, update rhythms that laugh at your calendar.
WhichModel steps in – built by folks tired of this crap. Scrapes every major provider every four hours. Normalizes the mess. Cross-verifies, ‘cause one source lies (looking at you, outdated docs).
“There are over 100 LLM models available through commercial APIs today. Their pricing changes constantly — sometimes multiple times per week. New models launch, old ones get deprecated, and providers quietly adjust rates.”
That’s straight from their announcement. Spot on.
They track input/output/cached tokens, context sizes, features like streaming, vision. Flags mismatches. Exposes it not as some dashboard for humans – nah, as an MCP server. Model Context Protocol. For AI agents.
One line in your config:
```json
{ "mcpServers": { "whichmodel": { "url": "https://whichmodel.dev/mcp" } } }
```
No API key. No SDK hell. Your agent queries: “Cheapest model with tool-calling and 128K context?” Boom, answer.
Or: “Claude Sonnet 4 vs GPT-4.1 for code gen at 10K calls/day.” Real-time math on your burn rate.
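Both queries boil down to filtering and sorting a normalized catalog. A minimal sketch of the agent’s side of that – the records, field names, and prices below are invented placeholders, not WhichModel’s actual schema:

```python
# Hypothetical normalized model records; names and prices are illustrative.
models = [
    {"name": "model-a", "input_per_m": 0.15, "tools": True,  "context": 128_000},
    {"name": "model-b", "input_per_m": 0.10, "tools": True,  "context": 1_000_000},
    {"name": "model-c", "input_per_m": 0.05, "tools": False, "context": 128_000},
]

# "Cheapest model with tool-calling and 128K context?"
candidates = [m for m in models if m["tools"] and m["context"] >= 128_000]
cheapest = min(candidates, key=lambda m: m["input_per_m"])
print(cheapest["name"])  # model-b
```

The point isn’t the three-line filter – it’s that the filter only works if someone already normalized ten providers’ pricing pages into one shape.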
Does WhichModel Actually Work for Shipping Code?
Tested it myself – plugged it into a side-project agent last week. Asked for data extraction under $0.002 per call. It spat back options I hadn’t considered, like Grok’s beta tier. Switched, saved 30% overnight.
But cynicism check: scraping’s brittle. Won’t providers block it eventually? Maybe – which is why they cross-reference APIs, docs, and aggregators instead of leaning on one source. Smart hedge.
Surprises from their build? Pricing updates hit multiple times weekly ecosystem-wide. At scale – 10K calls/day – model pick’s a $6K/month call. Agents gotta decide solo; spreadsheets won’t cut it.
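That $6K/month figure is just arithmetic. A sketch, assuming frontier pricing of $3/$15 per million tokens (input/output), a cheap tier at $0.60/$2.40, and calls averaging 2,500 input / 800 output tokens – all assumptions, swap in live rates:

```python
# Cost model: tokens-per-call x price-per-token x volume. Every price and
# call shape here is an assumption for illustration, not live data.
PRICES = {  # $ per million tokens: (input, output)
    "frontier-model": (3.00, 15.00),
    "budget-model": (0.60, 2.40),
}

def monthly_cost(model, calls_per_day, in_tok=2_500, out_tok=800, days=30):
    """Estimated monthly spend for a steady call volume."""
    p_in, p_out = PRICES[model]
    per_call = (in_tok * p_in + out_tok * p_out) / 1_000_000
    return per_call * calls_per_day * days

for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 10_000):,.0f}/month")
# frontier-model: $5,850/month
# budget-model: $1,026/month
```

With these made-up numbers the gap is nearly $5K/month at 10K calls/day – which is why the model pick is a budget decision, not a config detail.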
My unique take: this echoes AWS’s early days, 2006–2010. Instance types morphed monthly, spot pricing wrecked budgets. Devs hardcoded m1.small and woke up to bankrupting bills. Sound familiar? Cloud-cost tools like CloudHealth rose then. WhichModel’s that for LLMs – but agent-native from day one.
Predict this: by 2025, every production agent routes dynamically via trackers like this. Hardcoding? Dead relic.
Look, I’ve covered Valley hype for 20 years. Remember Watson’s $100K/month promises? Or self-driving by 2018? Buzzwords bury bodies.
WhichModel skips spin. Open-source, MIT licensed. GitHub: Which-Model/whichmodel-mcp. Website: whichmodel.dev. Free forever, they say.
Who’s making money? Not them – you’re the winner, slashing bills while providers play rate roulette.
But wait – capability matrices shift too. Vision support flips with updates. They track that, tie price to powers.
One punchy caveat. Quality tiers don’t map clean. That $0.60 model? Gold for chat, flops on math. Agents must benchmark, not just price-shop.
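In practice that means price-shopping only among models that clear your own benchmark bar. A sketch – the scores and prices are made up; run your own evals:

```python
# Gate on quality first, then pick the cheapest survivor.
# All names, prices, and scores below are fabricated for illustration.
models = [
    {"name": "budget-0.60", "price": 0.60, "math": 0.42, "chat": 0.91},
    {"name": "frontier-15", "price": 15.00, "math": 0.88, "chat": 0.94},
]

def pick(models, metric, floor):
    """Cheapest model whose benchmark score clears the floor, else None."""
    ok = [m for m in models if m[metric] >= floor]
    return min(ok, key=lambda m: m["price"]) if ok else None

print(pick(models, "chat", 0.85)["name"])  # budget-0.60 wins on chat
print(pick(models, "math", 0.80)["name"])  # frontier-15 for math
```

Same catalog, two different winners – which is exactly why a price feed without a quality gate will burn you.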
Who Profits When Your AI Bills Balloon?
Providers, duh. OpenAI’s margins fatten on your laziness. They drop Opus at $75/M output, you pay ‘til bankruptcy whispers.
Indies like Mistral? Underdogs, pricing aggressive to steal share. Trackers level it.
Teams ignoring this? Screwed at scale. 80% tasks? Cheap models suffice. Why overpay for frontier fluff?
Here’s the thing – agents are the future. Autonomous deciders. Give ‘em WhichModel via MCP, they optimize live. No human babysitting spreadsheets.
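“Optimize live” can start as something this small: resolve the model name per request against a regularly refreshed catalog, instead of freezing it at deploy time. A toy sketch with placeholder names and prices – a real agent would pull the catalog over MCP:

```python
# Instead of MODEL = "some-model" frozen at deploy time, re-pick per request.
# Placeholder catalog; a real agent would refresh this from a tracker.
CATALOG = [
    {"name": "cheap-chat", "price_in": 0.60, "caps": {"chat"}},
    {"name": "mid-coder", "price_in": 2.00, "caps": {"chat", "code", "tools"}},
    {"name": "frontier", "price_in": 15.00, "caps": {"chat", "code", "tools", "vision"}},
]

def resolve(required_caps):
    """Cheapest model whose capability set covers the request."""
    fits = [m for m in CATALOG if required_caps <= m["caps"]]
    return min(fits, key=lambda m: m["price_in"])["name"]

print(resolve({"chat"}))           # cheap-chat
print(resolve({"code", "tools"}))  # mid-coder
```

When the catalog updates and a cheaper capable model appears, the agent switches on the next call. No redeploy, no spreadsheet.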
Skeptical me digs the no-BS approach. No “revolutionary” claims. Just: we scrape, you save.
Doubters say: “Just use the best.” Cute, ‘til $6K/month bites.
Or: “I’ll monitor manually.” Ha. With 100+ models? Dream on.
Real talk: if you’re building LLM apps – agents, RAG, whatever – plug this in. Today.
🧬 Related Insights
- Read more: Sandbox Bug Turns LLM Judge into Model Blamer: The Postmortem
- Read more: React Hooks’ Secret: A Cursor on a List
Frequently Asked Questions
What is WhichModel and how do I use it?
Open-source MCP server tracking 100+ LLM prices every 4 hours. Add one line to your agent config – `{"mcpServers": {"whichmodel": {"url": "https://whichmodel.dev/mcp"}}}` – then query it naturally.
Does WhichModel track all LLM providers?
Covers 10+ majors like OpenAI, Anthropic, Google – scrapes/normalizes input/output/cached prices, features, context windows.
Is WhichModel free for production agents?
Yes – fully open-source under the MIT license, no API keys, unlimited use.