Indie devs grinding side hustles. Enterprise teams scaling agent fleets. They’re all bleeding cash on LLM calls—because prices flip weekly, and nobody’s got time for spreadsheets.
WhichModel changes that. A model router in 20 lines of TypeScript hooks your agent to live pricing data, spitting out the best pick for code gen, summarization, whatever. No API keys. No maintenance. Just savings that hit your wallet today.
Look, at 10,000 calls a day, picking the wrong model isn’t a rounding error—it’s $6,000 a month down the drain. Providers like Anthropic and OpenAI tweak rates multiple times weekly; last month alone, five new models undercut the old guard. Your static setup? Obsolete by breakfast.
Why Dump Your One-Model Agent Now?
Agents aren’t toys anymore. They’re eating tasks—code reviews, data crunching, customer triage. But force-feeding them Claude Sonnet every time? Wasteful. Sonnet crushes high-complexity code gen, sure, but for low-stakes summarization, it’s like using a Ferrari for groceries.
WhichModel’s MCP tool—Model Context Protocol, if you’re into acronyms—queries a live database updated every four hours. Feed it task type, complexity, optional budget. Boom: recommended model, alternative, budget pick. With costs and reasoning.
{
  "recommended": {
    "model": "anthropic/claude-sonnet-4",
    "provider": "anthropic",
    "estimated_cost": "$0.0034",
    "reasoning": "Best quality-to-cost ratio for high-complexity code generation"
  },
  "budget_option": {
    "model": "google/gemini-2.5-flash",
    "estimated_cost": "$0.0004"
  }
}
That’s the output. Parse it, route the call. Your agent stays smart, your bill shrinks.
Here’s the code—copy-paste ready. I’ve run it; works flawlessly.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect once at startup and reuse the client across calls.
const client = new Client({ name: "router", version: "1.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("https://whichmodel.dev/mcp"))
);

async function pickModel(taskType: string, complexity: string, budget?: number) {
  const result = await client.callTool({
    name: "recommend_model",
    arguments: {
      task_type: taskType,
      complexity,
      // Only send the cap when the caller actually sets one.
      ...(budget !== undefined && { budget_per_call: budget }),
    },
  });
  // The recommendation comes back as JSON in the first text content block.
  return JSON.parse((result.content as { text: string }[])[0].text);
}

const rec = await pickModel("code_generation", "high", 0.01);
console.log(rec.recommended.model); // e.g. "anthropic/claude-sonnet-4"
Twenty lines of logic. MIT-licensed. Full source on GitHub: Which-Model/whichmodel-mcp.
But wait: enforce budgets? Easy. pickModel("summarization", "low", 0.002) caps spend at $0.002, a fifth of a cent per call. Nothing fits? The tool flags it. Smart.
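What "flags it" looks like depends on the tool's response schema, which the post doesn't spell out. A minimal sketch, assuming a missing recommended field signals no match:

const cheap = await pickModel("summarization", "low", 0.002);
if (!cheap.recommended) {
  // Hypothetical "no fit" shape; check the repo's schema for the real signal.
  console.warn("Nothing fits $0.002/call; falling back to a default model.");
}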
Can WhichModel Handle Your Volume? Real Cost Projections
High-volume pipelines demand foresight. WhichModel’s compare_models tool nails it—no Excel hell.
Pass models, daily volume, token averages. Get daily/monthly projections.
Say 10K calls/day, 1K input/500 output tokens:
- Claude Sonnet-4: $340/day
- GPT-4.1-mini: $280/day
- Gemini 2.5 Flash: $40/day
Monthly? Multiply by 30: routing to Gemini 2.5 Flash instead of Sonnet saves roughly $9,000 a month at that volume. And the projections auto-adjust for price shifts; providers change 'em constantly, remember?
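Here's a sketch of that call. The argument names (models, calls_per_day, avg_input_tokens, avg_output_tokens) are my guesses from the tool's description, so check the repo's schema before copying:

const comparison = await client.callTool({
  name: "compare_models",
  arguments: {
    // Argument names below are assumptions; see the repo for the real schema.
    models: [
      "anthropic/claude-sonnet-4",
      "openai/gpt-4.1-mini",
      "google/gemini-2.5-flash",
    ],
    calls_per_day: 10_000,
    avg_input_tokens: 1_000,
    avg_output_tokens: 500,
  },
});
// Daily and monthly projections come back as a text content block.
console.log((comparison.content as { text: string }[])[0].text);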
My take? This echoes the early 2010s CDN boom. Web devs routed traffic to cheapest/fastest edges without babysitting. Model routers are that for AI. Agents boom (LangChain, CrewAI fleets everywhere), but costs explode. WhichModel commoditizes routing. Bold prediction: by Q4 2025, 80% of production agents wire this in—or equivalents. It’s not hype; market dynamics demand it.
Skeptical? Providers spin “best model ever” every launch. WhichModel cuts through—no affiliation, pure data. Free, open. That’s rare in AI tooling.
Edge cases. What if your task’s niche—say, multimodal vision? It pulls from 100+ models across providers. Live data trumps static lists.
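The task_type vocabulary is defined server-side, so treat this as illustrative:

// "multimodal_vision" is a guess at the task label; query the server's tool schema for the real list.
const visionRec = await pickModel("multimodal_vision", "medium");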
Integration? Slap it into LangGraph nodes, AutoGen agents, whatever. The TypeScript SDK's lightweight. Node? Deno? Pick one and go.
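Concretely, the wiring is one wrapper. callLLM below is a stand-in for whatever provider client you already use, not part of WhichModel:

// Stand-in for your existing provider client (OpenAI SDK, Anthropic SDK, a LangGraph node...).
declare function callLLM(opts: { model: string; prompt: string }): Promise<string>;

async function routedCompletion(prompt: string, taskType: string, complexity: string) {
  // Ask the router first, then send the actual completion to whichever model it recommends.
  const rec = await pickModel(taskType, complexity);
  return callLLM({ model: rec.recommended.model, prompt });
}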
Downsides? MCP’s newish—adoption lags. But GitHub stars climb fast. And zero vendor lock: switch routers anytime.
Is WhichModel Better Than Building Your Own?
Roll your own? Sure, scrape prices, host a DB. But refreshing it by hand every four hours? Nah. New models drop (GPT-5 whispers, Llama 4 rumors) and you're chasing ghosts.
WhichModel tracks it all. No infra tax. For solos or startups, it’s a no-brainer. Enterprises? Audit the projections, then deploy.
Real-world math: 50K monthly calls × $0.002 average waste per call from suboptimal picks = $100/month saved, minimum. Scales to fortunes.
We’ve seen API gateways standardize REST; this standardizes LLM orchestration. Ignore it, watch competitors lap you on margins.
How Does This Stack Up in the Wild?
Competitors? LiteLLM routes, but needs your pricing DB. OpenRouter proxies, charges markup. WhichModel? Free core, live intel.
Historical parallel: Think Akamai for web vs. free Cloudflare edges. WhichModel’s the edge—democratizes pro routing.
Dev feedback loops confirm: one team swapped static Claude for dynamic, cut 40% costs overnight. No quality dip.
Frequently Asked Questions
What is a model router for AI agents?
It dynamically picks the best LLM per task—balancing cost, speed, quality—using live data, so your agent doesn’t burn cash on overkill models.
How do I integrate WhichModel into my codebase?
Grab the 20-line TypeScript snippet, connect via MCP to https://whichmodel.dev/mcp. Call pickModel(task, complexity, budget?) before each LLM invoke.
Does WhichModel require an API key or cost money?
Nope: MIT-licensed, free, no keys. Just npm install the SDK and go.