Enterprise · For companies

LLM cost & throughput.

What will that LLM feature actually cost to run — and will it keep up? Put in your traffic and token sizes to see monthly inference cost, latency and throughput, and whether a hosted API or self-hosting wins at your scale.

Your workload
Requests / day
Avg input tokens: 1,500 (slider from 100 to 8k)
Avg output tokens: 400 (slider from 50 to 4k)
Model tier: GPT-4o / Claude-class
Peak load: ×4 avg (slider from steady to spiky)
Token prices and GPU throughput are illustrative and move fast — they live in one config block in llm-cost.js. Self-host assumes an open model of comparable capability.
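
As a rough illustration of what such a config block and the monthly cost arithmetic might look like, here is a minimal sketch. The actual contents of llm-cost.js are not shown on this page, so the names CONFIG and monthlyApiCost, the tier keys, the 30-day month and every price below are hypothetical placeholders, not quoted rates.

```javascript
// Hypothetical config block: all numbers are placeholders, not real prices.
const CONFIG = {
  daysPerMonth: 30, // assumption: a flat 30-day month
  tiers: {
    // USD per 1M tokens, illustrative only
    frontier: { inputPerM: 2.5, outputPerM: 10.0 },
    small: { inputPerM: 0.15, outputPerM: 0.6 },
  },
};

// Monthly hosted-API cost: tokens per month times price per token,
// computed separately for input and output because they are priced differently.
function monthlyApiCost({ requestsPerDay, inputTokens, outputTokens, tier }, config = CONFIG) {
  const price = config.tiers[tier];
  if (!price) throw new Error(`Unknown tier: ${tier}`);
  const requestsPerMonth = requestsPerDay * config.daysPerMonth;
  const inputCost = ((requestsPerMonth * inputTokens) / 1e6) * price.inputPerM;
  const outputCost = ((requestsPerMonth * outputTokens) / 1e6) * price.outputPerM;
  return { inputCost, outputCost, total: inputCost + outputCost };
}

// Example: 10,000 requests/day (a made-up figure) at the page's default token sizes.
const example = monthlyApiCost({
  requestsPerDay: 10000,
  inputTokens: 1500,
  outputTokens: 400,
  tier: "frontier",
});
console.log(example.total); // 2325 with the placeholder prices above
```

Keeping prices in one object means a price change is a one-line edit, and the cost function never needs to change.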

Estimated inference cost, hosted API (/ mo)

Tokens / month
Cost / 1k requests
Est. p95 latency
GPUs to self-host
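
One plausible way to derive three of these figures from the workload inputs is sketched below. This is an assumption about the method, not the page's actual formula: workloadMetrics, the 30-day month and the per-GPU throughput of 1,500 output tokens per second are hypothetical, and p95 latency is left out because it depends on model and serving details that the inputs do not capture.

```javascript
// Hypothetical assumptions; real GPU throughput varies with model, batch size and hardware.
const ASSUMED = { daysPerMonth: 30, gpuOutputTokensPerSec: 1500 };

function workloadMetrics(
  { requestsPerDay, inputTokens, outputTokens, peakMultiplier },
  monthlyCostUsd,
  a = ASSUMED
) {
  const requestsPerMonth = requestsPerDay * a.daysPerMonth;
  const tokensPerMonth = requestsPerMonth * (inputTokens + outputTokens);
  const costPer1kRequests = (monthlyCostUsd / requestsPerMonth) * 1000;

  // Size the fleet for peak traffic, not average: average requests per second
  // scaled by the peak multiplier, then converted to output tokens per second.
  const avgRps = requestsPerDay / 86400;
  const peakOutputTokensPerSec = avgRps * peakMultiplier * outputTokens;
  const gpus = Math.max(1, Math.ceil(peakOutputTokensPerSec / a.gpuOutputTokensPerSec));

  return { tokensPerMonth, costPer1kRequests, peakOutputTokensPerSec, gpus };
}

// Example: 86,400 requests/day (1 per second on average), peak load ×4.
const metrics = workloadMetrics(
  { requestsPerDay: 86400, inputTokens: 1500, outputTokens: 400, peakMultiplier: 4 },
  1000 // made-up monthly bill in USD
);
console.log(metrics.gpus); // 2 under the assumed throughput
```

Sizing on output tokens alone is a simplification; long prompts also consume GPU time during prefill, so a fuller model would account for input tokens as well.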

Monthly API cost by model tier

Hosted API vs self-host
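
The comparison can be sketched as a simple break-even check: the API bill grows with token volume, while self-hosting is roughly a fixed monthly cost per GPU plus the cost of running it. The function name compareHosting, the GPU hourly rate, the 730-hour month and the ops overhead are all hypothetical placeholders.

```javascript
// Hypothetical self-host assumptions; substitute your own cloud or hardware costs.
const SELF_HOST = {
  gpuUsdPerHour: 2.0,   // placeholder rental rate per GPU
  hoursPerMonth: 730,   // about 24 * 365 / 12
  opsOverheadUsd: 1000, // placeholder for engineering and monitoring time
};

function compareHosting({ monthlyApiCostUsd, gpus }, a = SELF_HOST) {
  // GPUs are billed whether or not they are busy, so cost scales with fleet size.
  const selfHostUsd = gpus * a.gpuUsdPerHour * a.hoursPerMonth + a.opsOverheadUsd;
  const cheaper = selfHostUsd < monthlyApiCostUsd ? "self-host" : "hosted API";
  return { monthlyApiCostUsd, selfHostUsd, cheaper };
}

console.log(compareHosting({ monthlyApiCostUsd: 2325, gpus: 2 }).cheaper); // "hosted API"
console.log(compareHosting({ monthlyApiCostUsd: 5000, gpus: 2 }).cheaper); // "self-host"
```

With these placeholder numbers a two-GPU fleet costs 3,920 USD a month, so self-hosting only wins once the API bill exceeds that figure.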

 

Where the cost sits

We'll cut this bill for you.

Caching, routing each request to the right model, trimming prompts and tokens, and self-hosting where it pays, all without losing quality.

Optimise our stack