LLM Token Cost Calculator

Calculate and compare LLM API token costs. Enter your workload and the current price per one million tokens from the provider’s pricing page or your contract. The calculator separates input, cached input, and output tokens and can apply a batch discount.

Workload

Input tokens per request: Prompt, system instructions, context, and tool results sent to the model.
Output tokens per request: Tokens generated by the model, including billed reasoning tokens where applicable.
Cached input share: Use zero if caching is unavailable or the workload does not achieve cache hits.
Currency: No currency conversion is performed. Enter rates in the selected currency.

The calculator compares two scenarios, A and B. For each scenario it reports the total cost, the cost per request, per day, and per year, and the monthly input / output split.
Price freshness: This calculator does not hardcode model presets. Copy current prices from the official provider page or use the rates in your contract. Providers may use different rules for context length, cached tokens, reasoning tokens, tool calls, images, audio, batch processing, and regional billing.

How token cost is calculated

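Component | Formula
Monthly requests | Requests per day × active days
Uncached input cost | Input tokens × uncached share × monthly requests ÷ 1,000,000 × input price
Cached input cost | Input tokens × cached share × monthly requests ÷ 1,000,000 × cached input price
Output cost | Output tokens × monthly requests ÷ 1,000,000 × output price
Total | (Uncached input cost + cached input cost + output cost) × (1 − discount)

Token counts are per request, prices are per one million tokens, the uncached share is 1 − cached share, and the discount is the batch discount expressed as a fraction.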
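The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the `Scenario` class, its field names, and the `monthly_cost` function are hypothetical, and the rates in the example are placeholders rather than any provider's prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass
class Scenario:
    requests_per_day: float
    active_days: float          # active days per month
    input_tokens: float         # input tokens per request
    output_tokens: float        # output tokens per request
    cached_share: float         # fraction of input billed at the cached rate, 0.0 to 1.0
    input_price: float          # price per 1M uncached input tokens
    cached_input_price: float   # price per 1M cached input tokens
    output_price: float         # price per 1M output tokens
    discount: float = 0.0       # batch discount as a fraction, e.g. 0.5 for 50%


def monthly_cost(s: Scenario) -> dict:
    """Apply the formulas from the table to one scenario."""
    requests = s.requests_per_day * s.active_days
    uncached_share = 1.0 - s.cached_share

    uncached_input = s.input_tokens * uncached_share * requests / TOKENS_PER_MILLION * s.input_price
    cached_input = s.input_tokens * s.cached_share * requests / TOKENS_PER_MILLION * s.cached_input_price
    output = s.output_tokens * requests / TOKENS_PER_MILLION * s.output_price

    total = (uncached_input + cached_input + output) * (1.0 - s.discount)
    return {
        "monthly_requests": requests,
        "uncached_input": uncached_input,
        "cached_input": cached_input,
        "output": output,
        "total": total,
        # Guard against division by zero for an empty workload.
        "per_request": total / requests if requests else 0.0,
    }


# Placeholder rates for illustration only; copy real prices from the provider.
example = Scenario(
    requests_per_day=1_000, active_days=30,
    input_tokens=2_000, output_tokens=500, cached_share=0.5,
    input_price=2.00, cached_input_price=0.50, output_price=8.00,
)
result = monthly_cost(example)
print(result["total"], result["per_request"])
```

With these placeholder rates the monthly total is 195: 60 for uncached input, 15 for cached input, and 120 for output, across 30,000 requests.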

What counts as an input token?

Input tokens can include the user prompt, system instructions, conversation history, retrieved documents, tool results, and structured data sent with the request. Agentic workflows often use far more input tokens than a single chat because each step can add context and tool output.
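To see why agentic workflows accumulate input tokens, consider a rough sketch in Python. It assumes every step resends the whole accumulated context and that each step appends a fixed number of tokens (model output plus tool result). Both are simplifications, and the function name and parameters are hypothetical.

```python
def agent_input_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens billed across an agent run, assuming each step
    resends the full context built up so far."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                    # this step's request sends the whole context
        context += tokens_added_per_step    # output and tool result join the next request
    return total


single_chat = agent_input_tokens(base_context=1_000, tokens_added_per_step=500, steps=1)
five_step_agent = agent_input_tokens(base_context=1_000, tokens_added_per_step=500, steps=5)
print(single_chat, five_step_agent)
```

Under these assumptions a five-step run bills 10,000 input tokens (1,000 + 1,500 + 2,000 + 2,500 + 3,000), ten times the single request, even though the starting prompt is the same.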

Why output tokens often dominate cost

Many providers charge more for output than input. A workload with short prompts and long generated documents can therefore cost more than a retrieval workflow with substantial input context but concise answers. Measure token counts from real request logs rather than estimating from prompt length alone.
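A small comparison makes the point concrete. The prices here are hypothetical (1.00 per million input tokens, 4.00 per million output tokens), as are the two workloads and the `request_cost` helper. Substitute figures from your own logs and provider.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request, with prices given per one million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prices per 1M tokens; output is priced at 4x input.
INPUT_PRICE, OUTPUT_PRICE = 1.00, 4.00

# Short prompt, long generated document.
generation_heavy = request_cost(200, 3_000, INPUT_PRICE, OUTPUT_PRICE)
# Large retrieved context, concise answer.
retrieval_heavy = request_cost(8_000, 300, INPUT_PRICE, OUTPUT_PRICE)

print(f"generation-heavy: {generation_heavy:.4f}")
print(f"retrieval-heavy:  {retrieval_heavy:.4f}")
```

At these assumed rates the generation-heavy request costs about 0.0122 and the retrieval-heavy request about 0.0092, even though the second sends 40 times more input tokens.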

Cached input and batch discounts

Prompt caching can reduce the price of repeated input prefixes when a provider supports it and the request meets its cache rules. A 50% cache-eligible share does not guarantee a 50% cache hit rate. Use observed billing data where possible.
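One way to derive the cached share from observed data is to divide the cached input tokens by the total input tokens across a sample of requests. The sketch below assumes usage records have already been exported as dictionaries. The field names `input_tokens` and `cached_input_tokens` are hypothetical, so map them to whatever your provider's usage or billing export actually reports.

```python
def observed_cached_share(records: list[dict]) -> float:
    """Fraction of input tokens that were actually billed at the cached rate."""
    total_input = sum(r["input_tokens"] for r in records)
    cached = sum(r["cached_input_tokens"] for r in records)
    return cached / total_input if total_input else 0.0


# Hypothetical sample: half of every prompt is a cacheable prefix,
# but the second request missed the cache entirely.
usage_log = [
    {"input_tokens": 2_000, "cached_input_tokens": 1_000},
    {"input_tokens": 2_000, "cached_input_tokens": 0},
    {"input_tokens": 2_000, "cached_input_tokens": 1_000},
]
share = observed_cached_share(usage_log)
print(f"observed cached share: {share:.1%}")
```

In this sample 50% of each prompt is eligible, yet only about a third of input tokens were billed as cached. The observed figure is the one to enter as the cached share.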

Batch APIs may offer a discount in exchange for asynchronous processing or longer completion windows. Enter the documented discount only when the workload is eligible and operationally suitable for batch processing.
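When only part of a workload can run in batch, one rough approach is to enter a blended discount. The sketch below assumes batch-eligible and real-time requests have the same token mix, so the discount scales with the eligible share of traffic. The function name and the 50% figure are illustrative, not a documented rate.

```python
def blended_discount(batch_share: float, batch_discount: float) -> float:
    """Effective discount on total token cost when only `batch_share`
    of the workload (0.0 to 1.0) receives `batch_discount` (0.0 to 1.0)."""
    if not (0.0 <= batch_share <= 1.0 and 0.0 <= batch_discount <= 1.0):
        raise ValueError("shares and discounts must be fractions between 0 and 1")
    return batch_share * batch_discount


# Hypothetical: 40% of requests can wait for batch processing at a 50% discount.
effective = blended_discount(batch_share=0.4, batch_discount=0.5)
print(f"discount to enter: {effective:.0%}")
```

Here the value to enter is 20%, not 50%, because the remaining 60% of requests still pay the standard rate.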

Costs not included automatically

  • Search, grounding, web browsing, and tool-call fees
  • Vector database, embedding, reranking, and storage costs
  • Image, audio, and video input or generation
  • Infrastructure, observability, retries, and failed requests
  • Taxes, exchange rates, minimum commitments, and enterprise discounts
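Some of these items can be layered on top of the calculated token cost by hand. The sketch below is one possible adjustment, assuming retried and failed requests are billed like normal requests and that other services are flat monthly fees. The function name, the retry rate, and the fee amounts are all hypothetical.

```python
def adjusted_monthly_cost(token_cost: float, retry_rate: float,
                          extra_fees: dict[str, float]) -> float:
    """Token cost inflated by billed retries, plus flat monthly add-on fees."""
    retried_token_cost = token_cost * (1.0 + retry_rate)
    return retried_token_cost + sum(extra_fees.values())


# Hypothetical figures: 200 in token cost, 5% of requests retried,
# plus flat fees for a vector database and a search tool.
full_cost = adjusted_monthly_cost(
    token_cost=200.0,
    retry_rate=0.05,
    extra_fees={"vector_database": 25.0, "search_tool": 10.0},
)
print(f"{full_cost:.2f}")
```

With these figures the monthly estimate rises from 200 to 245: 210 for tokens including retries, plus 35 in add-on fees.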

Estimate the full business case

Token cost is only one line item. Combine it with implementation and labor assumptions in the free Automation ROI Calculator.

Frequently asked questions

How many tokens are in a word?

There is no universal conversion. Tokenization varies by model, language, formatting, code, and punctuation. Use the provider’s tokenizer or actual API usage for accurate planning.
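If you have logged requests, you can estimate a tokens-per-word ratio for your own content instead of relying on a rule of thumb. This sketch assumes each sample pairs the text sent with the token count the API reported for it. The sample data and function name are hypothetical, and words are counted by naive whitespace splitting.

```python
def tokens_per_word(samples: list[tuple[str, int]]) -> float:
    """Average tokens per whitespace-separated word across logged samples.

    Each sample is (text, token_count), where token_count comes from the
    provider's tokenizer or the usage reported by the API.
    """
    total_words = sum(len(text.split()) for text, _ in samples)
    total_tokens = sum(tokens for _, tokens in samples)
    return total_tokens / total_words if total_words else 0.0


# Hypothetical samples: the token counts are made up for illustration.
logged = [
    ("hello world foo bar", 5),     # plain prose
    ("def f(x): return x", 9),      # code often splits into more tokens
]
ratio = tokens_per_word(logged)
print(f"{ratio:.2f} tokens per word")
```

The ratio only holds for text similar to the samples. Recompute it separately for code, structured data, or other languages.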

Are reasoning tokens billed?

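Billing rules vary by model and provider. Some models count internal reasoning tokens as billed output even when they are not returned in the response, while others report or price them differently. Check the current model documentation and the usage data returned with each response.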

Does this calculator include tool calls and web search?

No. Add separate provider fees for search, tools, images, audio, storage, or other services to the calculated token cost.

Why are no model prices preselected?

API prices and model names change frequently. Manual entry lets the calculation reflect current public pricing, regional terms, or negotiated enterprise rates instead of presets that may be out of date.