Model Configuration
LLM providers, cost per token, latency benchmarks, and routing preferences.
| Model             | Provider  | Context | Avg latency | Input $/1K | Output $/1K | Fleet usage |
|-------------------|-----------|---------|-------------|------------|-------------|-------------|
| gpt-4o            | OpenAI    | 128K    | 620 ms      | $0.005     | $0.015      | 42%         |
| claude-3-5-sonnet | Anthropic | 200K    | 580 ms      | $0.003     | $0.015      | 31%         |
| gpt-4o-mini       | OpenAI    | 128K    | 340 ms      | $0.00015   | $0.0006     | 18%         |
| o3-mini           | OpenAI    | 128K    | 1.2 s       | $0.0011    | $0.0044     | 6%          |
| gemini-2.0-flash  | Google    | 1M      | 280 ms      | $0.0001    | $0.0004     | 3%          |
| claude-3-haiku    | Anthropic | 200K    | 220 ms      | $0.00025   | $0.00125    | 0%          |
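The per-1K prices and latency figures above can drive a simple cost-aware routing rule. The sketch below is a hypothetical illustration, not this system's actual router: the `estimate_cost` and `cheapest_within_latency` helpers are invented names, and the numbers are copied straight from the table.

```python
# Hypothetical cost/latency routing sketch. Prices are USD per 1K tokens
# (input, output) and latencies are milliseconds, taken from the table above.

PRICES_PER_1K = {
    "gpt-4o": (0.005, 0.015),
    "claude-3-5-sonnet": (0.003, 0.015),
    "gpt-4o-mini": (0.00015, 0.0006),
    "o3-mini": (0.0011, 0.0044),
    "gemini-2.0-flash": (0.0001, 0.0004),
    "claude-3-haiku": (0.00025, 0.00125),
}

AVG_LATENCY_MS = {
    "gpt-4o": 620,
    "claude-3-5-sonnet": 580,
    "gpt-4o-mini": 340,
    "o3-mini": 1200,
    "gemini-2.0-flash": 280,
    "claude-3-haiku": 220,
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request for the given token counts."""
    in_price, out_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

def cheapest_within_latency(budget_ms: int, input_tokens: int, output_tokens: int) -> str:
    """Pick the cheapest model whose average latency fits the budget."""
    candidates = [m for m, ms in AVG_LATENCY_MS.items() if ms <= budget_ms]
    return min(candidates, key=lambda m: estimate_cost(m, input_tokens, output_tokens))

# A 2,000-token prompt with a 500-token reply on gpt-4o:
# 2 * $0.005 + 0.5 * $0.015 = $0.0175
cost = estimate_cost("gpt-4o", 2000, 500)

# Cheapest model averaging under 300 ms for a 1K-in / 1K-out request:
fast_pick = cheapest_within_latency(300, 1000, 1000)
```

For example, only gemini-2.0-flash and claude-3-haiku average under 300 ms, and at $0.0005 versus $0.0015 for a 1K-in/1K-out request the router picks gemini-2.0-flash.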