Model Configuration

LLM providers, cost per 1K tokens, latency benchmarks, and routing preferences.

| Model | Provider | Status | Context | Avg latency | Input /1K | Output /1K | Fleet usage |
|---|---|---|---|---|---|---|---|
| gpt-4o | OpenAI | active | 128K | 620ms | $0.005 | $0.015 | 42% |
| claude-3-5-sonnet | Anthropic | active | 200K | 580ms | $0.003 | $0.015 | 31% |
| gpt-4o-mini | OpenAI | active | 128K | 340ms | $0.00015 | $0.0006 | 18% |
| o3-mini | OpenAI | beta | 128K | 1.2s | $0.0011 | $0.0044 | 6% |
| gemini-2.0-flash | Google | active | 1M | 280ms | $0.0001 | $0.0004 | 3% |
| claude-3-haiku | Anthropic | deprecated | 200K | 220ms | $0.00025 | $0.00125 | 0% |
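Since the prices above are quoted per 1K tokens, per-request cost is a simple rate calculation. A minimal sketch, assuming the pricing from the cards above; the `PRICING` dict and `request_cost` helper are illustrative, not part of any real routing API:

```python
# Per-1K-token prices copied from the model cards above:
# (input $/1K tokens, output $/1K tokens)
PRICING = {
    "gpt-4o": (0.005, 0.015),
    "claude-3-5-sonnet": (0.003, 0.015),
    "gpt-4o-mini": (0.00015, 0.0006),
    "o3-mini": (0.0011, 0.0044),
    "gemini-2.0-flash": (0.0001, 0.0004),
    "claude-3-haiku": (0.00025, 0.00125),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, given input/output token counts."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens / 1000) * in_rate + (output_tokens / 1000) * out_rate

# Example: a 2,000-token prompt with a 500-token completion on gpt-4o
# costs 2 x $0.005 + 0.5 x $0.015 = $0.0175.
print(round(request_cost("gpt-4o", 2000, 500), 4))
```

The same call shows why routing matters at fleet scale: the identical request on gpt-4o-mini would cost about $0.0006, roughly 30x cheaper.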