# Providers
Caboose supports 15+ LLM providers out of the box. All providers use raw HTTP with server-sent events (SSE) for streaming — no SDK dependencies. This keeps the binary small and behavior consistent across providers.
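A minimal sketch of SSE parsing may help here. The `data:` line format and the `[DONE]` sentinel follow the common OpenAI-style convention; the helper name and event shapes are illustrative, not Caboose's actual wire types:

```python
import json

def parse_sse(lines):
    """Yield decoded JSON payloads from an SSE stream, one per `data:` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, `event:` lines, and keep-alive blanks
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
            break
        yield json.loads(payload)

# Example: a fragment shaped like an OpenAI-compatible streaming response
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(e["choices"][0]["delta"].get("content", "") for e in parse_sse(stream))
print(text)  # -> Hello
```

Working over raw lines like this is what lets one streaming loop serve every provider, since the differences live in the JSON payloads rather than the transport.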
## Cloud Providers

| Provider | Notes |
|---|---|
| Anthropic | Primary provider. Extended thinking support. Prompt caching on system prompts and tools. |
| OpenAI | Full tool use. Reasoning models (o1, o3, o4) supported. |
| Gemini | Fast inference, large context. Gemini 2.5 models support thinking. |
| OpenRouter | Aggregator with access to hundreds of models across providers. |
| xAI | Grok models. |
| Together AI | Open-source models at scale. |
| Fireworks AI | Fast inference for open-source models. |
| Cerebras | Ultra-fast inference on Cerebras hardware. |
| SambaNova | High-throughput inference. |
| Perplexity | Search-augmented models. |
| Cohere | Command models with strong RAG capabilities. |
| Qwen | Alibaba’s DashScope models. |
| DeepSeek | Cost-effective for bulk tasks. Reasoning content supported. |
| Groq | Ultra-low latency. |
| Mistral | EU-hosted option. |
Each provider entry in the model catalog includes the context window size, tool support status, and pricing information.
## Local Providers

Caboose auto-discovers local inference servers on startup by probing common ports. If a server is found, it appears in the model picker immediately.
| Provider | Default Port | Notes |
|---|---|---|
| Ollama | 11434 | `ollama serve` — auto-discovered |
| LM Studio | 1234 | Enable local server in LM Studio settings |
| llama.cpp | 8080 | `llama-server --port 8080` |
| Custom | any | Any OpenAI-compatible server |
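The probe step described above can be sketched as a TCP connection attempt against each default port. The function name and timeout are assumptions; only the port numbers come from the table:

```python
import socket

# Default ports from the table above; the probe logic itself is a sketch.
LOCAL_PORTS = {"ollama": 11434, "lmstudio": 1234, "llama.cpp": 8080}

def discover_local(host="127.0.0.1", timeout=0.2):
    """Return the names of local servers that accept a TCP connection."""
    found = []
    for name, port in LOCAL_PORTS.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(name)
        except OSError:
            pass  # nothing listening on this port
    return found
```

A short timeout keeps startup fast even when none of the servers are running, since each closed port fails immediately with a connection refusal.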
### Connecting a local server

Ollama, LM Studio, llama.cpp, and a custom server option are pinned at the top of the model picker. Select one to open the connect dialog, enter the address, and Caboose fetches available models from `/v1/models`. The session remembers manually connected servers.
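The model fetch can be sketched against the standard OpenAI-compatible `/v1/models` response shape (`{"object": "list", "data": [{"id": ...}]}`); the helper names here are illustrative:

```python
import json
from urllib.request import urlopen

def parse_models(body):
    """Extract model IDs from an OpenAI-compatible /v1/models response body."""
    return [m["id"] for m in body.get("data", [])]

def list_models(base_url):
    """Fetch the model list from an OpenAI-compatible server (sketch)."""
    with urlopen(f"{base_url}/v1/models", timeout=5) as resp:
        return parse_models(json.load(resp))
```

For example, `list_models("http://localhost:11434")` would query a running Ollama server and return the IDs of its pulled models.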
You can also configure local providers directly in your config:
```toml
[[local_providers]]
name = "ollama"
base_url = "http://localhost:11434"
default_model = "llama3.2"

[[local_providers]]
name = "lmstudio"
base_url = "http://localhost:1234"
```

## Switching Models

Switch the active model at any time during a session:
- Type `/model` and select from the list
- Press `Ctrl+M` to open the model picker directly
The model picker groups providers with collapsible ▼/▶ headers. The active provider is expanded by default; others are collapsed. Press Enter on a header to expand or collapse it. OpenRouter and any configured third-party providers are always shown regardless of which provider is active.
The change takes effect on the next message. Your selection is saved to config automatically.
## Connecting a Provider

```
/connect anthropic
/connect openai
```

Keys are stored locally and never transmitted anywhere except the provider’s own API endpoint. You can also set keys via environment variables (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.).
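The environment-variable fallback can be sketched as follows. The variable names come from the text above; the precedence (a key saved via `/connect` wins over the environment) is an assumption:

```python
import os

# Env var names from the docs; extend per provider as needed.
ENV_KEYS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}

def api_key_for(provider, stored=None):
    """Prefer a locally stored key, then fall back to the environment (assumed order)."""
    stored = stored or {}
    return stored.get(provider) or os.environ.get(ENV_KEYS.get(provider, ""))
```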
## Provider Resolution Order

When Caboose starts, it resolves which provider to use in this order:

- CLI flag — `--provider anthropic` or `--model claude-sonnet-4`
- Per-provider config — set in your project or global configuration
- Global default — the `default_provider` field in your config
- Fallback — `"anthropic"` if nothing else is configured
## Thinking / Reasoning

Models that support extended thinking can be configured with `/reasoning` or toggled with `Ctrl+T`. Four levels are available: Off, Low, Medium, High. Provider-native mapping:

- Anthropic — `budget_tokens` parameter
- OpenAI — `reasoning_effort` (o1/o3/o4 models)
- Gemini — `thinkingConfig` (gemini-2.5 models)
- OpenRouter / DeepSeek — `reasoning_content` field in streaming responses
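The mapping above can be sketched as a single dispatch. The field names (`budget_tokens`, `reasoning_effort`, `thinkingConfig`) come from the list; the concrete token budgets are illustrative assumptions, not Caboose's actual values:

```python
# Hypothetical budgets per level; Off sends no thinking parameters at all.
THINKING_BUDGETS = {"low": 1024, "medium": 8192, "high": 32768}

def thinking_params(provider, level):
    """Map a reasoning level to the provider-native request field."""
    if level == "off":
        return {}
    if provider == "anthropic":
        return {"thinking": {"type": "enabled", "budget_tokens": THINKING_BUDGETS[level]}}
    if provider == "openai":
        return {"reasoning_effort": level}  # o1/o3/o4 models
    if provider == "gemini":
        return {"thinkingConfig": {"thinkingBudget": THINKING_BUDGETS[level]}}
    # OpenRouter/DeepSeek surface reasoning_content in the response stream,
    # so no request-side field is assumed here.
    return {}
```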
Thinking blocks stream in real time and are rendered as collapsible sections in chat. The thinking toggle is hidden for models that don’t support it.
## Retry Logic

All providers are wrapped in automatic exponential backoff on transient failures (rate limits, network errors, 5xx responses). Retry attempts are logged to the session.
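Exponential backoff of this kind can be sketched as below. The attempt count, base delay, and jitter are assumptions; only the retry-on-transient-failure behavior comes from the text:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a rate limit, network error, or 5xx response."""

def with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry `call` on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

The jitter term spreads out retries so that many sessions hitting the same rate limit don't all reconnect in lockstep.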
## Prompt Caching

Caboose uses provider-native prompt caching where available:

- Anthropic — `cache_control` breakpoints on system prompts and tool definitions
- OpenAI — automatic prefix caching
- Other providers — used when supported
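Where the Anthropic breakpoints land can be sketched as below. The `{"type": "ephemeral"}` shape is Anthropic's documented `cache_control` form; the surrounding payload construction is a simplified assumption:

```python
def with_cache_breakpoints(system_prompt, tools):
    """Mark the system prompt and the last tool definition as cacheable prefixes."""
    system = [{
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},
    }]
    tools = [dict(t) for t in tools]  # copy so callers' dicts aren't mutated
    if tools:
        # A breakpoint on the final tool caches the entire tool array before it.
        tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {"system": system, "tools": tools}
```

Because breakpoints cache everything up to their position, one marker on the last tool covers the whole (stable) tool list, leaving the per-turn messages as the only uncached portion.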
Cache hit/miss statistics appear in the token usage display after each response.