
Providers

Caboose supports 15+ LLM providers out of the box. All providers use raw HTTP with server-sent events (SSE) for streaming — no SDK dependencies. This keeps the binary small and behavior consistent across providers.
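Since the transport is plain SSE, the stream can be consumed with nothing more than a line parser. A minimal sketch, assuming OpenAI-style `data:` frames and a `[DONE]` sentinel (the payload shape varies by provider):

```python
import json

def parse_sse(lines):
    """Yield decoded JSON payloads from server-sent event lines.

    SSE frames look like `data: {...}`; OpenAI-style APIs end the
    stream with a literal `data: [DONE]` sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments, event: fields, and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Example: extract streamed text deltas from OpenAI-style chunks
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse(stream)
)
# text == "Hello"
```

Parsing at the line level like this is what makes one code path work across every provider that speaks SSE.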

| Provider | Notes |
| --- | --- |
| Anthropic | Primary provider. Extended thinking support. Prompt caching on system prompts and tools. |
| OpenAI | Full tool use. Reasoning models (o1, o3, o4) supported. |
| Gemini | Fast inference, large context. Gemini 2.5 models support thinking. |
| OpenRouter | Aggregator with access to hundreds of models across providers. |
| xAI | Grok models. |
| Together AI | Open-source models at scale. |
| Fireworks AI | Fast inference for open-source models. |
| Cerebras | Ultra-fast inference on Cerebras hardware. |
| SambaNova | High-throughput inference. |
| Perplexity | Search-augmented models. |
| Cohere | Command models with strong RAG capabilities. |
| Qwen | Alibaba’s DashScope models. |
| DeepSeek | Cost-effective for bulk tasks. Reasoning content supported. |
| Groq | Ultra-low latency. |
| Mistral | EU-hosted option. |

Each provider entry in the model catalog includes the context window size, tool support status, and pricing information.

Caboose auto-discovers local inference servers on startup by probing common ports. If a server is found, it appears in the model picker immediately.

| Provider | Default Port | Notes |
| --- | --- | --- |
| Ollama | 11434 | ollama serve — auto-discovered |
| LM Studio | 1234 | Enable local server in LM Studio settings |
| llama.cpp | 8080 | llama-server --port 8080 |
| Custom | any | Any OpenAI-compatible server |
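The discovery step amounts to a quick TCP probe of the ports above. A sketch of the idea (function names and timeout are illustrative, not Caboose's actual internals):

```python
import socket

# Ports conventionally used by local inference servers (assumed mapping)
COMMON_PORTS = {"ollama": 11434, "lmstudio": 1234, "llama.cpp": 8080}

def port_open(host, port, timeout=0.2):
    """True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def discover_local_servers(host="127.0.0.1"):
    """Names of local inference servers that appear to be running."""
    return [name for name, port in COMMON_PORTS.items()
            if port_open(host, port)]
```

A real implementation would follow a successful probe with a request to /v1/models to confirm the listener is actually an inference server, as described below.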

Ollama, LM Studio, llama.cpp, and a custom server option are pinned at the top of the model picker. Select one to open the connect dialog, enter the address, and Caboose fetches available models from /v1/models. The session remembers manually connected servers.

You can also configure local providers directly in your config:

[[local_providers]]
name = "ollama"
base_url = "http://localhost:11434"
default_model = "llama3.2"

[[local_providers]]
name = "lmstudio"
base_url = "http://localhost:1234"

Switch the active model at any time during a session:

  • Type /model and select from the list
  • Press Ctrl+M to open the model picker directly

The model picker groups providers under collapsible headers. The active provider is expanded by default; others are collapsed. Press Enter on a header to expand or collapse it. OpenRouter and any configured third-party providers are always shown regardless of which provider is active.

The change takes effect on the next message. Your selection is saved to config automatically.

Cloud providers require an API key. Use /connect to enter one:

/connect anthropic
/connect openai

Keys are stored locally and never transmitted anywhere except the provider’s own API endpoint. You can also set keys via environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.).

When Caboose starts, it resolves which provider to use in this order:

  1. CLI flag — --provider anthropic or --model claude-sonnet-4
  2. Per-provider config — set in your project or global configuration
  3. Global default — the default_provider field in your config
  4. Fallback — "anthropic" if nothing else is configured
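This precedence is a simple first-match walk. A minimal sketch (argument names are illustrative, not Caboose's actual internals):

```python
def resolve_provider(cli_flag=None, project_cfg=None, global_default=None):
    """Return the first configured provider, mirroring the order above."""
    for candidate in (cli_flag, project_cfg, global_default):
        if candidate:
            return candidate
    return "anthropic"  # documented fallback
```

For example, `resolve_provider(global_default="openai")` yields "openai", while calling it with no configuration at all falls back to "anthropic".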

Models that support extended thinking can be configured with /reasoning or toggled with Ctrl+T. Four levels are available: Off, Low, Medium, High. Provider-native mapping:

  • Anthropic — budget_tokens parameter
  • OpenAI — reasoning_effort (o1/o3/o4 models)
  • Gemini — thinkingConfig (gemini-2.5 models)
  • OpenRouter / DeepSeek — reasoning_content field in streaming responses
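The mapping above can be sketched as a small translation function. The token budgets here are illustrative assumptions — the actual values Caboose uses are not documented in this section:

```python
# Illustrative token budgets per level (assumed, not Caboose's real values)
THINKING_BUDGETS = {"low": 4096, "medium": 16384, "high": 32768}

def reasoning_params(provider, level):
    """Translate a UI reasoning level into provider-native request fields."""
    if level == "off":
        return {}
    if provider == "anthropic":
        return {"thinking": {"type": "enabled",
                             "budget_tokens": THINKING_BUDGETS[level]}}
    if provider == "openai":
        # o1/o3/o4 models accept "low" | "medium" | "high" directly
        return {"reasoning_effort": level}
    if provider == "gemini":
        return {"thinkingConfig": {"thinkingBudget": THINKING_BUDGETS[level]}}
    return {}
```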

Thinking blocks stream in real time and are rendered as collapsible sections in chat. The thinking toggle is hidden for models that don’t support it.

All providers are wrapped in automatic exponential backoff on transient failures (rate limits, network errors, 5xx responses). Retry attempts are logged to the session.
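The retry policy amounts to a standard backoff-with-jitter loop. A hypothetical sketch — attempt counts and delays are illustrative, not Caboose's actual limits:

```python
import random
import time

def with_backoff(call, retries=5, base=0.5, transient=(Exception,)):
    """Retry `call` with exponential backoff plus jitter on transient errors."""
    for attempt in range(retries):
        try:
            return call()
        except transient as exc:
            if attempt == retries - 1:
                raise  # out of attempts; surface the error
            delay = base * (2 ** attempt) + random.uniform(0, base)
            time.sleep(delay)
```

The jitter term spreads out retries from concurrent requests so they don't hammer a rate-limited endpoint in lockstep.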

Caboose uses provider-native prompt caching where available:

  • Anthropic — cache_control breakpoints on system prompts and tool definitions
  • OpenAI — automatic prefix caching
  • Other providers — enabled automatically where the API supports it

Cache hit/miss statistics appear in the token usage display after each response.
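The Anthropic cache_control breakpoints above can be sketched as a transform over the static prefix of a request body (the function name is illustrative; the field shapes follow the Anthropic Messages API):

```python
def with_cache_breakpoints(system_prompt, tools):
    """Attach Anthropic-style cache_control markers to the static prefix
    (system prompt and tool definitions) of a request body."""
    body = {
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "tools": list(tools),
    }
    if body["tools"]:
        # Marking only the last tool caches the whole tool-definition prefix.
        body["tools"][-1] = {**body["tools"][-1],
                             "cache_control": {"type": "ephemeral"}}
    return body
```

Because caching is prefix-based, keeping the system prompt and tool definitions byte-stable across turns is what makes subsequent requests hit the cache.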