Provider System

Caboose talks to LLM providers through a unified Provider trait. Every backend — Anthropic, OpenAI, Gemini, OpenRouter — implements the same interface, making it straightforward to swap models at runtime.

The trait exposes methods for streaming chat completions, identifying the provider and model, and listing available models. The stream method returns an async stream of StreamEvent values:

| Variant | Purpose |
| --- | --- |
| TextDelta | Incremental text token from the model |
| ThinkingDelta | Chain-of-thought content (extended thinking) |
| ToolCall | Structured tool invocation request |
| Done | Stream completed with usage metadata |
| Error | Provider-side error |
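The event and trait shapes above can be sketched as follows. This is illustrative, not Caboose's actual definitions: field names and payload types are assumptions, and the real stream method returns an async stream rather than the iterator used here to keep the sketch dependency-free.

```rust
// Hypothetical sketch of the StreamEvent variants described above.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    ThinkingDelta(String),
    ToolCall { name: String, arguments: String },
    Done { input_tokens: u64, output_tokens: u64 },
    Error(String),
}

// Simplified Provider trait; in Caboose, stream is async.
trait Provider {
    fn name(&self) -> &str;
    fn model(&self) -> &str;
    fn list_models(&self) -> Vec<String>;
    fn stream(&self, prompt: &str) -> Box<dyn Iterator<Item = StreamEvent>>;
}

// A consumer matches on each event as it arrives.
fn render(event: &StreamEvent) -> String {
    match event {
        StreamEvent::TextDelta(t) => t.clone(),
        StreamEvent::ThinkingDelta(_) => String::new(), // hidden by default
        StreamEvent::ToolCall { name, .. } => format!("[tool: {name}]"),
        StreamEvent::Done { output_tokens, .. } => format!("[done: {output_tokens} tokens]"),
        StreamEvent::Error(e) => format!("[error: {e}]"),
    }
}

fn main() {
    let events = [
        StreamEvent::TextDelta("Hello".into()),
        StreamEvent::Done { input_tokens: 12, output_tokens: 3 },
    ];
    let out: String = events.iter().map(render).collect();
    println!("{out}");
}
```

Because every backend emits the same enum, the UI and agent loop never need provider-specific branches.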

Caboose does not depend on any vendor SDK. Each provider builds HTTP requests directly and parses server-sent event streams by hand. This approach has several advantages:

  • No SDK version churn. Provider SDKs update frequently and sometimes introduce breaking changes. Raw HTTP is stable.
  • Full control over streaming. Caboose can handle partial SSE frames, reconnect logic, and timeout behavior exactly as needed.
  • Smaller binary. Avoiding SDKs keeps the dependency tree lean.
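A minimal sketch of the SSE parsing involved, assuming the common `data: ...` framing with blank-line event separators (this is not Caboose's actual parser; a real implementation also buffers bytes across network reads so partial frames are handled):

```rust
// Extract the data payloads from a buffer of server-sent events.
// Events are separated by blank lines; each data line carries one payload.
fn parse_sse_data(buffer: &str) -> Vec<&str> {
    buffer
        .split("\n\n")
        .flat_map(|frame| frame.lines())
        .filter_map(|line| line.strip_prefix("data: "))
        .filter(|payload| *payload != "[DONE]") // OpenAI-style stream terminator
        .collect()
}

fn main() {
    let raw = "data: {\"delta\":\"Hi\"}\n\ndata: [DONE]\n\n";
    let payloads = parse_sse_data(raw);
    println!("{payloads:?}");
}
```

Each payload is then deserialized into the provider's event schema and translated into a StreamEvent.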

RetryProvider wraps any Provider with exponential backoff. When a request fails, the error is classified into one of several categories:

  • Auth — invalid or expired API key. No retry.
  • Rate limit — 429 response. Retry with backoff respecting Retry-After.
  • Context length — prompt too large. No retry; triggers compaction upstream.
  • Server — 5xx responses. Retry with backoff.
  • Network — connection failures, timeouts. Retry with backoff.

This classification lets the agent loop react appropriately — for example, a context-length error triggers conversation compaction rather than a blind retry.
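The classification and backoff logic might look like the following sketch. The status-code mapping uses standard HTTP semantics, but the names, the context-length detection, and the backoff constants are illustrative assumptions, not Caboose's actual values:

```rust
#[derive(Debug, PartialEq)]
enum ErrorClass { Auth, RateLimit, ContextLength, Server, Network }

impl ErrorClass {
    // Only rate-limit, server, and network errors are worth retrying.
    fn retryable(&self) -> bool {
        matches!(self, ErrorClass::RateLimit | ErrorClass::Server | ErrorClass::Network)
    }
}

// `status` is None when the request never got an HTTP response.
// Context-length detection via the body text is a placeholder; real
// providers signal it with specific error codes in the response body.
fn classify(status: Option<u16>, body: &str) -> ErrorClass {
    match status {
        None => ErrorClass::Network,
        Some(401) | Some(403) => ErrorClass::Auth,
        Some(429) => ErrorClass::RateLimit,
        Some(400) if body.contains("context") => ErrorClass::ContextLength,
        Some(s) if s >= 500 => ErrorClass::Server,
        _ => ErrorClass::Network,
    }
}

// Exponential backoff with a cap; a Retry-After value (seconds) wins
// over the computed delay when the provider supplies one.
fn backoff_ms(attempt: u32, retry_after_secs: Option<u64>) -> u64 {
    match retry_after_secs {
        Some(s) => s * 1000,
        None => (500u64 << attempt).min(30_000),
    }
}

fn main() {
    assert!(classify(Some(429), "").retryable());
    assert!(!classify(Some(401), "").retryable());
    println!("backoff after attempt 2: {}ms", backoff_ms(2, None));
}
```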

For providers that support it (Anthropic), Caboose marks the system prompt and early conversation turns with cache_control breakpoints in the request body. This reduces latency and cost on multi-turn conversations where the prefix remains stable.
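For reference, a cached system prompt in an Anthropic Messages API request looks roughly like this (the model identifier and prompt text are placeholders):

```json
{
  "model": "claude-example-model",
  "system": [
    {
      "type": "text",
      "text": "You are a coding agent. <long stable system prompt>",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": []
}
```

Everything up to and including the marked block becomes a cacheable prefix, so subsequent turns that reuse it are billed at the cheaper cache-read rate.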

Caboose maintains an internal registry of known models with their context window sizes, capability flags, and per-token pricing. This data is used to calculate cost estimates per turn and per session. The catalog is compiled into the binary and updated with each release, but users can also specify arbitrary model identifiers for providers that support them.
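The per-turn cost math is simple enough to sketch. The struct shape and the prices below are placeholder values for illustration, not entries from Caboose's actual catalog:

```rust
// A hypothetical catalog entry: context window plus per-token pricing,
// expressed in dollars per million tokens.
struct ModelInfo {
    context_window: u32,
    input_usd_per_mtok: f64,
    output_usd_per_mtok: f64,
}

// Cost of one turn: tokens scaled to millions, times the per-MTok rate.
fn turn_cost_usd(m: &ModelInfo, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 / 1e6) * m.input_usd_per_mtok
        + (output_tokens as f64 / 1e6) * m.output_usd_per_mtok
}

fn main() {
    let model = ModelInfo {
        context_window: 200_000,
        input_usd_per_mtok: 3.0,  // placeholder pricing
        output_usd_per_mtok: 15.0,
    };
    // 10k input tokens at $3/MTok + 2k output tokens at $15/MTok.
    let cost = turn_cost_usd(&model, 10_000, 2_000);
    println!(
        "${cost:.3} (context window: {} tokens)",
        model.context_window
    );
}
```

Summing these per-turn figures gives the running session estimate; for arbitrary model identifiers outside the catalog, no pricing data is available, so cost estimates are simply omitted.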