Provider System

Caboose talks to LLM providers through a unified Provider trait. Every backend — Anthropic, OpenAI, Gemini, OpenRouter — implements the same interface, making it straightforward to swap models at runtime.

The trait exposes methods for streaming chat completions, identifying the provider and model, and listing available models. The stream method returns an async stream of StreamEvent values:

| Variant | Purpose |
| --- | --- |
| TextDelta | Incremental text token from the model |
| ThinkingDelta | Chain-of-thought content (extended thinking) |
| ToolCall | Structured tool invocation request |
| Done | Stream completed with usage metadata |
| Error | Provider-side error |
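The event and trait shapes above can be sketched as follows. This is illustrative, not Caboose's actual definitions: field names and payload types are assumptions, and the real stream method returns an async stream rather than the iterator used here to keep the sketch dependency-free.

```rust
// Hypothetical sketch of the StreamEvent variants described above.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    ThinkingDelta(String),
    ToolCall { name: String, arguments: String },
    Done { input_tokens: u64, output_tokens: u64 },
    Error(String),
}

// Simplified Provider trait; in Caboose, stream is async.
trait Provider {
    fn name(&self) -> &str;
    fn model(&self) -> &str;
    fn list_models(&self) -> Vec<String>;
    fn stream(&self, prompt: &str) -> Box<dyn Iterator<Item = StreamEvent>>;
}

// A consumer matches on each event as it arrives.
fn render(event: &StreamEvent) -> String {
    match event {
        StreamEvent::TextDelta(t) => t.clone(),
        StreamEvent::ThinkingDelta(_) => String::new(), // hidden by default
        StreamEvent::ToolCall { name, .. } => format!("[tool: {name}]"),
        StreamEvent::Done { output_tokens, .. } => format!("[done: {output_tokens} tokens]"),
        StreamEvent::Error(e) => format!("[error: {e}]"),
    }
}

fn main() {
    let events = [
        StreamEvent::TextDelta("Hello".into()),
        StreamEvent::Done { input_tokens: 12, output_tokens: 3 },
    ];
    let out: String = events.iter().map(render).collect();
    println!("{out}");
}
```

Because every backend emits the same enum, the UI and agent loop never need provider-specific branches.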

Caboose does not depend on any vendor SDK. Each provider builds HTTP requests directly and parses server-sent event streams by hand. This approach has several advantages:

  • No SDK version churn. Provider SDKs update frequently and sometimes introduce breaking changes. Raw HTTP is stable.
  • Full control over streaming. Caboose can handle partial SSE frames, reconnect logic, and timeout behavior exactly as needed.
  • Smaller binary. Avoiding SDKs keeps the dependency tree lean.
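A minimal sketch of the SSE parsing involved, assuming the common `data: ...` framing with blank-line event separators (this is not Caboose's actual parser; a real implementation also buffers bytes across network reads so partial frames are handled):

```rust
// Extract the data payloads from a buffer of server-sent events.
// Events are separated by blank lines; each data line carries one payload.
fn parse_sse_data(buffer: &str) -> Vec<&str> {
    buffer
        .split("\n\n")
        .flat_map(|frame| frame.lines())
        .filter_map(|line| line.strip_prefix("data: "))
        .filter(|payload| *payload != "[DONE]") // OpenAI-style stream terminator
        .collect()
}

fn main() {
    let raw = "data: {\"delta\":\"Hi\"}\n\ndata: [DONE]\n\n";
    let payloads = parse_sse_data(raw);
    println!("{payloads:?}");
}
```

Each payload is then deserialized into the provider's event schema and translated into a StreamEvent.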

RetryProvider wraps any Provider with exponential backoff. When a request fails, the error is classified into one of several categories:

  • Auth — invalid or expired API key. No retry.
  • Rate limit — 429 response. Retry with backoff respecting Retry-After.
  • Context length — prompt too large. No retry; triggers compaction upstream.
  • Server — 5xx responses. Retry with backoff.
  • Network — connection failures, timeouts. Retry with backoff.

This classification lets the agent loop react appropriately — for example, a context-length error triggers conversation compaction rather than a blind retry.
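The classification and backoff logic might look like the following sketch. The status-code mapping uses standard HTTP semantics, but the names, the context-length detection, and the backoff constants are illustrative assumptions, not Caboose's actual values:

```rust
#[derive(Debug, PartialEq)]
enum ErrorClass { Auth, RateLimit, ContextLength, Server, Network }

impl ErrorClass {
    // Only rate-limit, server, and network errors are worth retrying.
    fn retryable(&self) -> bool {
        matches!(self, ErrorClass::RateLimit | ErrorClass::Server | ErrorClass::Network)
    }
}

// `status` is None when the request never got an HTTP response.
// Context-length detection via the body text is a placeholder; real
// providers signal it with specific error codes in the response body.
fn classify(status: Option<u16>, body: &str) -> ErrorClass {
    match status {
        None => ErrorClass::Network,
        Some(401) | Some(403) => ErrorClass::Auth,
        Some(429) => ErrorClass::RateLimit,
        Some(400) if body.contains("context") => ErrorClass::ContextLength,
        Some(s) if s >= 500 => ErrorClass::Server,
        _ => ErrorClass::Network,
    }
}

// Exponential backoff with a cap; a Retry-After value (seconds) wins
// over the computed delay when the provider supplies one.
fn backoff_ms(attempt: u32, retry_after_secs: Option<u64>) -> u64 {
    match retry_after_secs {
        Some(s) => s * 1000,
        None => (500u64 << attempt).min(30_000),
    }
}

fn main() {
    assert!(classify(Some(429), "").retryable());
    assert!(!classify(Some(401), "").retryable());
    println!("backoff after attempt 2: {}ms", backoff_ms(2, None));
}
```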

For providers that support it (Anthropic), Caboose marks the system prompt and early conversation turns with cache_control breakpoints in the request body. This reduces latency and cost on multi-turn conversations where the prefix remains stable.
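For reference, a cached system prompt in an Anthropic Messages API request looks roughly like this (the model identifier and prompt text are placeholders):

```json
{
  "model": "claude-example-model",
  "system": [
    {
      "type": "text",
      "text": "You are a coding agent. <long stable system prompt>",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": []
}
```

Everything up to and including the marked block becomes a cacheable prefix, so subsequent turns that reuse it are billed at the cheaper cache-read rate.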

Caboose maintains an internal registry of known models with their context window sizes, capability flags, and per-token pricing. This data is used to calculate cost estimates per turn and per session. The catalog is compiled into the binary and updated with each release, but users can also specify arbitrary model identifiers for providers that support them.
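The per-turn cost math is simple enough to sketch. The struct shape and the prices below are placeholder values for illustration, not entries from Caboose's actual catalog:

```rust
// A hypothetical catalog entry: context window plus per-token pricing,
// expressed in dollars per million tokens.
struct ModelInfo {
    context_window: u32,
    input_usd_per_mtok: f64,
    output_usd_per_mtok: f64,
}

// Cost of one turn: tokens scaled to millions, times the per-MTok rate.
fn turn_cost_usd(m: &ModelInfo, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 / 1e6) * m.input_usd_per_mtok
        + (output_tokens as f64 / 1e6) * m.output_usd_per_mtok
}

fn main() {
    let model = ModelInfo {
        context_window: 200_000,
        input_usd_per_mtok: 3.0,  // placeholder pricing
        output_usd_per_mtok: 15.0,
    };
    // 10k input tokens at $3/MTok + 2k output tokens at $15/MTok.
    let cost = turn_cost_usd(&model, 10_000, 2_000);
    println!(
        "${cost:.3} (context window: {} tokens)",
        model.context_window
    );
}
```

Summing these per-turn figures gives the running session estimate; for arbitrary model identifiers outside the catalog, no pricing data is available, so cost estimates are simply omitted.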