# Provider System
Caboose talks to LLM providers through a unified Provider trait. Every backend
— Anthropic, OpenAI, Gemini, OpenRouter — implements the same interface, making
it straightforward to swap models at runtime.
## The Provider Trait

Every provider implements a common `Provider` trait with methods for streaming
chat completions, identifying the provider and model, and listing available
models. The `stream` method returns an async stream of `StreamEvent` values:
| Variant | Purpose |
|---|---|
| `TextDelta` | Incremental text token from the model |
| `ThinkingDelta` | Chain-of-thought content (extended thinking) |
| `ToolCall` | Structured tool invocation request |
| `Done` | Stream completed with usage metadata |
| `Error` | Provider-side error |
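The shape of the trait and its events can be sketched as follows. The method names, variant fields, and the `StubProvider` backend are illustrative assumptions from the prose above; Caboose's real `stream` returns an async stream rather than the plain iterator used here for brevity.

```rust
// Illustrative sketch, not Caboose's actual definitions.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    TextDelta(String),                              // incremental text token
    ThinkingDelta(String),                          // extended-thinking content
    ToolCall { name: String, arguments: String },   // tool invocation request
    Done { input_tokens: u64, output_tokens: u64 }, // usage metadata
    Error(String),                                  // provider-side error
}

trait Provider {
    fn provider_name(&self) -> &str;
    fn model_id(&self) -> &str;
    fn list_models(&self) -> Vec<String>;
    // Sketched as a synchronous iterator; the real trait is async.
    fn stream(&self, prompt: &str) -> Box<dyn Iterator<Item = StreamEvent>>;
}

// A stub backend showing how any provider plugs into the same interface.
struct StubProvider;

impl Provider for StubProvider {
    fn provider_name(&self) -> &str { "stub" }
    fn model_id(&self) -> &str { "stub-1" }
    fn list_models(&self) -> Vec<String> { vec!["stub-1".into()] }
    fn stream(&self, _prompt: &str) -> Box<dyn Iterator<Item = StreamEvent>> {
        Box::new(
            vec![
                StreamEvent::TextDelta("Hello".into()),
                StreamEvent::Done { input_tokens: 4, output_tokens: 1 },
            ]
            .into_iter(),
        )
    }
}
```

Because every backend is reached through the same trait object, swapping models at runtime is just a matter of replacing the boxed `Provider`.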
## Raw HTTP and SSE

Caboose does not depend on any vendor SDK. Each provider builds HTTP requests directly and parses server-sent event (SSE) streams by hand. This approach has several advantages:
- No SDK version churn. Provider SDKs update frequently and sometimes introduce breaking changes. Raw HTTP is stable.
- Full control over streaming. Caboose can handle partial SSE frames, reconnect logic, and timeout behavior exactly as needed.
- Smaller binary. Avoiding SDKs keeps the dependency tree lean.
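Handling partial SSE frames is the core of the hand-rolled approach. The sketch below (a simplified assumption, not Caboose's actual parser) splits a receive buffer on blank lines, collects the `data:` payloads, and carries any incomplete trailing frame over to the next network read:

```rust
/// Parse complete SSE frames out of `buffer`, returning the extracted
/// `data:` payloads plus any trailing partial frame to keep buffering.
/// Simplified sketch: ignores `event:`/`id:` fields and `\r\n` endings.
fn parse_sse_frames(buffer: &str) -> (Vec<String>, String) {
    let mut events = Vec::new();
    let mut rest = buffer;
    // A blank line terminates an SSE frame.
    while let Some(idx) = rest.find("\n\n") {
        let frame = &rest[..idx];
        rest = &rest[idx + 2..];
        for line in frame.lines() {
            if let Some(data) = line.strip_prefix("data: ") {
                events.push(data.to_string());
            }
        }
    }
    // Whatever follows the last blank line is a partial frame; the
    // caller appends the next chunk of bytes to it before re-parsing.
    (events, rest.to_string())
}
```

Because the parser owns the buffering, a frame split across two TCP reads is handled naturally instead of depending on whatever an SDK decides to do.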
## Retry and Error Classification

`RetryProvider` wraps any `Provider` with exponential backoff. When a request
fails, the error is classified into one of several categories:
- Auth — invalid or expired API key. No retry.
- Rate limit — 429 response. Retry with backoff, respecting `Retry-After`.
- Context length — prompt too large. No retry; triggers compaction upstream.
- Server — 5xx responses. Retry with backoff.
- Network — connection failures, timeouts. Retry with backoff.
This classification lets the agent loop react appropriately — for example, a context-length error triggers conversation compaction rather than a blind retry.
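The classification and backoff logic might look like the following sketch. The enum name, the status-to-class mapping, and the delay cap are assumptions; in particular, how a context-length error is detected varies by provider.

```rust
// Illustrative error taxonomy; not Caboose's actual types.
#[derive(Debug, PartialEq)]
enum ErrorClass {
    Auth,          // 401/403: bad or expired key, never retry
    RateLimit,     // 429: retry with backoff, honor Retry-After
    ContextLength, // prompt too large: no retry, compact upstream
    Server,        // 5xx: retry with backoff
    Network,       // connection failure or timeout: retry with backoff
}

/// Classify a failed request from its HTTP status (if any) and message.
/// The "context length" substring check is an assumed heuristic.
fn classify(status: Option<u16>, message: &str) -> ErrorClass {
    match status {
        Some(401) | Some(403) => ErrorClass::Auth,
        Some(429) => ErrorClass::RateLimit,
        Some(400) if message.contains("context length") => ErrorClass::ContextLength,
        Some(s) if s >= 500 => ErrorClass::Server,
        None => ErrorClass::Network, // no HTTP response at all
        Some(_) => ErrorClass::Server, // conservative catch-all
    }
}

fn is_retryable(class: &ErrorClass) -> bool {
    matches!(
        class,
        ErrorClass::RateLimit | ErrorClass::Server | ErrorClass::Network
    )
}

/// Exponential backoff: base * 2^attempt, capped (cap is illustrative).
fn backoff_delay_ms(attempt: u32, base_ms: u64) -> u64 {
    (base_ms * 2u64.pow(attempt)).min(30_000)
}
```

The agent loop consults `is_retryable` first, so a `ContextLength` result short-circuits straight to compaction instead of burning retries.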
## Prompt Caching

For providers that support it (Anthropic), Caboose marks the system prompt and early conversation turns with `cache_control` markers. This reduces latency and cost in multi-turn conversations where the prefix remains stable.
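Concretely, Anthropic's Messages API accepts a `cache_control` field on individual content blocks. A simplified request body (system-prompt text abbreviated, other required fields omitted) might look like:

```json
{
  "system": [
    {
      "type": "text",
      "text": "You are Caboose, a coding agent...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [
    { "role": "user", "content": "Refactor this function." }
  ]
}
```

On subsequent turns the marked prefix is served from the provider's cache, so only the new suffix is processed at full price.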
## Model Catalog and Pricing

Caboose maintains an internal registry of known models with their context window sizes, capability flags, and per-token pricing. This data is used to calculate cost estimates per turn and per session. The catalog is compiled into the binary and updated with each release, but users can also specify arbitrary model identifiers for providers that support them.
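A catalog entry and the per-turn cost arithmetic can be sketched as below. Field names and the example prices are illustrative assumptions, not Caboose's actual data; real pricing is typically quoted in USD per million tokens, which is the convention used here.

```rust
// Illustrative catalog entry; not Caboose's actual struct.
struct ModelInfo {
    id: &'static str,
    context_window: u32,       // maximum prompt + completion tokens
    supports_tools: bool,      // capability flag
    input_price_per_mtok: f64, // USD per million input tokens
    output_price_per_mtok: f64, // USD per million output tokens
}

/// Cost of one turn in USD, given token counts from the `Done` event.
fn turn_cost(model: &ModelInfo, input_tokens: u64, output_tokens: u64) -> f64 {
    input_tokens as f64 / 1_000_000.0 * model.input_price_per_mtok
        + output_tokens as f64 / 1_000_000.0 * model.output_price_per_mtok
}
```

Session cost is then just the sum of `turn_cost` over every turn, which is how the per-session estimate is accumulated.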