Safety Model

Caboose runs arbitrary code on behalf of an LLM, so safety is enforced at multiple layers. No single mechanism is sufficient on its own — the defenses are designed to overlap.

Permission Modes

Every session runs in one of three permission modes, which gate tool execution:

Mode	Behavior
Plan	Read-only. The agent can inspect files and search the codebase but cannot write files or execute shell commands. Useful for exploration and planning.
Create	The agent can read and write files, but shell commands and file writes require explicit user approval via the TUI approval dialog. Commands on the command-policy allow list skip the prompt.
Chug	Full autonomy. All tool invocations are auto-approved, except commands on the command-policy deny list, which are blocked in every mode. Intended for trusted, well-scoped tasks.

Users can switch modes mid-session. The mode is displayed in the TUI status bar so it is always visible.
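
As a rough illustration of how the three modes could gate a tool call, here is a minimal Python sketch. The names `Mode`, `Decision`, `gate`, and the tool names in `READ_ONLY_TOOLS` are hypothetical and not taken from Caboose's code; the sketch also leaves out the command policy described in the next section.

```python
from enum import Enum


class Mode(Enum):
    PLAN = "plan"
    CREATE = "create"
    CHUG = "chug"


class Decision(Enum):
    ALLOW = "allow"  # run without prompting
    ASK = "ask"      # show the approval dialog first
    DENY = "deny"    # refuse outright


# Hypothetical tool names; the real tool set is not specified here.
READ_ONLY_TOOLS = {"read_file", "search"}


def gate(mode: Mode, tool: str) -> Decision:
    """Decide how a tool call is handled under the given permission mode."""
    if tool in READ_ONLY_TOOLS:
        return Decision.ALLOW      # inspection is permitted in every mode
    if mode is Mode.PLAN:
        return Decision.DENY       # Plan is read-only
    if mode is Mode.CREATE:
        return Decision.ASK        # writes and shell commands need approval
    return Decision.ALLOW          # Chug auto-approves
```

For example, `gate(Mode.CREATE, "shell")` yields `Decision.ASK`, while the same call under `Mode.PLAN` yields `Decision.DENY`.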

Command Policy

The shell tool passes every command through a policy check before execution. The policy maintains allow and deny lists and analyzes each shell segment to catch dangerous patterns:

Deny list — commands that are never permitted regardless of permission mode (e.g., rm -rf /, mkfs, dd targeting block devices).
Allow list — common safe commands that skip the approval prompt even in Create mode (e.g., ls, cat, grep, git status).
Segment analysis — the command string is parsed into segments so that pipes, subshells, and command substitution cannot smuggle denied commands past the policy check.

Environment Filtering

Before any shell command executes, Caboose strips sensitive environment variables from the child process environment. API keys, tokens, and credentials that are present in the parent process are not leaked to commands the agent runs. The filter matches against known variable name patterns (e.g., *_API_KEY, *_SECRET, *_TOKEN).
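
A minimal Python sketch of this kind of filter, using glob-style matching from the standard library. The function name `filtered_env` is hypothetical, and the pattern tuple contains only the three examples listed above; the actual pattern set may be larger.

```python
import fnmatch

# Only the example patterns from the text; the full list is an assumption.
SENSITIVE_PATTERNS = ("*_API_KEY", "*_SECRET", "*_TOKEN")


def filtered_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env without variables whose names look sensitive."""
    return {
        name: value
        for name, value in env.items()
        # Compare upper-cased names so lower-case variants are caught too.
        if not any(fnmatch.fnmatchcase(name.upper(), p) for p in SENSITIVE_PATTERNS)
    }
```

The result could then be passed as the `env` argument of `subprocess.run`, for instance `subprocess.run(argv, env=filtered_env(dict(os.environ)))`, so the child process never sees the stripped variables.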

Output Caps

Tool output is capped at 2,000 lines or 50 KB, whichever is hit first. This prevents a runaway command from flooding the context window (and burning tokens). Truncated output includes a notice so the agent knows the result was clipped.
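
One way such a cap might look in Python is sketched below. The function name, the wording of the notice, the choice to cut at line boundaries, and the reading of 50 KB as 50 × 1024 bytes are all assumptions made for illustration.

```python
MAX_LINES = 2000
MAX_BYTES = 50 * 1024  # assumes 1 KB = 1024 bytes


def cap_output(text: str, max_lines: int = MAX_LINES, max_bytes: int = MAX_BYTES) -> str:
    """Keep whole lines until either limit is reached, then append a notice."""
    lines = text.splitlines(keepends=True)
    kept: list[str] = []
    size = 0
    for line in lines:
        line_bytes = len(line.encode("utf-8"))
        if len(kept) >= max_lines or size + line_bytes > max_bytes:
            break  # whichever limit is hit first ends the output
        kept.append(line)
        size += line_bytes
    if len(kept) == len(lines):
        return text  # under both limits: return unchanged
    body = "".join(kept)
    if body and not body.endswith("\n"):
        body += "\n"
    omitted = len(lines) - len(kept)
    return body + f"[output truncated: {omitted} more line(s) omitted]"
```

Because this sketch cuts at line boundaries, a single line longer than the byte limit is dropped entirely; a real implementation might prefer to keep a prefix of it.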

Session Budgets

Each session can have a configurable maximum cost. Caboose tracks token usage and estimated cost per turn using the pricing data from the model catalog. When the budget is exceeded, the agent loop halts and notifies the user rather than silently continuing to spend.
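
The budget check can be sketched in Python as follows. Everything here is hypothetical: the class names, the per-million-token price fields, and the use of an exception to stop the loop are illustrative assumptions, and the prices in the usage note are made-up numbers rather than catalog data.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a session's estimated spend passes its configured maximum."""


@dataclass
class Pricing:
    # Hypothetical catalog fields: USD per one million tokens.
    input_per_mtok: float
    output_per_mtok: float


@dataclass
class SessionBudget:
    max_cost_usd: float
    pricing: Pricing
    spent_usd: float = 0.0

    def record_turn(self, input_tokens: int, output_tokens: int) -> float:
        """Add one turn's estimated cost; raise once the budget is exceeded."""
        cost = (
            input_tokens * self.pricing.input_per_mtok
            + output_tokens * self.pricing.output_per_mtok
        ) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.max_cost_usd:.4f}"
            )
        return cost
```

An agent loop built this way would call `record_turn` after each model response and, on `BudgetExceeded`, stop and report to the user instead of starting another turn.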