Cut Burn by 70-97%

Stop paying for context you don't need. These battle-tested strategies will instantly slash your API bills without sacrificing agent performance.

1. Session Resets

Agents continuously append outputs to their context window. At 100k tokens, a single "yes" reply costs dollars.

Tactic: Clear old sessions or use a /clear command when switching sub-tasks.

2. Prompt Caching

Anthropic supports caching system prompts and large files. If your agent constantly reads a 50-page codebase, cache it.

Result: Users report a 90% drop in input token costs when reading large codebases using Claude 3.5 Sonnet.

3. Smart Model Routing

Don't use Opus for formatting JSON. Use a waterfall approach based on task complexity.

  • Haiku: Fast parsing, summaries, easy formatting
  • Sonnet: Coding, complex reasoning (Sweet Spot)
  • Opus: Heavy strategic planning
Configure via OpenRouter →

4. Strict Formatting

Agents waste tokens being polite ("Sure, I'd be happy to help with that! Here is the code...").

Tactic: Add "Respond ONLY with code, no conversational filler." to your system prompt. Use temperature 0.2.

Real Community Example

"By enforcing a 30k token auto-reset and switching my mundane tasks to Haiku via Anthropic's API, I went from burning $15/day to $1.20/day. The agent actually feels faster now."
Get Anthropic API Credits