Cut Your AI Token Usage by 65% With One Command

You open Claude Code. You ask it to fix a bug. It responds with four paragraphs.
First paragraph: what it understood. Second: what it's going to do. Third: the actual fix. Fourth: a summary of what it just did.
You read paragraph three. You ignore the rest. But you already paid for all four.
That's the problem Caveman solves.
First, what even is a token?
If you've never thought about this: AI models don't read words, they read tokens. A token is roughly 3-4 characters. The word "button" is one token. "backgroundColor" is two. A typical response from Claude is 200-500 tokens.
Why does this matter? Because:
- Claude has a context limit - once you hit it, the session resets
- Claude Pro and Max have usage limits - once you hit them, you wait
- Claude API costs money per token - input and output
The more an AI talks, the faster you burn through all three.
Out of 85 tokens in that exchange, only 20 were useful. You paid for the rest.
Caveman
Caveman is a plugin for AI coding agents - Claude Code, Codex, Gemini CLI - that forces the model to communicate like a cave person. No preamble. No recap. No filler. Just the answer.
The philosophy, straight from the project:
"Why use many token when few do trick"
It doesn't make the AI dumber. It doesn't change how the model thinks. It only changes how the model talks. The reasoning happens the same way - you just stop paying for words that add nothing.
Before and after
This is a real example from the Caveman docs. Same bug, same fix, same model.
Same fix. Same correctness. Far less to read.
Installing it
One command. That's it.
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
# Windows (PowerShell)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
Takes about 30 seconds. Requires Node 18 or newer.
The installer detects every supported agent and sets up Caveman for all of them. If you only want it in one place:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
# Gemini CLI only
gemini extensions install https://github.com/JuliusBrussee/caveman
# Cursor / Windsurf / Cline / Copilot
npx skills add JuliusBrussee/caveman -a cursor
# replace cursor with: windsurf, cline, github-copilot
After that, open Claude Code and type /caveman. Done. To go back to normal, type "normal mode".
The commands
/cavecrew is useful for investigation tasks - "where is X defined", "what calls Y". Instead of Claude reading and reporting back in full paragraphs, it spawns a subagent that returns a compressed table of file:line results. Same information, fraction of the context.
/caveman-compress is interesting because it targets the other side of the equation. Your CLAUDE.md and memory files are read at the start of every session - that's input tokens. Compressing them once saves tokens on every future session, permanently.
Real numbers
Measured across 10 tasks:
The bigger the explanation, the more you save. Simple one-liners see less benefit. Long debugging sessions see the most.
One thing to understand
Caveman only affects output tokens - what the AI writes back to you.
It does not affect thinking tokens. When Claude works through a problem internally, that process is unchanged. You're not making the AI less capable. You're just cutting the part where it restates everything in four different ways before giving you the answer.
The project puts it better: "Caveman no make brain smaller."
One more thing worth knowing: if you're running Claude with extended thinking enabled, Caveman's impact shrinks. Thinking tokens can represent 80-90% of total token usage - and Caveman doesn't touch those. In a thinking-heavy session, the actual bill reduction is closer to 5-15%. The savings are real, but concentrated in conversational and explanation-heavy workflows. Debug sessions, code reviews, architecture questions - that's where it earns its keep.
One install, every tool
The installer detects what you have and sets up Caveman everywhere at once. Run the single curl command and it automatically wires into Claude Code, Cursor, Windsurf, and Gemini CLI - whatever is installed on your machine.
installed:
• claude
• claude-hooks
• caveman-shrink
• gemini
• cursor
• windsurf
You don't pick which tool gets it. They all get it. If you switch between Cursor during the day and Claude Code at night, both are running lean. You activate once, forget about it.
Who this is for
- You use Claude Code, Cursor, Windsurf, or Gemini CLI regularly
- You've hit usage limits and wondered where all the tokens went
- You're paying API costs and want them lower
- You find long AI responses annoying
If any of those are true, the install takes 30 seconds and the first /caveman is free. You'll know immediately whether it fits how you work.
Source: github.com/JuliusBrussee/caveman