Cut Your AI Token Usage by 65% With One Command

You open Claude Code. You ask it to fix a bug. It responds with four paragraphs.

First paragraph: what it understood. Second: what it's going to do. Third: the actual fix. Fourth: a summary of what it just did.

You read paragraph three. You ignore the rest. But you already paid for all four.

That's the problem Caveman solves.

First, what even is a token?

If you've never thought about this: AI models don't read words, they read tokens. In English text, a token is roughly 3-4 characters. A short common word like "button" is typically one token, while "backgroundColor" splits into two. A typical response from Claude is 200-500 tokens.
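To get a feel for the numbers, here is a minimal sketch that estimates tokens from character count. It assumes the rough 4-characters-per-token average mentioned above; `estimate_tokens` is a made-up helper, not a real tokenizer, and actual counts vary by model and by word.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only: assumes ~4 characters per token on average."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / chars_per_token))

# Two illustrative replies (invented for this sketch)
verbose = "I understand what you're asking. Let me take a look at the bug."
terse = "Null check missing. Add guard."

print("verbose:", estimate_tokens(verbose), "tokens (estimated)")
print("terse:  ", estimate_tokens(terse), "tokens (estimated)")
```

Good enough for back-of-the-envelope comparisons. For billing-accurate counts, use the token counts your provider reports.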

Why does this matter? Because:

Claude has a context limit - once you hit it, the conversation has to be compacted or restarted
Claude Pro and Max have usage limits - once you hit them, you wait
Claude API costs money per token - input and output

The more an AI talks, the faster you burn through all three.

Where tokens actually go in a typical session

Preamble ("I understand what you're asking...") | ~30 tokens
The actual answer | ~20 tokens
Recap ("In summary, I've fixed the bug by...") | ~25 tokens
Offers to help further | ~10 tokens

Out of 85 tokens in that exchange, only 20 were useful. You paid for the rest.
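The same breakdown as a few lines of arithmetic. The token counts are the approximate figures from this example exchange, not measurements.

```python
# Approximate token counts from the example exchange above
breakdown = {
    "preamble": 30,
    "answer": 20,
    "recap": 25,
    "offer to help further": 10,
}

total = sum(breakdown.values())
useful = breakdown["answer"]
overhead = total - useful
useful_share = useful / total

print(f"total={total} useful={useful} overhead={overhead}")
print(f"useful share: {useful_share:.1%}")
```

Less than a quarter of that response was the part you actually read.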

Caveman

Caveman is a plugin for AI coding agents - Claude Code, Codex, Gemini CLI - that forces the model to communicate like a cave person. No preamble. No recap. No filler. Just the answer.

The philosophy, straight from the project:

"Why use many token when few do trick"

It doesn't make the AI dumber. It doesn't change how the model thinks. It only changes how the model talks. The reasoning happens the same way - you just stop paying for words that add nothing.

Before and after

This is a real example from the Caveman docs. Same bug, same fix, same model.

Without Caveman - 69 tokens

"The React component is re-rendering because a new object reference is being created on every render. I'd recommend wrapping the object in a useMemo hook to stabilize the reference and prevent unnecessary re-renders."

/caveman (full) - 19 tokens

"Inline object prop = new ref = re-render. Wrap with useMemo."

/caveman ultra - 12 tokens

"Inline obj prop → new ref → re-render. useMemo."

Same fix. Same correctness. Far less to read.

Installing it

One command. That's it.

terminal

# macOS / Linux / WSL
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash

# Windows (PowerShell)
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex

Takes about 30 seconds. Requires Node 18 or newer.

The installer detects every supported agent and sets up Caveman for all of them. If you only want it in one place:

selective install

# Claude Code only
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman

# Gemini CLI only
gemini extensions install https://github.com/JuliusBrussee/caveman

# Cursor / Windsurf / Cline / Copilot
npx skills add JuliusBrussee/caveman -a cursor
# replace cursor with: windsurf, cline, github-copilot

After that, open Claude Code and type /caveman. Done. To go back to normal, type "normal mode".

The commands

Available commands

Command | What it does | Effect
/caveman | Activates default cave mode | -65% output
/caveman lite | Only removes filler words, keeps full sentences | -30% output
/caveman ultra | Telegraphic, almost no grammar | -80% output
/caveman-commit | Generates conventional commit messages under 50 chars | shorter commits
/caveman-compress | Shrinks your CLAUDE.md and memory files | -46% input
/caveman-stats | Shows how many tokens you've saved this session | insight
/cavecrew | Spawns a compressed subagent for a subtask; result comes back ~60% smaller than inline | -60% context

/cavecrew is useful for investigation tasks - "where is X defined", "what calls Y". Instead of Claude reading and reporting back in full paragraphs, it spawns a subagent that returns a compressed table of file:line results. Same information, fraction of the context.

/caveman-compress is interesting because it targets the other side of the equation. Your CLAUDE.md and memory files are read at the start of every session - that's input tokens. Compressing them once saves tokens on every future session, permanently.
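A quick sketch of why a one-time compression adds up. The 46% figure is the one quoted for /caveman-compress; the 2,000-token file size and the 50 sessions are hypothetical numbers picked for illustration, and `input_tokens_saved` is a made-up helper.

```python
def input_tokens_saved(file_tokens: int, reduction: float, sessions: int) -> int:
    """Input tokens saved when a file loaded every session is compressed once."""
    saved_per_session = round(file_tokens * reduction)
    return saved_per_session * sessions

# Hypothetical: a 2,000-token CLAUDE.md, 46% smaller, loaded in 50 sessions
print(input_tokens_saved(file_tokens=2000, reduction=0.46, sessions=50))
```

The saving scales linearly with the number of sessions, so it matters most for files you rarely edit and load constantly.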

Real numbers

Measured across 10 tasks, four of which are shown here:

Task | Before | After | Saved
Explain React re-render bug | 69 tok | 19 tok | -72%
Fix auth middleware | 112 tok | 19 tok | -83%
Debug PostgreSQL pool | 94 tok | 15 tok | -84%
PR security review | 180 tok | 106 tok | -41%
Average across all tasks | - | - | -65%

The bigger the explanation, the more you save. Simple one-liners see less benefit. Long debugging sessions see the most.
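If you want to check the percentages yourself, the calculation is just (before - after) / before. A small sketch using the four rows shown in the table; note that the -65% average covers all 10 tasks, so the mean of these four will not match it exactly.

```python
# (task, tokens before, tokens after) for the four rows shown in the table
rows = [
    ("Explain React re-render bug", 69, 19),
    ("Fix auth middleware", 112, 19),
    ("Debug PostgreSQL pool", 94, 15),
    ("PR security review", 180, 106),
]

def saved_pct(before: int, after: int) -> float:
    """Percentage of tokens saved going from `before` to `after`."""
    return (before - after) / before * 100

for task, before, after in rows:
    print(f"{task:<30} {before:>4} -> {after:>4}  -{saved_pct(before, after):.1f}%")

mean_shown = sum(saved_pct(b, a) for _, b, a in rows) / len(rows)
print(f"mean of the four rows shown: -{mean_shown:.1f}%")
```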

One thing to understand

Apart from /caveman-compress, which works on input files, Caveman only affects output tokens - what the AI writes back to you.

It does not affect thinking tokens. When Claude works through a problem internally, that process is unchanged. You're not making the AI less capable. You're just cutting the part where it restates everything in four different ways before giving you the answer.

The project puts it better: "Caveman no make brain smaller."

One more thing worth knowing: if you're running Claude with extended thinking enabled, Caveman's impact shrinks. Thinking tokens can represent 80-90% of total token usage - and Caveman doesn't touch those. In a thinking-heavy session, the actual bill reduction is closer to 5-15%. The savings are real, but concentrated in conversational and explanation-heavy workflows. Debug sessions, code reviews, architecture questions - that's where it earns its keep.
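The arithmetic behind that range, as a rough sketch. It makes a simplifying assumption that is not in the Caveman docs: everything that isn't thinking is visible output that Caveman can shrink. Input tokens and any price difference between token types are ignored. `overall_reduction` is a made-up helper.

```python
def overall_reduction(thinking_share: float, visible_output_cut: float) -> float:
    """Fraction of total tokens saved when only the non-thinking output shrinks.

    Simplifying assumption: total = thinking tokens + visible output tokens.
    """
    visible_share = 1.0 - thinking_share
    return visible_share * visible_output_cut

# 65% is the average output reduction quoted for default /caveman
for share in (0.80, 0.90):
    reduction = overall_reduction(share, 0.65)
    print(f"thinking share {share:.0%}: total reduction {reduction:.1%}")
```

Under those assumptions the result lands inside the 5-15% range above. The more of your usage is thinking, the less there is for Caveman to trim.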

One install, every tool

The installer detects what you have and sets up Caveman everywhere at once. Run the one-line install command and it automatically wires into Claude Code, Cursor, Windsurf, and Gemini CLI - whichever of them are installed on your machine.

installed:
  • claude
  • claude-hooks
  • caveman-shrink
  • gemini
  • cursor
  • windsurf

With the one-line installer, you don't pick which tool gets it. They all get it. If you switch between Cursor during the day and Claude Code at night, both are running lean. You activate once, forget about it.

Who this is for

You use Claude Code, Cursor, Windsurf, or Gemini CLI regularly
You've hit usage limits and wondered where all the tokens went
You're paying API costs and want them lower
You find long AI responses annoying

If any of those are true, the install takes 30 seconds and the first /caveman is free. You'll know immediately whether it fits how you work.

Source: github.com/JuliusBrussee/caveman