Never Hit Claude Usage Limits Ever Again

A Max 5 subscriber hit their quota in 19 minutes. Expected: 5 hours.
A Pro user reported their limit maxed every Monday and didn't reset until Saturday - "out of 30 days I get to use Claude 12."
Anthropic acknowledged the issue. Part of it was a peak-hours quota change. Part of it was a prompt-caching bug that silently inflated costs by 10–20x. But here's the uncomfortable truth: most of it was them.
I know because I tracked myself.
The Data Nobody Wants to See
For 90 days I logged every Claude Code session: timestamp, prompt, response, token count, model, exit reason.
430 hours. 6 million input tokens. $1,340 in API spend.
Then I categorized every token:
27% productive. 73% overhead.
Not bad prompts. Not wrong models. Overhead - invisible until you instrument it.
Every pattern below has a number behind it, a root cause, and a fix you can do in 30 seconds.
The 9 Patterns
1. CLAUDE.md Bloat - 14% of total tokens
My CLAUDE.md grew to 4,800 tokens over 6 months. Every turn loaded all 4,800. Every session. Most rules were never relevant to the task at hand.
A 5,000-token CLAUDE.md costs you 5,000 tokens before you type a single word. Multiply by 200 turns per week: 1 million tokens a week from your CLAUDE.md alone.
The fix:
# Check your current size
wc -w ~/.claude/CLAUDE.md
wc -w .claude/CLAUDE.md # project-level
# Target: combined under 1,200 words (~1,500 tokens)If you're over, refactor ruthlessly:
- Move framework rules to project-level CLAUDE.md (only loads in that project)
- Extract repeated patterns into skills (loaded only when invoked)
- Delete anything you can't remember writing
- Convert "explain why" paragraphs into 3-word imperatives
I cut mine from 4,800 to 900 tokens. Same behavior. 31% reduction in baseline cost, instantly.
2. Conversation History Re-reads - 13% of total tokens
Every follow-up message re-tokenizes the entire conversation history. By message 30 in a chat, each turn is paying for messages 1–29 to be re-read. At ~500 tokens per exchange, message 30 costs 30× message 1.
I had sessions with 60+ messages. The last message was costing 60× the first.
The fix:
Three rules, in order:
- Edit the prior message instead of follow-up. Up-arrow → edit → re-send. The bad exchange gets replaced, not stacked.
- Hard cap at 20 messages. When you cross 20, ask Claude to summarize what's been done, start a fresh chat with that summary as the first message.
- Use
/compactinstead of/clearwhen you need continuity./compactsummarizes and restarts./clearnukes everything.
I went from 60-message sessions to 15-message average. 40% drop in re-read cost.
3. Hook Injection Waste - 11% of total tokens
I had 4 plugins installed. Three of them registered UserPromptSubmit hooks that injected context - branch names, recent file changes, memory snippets. Combined: 6,200 tokens injected on every prompt, before Claude even read my question.
Each hook is designed to be helpful. Together they're a wall.
The fix:
# See exactly what's being injected
cat ~/.claude/settings.json | jq '.hooks.UserPromptSubmit'
cat .claude/settings.json | jq '.hooks.UserPromptSubmit'
# Disable hooks you can't justify
/plugin disable <plugin-name>Audit every hook. If you can't articulate why it fires on every prompt, kill it. I went from 4 active UserPromptSubmit hooks to 1 (just git branch). 5,800 tokens saved per prompt.
4. Cache Misses on Session Resume - 10% of total tokens
Anthropic's prompt cache has a 5-minute lifetime by default. Take a coffee break. Come back. First message pays full price for ~8K tokens of content that was cached 6 minutes ago.
I tracked ~640 cache misses over 90 days. Each one re-tokenized 8,000+ tokens at full price instead of 10% (cache read price).
The fix:
- Fast workaround: Bind a "ping" command to a hotkey. Send any prompt within 4 minutes before a break. Keeps the cache warm.
- Real fix (paid plans): Upgrade to 1-hour cache lifetime. Cache write tokens are 2× base price (one-time), cache reads are 0.1×. If you have 10+ resumes per session, it pays for itself.
I run the 1-hour cache now. Cache miss cost dropped 80%.
5. Skill Loading on Irrelevant Tasks - 7% of total tokens
I had 9 skills installed. Each one auto-invokes when Claude detects relevance - and the detection is conservative: when in doubt, load. My UI design skill was loading on backend tasks. My video generation skill was loading on text-only tasks.
Each skill SKILL.md is 800–2,500 tokens. Loaded "just in case." 9 skills × ~1,500 tokens = 13,500 tokens of overhead per task that didn't need any of them.
The fix:
# See which skills were actually invoked
grep -h "skill_invoked" ~/.claude/logs/*.log | sort | uniq -c | sort -rn
# Disable everything not in the output
/plugin disable <skill-name>Run a 7-day audit. Disable skills you didn't actually use. I went from 11 active skills to 4. 9,000+ tokens saved per task.
6. "Just in Case" Tool Definitions - 6% of total tokens
I had 12 MCP servers connected. Each one ships its tool schema to every request, regardless of whether the task involves that tool. PostgreSQL MCP tool schema alone: ~1,200 tokens.
12 MCPs × ~600 avg tokens = 7,200 tokens per request. I used maybe 3 of these in 80% of my work.
The fix:
/mcp # list connected servers
/mcp disable <server> # disable for current sessionFor permanent control, edit ~/.claude/settings.json to remove unused MCPs from the auto-load list. Re-enable per-session when you actually need them. I went from 12 to 3 always-on MCPs. 6,000 tokens saved per request.
7. Extended Thinking on Simple Questions - 5% of total tokens
I left extended thinking on globally. Claude burned 3,000+ tokens of <thinking> on questions like "rename this variable to camelCase."
Extended thinking is real value for architecture decisions and complex debugging. On simple tasks it's pure overhead.
The fix: Default extended thinking OFF. Toggle on per-message when you genuinely need it (Alt+T in Claude Code). About 80% of tasks don't need it. The 20% that do - you'll know.
8. Wrong-Direction Generation - 4% of total tokens
Claude starts writing a 400-line response. Within the first 50 lines you can see it's going the wrong way. Most people let it finish, then re-prompt. The remaining 350 lines are wasted output tokens.
Output tokens are billed. Letting Claude finish a wrong response cost me roughly 4% of my total spend. Stupid tax.
The fix: Cmd+. (Mac) / Ctrl+. stops generation immediately. Claude keeps what it wrote. You redirect from there.
Train yourself to use this in the first 5 seconds of any response that's drifting. For Claude Code terminal users: double-Esc opens a checkpoint scroller - rewind to any prior state and try a different approach without re-running everything.
9. Plugin Auto-Update Redundancy - 3% of total tokens
Every installed plugin fires a SessionStart hook. Several add a small "loaded successfully" notification - ~50 tokens each. Nine plugins × multiple messages = ~1,400 tokens at every session start, just for confirmation noise you never read.
The fix:
# Audit SessionStart hooks
cat ~/.claude/settings.json | jq '.hooks.SessionStart'Kill anything that's just a status notification. It's the smallest item on this list, but it fires before you type a single word - every session, forever.
Before vs. After
Not 100% - there's an unavoidable floor of overhead. But 65% is a different category of life than 27%.
What Didn't Work
Things the obvious advice recommends, that didn't move the needle:
Switching to Haiku for "simple" tasks - helped ~3%. The real waste isn't the model, it's context bloat. A cheaper model on a bloated context still costs more than an expensive model on a lean context.
Aggressive /clear between every task - counter-productive. Lost context I genuinely needed. Better: long-running session per project with lean CLAUDE.md and instrumented hooks.
Disabling all skills entirely - initially saved tokens, then I started writing the same 200-token instructions manually in every prompt. Net negative. Keep 3–4 skills you actually use, disable the rest.
Off-peak scheduling - worked partially. Anthropic's peak-hour quota reduction is real (~7% of users affected). For most people, the bigger lever is the patterns above, not the time of day.
Subscription downgrade - doesn't work. Cost per work-hour stayed the same, limits just hit more painfully.
The Audit Script
Run this once. Fix what's flagged. Re-run weekly.
#!/bin/bash
# claude-audit.sh - run in your project root
echo "=== CLAUDE.md size ==="
wc -w ~/.claude/CLAUDE.md 2>/dev/null
wc -w .claude/CLAUDE.md 2>/dev/null
echo "Target: combined < 1,200 words"
echo
echo "=== Active hooks ==="
cat ~/.claude/settings.json 2>/dev/null | jq '.hooks // {} | keys'
cat .claude/settings.json 2>/dev/null | jq '.hooks // {} | keys'
echo
echo "=== UserPromptSubmit injections ==="
cat ~/.claude/settings.json 2>/dev/null | jq '.hooks.UserPromptSubmit'
cat .claude/settings.json 2>/dev/null | jq '.hooks.UserPromptSubmit'
echo
echo "=== Installed plugins ==="
ls ~/.claude/plugins/ 2>/dev/null
echo "Target: 3-5 active. Disable the rest."
echo
echo "=== Installed skills ==="
ls ~/.claude/skills/ 2>/dev/null
echo "Target: 3-5 active matching daily work."
echo
echo "=== Connected MCPs ==="
cat ~/.claude/settings.json 2>/dev/null | jq '.mcpServers // {} | keys'
echo "Target: 3 always-on. Per-session enable rest."The Mental Model Shift
The mistake: treating every Claude Code session as a fresh blank slate.
In reality, every session is an invoice that pre-charges you before you've done a single thing:
Productive tokens are the residual. What's left after all the overhead is paid.
If you want more productive tokens - which is what "more usage from your plan" actually means - you have to attack the overhead, not just write better prompts.
Better prompts help when overhead is small. When overhead is 73%, prompts barely matter.
Most "Claude got dumber" complaints in 2026 trace back to this. The model didn't get dumber. Your overhead grew. The model has the same compute budget, but most of it is spent re-reading your bloated CLAUDE.md, your 12 hook injections, and the 4 plugins you forgot you installed.
Attack the overhead. The rest follows.