May 18, 2026 · 6 min read · By Angga · claude-code / ai / productivity

How I cut my Claude token bill by 90%

Two tiny tools and one habit. RTK trims bash output before it reaches the model. Caveman trims the model's replies before they reach me. Setup took ten minutes. The first session afterward barely registered on the bill.

A few weeks ago I started getting the email.

Not the angry one. The polite one. "You've used 80% of your Claude credits this billing cycle." I'd been working on three side projects with Claude Code, churning through long sessions, and I'd assumed that was just what it cost now. Two hundred bucks a month, like a gym membership for vibes-driven development.

Then a friend asked me a question at a meetup. "Have you tried RTK and Caveman yet?" He laughed when I said no. "Dude. You're paying for AI fan fiction."

He showed me his last session's numbers. About 25k tokens for a piece of work that, on my setup, would have eaten 200k. Same kind of task. Same model. Two small tools. I went home and installed both that night.

This is what I learned.

Where tokens actually go

The first thing you notice when you actually look at a Claude transcript is that the model rarely writes that much code. The code is usually short. The cost is in everything around the code.

Pull up any long session and check where the tokens live. Two places, every time.

Input side. Every bash command Claude runs pipes its output back into the conversation. git status returns 200 lines when you needed 5. npm install paints a screen of dependency tree even when nothing changed. A failing test dumps the entire stack trace plus the build output plus the lint warnings. By the time the model has read all of that, you've spent thousands of tokens before it writes a single character of reply.

Output side. The model talks. A lot. "Great question! Let me think about this." A paragraph explaining what it's about to do. The fix itself, maybe three lines. A paragraph explaining what it just did. A summary section. A "let me know if you have any questions" closer. The fix was tiny. The wrapper around the fix was 800 words.

If you only ever touched one side, you'd cut your bill in half. Touch both, and the savings multiply.
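To get a feel for the input side, here is a rough sketch that compares a noisy command output with the handful of lines you actually needed. It leans on the common rule of thumb of about four characters per token, which is an approximation and not a real tokenizer. The function name `rough_tokens` and the fake file list are made up for illustration.

```python
def rough_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token.

    This is a rule of thumb, not a tokenizer; real counts will differ.
    """
    return max(1, len(text) // 4)


# Hypothetical noisy output: 200 "modified" lines, as a status listing might print.
noisy = "\n".join(
    f"\tmodified:   src/components/Widget{i}.tsx" for i in range(200)
)

# The part that was actually needed: the first five lines.
useful = "\n".join(noisy.splitlines()[:5])

print("noisy  :", rough_tokens(noisy), "tokens (estimated)")
print("useful :", rough_tokens(useful), "tokens (estimated)")
```

The exact numbers don't matter. The ratio does: the model pays for every line it reads, whether or not that line helped.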

Tool one: RTK

RTK stands for Rust Token Killer. It's a small CLI proxy that sits between you and the commands you run. A Claude Code hook intercepts every shell call, runs it through RTK, and RTK strips the noise before the output ever reaches the model.

You install it once:

brew install rtk
rtk --version

That's it. The hook handles everything else. You keep typing the same commands you've always typed. git status, npm test, find . -name "*.tsx". Claude still gets the answer it needs. It just gets a smaller, sharper version.

Behind the scenes RTK has rules for most of the commands you actually use: git, npm, yarn, find, grep, ls, build output, lint runs. For a typical git log that would have been 12kb of context, RTK trims it down to the lines that actually matter. For an npm install that would have dumped the dependency tree, RTK reports "installed N packages in Xs."
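To make the idea concrete, here is a toy sketch of that kind of filtering. It is not RTK's code or its rule format, which I haven't read. The function name `summarize_install`, the regex, and the sample output are all assumptions made up for illustration.

```python
import re


def summarize_install(raw: str) -> str:
    """Collapse noisy install output to one summary line (hypothetical rule)."""
    # Look for a line shaped like "added 42 packages ... in 3s".
    match = re.search(r"added (\d+) packages.* in (\S+)", raw)
    if match is None:
        # Unknown shape: pass the output through untouched rather than guess.
        return raw
    count, duration = match.groups()
    return f"installed {count} packages in {duration}"


# Made-up sample output, only for this demonstration.
sample = "\n".join([
    "npm warn deprecated foo@1.0.0: no longer maintained",
    "npm warn deprecated bar@2.1.0: use baz instead",
    "",
    "added 42 packages, and audited 43 packages in 3s",
    "",
    "found 0 vulnerabilities",
])

print(summarize_install(sample))  # prints: installed 42 packages in 3s
```

The design choice worth copying is the fallback: when the filter doesn't recognize the output, it returns everything unchanged instead of dropping lines it doesn't understand.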

There's an rtk gain command that shows you lifetime savings, and an rtk gain --history that breaks it down by command so you can see which ones are doing the most work. The first time you run it after a week of use is genuinely shocking. Mine showed a 70% reduction on git, 85% on npm, 90% on find. Compounded across hundreds of bash calls, that's most of a session's bill.

For the rare case where you need the raw output, there's an escape hatch: rtk proxy <cmd> skips the filter. I use it maybe once a month.

Tool two: Caveman

RTK trims input. Caveman trims output.

Caveman is a skill that lives in your Claude config. When it's active, Claude drops the filler from replies. Same answers, far fewer words. Tech accuracy is preserved by design. It just stops opening every reply with "Great question!" and stops closing every reply with a polite recap of what it just did.

There are intensity levels. lite is mild compression and reads almost normal. full is the default and runs about 75% smaller than baseline. ultra is aggressive and reads telegraphic, which is fine for solo work but blunt enough to confuse a teammate reading over your shoulder.

To turn it on for a single session, type /caveman in the chat. To turn it on forever, add a section to ~/.claude/CLAUDE.md:

# !!! CAVEMAN MODE: ALWAYS ON !!!
 
User wants caveman compressed speech for every chat reply, every
project, every directory, every session. No exception.
 
- Short sentences. Skip filler. Keep tech accuracy.
- Applies to chat replies, status updates, summaries.
- Does NOT apply to code, code comments, blog drafts, commit
  messages, any file written to disk.

That last rule is the one that took me a few tries to get right. You want the model to speak in caveman to you, but you don't want it writing caveman code comments or terse commit messages, because then the artifacts read weird to other humans who never opted in. Keep caveman for chat, full English for anything that ends up on disk.

The compound effect

This is the part my friend laughed at me about.

Cutting input by 80% and output by 75% doesn't just add up. It compounds. Every turn re-reads the whole conversation so far, so a token kept out of the context is saved again on each later turn. The model reads less, so it needs fewer tokens to respond, so it writes less, so its reply is cheaper, and the smaller reply doesn't bloat the context window for the next turn either. The savings cascade through the whole conversation.

A 200k-token session, run with both tools on, drops to somewhere in the 20-30k range. Same code. Same problem solved. Roughly one-tenth the bill.
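A back-of-the-envelope sketch of that cascade, under loud assumptions: every turn re-reads the whole context, each turn adds exactly one command output and one reply, and the turn count and token sizes are invented so the baseline lands near 200k. The function name `session_tokens` is hypothetical and not part of either tool.

```python
def session_tokens(turns: int, tool_output: int, reply: int,
                   input_keep_pct: int = 100, output_keep_pct: int = 100) -> int:
    """Total tokens processed when every turn re-reads the full context.

    input_keep_pct / output_keep_pct are the share of tokens that survive
    filtering (100 = no filtering). All values are illustrative assumptions.
    """
    kept_output = tool_output * input_keep_pct // 100
    kept_reply = reply * output_keep_pct // 100
    context = 0
    total = 0
    for _ in range(turns):
        context += kept_output   # command output joins the context
        total += context         # the model reads the whole context this turn
        total += kept_reply      # the model writes its reply
        context += kept_reply    # the reply gets re-read on every later turn
    return total


baseline = session_tokens(20, 800, 200)
both_80_75 = session_tokens(20, 800, 200, input_keep_pct=20, output_keep_pct=25)
both_90_75 = session_tokens(20, 800, 200, input_keep_pct=10, output_keep_pct=25)

print(baseline)    # 210000
print(both_80_75)  # 44100
print(both_90_75)  # 27300
```

In this toy model an 80% input cut plus a 75% output cut lands around 44k, and reaching the 20-30k range takes input savings closer to the 90% that rtk gain showed me for find. Real sessions have system prompts, user messages, and caching that this sketch ignores, so treat it as a shape, not a forecast.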

I genuinely don't think there's another single configuration change in any tool I use that has this much leverage.

Gotchas

RTK is opinionated. The biggest place I've stubbed my toe: it rejects compound find expressions that use -o or -exec. The fix is to use a simpler form or wrap the call with rtk proxy find. There's a learning curve of about an afternoon. After that you mostly forget RTK exists.

Caveman keeps tech accuracy but does read blunt. If you share screenshots of your Claude chats with non-technical teammates, switch to lite or off for those sessions. The compressed style is great for solo deep work, fine for engineering teammates, but reads like a Slack DM from someone who's mad at you if your audience isn't expecting it.

Neither tool removes the escape hatches you'd want. RTK has rtk proxy <cmd>. Caveman has /caveman off. You're never locked into the filtered view; you just default to it.

What I do now

I run both tools by default on every machine I use Claude on. Setup is two commands and one config file. The first session after installation is a small "huh" moment, where you notice the replies are tighter and the bash output is concise, but nothing feels lost. The second session is when you start to wonder why every AI tool doesn't ship like this.

A week later, when rtk gain reports its first real number, you'll get a small dopamine hit. A month later you'll realize the credit warning emails stopped coming.

If you use Claude every day, this is the single cheapest performance upgrade you can ship today. Same outputs, smaller bill. Set it once, save forever.

Tonight's a good night for it.