
How I reduced token burn in agentic workflows

A few basic tips to decrease token usage and optimize your workflow in agentic coding

With the rapid advances in agentic coding, I believe true productivity will come from AI enhancing our software engineering workflow and taking over the many tasks at which agents are far more efficient than humans. I remain convinced, however, that we stay the orchestrator. An apt analogy I read recently comes from an article by Ben Gregory, who calls AI an “exoskeleton”1. He writes:

Don’t ask “can AI do a developer’s job?” Ask “what are the 47 things a developer does in a given week, and which of those can be amplified?”1

It is an extension of our capabilities. Agents are far more efficient at some tasks, like pattern recognition, and far less efficient at others, like having to gather context all the time. Gathering that context means sending many “tokens”2 back and forth; the token is the unit in which message size is measured. When you use something like Claude Code, you have a limited token budget, and once the limit is reached you have to wait for it to reset (or simply pay more). Context gathering not only consumes tokens, it also costs valuable time while the agent works through that context. Set everything up properly, though, and your workflow becomes much more efficient. Here are a few tips I use:

  1. Create context files, like CLAUDE.md. These serve as context and rulesets for your agents and give a quick introduction that points them in the right direction. Every time you launch a new session, this context is injected and serves as guidelines for the agent. This saves it from having to crawl the entire repo to gather context, or from trying things you don’t want it to do. I have a general CLAUDE.md with a few rules I prefer across all my repos, plus a few per repo, e.g. one at the root level and one in the /infra folder. This document should describe the structure of your project, your workflow, and the associated scripts in your package.json, so the agent immediately knows what to run to validate its changes. E.g. I prefer running the development server myself, so I always include a rule that the agent must never run any dev or serve commands. I also want the agent to suggest changes to the file itself when it thinks they are relevant. That can introduce errors, but it’s a trade-off I’m willing to accept to keep the file up to date.

Examples from my CLAUDE.md:

- This is a living document. Claude should proactively suggest edits and improvements as preferences become clearer across projects, and update this file when new patterns or rules emerge.
- Avoid over-engineering. Only make changes that are directly requested or clearly necessary.
- Never auto-commit. Only commit when explicitly asked.
- Never run `serve`, `dev`, `start`, or any local server command — the user always has their own server running and will verify changes themselves.
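
Putting rules like those together, a minimal root-level CLAUDE.md could look something like the sketch below. The section names, folder layout, and script names are illustrative assumptions, not a prescribed format:

```markdown
# Project guidelines for Claude

## Structure
- `src/` – application code
- `scripts/` – automation scripts (entry points listed in package.json)
- `infra/` – deployment config (has its own CLAUDE.md)

## Workflow
- Validate changes with `npm run lint` and `npm run test`.
- Never run `serve`, `dev`, or `start`; the user runs their own server.
- Never auto-commit; only commit when explicitly asked.

## Meta
- This is a living document: suggest edits when new patterns emerge.
```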
  2. Automate everything with scripts. Create a scripts folder in your repo and have the LLM write scripts for repetitive tasks. For example, I have a script that scaffolds a new article for me: I input the title, date, etc. and the file is generated. Another script takes care of testing, building, and deploying. The flow is identical every time, so I don’t waste any tokens on it. Make sure to add an entry to the instructions file from the previous tip so the agent can quickly find the scripts.
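
As a sketch of what such a script can look like, here is a minimal article scaffolder in the spirit of mine. The file path, slug logic, and frontmatter fields are assumptions; adapt them to your own setup:

```shell
#!/usr/bin/env bash
# scripts/new-article.sh (hypothetical name): scaffold a dated markdown
# article so neither you nor the agent spends tokens on boilerplate.
set -euo pipefail

scaffold_article() {
  local title="$1"
  local date slug file
  date="$(date +%Y-%m-%d)"
  # "My New Post" -> "my-new-post": lowercase, non-alphanumerics to dashes
  slug="$(echo "$title" | tr '[:upper:]' '[:lower:]' \
          | tr -cs 'a-z0-9' '-' | sed 's/^-//;s/-$//')"
  file="content/articles/${date}-${slug}.md"
  mkdir -p "$(dirname "$file")"
  printf -- '---\ntitle: "%s"\ndate: %s\ndraft: true\n---\n\n' \
    "$title" "$date" > "$file"
  echo "$file"   # print the path so the caller (or agent) can open it
}

# Usage: ./scripts/new-article.sh "How I reduced token burn"
if [[ -n "${1:-}" ]]; then
  scaffold_article "$1"
fi
```

Keeping the script side-effect-free except for the one generated file makes it safe to list in CLAUDE.md as something the agent may run freely.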

  3. Clear your context when you start working on something new. Your entire chat history is sent along with the next prompt, and the context fills up. Even though this context is cached3, writing to and reading from the cache still has a cost. So when you start working on something that doesn’t need the existing context, start fresh! In Claude Code, just use /clear to reset it (/context shows you how full it currently is).

  4. Choose the right model for your task. When I plan a complex new feature, I switch to Opus and enter plan mode. It gets the full context of the repo and is well suited for complex reasoning. If the plan looks solid and is well written, I switch back to Sonnet and start the implementation. In my experience this works great, and I generally don’t get significantly better results from using Opus for the implementation itself.
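
That back-and-forth is just two slash commands in a Claude Code session (exact model argument names may differ per version; check /help or /model in yours):

```
/model opus      # plan the feature in plan mode with the stronger model
/model sonnet    # switch back for the implementation
```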

  5. Use an LLM chat like Claude.ai for brainstorming and research; use Claude Code for planning and implementation. Brainstorming often doesn’t require the full context of your repository: a short summary of your stack is usually enough. When I want to start something new, I always search online and use the chat to research the conceptual implementation before entering plan mode in Claude Code. That way I have a better grasp of what I’ll be building and can give Claude Code clearer directions, which makes it more effective and uses fewer tokens.

These are the main changes that made my workflow more efficient and reduced its token consumption. Let me know if any of this helped your workflow!

Footnotes

  1. Stop Thinking of AI as a Coworker. It’s an Exoskeleton. - Ben Gregory

  2. Claude Glossary

  3. Claude prompt caching