Runcap, I built a local cost cap for coding agents
Runcap is a free, local CLI tool that estimates and caps the cost of AI coding agent runs. It provides cost estimation before execution, enforces a hard spending limit, compresses tokens, and offers rescue prompts when agents get stuck. Unlike existing observability tools that track costs after the fact, Runcap acts as a circuit breaker to prevent overspending.
Notifications You must be signed in to change notification settings
Fork 0
Star 3
BranchesTags
Open more actions menu
Folders and files
NameName
Last commit message
Last commit date
Latest commit
History
27 Commits
27 Commits
.github/workflows
.github/workflows
bin
bin
docs
docs
examples/broken-ts-app
examples/broken-ts-app
scripts
scripts
src
src
.env.example
.env.example
.gitignore
.gitignore
LICENSE
LICENSE
Open Dashboard.command
Open Dashboard.command
PRODUCT.md
PRODUCT.md
README.md
README.md
Run Agent.command
Run Agent.command
package.json
package.json
Repository files navigation
Know what your coding agent will cost before you build it, and set a hard ceiling so it never surprises you.
Runcap estimates the cost of an agent run as a range, enforces a hard spend ceiling that physically stops the run, and when the agent gets stuck it hands you the exact rescue prompt. Free, MIT, 100% local. Your code and tokens never touch a server.
Every other tool here is a rear-view mirror - it shows you the bill after you paid it. Runcap estimates the bill before you start and caps it. It is a circuit breaker, not a dashboard.
Why
Multi-agent coding runs burn roughly 15x more tokens than a single chat (Anthropic engineering). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice - or the subscription limit - arrives.
Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend before it happens. Runcap does the one thing the rear-view mirror can't:
estimate before build → cap during run → compress every call → rescue when stuck
The honest claim
Runcap does not promise an exact cost oracle. Agent trajectories are stochastic - nobody, including the model labs, can predict the exact token count of a run. So Runcap gives you a range plus a hard cap:
"This build is roughly $3-7. Cap it at $10." - then it kills the run the second it hits the ceiling.
The range is the headline. The hard cap is the product.
Who this is for
Runcap is a developer tool. It works by running a local gateway that your agent's API calls pass through, so it can price and cap them before they reach the paid provider. That means you need three things already in place:
Your own provider API key (OpenAI or Anthropic). Runcap does not sell or supply model access.
Your own agent - Claude Code, Codex, or any script that calls the OpenAI/Anthropic API.
Comfort running a CLI and a local process on your machine.
If you have those, Runcap caps your spend in one command. If you are looking for a no-account web app that runs the AI for you, this is not that - it is a circuit breaker for a setup you already own.
60-second demo
No API key required.
git clone https://github.com/kirder24-code/ai-agent-manager.git cd ai-agent-manager npm run setup npm run demo
- Catch a too-broad request before it spends anything:
$ runcap preflight -- claude "build the full mobile app with auth payments and production deploy"
Preflight: claude build the full mobile app with auth payments and production deploy Scope risk: high Fuel: 24% (medium confidence) Recommendation: Do not launch as one broad mission. Split into one vertical slice with a verification command.
- Wrap a run - and get a rescue prompt the moment it gets stuck:
$ runcap run --label demo -- npm run build
Error [ERR_MODULE_NOT_FOUND]: Cannot find package '@/components' ...
Runcap mission: 20260601T221531-demo-ff42c0a Status: stuck (medium confidence) Exit code: 1 Changed files: 0 Parsed errors: 1 Primary recommendation: Resolve missing import before continuing feature work
The rescue report hands back a copyable prompt:
Do not continue broad implementation. Diagnose this missing module first: Cannot find package '@/components'. Check package.json, tsconfig paths, and the latest git diff. Make the smallest change that resolves the import, then run the failing command again.
Install
npm install -g runcap # exposes runcap (and aim as a legacy alias)
Or run from source with node ./bin/runcap.mjs .
Core commands
runcap plan --fuel 24 -- "build a small auth feature and verify it" # range + recommended cap, before you spend runcap preflight -- claude "build a full SaaS app" # is this prompt too broad? runcap run --label fix -- claude "fix one failing check. stop if blocked." # wrap any agent/command runcap report # human-readable rescue report runcap export # evidence JSON with truth labels runcap dashboard # local cockpit at :8791 runcap gateway # cost-tracking proxy with hard budget cap runcap fuel set 24 # calibrate a %-only subscription
The hard cap (gateway)
Point any OpenAI- or Anthropic-compatible tool at the local gateway. It records real token usage, prices it from a sourced table, and blocks calls the moment your daily ceiling is hit.
OpenAI-compatible agents
OPENAI_API_KEY=sk-... AIM_DAILY_BUDGET_USD=5 runcap gateway
then: OPENAI_BASE_URL=http://127.0.0.1:8792/v1
Anthropic-native (Claude Code, /v1/messages)
ANTHROPIC_API_KEY=sk-ant-... AIM_DAILY_BUDGET_USD=5 runcap gateway
then: ANTHROPIC_BASE_URL=http://127.0.0.1:8792/v1
When spend crosses the ceiling, the next call returns 429 budget_guard instead of money leaving your account. Try it with no key: runcap gateway --mock.
Token compression (built in, no extra deps)
Every request that passes through the gateway is compressed before it's forwarded: embedded JSON is re-serialized compactly, long log/stack-trace dumps are collapsed to head + tail, and trailing whitespace is squeezed. This is lossless by construction - your prose instructions and code semantics are never altered, only machine "garbage" is trimmed. It's pure Node with zero ML or native dependencies, so it installs everywhere without the build pain heavier compressors have.
The dashboard shows the result as one number: "You saved $X · N tokens compressed · would have spent $Y." Disable it with AIM_COMPRESS=off if you ever want raw passthrough.
Pricing table
Costs are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says unknown_price rather than guessing.
Trust model
Runcap is built not to fake certainty. Every important output carries a truth label:
observed - git diff, exit code, file changes, terminal output;
calculated - parsed errors, diff hashes, stuck score, cost from the sourced price table;
provider_usage - token usage returned by the upstream provider;
manual_calibration - subscription % you entered before/after a run;
unknown - Runcap cannot honestly know.
If it cannot prove something, it says so.
Pricing (the product, not the tokens)
Tier Price What you get
OSS (MIT, local) $0 forever All local runs, cost estimation, hard cap, run wrapping, stuck detection, rescue prompts, local dashboard. Never crippleware.
Founding Pro (limited) $49 once Lifetime Pro at the founder price - pay once, keep Pro forever, before it moves to $19/mo.
Pro $19/mo Cloud sync across machines, hosted dashboard, estimate-vs-actual trends, shareable reports, alerts on cap breach
Team $49/seat/mo Shared budget pools, org-wide ceilings, per-project rollups, role-based caps
The local core is free forever. Only persistence, collaboration, and aggregation are paid - the things that only matter once data leaves your laptop.
Current stage
A working local tool, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents, and proving rescue prompts save time. Not yet: a hosted cloud platform or a universal observability standard. It is not trying to replace Langfuse or LiteLLM - it does the thing they don't.
Documentation
Product status
Quickstart
Roadmap
Business plan
Integrations
Trust model
The thesis: AI agents need managers.
About
Free local CLI that estimates, hard-caps, and compresses the cost of AI coding agents. MIT, 100% local.
launchsoloai.com/runcap
Topics
openai
developer-tools
cost-control
ai-agents
claude
cost-optimization
agent-management
llm
llm-observability
llm-cost
developer-tools-ai-agent
token-budget
cost-optimization-cloud-devops
llm-cost-optimization
cost-control-management-system
Resources
Readme
License
MIT license
Uh oh!
There was an error while loading. Please reload this page.
Activity
Stars
3 stars
Watchers
0 watching
Forks
0 forks
Report repository
Releases 1
v0.2.1 - launch-ready
Latest
Jun 5, 2026
Packages 0
Uh oh!
There was an error while loading. Please reload this page.
Contributors
Uh oh!
There was an error while loading. Please reload this page.
Languages
JavaScript 99.0%
Shell 1.0%