2026-06-05 17:27 UTCIn-site rewrite4 min readUpdated: 2026-06-30 13:03 UTC

Runcap, I built a local cost cap for coding agents

Runcap is a free, local CLI tool that estimates and caps the cost of AI coding agent runs. It provides cost estimation before execution, enforces a hard spending limit, compresses tokens, and offers rescue prompts when agents get stuck. Unlike existing observability tools that track costs after the fact, Runcap acts as a circuit breaker to prevent overspending.

SourceHacker News AIAuthor: kirillAIsolo

Notifications You must be signed in to change notification settings

Fork 0

Star 3

BranchesTags

Open more actions menu

Folders and files

NameName

Last commit message

Last commit date

Latest commit

History

27 Commits

.github/workflows

bin

docs

examples/broken-ts-app

scripts

src

.env.example

.gitignore

LICENSE

Open Dashboard.command

PRODUCT.md

README.md

Run Agent.command

package.json

Repository files navigation

Know what your coding agent will cost before you build it, and set a hard ceiling so it never surprises you.

Runcap estimates the cost of an agent run as a range, enforces a hard spend ceiling that physically stops the run, and when the agent gets stuck it hands you the exact rescue prompt. Free, MIT, 100% local. Your code and tokens never touch a server.

Every other tool here is a rear-view mirror - it shows you the bill after you paid it. Runcap estimates the bill before you start and caps it. It is a circuit breaker, not a dashboard.

Why

Multi-agent coding runs burn roughly 15x more tokens than a single chat (Anthropic engineering). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice - or the subscription limit - arrives.

Observability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend before it happens. Runcap does the one thing the rear-view mirror can't:

estimate before build → cap during run → compress every call → rescue when stuck

The honest claim

Runcap does not promise an exact cost oracle. Agent trajectories are stochastic - nobody, including the model labs, can predict the exact token count of a run. So Runcap gives you a range plus a hard cap:

"This build is roughly $3-7. Cap it at $10." - then it kills the run the second it hits the ceiling.

The range is the headline. The hard cap is the product.

Who this is for

Runcap is a developer tool. It works by running a local gateway that your agent's API calls pass through, so it can price and cap them before they reach the paid provider. That means you need three things already in place:

Your own provider API key (OpenAI or Anthropic). Runcap does not sell or supply model access.

Your own agent - Claude Code, Codex, or any script that calls the OpenAI/Anthropic API.

Comfort running a CLI and a local process on your machine.

If you have those, Runcap caps your spend in one command. If you are looking for a no-account web app that runs the AI for you, this is not that - it is a circuit breaker for a setup you already own.

60-second demo

No API key required.

git clone https://github.com/kirder24-code/ai-agent-manager.git cd ai-agent-manager npm run setup npm run demo

Catch a too-broad request before it spends anything:

$ runcap preflight -- claude "build the full mobile app with auth payments and production deploy"

Preflight: claude build the full mobile app with auth payments and production deploy Scope risk: high Fuel: 24% (medium confidence) Recommendation: Do not launch as one broad mission. Split into one vertical slice with a verification command.

Wrap a run - and get a rescue prompt the moment it gets stuck:

$ runcap run --label demo -- npm run build

Error [ERR_MODULE_NOT_FOUND]: Cannot find package '@/components' ...

Runcap mission: 20260601T221531-demo-ff42c0a Status: stuck (medium confidence) Exit code: 1 Changed files: 0 Parsed errors: 1 Primary recommendation: Resolve missing import before continuing feature work

The rescue report hands back a copyable prompt:

Do not continue broad implementation. Diagnose this missing module first: Cannot find package '@/components'. Check package.json, tsconfig paths, and the latest git diff. Make the smallest change that resolves the import, then run the failing command again.

Install

npm install -g runcap # exposes runcap (and aim as a legacy alias)

Or run from source with node ./bin/runcap.mjs .

Core commands

runcap plan --fuel 24 -- "build a small auth feature and verify it" # range + recommended cap, before you spend runcap preflight -- claude "build a full SaaS app" # is this prompt too broad? runcap run --label fix -- claude "fix one failing check. stop if blocked." # wrap any agent/command runcap report # human-readable rescue report runcap export # evidence JSON with truth labels runcap dashboard # local cockpit at :8791 runcap gateway # cost-tracking proxy with hard budget cap runcap fuel set 24 # calibrate a %-only subscription

The hard cap (gateway)

Point any OpenAI- or Anthropic-compatible tool at the local gateway. It records real token usage, prices it from a sourced table, and blocks calls the moment your daily ceiling is hit.

OpenAI-compatible agents

OPENAI_API_KEY=sk-... AIM_DAILY_BUDGET_USD=5 runcap gateway

then: OPENAI_BASE_URL=http://127.0.0.1:8792/v1

Anthropic-native (Claude Code, /v1/messages)

ANTHROPIC_API_KEY=sk-ant-... AIM_DAILY_BUDGET_USD=5 runcap gateway

then: ANTHROPIC_BASE_URL=http://127.0.0.1:8792/v1

When spend crosses the ceiling, the next call returns 429 budget_guard instead of money leaving your account. Try it with no key: runcap gateway --mock.

Token compression (built in, no extra deps)

Every request that passes through the gateway is compressed before it's forwarded: embedded JSON is re-serialized compactly, long log/stack-trace dumps are collapsed to head + tail, and trailing whitespace is squeezed. This is lossless by construction - your prose instructions and code semantics are never altered, only machine "garbage" is trimmed. It's pure Node with zero ML or native dependencies, so it installs everywhere without the build pain heavier compressors have.

The dashboard shows the result as one number: "You saved $X · N tokens compressed · would have spent $Y." Disable it with AIM_COMPRESS=off if you ever want raw passthrough.

Pricing table

Costs are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says unknown_price rather than guessing.

Trust model

Runcap is built not to fake certainty. Every important output carries a truth label:

observed - git diff, exit code, file changes, terminal output;

calculated - parsed errors, diff hashes, stuck score, cost from the sourced price table;

provider_usage - token usage returned by the upstream provider;

manual_calibration - subscription % you entered before/after a run;

unknown - Runcap cannot honestly know.

If it cannot prove something, it says so.

Pricing (the product, not the tokens)

Tier Price What you get

OSS (MIT, local) $0 forever All local runs, cost estimation, hard cap, run wrapping, stuck detection, rescue prompts, local dashboard. Never crippleware.

Founding Pro (limited) $49 once Lifetime Pro at the founder price - pay once, keep Pro forever, before it moves to $19/mo.

Pro $19/mo Cloud sync across machines, hosted dashboard, estimate-vs-actual trends, shareable reports, alerts on cap breach

Team $49/seat/mo Shared budget pools, org-wide ceilings, per-project rollups, role-based caps

The local core is free forever. Only persistence, collaboration, and aggregation are paid - the things that only matter once data leaves your laptop.

Current stage

A working local tool, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents, and proving rescue prompts save time. Not yet: a hosted cloud platform or a universal observability standard. It is not trying to replace Langfuse or LiteLLM - it does the thing they don't.

Documentation

Product status

Quickstart

Roadmap

Business plan

Integrations

Trust model

The thesis: AI agents need managers.

About

Free local CLI that estimates, hard-caps, and compresses the cost of AI coding agents. MIT, 100% local.

launchsoloai.com/runcap

Topics

openai

developer-tools

cost-control

ai-agents

claude

cost-optimization

agent-management

llm

llm-observability

llm-cost

developer-tools-ai-agent

token-budget

cost-optimization-cloud-devops

llm-cost-optimization

cost-control-management-system

Resources

Readme

License

MIT license

Uh oh!

There was an error while loading. Please reload this page.

Activity

Stars

3 stars

Watchers

0 watching

Forks

0 forks

Report repository

Releases 1

v0.2.1 - launch-ready

Latest

Jun 5, 2026

Packages 0

Uh oh!

There was an error while loading. Please reload this page.

Contributors

Uh oh!

There was an error while loading. Please reload this page.

Languages

JavaScript 99.0%

Shell 1.0%