2026-06-28 04:49 UTC · In-site rewrite · 4 min read · Updated: 2026-06-28 05:20 UTC

Cerberus – a local firewall for AI agents' tool calls

Cerberus is a local-first security gateway that intercepts and risk-scores tool calls from AI coding agents like Claude Code, Codex, Cursor, and Cline, preventing secret exfiltration, dangerous commands, and prompt injection attacks. It runs entirely on the user's machine with no external API.

Source: Hacker News AI · Author: cerberussec

Key points

Cerberus acts as a runtime security gateway between AI agents and the system, intercepting every tool call.
It uses four deterministic signals (policy, behavioral, content, injection) to compute a risk score and decide to allow, audit, ask for human approval, or block.
Key protections include secret exfiltration detection, excessive permissions gating, dangerous egress control, and prompt injection detection.
It is agent-agnostic, local-first, and offers a terminal-first approval workflow with a forensic dashboard.



A local-first security gateway for autonomous AI coding agents. Cerberus sits between the agent (Claude Code, Codex, Cursor, Cline) and your machine, intercepts every tool call before it runs, risk-scores it across four signals, and either allows, audits, asks for human approval, or blocks it — all on your machine, with no external API and nothing leaving the box.

The problem

Autonomous coding agents run shell commands, edit files, and make network calls on your behalf — at machine speed, often unattended. One bad step (rm -rf, an unwanted git push, a leaked .env, a poisoned README that tricks the agent into exfiltrating secrets) and there's no human in the loop to stop it. Cerberus puts that checkpoint on the tool boundary, where the agent actually acts.

What it does

PreToolUse  ─▶ intercept ─▶ Policy + Behavioral + Content + Injection ─▶ Risk Engine ─▶ ALLOW · AUDIT · HITL · BLOCK
PostToolUse ─▶ inspect   ─▶ secret + injection detection ─▶ session contamination state

Four deterministic signals are aggregated into one weighted risk score, with a hard floor: absolute prohibitions can never be overridden by the weighted score.
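As a rough illustration of how a weighted score with a hard floor might combine the four signals, here is a minimal TypeScript sketch. The weights, thresholds, and every type and function name (`SignalResult`, `WEIGHTS`, `THRESHOLDS`, `decide`) are assumptions made for this example; Cerberus's real values live in its YAML rules and may be structured differently.

```typescript
// Illustrative sketch only: the weights, thresholds, and type names below are
// assumptions, not Cerberus's actual values or schema.
type Verdict = "ALLOW" | "AUDIT" | "HITL" | "BLOCK";
type SignalName = "policy" | "behavioral" | "content" | "injection";

interface SignalResult {
  score: number;       // 0 = benign, 1 = maximally risky
  hardBlock?: boolean; // true when the signal matched an absolute prohibition
}

// Assumed weights (they sum to 1) and assumed verdict thresholds.
const WEIGHTS: Record<SignalName, number> = {
  policy: 0.4,
  behavioral: 0.15,
  content: 0.3,
  injection: 0.15,
};
const THRESHOLDS = { audit: 0.25, hitl: 0.5, block: 0.8 };

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function decide(
  signals: Record<SignalName, SignalResult>,
): { risk: number; verdict: Verdict } {
  const names = Object.keys(WEIGHTS) as SignalName[];

  // Hard floor: an absolute prohibition blocks regardless of the weighted sum.
  if (names.some((n) => signals[n].hardBlock === true)) {
    return { risk: 1, verdict: "BLOCK" };
  }

  // Weighted sum of the clamped per-signal scores.
  const risk = names.reduce(
    (sum, n) => sum + WEIGHTS[n] * clamp01(signals[n].score),
    0,
  );

  let verdict: Verdict = "ALLOW";
  if (risk >= THRESHOLDS.block) verdict = "BLOCK";
  else if (risk >= THRESHOLDS.hitl) verdict = "HITL";
  else if (risk >= THRESHOLDS.audit) verdict = "AUDIT";
  return { risk, verdict };
}
```

The hard-floor check runs before the weighted sum, so three quiet signals cannot dilute a prohibition raised by the fourth.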

What it protects against

🟢 Secret exfiltration — detects secrets loaded into context, then content-matches the outbound payload: holds the call that actually carries the key (raw or base64/hex/url-encoded), with provenance (source: .env:4 · sha256:… · 97%) and never logging the secret itself.

🟢 Excessive permissions — every call gated; unknown tools fail-closed; sensitive paths (~/.ssh, ~/.aws, credentials, /etc/passwd) held; destructive commands (rm -rf, Remove-Item -Recurse, chmod 777, kill -9) blocked or held.

🟢 Dangerous egress — destination policy: trusted hosts (registries, GitHub, OpenAI/Anthropic) auto-allowed; paste sites / webhook catchers / raw-IP destinations held.

🟡 Tool abuse — runaway-loop and tool-call-rate/repetition detection.

🟡 Prompt injection — detects injection in tool results and gates the next egress (heuristic classifier; optional local DeBERTa model). It sees tool calls, not the LLM prompt — so it catches the exploitation of an injection (the egress), not the injection itself.

Key features

Terminal-first approval — held calls surface in the agent's native permission prompt (Claude Code / Cursor), or via cerberus approve / a localhost dashboard.

Forensic dashboard — per-session timeline, risk-factor breakdown, and a Replay player that steps through how a session's risk built up.

Multi-agent — one adapter layer serves Claude Code, Codex, Cursor, and Cline.

Policy as data — rules and risk weights are editable YAML, not code.

Local-first — binds to 127.0.0.1, no external API, no telemetry; secret values never touch disk or logs.
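To make the "policy as data" idea concrete, the sketch below evaluates an egress destination against a plain data object instead of hard-coded branches. The rule shape (`EgressPolicy`), the host lists, and the choice to hold unknown destinations by default are all assumptions for illustration; the actual YAML schema in rules/ is not shown in this README and may differ.

```typescript
// Illustrative sketch only: this rule shape and these host lists are
// hypothetical, not the schema Cerberus uses in rules/.
type EgressAction = "ALLOW" | "HOLD";

interface EgressPolicy {
  trustedHosts: string[];      // matched exactly or as a parent domain
  heldHosts: string[];         // paste sites, webhook catchers, ...
  holdRawIp: boolean;          // hold destinations addressed by bare IP
  defaultAction: EgressAction; // assumed: unknown destinations are held
}

const examplePolicy: EgressPolicy = {
  trustedHosts: ["registry.npmjs.org", "github.com", "api.openai.com", "api.anthropic.com"],
  heldHosts: ["pastebin.com", "webhook.site"],
  holdRawIp: true,
  defaultAction: "HOLD",
};

const matchesHost = (host: string, rule: string): boolean =>
  host === rule || host.endsWith("." + rule);

// IPv4 dotted quad, or a bracketed IPv6 literal as URL.hostname reports it.
const isRawIp = (host: string): boolean =>
  /^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.startsWith("[");

function classifyEgress(
  rawUrl: string,
  policy: EgressPolicy,
): { action: EgressAction; reason: string } {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return { action: "HOLD", reason: "unparseable destination" };
  }
  if (policy.holdRawIp && isRawIp(host)) {
    return { action: "HOLD", reason: "raw-IP destination" };
  }
  if (policy.heldHosts.some((h) => matchesHost(host, h))) {
    return { action: "HOLD", reason: "held host: " + host };
  }
  if (policy.trustedHosts.some((h) => matchesHost(host, h))) {
    return { action: "ALLOW", reason: "trusted host: " + host };
  }
  return { action: policy.defaultAction, reason: "no rule matched" };
}
```

Because the policy is ordinary data, changing which hosts are trusted means editing a list rather than the classifier, which is the property the YAML rules aim for.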

Quickstart

npm i -g @cerberussec/core   # or run ad-hoc with: npx @cerberussec/core

# wire Cerberus into your agent (merges into the agent's config; backed up, idempotent):
cerberus init                # Claude Code, project-level (--agent codex|cursor|cline, --global, --print)

# start the gateway + dashboard (one process):
cerberus engine              # then open http://127.0.0.1:9000/

Use your agent as usual — tool calls now route through Cerberus. By default a held (HITL) call is approved right in the terminal: Cerberus returns ask, so Claude Code shows its native permission prompt with Cerberus's reason — approve/deny without leaving your session.

The dashboard (http://127.0.0.1:9000/) has a Live tab (Action Center + stream) and a Sessions tab — a forensic timeline per session with a risk-factor breakdown and a Replay player to step through how a session's risk built up.

Terminal-first approvals

Cerberus runs inside the agent's execution loop, so the terminal is the realtime decision point and the dashboard is the deep dive. Per severity (default AG_APPROVAL_SURFACE=terminal):

| verdict | terminal | web UI |
| --- | --- | --- |
| BLOCK | ⛔ denied in-terminal (Claude shows the reason) | + optional auto-open forensics |
| HITL | ✋ Claude's native permission prompt, with Cerberus's reason | forensics |
| AUDIT | (quiet) | elevated-risk record |
| ALLOW | (silent) | (none) |

Prefer a central web queue instead? Set AG_APPROVAL_SURFACE=dashboard — held calls then pause on the engine's synchronous hold and you Approve/Deny from the dashboard (or the terminal, out-of-band):

cerberus pending   # list calls held for review (with their ids)
cerberus approve   # release a held call …
cerberus deny      # … or deny it

Extra terminal alerts write to the controlling terminal (/dev/tty, falling back to stderr) so the protocol channel to Claude Code stays clean. Tune via env:

| env | default | effect |
| --- | --- | --- |
| AG_NOTIFY | 1 | extra terminal alert lines on/off (0 to silence) |
| AG_APPROVAL_SURFACE | terminal | terminal ⇒ HITL via Claude's native prompt; dashboard ⇒ socket hold + dashboard approve |
| AG_AUTO_OPEN | off | block ⇒ auto-open the investigation UI on a BLOCK/EXFIL |
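For readers wiring these variables into their own tooling, the defaults in the table could be read along these lines. This is a sketch under assumptions: `loadApprovalConfig` and `ApprovalConfig` are hypothetical names, and falling back to the default on an unrecognized value is a choice made for the example, not documented Cerberus behavior.

```typescript
// Illustrative sketch only: function and type names are hypothetical.
type ApprovalSurface = "terminal" | "dashboard";

interface ApprovalConfig {
  notify: boolean;                  // AG_NOTIFY, default 1 (on)
  approvalSurface: ApprovalSurface; // AG_APPROVAL_SURFACE, default "terminal"
  autoOpen: "off" | "block";        // AG_AUTO_OPEN, default "off"
}

function loadApprovalConfig(
  env: Record<string, string | undefined>,
): ApprovalConfig {
  return {
    // Only an explicit "0" silences the extra terminal alert lines.
    notify: (env.AG_NOTIFY ?? "1") !== "0",
    // Assumed: any value other than "dashboard" falls back to the default.
    approvalSurface: env.AG_APPROVAL_SURFACE === "dashboard" ? "dashboard" : "terminal",
    // Assumed: any value other than "block" falls back to the default.
    autoOpen: env.AG_AUTO_OPEN === "block" ? "block" : "off",
  };
}
```

In a Node process this would typically be called as `loadApprovalConfig(process.env)`.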

Agents

The engine + signals + risk + dashboard are agent-agnostic; only a thin adapter (parse the agent's hook event → normalize → emit its verdict shape) is per-agent. Wire one with cerberus init --agent, using a value from the --agent column below:

| agent | --agent | HITL approval | notes |
| --- | --- | --- | --- |
| Claude Code | claude (default) | native terminal prompt (ask) | verified end-to-end |
| Codex CLI | codex | dashboard hold (no native ask), AG_APPROVAL_SURFACE=dashboard | enterprise requirements.toml makes it non-bypassable |
| Cursor | cursor | native IDE prompt (ask) | init sets failClosed: true |
| Cline | cline | dashboard hold (cancel bool) | macOS/Linux only |

The codex, cursor, and cline adapters follow the published hook specs; verify against your installed version (cerberus init with --agent and --print shows the exact config). Roo Code is unsupported (archived 2026).
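The thin-adapter idea (parse the hook event, normalize it, emit the agent's verdict shape) might look like the sketch below for Claude Code. The `Adapter`, `NormalizedCall`, and `Decision` types are hypothetical. The Claude Code field names (`session_id`, `tool_name`, `tool_input`, and `hookSpecificOutput.permissionDecision`) are assumed to match Claude Code's published hook format and should be verified against your installed version. Mapping AUDIT to `allow` is also an assumption, based on AUDIT being a quiet record rather than a prompt.

```typescript
// Illustrative sketch only: the adapter interface is hypothetical, and the
// Claude Code field names are assumed from its published hook format.
type Verdict = "ALLOW" | "AUDIT" | "HITL" | "BLOCK";

interface NormalizedCall {
  agent: string;
  sessionId: string;
  tool: string;
  input: unknown;
}

interface Decision {
  verdict: Verdict;
  reason: string;
}

// The only per-agent piece: one parser in, one verdict shape out.
interface Adapter<In, Out> {
  normalize(event: In): NormalizedCall;
  emit(decision: Decision): Out;
}

interface ClaudePreToolUse {
  session_id: string;
  tool_name: string;
  tool_input: unknown;
}

interface ClaudeHookOutput {
  hookSpecificOutput: {
    hookEventName: "PreToolUse";
    permissionDecision: "allow" | "deny" | "ask";
    permissionDecisionReason: string;
  };
}

const claudeAdapter: Adapter<ClaudePreToolUse, ClaudeHookOutput> = {
  normalize: (e) => ({
    agent: "claude",
    sessionId: e.session_id,
    tool: e.tool_name,
    input: e.tool_input,
  }),
  emit: (d) => {
    // HITL surfaces as the native prompt ("ask"); AUDIT proceeds quietly.
    const permissionDecision =
      d.verdict === "BLOCK" ? "deny" : d.verdict === "HITL" ? "ask" : "allow";
    return {
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision,
        permissionDecisionReason: d.reason,
      },
    };
  },
};
```

An adapter for an agent without a native `ask` would implement the same two functions but resolve HITL through the dashboard hold before emitting a final allow or deny.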

How it plugs in

PreToolUse hook → /intercept is the single hard enforcement point (allow/deny/ask; or HITL holds the socket open until you decide).

PostToolUse hook → /inspect is observe-only: it updates the session's contamination state so the next action is judged with full context. It never modifies a tool result.

The engine is agent-agnostic at its core; per-agent adapters (--agent) are the only thing that differs.
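The split between the two hooks, with /inspect only recording what a session has seen and /intercept enforcing on the next call, can be sketched as follows. Everything here is hypothetical: the state shape, the function names, and especially the two regexes, which are toy stand-ins for the real secret detector and injection classifier.

```typescript
// Illustrative sketch only: names, state shape, and regexes are hypothetical.
type Verdict = "ALLOW" | "HITL";

interface Contamination {
  secretSeen: boolean;
  injectionSeen: boolean;
}

const sessions = new Map<string, Contamination>();

// Toy stand-ins for the real secret detector and injection classifier.
const SECRET_HINT = /\b[A-Z0-9_]*(API_KEY|SECRET|TOKEN)\s*=\s*\S+/;
const INJECTION_HINT = /ignore (all )?(previous|prior) instructions/i;

function stateFor(sessionId: string): Contamination {
  let state = sessions.get(sessionId);
  if (!state) {
    state = { secretSeen: false, injectionSeen: false };
    sessions.set(sessionId, state);
  }
  return state;
}

// PostToolUse: observe only. The tool result is read, never modified.
function inspect(sessionId: string, toolResult: string): void {
  const state = stateFor(sessionId);
  if (SECRET_HINT.test(toolResult)) state.secretSeen = true;
  if (INJECTION_HINT.test(toolResult)) state.injectionSeen = true;
}

// PreToolUse: the next egress from a contaminated session is held for review.
function intercept(
  sessionId: string,
  call: { tool: string; isEgress: boolean },
): Verdict {
  const state = stateFor(sessionId);
  const contaminated = state.secretSeen || state.injectionSeen;
  return call.isEgress && contaminated ? "HITL" : "ALLOW";
}
```

For example, after `inspect("s1", ...)` sees a poisoned README, a later network call in session `s1` is held, while a local file read in the same session, or any call in an unrelated session, is not.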

Architecture

PreToolUse  ─▶ /intercept ─▶ Policy + Behavioral + Content/Injection ─▶ RiskEngine ─▶ ALLOW/AUDIT/HITL/BLOCK
PostToolUse ─▶ /inspect   ─▶ secret detection + injection classifier ─▶ session contamination state
(audit log + WebSocket → dashboard)

Single Node + TypeScript package; the dashboard is a Vite/React app served by the engine. Rules and risk weights are editable YAML data, not code (rules/).

What it is — and isn't

Cerberus is a runtime gateway on the tool boundary. It's strongest at secret-exfiltration prevention and as a permission chokepoint. Because it sees tool calls (not the LLM prompt), it catches the exploitation of a prompt injection — not the injection itself — and it does not cover data-pipeline / RAG poisoning. The exfil match is high-confidence but not airtight (novel secret formats, split-across-calls encoding). Honest defaults over false guarantees.

Local-first & licensing

No external API, no API key, nothing leaves the machine. The optional injection model (@cerberussec/injection-model, ProtectAI DeBERTa, Apache-2.0) upgrades the built-in heuristic classifier; install it only if you want it. The core is OSS-clean (Apache/MIT-compatible deps); Meta Prompt-Guard is deliberately kept out of core (Llama license).

Development

# from a clone: install (root + dashboard are separate npm projects) and build
npm install && npm --prefix dashboard install
npm run build       # compile the engine (tsc → dist) + dashboard (vite → dashboard/dist)

npm run engine      # run from source via tsx (dev)
npm run typecheck
npm run test:behavioral && npm run test:content && npm run test:injection && npm run test:risk \
  && npm run test:init && npm run test:projector && npm run test:audit && npm run test:notify \
  && npm run test:security && npm run test:policy && npm run test:adapters
npm run e2e:behavioral && npm run e2e:content && npm run e2e:injection && npm run e2e:risk

See PLAN.md for milestones and brainstorms/ for the design records behind each decision.


License

Apache-2.0 license


Release: Cerberus v0.1.0 🐺 (latest), Jun 12, 2026
