Gate – deterministic PII redaction for AI agent tool output (Rust)
Gate is a deterministic PII redaction tool written in Rust for AI agents. It uses regex, column heuristics, and Luhn checks instead of LLMs. It intercepts Bash commands and MCP tool calls via hooks, supports multiple harnesses, and provides scanning, real-time redaction, and aggregated reporting while keeping data local.
Most PII guardrails for AI agents are themselves LLMs — they send your data to a model to decide whether it's sensitive. Gate takes the opposite approach.
| | gate | LLM-based redaction |
|---|---|---|
| Decision method | Regex + column heuristics + Luhn | Model inference |
| Deterministic | ✅ Same input always produces the same output | ❌ Varies by run and model version |
| Data stays local | ✅ Never leaves your machine | ❌ Sent to a model API for classification |
Latency ✅ -h -d -c "SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'public' ORDER BY TABLE_NAME, ORDINAL_POSITION" | gate scan
See docs/scan.md for queries against MySQL, MS SQL Server (including native sqlcmd), Databricks, and toolkit-managed clients.
Risk level is weighted by category sensitivity — one SSN column matters more than twenty address columns. Exits with code 1 if any PII columns are found (scriptable in CI). Pass --verbose to show all detected columns, or --json for machine-readable output.
| Sensitivity | Categories | Risk floor |
|---|---|---|
| Critical | Government IDs, Health & medical, Financial, Biometric | HIGH always; CRITICAL if ≥3 columns or >10% of schema |
| Elevated | Contact, Names, Date of birth, Location of birth, Family & relationships, Employment | HIGH if >5% of schema; CRITICAL if >25% |
| Standard | Address & location, Online & technical, Demographics | HIGH if >25% of schema |
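The risk floors in the table can be read as a small decision function. The sketch below is illustrative only and is not gate's source: the type and function names are hypothetical, the percentage is assumed to be flagged columns in a tier divided by all columns in the schema, anything under the listed thresholds is assumed to be LOW, and the overall level is assumed to be the highest floor across tiers.

```rust
/// Sensitivity tiers, named after the rows of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Sensitivity {
    Critical,
    Elevated,
    Standard,
}

/// Risk levels in ascending order, so the derived `Ord` gives Low < High < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    Low,
    High,
    Critical,
}

/// Risk floor contributed by a single tier.
/// `pii_columns` is the number of flagged columns in that tier;
/// `total_columns` is the size of the whole schema (assumed denominator).
pub fn risk_floor(tier: Sensitivity, pii_columns: usize, total_columns: usize) -> Risk {
    if pii_columns == 0 || total_columns == 0 {
        return Risk::Low;
    }
    let share = pii_columns as f64 / total_columns as f64;
    match tier {
        // HIGH always; CRITICAL if >=3 columns or >10% of schema.
        Sensitivity::Critical if pii_columns >= 3 || share > 0.10 => Risk::Critical,
        Sensitivity::Critical => Risk::High,
        // HIGH if >5% of schema; CRITICAL if >25%.
        Sensitivity::Elevated if share > 0.25 => Risk::Critical,
        Sensitivity::Elevated if share > 0.05 => Risk::High,
        Sensitivity::Elevated => Risk::Low,
        // HIGH if >25% of schema.
        Sensitivity::Standard if share > 0.25 => Risk::High,
        Sensitivity::Standard => Risk::Low,
    }
}

/// Overall risk, assumed here to be the highest floor across all tiers.
pub fn overall_risk(counts: &[(Sensitivity, usize)], total_columns: usize) -> Risk {
    counts
        .iter()
        .map(|&(tier, n)| risk_floor(tier, n, total_columns))
        .max()
        .unwrap_or(Risk::Low)
}
```

Under these assumptions, one Critical-tier column in a 100-column schema already yields HIGH, while twenty Standard-tier columns in the same schema stay below the 25% threshold.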
Note: gate scan detects PII by column name only. A LOW result means your column names look clean — it does not mean the data is safe. Gate 2 additionally inspects values at query time, catching PII in free-text, JSON, and ambiguously-named columns that scan cannot see. In multi-row results, if any value in a column matches a PII pattern, the entire column is promoted and all rows are redacted — not just the matching row.
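To make the column-promotion rule concrete, here is a minimal sketch that uses a Luhn check as the only value-level detector. It is not gate's implementation: `luhn_valid`, `redact_promoted_columns`, the `Row` alias, the 13 to 19 digit length bound, and the `[PII:card]` label are all assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One result row: column name -> value (a simplification of real JSON rows).
pub type Row = BTreeMap<String, String>;

/// Luhn checksum over the digits of `s`, ignoring spaces and dashes.
/// The 13..=19 digit bound is an assumed range for payment card numbers.
pub fn luhn_valid(s: &str) -> bool {
    let mut digits = Vec::new();
    for c in s.chars() {
        match c {
            ' ' | '-' => continue,
            _ => match c.to_digit(10) {
                Some(d) => digits.push(d),
                None => return false, // any other character disqualifies the value
            },
        }
    }
    if digits.len() < 13 || digits.len() > 19 {
        return false;
    }
    let mut sum = 0;
    // Walk from the rightmost digit, doubling every second one.
    for (i, d) in digits.iter().rev().enumerate() {
        let mut v = *d;
        if i % 2 == 1 {
            v *= 2;
            if v > 9 {
                v -= 9;
            }
        }
        sum += v;
    }
    sum % 10 == 0
}

/// Pass 1 finds every column in which at least one value matches.
/// Pass 2 redacts that column in all rows, not just the matching one.
pub fn redact_promoted_columns(rows: &mut [Row]) -> BTreeSet<String> {
    let mut promoted = BTreeSet::new();
    for row in rows.iter() {
        for (col, val) in row {
            if luhn_valid(val) {
                promoted.insert(col.clone());
            }
        }
    }
    for row in rows.iter_mut() {
        for col in &promoted {
            if let Some(v) = row.get_mut(col) {
                *v = "[PII:card]".to_string(); // hypothetical placeholder label
            }
        }
    }
    promoted
}
```

With a two-row result where only the first row's `note` value passes the Luhn check, both rows come back with `note` redacted and every other column untouched.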
For false positives (e.g. city in a products table), run gate scan --review to triage interactively and add columns to the allowlist. Allowlisted columns skip all redaction — both name-based and value-based. Only add a column to the allowlist when you are certain it contains no PII. Low-confidence pattern matches (below confidence_threshold) are redacted and flagged with a warning in _gate_summary; add the column to column_allowlist to suppress. Manage the list directly with gate allowlist add/remove/list.
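The interaction between the allowlist and confidence_threshold can be summarised as a three-way decision for a column that has already matched a PII pattern. This is a sketch under assumptions, not gate's code: the `Action` enum, the `decide` function, and the numeric confidence scale are hypothetical.

```rust
use std::collections::HashSet;

/// What happens to a column whose name or values matched a PII pattern.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Column is allowlisted: no name-based or value-based redaction.
    PassThrough,
    /// Confident match: redact silently.
    Redact,
    /// Match below the threshold: still redact, and record a warning.
    RedactWithWarning,
}

/// `confidence` and `threshold` are assumed to share one scale (e.g. 0.0 to 1.0).
pub fn decide(
    column: &str,
    confidence: f64,
    threshold: f64,
    allowlist: &HashSet<String>,
) -> Action {
    // The allowlist wins over everything else.
    if allowlist.contains(column) {
        return Action::PassThrough;
    }
    if confidence < threshold {
        Action::RedactWithWarning
    } else {
        Action::Redact
    }
}
```

The point of the ordering is that a low-confidence match is never dropped silently: it is either redacted with a warning or explicitly allowlisted by a human.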
Quickstart
Install gate
Homebrew — macOS and Linux (recommended)
brew tap GaaraZhu/gate && brew install gate
cargo binstall — downloads a prebuilt binary
cargo binstall gate
Or grab a binary from the releases page
https://github.com/GaaraZhu/gate/releases
Create your config (opens ~/.config/gate/config.yaml in your editor):
gate config
Register the hook with your agent harness:
Claude Code (default)
gate init
OpenCode
gate init --harness opencode
Cursor
gate init --harness cursor
GitHub Copilot CLI (project-scoped, run from repo root)
gate init --harness copilot-cli
Codex CLI
gate init --harness codex
Gemini CLI
gate init --harness gemini
Add --scope project for project-only setup. Restart your OpenCode, Cursor, or Gemini CLI session after gate init to load the hook. For Codex CLI, restart the session, then review the hook in the Trust & Permissions UI, mark it as trusted, and enable it. For Copilot CLI, the generated .github/hooks/PreToolUse.json is gitignored by default — each developer runs gate init --harness copilot-cli once in their local clone.
(Optional) Register MCP server proxies so tools/call responses also pass through gate:
Claude Code (default) — dry-run, shows what would change
gate init --wrap-mcp
OpenCode
gate init --harness opencode --wrap-mcp --yes
Cursor
gate init --harness cursor --wrap-mcp --yes
Copilot CLI
gate init --harness copilot-cli --wrap-mcp --yes
Codex CLI
gate init --harness codex --wrap-mcp --yes
Gemini CLI
gate init --harness gemini --wrap-mcp --yes
Add --scope project for project-level MCP config. For Cursor project-scoped MCP, re-enable the servers in Settings → Tools & MCPs after registration. See docs/mcp.md for --servers, per-harness paths, and manual single-server registration.
Start your AI session — gate intercepts query commands automatically. No changes to your prompts or tools required.
Run gate validate to confirm your config is valid before the first session.
How it works
gate covers two access paths agents use to reach data. The blog post has the full walkthrough; the short version:
Bash tooling path
Every Bash command passes through gate hook first. Commands that match a configured tool are silently rewritten to gate run -- followed by the original command; gate run then spawns the subprocess and pipes its stdout through the two-gate detection pipeline. The rewrite happens in the harness's pre-tool-execution hook, which is enforcing in Claude Code, OpenCode, Cursor, GitHub Copilot CLI, Codex CLI, and Gemini CLI, so the agent cannot bypass it. Humans and CI scripts running outside the harness are untouched.
  AI asks to run:  tkpsql query --sql "SELECT * FROM users"
                 │
  harness hook fires (PreToolUse / tool.execute.before)
                 │
  gate hook rewrites to:  gate run -- tkpsql query --sql "..."
                 │
  ┌──────────────┴──────────────┐
  │ Gate 1: SQL inspection      │  SELECT * → no column hints, defer to Gate 2
  │ Gate 2: Value scanning      │  regex + column-name heuristics + Luhn check
  └──────────────┬──────────────┘
                 │
  {"id": 1, "full_name": "[PII:name]", "email": "[PII:email]", ..., "_gate_summary": {...}}
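The rewrite step can be sketched as a pure function from the requested command to an optional replacement. This is an illustration, not gate's hook code: `rewrite_command` is a hypothetical name, and matching on the first whitespace-separated token is an assumption (a real hook would also need to deal with pipelines, paths, and environment prefixes).

```rust
/// Returns the rewritten command if the program is a configured tool,
/// or `None` if the command should run unchanged.
pub fn rewrite_command(command: &str, tools: &[&str]) -> Option<String> {
    // Assumption: the program name is the first whitespace-separated token.
    let program = command.split_whitespace().next()?;
    if tools.contains(&program) {
        Some(format!("gate run -- {}", command))
    } else {
        None
    }
}
```

For example, with `tools = ["tkpsql", "curl"]`, a `tkpsql query ...` command is wrapped in `gate run --`, while `ls -la` comes back as `None` and runs as-is.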
MCP path
gate mcp is a transparent stdio proxy registered in the harness as the MCP server. It forwards all JSON-RPC traffic verbatim except tools/call responses, which pass through Gate 2 before reaching the model. No changes to the upstream server are required.
Note: only tools/call responses are redacted — resources/read, prompts/get, and other MCP message types are forwarded without inspection.
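One detail worth spelling out: a JSON-RPC response carries the request's id but not its method, so a proxy that treats tools/call responses specially has to remember which ids belonged to tools/call requests. The sketch below shows that bookkeeping only. It is not gate's implementation: the `Message` and `Proxy` types are hypothetical, real messages are JSON rather than an enum, and ids are simplified to `u64` even though JSON-RPC also allows string ids.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a parsed JSON-RPC message.
pub enum Message {
    Request { id: u64, method: String },
    Response { id: u64 },
}

/// Tracks which in-flight request ids were `tools/call`.
#[derive(Default)]
pub struct Proxy {
    pending_tool_calls: HashSet<u64>,
}

impl Proxy {
    /// Client -> server direction. The message itself is forwarded verbatim;
    /// only the id of a `tools/call` request is recorded.
    pub fn on_client_message(&mut self, msg: &Message) {
        if let Message::Request { id, method } = msg {
            if method == "tools/call" {
                self.pending_tool_calls.insert(*id);
            }
        }
    }

    /// Server -> client direction. Returns true when the response answers a
    /// `tools/call` request and should go through value scanning first.
    /// The id is removed so each response is matched at most once.
    pub fn should_scan(&mut self, msg: &Message) -> bool {
        match msg {
            Message::Response { id } => self.pending_tool_calls.remove(id),
            Message::Request { .. } => false,
        }
    }
}
```

A response to a resources/read request never enters the pending set, which is why it is forwarded without inspection.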
AI ──tools/call──> gate mcp ──forward──> upstream MCP server
Detected values are replaced with placeholders such as [PII:email]. A _gate_summary field is appended reporting what was redacted.
{ "rows": [{"id": 1, "email": "[PII:email]", "ssn": "[PII:ssn]"}], "count": 1, "_gate_summary": {"redacted": 2, "types": ["email", "ssn"], "warnings": []} }
With hash_values: true in config, each placeholder gains an 8-char hex suffix derived from the original value ([PII:email:7f83b165]). The same raw value always produces the same suffix, so the AI can join or deduplicate across rows without ever seeing the underlying data. Error responses from the underlying tool pass through unchanged.
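The shape of a hashed placeholder can be sketched as follows. The README does not say which hash function gate uses, so this example substitutes the standard library's `DefaultHasher` purely as a stand-in; it is not cryptographic and its output is not guaranteed stable across Rust releases. The `placeholder` function and its parameters are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Builds `[PII:<kind>]`, or `[PII:<kind>:<8 hex chars>]` when hashing is on.
/// Stand-in hash only: a real tool would want a stable, cryptographic digest.
pub fn placeholder(kind: &str, raw: &str, hash_values: bool) -> String {
    if !hash_values {
        return format!("[PII:{}]", kind);
    }
    let mut hasher = DefaultHasher::new();
    raw.hash(&mut hasher);
    // Keep the low 32 bits, which print as exactly 8 zero-padded hex digits.
    let suffix = hasher.finish() as u32;
    format!("[PII:{}:{:08x}]", kind, suffix)
}
```

Because the suffix depends only on the raw value, two rows holding the same email produce identical placeholders, which is what lets the model join or deduplicate without seeing the value.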
Protection retrospective
_gate_summary reports a single response. gate retro aggregates across all of them — total queries seen, PII fields redacted, hit rate, plus a breakdown by tool and PII category. Useful for periodic audits and for confirming the boundary is doing real work.
If any query produced a low-confidence redaction, gate retro surfaces a Low-confidence redactions section listing each unique warned column and the exact gate allowlist add command to suppress it. Once a column is added to the allowlist it disappears from this section automatically.
Stats are collected by default and written to a local JSONL log on disk — they never leave your machine. Disable with stats.enabled: false in config.
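The aggregation that gate retro performs can be pictured as a fold over per-query records. The sketch below is an assumption-heavy illustration: the `QueryRecord` and `Retro` shapes are hypothetical (the on-disk JSONL schema is not documented here), JSONL parsing is omitted, and "hit rate" is assumed to mean the fraction of queries with at least one redaction.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of one logged query.
pub struct QueryRecord {
    pub tool: String,
    /// One entry per redacted field, e.g. "email" or "ssn".
    pub redacted_types: Vec<String>,
}

#[derive(Debug, Default)]
pub struct Retro {
    pub total_queries: usize,
    pub queries_with_pii: usize,
    pub fields_redacted: usize,
    pub by_tool: BTreeMap<String, usize>,
    pub by_category: BTreeMap<String, usize>,
}

impl Retro {
    /// Assumed definition: share of queries that had at least one redaction.
    pub fn hit_rate(&self) -> f64 {
        if self.total_queries == 0 {
            0.0
        } else {
            self.queries_with_pii as f64 / self.total_queries as f64
        }
    }
}

/// Folds all records into totals plus per-tool and per-category counts.
pub fn aggregate(records: &[QueryRecord]) -> Retro {
    let mut retro = Retro::default();
    for rec in records {
        retro.total_queries += 1;
        if !rec.redacted_types.is_empty() {
            retro.queries_with_pii += 1;
        }
        retro.fields_redacted += rec.redacted_types.len();
        *retro.by_tool.entry(rec.tool.clone()).or_insert(0) += rec.redacted_types.len();
        for kind in &rec.redacted_types {
            *retro.by_category.entry(kind.clone()).or_insert(0) += 1;
        }
    }
    retro
}
```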
What gate does NOT protect against
gate is a deterministic redaction layer, not a sandbox. It assumes the agent is non-adversarial and only inspects output from commands listed under tools: in config. The following are deliberately out of scope:
Adversarial agents / prompt injection. Gate's threat model is an agent that inadvertently exfiltrates PII. gate protect (Unix) blocks the most direct bypass — a hijacked agent disabling gate via config edits — by transferring config ownership to root. But a determined attacker can still route around gate by invoking commands not in tools:, requesting non-JSON output formats, piping through encoders, or removing the hook entry from the harness settings file for the next session. Pair gate with a harness-level Bash allowlist to close the residual gap.
Commands not in tools:. The AI can invoke them freely; their output is never inspected.
Non-JSON tool output. Plain text, CSV, and other formats pass through unchanged. Configure tools to emit JSON.
Encoded or obfuscated PII. Base64-encoded emails, URL-encoded values, or deliberately spaced strings (a l i c e @ e x a m p l e . c o m) are not detected.
Non-US PII by value alone. The built-in SSN regex requires dashes. AU/NZ phone numbers are caught by value — mobile (04XX/02X local, +61 4XX/+64 2X international) and landline (0[2378]/0[34679] local, +61 [2378]/+64 [34679] international) — including the common +610/+640 stray-leading-zero variant and arbitrary whitespace in the number. International-prefix numbers (+61/+64) auto-redact regardless of column name; local-format numbers require a PII-named column. Other AU/NZ identifiers are also covered at the value layer: ABN (mod-89 checksum), Medicare (mod-10 checksum), formatted TFN and IRD numbers (mod-11, separators required), NZ NHI (alpha-prefix regex), and NZ bank account numbers. Bare/unformatted TFN and IRD strings without separators are not detected by value alone — column-name matching remains the safety net for those. Other non-AU/NZ formats rely solely on column-name matching — extend pii.column_names or pii.patterns for your region.
PII already in the model's context from prior turns, system prompts, file reads, or earlier summarisation. Gate filters what goes into the model from configured tools; what's already there stays there.
Tool-side network exfiltration. If a configured tool sends data to an external service directly (rather than returning it via stdout), gate never sees it.
Write operations. INSERT, UPDATE, DELETE are not inspected or blocked.
Credential exposure. Gate holds no credentials; that is the responsibility of the underlying tool. Prefer toolkit commands or MCP servers over raw clients that take credentials on the CLI.
For a stronger boundary, combine gate with harness-level tool restrictions and database-level read-only roles. See THREAT-MODEL.md for the full attacker model and known bypasses.
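To see why format matters for value-level detection, consider a matcher that accepts only dashed SSNs. The function below is a hand-written equivalent of the pattern `\d{3}-\d{2}-\d{4}` applied to a whole string; gate's actual regex is not shown in this README, so treat both the pattern and the name `looks_like_dashed_ssn` as assumptions.

```rust
/// True only for strings shaped exactly like `123-45-6789`.
pub fn looks_like_dashed_ssn(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() == 11
        && bytes.iter().enumerate().all(|(i, &c)| match i {
            3 | 6 => c == b'-',        // separators must be dashes
            _ => c.is_ascii_digit(),   // every other position is a digit
        })
}
```

A matcher like this accepts `123-45-6789` but rejects `123456789` and `123 45 6789`, so undashed values are only caught if the column name itself is flagged as PII.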
Supported query tools
Any command that returns JSON can be configured as a gate target — database clients, internal API calls via curl, or any other tool your AI agent uses to fetch data. The AI sees the same structured response it always did, with PII values replaced in-place.
| Command | Type | Notes |
|---|---|---|
| tkpsql | PostgreSQL (toolkit-managed) | sql_arg: "--sql" |
| tkmsql | MS SQL Server (toolkit-managed) | sql_arg: "--sql" |
| tkdbr | Databricks (toolkit-managed) | sql_arg: "--sql" |
| databricks | Databricks CLI (native) | sql_arg: "--json", json_sql_path: "statement" |
| curl | HTTP data sources | pipe: "jq -c ." |
| psql, mysql, mariadb | Raw DB clients | Not enabled by default; see Raw database clients |
Prefer toolkit commands or MCP servers over raw clients: raw clients typically require credentials on the command line, which lands in the agent's transcript, shell history, and process listing. Toolkit commands (tk*) inject credentials from a secrets store; MCP servers hide the connection string entirely. gate works with any JSON-return
[truncated for AI cost control]