2026-06-04 07:28 UTC · In-site rewrite · 6 min read · Updated: 2026-06-30 13:03 UTC

Gate – deterministic PII redaction for AI agent tool output (Rust)

Gate is a deterministic PII redaction tool written in Rust for AI agents. It uses regex, column heuristics, and Luhn checks instead of LLMs. It intercepts Bash commands and MCP tool calls via hooks, supports multiple harnesses, and provides scanning, real-time redaction, and aggregated reporting while keeping data local.

Source: Hacker News AI · Author: gzhuuu

Key points

Gate uses deterministic methods (regex, column heuristics, Luhn) for PII redaction, not LLMs, ensuring consistent results and low latency.
It intercepts agent Bash commands and MCP tool calls via hooks, automatically rewriting commands to add a redaction layer.
It supports six agent harnesses: Claude Code, OpenCode, Cursor, GitHub Copilot CLI, Codex CLI, and Gemini CLI.
It provides schema scanning, real-time redaction, optional hashed placeholders (for joins and deduplication), and aggregated audit reports.

Most PII guardrails for AI agents are themselves LLMs — they send your data to a model to decide whether it's sensitive. Gate takes the opposite approach.

| | gate | LLM-based redaction |
| --- | --- | --- |
| Decision method | Regex + column heuristics + Luhn | Model inference |
| Deterministic | ✅ Same input always produces the same output | ❌ Varies by run and model version |
| Data stays local | ✅ Never leaves your machine | ❌ Sent to a model API for classification |

Latency ✅

-h -d -c "SELECT TABLE_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'public' ORDER BY TABLE_NAME, ORDINAL_POSITION" | gate scan

See docs/scan.md for queries against MySQL, MS SQL Server (including native sqlcmd), Databricks, and toolkit-managed clients.

Risk level is weighted by category sensitivity — one SSN column matters more than twenty address columns. Exits with code 1 if any PII columns are found (scriptable in CI). Pass --verbose to show all detected columns, or --json for machine-readable output.
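Because gate scan reports findings through its exit code, a CI step can branch on it directly. The sketch below is one possible Python wrapper, not part of gate: the helper name schema_has_pii and the injectable run parameter are illustrative, and treating any exit code other than 0 or 1 as a failure is an assumption, since only code 1 is documented above.

```python
import subprocess


def schema_has_pii(schema_listing: str, run=subprocess.run) -> bool:
    """Pipe a column listing into `gate scan` and interpret the exit code.

    `schema_listing` is whatever your database client printed for the
    INFORMATION_SCHEMA query. `run` is injectable so the logic can be
    exercised without gate installed.
    """
    proc = run(
        ["gate", "scan", "--json"],
        input=schema_listing,
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return False  # assumed: 0 means no PII columns were found
    if proc.returncode == 1:
        return True  # documented: 1 means PII columns were found
    # Assumption: any other code means the scan itself failed.
    raise RuntimeError(f"gate scan failed with exit code {proc.returncode}")
```

A pipeline step could call schema_has_pii with the captured query output and fail the build when it returns True.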

| Sensitivity | Categories | Risk floor |
| --- | --- | --- |
| Critical | Government IDs, Health & medical, Financial, Biometric | HIGH always; CRITICAL if ≥3 columns or >10% of schema |
| Elevated | Contact, Names, Date of birth, Location of birth, Family & relationships, Employment | HIGH if >5% of schema; CRITICAL if >25% |
| Standard | Address & location, Online & technical, Demographics | HIGH if >25% of schema |
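The risk floors above can be read as a small decision function. The following is a sketch of one reading, not gate's actual scoring code: it assumes "% of schema" means the share of all scanned columns that fall in a tier, that "HIGH always" applies once at least one critical column is found, and that no floor applies otherwise (returned here as None). The function name risk_floor is illustrative.

```python
from typing import Optional

# Ordering used to pick the strongest floor when several tiers trigger.
_RANK = {"HIGH": 1, "CRITICAL": 2}


def risk_floor(critical: int, elevated: int, standard: int, total: int) -> Optional[str]:
    """Return the highest risk floor implied by per-tier PII column counts.

    `total` is the number of columns in the scanned schema (assumed meaning
    of "schema" in the table). Returns None when no floor is triggered.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    floors = []
    if critical > 0:
        # Critical tier: HIGH always; CRITICAL at >=3 columns or >10% of schema.
        is_critical = critical >= 3 or critical / total > 0.10
        floors.append("CRITICAL" if is_critical else "HIGH")
    if elevated / total > 0.25:
        floors.append("CRITICAL")
    elif elevated / total > 0.05:
        floors.append("HIGH")
    if standard / total > 0.25:
        floors.append("HIGH")
    return max(floors, key=_RANK.get, default=None)
```

Under these assumptions, risk_floor(1, 0, 0, 100) yields "HIGH" while risk_floor(0, 0, 20, 100) triggers no floor at all, which matches the point that one SSN column outweighs twenty address columns.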

Note: gate scan detects PII by column name only. A LOW result means your column names look clean — it does not mean the data is safe. Gate 2 additionally inspects values at query time, catching PII in free-text, JSON, and ambiguously-named columns that scan cannot see. In multi-row results, if any value in a column matches a PII pattern, the entire column is promoted and all rows are redacted — not just the matching row.
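Column promotion, as described in the note above, can be sketched in two passes: first find every column where any value matches a pattern, then redact that column in all rows. This is an illustration under assumptions, not gate's implementation: the email regex is a simplified stand-in for the built-in patterns, and the function name promote_and_redact is hypothetical.

```python
import re

# Simplified stand-in pattern; gate's built-in patterns are not shown in this document.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def promote_and_redact(rows, patterns=PATTERNS):
    """Redact a whole column in every row if any single value in it matches."""
    promoted = {}  # column name -> PII type that triggered promotion
    for row in rows:
        for column, value in row.items():
            if column in promoted or not isinstance(value, str):
                continue
            for pii_type, pattern in patterns.items():
                if pattern.search(value):
                    promoted[column] = pii_type
                    break
    return [
        {
            column: f"[PII:{promoted[column]}]" if column in promoted else value
            for column, value in row.items()
        }
        for row in rows
    ]
```

With two rows whose note column holds "call me later" and "alice@example.com", both notes come back as [PII:email], even though only the second value matched.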

For false positives (e.g. city in a products table), run gate scan --review to triage interactively and add columns to the allowlist. Allowlisted columns skip all redaction — both name-based and value-based. Only add a column to the allowlist when you are certain it contains no PII. Low-confidence pattern matches (below confidence_threshold) are redacted and flagged with a warning in _gate_summary; add the column to column_allowlist to suppress. Manage the list directly with gate allowlist add/remove/list.
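The interaction between the allowlist and the confidence threshold can be summarised as a per-column decision. The sketch below is an assumption-laden illustration: the function name decide, the ("pass" / "redact") return values, and the wording of the warning are hypothetical; only the rules themselves (allowlisted columns skip all redaction, low-confidence matches are still redacted but flagged) come from the paragraph above.

```python
from typing import Optional, Tuple


def decide(
    column: str,
    confidence: Optional[float],
    allowlist,
    threshold: float,
) -> Tuple[str, Optional[str]]:
    """Return (action, warning) for one column.

    `confidence` is None when nothing in the column matched a pattern.
    """
    if column in allowlist:
        return ("pass", None)  # allowlisted: skips name- and value-based redaction
    if confidence is None:
        return ("pass", None)  # no match, nothing to redact
    if confidence < threshold:
        # Low-confidence matches are still redacted, but flagged for review.
        return ("redact", f"low-confidence match in column '{column}'")
    return ("redact", None)
```

The design keeps the failure mode safe: an uncertain match costs a warning rather than a leak, and suppressing it requires an explicit allowlist entry.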

Quickstart

Install gate

Homebrew — macOS and Linux (recommended)

brew tap GaaraZhu/gate && brew install gate

cargo binstall — downloads a prebuilt binary

cargo binstall gate

Or grab a binary from the releases page

https://github.com/GaaraZhu/gate/releases

Create your config (opens ~/.config/gate/config.yaml in your editor):

gate config

Claude Code (default)

gate init

OpenCode

gate init --harness opencode

Cursor

gate init --harness cursor

GitHub Copilot CLI (project-scoped, run from repo root)

gate init --harness copilot-cli

Codex CLI

gate init --harness codex

Gemini CLI

gate init --harness gemini

Add --scope project for project-only setup. Restart your OpenCode, Cursor, or Gemini CLI session after gate init to load the hook. For Codex CLI, restart the session, then review the hook in the Trust & Permissions UI, mark it as trusted, and enable it. For Copilot CLI, the generated .github/hooks/PreToolUse.json is gitignored by default — each developer runs gate init --harness copilot-cli once in their local clone.

(Optional) Register MCP server proxies so tools/call responses also pass through gate:

Claude Code (default) — dry-run, shows what would change

gate init --wrap-mcp

OpenCode

gate init --harness opencode --wrap-mcp --yes

Cursor

gate init --harness cursor --wrap-mcp --yes

Copilot CLI

gate init --harness copilot-cli --wrap-mcp --yes

Codex CLI

gate init --harness codex --wrap-mcp --yes

Gemini CLI

gate init --harness gemini --wrap-mcp --yes

Add --scope project for project-level MCP config. For Cursor project-scoped MCP, re-enable the servers in Settings → Tools & MCPs after registration. See docs/mcp.md for --servers, per-harness paths, and manual single-server registration.

Start your AI session — gate intercepts query commands automatically. No changes to your prompts or tools required.

Run gate validate to confirm your config is valid before the first session.

How it works

gate covers two access paths agents use to reach data. The blog post has the full walkthrough; the short version:

Bash tooling path

Every Bash command passes through gate hook first. Commands that match a configured tool are silently rewritten to gate run -- followed by the original command; gate run spawns the subprocess and pipes its stdout through the two-gate detection pipeline. The rewrite happens in the harness's pre-tool-execution hook. That hook is enforcing in Claude Code, OpenCode, Cursor, GitHub Copilot CLI, Codex CLI, and Gemini CLI, so the agent cannot bypass it. Humans and CI scripts running outside the harness are untouched.
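The rewrite step can be pictured as a pure function from the agent's command to the command that actually runs. This is a minimal sketch, assuming a command "matches a configured tool" when its first token is in the configured set; gate's real matching rules (pipelines, environment prefixes, subshells) are not described here, and the function name rewrite is illustrative.

```python
import shlex


def rewrite(command: str, tools: set) -> str:
    """Prefix a command with `gate run --` when its program is a configured tool."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return command  # unbalanced quotes: leave the command untouched in this sketch
    if argv and argv[0] in tools:
        return f"gate run -- {command}"
    return command
```

For example, with tools = {"tkpsql"}, the command tkpsql query --sql "SELECT * FROM users" gains the gate run -- prefix, while ls -la is returned unchanged.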

AI asks to run: tkpsql query --sql "SELECT * FROM users"
               │
harness hook fires (PreToolUse / tool.execute.before)
               │
gate hook rewrites to: gate run -- tkpsql query --sql "..."
               │
┌──────────────┴──────────────┐
│ Gate 1: SQL inspection      │  SELECT * → no column hints, defer to Gate 2
│ Gate 2: Value scanning      │  regex + column-name heuristics + Luhn check
└──────────────┬──────────────┘
               │
{"id": 1, "full_name": "[PII:name]", "email": "[PII:email]", ..., "_gate_summary": {...}}
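The Luhn check named in Gate 2 is a standard checksum used to separate plausible card numbers from arbitrary digit runs. The sketch below shows the standard algorithm; how gate combines it with its regexes (length limits, issuer prefixes) is not specified in this document, and the function name luhn_valid is illustrative.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces and dashes are ignored so formatted card numbers can be checked.
    """
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Because the check is pure arithmetic, the same input always produces the same verdict, which is the property the comparison with LLM-based redaction relies on.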

MCP path

gate mcp is a transparent stdio proxy registered in the harness as the MCP server. It forwards all JSON-RPC traffic verbatim except tools/call responses, which pass through Gate 2 before reaching the model. No changes to the upstream server are required.

Note: only tools/call responses are redacted — resources/read, prompts/get, and other MCP message types are forwarded without inspection.
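One detail worth making explicit: a JSON-RPC response carries only an id, not the method that produced it, so a proxy has to remember which request ids were tools/call in order to redact only those responses. The sketch below illustrates that bookkeeping under assumptions; the class name ToolCallFilter and the injected redact callable are hypothetical, and gate mcp's real implementation may differ.

```python
import json


class ToolCallFilter:
    """Forward JSON-RPC traffic verbatim except results of tools/call requests."""

    def __init__(self, redact):
        self._redact = redact  # callable applied to the `result` payload
        self._pending = set()  # ids of in-flight tools/call requests

    def on_request(self, raw: str) -> str:
        msg = json.loads(raw)
        if isinstance(msg, dict) and msg.get("method") == "tools/call" and "id" in msg:
            self._pending.add(msg["id"])
        return raw  # requests are always forwarded unchanged

    def on_response(self, raw: str) -> str:
        msg = json.loads(raw)
        if not isinstance(msg, dict):
            return raw
        msg_id = msg.get("id")
        if msg_id not in self._pending:
            return raw  # resources/read, prompts/get, etc. are not inspected
        self._pending.discard(msg_id)
        if "result" not in msg:
            return raw  # error responses pass through unchanged
        msg["result"] = self._redact(msg["result"])
        return json.dumps(msg)
```

In use, the proxy would call on_request for each message from the harness and on_response for each message from the upstream server.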

AI ──tools/call──> gate mcp ──forward──> upstream MCP server

Redacted values are replaced with [PII:<type>] placeholders such as [PII:email]. A _gate_summary field is appended reporting what was redacted.

{ "rows": [{"id": 1, "email": "[PII:email]", "ssn": "[PII:ssn]"}], "count": 1, "_gate_summary": {"redacted": 2, "types": ["email", "ssn"], "warnings": []} }

With hash_values: true in config, each placeholder gains an 8-char hex suffix derived from the original value ([PII:email:7f83b165]). The same raw value always produces the same suffix, so the AI can join or deduplicate across rows without ever seeing the underlying data. Error responses from the underlying tool pass through unchanged.
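The hashed-placeholder idea can be sketched as a deterministic digest truncated to 8 hex characters. The choice of unsalted SHA-256 below is an assumption made for illustration; the document does not say which hash gate uses or whether it is keyed, and the function name placeholder is hypothetical.

```python
import hashlib


def placeholder(pii_type: str, value: str, hash_values: bool = False) -> str:
    """Build a redaction placeholder, optionally with a stable 8-char hex suffix."""
    if not hash_values:
        return f"[PII:{pii_type}]"
    # Assumed hash: SHA-256 over the UTF-8 bytes, truncated to 8 hex characters.
    suffix = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"[PII:{pii_type}:{suffix}]"
```

Two rows holding the same email produce the same placeholder, so an agent can group or join on it. A keyed hash such as HMAC would make dictionary guessing of common values harder; whether gate does this is not stated here.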

Protection retrospective

_gate_summary reports a single response. gate retro aggregates across all of them — total queries seen, PII fields redacted, hit rate, plus a breakdown by tool and PII category. Useful for periodic audits and for confirming the boundary is doing real work.

If any query produced a low-confidence redaction, gate retro surfaces a Low-confidence redactions section listing each unique warned column and the exact gate allowlist add command to suppress it. Once a column is added to the allowlist it disappears from this section automatically.

Stats are collected by default and written to a local JSONL log on disk — they never leave your machine. Disable with stats.enabled: false in config.
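The aggregation that gate retro performs can be approximated over any JSONL log with one record per query. The sketch below assumes a hypothetical record shape with tool, redacted, and types fields, and defines hit rate as the share of queries with at least one redaction; the actual log schema and metric definitions are not given in this document.

```python
import json
from collections import Counter


def summarize(log_lines):
    """Aggregate per-query records into totals and breakdowns."""
    total = hits = fields = 0
    by_tool = Counter()  # redacted fields per tool
    by_type = Counter()  # number of queries in which each PII type appeared
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        total += 1
        redacted = int(record.get("redacted", 0))
        fields += redacted
        if redacted:
            hits += 1
            by_tool[record.get("tool", "unknown")] += redacted
        by_type.update(record.get("types", []))
    return {
        "queries": total,
        "fields_redacted": fields,
        "hit_rate": hits / total if total else 0.0,
        "by_tool": dict(by_tool),
        "by_type": dict(by_type),
    }
```

Passing the lines of the log file to summarize gives a dictionary suitable for a periodic audit report.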

What gate does NOT protect against

gate is a deterministic redaction layer, not a sandbox. It assumes the agent is non-adversarial and only inspects output from commands listed under tools: in config. The following are deliberately out of scope:

Adversarial agents / prompt injection. Gate's threat model is an agent that inadvertently exfiltrates PII. gate protect (Unix) blocks the most direct bypass — a hijacked agent disabling gate via config edits — by transferring config ownership to root. But a determined attacker can still route around gate by invoking commands not in tools:, requesting non-JSON output formats, piping through encoders, or removing the hook entry from the harness settings file for the next session. Pair gate with a harness-level Bash allowlist to close the residual gap.

Commands not in tools:. The AI can invoke them freely; their output is never inspected.

Non-JSON tool output. Plain text, CSV, and other formats pass through unchanged. Configure tools to emit JSON.

Encoded or obfuscated PII. Base64-encoded emails, URL-encoded values, or deliberately spaced strings (a l i c e @ e x a m p l e . c o m) are not detected.

Non-US PII by value alone. The built-in SSN regex requires dashes. AU/NZ phone numbers are caught by value — mobile (04XX/02X local, +61 4XX/+64 2X international) and landline (0[2378]/0[34679] local, +61 [2378]/+64 [34679] international) — including the common +610/+640 stray-leading-zero variant and arbitrary whitespace in the number. International-prefix numbers (+61/+64) auto-redact regardless of column name; local-format numbers require a PII-named column. Other AU/NZ identifiers are also covered at the value layer: ABN (mod-89 checksum), Medicare (mod-10 checksum), formatted TFN and IRD numbers (mod-11, separators required), NZ NHI (alpha-prefix regex), and NZ bank account numbers. Bare/unformatted TFN and IRD strings without separators are not detected by value alone — column-name matching remains the safety net for those. Other non-AU/NZ formats rely solely on column-name matching — extend pii.column_names or pii.patterns for your region.

PII already in the model's context from prior turns, system prompts, file reads, or earlier summarisation. Gate filters what goes into the model from configured tools; what's already there stays there.

Tool-side network exfiltration. If a configured tool sends data to an external service directly (rather than returning it via stdout), gate never sees it.

Write operations. INSERT, UPDATE, DELETE are not inspected or blocked.

Credential exposure. Gate holds no credentials; that is the responsibility of the underlying tool. Prefer toolkit commands or MCP servers over raw clients that take credentials on the CLI.

For a stronger boundary, combine gate with harness-level tool restrictions and database-level read-only roles. See THREAT-MODEL.md for the full attacker model and known bypasses.
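Several of the limits listed above follow directly from how pattern and checksum matching work. The sketch below illustrates them with stand-in patterns: the SSN and email regexes are simplified assumptions rather than gate's built-in ones, while the ABN check follows the published mod-89 rule (subtract 1 from the first digit, apply fixed weights, require a sum divisible by 89).

```python
import base64
import re

# Stand-in patterns for illustration only.
SSN_DASHED = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def abn_valid(text: str) -> bool:
    """Validate an 11-digit Australian Business Number with the mod-89 checksum."""
    digits = [int(ch) for ch in text if ch in "0123456789"]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the rule subtracts 1 from the leading digit first
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def evades_email_pattern(value: str) -> bool:
    """True when a string carrying an email is not caught by the plain regex."""
    return EMAIL.search(value) is None


# A dashed SSN matches; the same digits without dashes do not.
dashed_hit = SSN_DASHED.search("123-45-6789") is not None
bare_hit = SSN_DASHED.search("123456789") is not None

# Base64 output contains no "@", so the encoded email slips past the regex.
encoded = base64.b64encode(b"alice@example.com").decode("ascii")
```

Here evades_email_pattern(encoded) and evades_email_pattern("a l i c e @ e x a m p l e . c o m") both return True, which is why encoded or deliberately spaced values need controls outside the redaction layer.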

Supported query tools

Any command that returns JSON can be configured as a gate target — database clients, internal API calls via curl, or any other tool your AI agent uses to fetch data. The AI sees the same structured response it always did, with PII values replaced in-place.

| Command | Type | Notes |
| --- | --- | --- |
| tkpsql | PostgreSQL (toolkit-managed) | sql_arg: "--sql" |
| tkmsql | MS SQL Server (toolkit-managed) | sql_arg: "--sql" |
| tkdbr | Databricks (toolkit-managed) | sql_arg: "--sql" |
| databricks | Databricks CLI (native) | sql_arg: "--json", json_sql_path: "statement" |
| curl | HTTP data sources | pipe: "jq -c ." |
| psql, mysql, mariadb | Raw DB clients | Not enabled by default; see Raw database clients |

Prefer toolkit commands or MCP servers over raw clients: raw clients typically require credentials on the command line, which lands in the agent's transcript, shell history, and process listing. Toolkit commands (tk*) inject credentials from a secrets store; MCP servers hide the connection string entirely. gate works with any JSON-return
