ContextWall – Context firewall for AI agents and RAG pipelines
ContextWall is an open-source context firewall that intercepts and scans documents before they enter an AI model's context window, preventing prompt injection, credential leaks, and PII exfiltration. It requires no code changes to agents, runs in your infrastructure, and offers three detection layers with source trust tiers.
Free early access · Apache 2.0 open source
Your AI agent reads untrusted content.
Every web result, document, and API response your agent retrieves goes straight into the model's context window - unscreened. ContextWall intercepts it first, blocks prompt injection and credential leaks, and enforces your security policy before the LLM ever sees it.
✕ Prompt injection · ✕ Poisoned RAG · ✕ Credential leaks · ✕ PII exfiltration
No code changes to your agents · Runs in your infrastructure · No LLM in the screening path
Real production incidents, not theoretical threats
Your agent trusts everything it reads
LLMs have no built-in concept of source trust. Content retrieved from a web search and content from your system prompt look identical once they are both inside the context window. Attackers exploit this directly.
CVE-2025-32711
EchoLeak
Microsoft 365 Copilot
9.3 Critical
An attacker sends a crafted email. Copilot reads it, interprets embedded instructions as commands, silently accesses internal SharePoint files, and sends them to the attacker. The user never clicks anything.
WHY IT WORKED
Copilot had no way to distinguish a trusted system instruction from untrusted email content. Both looked the same inside the context window.
USENIX Security 2025
PoisonedRAG
RAG pipelines
90%+ manipulation rate
Researchers planted five adversarial documents into a knowledge base of millions. When users asked questions, the model retrieved and repeated the false content as confident fact, with no jailbreak, no system prompt change, and no model access needed.
WHY IT WORKED
The RAG pipeline retrieved documents by relevance score and passed them straight to the model. There was no check on where the document came from or whether it should be trusted.
Both attacks exploited the same gap: no trust boundary at the context layer. ContextWall fixes this by tagging every context source with a trust tier and applying your policy rules before content reaches the model.
Who it's for
ContextWall is built for teams shipping AI into production who need security guarantees - not just guidelines.
AI & Agent Engineers
You're shipping RAG pipelines and agentic systems that pull from the web, internal docs, and third-party APIs. Every retrieved document is a potential attack vector - and your agent has no way to tell a legitimate source from a poisoned one.
How ContextWall helps
One pip install or Docker image - no changes to your agent code
Screens every document before it enters the prompt
Blocks injections and credential leaks before the LLM sees them
Security Teams
AI systems bypass your existing perimeter controls. Agents make outbound calls, ingest untrusted content, and operate with broad permissions - all outside your traditional detection stack.
How ContextWall helps
Enforceable policy rules per source, team, and repo
Real-time enforcement feed and tamper-evident audit log
Fleet-wide visibility across all deployed agents
Compliance & Legal
HIPAA, SOC 2, and FedRAMP auditors are asking how you ensure PHI can't leak through an AI agent's context window. You need evidence - not assurances.
How ContextWall helps
Every enforcement decision mapped to a compliance control ID
Cryptographically signed audit exports on demand
Documented data residency: context never leaves your infrastructure
What ContextWall stops - and what it doesn't
Detection at the context layer. No LLM in the screening path. We're honest about the scope.
Detected & blocked
Direct instruction override (L1 + L2)
"IGNORE ALL PREVIOUS INSTRUCTIONS…"
Bidi & zero-width obfuscation (L1)
RTL override chars hidden in retrieved text
Spaced-letter injection (L1)
"i g n o r e p r e v i o u s"
Semantic paraphrase injection (L3)
"Your assignment has been superseded…"
Credential leakage (L2)
AWS keys, GitHub PATs, bearer tokens
PII exfiltration via context (L2)
Emails, SSNs in untrusted-tier documents
L1 = Structural scan · L2 = Normalized regex · L3 = Heuristic scoring
Out of scope
Model hallucinations
ContextWall filters what enters the context window - it cannot control what the model generates from clean inputs.
System prompt mistakes
If your system prompt grants excessive permissions, ContextWall cannot override that design decision.
Training-time poisoning
Attacks on model weights or fine-tuning data happen before inference. ContextWall operates at inference time only.
Novel zero-day patterns
L3 heuristics catch known semantic paraphrases. A sufficiently novel attack may score below the block threshold - you set that threshold.
Authorized access you've allowed
If your policy permits a source and the model uses that data, ContextWall enforces your policy - not a stricter one.
Honest scope beats false assurances. Defense in depth means ContextWall works alongside your model provider's safety filters, not instead of them.
How it works
ContextWall intercepts every document before it enters the context window. Here's exactly what happens.
1
Your agent requests a document
A web search result, internal doc, API response, or user upload - any external content.
source_id: brave-web-search
trust_tier: untrusted
2
ContextWall intercepts it
The daemon receives the document before it enters the context window. The LLM hasn't seen anything yet.
3
Three detection layers run in sequence
L1 Structural: scan raw bytes for bidi, zero-width, spaced letters → clean
L2 Pattern: regex against normalized text - secrets, PII, injection syntax → clean
L3 Heuristic: score semantic intent - catches paraphrase injection → score: 0.91
4
Policy decision
source_tier: untrusted
l3_score: 0.91 > threshold 0.55
→ action: deny
BLOCKED
HTTP 400 returned to your agent. Document never reaches the LLM. Event written to the tamper-evident audit log.
ALLOWED
Document forwarded to the LLM API. Clean context enters the prompt as normal.
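The policy decision in step 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ContextWall's actual implementation: `ScanResult`, `decide`, and the per-tier threshold table are hypothetical names. Only the 0.55 threshold for the untrusted tier and the 0.91 score come from the walkthrough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanResult:
    l1_clean: bool   # Layer 1: no structural obfuscation found
    l2_clean: bool   # Layer 2: no secret, PII, or injection pattern matched
    l3_score: float  # Layer 3: heuristic intent score in [0, 1]


# 0.55 for "untrusted" is taken from the walkthrough; 0.80 is a made-up value.
L3_THRESHOLDS = {"untrusted": 0.55, "internal": 0.80}


def decide(source_tier: str, scan: ScanResult,
           thresholds: dict[str, float] = L3_THRESHOLDS) -> str:
    """Return "deny" or "allow" for one screened document."""
    # Any structural or pattern hit blocks the document outright.
    if not (scan.l1_clean and scan.l2_clean):
        return "deny"
    # Unknown tiers fall back to the strictest (lowest) threshold.
    threshold = thresholds.get(source_tier, min(thresholds.values()))
    return "deny" if scan.l3_score > threshold else "allow"


print(decide("untrusted", ScanResult(True, True, 0.91)))  # deny
```

Falling back to the strictest threshold for an undeclared tier is a design choice made for this sketch: an unknown source is treated as the least trusted one.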
No LLM in the screening path. No external calls. Your data stays on your host.
Source trust tiers
You declare what each context source is. ContextWall applies the right level of scrutiny automatically based on that tier.
internal
Internal
Your code repos, internal wikis
external
External
Vendor docs, partner APIs
untrusted
Untrusted
Public web, user-submitted input
regulated
Regulated
FHIR APIs, PHI data sources
Three detection layers
Applied in order from cheapest to most thorough. No external calls, no LLM inference.
Layer 1: Structural
Cheapest
Scans raw bytes for known obfuscation tricks: bidirectional control characters, zero-width characters, and spaced-letter keywords ("i g n o r e a l l"). These are invisible or easy to miss for a human reviewer but still readable by the model.
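A minimal sketch of what a structural scan like this could look like. It is not ContextWall's code: the function name, the finding labels, and the exact character sets are assumptions, and it works on decoded text rather than raw bytes to stay short.

```python
import re

# Bidirectional control characters (embeddings, overrides, isolates, marks).
BIDI_CONTROLS = set(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069\u200e\u200f"
)
# Zero-width characters commonly used to hide or split keywords.
ZERO_WIDTH = set("\u200b\u200c\u200d\u2060\ufeff")
# Four or more single letters separated by whitespace, e.g. "i g n o r e".
SPACED_LETTERS = re.compile(r"\b(?:[A-Za-z]\s+){3,}[A-Za-z]\b")


def structural_findings(text: str) -> list[str]:
    """Return the names of the structural tricks found in the text."""
    findings = []
    if any(ch in BIDI_CONTROLS for ch in text):
        findings.append("bidi_control")
    if any(ch in ZERO_WIDTH for ch in text):
        findings.append("zero_width")
    if SPACED_LETTERS.search(text):
        findings.append("spaced_letters")
    return findings
```

The spaced-letter pattern is deliberately crude and would also flag harmless runs such as "a b c d", so a real scanner would likely pair it with a keyword check.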
Layer 2: Pattern matching
Fast
Runs regex patterns against normalized text. Catches injection syntax, exposed API keys (AWS, GitHub, Anthropic), bearer tokens, and PII like emails, phone numbers, and SSNs.
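As a rough illustration of regex matching over normalized text, the sketch below folds Unicode compatibility forms, strips invisible format characters, and then applies a handful of patterns. The pattern set, names, and normalization steps are assumptions chosen for the example and are not ContextWall's rule set. The key formats follow the publicly documented shapes of AWS access key IDs and classic GitHub personal access tokens.

```python
import re
import unicodedata

PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "instruction_override": re.compile(
        r"ignore\s+(?:all\s+)?previous\s+instructions", re.IGNORECASE
    ),
}


def normalize(text: str) -> str:
    """Fold lookalike forms and drop invisible format characters."""
    # NFKC maps compatibility forms (e.g. fullwidth letters) to plain ones.
    folded = unicodedata.normalize("NFKC", text)
    # Category "Cf" covers zero-width and bidirectional control characters.
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


def pattern_findings(text: str) -> list[str]:
    """Return the names of all patterns that match the normalized text."""
    normalized = normalize(text)
    return [name for name, rx in PATTERNS.items() if rx.search(normalized)]
```

Normalizing first is what lets a plain pattern catch a keyword that was split with a zero-width character or written in fullwidth letters.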
Layer 3: Heuristic scoring
Most thorough
Scores each message for instruction-like intent, even when the wording avoids obvious keywords. Catches paraphrases like "your previous assignment has been superseded" that bypass regex entirely.
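One simple way to score instruction-like intent without an LLM is to combine weighted cues. The sketch below is only an assumption about how such a heuristic could work: the cue list, the weights, and the noisy-OR combination are invented for illustration and say nothing about ContextWall's real scoring.

```python
import re

# Hypothetical cues and weights, chosen only for illustration.
CUES = [
    (re.compile(
        r"\byour\s+(?:previous|prior|original)\s+"
        r"(?:assignment|task|instructions?|role)\b", re.IGNORECASE), 0.4),
    (re.compile(
        r"\b(?:superseded|overridden|no longer (?:applies|valid))\b",
        re.IGNORECASE), 0.6),
    (re.compile(
        r"\b(?:from now on|instead,? you (?:must|should|will))\b",
        re.IGNORECASE), 0.5),
]


def intent_score(text: str) -> float:
    """Combine matching cues with a noisy-OR: 1 - product of (1 - weight)."""
    remaining = 1.0
    for pattern, weight in CUES:
        if pattern.search(text):
            remaining *= 1.0 - weight
    return 1.0 - remaining
```

With these made-up weights, "Your previous assignment has been superseded" matches two cues and scores above a 0.55 threshold, while text with no cues scores 0.0.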
Data never leaves your infrastructure
Your context stays yours
ContextWall runs as a daemon inside your own infrastructure. Prompts, documents, and file contents are screened locally and never transmitted anywhere. The cloud control plane receives only counts and scores, never content.
Your Infrastructure
Your AI agent
ContextWall daemon
Your LLM API calls
All screening happens here. Nothing exits.
metadata only
What crosses the boundary
request counts
violation types
latency (ms)
session count
Never transmitted
prompt content
file contents
user data
PII / PHI
ContextWall Cloud
Fleet dashboard
Policy authoring
Compliance reports
Sees counts and scores only. Never content.
Stays in your infrastructure, always
Prompt content and user messages
Retrieved documents and file contents
Source URLs and file paths
Model responses and completions
Personally identifiable information
Protected health information (PHI)
Sent to control plane, metadata only
Request counts (blocked / allowed)
Violation types detected (e.g. "pii")
Average latency in milliseconds
Active session count
Policy version acknowledgement
Prefer fully offline? Leave control_plane.url empty and ContextWall runs entirely local with no cloud dependency.
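To make the metadata-only boundary concrete, here is a hedged sketch of what a control-plane report could look like as a Python type. The class and field names are hypothetical. The fields themselves mirror the list above: request counts, violation types, average latency, session count, and policy version. There is no field that could carry prompt or document content.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class MetadataReport:
    blocked_count: int
    allowed_count: int
    # Violation type -> number of occurrences, e.g. {"pii": 2}.
    violation_types: dict[str, int] = field(default_factory=dict)
    avg_latency_ms: float = 0.0
    active_sessions: int = 0
    policy_version: str = ""


def to_wire(report: MetadataReport) -> str:
    """Serialize the report; only the declared metadata fields are emitted."""
    return json.dumps(asdict(report), sort_keys=True)


report = MetadataReport(3, 97, {"pii": 2, "prompt_injection": 1}, 4.2, 12, "v7")
print(to_wire(report))
```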
Integrate in minutes
The daemon installs with pip and proxies your AI SDK calls locally. The cloud dashboard is optional and sees only aggregated metadata, never content.
Quick start
1. Install and start the daemon (runs in your infrastructure)
pip install contextwall
ctxfw start --config ctxfw.yaml
2. Declare your context sources in ctxfw.yaml
sources:
  - id: web-search
    type: web
    trust_tier: untrusted
  - id: internal-docs
    type: confluence
    trust_tier: internal
3. Point your AI SDK at the local daemon
export ANTHROPIC_BASE_URL=http://localhost:8080/proxy/anthropic
export ANTHROPIC_API_KEY=sk-ant-your-key  # unchanged
Every call your agent makes is now screened locally.
Prompts and responses are never sent to ContextWall's cloud.
Works with the Anthropic and OpenAI SDKs in any language. The daemon proxies requests, screens context, then forwards clean content to the real API. It never stores your prompts and never sends them to the ContextWall control plane.
Security policy as config
Everything is declared in YAML. Sources, rules, and thresholds all live in a file you commit to your repo, review in a pull request, and deploy alongside your other infrastructure config.
Sources declared in config at startup, no API calls or setup scripts
Four-layer policy: fleet-wide rules down to individual repo overrides
Starter policy templates for HIPAA, SOC 2, and FedRAMP included
Rules reload within 5 seconds of a file change, no restart needed
Every rule can map to a compliance control ID for audit evidence
ctxfw.yaml
# Declare your context sources here. ContextWall registers them
# on startup automatically. No API calls, no scripts to run.
sources:
  - id: brave-web-search
    type: web
    trust_tier: untrusted
  - id: internal-confluence
    type: confluence
    trust_tier: internal
    data_classification: sensitive
  - id: fhir-api
    type: api
    trust_tier: regulated
    data_classification: phi
    owner: clinical-data-team
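Because sources live in a file you review in pull requests, a small pre-merge check can catch typos before the daemon loads them. The sketch below is an assumption, not a ContextWall tool: `validate_sources` is a hypothetical helper that takes the already-parsed `sources` list and checks it against the four trust tiers described on this page.

```python
VALID_TIERS = {"internal", "external", "untrusted", "regulated"}


def validate_sources(sources: list[dict]) -> list[str]:
    """Return human-readable errors for a parsed `sources` list."""
    errors = []
    seen_ids = set()
    for i, src in enumerate(sources):
        source_id = src.get("id")
        if not source_id:
            errors.append(f"sources[{i}]: missing id")
        elif source_id in seen_ids:
            errors.append(f"sources[{i}]: duplicate id {source_id!r}")
        else:
            seen_ids.add(source_id)
        tier = src.get("trust_tier")
        if tier not in VALID_TIERS:
            errors.append(f"sources[{i}]: unknown trust_tier {tier!r}")
    return errors


print(validate_sources([
    {"id": "brave-web-search", "type": "web", "trust_tier": "untrusted"},
    {"id": "fhir-api", "type": "api", "trust_tier": "regulated"},
]))  # []
```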
Designed with compliance in mind
Compliance coverage
ContextWall is designed so that compliance is a property of the architecture, not a checklist you complete afterwards. PHI, PII, and sensitive data are handled locally before they can ever be exposed.
HIPAA
PHI never leaves your network
Protected health information is screened locally. It never transits a third-party server.
Regulated source tier enforces that PHI can only flow between approved internal systems
Every enforcement decision is logged with a timestamp, source ID, and outcome for auditor review
Violation events are logged with timestamps, source IDs, and policy decisions for auditor review
SOC 2
Audit trail your reviewers can verify
Every context screening event is logged with source ID, trust tier, decision, and timestamp
Provenance chain is cryptographically linked; records cannot be altered without detection
Role-based access to the fleet dashboard; no raw context is stored or accessible anywhere
Policy rules are version-controlled; changes leave a full audit trail
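A cryptographically linked record chain is commonly built as a hash chain, where each record stores the hash of the one before it. The sketch below shows that general technique with SHA-256. It is an assumption about the approach, not ContextWall's log format, and the function and field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event linked to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this only shows tampering if the latest hash is also stored or signed somewhere the log writer cannot rewrite, which is where signed audit exports would come in.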
GDPR
Personal data stays in your jurisdiction by design
PII (email addresses, phone numbers, names) is detected and redacted before reaching the model
The daemon processes all data inside your own infrastructure, with no cross-border data transfer
Control plane receives only aggregated counts, not personal data
Supports data minimisation by design: the model sees the least data necessary to complete the task
Offline deployments
Fully air-gapped. No external dependencies.
Daemon runs entirely within your
[truncated for AI cost control]