Show HN: AI-native red-team for penetration testing and vulnerability research
Z3r0 is an AI-native red-team framework emphasizing authorization-first, role-governed execution and structured evidence records. It uses Docker sandboxes for controlled execution and supports resumable long-running tasks. The architecture includes specialized agents such as a chief security officer, audit engineer, and others, coordinated for reconnaissance, vulnerability validation, code audit, and more. The system design focuses on operational boundaries and traceability for human review.
Authorization before automation: every workflow assumes an explicit legal scope, controlled targets, and operator accountability before any tool capability is used.
Role-governed red-team execution: the coordinator owns decomposition and synthesis; specialist agents handle reconnaissance, vulnerability validation, code audit, reverse engineering, and cryptographic review within scoped responsibilities.
Structured evidence over transient context: durable WorkProject records persist assets, findings, relationship edges, and attack paths outside model context so evidence remains reviewable after the conversation changes.
Resumable long-running work: notification obligations model background subagent work and sandbox jobs, allowing drivers to stop cleanly and resume only when integration work is ready.
Controlled execution boundary: command execution, browser workflows, file management, GUI tooling, and skills run through bound Docker sandboxes rather than the application host.
Stable contracts: the frontend consumes application REST, WebSocket, timeline, and generated schema contracts instead of model SDK or provider internals.
Architecture
flowchart TB
    Operator["Authorized Red-Team Operator"]
    Workbench["React Red-Team Workbench Presentation Layer"]
    API["FastAPI API API Layer"]
    Runtime["Agent Runtime Orchestration Layer"]
    Drivers["Instance Drivers Async Scheduling Layer"]
    Notifications["Notification Obligations Liveness Layer"]
    Graph["Session Agent Graph Capability Layer"]
    Timeline["Timeline Event Log Replay Layer"]
    Record["WorkProject Evidence Records Review Layer"]
    Evidence["Evidence Chain Assets / Findings / Attack Paths"]
    Sandbox["Docker Sandbox Execution Layer"]
    Tools["Sandbox Tool Surface Tool Layer"]
    Models["Model Providers Model Layer"]
    Events["Event Contract Streaming Layer"]
    Store[("PostgreSQL Store Persistence Layer")]

    Operator --> Workbench
    Workbench -->|REST / WebSocket| API
    API --> Runtime
    Runtime --> Drivers
    Runtime --> Graph
    Runtime --> Record
    Runtime --> Sandbox
    Runtime --> Events
    Runtime --> Store
    Drivers --> Notifications
    Notifications --> Runtime
    Events --> Timeline
    Timeline --> Store
    Graph --> Tools
    Graph --> Models
    Sandbox --> Tools
    Record --> Store
    Record --> Evidence
    Evidence --> Workbench
    Events --> Workbench
The system is organized into explicit layers: user-facing red-team workbench, API boundary, runtime orchestration, resumable instance drivers, notification-backed liveness, session agent graph, controlled execution, model access, streaming event contract, durable timeline replay, and persisted WorkProject evidence records. The backend owns authentication, session lifecycle, context projection, event normalization, delegation, sandbox binding, tool mounting, notification obligations, persistence, project-scoped records, and history compaction. The frontend consumes stable REST and WebSocket contracts and does not depend on model SDK or provider internals.
Value Model
flowchart LR
    Scope["Authorized Red-Team Scope targets, owners, sandbox"] --> Agents["Specialist Agent Team coordinator + experts"]
    Agents --> Tools["Sandboxed Tooling commands, files, GUI, skills"]
    Tools --> Evidence["Evidence Records assets, findings, edges, paths"]
    Evidence --> Review["Human Review workspace, graph, replay"]
    Review --> Continuity["Continuity resume, audit, handoff records"]
This value chain keeps red-team and vulnerability research work operationally bounded. Scope is declared before execution, agents work through explicit tools, tool output is distilled into structured evidence, and reviewers can inspect the resulting graph and timeline without relying on hidden model state.
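The evidence end of this chain can be illustrated with a minimal sketch. It assumes a WorkProject record is a container of assets, findings, and relationship edges, and that attack paths can be derived by walking those edges. Every class, field, and method name below is hypothetical and is not taken from the Z3r0 schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Asset:
    asset_id: str
    kind: str  # hypothetical categories, e.g. "host" or "service"


@dataclass(frozen=True)
class Finding:
    finding_id: str
    asset_id: str
    title: str


@dataclass(frozen=True)
class Edge:
    source_id: str
    target_id: str
    relation: str


@dataclass
class WorkProjectRecord:
    """Hypothetical durable record kept outside model context."""

    assets: Dict[str, Asset] = field(default_factory=dict)
    findings: List[Finding] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_asset(self, asset: Asset) -> None:
        self.assets[asset.asset_id] = asset

    def add_finding(self, finding: Finding) -> None:
        # A finding must point at an asset that is already recorded.
        if finding.asset_id not in self.assets:
            raise KeyError(f"unknown asset: {finding.asset_id}")
        self.findings.append(finding)

    def add_edge(self, edge: Edge) -> None:
        for node in (edge.source_id, edge.target_id):
            if node not in self.assets:
                raise KeyError(f"unknown asset: {node}")
        self.edges.append(edge)

    def attack_paths(self, start: str, goal: str) -> List[List[str]]:
        """Enumerate simple (cycle-free) paths from start to goal."""
        adjacency: Dict[str, List[str]] = {}
        for edge in self.edges:
            adjacency.setdefault(edge.source_id, []).append(edge.target_id)

        paths: List[List[str]] = []

        def walk(node: str, path: List[str]) -> None:
            if node == goal:
                paths.append(path)
                return
            for nxt in adjacency.get(node, []):
                if nxt not in path:  # never revisit a node on the same path
                    walk(nxt, path + [nxt])

        walk(start, [start])
        return paths
```

Because the record is plain data, a reviewer can recompute paths from the stored edges at any time, without access to the conversation that produced them.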
Agent Team
| Code | Name | Role | Responsibility |
| --- | --- | --- | --- |
| cso | Z3r0 | Chief Security Officer | Task decomposition, coordination, result integration |
| cae | V3ra | Chief Audit Engineer | Source code audit, dependency review, remediation verification |
| cie | L1ly | Chief Intelligence Engineer | Reconnaissance, asset discovery, relationship mapping |
| cpe | Fr4nk | Chief Penetration Engineer | Penetration testing, vulnerability validation, impact verification |
| cre | J4m3 | Chief Reverse Engineer | File, binary, firmware, and APK reverse engineering |
| cce | Nu1L | Chief Cryptography Engineer | Protocol review, key management, cryptographic implementation analysis |
flowchart TB
    CSO["cso / Z3r0"]
    CSO --> CAE["cae / V3ra Code Audit"]
    CSO --> CIE["cie / L1ly Reconnaissance"]
    CSO --> CPE["cpe / Fr4nk Validation"]
    CSO --> CRE["cre / J4m3 Reverse"]
    CSO --> CCE["cce / Nu1L Cryptography"]

    CAE --> A1["Knowledge and Sandbox Tools"]
    CIE --> K1["Knowledge and Sandbox Tools"]
    CPE --> S1["Knowledge and Sandbox Tools"]
    CRE --> S2["Knowledge and Sandbox Tools"]
    CCE --> S3["Knowledge and Sandbox Tools"]
Agent capabilities are assembled per session. AgentRegistry uses configuration, role specifications, knowledge generation, the current sandbox binding, and the current WorkProject binding to create a session-level agent graph. Command tools are mounted only when an authorized, running sandbox is bound to the session. WorkProject record tools are mounted only for project sessions, keeping ordinary chat sessions separate from assets, findings, relationship edges, and attack paths.
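The mounting rules in this paragraph can be sketched as a pure function over the session bindings. This is an illustration under stated assumptions, not the AgentRegistry implementation: `SandboxBinding`, `SessionContext`, `mount_tools`, and the `record_*` tool names are hypothetical, while the three command tool names are the ones that appear elsewhere in this README.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SandboxBinding:
    sandbox_id: str
    running: bool
    authorized: bool


@dataclass(frozen=True)
class SessionContext:
    sandbox: Optional[SandboxBinding] = None
    work_project_id: Optional[str] = None


# Command tool names as they appear in this README.
COMMAND_TOOLS = (
    "execute_sync_command",
    "execute_async_command",
    "read_sandbox_command_output",
)
# Hypothetical record tool names, for illustration only.
RECORD_TOOLS = (
    "record_asset",
    "record_finding",
    "record_edge",
    "record_attack_path",
)


def mount_tools(ctx: SessionContext) -> Tuple[str, ...]:
    """Return the tool names a session may see, given its bindings."""
    tools = []
    sandbox = ctx.sandbox
    # Command tools need a sandbox that is both running and authorized.
    if sandbox is not None and sandbox.running and sandbox.authorized:
        tools.extend(COMMAND_TOOLS)
    # Record tools exist only for sessions bound to a WorkProject.
    if ctx.work_project_id is not None:
        tools.extend(RECORD_TOOLS)
    return tuple(tools)
```

With this shape, an ordinary chat session with no bindings gets an empty tuple, so the model is never offered a capability the session is not entitled to use.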
Runtime Model
sequenceDiagram
    participant U as User
    participant W as WebSocket
    participant P as AgentSessionPool
    participant S as AgentSession
    participant TR as TaskRuntime
    participant A as Agent
    participant N as Notifications
    participant T as Timeline
    participant DB as PostgreSQL

    U->>W: send(text, agent_code, sandbox_id)
    W->>P: get_or_create(session_id)
    P->>S: start_turn(content)
    S->>S: launch main instance driver
    S->>TR: run_until_idle(initial_content)
    TR->>DB: load projected history
    TR->>A: Runner.run_streamed()
    A-->>TR: iter_interruptible_events()
    TR-->>S: normalized events
    S-->>W: publish to subscribers
    S->>T: stamp seq + upsert persistable event
    T->>DB: timeline event log
    TR->>DB: persist messages + metadata
    W-->>U: thinking / text / tool / done

    Note over TR,A: Notification arrives during turn
    TR->>TR: InterruptSignal (deferred if tool pending)
    TR->>DB: flush_partial_context
    TR->>N: claim PENDING notification
    N-->>TR: notification prompt / user message
    TR->>TR: run notification turn
    S->>S: stop when no PENDING work and leave AWAITING work dormant
Key runtime boundaries:
Non-blocking instance drivers: AgentSession._drive and _SubagentDriver run the optional initial turn, drain currently claimable notifications, then settle. Drivers stop while background work is still AWAITING; completion notifications relaunch the owning main or subagent instance when integration work is ready.
Interrupt-driven task execution: run_until_idle manages the agent turn lifecycle; iter_interruptible_events races the SDK event stream against notification signals and raises InterruptSignal at safe points (after pending tool calls complete), modeled after CPU interrupt masking for atomicity.
Notification-backed liveness: AgentNotification rows are the single source of truth for active work. AWAITING tracks running background obligations, PENDING wakes the owning agent, and PROCESSING marks a claimed notification turn.
Turn-terminal async commands: execute_async_command dispatches a sandbox command, returns only status and run_id, and AgentRegistry ends the turn immediately via tool_use_behavior. The agent is resumed automatically when the command completes; there is no polling or list-wait loop.
Timeline event log: live events are stamped with stable seq values and item keys in TimelineLogWriter; persistable events are upserted into the durable event log so replay reads the same wire events instead of reconstructing UI state from SDK messages.
Event normalization: raw model and agent SDK events are converted into stable frontend events such as thinking_delta, text_delta, tool_call, tool_result, and subagent_task.
Session pool: AgentSessionPool manages active sessions, notification recovery, interruption, cancellation, idle eviction, and tool-binding invalidation.
History projection: Z3r0Session adds owner and nested-call metadata around SDK messages so each agent receives the right view of the shared conversation.
Context compaction: when context approaches the model window, the runtime summarizes earlier projected history while preserving recent context and durable facts.
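The interrupt-masking idea from the list above can be sketched with plain asyncio. This is a simplified stand-in written for illustration, not the project's `iter_interruptible_events`: the event dictionaries, the `fake_stream` generator, and the `demo` helper are assumptions, and an event that is still in flight when the interrupt wins the race is simply cancelled here.

```python
import asyncio
from typing import AsyncIterator, Dict


class InterruptSignal(Exception):
    """Raised at a safe point when a notification is waiting."""


async def iter_interruptible_events(
    stream: AsyncIterator[Dict], interrupt: asyncio.Event
) -> AsyncIterator[Dict]:
    iterator = stream.__aiter__()
    pending_tools = 0  # tool calls that have not produced a result yet
    interrupt_wait = asyncio.ensure_future(interrupt.wait())
    next_event = None
    try:
        while True:
            # Safe point: nothing in flight, so a waiting interrupt may fire.
            if pending_tools == 0 and interrupt.is_set():
                raise InterruptSignal()
            if next_event is None:
                next_event = asyncio.ensure_future(iterator.__anext__())
            # Mask the interrupt while a tool call is outstanding.
            waiters = {next_event}
            if pending_tools == 0:
                waiters.add(interrupt_wait)
            done, _ = await asyncio.wait(
                waiters, return_when=asyncio.FIRST_COMPLETED
            )
            if next_event in done:
                task, next_event = next_event, None
                try:
                    event = task.result()
                except StopAsyncIteration:
                    return
                if event["type"] == "tool_call":
                    pending_tools += 1
                elif event["type"] == "tool_result":
                    pending_tools -= 1
                yield event
                continue
            # Only the interrupt completed, and we are at a safe point.
            raise InterruptSignal()
    finally:
        interrupt_wait.cancel()
        if next_event is not None:
            next_event.cancel()


async def fake_stream(interrupt: asyncio.Event) -> AsyncIterator[Dict]:
    yield {"type": "text_delta"}
    yield {"type": "tool_call"}
    interrupt.set()  # a notification arrives while the tool call is pending
    yield {"type": "tool_result"}
    yield {"type": "text_delta"}  # not delivered: the interrupt fires first


async def demo():
    interrupt = asyncio.Event()
    seen = []
    try:
        async for event in iter_interruptible_events(
            fake_stream(interrupt), interrupt
        ):
            seen.append(event["type"])
    except InterruptSignal:
        return seen, True
    return seen, False
```

Running `asyncio.run(demo())` shows the deferral: the interrupt is set while the tool call is outstanding, yet the matching `tool_result` is still delivered before `InterruptSignal` is raised.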
Delegation Flow
sequenceDiagram
    participant CSO as CSO Agent
    participant D as Delegation Tools
    participant DB as PostgreSQL
    participant SJ as Subagent Driver
    participant Child as Specialist Agent
    participant N as Notifications
    participant P as Parent Driver

    CSO->>D: start_subagent_task(agent_code, brief)
    D->>DB: create task + AWAITING parent obligation
    D->>SJ: register _SubagentDriver and spawn drive
    SJ-->>CSO: run_id (CSO ends turn)
    SJ->>Child: run_until_idle(brief)
    Child-->>SJ: stream progress / final output
    alt child starts nested work
        SJ->>N: sees outstanding target obligations
        SJ->>SJ: go dormant with no live task
    else child reaches terminal status
        SJ->>DB: complete / fail task
        DB->>N: AWAITING -> PENDING parent obligation
        N->>P: resume_target_instance(parent)
        P->>CSO: claim result notification
        CSO-->>CSO: integrate result
    end
Specialist agents run through resumable per-run _SubagentDriver instances. Starting a subagent creates the AgentSubordinateTask record and the parent SUBAGENT_FINISHED notification obligation in one database transaction, so the parent never observes a gap where the child is neither running nor pending integration. Each subagent driver uses the same run_until_idle executor as the main agent, streams nested events through the session event bus, and then settles into one of three states: relaunch if a claimable notification arrived during drain, go dormant if child work or async jobs are still outstanding, or complete/fail/cancel the task.
When a subagent completes or fails, the task update and parent obligation transition (AWAITING -> PENDING) commit together. resume_target_instance wakes the owning driver: main-agent targets route through AgentSessionPool.resume_session, while subagent targets relaunch their dormant _SubagentDriver. Canceled subagents resolve their obligation without waking the parent.
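The transactional pairing described in these two paragraphs can be sketched as follows. To keep the example self-contained it uses an in-memory SQLite database in place of PostgreSQL, and the table names, column names, the `RUNNING` and `RESOLVED` status values, and all function names are assumptions made for illustration. Only the `AWAITING`, `PENDING`, and `PROCESSING` states come from this README.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE subordinate_task (
    run_id TEXT PRIMARY KEY,
    agent_code TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE TABLE notification (
    run_id TEXT PRIMARY KEY REFERENCES subordinate_task(run_id),
    owner TEXT NOT NULL,
    status TEXT NOT NULL
);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def start_subagent_task(conn, run_id: str, agent_code: str, owner: str) -> None:
    # `with conn` commits both inserts together or rolls both back.
    with conn:
        conn.execute(
            "INSERT INTO subordinate_task VALUES (?, ?, 'RUNNING')",
            (run_id, agent_code),
        )
        conn.execute(
            "INSERT INTO notification VALUES (?, ?, 'AWAITING')",
            (run_id, owner),
        )


def finish_subagent_task(conn, run_id: str, status: str) -> bool:
    """Update the task and its obligation together; True means wake the parent."""
    with conn:
        conn.execute(
            "UPDATE subordinate_task SET status = ? WHERE run_id = ?",
            (status, run_id),
        )
        # Cancellation resolves the obligation without waking the parent.
        new_state = "RESOLVED" if status == "CANCELED" else "PENDING"
        cur = conn.execute(
            "UPDATE notification SET status = ? "
            "WHERE run_id = ? AND status = 'AWAITING'",
            (new_state, run_id),
        )
        return new_state == "PENDING" and cur.rowcount == 1


def claim_pending(conn, owner: str) -> Optional[str]:
    """Claim one PENDING notification for the owner, marking it PROCESSING."""
    with conn:
        row = conn.execute(
            "SELECT run_id FROM notification "
            "WHERE owner = ? AND status = 'PENDING' LIMIT 1",
            (owner,),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE notification SET status = 'PROCESSING' WHERE run_id = ?",
            (row[0],),
        )
        return row[0]
```

In this sketch the boolean returned by `finish_subagent_task` stands in for the decision to call `resume_target_instance`. A PostgreSQL version serving concurrent drivers would also need row locking when claiming, which this single-connection example does not attempt to show.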
Sandbox Tooling
flowchart LR
    Agent["Agent Tool Call"] --> Binding["Sandbox Binding Check"]
    Binding -->|running + authorized| Sync["execute_sync_command"]
    Binding -->|running + authorized| Async["execute_async_command"]
    Binding --> Skill["load_skill"]
    Binding --> Knowledge["agent knowledge"]
    Sync --> Docker["Docker exec"]
    Docker --> Output["ToolResult JSON + output_file"]
    Output --> Agent
    Async --> Job[("SandboxAsyncJob AWAITING obligation")]
    Job -->|completed / failed| Notify["PENDING owner notification"]
    Notify --> Agent
    Agent --> Read["read_sandbox_command_output"]
    Read --> Docker
User[
[truncated for AI cost control]