
Show HN: AI-native red-team for penetration testing and vulnerability research

Z3r0 is an AI-native red-team framework emphasizing authorization-first, role-governed execution and structured evidence records. It uses Docker sandboxes for controlled execution and supports resumable long-running tasks. The architecture includes specialized agents such as a chief security officer, audit engineer, and others, coordinated for reconnaissance, vulnerability validation, code audit, and more. The system design focuses on operational boundaries and traceability for human review.

Source: Hacker News AI. Author: yv1ing


Authorization before automation: every workflow assumes an explicit legal scope, controlled targets, and operator accountability before any tool capability is used.

Role-governed red-team execution: the coordinator owns decomposition and synthesis; specialist agents handle reconnaissance, vulnerability validation, code audit, reverse engineering, and cryptographic review within scoped responsibilities.

Structured evidence over transient context: durable WorkProject records persist assets, findings, relationship edges, and attack paths outside model context so evidence remains reviewable after the conversation changes.

Resumable long-running work: notification obligations model background subagent work and sandbox jobs, allowing drivers to stop cleanly and resume only when integration work is ready.

Controlled execution boundary: command execution, browser workflows, file management, GUI tooling, and skills run through bound Docker sandboxes rather than the application host.

Stable contracts: the frontend consumes application REST, WebSocket, timeline, and generated schema contracts instead of model SDK or provider internals.

Architecture

    flowchart TB
        Operator["Authorized Red-Team Operator"]
        Workbench["React Red-Team Workbench Presentation Layer"]
        API["FastAPI API API Layer"]
        Runtime["Agent Runtime Orchestration Layer"]
        Drivers["Instance Drivers Async Scheduling Layer"]
        Notifications["Notification Obligations Liveness Layer"]
        Graph["Session Agent Graph Capability Layer"]
        Timeline["Timeline Event Log Replay Layer"]
        Record["WorkProject Evidence Records Review Layer"]
        Evidence["Evidence Chain Assets / Findings / Attack Paths"]
        Sandbox["Docker Sandbox Execution Layer"]
        Tools["Sandbox Tool Surface Tool Layer"]
        Models["Model Providers Model Layer"]
        Events["Event Contract Streaming Layer"]
        Store[("PostgreSQL Store Persistence Layer")]

        Operator --> Workbench
        Workbench -->|REST / WebSocket| API
        API --> Runtime
        Runtime --> Drivers
        Runtime --> Graph
        Runtime --> Record
        Runtime --> Sandbox
        Runtime --> Events
        Runtime --> Store
        Drivers --> Notifications
        Notifications --> Runtime
        Events --> Timeline
        Timeline --> Store
        Graph --> Tools
        Graph --> Models
        Sandbox --> Tools
        Record --> Store
        Record --> Evidence
        Evidence --> Workbench
        Events --> Workbench


The system is organized into explicit layers: user-facing red-team workbench, API boundary, runtime orchestration, resumable instance drivers, notification-backed liveness, session agent graph, controlled execution, model access, streaming event contract, durable timeline replay, and persisted WorkProject evidence records. The backend owns authentication, session lifecycle, context projection, event normalization, delegation, sandbox binding, tool mounting, notification obligations, persistence, project-scoped records, and history compaction. The frontend consumes stable REST and WebSocket contracts and does not depend on model SDK or provider internals.

Value Model

    flowchart LR
        Scope["Authorized Red-Team Scope targets, owners, sandbox"] --> Agents["Specialist Agent Team coordinator + experts"]
        Agents --> Tools["Sandboxed Tooling commands, files, GUI, skills"]
        Tools --> Evidence["Evidence Records assets, findings, edges, paths"]
        Evidence --> Review["Human Review workspace, graph, replay"]
        Review --> Continuity["Continuity resume, audit, handoff records"]


This value chain keeps red-team and vulnerability research work operationally bounded. Scope is declared before execution, agents work through explicit tools, tool output is distilled into structured evidence, and reviewers can inspect the resulting graph and timeline without relying on hidden model state.
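As a rough illustration of what "structured evidence" could look like outside model context, the sketch below models assets, findings, and relationship edges as plain records and checks whether a proposed attack path is backed by recorded edges. Every class, field, and method name here (`Asset`, `Finding`, `Edge`, `WorkProjectRecord`, `path_is_supported`) is an assumption made for illustration; the project's actual schema is not shown in this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record, e.g. a host or service inside the declared scope."""
    id: str
    kind: str
    value: str


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding attached to one asset."""
    id: str
    asset_id: str
    title: str
    severity: str


@dataclass(frozen=True)
class Edge:
    """Hypothetical directed relationship between two assets."""
    source: str
    target: str
    relation: str


@dataclass
class WorkProjectRecord:
    """Hypothetical durable container for one project's evidence."""
    assets: dict[str, Asset] = field(default_factory=dict)
    findings: list[Finding] = field(default_factory=list)
    edges: list[Edge] = field(default_factory=list)

    def path_is_supported(self, path: list[str]) -> bool:
        """Return True when every hop in `path` is a recorded edge between known assets."""
        if len(path) < 2 or any(asset_id not in self.assets for asset_id in path):
            return False
        recorded = {(e.source, e.target) for e in self.edges}
        return all(hop in recorded for hop in zip(path, path[1:]))


record = WorkProjectRecord()
record.assets["web"] = Asset("web", "service", "https://app.example.test")
record.assets["db"] = Asset("db", "host", "192.0.2.10")
record.edges.append(Edge("web", "db", "connects_to"))
record.findings.append(Finding("f1", "web", "Example finding", "medium"))
```

With records like these, a reviewer can confirm that a claimed path such as `["web", "db"]` rests on stored edges instead of on whatever the model said during the conversation.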

Agent Team

| Code | Name | Role | Responsibility |
| --- | --- | --- | --- |
| cso | Z3r0 | Chief Security Officer | Task decomposition, coordination, result integration |
| cae | V3ra | Chief Audit Engineer | Source code audit, dependency review, remediation verification |
| cie | L1ly | Chief Intelligence Engineer | Reconnaissance, asset discovery, relationship mapping |
| cpe | Fr4nk | Chief Penetration Engineer | Penetration testing, vulnerability validation, impact verification |
| cre | J4m3 | Chief Reverse Engineer | File, binary, firmware, and APK reverse engineering |
| cce | Nu1L | Chief Cryptography Engineer | Protocol review, key management, cryptographic implementation analysis |

    flowchart TB
        CSO["cso / Z3r0"]
        CSO --> CAE["cae / V3ra Code Audit"]
        CSO --> CIE["cie / L1ly Reconnaissance"]
        CSO --> CPE["cpe / Fr4nk Validation"]
        CSO --> CRE["cre / J4m3 Reverse"]
        CSO --> CCE["cce / Nu1L Cryptography"]

        CAE --> A1["Knowledge and Sandbox Tools"]
        CIE --> K1["Knowledge and Sandbox Tools"]
        CPE --> S1["Knowledge and Sandbox Tools"]
        CRE --> S2["Knowledge and Sandbox Tools"]
        CCE --> S3["Knowledge and Sandbox Tools"]


Agent capabilities are assembled per session. AgentRegistry uses configuration, role specifications, knowledge generation, the current sandbox binding, and the current WorkProject binding to create a session-level agent graph. Command tools are mounted only when an authorized, running sandbox is bound to the session. WorkProject record tools are mounted only for project sessions, keeping ordinary chat sessions separate from assets, findings, relationship edges, and attack paths.
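The mounting rules in the paragraph above can be sketched as a small pure function. This is only an illustration under stated assumptions: `SandboxBinding`, `SessionContext`, `mount_tools`, and the record tool names `record_asset` and `record_finding` are hypothetical, while the three command tool names are taken from this README. How `AgentRegistry` actually performs the assembly is not shown here.

```python
from dataclasses import dataclass
from typing import Optional

# Command tool names as they appear in this README.
COMMAND_TOOLS = [
    "execute_sync_command",
    "execute_async_command",
    "read_sandbox_command_output",
]
# Hypothetical names standing in for the WorkProject record tools.
RECORD_TOOLS = ["record_asset", "record_finding"]


@dataclass
class SandboxBinding:
    """Hypothetical view of the sandbox bound to a session."""
    authorized: bool
    status: str  # assumed values such as "running" or "stopped"


@dataclass
class SessionContext:
    """Hypothetical inputs used to assemble one session's tool surface."""
    sandbox: Optional[SandboxBinding] = None
    work_project_id: Optional[str] = None


def mount_tools(ctx: SessionContext) -> list[str]:
    """Return the tool names that would be mounted for this session."""
    tools: list[str] = []
    sandbox = ctx.sandbox
    # Command tools require an authorized sandbox that is currently running.
    if sandbox is not None and sandbox.authorized and sandbox.status == "running":
        tools.extend(COMMAND_TOOLS)
    # Record tools are limited to sessions bound to a WorkProject.
    if ctx.work_project_id is not None:
        tools.extend(RECORD_TOOLS)
    return tools
```

Under these assumptions, an ordinary chat session with no sandbox and no project gets an empty list, so it has no way to execute commands or write evidence records.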

Runtime Model

    sequenceDiagram
        participant U as User
        participant W as WebSocket
        participant P as AgentSessionPool
        participant S as AgentSession
        participant TR as TaskRuntime
        participant A as Agent
        participant N as Notifications
        participant T as Timeline
        participant DB as PostgreSQL

        U->>W: send(text, agent_code, sandbox_id)
        W->>P: get_or_create(session_id)
        P->>S: start_turn(content)
        S->>S: launch main instance driver
        S->>TR: run_until_idle(initial_content)
        TR->>DB: load projected history
        TR->>A: Runner.run_streamed()
        A-->>TR: iter_interruptible_events()
        TR-->>S: normalized events
        S-->>W: publish to subscribers
        S->>T: stamp seq + upsert persistable event
        T->>DB: timeline event log
        TR->>DB: persist messages + metadata
        W-->>U: thinking / text / tool / done

        Note over TR,A: Notification arrives during turn
        TR->>TR: InterruptSignal (deferred if tool pending)
        TR->>DB: flush_partial_context
        TR->>N: claim PENDING notification
        N-->>TR: notification prompt / user message
        TR->>TR: run notification turn
        S->>S: stop when no PENDING work and leave AWAITING work dormant


Key runtime boundaries:

Non-blocking instance drivers: AgentSession._drive and _SubagentDriver run the optional initial turn, drain currently claimable notifications, then settle. Drivers stop while background work is still AWAITING; completion notifications relaunch the owning main or subagent instance when integration work is ready.

Interrupt-driven task execution: run_until_idle manages the agent turn lifecycle; iter_interruptible_events races the SDK event stream against notification signals and raises InterruptSignal at safe points (after pending tool calls complete), modeled after CPU interrupt masking for atomicity.

Notification-backed liveness: AgentNotification rows are the single source of truth for active work. AWAITING tracks running background obligations, PENDING wakes the owning agent, and PROCESSING marks a claimed notification turn.

Turn-terminal async commands: execute_async_command dispatches a sandbox command, returns only status and run_id, and AgentRegistry ends the turn immediately via tool_use_behavior. The agent is resumed automatically when the command completes; there is no polling or list-wait loop.

Timeline event log: live events are stamped with stable seq values and item keys in TimelineLogWriter; persistable events are upserted into the durable event log so replay reads the same wire events instead of reconstructing UI state from SDK messages.

Event normalization: raw model and agent SDK events are converted into stable frontend events such as thinking_delta, text_delta, tool_call, tool_result, and subagent_task.

Session pool: AgentSessionPool manages active sessions, notification recovery, interruption, cancellation, idle eviction, and tool-binding invalidation.

History projection: Z3r0Session adds owner and nested-call metadata around SDK messages so each agent receives the right view of the shared conversation.

Context compaction: when context approaches the model window, the runtime summarizes earlier projected history while preserving recent context and durable facts.
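To make the interrupt-driven item in the list above more concrete, here is a minimal asyncio sketch of racing an event stream against a notification signal while deferring the interrupt until no tool call is pending. It is not the project's `iter_interruptible_events`: the function name `iter_interruptible`, the dictionary event shape, and the use of an `asyncio.Event` as the signal are assumptions, and only the `InterruptSignal` name and the event type strings come from this README.

```python
import asyncio
from typing import AsyncIterator


class InterruptSignal(Exception):
    """Raised at a safe point when a notification should preempt the turn."""


async def iter_interruptible(events: AsyncIterator[dict], interrupt: asyncio.Event):
    """Yield events, raising InterruptSignal only while no tool call is pending."""
    pending_tools = 0
    iterator = events.__aiter__()
    next_task = None
    wait_task = asyncio.ensure_future(interrupt.wait())
    try:
        while True:
            # Safe point: the interrupt is "unmasked" only with no tool call in flight.
            if interrupt.is_set() and pending_tools == 0:
                raise InterruptSignal()
            if next_task is None:
                next_task = asyncio.ensure_future(iterator.__anext__())
            waiters = {next_task}
            if not interrupt.is_set():
                waiters.add(wait_task)
            await asyncio.wait(waiters, return_when=asyncio.FIRST_COMPLETED)
            if not next_task.done():
                continue  # the signal won the race; re-check the mask above
            try:
                event = next_task.result()
            except StopAsyncIteration:
                return  # stream finished normally
            next_task = None
            if event["type"] == "tool_call":
                pending_tools += 1
            elif event["type"] == "tool_result":
                pending_tools -= 1
            yield event
    finally:
        # Cancel whatever is still outstanding so no task is leaked.
        for task in (next_task, wait_task):
            if task is not None and not task.done():
                task.cancel()


async def demo() -> list[str]:
    interrupt = asyncio.Event()

    async def fake_stream():
        yield {"type": "text_delta"}
        yield {"type": "tool_call"}
        interrupt.set()  # a notification arrives while the tool call is pending
        yield {"type": "tool_result"}
        yield {"type": "text_delta"}  # not delivered: the interrupt fires first

    seen = []
    try:
        async for event in iter_interruptible(fake_stream(), interrupt):
            seen.append(event["type"])
    except InterruptSignal:
        seen.append("interrupted")
    return seen
```

Running `asyncio.run(demo())` returns `['text_delta', 'tool_call', 'tool_result', 'interrupted']`: the signal is raised only after the matching `tool_result` has been delivered, which is the masking behaviour the list item describes. A real runtime would also need to decide what happens to the cancelled stream; this sketch simply abandons it.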

Delegation Flow

    sequenceDiagram
        participant CSO as CSO Agent
        participant D as Delegation Tools
        participant DB as PostgreSQL
        participant SJ as Subagent Driver
        participant Child as Specialist Agent
        participant N as Notifications
        participant P as Parent Driver

        CSO->>D: start_subagent_task(agent_code, brief)
        D->>DB: create task + AWAITING parent obligation
        D->>SJ: register _SubagentDriver and spawn drive
        SJ-->>CSO: run_id (CSO ends turn)
        SJ->>Child: run_until_idle(brief)
        Child-->>SJ: stream progress / final output
        alt child starts nested work
            SJ->>N: sees outstanding target obligations
            SJ->>SJ: go dormant with no live task
        else child reaches terminal status
            SJ->>DB: complete / fail task
            DB->>N: AWAITING -> PENDING parent obligation
            N->>P: resume_target_instance(parent)
            P->>CSO: claim result notification
            CSO-->>CSO: integrate result
        end


Specialist agents run through resumable per-run _SubagentDriver instances. Starting a subagent creates the AgentSubordinateTask record and the parent SUBAGENT_FINISHED notification obligation in one database transaction, so the parent never observes a gap where the child is neither running nor pending integration. Each subagent driver uses the same run_until_idle executor as the main agent, streams nested events through the session event bus, and then settles into one of three states: relaunch if a claimable notification arrived during drain, go dormant if child work or async jobs are still outstanding, or complete/fail/cancel the task.
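The three-way settle step at the end of the paragraph above can be written as a tiny decision function. This is a sketch only: the `Settle` enum, the `settle` function, and its two counters are hypothetical, and the precedence (claimable work first, outstanding work second) is assumed from the order in which the text lists the outcomes.

```python
from enum import Enum


class Settle(Enum):
    RELAUNCH = "relaunch"  # a claimable notification arrived during drain
    DORMANT = "dormant"    # child work or async jobs are still outstanding
    FINISH = "finish"      # complete, fail, or cancel the task


def settle(claimable_pending: int, outstanding_awaiting: int) -> Settle:
    """Pick the driver's next state from its notification counts."""
    if claimable_pending > 0:
        return Settle.RELAUNCH
    if outstanding_awaiting > 0:
        return Settle.DORMANT
    return Settle.FINISH
```

Checking claimable work first means a driver never goes dormant while it still has a result waiting to be integrated.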

When a subagent completes or fails, the task update and parent obligation transition (AWAITING -> PENDING) commit together. resume_target_instance wakes the owning driver: main-agent targets route through AgentSessionPool.resume_session, while subagent targets relaunch their dormant _SubagentDriver. Canceled subagents resolve their obligation without waking the parent.
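The "commit together" guarantee can be illustrated with a small transactional sketch. The README names PostgreSQL as the store; the example below swaps in Python's built-in `sqlite3` purely so it is self-contained. The table layout, the function names, and the `RESOLVED` state used for canceled tasks are assumptions, while `AWAITING` and `PENDING` come from the text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE notification (
            id INTEGER PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES task(id),
            state TEXT NOT NULL
        );
        """
    )
    return db


def start_task(db: sqlite3.Connection) -> int:
    """Create the task and its AWAITING parent obligation in one transaction."""
    with db:  # commits on success, rolls back if either insert raises
        cursor = db.execute("INSERT INTO task (status) VALUES ('running')")
        task_id = cursor.lastrowid
        db.execute(
            "INSERT INTO notification (task_id, state) VALUES (?, 'AWAITING')",
            (task_id,),
        )
    return task_id


def finish_task(db: sqlite3.Connection, task_id: int, status: str) -> bool:
    """Update the task and its obligation together; return True if the parent should wake."""
    # Assumption: a canceled task resolves its obligation without becoming PENDING.
    new_state = "RESOLVED" if status == "canceled" else "PENDING"
    with db:
        db.execute("UPDATE task SET status = ? WHERE id = ?", (status, task_id))
        db.execute(
            "UPDATE notification SET state = ? WHERE task_id = ? AND state = 'AWAITING'",
            (new_state, task_id),
        )
    return new_state == "PENDING"


def obligation_state(db: sqlite3.Connection, task_id: int) -> str:
    """Read back the obligation state for one task."""
    row = db.execute(
        "SELECT state FROM notification WHERE task_id = ?", (task_id,)
    ).fetchone()
    return row[0]
```

Because both statements sit inside one `with db:` block, a reader of the database sees either the old pair (running task, `AWAITING` obligation) or the new pair, never a finished task whose obligation is still `AWAITING`.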

Sandbox Tooling

    flowchart LR
        Agent["Agent Tool Call"] --> Binding["Sandbox Binding Check"]
        Binding -->|running + authorized| Sync["execute_sync_command"]
        Binding -->|running + authorized| Async["execute_async_command"]
        Binding --> Skill["load_skill"]
        Binding --> Knowledge["agent knowledge"]
        Sync --> Docker["Docker exec"]
        Docker --> Output["ToolResult JSON + output_file"]
        Output --> Agent
        Async --> Job[("SandboxAsyncJob AWAITING obligation")]
        Job -->|completed / failed| Notify["PENDING owner notification"]
        Notify --> Agent
        Agent --> Read["read_sandbox_command_output"]
        Read --> Docker

User[

[truncated for AI cost control]