2026-06-04 20:11 UTC | In-site rewrite | 6 min read | Updated: 2026-06-30 13:03 UTC

Anthropic's open-source framework for AI-powered vulnerability discovery

Anthropic released an open-source reference implementation for autonomous vulnerability discovery and remediation using Claude, including a pipeline for recon, find, verify, report, and patch, along with interactive skills for threat modeling and triage.

Source: Hacker News AI | Author: binyu

A reference implementation for autonomous vulnerability discovery and remediation with Claude, based on our learnings from partnering with security teams at several organizations since launching Claude Mythos Preview. For a write-up of these learnings along with best practices, see the accompanying blog post (also available in blog-post.md). For a lightweight SDK-only walkthrough of the same recon → find → triage → report → patch loop, see the companion cookbook.

This repo is not maintained and is not accepting contributions.

🔒 Want a managed option? Anthropic offers Claude Security, a hosted product that finds and fixes vulnerabilities in your source code across multiple projects. Claude Security scans your repository for vulnerabilities, applies a multi-stage verification pipeline to reduce false positives, and lets you manage findings through their lifecycle: triage, fix validation, and rapid fix generation.

This repository is an open-source reference implementation based on general best practices for finding vulnerabilities using Claude. You can use it to build your own vulnerability-finding pipeline and customize the logic, and it works with whatever access you have to Claude APIs (including Bedrock, Vertex, or Azure).

Contents

Claude Code skills (/quickstart, /threat-model, /vuln-scan, /triage, /patch, /customize): interactive scoping, scanning, triage, and patching. Open this repo in Claude Code and run /quickstart to get oriented.

harness/: the autonomous reference pipeline (recon → find → verify → report → patch), configured for finding C/C++ memory vulnerabilities using Docker and ASAN. This harness is a reference, not a product. The general shape, prompts, and sandboxing are reusable, but the harness will not work on every codebase out of the box. Run /customize to port it to your language, detector, or vuln class.

⚠️ Security: /quickstart, /threat-model, /vuln-scan, and /triage only read and write files. Running /patch on static findings (TRIAGE.json or VULN-FINDINGS.json) likewise only reads and writes files. /customize edits the harness code and runs validation commands. All of these skills are safe to run unsandboxed, as long as you review and approve each tool use in Claude Code. The autonomous reference pipeline (including /patch on pipeline results) executes target code, so it refuses to run outside of a gVisor sandbox unless explicitly overridden. To get set up, run scripts/setup_sandbox.sh once, then invoke the pipeline via bin/vp-sandboxed. See docs/security.md and docs/agent-sandbox.md for more details.

Getting Started

git clone https://github.com/anthropics/defending-code-reference-harness
cd defending-code-reference-harness
claude

30-sec intro + guided first run on the canary target

> /quickstart

> /quickstart how do I port the pipeline to Java?
> /quickstart how do I triage all these bugs?

Pin every subagent to the model you want

export CLAUDE_CODE_SUBAGENT_MODEL=
claude

0. intro + guided first run

> /quickstart

1. Build a threat model (aim before you shoot)

> /threat-model bootstrap targets/canary

2. Run a static scan, scoped by that threat model

> /vuln-scan targets/canary

3. Verify, dedupe, and rank what came back

> /triage targets/canary/VULN-FINDINGS.json

4. Generate candidate fixes for the verified findings

> /patch ./TRIAGE.json --repo targets/canary

This flow produces THREAT_MODEL.md, VULN-FINDINGS.{json,md}, TRIAGE.{json,md}, and PATCHES/.
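As a quick sanity check after the flow above, a short script can report which of those artifacts were actually written. This is a minimal sketch and not part of the repository: the helper name `missing_artifacts` is hypothetical, and the directories to search are an assumption you supply yourself, since each skill may write its output to a different location.

```python
from pathlib import Path

# Artifact names taken from the flow above, with the brace shorthand expanded.
EXPECTED_ARTIFACTS = [
    "THREAT_MODEL.md",
    "VULN-FINDINGS.json",
    "VULN-FINDINGS.md",
    "TRIAGE.json",
    "TRIAGE.md",
    "PATCHES",  # directory of candidate patches
]


def missing_artifacts(*search_dirs: str) -> list[str]:
    """Return the expected artifact names not found in any of search_dirs.

    Hypothetical helper: it only checks for existence directly under each
    directory and knows nothing about the contents of the files.
    """
    roots = [Path(d) for d in search_dirs]
    return [
        name
        for name in EXPECTED_ARTIFACTS
        if not any((root / name).exists() for root in roots)
    ]


if __name__ == "__main__":
    # Assumption: artifacts land in the working directory or the target directory.
    missing = missing_artifacts(".", "targets/canary")
    print("all artifacts present" if not missing else f"missing: {missing}")
```

An empty result means every artifact was found in at least one of the directories you listed.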

The vulnerability candidates produced in Step 1 come from Claude's static review of the source (nothing is built or run), so expect more false positives on any non-canary targets. In Step 2, you'll produce execution-verified findings.

Note: on the canary target, /triage may dismiss the scan's findings as false positives, because entry.c announces itself as deliberately vulnerable demo code and /triage correctly excludes bugs in test and fixture code. To see the full confirm / dedupe / false positive flow, run it on the curated fixture instead (/triage .claude/skills/triage/fixtures/canary-findings.json --repo targets/canary) or point the Step 1 skills at your own code.

Step 2 (Day 2): Run the reference pipeline on a C/C++ library

On Day 2, you'll move from interactive skills to your first autonomous run using the reference pipeline. You'll run the full recon → find → verify → report loop in your environment on a known-vulnerable open-source library, then generate a candidate patch for what it finds. You'll finish with a set of reproducible crashes, exploitability reports, and candidate patches, along with a feel for how the pipeline works.

Running the pipeline is simple:

One-time setup

python3 -m venv .venv && .venv/bin/pip install -e .
./scripts/setup_sandbox.sh # installs gVisor, builds the agent images, and verifies isolation; note: requires Docker
export ANTHROPIC_API_KEY=sk-ant-... # or CLAUDE_CODE_OAUTH_TOKEN; the pipeline requires one in env

Run the recon → find → verify → report loop

bin/vp-sandboxed run drlibs --model --runs 3 --parallel --stream --auto-focus

Generate a candidate patch for each finding

bin/vp-sandboxed patch results/drlibs// --model

Or, ask Claude Code to launch the pipeline and watch the run for you

claude
> run the pipeline on drlibs and explain findings as they come

Results from the loop land in a results/drlibs// directory. With the --stream flag, the first report will appear in minutes under reports/bug_NN/.

⚠️ The run subcommand spawns autonomous agents. The pipeline runs each agent inside a gVisor container with egress restricted to the Claude API. Agent-spawning subcommands refuse to start outside that sandbox unless explicitly overridden. For more information, see docs/security.md and docs/agent-sandbox.md.

Under the hood, the pipeline walks through seven stages:

Build: Compiles the target into a Docker image with ASAN (the memory error detector for C and C++). The pipeline builds this image automatically on first run using the target's Dockerfile.

Recon: A lightweight agent reads the source inside a network-isolated container and proposes a partition, i.e., "here are N distinct input-parsing subsystems worth attacking separately", so that parallel find agents explore different areas instead of converging on the same bug. Without the --auto-focus flag, the pipeline uses the focus_areas list from the target's config.yaml.

Find: N agents run in parallel, each in its own isolated container. Each agent reads the source, crafts malformed inputs, and runs the ASAN binary until a given input produces a crash 3 out of 3 times.

Verify: A separate grader agent reproduces each crash in a fresh container that the find agent hasn't touched. The only thing that crosses over from the find agent to the grader is the proof of concept it produced.

Dedupe: A judge agent compares verified crashes against bugs already reported and decides whether each is a new bug, a better example of a known bug, or a duplicate to skip.

Report: A report agent writes a structured exploitability analysis per unique bug, including details on primitive class, reachability, escalation path, and severity.

Patch (the separate patch command above): A patch agent writes a proposed fix, and a grader agent confirms that the new code builds, that the original proof of concept input no longer crashes, that the target's test suite still passes, and that a fresh find agent can't find a way around the fix.
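The "crash 3 out of 3 times" criterion from the Find stage can be illustrated with a small reproducibility check. This is a sketch under stated assumptions, not the harness's own code: the function name `crashes_every_time` is hypothetical, and it assumes the instrumented binary writes its ASAN report to stderr, which is ASAN's default behavior.

```python
import subprocess
import sys

# ASAN crash reports contain this marker, e.g.
# "==1234==ERROR: AddressSanitizer: heap-buffer-overflow ..."
ASAN_MARKER = "ERROR: AddressSanitizer"


def crashes_every_time(cmd: list[str], attempts: int = 3, timeout: float = 30.0) -> bool:
    """Run cmd `attempts` times; return True only if every run emits an ASAN report."""
    for _ in range(attempts):
        try:
            proc = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                errors="replace",  # crash output may contain undecodable bytes
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a hang is not a reproducible ASAN crash
        if ASAN_MARKER not in proc.stderr:
            return False
    return True


if __name__ == "__main__":
    # Stand-in for an ASAN binary so the sketch runs without a real target.
    fake_target = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('==1==ERROR: AddressSanitizer: heap-buffer-overflow'); sys.exit(1)",
    ]
    print(crashes_every_time(fake_target))
```

Requiring every attempt to crash, rather than any one of them, filters out flaky inputs before they reach the Verify stage.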

For more details, see docs/pipeline.md.
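The Patch stage's acceptance criteria amount to a conjunction of four checks. The sketch below models that gate. The class and field names are hypothetical, and how the real grader agent records its results is not described in this README.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PatchChecks:
    """The four conditions the grader confirms for a candidate patch."""

    builds: bool                 # the patched code compiles
    poc_no_longer_crashes: bool  # the original proof of concept is neutralized
    tests_pass: bool             # the target's test suite still passes
    no_bypass_found: bool        # a fresh find agent could not get around the fix

    def failed(self) -> list[str]:
        """Names of the checks that did not pass, in declaration order."""
        return [name for name, ok in asdict(self).items() if not ok]

    @property
    def accepted(self) -> bool:
        """A patch is accepted only when every check passes."""
        return not self.failed()


if __name__ == "__main__":
    result = PatchChecks(
        builds=True,
        poc_no_longer_crashes=True,
        tests_pass=False,
        no_bypass_found=True,
    )
    print(result.accepted, result.failed())
```

Tracking which check failed, rather than a single pass/fail bit, makes it easier to decide whether a rejected patch needs a rewrite or just a test fix.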

Step 3 (Days 3-5): Customize the pipeline for your target

On Days 3-5, you'll customize the harness for your own target. First, you'll point the Step 1 skills at your code, then you'll use /customize to port the pipeline to your stack. By the end of the week, you'll have a targets// directory that the pipeline can run against, validated with a single smoke run of the pipeline, and ready to scale up in Step 4.

While the reference pipeline is designed for finding memory vulnerabilities in C and C++ code, its shape is generic. Porting it to a new vuln class or language just means answering the following questions for your target stack:

| Question | C/C++ Reference | Your target (examples) |
| --- | --- | --- |
| What signals a finding? | ASAN crash signature | exception / canary file / DNS callback |
| What does a proof of concept look like? | crashing input file | HTTP request sequence / tx list / test harness |
| How is the target built and run? | Dockerfile (using clang + ASAN) | your language's build in a container |
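One way to keep these three questions in front of you while porting is to record the answers in a small structure. This is an illustrative sketch only: `TargetProfile` and `unanswered` are hypothetical names, not something the harness or /customize reads. The second profile pairs example values from the table in a combination chosen purely for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TargetProfile:
    """Answers to the three porting questions for one target stack."""

    finding_signal: str  # What signals a finding?
    poc_format: str      # What does a proof of concept look like?
    build_and_run: str   # How is the target built and run?


# Values from the C/C++ reference column of the table.
C_CPP_REFERENCE = TargetProfile(
    finding_signal="ASAN crash signature",
    poc_format="crashing input file",
    build_and_run="Dockerfile (using clang + ASAN)",
)

# Hypothetical pairing of example values for a networked service.
WEB_SERVICE_EXAMPLE = TargetProfile(
    finding_signal="DNS callback",
    poc_format="HTTP request sequence",
    build_and_run="your language's build in a container",
)


def unanswered(profile: TargetProfile) -> list[str]:
    """Return the names of questions that are still blank for this profile."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name).strip()]


if __name__ == "__main__":
    draft = TargetProfile(finding_signal="exception", poc_format="", build_and_run="")
    print(unanswered(draft))
```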

Before customizing, point the Step 1 skills at your own code. As a reminder, they're read- and write-only, so they can run unsandboxed.

claude

> /quickstart how do I customize this for ~/code/my-service?

> /threat-model bootstrap-then-interview ~/code/my-service
> /vuln-scan ~/code/my-service
> /triage ~/code/my-service/VULN-FINDINGS.json --repo ~/code/my-service

Then, use the artifacts produced by those skills in the /customize skill, which modifies the harness for your codebase.

> /customize use ~/code/my-service/{THREAT_MODEL.md,VULN-FINDINGS.json} and ./TRIAGE.md

When /customize is done, you'll have a targets/my-service/ directory set up. Validate it with a smoke run of the pipeline before scaling up.

bin/vp-sandboxed run my-service --model --runs 1

For more details, see docs/customizing.md.

Step 4 (Week 2): Start autonomous scanning, triage, and patching

In Week 2, you'll use the pipeline you customized in Step 3 on your own targets, adding an outer loop around the inner pipeline loop: run multiple pipeline scans, triage the findings from across those runs, patch based on prioritization, and repeat.
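The shape of that outer loop can be sketched as follows. Everything here is an assumption made for illustration: the `scan`, `triage`, and `patch` callables are hypothetical stand-ins for however you invoke the pipeline and skills, findings are reduced to string IDs, and stopping when a wave surfaces nothing new is one possible policy, not something the repository prescribes.

```python
from typing import Callable, Iterable


def outer_loop(
    scan: Callable[[], Iterable[str]],
    triage: Callable[[set[str]], list[str]],
    patch: Callable[[str], None],
    max_rounds: int = 3,
) -> set[str]:
    """Repeat scan -> triage -> patch and return the IDs of all findings seen.

    scan   returns finding IDs from one wave of pipeline runs
    triage returns the IDs worth patching, in priority order
    patch  generates a candidate fix for one finding
    """
    seen: set[str] = set()
    for _ in range(max_rounds):
        new = set(scan()) - seen
        if not new:
            break  # assumed stopping rule: the last wave found nothing new
        seen |= new
        for finding_id in triage(new):
            patch(finding_id)
    return seen


if __name__ == "__main__":
    # Canned waves so the sketch runs without the real pipeline.
    waves = iter([["bug_01", "bug_02"], ["bug_02", "bug_03"], ["bug_03"]])
    patched: list[str] = []
    print(outer_loop(lambda: next(waves), sorted, patched.append, max_rounds=5))
    print(patched)
```

Filtering out already-seen IDs before triage keeps repeated waves from spending effort on findings that were handled in an earlier round.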

Scan - run a wave of parallel runs against your target

bin/vp-sandboxed run my-service --model --runs 5 --parallel --stream --auto-focus
