2026-07-02 22:29 UTC · 2 min read · Updated: 2026-07-02 22:34 UTC

Show HN: Mirrors – test AI agent changes by replaying real production traces

Mirrors replays production traces in an isolated environment to test AI agent changes safely before deployment. It creates deterministic replicas, scores tool call accuracy, and catches regressions without touching live systems.

Source: Hacker News AI · Author: aisinghal

Mirrors - Test AI Agents Against a Mirror of Production

Test agents against a real mirror of production.

Mirrors turns your production traces into an isolated copy of your agent's environment. Run a change against it and see what breaks before your users do.

Connect Mirrors to your agent. No API keys needed; just sign in via your browser.

Run this in your terminal:

claude mcp add --transport http mirrors https://api.runmirrors.com/mcp

Then run /mcp → mirrors → Authenticate via browser.

100% reproducible prod calls

<60s to a live mirror

airline-support mirror (demo; each tool call is shown as tool · source · latency · score)

get_reservation_details · provided · 38 ms · 100%

get_user_details · mirror db · 22 ms · 98%

cancel_reservation · provided · 41 ms · 100%

issue_refund · llm-sim · 67 ms · 91%

Agent: Done. Reservation HQ8ML2 (SFO → EWR) is cancelled and a $420.00 refund is on its way to your card ending 4242. As a Gold member, there’s no cancellation fee. Anything else?

get_reservation_details · provided · 35 ms · 100%

get_user_details · mirror db · 24 ms · 98%

get_baggage_policy · llm-sim · 58 ms · 90%

Agent: On your SFO → EWR flight (economy), your Gold membership includes 2 checked bags up to 50 lb each, plus a carry-on and a personal item. Extra checked bags are $40 each. Want me to add one?

get_reservation_details · provided · 37 ms · 100%

search_flights · mirror db · 44 ms · 97%

rebook_reservation · provided · 52 ms · 100%

Agent: Rebooked HQ8ML2 onto UA478, departing SFO at 8:05 AM tomorrow (was today’s 6:40 PM). No fare difference on your Gold fare. Your new confirmation is UA478. Anything else?

mirror world · seed 730412 · #a3f1c9e2

reservation · flight · status

HQ8ML2 · SFO → EWR · confirmed

RT41KP · JFK → LAX · confirmed

ZX9QW0 · ORD → SEA · cancelled

Same seed + prompt → byte-identical world. Every tool call replays against this deterministic mirror, not production.
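The "same seed + prompt → byte-identical world" property can be illustrated with a minimal sketch, assuming the world is produced by a pseudo-random generator keyed only on the seed and the prompt. The toy reservations schema and the helpers build_world, world_bytes, and world_digest are hypothetical, not Mirrors' implementation; the short hash merely plays the role of a world tag such as #a3f1c9e2.

```python
import hashlib
import json
import random
import string

AIRPORTS = ["SFO", "EWR", "JFK", "LAX", "ORD", "SEA"]

def build_world(seed: int, prompt: str, n_reservations: int = 3) -> dict:
    """Generate a toy reservations table from (seed, prompt) alone (hypothetical schema)."""
    # Derive one integer from both inputs, so changing either one changes the world.
    key = hashlib.sha256(f"{seed}:{prompt}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(key[:8], "big"))  # private RNG, no global state
    rows = []
    for _ in range(n_reservations):
        code = "".join(rng.choices(string.ascii_uppercase + string.digits, k=6))
        origin, dest = rng.sample(AIRPORTS, 2)
        status = rng.choice(["confirmed", "cancelled"])
        rows.append({"reservation": code, "flight": f"{origin} → {dest}", "status": status})
    return {"seed": seed, "reservations": rows}

def world_bytes(world: dict) -> bytes:
    """Canonical encoding: sorted keys, so equal worlds serialize to equal bytes."""
    return json.dumps(world, sort_keys=True, ensure_ascii=False).encode("utf-8")

def world_digest(world: dict) -> str:
    """Short content hash, similar in spirit to a world tag like #a3f1c9e2."""
    return hashlib.sha256(world_bytes(world)).hexdigest()[:8]

a = build_world(730412, "Cancel my SFO flight")
b = build_world(730412, "Cancel my SFO flight")
print(world_digest(a) == world_digest(b), world_bytes(a) == world_bytes(b))  # True True
```

In this sketch determinism holds within one Python version: the random module does not promise that choices and sample yield the same sequences across versions, so a production system would pin its generator.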

how it works

Traces in, a runnable mirror out

Ingest traces

Drop in production traces from your ADK or observability platform. Mirrors finds the entities, rebuilds the schema, and discovers every tool.
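Tool and entity discovery from traces might look roughly like the following sketch. It assumes a hypothetical, simplified trace format (a flat list of spans, each tool call carrying a name and an args dict) and uses a naive heuristic that treats arguments ending in _id as entity keys; discover_tools and discover_entities are illustrative names, not part of Mirrors.

```python
from collections import defaultdict

# Hypothetical, simplified trace format: a flat list of spans.
SPANS = [
    {"kind": "tool_call", "name": "get_reservation_details", "args": {"reservation_id": "HQ8ML2"}},
    {"kind": "llm", "text": "Let me check that booking."},
    {"kind": "tool_call", "name": "issue_refund", "args": {"reservation_id": "HQ8ML2", "amount": 420.0}},
    {"kind": "tool_call", "name": "get_baggage_policy", "args": {}},
    {"kind": "tool_call", "name": "get_reservation_details", "args": {"reservation_id": "RT41KP"}},
]

def discover_tools(spans):
    """Return {tool: {arg: sorted observed type names}} from tool-call spans."""
    schema = defaultdict(lambda: defaultdict(set))
    for span in spans:
        if span.get("kind") != "tool_call":
            continue  # LLM turns and other spans carry no tool signature
        tool_args = schema[span["name"]]  # registers the tool even if it takes no args
        for arg, value in span.get("args", {}).items():
            tool_args[arg].add(type(value).__name__)
    # Convert to plain dicts with sorted lists so the result is stable.
    return {tool: {arg: sorted(types) for arg, types in args.items()}
            for tool, args in schema.items()}

def discover_entities(spans, id_suffix="_id"):
    """Collect distinct values of *_id arguments as candidate entity keys (a heuristic)."""
    entities = defaultdict(set)
    for span in spans:
        for arg, value in span.get("args", {}).items():
            if arg.endswith(id_suffix):
                entities[arg[: -len(id_suffix)]].add(value)
    return {name: sorted(values) for name, values in entities.items()}

print(discover_tools(SPANS))
print(discover_entities(SPANS))
```

A real ingester would also need to handle nested arguments, tool outputs, and conflicting types; this sketch only shows the shape of the inference.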

Build the mirror

You get an isolated, runnable copy of your prod environment: a seeded database plus bound tools, each scored for how closely it matches the real traces.
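One plausible way to score how closely each bound tool matches the real traces is to replay every recorded call against the mirror and count exact output matches per tool. The sketch below assumes exact equality as the match criterion and a tool with no mirror scoring zero; fidelity_by_tool and the toy data are hypothetical, and Mirrors' actual scoring is not described in the source.

```python
from collections import defaultdict

def fidelity_by_tool(recorded_calls, mirror_tools):
    """For each tool, the share of recorded calls whose mirrored output equals
    the recorded production output. A tool with no mirror scores 0 on its calls."""
    hits, totals = defaultdict(int), defaultdict(int)
    for tool, args, recorded_output in recorded_calls:
        totals[tool] += 1
        impl = mirror_tools.get(tool)
        if impl is not None and impl(**args) == recorded_output:  # exact match (assumption)
            hits[tool] += 1
    return {tool: hits[tool] / totals[tool] for tool in totals}

# Toy mirror db and recorded calls (all hypothetical).
USERS = {"u1": {"tier": "Gold"}, "u2": {"tier": "Silver"}}
mirror = {"get_user_details": lambda user_id: USERS.get(user_id)}
recorded = [
    ("get_user_details", {"user_id": "u1"}, {"tier": "Gold"}),
    ("get_user_details", {"user_id": "u2"}, {"tier": "Silver"}),
    ("get_user_details", {"user_id": "u3"}, {"tier": "Gold"}),  # missing from mirror db
    ("issue_refund", {"amount": 420.0}, {"ok": True}),          # not mirrored yet
]
scores = fidelity_by_tool(recorded, mirror)
print(scores)
```

Exact equality is strict; fields such as timestamps or generated IDs would likely need a looser, field-aware comparison.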

Run and evaluate

Replay agents against the same world every time. Measure accuracy, catch regressions, and ship with confidence. Production is never touched.
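Replaying against "the same world every time" implies that one run's writes must not leak into the next. A minimal sketch, assuming the seeded world fits in memory, is to hand each run a fresh deep copy; SEED_WORLD, TOOLS, and run_episode are hypothetical names for illustration only.

```python
import copy

# Hypothetical seeded world; in Mirrors this role is played by the seeded database.
SEED_WORLD = {"reservations": {"HQ8ML2": "confirmed", "RT41KP": "confirmed"}}

def cancel_reservation(world, reservation_id):
    """Mirror implementation: mutates only the world it is handed."""
    world["reservations"][reservation_id] = "cancelled"
    return {"reservation_id": reservation_id, "status": "cancelled"}

def get_status(world, reservation_id):
    return world["reservations"][reservation_id]

TOOLS = {"cancel_reservation": cancel_reservation, "get_status": get_status}

def run_episode(tool_calls):
    """Replay one agent run against a fresh deep copy of the seeded world."""
    world = copy.deepcopy(SEED_WORLD)  # writes in this run never reach the next one
    return [TOOLS[name](world, **args) for name, args in tool_calls]

calls = [("get_status", {"reservation_id": "HQ8ML2"}),
         ("cancel_reservation", {"reservation_id": "HQ8ML2"}),
         ("get_status", {"reservation_id": "HQ8ML2"})]
first = run_episode(calls)
second = run_episode(calls)
print(first == second)  # True
```

Because every run starts from the same copy, a destructive call like cancel_reservation produces identical results on every replay.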

why mirrors

What a mirror unlocks

Catch what would have broken in prod, and ship the change knowing it works.

Reproduce any bug on demand

The same seed and instructions give a byte-identical world, so the failure that paged you shows up every time.

Test the risky flows safely

Run refunds, deletes, and sends against the mirror. Your live systems never see them.

Catch regressions before they ship

Pin golden cases to recorded worlds and grade every build as pass or fail.
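Pass/fail grading against golden cases could be sketched as follows, assuming a golden case is a recorded world (identified by its seed) plus the ordered tool calls a correct agent should make. GoldenCase, freeze, grade_build, and both toy agents are hypothetical; a real grader might also compare final world state or agent replies.

```python
from dataclasses import dataclass

def freeze(calls):
    """Normalize [(tool, {args})] into an ordered, hashable tuple for comparison."""
    return tuple((name, tuple(sorted(args.items()))) for name, args in calls)

@dataclass(frozen=True)
class GoldenCase:
    """Hypothetical golden case: a recorded world (by seed) plus the expected tool calls."""
    name: str
    seed: int
    expected: tuple  # output of freeze()

def grade_build(cases, run_agent):
    """run_agent(seed) -> [(tool, args)]. Returns {case name: "pass" | "fail"}."""
    return {c.name: "pass" if freeze(run_agent(c.seed)) == c.expected else "fail"
            for c in cases}

CASES = [GoldenCase("cancel HQ8ML2", 730412, freeze([
    ("get_reservation_details", {"reservation_id": "HQ8ML2"}),
    ("cancel_reservation", {"reservation_id": "HQ8ML2"}),
]))]

def good_agent(seed):
    return [("get_reservation_details", {"reservation_id": "HQ8ML2"}),
            ("cancel_reservation", {"reservation_id": "HQ8ML2"})]

def regressed_agent(seed):
    # Skips the lookup and cancels directly: a behavior change the golden case catches.
    return [("cancel_reservation", {"reservation_id": "HQ8ML2"})]

print(grade_build(CASES, good_agent), grade_build(CASES, regressed_agent))
```

A build would then be gated on every case reporting "pass".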

Know if a change is better

Coverage and accuracy are scored per tool, so you ship on numbers instead of a hunch.
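To decide whether a change is better, per-tool numbers from a baseline build and a candidate build can be compared directly. This sketch assumes coverage means the share of production tools the evaluation exercised and accuracy is a per-tool value in [0, 1]; the source does not define either metric, and tool_coverage, regressions, and the numbers below are illustrative only.

```python
def tool_coverage(prod_tools, exercised_tools):
    """Share of tools seen in production traces that the eval run exercised (assumed definition)."""
    prod = set(prod_tools)
    return len(prod & set(exercised_tools)) / len(prod) if prod else 0.0

def regressions(baseline, candidate, tolerance=0.0):
    """Tools whose accuracy fell by more than `tolerance`; a tool missing from the
    candidate counts as 0.0. Returns {tool: (baseline, candidate)}."""
    return {tool: (base, candidate.get(tool, 0.0))
            for tool, base in baseline.items()
            if candidate.get(tool, 0.0) < base - tolerance}

# Illustrative numbers only.
baseline = {"search_flights": 0.90, "rebook_reservation": 0.95}
candidate = {"search_flights": 0.96, "rebook_reservation": 0.85}
print(regressions(baseline, candidate))
print(tool_coverage({"search_flights", "rebook_reservation", "issue_refund", "get_user_details"},
                    {"search_flights", "rebook_reservation", "issue_refund"}))
```

Here the candidate improves search_flights but regresses rebook_reservation, which is exactly the kind of trade-off a single aggregate score would hide.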

Sandboxes on demand

Each run gets its own mirror with on-demand launch, scale to zero, and metering by the minute.
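Per-minute metering against the Free plan's 60 sandbox minutes per month can be worked through with a small sketch. It assumes each run is billed in whole minutes rounded up and that scaled-to-zero idle time is not billed; the source does not state the rounding rule, and metered_minutes and remaining_free_minutes are hypothetical helpers.

```python
import math

FREE_MINUTES_PER_MONTH = 60  # Free plan allowance from the pricing section

def metered_minutes(run_seconds):
    """Assumes each sandbox run is billed in whole minutes, rounded up (not confirmed).
    Idle time while a sandbox is scaled to zero is simply not included."""
    return sum(math.ceil(s / 60) for s in run_seconds)

def remaining_free_minutes(run_seconds, allowance=FREE_MINUTES_PER_MONTH):
    """Free minutes left this month, floored at zero."""
    return max(allowance - metered_minutes(run_seconds), 0)

runs = [45, 61, 600]  # three runs: 45 s, 61 s, and 10 min
print(metered_minutes(runs), remaining_free_minutes(runs))  # 13 47
```

Under this assumption a 61-second run costs two minutes, so many short runs consume the allowance faster than their raw duration suggests.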

Drive it from your own code

A versioned /v1 API and workspace keys let you run mirrors from your own apps.

pricing

Start free. Scale when you're ready.

Build mirrors free, with deterministic seeding and the in-app playground. When your team needs unlimited sandboxes, the API, and SSO, we'll tailor a Custom plan.

Free

$0/mo

60 sandbox minutes / mo

✓ Build unlimited mirrors

✓ Deterministic seeding

✓ In-app playground

✓ Community support

FOR TEAMS

Custom

Let's talk

Built around your team

✓ Everything in Free

✓ Unlimited on-demand sandboxes

✓ Public /v1 API + keys, SSO

✓ Eval suites + fidelity reports

✓ Priority support & onboarding

Ship agent changes without the guesswork.

Build a mirror from your traces in minutes.