Show HN: Mirrors – test AI agent changes by replaying real production traces
Mirrors replays production traces in an isolated environment to test AI agent changes safely before deployment. It creates deterministic replicas, scores tool call accuracy, and catches regressions without touching live systems.
Mirrors - Test AI Agents Against a Mirror of Production
Test agents against a real mirror of production.
Mirrors turns your production traces into an isolated copy of your agent's environment. Run a change against it and see what breaks before your users do.
Connect Mirrors to your agent. No API keys: just sign in via your browser.
Run in your terminal:
claude mcp add --transport http mirrors https://api.runmirrors.com/mcp
Then run /mcp → mirrors → Authenticate via browser.
100% reproducible
0 prod calls
<60s to a live mirror
airline-support mirror
get_reservation_details · provided · 38ms · 100%
get_user_details · mirror db · 22ms · 98%
cancel_reservation · provided · 41ms · 100%
issue_refund · llm-sim · 67ms · 91%
AGENT
Done. Reservation HQ8ML2 (SFO → EWR) is cancelled and a $420.00 refund is on its way to your card ending 4242. As a Gold member there’s no cancellation fee. Anything else?
get_reservation_details · provided · 35ms · 100%
get_user_details · mirror db · 24ms · 98%
get_baggage_policy · llm-sim · 58ms · 90%
AGENT
On your SFO → EWR flight (economy), your Gold membership includes 2 checked bags up to 50 lb each, plus a carry-on and a personal item. Extra checked bags are $40 each. Want me to add one?
get_reservation_details · provided · 37ms · 100%
search_flights · mirror db · 44ms · 97%
rebook_reservation · provided · 52ms · 100%
AGENT
Rebooked HQ8ML2 onto UA478, departing SFO at 8:05 AM tomorrow (was today’s 6:40 PM). No fare difference on your Gold fare. Your new confirmation is UA478. Anything else?
mirror world · seed 730412 · #a3f1c9e2
reservation · flight · status
HQ8ML2 · SFO → EWR · confirmed
RT41KP · JFK → LAX · confirmed
ZX9QW0 · ORD → SEA · cancelled
same seed + prompt → byte-identical world. Every tool call replays against this deterministic mirror, not production.
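To make the determinism claim concrete, here is a minimal sketch of how a seed and prompt could map to a byte-identical world. It is not Mirrors' implementation: `build_world`, `fingerprint`, the field names, and the hashing scheme are all hypothetical. The idea is only that every random choice comes from a generator derived from the seed and prompt, and that the result is serialized canonically.

```python
import hashlib
import json
import random

AIRPORTS = ["SFO", "EWR", "JFK", "LAX", "ORD", "SEA"]
STATUSES = ["confirmed", "cancelled"]
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789"


def build_world(seed: int, prompt: str, n: int = 3) -> bytes:
    """Hypothetical world builder: the same (seed, prompt) gives the same bytes."""
    # Derive the RNG state from the inputs only: no clock, no OS entropy.
    digest = hashlib.sha256(f"{seed}:{prompt}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))

    reservations = []
    for _ in range(n):
        origin, dest = rng.sample(AIRPORTS, 2)
        reservations.append({
            "reservation": "".join(rng.choice(ALPHABET) for _ in range(6)),
            "flight": f"{origin} -> {dest}",
            "status": rng.choice(STATUSES),
        })

    # Canonical serialization: sorted keys and fixed separators.
    text = json.dumps(reservations, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def fingerprint(world: bytes) -> str:
    """Short content hash (an assumption about what a tag like '#a3f1c9e2' is)."""
    return hashlib.sha256(world).hexdigest()[:8]


world_a = build_world(730412, "airline-support")
world_b = build_world(730412, "airline-support")
assert world_a == world_b  # byte-identical for identical inputs
print(fingerprint(world_a))
```

Python's `random.Random` can produce different sequences across interpreter versions, so a real system would also need to pin the generator algorithm, not just the seed.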
how it works
Traces in, a runnable mirror out
01 Ingest traces
Drop in production traces from your ADK or observability platform. Mirrors finds the entities, rebuilds the schema, and discovers every tool.
02 Build the mirror
You get an isolated, runnable copy of your prod environment: a seeded database plus bound tools, each scored for how closely it matches the real traces.
03 Run and evaluate
Replay agents against the same world every time. Measure accuracy, catch regressions, and ship with confidence. Production is never touched.
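As a rough illustration of the three steps, the sketch below discovers tools from recorded traces and scores a toy mirror's tool bindings against them. The trace shape (`tool`, `args`, `output`), the `discover_tools` and `score_tools` helpers, and the exact-match scoring rule are assumptions made for this example; the page does not document Mirrors' real trace format or scoring method.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

# Hypothetical recorded tool calls, as they might appear in production traces.
TRACES = [
    {"tool": "get_user_details", "args": {"user_id": "u1"}, "output": {"tier": "gold"}},
    {"tool": "get_user_details", "args": {"user_id": "u2"}, "output": {"tier": "silver"}},
    {"tool": "cancel_reservation", "args": {"code": "HQ8ML2"}, "output": {"status": "cancelled"}},
]


def discover_tools(traces: list[dict]) -> dict[str, set[str]]:
    """Step 1: collect every tool name and the argument names seen for it."""
    tools: dict[str, set[str]] = defaultdict(set)
    for call in traces:
        tools[call["tool"]].update(call["args"])
    return dict(tools)


def score_tools(traces: list[dict], mirror: dict[str, Callable[..., Any]]) -> dict[str, float]:
    """Steps 2-3: replay each recorded call against the mirror and report, per
    tool, the fraction of calls whose output equals the recorded output."""
    hits: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for call in traces:
        name = call["tool"]
        total[name] += 1
        impl = mirror.get(name)
        if impl is not None and impl(**call["args"]) == call["output"]:
            hits[name] += 1
    return {name: hits[name] / total[name] for name in total}


# A toy mirror: an in-memory "database" plus tool functions bound to it.
USERS = {"u1": {"tier": "gold"}, "u2": {"tier": "gold"}}  # u2 is deliberately wrong
MIRROR = {
    "get_user_details": lambda user_id: USERS[user_id],
    "cancel_reservation": lambda code: {"status": "cancelled"},
}

print(discover_tools(TRACES))
print(score_tools(TRACES, MIRROR))
```

Here `get_user_details` scores 0.5 because the toy database disagrees with one of the two recorded calls, which is the kind of gap a per-tool score is meant to surface.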
why mirrors
What a mirror unlocks
Catch what would have broken in prod, and ship the change knowing it works.
Reproduce any bug on demand
The same seed and instructions give a byte-identical world, so the failure that paged you shows up every time.
Test the risky flows safely
Run refunds, deletes, and sends against the mirror. Your live systems never see them.
Catch regressions before they ship
Pin golden cases to recorded worlds and grade every build pass or fail.
Know if a change is better
Coverage and accuracy are scored per tool, so you ship on numbers instead of a hunch.
Sandboxes on demand
Each run gets its own mirror with on-demand launch, scale to zero, and metering by the minute.
Drive it from your own code
A versioned /v1 API and workspace keys let you run mirrors from your own apps.
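One way to picture golden cases pinned to recorded worlds is a small pass/fail grader like the sketch below. Everything in it is hypothetical: the `GoldenCase` shape, the `grade` helper, and the stub agent builds are stand-ins for illustration and do not reflect the Mirrors API. Only the tool names and the seed are borrowed from the demo above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    name: str
    world_seed: int                  # the recorded world this case is pinned to
    prompt: str
    expected_tools: tuple[str, ...]  # tool calls the agent should make, in order


# An "agent build" here is any function that returns the tool names it called.
Agent = Callable[[int, str], Sequence[str]]


def grade(agent: Agent, cases: Sequence[GoldenCase]) -> dict[str, bool]:
    """Run every golden case against one build and mark it pass (True) or fail."""
    return {
        case.name: tuple(agent(case.world_seed, case.prompt)) == case.expected_tools
        for case in cases
    }


CASES = [
    GoldenCase(
        name="cancel-and-refund",
        world_seed=730412,
        prompt="Cancel reservation HQ8ML2 and refund me.",
        expected_tools=(
            "get_reservation_details",
            "get_user_details",
            "cancel_reservation",
            "issue_refund",
        ),
    ),
]


def build_v1(seed: int, prompt: str) -> list[str]:
    # Stub standing in for a real agent run against the mirror.
    return ["get_reservation_details", "get_user_details", "cancel_reservation", "issue_refund"]


def build_v2(seed: int, prompt: str) -> list[str]:
    # Regression: this build forgets to issue the refund.
    return ["get_reservation_details", "get_user_details", "cancel_reservation"]


print(grade(build_v1, CASES))
print(grade(build_v2, CASES))
```

Because each case carries its own seed, a failing case can be rerun against the same world as many times as needed while debugging.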
pricing
Start free. Scale when you're ready.
Build mirrors free, with deterministic seeding and the in-app playground. When your team needs unlimited sandboxes, the API, and SSO, we'll tailor a Custom plan.
Free
$0/mo
60 sandbox min / mo
✓ Build unlimited mirrors
✓ Deterministic seeding
✓ In-app playground
✓ Community support
FOR TEAMS
Custom
Let's talk
Built around your team
✓ Everything in Free
✓ Unlimited on-demand sandboxes
✓ Public /v1 API + keys, SSO
✓ Eval suites + fidelity reports
✓ Priority support & onboarding
Ship agent changes without the guesswork.
Build a mirror from your traces in minutes.