Promptetheus – Trace, detect, and auto-repair AI agent failures
Promptetheus is debugging infrastructure for AI agents, providing a Python SDK, local replay tooling, hosted trace delivery, and MCP evidence access to help coding agents auto-repair failed runs.
Promptetheus is debugging infrastructure for AI agents: a Python SDK, local replay tooling, hosted trace delivery, and MCP evidence access for coding agents that need to fix failing agent runs.
What You Get
One trace per user-visible agent task.
Decorators for top-level agent runs, tool calls, and nested spans.
Typed events for user messages, agent messages, tool calls, browser actions, DOM snapshots, screenshots, LLM calls, retrieval, metrics, errors, scores, and final goal checks.
Durable delivery that never crashes the host agent. If HTTP delivery is not configured or fails, events spool locally and can be replayed later.
Local CLI tools for doctor checks, spool inspection, session replay, diffing, and failure fingerprints.
Hosted MCP config snippets for read-only incident evidence scoped to a workspace and Supabase project.
Install
For a normal project, install from PyPI:
pip install promptetheus
promptetheus version
Create or configure a hosted project key:
export PROMPTETHEUS_CONSOLE_TOKEN=...
promptetheus init \
  --workspace-name "Acme" \
  --project-name "Browser Agent" \
  --write-env .env
source .env
promptetheus doctor
For local self-hosted development:
promptetheus init \
  --api-url http://127.0.0.1:4318 \
  --console-token pt_console_token \
  --write-env .env
source .env
For contributor work from this repository:
pip install -e packages/promptetheus
promptetheus version
With transport="auto", the SDK sends to the configured API when PROMPTETHEUS_API_KEY is present. Without a key, it writes to the local spool so the instrumented agent keeps running.
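The fallback behaviour described above can be pictured with a small standard-library sketch. This is not the SDK's implementation: `choose_transport`, `deliver`, the `send` callback, and the one-file-per-session spool layout are all hypothetical names and assumptions introduced for illustration. Only the `PROMPTETHEUS_API_KEY` variable and the "spool when there is no key or delivery fails" rule come from this README.

```python
import json
import tempfile
from pathlib import Path


def choose_transport(env: dict) -> str:
    """Return "http" when an API key is present, otherwise "spool" (assumed rule)."""
    return "http" if env.get("PROMPTETHEUS_API_KEY") else "spool"


def deliver(event: dict, env: dict, spool_dir: Path, send=None) -> str:
    """Try HTTP delivery; on any failure append the event to a local JSONL file."""
    if choose_transport(env) == "http" and send is not None:
        try:
            send(event)
            return "sent"
        except Exception:
            pass  # a delivery failure must never reach the host agent
    spool_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: one JSONL file per session.
    path = spool_dir / f"{event['session_id']}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return "spooled"


def failing_send(event: dict) -> None:
    """Stand-in for an HTTP client whose request fails."""
    raise ConnectionError("API unreachable")


spool = Path(tempfile.mkdtemp()) / "spool"
event = {"type": "user_message", "session_id": "demo", "seq": 0}

no_key = deliver(event, {}, spool)
key_but_down = deliver(event, {"PROMPTETHEUS_API_KEY": "pt_key"}, spool, send=failing_send)
spooled_lines = (spool / "demo.jsonl").read_text(encoding="utf-8").splitlines()
print(no_key, key_but_down, len(spooled_lines))
```

In both cases the caller gets a status string back rather than an exception, which is the property that keeps the instrumented agent running.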
Observe With Decorators
Use decorators when you want instrumentation to sit directly on agent and tool functions:
import promptetheus as pt


@pt.tool
def search_calendar(day: str) -> list[str]:
    return ["Tuesday 2pm", "Tuesday 3pm"]


@pt.traced("choose-slot")
def choose_slot(slots: list[str]) -> str:
    return "Wednesday 2pm"


@pt.observe(
    agent="calendar-agent",
    user_goal="Book Tuesday at 2pm",
    transport="auto",  # use "spool" to force local JSONL while trying this
)
def run_agent(goal: str) -> str:
    pt.current().user_message(goal)
    slots = search_calendar("Tuesday")
    selected = choose_slot(slots)
    pt.current().agent_message(f"Booked {selected}")
    pt.current().goal_check(
        False,
        mismatches=["selected Wednesday, not Tuesday"],
    )
    return selected


run_agent("Book Tuesday at 2pm")
What each decorator does:
@pt.observe(...) starts one trace/session around the top-level run.
@pt.tool records tool_call and tool_result events inside the current session.
@pt.traced("name") adds a nested span to the replay tree without starting a separate session.
pt.current() returns the active session so the agent can record user messages, agent messages, goal checks, errors, metrics, and other events.
goal_check(False) is visible in replay, fingerprints, and tail sampling. If a failed goal should also make the process fail, record the goal check and then raise an exception so the terminal session_end status is failed:
if not selected.startswith("Tuesday"):
    pt.current().goal_check(False, mismatches=["selected Wednesday"])
    raise RuntimeError("agent selected the wrong day")
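To make the "one session per top-level run, reachable from anywhere inside it" idea concrete, here is a toy model built on `contextvars`. It is a sketch under assumptions, not how Promptetheus is implemented: `ToySession`, `observe`, `tool`, and `current` below are stand-ins written for this example, and the event shapes are simplified. It also shows why raising after a failed goal check can turn the terminal status into failed.

```python
import contextvars
import functools

# Holds the active session for the current call stack (hypothetical design).
_current = contextvars.ContextVar("current_session", default=None)


class ToySession:
    def __init__(self):
        self.events = []
        self.status = None

    def record(self, type_, **payload):
        self.events.append({"type": type_, "seq": len(self.events), "payload": payload})


def current():
    return _current.get()


def observe(fn):
    """Open one session around the top-level run and close it on exit."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        session = ToySession()
        token = _current.set(session)
        try:
            result = fn(*args, **kwargs)
            session.status = "completed"
            return result
        except Exception:
            session.status = "failed"
            raise
        finally:
            session.record("session_end", status=session.status)
            _current.reset(token)
            wrapper.last_session = session  # kept only so the demo can inspect it
    return wrapper


def tool(fn):
    """Record a call/result pair in whatever session is active."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        session = current()
        if session is not None:
            session.record("tool_call", tool_name=fn.__name__)
        result = fn(*args, **kwargs)
        if session is not None:
            session.record("tool_result", tool_name=fn.__name__)
        return result
    return wrapper


@tool
def search_calendar(day):
    return [f"{day} 2pm", f"{day} 3pm"]


@observe
def run_agent(goal):
    current().record("user_message", content=goal)
    search_calendar("Tuesday")
    selected = "Wednesday 2pm"  # simulated wrong choice
    if not selected.startswith("Tuesday"):
        current().record("goal_check", passed=False)
        raise RuntimeError("agent selected the wrong day")
    return selected


try:
    run_agent("Book Tuesday at 2pm")
except RuntimeError:
    pass

session = run_agent.last_session
event_types = [e["type"] for e in session.events]
print(event_types, session.status)
```

The goal check is recorded before the exception propagates, so the evidence is kept and the session still ends with a failed status.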
What You Can See
When no API key is configured, transport="auto" writes local JSONL. While learning, you can also pass transport="spool" to force local output. After a local or spooled run, list sessions:
promptetheus sessions
Example output:
01KVMZ4T7V2SN61ZWG1XTDBK47: 11 event(s)
Replay the timeline:
promptetheus replay 01KVMZ4T7V2SN61ZWG1XTDBK47
Example output:
[0] state_change name='session_started'
[1] tool_call tool_name='run_agent'
[2] user_message content='Book Tuesday at 2pm'
[3] tool_call tool_name='search_calendar'
[4] tool_result call_id='190a6438979141f5ac11b2e1b2ee29a0'
[5] state_change name='span_start'
[6] state_change name='span_end'
[7] agent_message content='Booked Wednesday 2pm'
[8] goal_check passed=False
[9] tool_result call_id='a78566297e0a4a309d5ce44cefe0d836'
[10] session_end status='completed'
Replay the run tree:
promptetheus replay 01KVMZ4T7V2SN61ZWG1XTDBK47 --tree
Example output:
[0] state_change name='session_started'
[1] tool_call tool_name='run_agent'
[2] user_message content='Book Tuesday at 2pm'
[3] tool_call tool_name='search_calendar'
[4] tool_result call_id='190a6438979141f5ac11b2e1b2ee29a0'
[7] agent_message content='Booked Wednesday 2pm'
[8] goal_check passed=False
[9] tool_result call_id='a78566297e0a4a309d5ce44cefe0d836'
[10] session_end status='completed'
choose-slot span=span_163a8380174647e98bfe1f3fff9e15b9 duration_ms=0.0
Generate a failure fingerprint:
promptetheus fingerprint 01KVMZ4T7V2SN61ZWG1XTDBK47
Example output:
8ae0f41220d0 goal mismatch: selected wednesday, not tuesday
- goal:selected wednesday, not tuesday
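The README does not say how the fingerprint is computed. As a rough intuition only, a fingerprint of this kind can be thought of as a short, stable hash over normalized failure signals, so that runs failing for the same reason group together. The sketch below assumes SHA-256 over lower-cased, whitespace-collapsed mismatch strings truncated to 12 hex characters; `toy_fingerprint` is a hypothetical helper and will not reproduce the value shown above.

```python
import hashlib


def toy_fingerprint(mismatches: list) -> str:
    """Hash normalized mismatch strings into a short, order-independent id."""
    normalized = sorted(" ".join(m.lower().split()) for m in mismatches)
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    return digest[:12]


same_a = toy_fingerprint(["selected Wednesday, not Tuesday"])
same_b = toy_fingerprint(["Selected  wednesday, not tuesday"])  # case/spacing differ
other = toy_fingerprint(["timezone warning visible"])
print(same_a == same_b, same_a == other)
```

Normalizing before hashing is the design choice that matters here: cosmetic differences in the message should not split one failure mode into several fingerprints.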
Inspect the local delivery spool:
promptetheus spool list
Example output:
Spool: .promptetheus/spool
pending : 11 event(s) across 1 session file(s), 4082 bytes
dead : 0 event(s) across 0 file(s), 0 bytes
01KVMZ4T7V2SN61ZWG1XTDBK47: 11 pending
The raw spool is JSONL. Each line is an event envelope:
{
  "type": "tool_call",
  "session_id": "01KVMZ4T7V2SN61ZWG1XTDBK47",
  "seq": 1,
  "idempotency_key": "01KVMZ4T7V2SN61ZWG1XTDBK47:29c5eff0:1",
  "payload": {
    "tool_name": "run_agent",
    "call_id": "a78566297e0a4a309d5ce44cefe0d836",
    "arguments": {
      "args": "('Book Tuesday at 2pm',)",
      "kwargs": "{}"
    }
  }
}
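Because the spool is plain JSONL, it can be inspected with nothing but the standard library. The sketch below assumes only the envelope fields shown above (`type`, `session_id`, `seq`); the helper name `summarize_spool_file` and the temporary file it reads are made up for the example, and the real spool directory layout may differ.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_spool_file(path: Path) -> dict:
    """Count events by type and check that seq values strictly increase."""
    counts = Counter()
    last_seq = -1
    in_order = True
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            counts[event["type"]] += 1
            if event["seq"] <= last_seq:
                in_order = False
            last_seq = event["seq"]
    return {"events": sum(counts.values()), "by_type": dict(counts), "in_order": in_order}


# Write a tiny sample file that mimics the envelope shape shown above.
sample = [
    {"type": "tool_call", "session_id": "demo", "seq": 0, "payload": {"tool_name": "run_agent"}},
    {"type": "user_message", "session_id": "demo", "seq": 1, "payload": {"content": "hi"}},
    {"type": "tool_call", "session_id": "demo", "seq": 2, "payload": {"tool_name": "search"}},
]
sample_path = Path(tempfile.mkdtemp()) / "demo.jsonl"
sample_path.write_text("\n".join(json.dumps(e) for e in sample) + "\n", encoding="utf-8")

summary = summarize_spool_file(sample_path)
print(summary)
```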
Manual Trace API
Use pt.trace.start(...) when you control the run boundary and want explicit event calls instead of decorators:
import promptetheus as pt

with pt.trace.start(
    agent="demo-agent",
    user_goal="Book a meeting for Tuesday",
    transport="auto",
) as session:
    session.user_message("Please book the small room for Tuesday at 2pm")
    session.tool_call("calendar.search", {"day": "Tuesday"}, call_id="calendar-1")
    session.tool_result("calendar-1", result={"available": ["2pm", "3pm"]})
    session.agent_message("Booking confirmed for Wednesday at 2pm")
    session.goal_check(False, mismatches=["booked Wednesday, not Tuesday"])

# session_end is emitted automatically; transport flush runs on exit
Public SDK API
The package exposes these primary entry points:
import promptetheus as pt

pt.trace.start(...)
pt.start(...)
pt.observe(...)
pt.tool
pt.traced(...)
pt.current()
pt.Session
pt.AsyncSession
pt.AgentRuntime
Common session helpers:
session.user_message("Book Tuesday at 2pm Pacific")
session.agent_message("I found availability")
session.tool_call("browser.click", {"selector": "#checkout"}, call_id="click-1")
session.tool_result("click-1", result={"ok": True})
session.retrieval("refund policy", documents=[{"id": "doc-1", "score": 0.91}])
session.browser_action("click", "#checkout", url=page.url)
session.dom_snapshot(page.url, visible_text, selected_values={"day": "Tuesday"})
session.screenshot(page.screenshot())
session.replay_artifact("trace.webm", artifact_type="screen_recording", event_time_map={})
session.llm_call("gpt-5", input_tokens=100, output_tokens=40, latency_ms=900)
session.score("goal_match", 0.2, comment="Selected the wrong day")
session.metric("steps", 12, unit="count")
session.error(RuntimeError("calendar API timeout"), handled=True)
session.goal_check(False, mismatches=["selected Wednesday"])
session.end("failed")
session.flush(timeout=2)
Every helper writes a schema-valid event envelope with type, session_id, timestamp, seq, idempotency_key, and payload. Use metadata for safe, low-cardinality context. Do not put raw secrets, cookies, tokens, or credentials into event payloads.
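One way to follow the "no secrets in payloads" rule is to scrub arguments before handing them to a helper such as `session.tool_call`. The function below is a minimal sketch, not part of the SDK: `scrub`, the `SENSITIVE_KEY` pattern, and the `[REDACTED]` placeholder are assumptions chosen for illustration, and a key-name filter like this will not catch secrets embedded in free-text values.

```python
import re

# Key names that usually indicate sensitive values (illustrative, not exhaustive).
SENSITIVE_KEY = re.compile(
    r"(secret|token|password|cookie|authorization|api[_-]?key|credential)",
    re.IGNORECASE,
)


def scrub(value):
    """Return a copy of value with sensitive-looking dict keys redacted."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if SENSITIVE_KEY.search(str(key)) else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [scrub(item) for item in value]
    return value


arguments = {
    "selector": "#checkout",
    "headers": {"Authorization": "Bearer abc123", "Accept": "text/html"},
    "api_key": "pt_live_123",
}
clean = scrub(arguments)
print(clean)
```

The scrubbed dictionary can then be passed as the arguments of a tool-call event, while the original stays with the code that actually needs the credentials.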
Async Agents
Use AsyncSession when the top-level agent run is async:
from promptetheus import AsyncSession

async with AsyncSession(agent="voice-agent", user_goal="Summarize the call") as session:
    session.user_message("Summarize this call")
    async with session.aspan("transcribe"):
        session.metric("audio_seconds", 42, unit="seconds")
    session.goal_check(True)
Browser Agents
Browser agents should record the user goal, critical browser actions, the final DOM state, and an explicit goal check:
session.browser_action("click", "#confirm", url=page.url)
session.dom_snapshot(
    page.url,
    visible_text=await page.locator("body").inner_text(),
    selected_values={"day": "Wednesday", "time": "2pm"},
    warnings=["Timezone changed from Pacific to Eastern"],
)
session.goal_check(
    False,
    mismatches=["booked Wednesday", "timezone warning visible"],
)
This is the path that lets Promptetheus replay a failure and produce fix-agent evidence instead of just storing generic logs.
Framework Adapters
Adapters are optional and imported lazily. Install only the extra you need:
pip install "promptetheus[openai]"
pip install "promptetheus[anthropic]"
pip install "promptetheus[langchain]"
pip install "promptetheus[playwright]"
Available adapter exports:
from promptetheus.adapters import (
    AnthropicAdapter,
    AutoGenAdapter,
    CrewAIAdapter,
    DSPyAdapter,
    HaystackAdapter,
    LangGraphAdapter,
    LiteLLMAdapter,
    LlamaIndexAdapter,
    OpenAIAdapter,
    OpenTelemetryBridge,
    PlaywrightAdapter,
    PromptetheusCallbackHandler,
    PydanticAIAdapter,
)
Use adapters when a framework already emits structured callbacks. Keep custom instrumentation close to the real run boundary when the framework does not.
Runtime Coordination
AgentRuntime is a best-effort client for live, service-backed coordination. It is separate from durable trace storage and never raises into host code when the service is unavailable:
from promptetheus import AgentRuntime

runtime = AgentRuntime(session.session_id)
runtime.remember("hypothesis", {"summary": "auth header may be missing"})
hint = runtime.before_tool_call("pytest", command="pytest tests/server")

result = run_tests()
runtime.after_tool_call(
    "pytest",
    command="pytest tests/server",
    status="failed" if result.failed else "succeeded",
    error=result.error,
)
runtime.heartbeat(phase="investigating", current_file="tests/server/test_mcp.py")
next_hint = runtime.next_hint()
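The "never raises into host code" guarantee is a common pattern for optional coordination services. A generic way to get that behaviour is a wrapper that catches every exception and returns a default instead. The sketch below illustrates the pattern only; `best_effort` and `fetch_hint` are hypothetical names, and nothing here is taken from the AgentRuntime source.

```python
import functools
import logging

logger = logging.getLogger("best_effort_demo")


def best_effort(default=None):
    """Decorator: swallow any exception from the call and return a default."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                # Log quietly; the host agent should keep running.
                logger.debug("coordination call %s failed: %s", fn.__name__, exc)
                return default
        return wrapper
    return decorator


@best_effort(default=None)
def fetch_hint(tool_name: str):
    """Stand-in for a network call to a coordination service that is down."""
    raise ConnectionError("coordination service unavailable")


hint = fetch_hint("pytest")
print(hint)
```

Host code written against this pattern should treat a missing hint as normal and continue without it.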
CLI Workflows
In a fresh install, local gateway and MCP commands need their extras:
pip install "promptetheus[server,mcp]"
promptetheus dev            # boot local FastAPI ingestion on :4318
promptetheus doctor         # config, reachability, spool summary
promptetheus spool list     # pending local delivery files
promptetheus spool replay   # retry pending delivery through the API
promptetheus sessions       # list locally spooled sessions
promptetheus replay         # print a flat timeline
promptetheus replay --tree
promptetheus diff
promptetheus fingerprint
promptetheus import exported-session.json
spool purge deletes local spool files. Use it only when you are sure the data is no longer needed.
MCP Evidence Access
Generate hosted MCP client config without mutating global client files:
promptetheus mcp install \
  --client codex \
  --workspace acme \
  --project-ref abcdefghijklmnopqrst
Supported clients are codex, claude, and cursor. The generated config uses a stdio bridge to hosted Promptetheus MCP and defaults to read-only, project-scoped Supabase evidence. SDK clients and MCP client config should not receive Supabase service-role keys.
For local stdio development:
promptetheus mcp
Developing In This Repo
The SDK lives under packages/promptetheus/promptetheus. Tests live at the repository root under tests.
Useful commands:
uv run --project packages/promptetheus --extra dev pytest tests/sdk -q
uv run --project packages/promptetheus --extra d
[truncated for AI cost control]