AATF – An open spec for recording why AI agents make decisions
AATF is an open specification and reference SDK for recording AI agent decision chains. It focuses on why an agent made each decision, including the alternatives it considered, its confidence levels, and the options it rejected. It aims to do for agent decision accountability what OpenTelemetry does for observability, making agents more transparent and trustworthy. The article covers the core format, quick start, comparisons with existing tools, integrations, project status, and contribution guidelines.
Every Agent decision, recorded. Every alternative, documented.
The open specification and reference SDK for recording AI Agent decision chains.
What Is This?
AATF is not another logging library. It's an open specification for recording why an AI Agent made each decision — including what alternatives it considered, how confident it was, and what it chose not to do.
Think of it as:
OpenTelemetry → for observability
AATF → for Agent decision accountability
User asks: "Book a flight to Shanghai"

Step 1: [human_input] → User request received
Step 2: [reasoning] → Intent: flight booking (confidence: 0.95)
    Alt: hotel booking → rejected (user said "flight")
    Alt: train booking → rejected (user said "flight")
Step 3: [tool_call] → flight_search_api (342ms) → 3 results
Step 4: [reasoning] → Decision: CA1234 at ¥2580 (confidence: 0.88)
    Alt: MU5678 at ¥2890 → rejected (¥310 more)
    Alt: CZ9012 at ¥3200 → rejected (over budget)

→ SHA-256 hash chain: ✓ tamper-evident
→ PII redaction: ✓ email, phone, card numbers
→ Export: JSON / CSV / HTML (AATF-compliant)
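The trace above is protected by a SHA-256 hash chain. This README doesn't spell out exactly what each hash covers, so the sketch below assumes a common construction: each step's hash is SHA-256 over the previous step's hash concatenated with the step's canonical JSON (sorted keys, compact separators, `step_hash` excluded), starting from a placeholder genesis value of 64 zeros. `chain_steps`, `canonical_json`, and `GENESIS_HASH` are hypothetical names, not part of the `agent_audit_trail` API.

```python
import hashlib
import json

# Assumption: the genesis value and hashing rule below are illustrative;
# the AATF spec may define them differently.
GENESIS_HASH = "0" * 64


def canonical_json(step: dict) -> str:
    """Serialize a step deterministically, leaving out its own step_hash."""
    body = {k: v for k, v in step.items() if k != "step_hash"}
    return json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def chain_steps(steps: list[dict]) -> list[dict]:
    """Return copies of the steps, each carrying a step_hash linked to its predecessor."""
    prev_hash = GENESIS_HASH
    chained = []
    for step in steps:
        payload = prev_hash + canonical_json(step)
        step_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        chained.append({**step, "step_hash": step_hash})
        prev_hash = step_hash
    return chained


steps = [
    {"type": "human_input", "name": "user_request", "content": "Book a flight to Shanghai"},
    {"type": "reasoning", "name": "intent_classification", "decision": {"confidence": 0.95}},
    {"type": "tool_call", "name": "flight_search_api", "duration_ms": 342},
]
for step in chain_steps(steps):
    print(step["type"], step["step_hash"][:12])
```

Because every hash folds in its predecessor, editing any earlier step changes every hash after it.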
Quick Start
from agent_audit_trail import AuditSession, Decision, Alternative

with AuditSession(agent_id="my-agent") as session:
    session.add_reasoning_step(
        name="choose_tool",
        decision=Decision(
            input_summary="User wants weather info",
            decision="Use weather API",
            reasoning="Factual query requiring real-time data",
            confidence=0.95,
            alternatives_considered=[
                Alternative(description="Answer from memory", reason_rejected="Weather changes constantly"),
                Alternative(description="Ask for clarification", reason_rejected="Query is clear enough"),
            ]
        )
    )
That's it. Every decision is now recorded with its reasoning, confidence score, and rejected alternatives — in AATF-compliant format.
The AATF Format
The heart of AATF is the Decision record:
{
  "type": "reasoning",
  "name": "intent_classification",
  "decision": {
    "input_summary": "User wants to book a flight to Shanghai",
    "decision": "Classified as flight-booking intent",
    "reasoning": "Explicit keywords: 'flight' + destination + budget",
    "confidence": 0.95,
    "confidence_basis": "All three slots explicitly stated by user",
    "alternatives_considered": [
      {
        "description": "Hotel booking intent",
        "reason_rejected": "User said 'flight', not 'hotel'",
        "score": 0.05
      },
      {
        "description": "Train booking intent",
        "reason_rejected": "User explicitly said 'flight'",
        "score": 0.02
      }
    ]
  },
  "step_hash": "458942bbf4162f4d9cca121d93b9423413ec..."
}
Three things no other format captures:
| Feature | What It Does | Why It Matters |
|---|---|---|
| alternatives_considered | Forces agents to list what they didn't choose | Proves the agent didn't just rationalize a foregone conclusion |
| confidence + confidence_basis | Numeric confidence + how it was determined | Lets auditors distinguish "95% sure because X" from "95% sure because vibes" |
| confidence_trajectory | Tracks confidence across the full decision chain | Reveals when an agent becomes more or less certain as it gathers information |
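The example record above has no `confidence_trajectory` field, so one plausible reading, assumed here, is that the trajectory is derived from the per-step `confidence` values in chain order. The sketch below extracts that sequence and the change between consecutive decisions, using the two confidences from the flight-booking trace. The step name `choose_flight` and both helper functions are hypothetical.

```python
def confidence_trajectory(steps: list[dict]) -> list[tuple[str, float]]:
    """Collect (step name, confidence) for each step that carries a decision, in order."""
    return [
        (step["name"], step["decision"]["confidence"])
        for step in steps
        if "confidence" in step.get("decision", {})
    ]


def confidence_deltas(trajectory: list[tuple[str, float]]) -> list[float]:
    """Change in confidence between consecutive decisions (negative means less certain)."""
    return [round(cur - prev, 4) for (_, prev), (_, cur) in zip(trajectory, trajectory[1:])]


steps = [
    {"type": "human_input", "name": "user_request"},
    {"type": "reasoning", "name": "intent_classification", "decision": {"confidence": 0.95}},
    {"type": "tool_call", "name": "flight_search_api"},
    {"type": "reasoning", "name": "choose_flight", "decision": {"confidence": 0.88}},
]
trajectory = confidence_trajectory(steps)
print(trajectory)                     # [('intent_classification', 0.95), ('choose_flight', 0.88)]
print(confidence_deltas(trajectory))  # [-0.07]
```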
Why Not Existing Tools?
We respect the existing ecosystem. Here's where AATF fits:
| Tool | What It Does | What AATF Does Differently |
|---|---|---|
| Blockchain ledgers (Notary, Action Ledger) | Store agent actions on-chain for immutability | We're storage-agnostic: store records wherever you want. We focus on what to record, not where. |
| LangChain callbacks | Framework-specific tracing | We're framework-agnostic: works with CrewAI, AutoGen, raw Python, or anything else. |
| MCP audit tools | Audit tool calls made over MCP | We go deeper: not just which tool was called, but why it was chosen over the alternatives. |
| General logging (structlog, etc.) | Key-value event logs | We're structured for decision reasoning, not generic events. |
TL;DR: Other tools audit what the agent did. AATF audits why the agent did it.
Integrations
LangChain
from agent_audit_trail.integrations.langchain import AATFCallbackHandler

agent = create_agent(callbacks=[AATFCallbackHandler()])
OpenAI
from openai import OpenAI
from agent_audit_trail.integrations.openai import AATFOpenAIWrapper

client = AATFOpenAIWrapper(OpenAI())
Generic decorator (any framework)
from agent_audit_trail import audit_traced

@audit_traced(agent_id="my-agent")
def my_agent_function(query):
    return "answer"
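For readers curious how a decorator like `audit_traced` can capture calls without framework hooks, here is a minimal standard-library sketch. It is not the SDK's implementation: `audit_traced_sketch`, the in-memory `TRAIL` list, the `tool_call` step type, and the recorded field names are all assumptions chosen for illustration.

```python
import functools
import time
from typing import Any, Callable

TRAIL: list[dict] = []  # hypothetical in-memory sink; the SDK's storage will differ


def audit_traced_sketch(agent_id: str) -> Callable:
    """Decorator factory that records each call's inputs, output or error, and duration."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            entry = {
                "agent_id": agent_id,
                "type": "tool_call",  # step type chosen for illustration
                "name": func.__name__,
                "input": {"args": list(args), "kwargs": dict(kwargs)},
            }
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                entry["output"] = result
                return result
            except Exception as exc:
                entry["error"] = repr(exc)  # record the failure, then re-raise it
                raise
            finally:
                entry["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
                TRAIL.append(entry)
        return wrapper
    return decorator


@audit_traced_sketch(agent_id="my-agent")
def my_agent_function(query):
    return "answer"


print(my_agent_function("weather in Shanghai?"))
print(TRAIL[-1]["name"], TRAIL[-1]["output"])
```

Recording in `finally` means failed calls are logged too, with an `error` field instead of `output`. A decorator like this captures what was called, not why; the reasoning still has to come from explicit decision records.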
Installation
pip install agent-audit-trail
Zero external dependencies: 700 lines of Python using only the standard library. Requires Python 3.10+.
Real Self-Audit Example
We used AATF to audit ourselves: an AI Agent reflecting on its own product's flaws. The result is a tamper-evident, 10KB audit trail whose hash chain makes any after-the-fact edit to a reasoning step detectable, so the reasoning could not be quietly rewritten post hoc.
📄 View the full audit trail JSON
The Specification
AATF is an open specification, not a product. The SDK is the reference implementation.
📋 Read the full AATF v0.1.0 Specification
This is a draft spec. We want your feedback. Open an issue if you disagree with any design decision. Especially:
Should alternatives_considered be mandatory or optional?
Is confidence (0.0-1.0) the right abstraction, or should we use qualitative labels?
What hash algorithm should be standard? (Currently SHA-256)
Should the format support streaming/traces that are still in-progress?
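To make the first two questions concrete, the sketch below validates a decision record with a `require_alternatives` switch, so both answers to the mandatory-vs-optional question can be tried against real traces. The required field set (taken from the Quick Start fields) and the [0.0, 1.0] range for alternative `score` values are assumptions drawn from the examples in this README, not rules quoted from SPEC.md; `validate_decision` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("input_summary", "decision", "reasoning", "confidence")  # assumed required set


def in_unit_interval(value) -> bool:
    """True for real numbers in [0.0, 1.0]; rejects bools, which are ints in Python."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0.0 <= value <= 1.0


def validate_decision(decision: dict, require_alternatives: bool = True) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in decision]
    if "confidence" in decision and not in_unit_interval(decision["confidence"]):
        errors.append("confidence must be a number in [0.0, 1.0]")
    alternatives = decision.get("alternatives_considered", [])
    if require_alternatives and not alternatives:
        errors.append("alternatives_considered must list at least one alternative")
    for i, alt in enumerate(alternatives):
        for key in ("description", "reason_rejected"):
            if key not in alt:
                errors.append(f"alternatives_considered[{i}] missing {key}")
        if "score" in alt and not in_unit_interval(alt["score"]):
            errors.append(f"alternatives_considered[{i}].score must be in [0.0, 1.0]")
    return errors


good = {
    "input_summary": "User wants weather info",
    "decision": "Use weather API",
    "reasoning": "Factual query requiring real-time data",
    "confidence": 0.95,
    "alternatives_considered": [
        {"description": "Answer from memory", "reason_rejected": "Weather changes constantly"},
    ],
}
no_alts = {k: v for k, v in good.items() if k != "alternatives_considered"}

print(validate_decision(good))
print(validate_decision(no_alts))
print(validate_decision(no_alts, require_alternatives=False))
```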
Who Is This For?
| Role | What You Get |
|---|---|
| Agent Developer | Prove your agent reasons well. Debug decision failures. Show stakeholders the full chain. |
| Compliance Officer | Machine-parseable audit trails that map to EU AI Act, GDPR, and SOC 2 requirements. |
| CISO | Tamper-evident hash chains. Built-in PII redaction. Export for auditors. |
| Researcher | Structured data on agent reasoning patterns: confidence trajectories, decision trees. |
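Auditors often want a flat table rather than nested JSON. As an illustration of what a CSV export might look like, the sketch below flattens each step into one row and joins the rejected alternatives into a single cell, using steps 3 and 4 of the flight-booking trace. The column layout, the step name `choose_flight`, and `export_csv` are assumptions; the SDK's actual CSV format may differ.

```python
import csv
import io

COLUMNS = ["step", "type", "name", "decision", "confidence", "rejected_alternatives", "step_hash"]


def export_csv(steps: list[dict]) -> str:
    """Flatten each step to one CSV row; rejected alternatives share one cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    for index, step in enumerate(steps, start=1):
        decision = step.get("decision", {})
        rejected = "; ".join(
            f"{alt['description']} ({alt['reason_rejected']})"
            for alt in decision.get("alternatives_considered", [])
        )
        writer.writerow([
            index,
            step.get("type", ""),
            step.get("name", ""),
            decision.get("decision", ""),
            decision.get("confidence", ""),
            rejected,
            step.get("step_hash", ""),
        ])
    return buffer.getvalue()


steps = [
    {"type": "tool_call", "name": "flight_search_api"},
    {
        "type": "reasoning",
        "name": "choose_flight",
        "decision": {
            "decision": "CA1234 at ¥2580",
            "confidence": 0.88,
            "alternatives_considered": [
                {"description": "MU5678 at ¥2890", "reason_rejected": "¥310 more"},
                {"description": "CZ9012 at ¥3200", "reason_rejected": "over budget"},
            ],
        },
    },
]
print(export_csv(steps))
```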
Project Status
✅ AATF Specification v0.1.0
✅ Reference SDK (Python) — 134 tests passing
✅ PII Redaction (email, phone)
✅ Hash Chain Integrity Verification
✅ LangChain / OpenAI / Generic Integrations
✅ JSON / CSV / HTML Export
🔲 PII Redaction expansion (credit card, SSN, API keys, IP)
🔲 TypeScript/JavaScript SDK
🔲 Community RFC process for spec changes
🔲 Published LangChain / CrewAI plugins
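Email and phone redaction are the parts marked done above. A minimal regex-based sketch of that idea follows. The patterns are deliberately simple assumptions, not the SDK's, and they will miss some formats while over-matching others (for example, a date such as `2024-01-15` matches the phone pattern). `redact_text` and `redact` are hypothetical helpers.

```python
import re

# Deliberately simple patterns; real-world email and phone formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_text(text: str) -> str:
    """Replace emails first, then phone-like digit runs, with placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)


def redact(value):
    """Recursively redact every string inside a step (dicts, lists, scalars)."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


step = {
    "type": "human_input",
    "name": "user_request",
    "content": "Book a flight to Shanghai, confirm to alice@example.com or +1 (555) 123-4567",
}
print(redact(step)["content"])
# Book a flight to Shanghai, confirm to [REDACTED_EMAIL] or [REDACTED_PHONE]
```

One ordering worth considering is to redact before hashing, so the hash chain covers exactly what is stored.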
Contributing
This project wants contributors. If you care about Agent accountability:
Read the SPEC: understand the format
Open an issue: disagree with something? We want to hear it
Build an integration: using a framework we don't cover yet? Your plugin is welcome
Spread the word: star, tweet, blog post
License
MIT. Use it, fork it, improve it. The spec belongs to everyone.
If your Agent can think, its thinking should be auditable.
pip install agent-audit-trail