Feedback on a runtime-agnostic AI agent workflow spec (LangGraph/Mastra)

itsharness is a complete harness for building, running, and observing AI agent workflows. It offers a visual canvas to design flows, exports a runtime-agnostic spec, compiles to various frameworks, and supports running, tracing, and debugging. The spec is at version 0.2.0 with 14 node types and 5 example flows.

Key points

  • itsharness provides a visual canvas for designing AI agent workflows and exports a runtime-agnostic JSON spec.
  • Adapters compile the spec to frameworks like LangGraph, CrewAI, Mastra, and Microsoft Agent Framework.
  • Features 14 node types including LLM call, tool invoke, condition, parallel, HITL breakpoint, memory, and more.
  • Currently in Phase 1 with a complete canvas; the LangGraph adapter is a stub awaiting RFC feedback.

A complete harness for building, running, and observing AI agent workflows.

Design flows on a visual canvas → export a runtime-agnostic spec → compile to your framework → run, trace, and debug — all from one tool.

flow.json  →  [ langgraph adapter ]                  →  Python / LangGraph
           →  [ crewai adapter ]                     →  Python / CrewAI
           →  [ mastra adapter ]                     →  TypeScript / Mastra
           →  [ microsoft_agent_framework adapter ]  →  C# + Python / MS Agent Framework
           →  [ A2A protocol ]                       →  any A2A-compatible runtime

What it is

Most agent tooling is either high-level (too much magic, hard to debug) or low-level (too much boilerplate, slow to iterate). itsharness sits in the middle:

Draw — 14 node types on a visual canvas. Every spec field is directly editable.

Own the spec — the canvas emits a versioned, runtime-agnostic JSON spec you control and can version alongside your code.

Compile — one API call transforms the spec into runnable code for whichever framework you use.

Run and observe — execution overlays, token streaming, Langfuse telemetry, HITL pause/resume.

Compose — deployed flows expose themselves as REST, MCP tools, and A2A agents simultaneously. External A2A agents (Google ADK, OpenAI Agents SDK, Claude Agent SDK) are invocable as canvas nodes without writing new adapters.

The spec is the contract. The canvas is the editor. The adapters are the compilers.

Repository structure

itsharness/
│
├── spec/                         ← @itsharness/flow-spec (published npm package)
│   ├── schema.ts                 Canonical Zod schema (source of truth)
│   ├── schema.json               Derived JSON Schema (use for non-TS validation)
│   ├── CHANGELOG.md              Version history
│   └── package.json              {"name": "@itsharness/flow-spec", "version": "0.2.0"}
│
├── flows/                        ← Reference example flows (JSON)
│   ├── 01-rag-agent-flow.json
│   ├── 02-content-moderation-hitl-flow.json
│   ├── 03-parallel-risk-assessment-flow.json
│   ├── 04-research-crew-flow.json
│   └── 05-debate-agent-a2a-flow.json
│
├── src/                          ← Canvas app (React + TypeScript + XYFlow)
│   ├── spec/
│   │   ├── schema.ts             Canvas copy, kept in sync with spec/schema.ts
│   │   ├── validation.ts         Cross-ref rules (edge targets, store IDs, agent refs)
│   │   ├── examples.ts           5 example flows as TS constants (for sidebar)
│   │   └── schema.test.ts        Vitest suite, validates all 5 flows
│   ├── store/
│   │   ├── index.ts              Zustand canvas store (persisted)
│   │   └── library.ts            Flow library store (persisted)
│   ├── canvas/
│   │   ├── Canvas.tsx            ReactFlow wrapper
│   │   ├── nodes/                14 node visual components + registry
│   │   └── edges/                DirectEdge, ConditionalEdge
│   └── components/
│       ├── Toolbar.tsx           Top bar: undo/redo, auto-layout, validate, export
│       ├── Sidebar.tsx           Node palette + registry shortcuts + My Flows
│       ├── ConfigPanel.tsx       Per-node config panels (all 14 types)
│       ├── EdgeConfigPanel.tsx   Edge label + context_from editor
│       ├── FlowSettingsModal.tsx 6-tab flow-level settings
│       ├── FlowLibraryPanel.tsx  Library management
│       ├── ImportDialog.tsx      File import with inline validation errors
│       └── ProblemsPanel.tsx     Validation error list
│
├── adapter/                      ← LangGraph Python sidecar (FastAPI)
│   ├── main.py                   /health + /compile stub (codegen after RFC)
│   └── requirements.txt
│
├── docker-compose.yml            ← One command: canvas + adapter
├── CONTRIBUTING.md               ← Contribution process
└── LICENSE                       ← Apache 2.0

spec/schema.ts vs src/spec/schema.ts

spec/schema.ts is the canonical schema published as @itsharness/flow-spec. The canvas uses its own copy at src/spec/schema.ts, which is functionally identical but omits .refine() on individual node types (Zod's z.discriminatedUnion() requires bare ZodObject members). When the spec changes, update both files and run npm test to confirm all 5 example flows still validate.

The spec — @itsharness/flow-spec

The spec is a runtime-agnostic JSON format. You describe a workflow once; adapters translate it to runnable code for whichever framework you target.

Current version: 0.2.0 · RFC: open — see CONTRIBUTING.md

The 14 node types

| Node | What it does | Runtime support |
|---|---|---|
| input | Flow entry point; declares output schema | All |
| output | Flow exit point; optional exit code | All |
| llm_call | Single LLM invocation: structured output, validator, streaming | All |
| tool_invoke | Calls a named tool from the flow's tools registry | All |
| condition | Branching: JSONPath expression or fn_ref | All |
| parallel_fork | Fan-out to N concurrent branches | All |
| parallel_join | Fan-in with a configurable reducer: merge, append, fn_ref | All |
| hitl_breakpoint | Suspend execution; wait for a typed human resume payload | All (adapter variation) |
| memory_read | Read from a named store: key-value or semantic (vector) | All |
| memory_write | Write to a named store: upsert or overwrite | All |
| subgraph | Embed another flow as a node | LG/MA: full · CR/MS: partial |
| transform | State transformation: mapping (no-code) or fn_ref | All |
| agent_role | Execute an agent persona from the agents[] registry | CR: native · LG/MA/MS: synthesised |
| agent_debate | Multi-agent conversation loop with termination condition | MS: native GroupChat · others: synthesised |
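
The "Runtime support" column can be turned into a quick pre-flight check before picking an adapter. The following is a minimal sketch, not part of itsharness: the names SUPPORT_OVERRIDES, support_level, and support_report are hypothetical, the abbreviations are assumed to mean LG = LangGraph, CR = CrewAI, MA = Mastra, MS = Microsoft Agent Framework, and "All (adapter variation)" is treated the same as "All".

```python
# Non-"All" entries from the Runtime support column above.
# Assumed abbreviations: LG = LangGraph, CR = CrewAI, MA = Mastra,
# MS = Microsoft Agent Framework.
SUPPORT_OVERRIDES = {
    "subgraph": {"LG": "full", "MA": "full", "CR": "partial", "MS": "partial"},
    "agent_role": {
        "CR": "native", "LG": "synthesised", "MA": "synthesised", "MS": "synthesised",
    },
    "agent_debate": {
        "MS": "native", "LG": "synthesised", "CR": "synthesised", "MA": "synthesised",
    },
}


def support_level(node_type: str, runtime: str) -> str:
    """Return the support level of a node type on a runtime ("all" by default)."""
    return SUPPORT_OVERRIDES.get(node_type, {}).get(runtime, "all")


def support_report(flow: dict, runtime: str) -> list[tuple[str, str]]:
    """List (node id, level) for nodes that are only partial or synthesised."""
    report = []
    for node in flow.get("nodes", []):
        level = support_level(node.get("type", ""), runtime)
        if level in ("partial", "synthesised"):
            report.append((node["id"], level))
    return report


example_flow = {
    "nodes": [
        {"id": "draft", "type": "llm_call"},
        {"id": "reviewer", "type": "agent_role"},
        {"id": "nested", "type": "subgraph"},
    ]
}
print(support_report(example_flow, "LG"))  # [('reviewer', 'synthesised')]
print(support_report(example_flow, "CR"))  # [('nested', 'partial')]
```

A report like this only restates the table; the adapters themselves decide what "partial" and "synthesised" mean in practice.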

Validating a flow

JSON Schema (any language)

npx ajv-cli validate -s spec/schema.json -d flows/01-rag-agent-flow.json

Python

python3 -c "
import json, jsonschema
schema = json.load(open('spec/schema.json'))
flow = json.load(open('flows/01-rag-agent-flow.json'))
jsonschema.validate(flow, schema)
print('valid')
"

TypeScript (Zod)

import { parseFlowSpec } from './spec/schema'
import flow from './flows/01-rag-agent-flow.json'

const result = parseFlowSpec(flow)
if (!result.success) console.error(result.error.issues)

A minimal flow

{
  "spec_version": "0.2.0",
  "id": "hello-flow",
  "runtime_hints": { "preferred_adapter": "langgraph" },
  "state_schema": {
    "type": "object",
    "properties": {
      "question": { "type": "string" },
      "answer": { "type": "string" }
    }
  },
  "nodes": [
    {
      "id": "start",
      "type": "input",
      "output_schema": {
        "type": "object",
        "properties": { "question": { "type": "string" } }
      }
    },
    {
      "id": "answer",
      "type": "llm_call",
      "prompt_template": "Answer this: {{$.state.question}}",
      "output_key": "answer"
    },
    { "id": "done", "type": "output" }
  ],
  "edges": [
    { "type": "direct", "from": "start", "to": "answer" },
    { "type": "direct", "from": "answer", "to": "done" }
  ]
}
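
To make the minimal flow concrete, here is a small Python sketch that follows its direct edges from the input node and fills in the prompt_template. The helper names linear_order and render_prompt are hypothetical, and the placeholder handling is an assumption: it treats {{$.state.<key>}} as a lookup of a top-level state key, which may be narrower than what the spec actually allows.

```python
import re

# Nodes and edges of the minimal flow (state_schema and runtime_hints omitted).
FLOW = {
    "nodes": [
        {"id": "start", "type": "input"},
        {
            "id": "answer",
            "type": "llm_call",
            "prompt_template": "Answer this: {{$.state.question}}",
            "output_key": "answer",
        },
        {"id": "done", "type": "output"},
    ],
    "edges": [
        {"type": "direct", "from": "start", "to": "answer"},
        {"type": "direct", "from": "answer", "to": "done"},
    ],
}


def linear_order(flow: dict) -> list[str]:
    """Follow direct edges from the input node; assumes one linear chain."""
    next_of = {e["from"]: e["to"] for e in flow["edges"] if e["type"] == "direct"}
    current = next(n["id"] for n in flow["nodes"] if n["type"] == "input")
    order = [current]
    while current in next_of:
        current = next_of[current]
        if current in order:
            raise ValueError(f"cycle detected at node {current!r}")
        order.append(current)
    return order


# Assumed placeholder form: {{$.state.<top-level key>}}
PLACEHOLDER = re.compile(r"\{\{\s*\$\.state\.(\w+)\s*\}\}")


def render_prompt(template: str, state: dict) -> str:
    """Replace each placeholder with the matching value from state."""
    return PLACEHOLDER.sub(lambda m: str(state[m.group(1)]), template)


print(linear_order(FLOW))  # ['start', 'answer', 'done']
print(render_prompt(FLOW["nodes"][1]["prompt_template"], {"question": "What is A2A?"}))
# Answer this: What is A2A?
```

In a real run, the rendered prompt would go to the model and the reply would be stored in state under the node's output_key.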

Example flows

Five reference flows, each valid against spec/schema.json. Together they cover all four target adapters:

| Flow | Adapter | Exercises |
|---|---|---|
| 01: RAG Agent | LangGraph | memory_read semantic, transform fn_ref, vector + kv stores, streaming |
| 02: Content Moderation + HITL | Mastra | llm_call structured output, condition, hitl_breakpoint + resume schema |
| 03: Parallel Risk Assessment | CrewAI | parallel_fork/join, agent_role ×3, memory_access: "isolated" |
| 04: Research Crew | CrewAI | context_from on edges, memory_access: "shared", tool_approval: "human" |
| 05: Debate Agent + A2A | MS Agent Framework | agent_debate, runtime_support overrides, full a2a_config |

The canvas — Phase 1

A visual editor for the spec. Draw a flow, configure every field, validate, and export — the canvas emits clean spec JSON at all times.

Running locally

npm install
npm run dev   # → http://localhost:3000
npm test      # 17 tests: all 5 example flows + cross-ref error cases

With Docker (canvas + Python adapter sidecar together):

docker compose up

canvas → http://localhost:3000

adapter → http://localhost:8000/health

What's built

Canvas

All 14 node types with per-type config panels — every spec field editable

Drag-to-add from the node palette; drag-to-connect between handles

Click any edge to edit label and context_from (CrewAI Task.context)

Auto-layout (dagre LR), undo/redo (50 steps), keyboard shortcuts (Delete, Escape, Ctrl+Z)

Runtime compatibility badges per node (LG / CR / MA / MS)

Flow settings (⚙ button)

6-tab modal: flow identity, state schema editor, memory stores registry, tools registry, agents registry, flow_config (checkpoint / streaming / telemetry / A2A)

Spec validation

Zod validation on every canvas change — errors shown inline

Cross-ref validation: edge targets, store IDs, agent refs

Problems panel listing all errors with clickable links to offending nodes

Import dialog with per-error display and "load anyway" path for warnings-only
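
The cross-ref pass described above catches references that are well-formed JSON but point at nothing. A rough Python sketch of the idea follows. It is not the canvas's validation.ts: the edge keys from and to come from the minimal flow, while memory_stores and store are hypothetical key names chosen for illustration, and the real spec may name them differently.

```python
def cross_ref_errors(flow: dict) -> list[str]:
    """Return human-readable errors for dangling references in a flow."""
    errors = []
    node_ids = {node["id"] for node in flow.get("nodes", [])}

    # Every edge endpoint must name an existing node.
    for i, edge in enumerate(flow.get("edges", [])):
        for end in ("from", "to"):
            if edge.get(end) not in node_ids:
                errors.append(f"edges[{i}].{end}: unknown node {edge.get(end)!r}")

    # Hypothetical key names: "memory_stores" registry, "store" on memory nodes.
    store_ids = {store["id"] for store in flow.get("memory_stores", [])}
    for node in flow.get("nodes", []):
        if node.get("type") in ("memory_read", "memory_write"):
            if node.get("store") not in store_ids:
                errors.append(
                    f"nodes[{node['id']}].store: unknown store {node.get('store')!r}"
                )
    return errors


broken_flow = {
    "memory_stores": [{"id": "docs"}],
    "nodes": [
        {"id": "start", "type": "input"},
        {"id": "read", "type": "memory_read", "store": "kv"},
    ],
    "edges": [
        {"type": "direct", "from": "start", "to": "read"},
        {"type": "direct", "from": "read", "to": "missing"},
    ],
}
for problem in cross_ref_errors(broken_flow):
    print(problem)
# edges[1].to: unknown node 'missing'
# nodes[read].store: unknown store 'kv'
```

Agent references could be checked the same way, by comparing each agent_role node against the ids in the agents[] registry.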

Persistence

Auto-save to localStorage:itsharness:current on every change — survives page refresh

Flow library (localStorage:itsharness:library): save, load, rename, delete named snapshots

Dirty indicator — amber dot when unsaved library changes exist

Export

Export spec JSON (download as {id}.json)

Copy spec to clipboard

POST http://localhost:8000/compile — spec JSON → compiled code (stub in Phase 1)

The adapter — Phase 1 stub

The FastAPI sidecar at adapter/main.py accepts a FlowSpec JSON and will return compiled Python. In Phase 1 it returns a stub:

curl -s http://localhost:8000/health

{"status":"ok","adapter":"langgraph","phase":"1-stub"}

curl -s -X POST http://localhost:8000/compile \
  -H "Content-Type: application/json" \
  -d @flows/01-rag-agent-flow.json | jq .runtime

"langgraph"

Real codegen is gated on the RFC closing. The spec's field names — output_key, query_expr, context_from semantics, resume_schema — are the open questions most likely to attract feedback, and they're the ones the adapter hardcodes. Waiting avoids rewriting 300+ lines after feedback.
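
The curl call above can also be made from Python using only the standard library. This is a minimal client sketch under the same assumptions as the curl example (sidecar on localhost:8000, JSON response containing a runtime key); build_compile_request and compile_flow are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request

ADAPTER_URL = "http://localhost:8000"  # sidecar address used in the curl examples


def build_compile_request(spec: dict, base_url: str = ADAPTER_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /compile request for a flow spec."""
    return urllib.request.Request(
        url=f"{base_url}/compile",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compile_flow(spec: dict, base_url: str = ADAPTER_URL, timeout: float = 10.0) -> dict:
    """Send the spec to the adapter and return its JSON response as a dict."""
    request = build_compile_request(spec, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage, with the sidecar running:
#   with open("flows/01-rag-agent-flow.json") as f:
#       print(compile_flow(json.load(f))["runtime"])
```

Keeping request construction separate from sending makes the client easy to check without a running adapter.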

Adapter order and rationale

| # | Runtime | Phase | Key spec mappings |
|---|---|---|---|
| 1 | LangGraph · Python · MIT | Phase 1 | node→fn, edge→add_edge, condition→add_conditional_edges+router, hitl→interrupt()+update_state(), parallel→Send(), agent_role→named subgraph, agent_debate→conditional loop |
| 2 | CrewAI · Python · MIT | Phase 3 | agents[]→Agent(role,backstory,goal), agent_role node→Task(agent=...), context_from edge→Task.context=[], process_type→Crew(process=), agent_debate→consensual process, parallel→async_execution=True |
| 3 | Mastra · TypeScript · Apache 2 | Phase 3 | nodes→createStep(), edges→.then()/.branch()/.parallel(), agent_role→createAgent(), hitl→suspend()/resume(), state_schema→Zod schema, context_from→step input mapping |
| 4 | Microsoft Agent Framework · C# + Python · MIT | Phase 4 | agent_debate→GroupChat+GroupChatManager, agent_role→AssistantAgent, nodes→KernelProcessStep, edges→KernelProcessEvent, hitl→human_input_mode=ALWAYS, context_from→step input injection |
| ~ | A2A Protocol | Phase 2 | Not codegen: an invocation + exposure layer. a2a_config→AgentCard, hitl→task state input-required, streaming→TaskArtifactUpdateEvent. Replaces custom adapters for Google ADK, OpenAI Agents SDK, Claude Agent SDK, and any future A2A-compatible runtime. |
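
To illustrate the first two LangGraph mappings in the table (node→fn and edge→add_edge), the sketch below emits wiring lines as text for a flow with only direct edges. It is an illustration under stated assumptions, not the project's codegen, which is still a stub. The helper names node_name and emit_wiring are hypothetical, and so are the choices to connect the input node to START and the output node to END. The emitted lines assume that graph, START, END, and the node functions are defined elsewhere in the generated module.

```python
import re


def node_name(node_id: str) -> str:
    """Python-safe name; the prefix keeps node names distinct from state keys."""
    return "node_" + re.sub(r"\W", "_", node_id)


def emit_wiring(flow: dict) -> list[str]:
    """Emit add_node / add_edge lines for a flow that uses direct edges."""
    lines = []
    for node in flow["nodes"]:
        name = node_name(node["id"])
        lines.append(f'graph.add_node("{name}", {name})')
        # Assumption: input nodes hang off START, output nodes lead to END.
        if node["type"] == "input":
            lines.append(f'graph.add_edge(START, "{name}")')
        elif node["type"] == "output":
            lines.append(f'graph.add_edge("{name}", END)')
    for edge in flow["edges"]:
        src, dst = node_name(edge["from"]), node_name(edge["to"])
        if edge["type"] == "direct":
            lines.append(f'graph.add_edge("{src}", "{dst}")')
        else:
            # Other edge types need a router; left as a marker in this sketch.
            lines.append(f"# TODO: {edge['type']} edge {src} -> {dst}")
    return lines


hello_flow = {
    "nodes": [
        {"id": "start", "type": "input"},
        {"id": "answer", "type": "llm_call"},
        {"id": "done", "type": "output"},
    ],
    "edges": [
        {"type": "direct", "from": "start", "to": "answer"},
        {"type": "direct", "from": "answer", "to": "done"},
    ],
}
print("\n".join(emit_wiring(hello_flow)))
```

The node_ prefix matters for the minimal flow, where a node id (answer) is also a state key; emitting the id unchanged would give the node and the state key the same name.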

On A2A scope: Custom adapters are written for only 4 runtimes (LangGraph, CrewAI, Mastra, MS Agent Framework), the ones where users want to author flows visually and export runnable code. For every other runtime, A2A gives invocation-level interoperability without a custom adapter. This permanently bounds adapter build work to 4 runtimes.

Roadmap

| Phase | Scope | Status |
|---|---|---|
| 0: Spec design | Primitive extraction · concept map · node taxonomy · spec schema v0.2 · 5 example flows · RFC | ✅ Complete |
| 1: Canvas + LangGraph adapter | XYFlow canvas · 14 node components · spec validation · persistence · library · LangGraph adapter (Python sidecar) | 🟡 Canvas complete · adapter awaiting RFC |
| 2: Observability + HITL + deploy + A2A | Langfuse integration · live execution overlay · HITL pause/resume UI · LiteLLM gateway · flow versioning · A2A protocol la

[truncated for AI cost control]