2026-06-02 12:22 UTC | In-site rewrite | 6 min read | Updated: 2026-06-30 13:03 UTC

Dumb core, smart edge for AI agents

Many production agentic systems fail due to concentrated intelligence in a hard-to-test orchestrator. This article introduces the 'Dumb Core, Smart Edge' principle: orchestrators should be stateless state machines, while domain intelligence resides in replaceable specialist nodes. This design improves testability, cost efficiency, and replaceability, reducing coupling and enabling scientific improvement through causal engineering.

Source: Hacker News AI | Author: arizen


Many agentic systems I've watched fail in production had the same shape: intelligence concentrated at the center, where it was hard to test, replace, or reason about. The orchestrator was doing too much — routing requests, holding domain knowledge, managing memory, selecting tools, and shaping outputs, all at once. When something broke, you couldn't isolate it. When requirements changed, you couldn't surgically update it. The whole thing had to move.

This is not a model quality problem. The teams building these systems weren't using inferior LLMs. They were running into an architectural error that recurs often enough to deserve a name, and that has a clear counter-principle.

The principle is this: Dumb Core, Smart Edge. The orchestrator — the central node that sequences work — should be nearly stateless, encoding only control flow. The intelligence belongs at the periphery, in specialist nodes that own their domain boundary and can be reasoned about, tested, and replaced in isolation. This is not just a stylistic preference. It is a useful topology when the system has multiple domains, tools, or responsibilities that need to evolve independently.

TL;DR — Key Takeaways:

A "smart core" orchestrator that handles routing, domain logic, memory, and tool selection in one prompt is a common architecture failure in production agentic systems.

The Dumb Core, Smart Edge principle: orchestrators encode state transitions; specialists own domain intelligence within explicit boundaries.

The economic advantage comes from narrower context: specialists receive the slice needed for their task instead of the full system prompt and every tool schema.

The replaceability test: you should be able to swap any specialist implementation without modifying the orchestrator or adjacent nodes.

Every specialist edge should emit a typed Pydantic contract — the orchestrator consumes the type, never raw text.

LangGraph conditional edges map directly to this pattern: typed state carries routing metadata, nodes carry intelligence.

Latency is a real tradeoff. Measure it against the gains in testability, replaceability, and parallel execution.

Why Intelligence at the Core Fails

The instinct to centralize intelligence is understandable. You have one powerful model. You write one system prompt. You give it all the tools and let it figure things out. In a demo, this works. The model is capable enough to juggle concerns simultaneously, and the happy path looks clean.

Production breaks this in three distinct ways.

First: entanglement. When the orchestrator encodes both routing logic and domain knowledge, changing one requires touching the other. A new tool means rewriting the prompt. A new domain rule means re-testing the routing. The system develops coupling that has no architectural boundary — just a long, fragile string of natural language instructions.

Second: untestability and opacity. You cannot write a unit test for an LLM prompt that does five things at once. You can only run end-to-end evaluations and observe whether the emergent behavior is stable. When an agent misbehaves, you need to know whether the failure was in routing, domain reasoning, tool selection, or output formatting. A smart core conflates all of these — regressions are invisible until production, and then you're debugging a black box with a flashlight.

Third: cost amplification. A smart-core agent sends every token of context (full history, all tool schemas, all domain rules) through the same path, for every step. In a multi-step workflow, this repeatedly pays for context the current step may not need. A dumb core routes to specialists, and each specialist receives only what it needs. At production scale, with millions of calls, the savings compound fast.
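To make the cost argument concrete, here is a back-of-the-envelope sketch. Every token count below is a hypothetical placeholder chosen for illustration, not a measurement from any real system, and the function names are invented for this example.

```python
# Illustrative only: every number here is a made-up placeholder.
FULL_CONTEXT_TOKENS = 6000      # assumed: history + all tool schemas + all domain rules
ROUTER_TOKENS = 300             # assumed: small classification prompt
SLICE_TOKENS = {                # assumed: context each specialist actually needs
    "research": 1500,
    "code_review": 2000,
    "summarize": 800,
}


def smart_core_input_tokens(steps: list[str]) -> int:
    """Every step re-sends the full context through the same path."""
    return FULL_CONTEXT_TOKENS * len(steps)


def dumb_core_input_tokens(steps: list[str]) -> int:
    """Each step pays for one routing call plus its specialist's slice."""
    return sum(ROUTER_TOKENS + SLICE_TOKENS[step] for step in steps)


workflow = ["research", "code_review", "summarize"]
monolithic = smart_core_input_tokens(workflow)
routed = dumb_core_input_tokens(workflow)
print(f"smart core: {monolithic} input tokens")
print(f"dumb core:  {routed} input tokens")
```

The ratio depends entirely on the real slice sizes. The point is only that the routed cost grows with what each step needs, not with the size of the whole system.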

| Dimension | Smart Core (Monolithic) | Dumb Core, Smart Edge |
| --- | --- | --- |
| Testability | End-to-end only; regressions invisible until production | Each specialist unit-testable with typed inputs/outputs |
| Cost per request | Full context and all tool schemas sent through one path | Each specialist gets the context slice relevant to its task |
| Blast radius | Any prompt edit risks cascading regression across all behaviors | Changes isolated to one specialist; orchestrator untouched |
| Debuggability | Single opaque prompt; failure source ambiguous | Typed contracts at each edge; failure pinpointed to specific node |
| Model flexibility | Locked to one model for all tasks | Each specialist can use the model or deterministic function that fits its task |
| Latency | Single call path, but one large prompt | Extra routing step; specialists can run in parallel when the graph allows it |

Figure 1: Two topologies — intelligence concentrated at the core (fragile) versus distributed to the edge (controllable)

The Principle Defined

The Principle of Structural Periphery: In agent systems with multiple domains or tool boundaries, the capacity for autonomous judgment should be maximally distributed toward the edges of the graph, leaving the core responsible only for state transitions and routing contracts.

This is not a new idea; it is old systems engineering showing up inside agent graphs. Unix pipes work because each program does one thing and communicates through a simple, stable interface. Microservices work — when they work — for the same reason. The internet's core protocols are deliberately simple; the intelligence lives in the endpoints. Dumb Core, Smart Edge applies this same logic to the agent graph.

The orchestrator's job is precisely scoped: receive a typed input, determine which specialist handles it, dispatch, await a result, and emit a typed output. It encodes no domain knowledge. It holds no persistent memory. It makes no judgment calls about content. It is a state machine — and it should be readable as one.

The specialists, by contrast, are autonomous within their domain boundary. A ResearchSpecialist knows how to query sources, evaluate credibility, and synthesize findings. A CodeReviewSpecialist knows the codebase conventions, the failure modes to check, and how to format its output. Each one can be prompted, fine-tuned, swapped, or evaluated entirely independently. The orchestrator never needs to know what changed.
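One way to picture that split is a plain-Python state machine, with no framework involved. This is a minimal sketch under stated assumptions: the step names other than CLASSIFY_INTENT, the `CoreState` fields, and the two lambda specialists are all hypothetical stand-ins, and the keyword-based `classify` only imitates what a small classifier model might return.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class Step(Enum):
    CLASSIFY_INTENT = auto()
    DISPATCH = auto()
    EMIT = auto()
    DONE = auto()


@dataclass
class CoreState:
    user_input: str
    intent: str = ""
    result: str = ""
    step: Step = Step.CLASSIFY_INTENT


# Hypothetical specialists; real ones would wrap a model or an algorithm.
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "research": lambda text: f"research:{text}",
    "code_review": lambda text: f"review:{text}",
}


def classify(text: str) -> str:
    """Stand-in classifier; a real core might call a small model here."""
    return "code_review" if "diff" in text else "research"


def advance(state: CoreState) -> CoreState:
    """One transition. No branch here inspects domain content."""
    if state.step is Step.CLASSIFY_INTENT:
        state.intent = classify(state.user_input)
        state.step = Step.DISPATCH
    elif state.step is Step.DISPATCH:
        state.result = SPECIALISTS[state.intent](state.user_input)
        state.step = Step.EMIT
    elif state.step is Step.EMIT:
        state.step = Step.DONE
    return state


def run(user_input: str) -> CoreState:
    state = CoreState(user_input=user_input)
    while state.step is not Step.DONE:
        state = advance(state)
    return state
```

Calling `run("please check this diff")` walks the same transitions regardless of which specialist ends up handling the request. Everything domain-specific sits behind the `SPECIALISTS` lookup.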

Figure 2: The state machine contract — how a dumb orchestrator manages control flow without encoding domain knowledge

What "Dumb" Actually Means

"Dumb core" is a precise term, not a pejorative. The orchestrator can still use an LLM for the CLASSIFY_INTENT step — classifying an incoming request into a typed intent is a legitimate use of language understanding. What it must not do is reason about the content of that intent, apply domain heuristics, or make judgment calls about how to handle edge cases within a domain.

A useful test: if you removed all domain-specific vocabulary from the orchestrator's system prompt and replaced it with a generic placeholder, would the routing logic still work? If yes, the core is appropriately dumb. If the routing depends on understanding what "a refund request with a disputed charge" means, domain knowledge has leaked into the core and you have a coupling problem.
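The placeholder test can be automated when routing is a pure lookup. The sketch below is an illustration, not a prescribed tool: the `ROUTES` table, the node names, and the `anonymize` helper are all hypothetical.

```python
from typing import Mapping

# Hypothetical routing table: intent label -> node name.
ROUTES: Mapping[str, str] = {
    "refund_dispute": "billing_specialist",
    "research": "research_specialist",
}


def route(intent: str, routes: Mapping[str, str], default: str = "fallback") -> str:
    """Pure lookup: the core never interprets what the label means."""
    return routes.get(intent, default)


def anonymize(routes: Mapping[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Replace every domain-flavored label with an opaque placeholder."""
    renamed = {label: f"INTENT_{i}" for i, label in enumerate(sorted(routes))}
    opaque_routes = {renamed[label]: node for label, node in routes.items()}
    return renamed, opaque_routes


renamed, opaque_routes = anonymize(ROUTES)

# The placeholder test: routing decisions survive the renaming unchanged.
for label, node in ROUTES.items():
    assert route(renamed[label], opaque_routes) == node == route(label, ROUTES)
```

If the core needed to understand the label in order to route it, a renaming like this would break the assertions. That is the coupling the test is meant to surface.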

The orchestrator's system prompt should read like a traffic controller's manual — rules about flow, priority, and failure handling, with no opinion about the cargo. The specialists' prompts should read like expert briefs — deep, opinionated, and narrow.

This also governs memory architecture. Shared, global memory accessible to the orchestrator is a smell. Each specialist should own its working memory for the duration of its task. Persistent memory — user preferences, prior conversation summaries, learned facts — should be retrieved by specialists on demand, not pre-loaded into the orchestrator's context. The orchestrator passes a session identifier, not a memory dump.
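A rough sketch of that memory boundary follows. The `MemoryStore` class is a hypothetical in-memory stand-in for whatever database or vector index holds persistent memory, and the `citation_style` key is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical persistent store keyed by session; stands in for a DB or index."""
    _facts: dict[str, dict[str, str]] = field(default_factory=dict)

    def put(self, session_id: str, key: str, value: str) -> None:
        self._facts.setdefault(session_id, {})[key] = value

    def get(self, session_id: str, key: str, default: str = "") -> str:
        return self._facts.get(session_id, {}).get(key, default)


@dataclass
class ResearchSpecialist:
    store: MemoryStore

    def run(self, query: str, session_id: str) -> str:
        # Working memory lives only for the duration of this call.
        scratch = [f"query={query}"]
        # Persistent memory is pulled on demand, and only the key this task needs.
        style = self.store.get(session_id, "citation_style", default="plain")
        scratch.append(f"style={style}")
        return "; ".join(scratch)


def orchestrate(query: str, session_id: str, specialist: ResearchSpecialist) -> str:
    # The core forwards the identifier; it never reads the store itself.
    return specialist.run(query, session_id)
```

The orchestrator's signature carries a `session_id` string and nothing else about memory, so changing what is stored or how it is retrieved stays inside the specialist.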

The Replaceability Test

The practical proof of Dumb Core, Smart Edge is what I call the replaceability test: you should be able to swap any specialist node — replacing a prompt-based implementation with a deterministic algorithm, a smaller model, or a different specialist implementation — without modifying the orchestrator or any adjacent specialist. The blast radius should stay localized to that specialist boundary.

If a swap requires changes outside the node being replaced, the system has hidden coupling. The most common source: the orchestrator is parsing or depending on the internal structure of a specialist's output, rather than consuming a typed contract. The fix is always the same — define a Pydantic schema at the edge, validate on emission, and let the orchestrator consume the type, never the raw text.
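Here is a small sketch of the replaceability test in plain Python. To keep it dependency-free it uses a frozen dataclass with manual validation as a stand-in for a Pydantic model. The sentiment task, the class names, and the "escalate"/"close" outcomes are hypothetical, chosen only to show two implementations behind one contract.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Sentiment:
    """Typed contract at the edge (a dataclass stand-in for a Pydantic model)."""
    label: str
    confidence: float

    def __post_init__(self) -> None:
        # Validate on emission, so malformed results never reach the core.
        if self.label not in {"positive", "negative", "neutral"}:
            raise ValueError(f"unexpected label: {self.label!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")


class SentimentSpecialist(Protocol):
    def run(self, text: str) -> Sentiment: ...


class KeywordSpecialist:
    """Deterministic implementation."""
    def run(self, text: str) -> Sentiment:
        lowered = text.lower()
        if "great" in lowered:
            return Sentiment("positive", 0.9)
        if "broken" in lowered:
            return Sentiment("negative", 0.9)
        return Sentiment("neutral", 0.5)


class StubModelSpecialist:
    """Stand-in for a prompt-based implementation; a real one would call a model."""
    def run(self, text: str) -> Sentiment:
        return Sentiment("negative" if "!" in text else "neutral", 0.6)


def orchestrator(text: str, specialist: SentimentSpecialist) -> str:
    """Consumes only the typed fields, so any conforming specialist fits."""
    result = specialist.run(text)
    return "escalate" if result.label == "negative" else "close"
```

Swapping `KeywordSpecialist` for `StubModelSpecialist` changes the answers but not a single line of `orchestrator`. That is the property the replaceability test checks.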

Figure 3: The replaceability test — swapping an edge specialist without touching the orchestrator or adjacent nodes

This is where Causal Engineering enters the picture. When you can replace a node and observe the isolated effect on system behavior, you have a causal handle on the system. You can run controlled experiments: same orchestrator, same adjacent specialists, different specialist B. The delta in output quality is attributable to specialist B alone. This is how you improve an agentic system scientifically rather than through intuition and prayer.
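A controlled experiment of this kind can be sketched in a few lines. Everything here is a toy assumption: the three-stage `pipeline`, the exact-match `score`, the two variants of specialist B, and the evaluation cases are invented to show the shape of the harness, not to suggest a real metric.

```python
from statistics import mean
from typing import Callable

Specialist = Callable[[str], str]


def pipeline(text: str, specialist_b: Specialist) -> str:
    """Fixed orchestrator and fixed neighbors; only specialist B is a parameter."""
    normalized = text.strip().lower()      # specialist A, held constant
    answer = specialist_b(normalized)      # specialist B, the variable under test
    return answer.strip()                  # specialist C, held constant


def score(output: str, expected: str) -> float:
    """Hypothetical metric: exact match. Real evaluations would use a richer one."""
    return 1.0 if output == expected else 0.0


def evaluate(specialist_b: Specialist, cases: list[tuple[str, str]]) -> float:
    return mean(score(pipeline(inp, specialist_b), exp) for inp, exp in cases)


# Two hypothetical variants of specialist B.
def baseline_b(text: str) -> str:
    return text.upper()


def candidate_b(text: str) -> str:
    return text.title()


CASES = [("  hello world ", "Hello World"), ("agent graphs", "Agent Graphs"), ("ok", "OK")]

baseline = evaluate(baseline_b, CASES)
candidate = evaluate(candidate_b, CASES)
delta = candidate - baseline
print(f"baseline={baseline:.2f} candidate={candidate:.2f} delta={delta:+.2f}")
```

Because the orchestrator and the neighboring stages are identical across both runs, the delta can be attributed to the change in specialist B alone.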

Applying This in Practice

The implementation pattern in LangGraph is direct. Define your orchestrator as a graph with typed state — a TypedDict or Pydantic model that carries only routing metadata, session identifiers, and result slots. Each node in the graph is a specialist function that receives a narrowly scoped input and returns a typed output. The edges encode the control flow. The nodes encode the intelligence.

Here is a minimal dumb-core orchestrator in LangGraph v0.4:

    from typing import Literal

    from pydantic import BaseModel
    from langgraph.graph import StateGraph, END

    # --- Typed state: routing metadata only, no domain knowledge ---
    class OrchestratorState(BaseModel):
        user_input: str
        intent: str = ""
        specialist_result: str = ""
        session_id: str = ""

    # --- Intent classifier (the one LLM call the core is allowed) ---
    async def classify_intent(state: OrchestratorState) -> dict:
        # Fast classifier model or deterministic classifier
        intent = await llm_classify(state.user_input)  # returns: "research" | "code_review" | "unknown"
        return {"intent": intent}

    # --- Routing edge: pure function over typed state ---
    def route_by_intent(state: OrchestratorState) -> Literal["research", "code_review", "fallback"]:
        mapping = {"research": "research", "code_review": "code_review"}
        return mapping.get(state.intent, "fallback")

    # --- Specialist nodes (each owns its domain boundary) ---
    async def research_specialist(state: OrchestratorState) -> dict:
        result = await run_research_agent(state.user_input, state.session_id)
        return {"specialist_result": result.model_dump_json()}

    async def code_review_specialist(state: OrchestratorState) -> dict:
        result = await run_code_review_agent(state.user_input, state.session_id)
        return {"specialist_result": result.model_dump_json()}

    # --- Graph assembly ---
    graph = StateGraph(OrchestratorState)
    graph.add_node("classify", classify_intent)
    graph.add_node("research", research_specialist)
    graph.add_node("code_review", code_review_specialist)
    graph.add_node("fallback", lambda s: {"specialist_result": "I can't help with that."})

    graph.set_entry_point("classify")
    graph.add_conditional_edges("classify", route_by_intent)
    graph.add_edge("research", END)
    graph.add_edge("code_review", END)
    graph.add_edge("fallback", END)

    app = graph.compile()

Notice what the orchestrator does not contain: no domain vocabulary, no tool schemas, no memory retrieval, no output formatting logic. The route_by_intent function is a pure mapping. If you added a new specialist tomorrow (say, legal_review), you'd add one node, one edge to END, and one entry in the mapping, and extend the Literal return type and the classifier's label set to include the new name. The existing specialists are untouched.
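Because the routing function is a pure mapping, it can be unit-tested without a model or a graph runtime. The sketch below assumes a dataclass stand-in for OrchestratorState so that it runs without pydantic or langgraph installed. The extra `routes` parameter is a hypothetical addition that makes the mapping injectable for the test.

```python
from dataclasses import dataclass


# Stand-in for OrchestratorState so this sketch runs without pydantic or langgraph.
@dataclass
class State:
    user_input: str
    intent: str = ""


ROUTES = {"research": "research", "code_review": "code_review"}


def route_by_intent(state: State, routes: dict[str, str] = ROUTES) -> str:
    return routes.get(state.intent, "fallback")


def test_known_intents_route_to_their_nodes() -> None:
    assert route_by_intent(State("q", intent="research")) == "research"
    assert route_by_intent(State("q", intent="code_review")) == "code_review"


def test_unknown_intent_falls_back() -> None:
    assert route_by_intent(State("q", intent="unknown")) == "fallback"
    assert route_by_intent(State("q")) == "fallback"


def test_adding_a_specialist_leaves_existing_routes_alone() -> None:
    extended = {**ROUTES, "legal_review": "legal_review"}
    assert route_by_intent(State("q", intent="legal_review"), extended) == "legal_review"
    # Existing routes are unaffected by the addition.
    assert route_by_intent(State("q", intent="research"), extended) == "research"


for test in (
    test_known_intents_route_to_their_nodes,
    test_unknown_intent_falls_back,
    test_adding_a_specialist_leaves_existing_routes_alone,
):
    test()
```

Tests like these run in milliseconds and never touch an LLM, which is the practical payoff of keeping the core dumb.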

Each specialist enforces its contract with a typed output schema:

from pydantic import BaseModel, Field

class SpecialistResult(BaseModel): """Contract between any specia

[truncated for AI cost control]