
ETLs in the Era of AI and Sandboxes

This article describes an ETL architecture that combines AI agents with sandboxed execution environments, enabling secure and traceable data movement through clear contracts and boundaries.

Source: Hacker News AI. Author: lol-lol-lol-2

Flow Board / intent becomes evidence

AI Harness: Reads goal and writes a bounded spec.

calls Crabbox ->

Crabbox: Leases a worker and injects the named profile.

runs command ->

Worker + Airbyte: Reads source and writes target. Agent never sees rows.

returns evidence ->

Evidence: Logs, metrics, JUnit, redacted config.

control = intent + command
credentials = profile -> env
data = source -> target
evidence = artifacts -> decision

run replay

One job, traced from request to repair.


mental model

Everything is easier when each box owns one question.

Read the system as six contracts across four boxes. Each box gets a narrow input, owns one decision, and emits a narrow output.

Intent -> Spec

Goal becomes refs, profile, retry policy, validation, artifacts.
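The Intent -> Spec contract can be checked mechanically before anything runs. The following is a minimal sketch, assuming a flat key=value spec file for brevity (the article's own spec is JSON, and its schema is not shown). The function name `spec_missing_fields` and the field spellings are hypothetical, chosen to mirror the five parts named above.

```shell
# Hypothetical check: report which of the five spec parts are absent.
# Assumes a flat key=value spec file; the article's real spec is JSON
# with a schema that is not shown.
spec_missing_fields() {
  local spec_file=$1 field missing=""
  for field in refs profile retry_policy validation artifacts; do
    # A field counts as present only if it has a non-empty value.
    if ! grep -Eq "^${field}=.+" "$spec_file"; then
      missing="${missing:+$missing }$field"
    fi
  done
  # Prints a space-separated list, or an empty line when the spec is complete.
  printf '%s\n' "$missing"
}
```

An empty result means the spec is bounded in every dimension the contract names. Anything else is a reason to reject the spec before a worker is leased.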

Spec -> Run

Spec becomes a sandboxed command with a durable run id.
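One way to make a run id durable is to derive it from the spec itself, so that a retry of the same spec lands on the same id. The article does not say how ids are produced, so treat this as a sketch under that assumption. `run_id_for_spec` is a hypothetical helper, and the checksum is used only as a stable label, not for security.

```shell
# Hypothetical helper: derive a run id from the spec content so that a
# retry of the same spec maps to the same id. Deriving the id from a
# checksum is an assumption; the article does not say how ids are made.
run_id_for_spec() {
  local spec_file=$1 sum
  # With input on stdin, POSIX cksum prints "<crc> <size>"; keep the CRC.
  sum=$(cksum < "$spec_file" | cut -d ' ' -f 1)
  printf 'run-%s\n' "$sum"
}
```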

Profile -> Env

Profile name becomes scoped variables inside the worker only.
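The scoping in Profile -> Env can be illustrated with a plain subshell: variables are loaded and exported only inside it, so the caller never holds them. This is a sketch, not Crabbox's actual injection mechanism. `run_with_profile`, `PROFILE_DIR`, and the `<name>.env` file layout are all assumptions.

```shell
# Hypothetical sketch: load a named profile's variables inside a
# subshell so they exist only for the wrapped command.
# PROFILE_DIR and the key=value "<name>.env" layout are assumptions.
run_with_profile() {
  local profile=$1
  shift
  (
    set -a   # export every variable assigned while sourcing
    . "${PROFILE_DIR:-./profiles}/${profile}.env"
    set +a
    "$@"     # run the command with the profile's variables in its env
  )
}
```

After `run_with_profile etl some-command` returns, the calling shell has no trace of the profile's variables. That is the property the contract asks for: the name crosses the boundary, the values do not.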

Source -> Target

Connector moves rows directly. The prompt never becomes the data plane.

Worker -> Evidence

Execution becomes logs, JUnit, metrics, counts, redacted config.
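Redacted config is the part of the evidence most likely to leak if done carelessly. Here is a minimal sketch of a redactor, assuming the config is key=value text. The function name `redact_config` and the list of sensitive key patterns are assumptions, and the match is deliberately limited to uppercase keys.

```shell
# Hypothetical redactor: mask values of sensitive-looking keys in
# key=value config read from stdin. The key patterns are assumptions
# and match uppercase names only.
redact_config() {
  sed -E 's/^([^=]*(PASSWORD|SECRET|TOKEN|KEY)[^=]*)=.*/\1=***REDACTED***/'
}
```

For example, `redact_config < worker.env > evidence/config.txt` would keep key names visible for debugging while masking their values (both file names are placeholders).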

Evidence -> Action

Signals become finish, retry, repair, or alert.
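The Evidence -> Action step can be read as a small decision table. The sketch below is one possible mapping, not the article's. The function `decide_action`, the choice of signals (exit code, JUnit failure count, attempt counters), and the ordering of the rules are all assumptions.

```shell
# Hypothetical decision table: map evidence signals to one of the four
# actions. The signal names, thresholds, and ordering are assumptions.
#   $1 exit code of the run
#   $2 failed JUnit test count
#   $3 attempts used so far
#   $4 maximum attempts allowed by the spec's retry policy
decide_action() {
  local exit_code=$1 junit_failures=$2 attempts=$3 max_attempts=$4
  if [ "$exit_code" -eq 0 ] && [ "$junit_failures" -eq 0 ]; then
    echo finish
  elif [ "$exit_code" -eq 0 ]; then
    # The run completed but validation failed: something needs repair.
    echo repair
  elif [ "$attempts" -lt "$max_attempts" ]; then
    echo retry
  else
    echo alert
  fi
}
```

The function reads only evidence, never rows, so the agent can make the decision without touching the data plane.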

runnable shape

The runnable shape has 3 contracts.

A useful agent output is not prose. It is a spec contract, an execution handoff, and an evidence contract.

ai-agent-dispatch.sh

Goal: sync CRM accounts into the warehouse safely.

crabbox pool ensure example-org/data-movement/main/provider/linux/etl \
  --min-ready 3 \
  --create -- \
  --cache-volume airbyte-etl

mkdir -p .crabbox/generated
cat > .crabbox/generated/accounts-sync.json

failure map

First find the owner. Then read the signal.

Failures are not mysteries. They are boundary breaks. Each class tells you where to look first and what you are allowed to change.

The loop is simple because the boundaries are hard.

Agent plans. Crabbox runs. Airbyte moves. Evidence returns. Repeat only when the evidence says what changed.
