
Show HN: Agentic Data Engineering

This article introduces agentic data engineering, a practice where autonomous AI agents design, build, and maintain data pipelines from natural-language intent, contrasting it with traditional automation and copilots. It emphasizes the critical role of the 'harness'—a software layer providing grounding, validation, and controls—in making agents safe for production. The piece covers trust challenges, governance, the evolving role of data engineers, and the tools powering this shift.

Source: Hacker News AI. Author: zubairov


On this page

What is agentic data engineering?

Agentic vs. traditional data engineering — and vs. automation and copilots

How agentic data engineering works

Why the model isn't the bottleneck — it's the harness

What it looks like in practice

The trust problem: governing an AI agent on production data

Will AI agents replace data engineers?

What tools power agentic data engineering

Where agentic data engineering is headed

Getting started

The project has been on the roadmap for two quarters. Cohort retention. Lead scoring. LTV by channel. Every week without it costs something concrete: a board question you can't answer, a campaign you can't attribute, a churn signal you caught too late.

So you did what any technical operator would do in 2026 — you opened ChatGPT, or Claude, and asked it to write the SQL. And it did. Beautifully. The queries ran, the numbers came back, the charts looked clean. Then someone asked whether the retention number was actually right — and you couldn't say. No lineage, no tests, no agreed definition of what "active" even meant. Just fluent SQL nobody had verified. Looking great and being right, it turns out, are very different things.

That gap — between an AI that can write data code and an AI you can trust to ship it — is the whole story of agentic data engineering. This guide explains what the term means, how the workflow actually works, the tools that make it possible, and the part most vendors skip: how you let an agent touch production data without getting burned.

What is agentic data engineering?

Agentic data engineering is the practice of using autonomous AI agents to design, build, and maintain data pipelines from natural-language intent, rather than having an engineer write every transformation by hand. The agent plans the work, writes the code (ingestion, SQL, tests), runs it, checks the result, and corrects itself with little human intervention along the way; a human reviews and approves the final change.

The key word is agentic. A plain AI assistant answers a question and stops. An agent works toward a goal across many steps on its own: it perceives the state of your data, reasons about what to do next, takes an action, reads the outcome, and loops until the goal is met. This is commonly described as a perceive → reason → act → learn loop. In data engineering, that loop looks like: explore the warehouse, write a transformation, run the tests, read the failures, fix them, and present the finished change for review.
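The loop described above can be sketched in a few lines of Python. This is a toy illustration, not any vendor's implementation: `run_agent_loop`, `toy_propose`, and `toy_run_tests` are hypothetical names, and the "model" is a stub that forgets a filter on its first attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    """Outcome of one agent run: the final draft plus the failures seen per attempt."""
    draft: str
    passed: bool
    attempts: int
    history: List[List[str]] = field(default_factory=list)


def run_agent_loop(
    propose: Callable[[List[str]], str],
    run_tests: Callable[[str], List[str]],
    max_attempts: int = 3,
) -> LoopResult:
    """Draft, test, read the failures, and retry until tests pass or attempts run out."""
    failures: List[str] = []          # what the agent perceives from the previous run
    history: List[List[str]] = []
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = propose(failures)     # reason and act: write or revise the change
        failures = run_tests(draft)   # read the outcome
        history.append(failures)
        if not failures:              # goal met: hand the draft to a human reviewer
            return LoopResult(draft, True, attempt, history)
    return LoopResult(draft, False, max_attempts, history)


# Hypothetical stand-ins for a model and a test suite.
def toy_propose(failures: List[str]) -> str:
    if "missing is_active filter" in failures:
        return "SELECT user_id FROM users WHERE is_active"
    return "SELECT user_id FROM users"


def toy_run_tests(draft: str) -> List[str]:
    return [] if "is_active" in draft else ["missing is_active filter"]


result = run_agent_loop(toy_propose, toy_run_tests)
print(result.passed, result.attempts)
```

The `max_attempts` cap is the important design choice here: a loop that can retry forever is a loop that can burn budget forever, so the sketch gives up and reports failure instead.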

This is the shift from doing the "how" by hand to specifying the "what" and reviewing the result. You stop writing every line of SQL and start describing the metric you need — then the agent does the building. The promise is real, but so is the catch, which is the rest of this article.

Agentic vs. traditional data engineering, and vs. automation and copilots

Three things get confused with agentic data engineering. They're not the same.

| Approach | What it does | Who decides the steps |
| --- | --- | --- |
| Static automation (cron, Airflow DAGs) | Runs a fixed sequence someone wrote in advance | A human, ahead of time |
| AI copilot (autocomplete in your editor) | Suggests the next line or block while you drive | A human, line by line |
| AI agent | Pursues a goal across many steps, adapts to what it finds | The agent, within your guardrails |
| Traditional data engineering | A person hand-builds each pipeline, query, and test | A human, step by step |

A scheduler repeats what you already decided. A copilot autocompletes while you stay in control. An agent takes a goal — "build me a weekly cohort-retention model" — and figures out the steps itself, including the ones you didn't anticipate. That autonomy is what makes it powerful, and exactly why the controls around it matter so much.

One more term to untangle: agentic analytics. The two are siblings, not synonyms. Agentic analytics works on the serving side — it asks questions of data that already exists (the BI and query layer). Agentic data engineering works one layer down: it builds and maintains the pipelines and models that produce that data in the first place. You need the engineering layer to be sound before the analytics layer can be trusted.

How agentic data engineering works

Under the hood, an agentic workflow runs your raw data through the same stages a human data team would — building an AI data pipeline driven by intent instead of tickets:

Ingestion. Source data lands in your warehouse from your apps, CRM, product database, and third-party tools. Connectors like Airbyte or Meltano handle the extract-and-load so the agent has raw tables to work from.

Transformation. The agent writes the models that turn raw tables into clean, business-ready ones — typically as dbt models in a layered (bronze → silver → gold) structure, with tests attached.

Semantic layer. Cleaned tables still don't know what your business means by "active user" or "qualified lead." A semantic layer encodes those definitions once, so every query — human or agent — uses the same math. (We go deep on this in what a semantic layer is and why it matters.)

Serving. The finished metrics are queried by dashboards, notebooks, or — increasingly — by other AI agents over a protocol like MCP (the Model Context Protocol), which lets an agent ask your data questions in a structured, governed way.
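To make the semantic-layer stage concrete, here is a minimal Python sketch of "define it once, use it everywhere". It is not the API of any real semantic-layer product: the `Metric` class, the `METRICS` registry, the `compile_metric` helper, and the `events` table with its `session_start` filter are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metric:
    """One business definition, stored once and reused by every caller."""
    name: str
    table: str
    expression: str   # SQL aggregate
    where: str        # the agreed filter that defines the concept


# Hypothetical registry: the single place where "active user" is defined.
METRICS: Dict[str, Metric] = {
    "active_users": Metric(
        name="active_users",
        table="events",
        expression="COUNT(DISTINCT user_id)",
        where="event_type = 'session_start'",
    ),
}


def compile_metric(name: str, group_by: str = "") -> str:
    """Render SQL for a registered metric; an unknown name raises KeyError."""
    m = METRICS[name]
    select = f"{group_by}, " if group_by else ""
    sql = f"SELECT {select}{m.expression} AS {m.name} FROM {m.table} WHERE {m.where}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# A dashboard and an agent ask for the same metric and get the same math.
dashboard_sql = compile_metric("active_users", group_by="week")
agent_sql = compile_metric("active_users")
print(dashboard_sql)
print(agent_sql)
```

Because callers can only name a metric, not redefine it, an agent asking for `active_users` cannot quietly invent its own filter; asking for a metric that was never defined fails loudly.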

You describe the metric; the agent explores the lakehouse, writes the dbt model, builds the semantic overlay, and runs the tests. That's the happy path. Now the part that decides whether any of it is trustworthy.

Why the model isn't the bottleneck: it's the harness

If you've already pointed an AI coding agent at a data problem and watched it produce confident garbage, you've met the real bottleneck. It isn't the model. It's the harness the model works against.

So what is a harness? A harness is the software layer around an AI model that makes its output safe to ship in production: the grounding that tells the agent which answer is right — your lineage, business semantics, and access policies — and the controls that catch a wrong answer before it lands — validation loops, data contracts, CI/CD, and an audit trail. A model writes code; a harness decides whether that code is safe to merge.
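The "model writes, harness decides" split can be sketched as a small gate. This is an illustrative toy under stated assumptions, not a description of any real harness: `Change`, `Verdict`, `evaluate`, the two checks, and the `KNOWN_TABLES` schema snapshot are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Change:
    """A change proposed by the agent (fields are illustrative)."""
    model_name: str
    reads: List[str]    # upstream tables the model selects from
    tests: List[str]    # names of tests attached to the model


@dataclass
class Verdict:
    safe_to_merge: bool
    errors: List[str]


# A check returns an error message, or None if the change passes.
Check = Callable[[Change], Optional[str]]

KNOWN_TABLES = {"raw_events", "raw_users"}   # hypothetical schema snapshot


def check_known_tables(change: Change) -> Optional[str]:
    """Grounding: the change may only read tables that exist in the schema."""
    unknown = sorted(set(change.reads) - KNOWN_TABLES)
    return f"unknown tables: {unknown}" if unknown else None


def check_has_tests(change: Change) -> Optional[str]:
    """Control: a model with no tests attached is rejected."""
    return None if change.tests else "no tests attached"


def evaluate(change: Change, checks: List[Check]) -> Verdict:
    """Run every check and collect all errors, so the reviewer sees the full picture."""
    errors: List[str] = []
    for check in checks:
        message = check(change)
        if message is not None:
            errors.append(message)
    return Verdict(safe_to_merge=not errors, errors=errors)


CHECKS: List[Check] = [check_known_tables, check_has_tests]
good = evaluate(Change("stg_users", ["raw_users"], ["unique_user_id"]), CHECKS)
bad = evaluate(Change("fct_orders", ["raw_orders"], []), CHECKS)
print(good.safe_to_merge, bad.errors)
```

The gate never inspects how fluent the generated code is. It only asks whether the change is consistent with what is known about the data and whether it arrives with evidence.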

The intuition is simple: the model is only the engine. Impressive on a workbench, but useless until it's bolted into the rest of the car — a chassis, wheels, a steering wheel, pedals to control the power, and a dashboard that shows what's actually happening. The harness is the rest of the car.

Put a generic agent and an agent on a harness side by side:

|  | Generic AI agent | Agent on a data harness |
| --- | --- | --- |
| Starts from | A blank file: no schema, no definitions | Your schema, dbt models, and business definitions |
| Picks the right answer by | Guessing, fluently | Grounding: your lineage, semantics, and access policies |
| Catches a wrong answer with | Nothing; it ships | Validation loops, PK/unique checks, data contracts, CI/CD |
| Mistakes surface | In production, in a downstream dashboard | At review time, as a pull-request diff |
| In production it acts | Unsupervised | Scoped, time-bound, human-in-the-loop, fully audit-logged |

That gap is measurable. Snowflake put a number on it: on text-to-SQL (translating natural language into SQL), a general-purpose model (GPT-4o) scored just 51% on their internal evaluation, while grounding the same task in a governed semantic model pushed accuracy past 90% on real-world queries, nearly 2× the single-shot GPT-4o result. The difference between a wrong query and a right one is almost entirely context, not capability. That's also why data readiness (clean, tested, well-defined inputs) matters more than which model you pick.

The market already senses this. In Cleanlab's 2025 survey of 1,837 engineering leaders, only ~5.2% run AI agents in production. dbt's 2026 report found 72% of practitioners want AI-assisted coding but only 24% trust it to manage pipelines. METR's 2025 study even measured experienced developers running 19% slower with AI on familiar code while feeling faster. The appetite is real; the reliability isn't — because the harness isn't there.

One principle decides everything downstream: the agent fails at review time, not in production.

What it looks like in practice

Theory is cheap; here's a concrete worked example. At RevOS we built exactly this harness, so the abstract pieces above have real names. What you install is a packaged offering — a set of APIs, a command-line interface, documentation, and curated agentic skills, wrapped in a dev container you open in your IDE (Visual Studio Code or similar) and drive with Claude Code (or a coding agent of your choice). There's no new UI to learn; it lives where you already write code.

revos init scaffolds a working project — medallion dbt models, semantic cubes, and sample data — that your agent can explore from minute one.

Under that surface, RevOS wires together a best-of-breed stack — automated data ingestion, dbt for transformation, Cube.dev for the semantic layer, Git for versioning, BigQuery as the warehouse — so your agent starts with your schema, your models, and your definitions instead of a blank file. You describe the metric you need; the agent explores your lakehouse, writes the dbt model, and builds the semantic overlay. (For the full product view, see how RevOS helps you build a revenue or growth data layer without hiring a data engineer.)

Then the part that earns trust: when the agent finishes a model, the harness doesn't accept it on faith. Validation loops run. Primary-key and unique-constraint checks fire. YAML data-contract enforcement kicks in. The change moves through the same Git CI/CD pipeline as your application code — and lands on your desk as a pull request with a diff. When the agent gets it wrong (and it will), you catch it in the PR, not at 3 a.m. in a downstream dashboard.
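For a feel of what primary-key, uniqueness, and contract checks do, here is a minimal sketch using Python's built-in `sqlite3`. It is not RevOS's implementation: the `CONTRACT` dict stands in for a YAML contract file, and the `dim_customers` table, its columns, and the `validate` function are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List

# Hypothetical contract, written as a dict here in place of a YAML file.
CONTRACT = {
    "table": "dim_customers",
    "primary_key": "customer_id",
    "columns": {"customer_id": "INTEGER", "email": "TEXT"},
}


def validate(conn: sqlite3.Connection, contract: Dict) -> List[str]:
    """Return contract violations; an empty list means the table passes."""
    # Identifiers come from the trusted contract, not from user input.
    table, pk = contract["table"], contract["primary_key"]
    errors: List[str] = []

    # Schema check: column names and declared types must match the contract.
    actual = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    if actual != contract["columns"]:
        errors.append(f"schema mismatch: {actual}")
    if pk not in actual:
        return errors   # key checks are meaningless without the key column

    # Primary-key checks: no NULLs and no duplicated values.
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {pk} IS NULL"
    ).fetchone()[0]
    if nulls:
        errors.append(f"{nulls} null value(s) in {pk}")
    dupes = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {pk} FROM {table} "
        f"GROUP BY {pk} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    if dupes:
        errors.append(f"{dupes} duplicated value(s) in {pk}")
    return errors


# The table declares no PRIMARY KEY, so bad rows can land and the check must catch them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customers (customer_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO dim_customers VALUES (?, ?)",
    [(1, "a@x.io"), (2, "b@x.io"), (2, "c@x.io"), (None, "d@x.io")],
)
violations = validate(conn, CONTRACT)
print(violations)
```

In a CI job, a non-empty `violations` list would fail the build, which is what turns a wrong model into a red check on the pull request.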

The trust problem: governing an AI agent on production data

The single hardest question in agentic data engineering is the one most marketing pages dodge: how do you let an autonomous agent near production without it doing something irreversible?

The answer is workflow, not faith. Three controls do the heavy lifting:

Changes ship as pull requests. The agent never writes directly to production. It proposes a diff that runs through tests and CI; a human reads it and merges it. This is the concrete meaning of "fails at review time." A wrong model is a red check on a PR, not a corrupted table.

Permissions are scoped and time-bound. The agent gets exactly the access a task needs, for as long as the task takes — not standing admin rights. Mutating actions in production (schema changes, deletions, permission grants) keep a human in the loop by design.

Every action is audit-logged. An immutable trail of what the agent did, when, and why means you can answer "what changed?" after the fact — the difference between a controlled system and a black box.
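One common way to make an audit trail tamper-evident is to chain entries by hash, so editing any past entry breaks every hash after it. The sketch below assumes that technique; it is not something the article's stack is confirmed to use, and `append_entry`, `verify`, and the entry fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: List[Dict[str, str]], actor: str, action: str,
                 reason: str, at: str) -> None:
    """Append what happened, when, and why, linked to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "reason": reason, "at": at, "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify(log: List[Dict[str, str]]) -> bool:
    """Recompute the chain; any edited or reordered entry makes this False."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log: List[Dict[str, str]] = []
append_entry(log, "agent-1", "open_pull_request", "add retention model", "2026-01-05T10:00:00Z")
append_entry(log, "alice", "merge_pull_request", "reviewed diff", "2026-01-05T11:30:00Z")
print(verify(log))
```

A hash chain only detects tampering; it does not prevent it. Keeping the log somewhere the agent cannot write to is a separate control.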

Why be this strict? Because verbal guardrails don't bind an agent — technical ones do. The widely reported case of an AI agent wiping a live production database during a code freeze is the cautionary tale: it had every capability to help and none of the controls to be safe. Autonomy should scale as trust accrues, never the other way around.
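A technical guardrail, as opposed to a verbal one, is a check the agent's request has to pass in code. Here is a minimal sketch of a scoped, time-bound grant with a human sign-off for mutating actions. The `Grant` class, the `authorize` function, and the action names are illustrative assumptions, not a real access-control API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional

# Actions that always need a human sign-off (illustrative names).
MUTATING = frozenset({"alter_schema", "delete_rows", "grant_permission"})


@dataclass(frozen=True)
class Grant:
    """Access issued for one task: a fixed set of actions until a fixed expiry."""
    task_id: str
    actions: FrozenSet[str]
    expires_at: datetime


def authorize(grant: Grant, action: str, now: datetime,
              approved_by: Optional[str] = None) -> bool:
    """Allow an action only if it is in scope, in time, and approved when mutating."""
    if now >= grant.expires_at:
        return False    # time-bound: the grant has lapsed
    if action not in grant.actions:
        return False    # scoped: not part of this task
    if action in MUTATING and approved_by is None:
        return False    # human in the loop for mutating actions
    return True


start = datetime(2026, 1, 5, 10, 0, tzinfo=timezone.utc)
grant = Grant(
    task_id="retention-model",
    actions=frozenset({"read_table", "delete_rows"}),
    expires_at=start + timedelta(minutes=30),
)
print(authorize(grant, "read_table", start))                          # in scope, in time
print(authorize(grant, "delete_rows", start))                         # mutating, no approver
print(authorize(grant, "delete_rows", start, approved_by="alice"))    # mutating, approved
```

The default answer is "no": anything not explicitly granted, not yet approved, or past its expiry is refused, regardless of what the agent was told in its prompt.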

Will AI agents replace data engineers?

Short answer: no — they change the job. The "AI data engineer" worth imagining is a tooling shift, not a headcount replacement. The work that agents absorb is the repetitive build work: boilerplate models, test scaffolding, documentation, the tenth slightly-different staging table. What they don't absorb is judgment — deciding what a metric should mean, whether a result is plausible, and what the agent is allowed to touch.

So the role moves up the stack. Less time typing SQL; more time defining intent, reviewing the agent's pull requests, and owning the semantic layer and governance that keep the agent correct. The scarce, valuable skill becomes knowing what "right" looks like — which is exac
