2026-06-24 18:53 UTC (updated 2026-06-25 01:44 UTC)

Why the Frontier Ecosystem must be Open — Matei Zaharia and Reynold Xin, Databricks

In a rare double-interview, the Databricks technical leaders riff on what it will take for every company to build Agent Clouds

Source: Latent Space

We’re excited to have Databricks join us at AIEWF, among hundreds of the top companies in the AI Engineer ecosystem. LS subscribers can use their discount to get past the late bird pricing and access over $50k in sponsor offers!

Everyone is still talking about Satya’s Frontier Ecosystems post, but few have actually built a (now $175 billion) frontier ecosystem and cloud like our guests today.

From open-sourcing the layer above coding agents to rethinking databases for the agent era, Databricks cofounders Matei Zaharia and Reynold Xin are pushing the company beyond the lakehouse into a full data-and-AI operating system. In this episode, Matei and Reynold join swyx at the 2026 Data + AI Summit to unpack Omnigent, LTAP, Lakebase, agent security, open formats, Mosaic, and why databases may matter more than ever once AI agents start doing real work.

We go deep on Omnigent: Databricks’ open-source meta-harness for combining, controlling, and sharing agents across Claude Code, Codex, Cursor, Pi, custom agents, and internal tools. Matei explains why coding agents and enterprise agents run into the same problems: portability, collaboration, session history, security, spend controls, and the need for a common API above every harness.
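To make the "common API above every harness" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Harness` adapter, the `EchoHarness` stand-in, and the `Session` wrapper are hypothetical names and are not taken from Omnigent. The point it shows is that if session history lives above the harness, you can swap the underlying agent mid-session and still keep, search, and share the history.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Harness(ABC):
    """Hypothetical adapter for one underlying agent (not Omnigent's real API)."""

    name: str

    @abstractmethod
    def run(self, prompt: str) -> str:
        """Send a prompt to the underlying agent and return its reply."""


class EchoHarness(Harness):
    """Toy stand-in for a real harness; it just labels and echoes the prompt."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class Session:
    """Session state kept above the harness, so it survives a harness swap."""

    harness: Harness
    history: list = field(default_factory=list)  # (harness name, prompt, reply)
    cancelled: bool = False

    def send(self, prompt: str) -> str:
        if self.cancelled:
            raise RuntimeError("session was cancelled")
        reply = self.harness.run(prompt)
        self.history.append((self.harness.name, prompt, reply))
        return reply

    def switch_harness(self, harness: Harness) -> None:
        # Only the adapter changes; the recorded history is untouched.
        self.harness = harness

    def cancel(self) -> None:
        self.cancelled = True

    def search(self, term: str) -> list:
        # Naive substring search over prompts and replies.
        return [turn for turn in self.history if term in turn[1] or term in turn[2]]


session = Session(EchoHarness("harness-a"))
session.send("fix the flaky test")
session.switch_harness(EchoHarness("harness-b"))
session.send("summarize the diff")
```

A real system would add streaming, files, and tool calls behind the same interface; the sketch only covers sending, switching, cancelling, and searching.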

Then Reynold walks through Databricks’ database dream: why CDC is brittle enough to joke that it means “continuous data corruption,” why HTAP has been the holy grail of database engineering, and why Databricks thinks LTAP gets most of the benefits by unifying the storage layer instead of collapsing every query engine. We also cover Databricks’ infrastructure scale, the culture behind rapid prototyping, the difference between tech and enterprise customers, Databricks vs Snowflake, whether vector databases should have ever existed, the Mosaic model strategy, Genie, AI Runtime, RL fine-tuning, and the thesis that traditional software gets rewritten once the data is in the right place and agents sit on top.
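For readers newer to the row-versus-column distinction behind OLTP and OLAP, here is a generic sketch. It is not a description of how LTAP or Lakebase is implemented; the `rows` data and the `to_columns` helper are made up for illustration. It shows the same records held as rows (convenient for writing one transaction at a time) and as columns (convenient for an aggregate that only needs one field).

```python
# Row layout: each record is stored together, which suits transactional writes.
rows = [
    {"order_id": 1, "customer": "a", "amount": 30},
    {"order_id": 2, "customer": "b", "amount": 45},
    {"order_id": 3, "customer": "a", "amount": 25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column layout: each field is stored together, which suits analytical scans.
columns = to_columns(rows)

# An aggregate over one field reads a single column and skips the rest.
total_amount = sum(columns["amount"])
```

In a pipeline-based setup, a pivot like this happens in a separate copy step that can fall behind or break; the pitch in the episode is to share one storage layer instead.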

Databricks began as a company for the big data era. Spark, which originated at the Berkeley AMPLab and eventually grew into the Lakehouse product, convinced enterprises that they didn’t need a separate data lake, warehouse, ML platform, and governance layer. They just needed one open foundation where all of their data could live and be reasoned over.

Since then a lot has changed, but data has only become more important. Data is no longer something you keep track of and analyze ad hoc; it’s the necessary context agents need in order to act. So the framing has shifted from “where do we put all of our data?” to “how do we expose the right slice of state, history, permissions, and business logic to an AI system at the exact moment it’s doing work?”

If frontier model performance becomes commoditized, the durable advantage becomes the company-specific context around the models: proprietary data, governed access, operational state, transaction logs, workflows, and feedback loops. That leaves Databricks perfectly positioned.

Fresh off the Data + AI Summit 2026, the company is moving just as fast to keep up, announcing Genie One, Omnigent, LTAP, and many more. Together they point to a central mission in its newer work: Databricks is trying to become the operating system for enterprise agents.

Models are getting good enough, but agents are only useful if they have the right context, permissions, memory, state, cost controls, and access to live business data. Fundamentally it appears that significantly better model performance in production is a systems problem, one that data guys like us are remarkably well prepared to solve!

We discuss:

Why Databricks built Omnigent as a meta-harness above existing AI agents

Why coding agents and custom enterprise agents need the same infrastructure

The common API for agent sessions, files, streams, tool calls, and cancellation

Why persistent sessions, cloud sandboxes, sharing, search, and collaboration matter

Why Databricks open-sourced Omnigent instead of keeping it proprietary

Databricks’ internal agent usage, cloud sandboxes, and coding workflows

The scale of Databricks: 50–60 million virtual machines a day and exabytes before breakfast

Why agent security needs contextual and stateful policies

How an agent could read confidential docs, install a compromised npm package, and leak data

Why spend control matters when an agent can burn $500 reading logs

Startup opportunities around coding-agent analytics, quality, skills, and spend

LTAP, Lakebase, and why Databricks wants to rethink the database stack

OLTP vs OLAP, CDC, and why data pipelines break at 3 a.m.

Why HTAP has historically been the holy grail of database engineering

Why Databricks thinks LTAP is “HTAP done right”

How writing transactional data into column-oriented formats changes analytics

Why agents need live operational context from databases, not just telemetry

How Databricks prototypes strategic systems without endless process

Enterprise vs tech customers, governance, procurement, and DIY culture

The “second system syndrome” risk of rewriting a database engine

Building a database engine from a decade of traces and quadrillions of data points

Why vector databases should never have been a separate category

Why open formats and AI changed the race with Snowflake

The Mosaic story, DBRX, Genie, document parsing models, and specialized model training

Why model customization and RL fine-tuning may become mainstream

Why “get the data there, slap some agent on top” may rewrite traditional software
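The security and spend items above can be sketched together. This is a minimal illustration under stated assumptions: `SessionPolicy`, its action names, and its rules are hypothetical and do not describe any Databricks product. It shows why the policy has to be stateful: installing a package is fine on its own, but not after the same session has read confidential documents, and any action is refused once it would push the session past its budget.

```python
from dataclasses import dataclass


@dataclass
class SessionPolicy:
    """Hypothetical per-session guard; names and rules are illustrative only."""

    budget_usd: float
    spent_usd: float = 0.0
    tainted: bool = False  # True once the session has read confidential data

    def check(self, action: str, cost_usd: float = 0.0, confidential: bool = False) -> bool:
        """Return True if the action is allowed, updating session state if so."""
        # Spend control: refuse anything that would exceed the session budget.
        if self.spent_usd + cost_usd > self.budget_usd:
            return False
        # Contextual rule: after reading confidential data, block risky actions.
        if self.tainted and action in {"install_package", "network_egress"}:
            return False
        self.spent_usd += cost_usd
        if action == "read_doc" and confidential:
            self.tainted = True
        return True


policy = SessionPolicy(budget_usd=5.0)
first_install = policy.check("install_package", cost_usd=0.1)           # allowed
read_secret = policy.check("read_doc", cost_usd=0.2, confidential=True)  # allowed, taints session
second_install = policy.check("install_package", cost_usd=0.1)          # blocked by taint
expensive_logs = policy.check("read_logs", cost_usd=10.0)               # blocked by budget
```

The same request gets a different answer depending on what the session did earlier, which a stateless allowlist cannot express.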

Matei Zaharia

LinkedIn: https://www.linkedin.com/in/mateizaharia

X: https://x.com/matei_zaharia

Reynold Xin

LinkedIn: https://www.linkedin.com/in/rxin

X: https://x.com/rxin

Databricks

Website: https://www.databricks.com

X: https://x.com/databricks

Timestamps

00:00:00 Introduction

00:02:22 Omnigent and the Agent Infrastructure Layer

00:08:39 Agent Clouds, Common APIs, and Open Source

00:16:52 Databricks Scale and Internal AI Workflows

00:18:03 Agent Security, Governance, and Spend Controls

00:27:34 LTAP and the Database Dream

00:30:30 CDC, HTAP, and Why Data Pipelines Break

00:34:05 Lakebase, Parquet, and Live Data for Agents

00:36:47 Databricks’ Culture of Fast Prototyping

00:43:40 The Dream Engine and Rewriting the Database Stack

00:51:02 Vector Databases, Query Engines, and LTAP

00:52:36 Databricks vs Snowflake

00:57:48 Mosaic, DBRX, Genie, and Specialized Models

01:03:11 Context, AI Runtime, and RL Fine-Tuning

01:06:15 Why Data + Agents May Rewrite Software

01:07:09 Closing Thoughts

Transcript

Introduction: Databricks, Data + AI Summit, and Founder Dynamics

Swyx [00:00:00]: Matei and Reynold from Databricks, welcome to Latent Space.

Reynold Xin [00:00:06]: Hey, thanks for having us.

Swyx [00:00:07]: Yeah.

Matei Zaharia [00:00:08]: Yeah, thanks so much.

Swyx [00:00:09]: Thanks for taking time out. You have your Databricks Data + AI Summit going on. You were just telling me how the first summit that you guys ran was just 50 people

Reynold Xin [00:00:17]: Yeah, it was

Swyx [00:00:17]: in Berkeley

Reynold Xin [00:00:18]: little meetup at Berkeley, I think

Matei Zaharia [00:00:19]: Yeah

Reynold Xin [00:00:19]: put together

Matei Zaharia [00:00:20]: We were doing these tutorials and, yeah, just teach people Spark.

Swyx [00:00:23]: Yeah. Obviously now, I think the headline number’s like 100,000 people around the world, 30,000 in person.

Swyx [00:00:30]: It’s a crazy

Matei Zaharia [00:00:31]: Amazing

Swyx [00:00:31]: community. Well, I just saw the keynote.

Swyx [00:00:35]: Ali’s just, was it obvious back then that Ali would be, like, such a great CEO? Like

Reynold Xin [00:00:42]: Oh

Swyx [00:00:42]: such a great presenter?

Reynold Xin [00:00:43]: What do you think?

Matei Zaharia [00:00:44]: I think among our group of founders it was clear that, I think he’d be the best at this.

Swyx [00:00:50]: Yeah.

Matei Zaharia [00:00:50]: And yeah, it turned out great. And he’s ramped up on so many topics growing a company. He would just go in and, like, study it and talk to all the experts. Like, even if he can’t hire the person, learn enough about, like, finance and sales and whatever it was, and go from there. Yeah.

Swyx [00:01:09]: Yeah.

Reynold Xin [00:01:10]: He’s obviously very high IQ and very high EQ, but it wasn’t. Like, Ali today is quite different from Ali from, like, 10 years ago. I think there’s a lot of work that he put in to get to this point.

Swyx [00:01:20]: Yeah. No, to me the most appealing thing about him is that he’s funny. And like, it’s

Matei Zaharia [00:01:26]: It’s true, yeah

Swyx [00:01:26]: it’s hard to make jokes about data warehouses

Reynold Xin [00:01:30]: About serious topics

Swyx [00:01:31]: security

Matei Zaharia [00:01:32]: Yeah

Swyx [00:01:32]: what have you.

Matei Zaharia [00:01:33]: Oh, yeah. That’s for sure.

Swyx [00:01:34]: Yeah. So you guys launched a whole bunch of things. I’ll just briefly name-check the stuff, because we’re not gonna cover everything. Omnigent, your baby. LTAP, your baby, your dream engine.

Swyx [00:01:47]: We’re also gonna cover Genie, cover CustomerLake, you acquired Panther

Matei Zaharia [00:01:52]: Yeah

Swyx [00:01:52]: Open Sharing, and there’s Unity AI Gateway. A lot of these, I think, are things that you would expect a Databricks to do. It’s like part of the roadmap. Everyone in your category has similar things. But I think probably the two of you are leading the two most unique and differentiated initiatives

Omnigent and the Agent Infrastructure Layer

Swyx [00:02:09]: in the landscape. Maybe we’ll start with Omnigent, and we’ll go into it. I do think that a lot of people are exploring this meta harness concept.

Matei Zaharia [00:02:21]: Yeah, totally.

Swyx [00:02:21]: What led you to it?

Matei Zaharia [00:02:22]: Yeah. There were a couple of, like, converging lines, which I think is a good sign that you need something new. So on the one hand, there’s all the coding agent infra internally. We have a really great dev infra team. They built something called Isaac, that’s like a wrapper on Claude Code and Codex, and lets you use them either on the web in, like, sandboxes or just on your dev machine or on your laptop or whatever. And then they were adding all kinds of stuff there. And we saw all the more advanced engineers were building their own workflows with tons of agents, and they were building their own UIs and stuff on top, or even on top of that. And then the other one was, like, us building agents. We ship this, like, data science agent called Genie on the research team, which I lead. We also build a lot of internal ones for various things, and then we have all the customer ones. And all of them were running into this thing of like, “Oh, I need to switch model and harness and so on,” every few months. Plus the agent is, like, completely useless if you can’t share sessions with someone and have history and have search and all this, like, layer on top of it for collaboration. I thought a bit about it from both contexts and, at first, people thought it was weird. They’re like, “Why are you doing coding agents and custom agents in the same thing?” But I said it’s the same problems, and you just wanna build the stuff that lets you deliver the agent, maybe control it if you care about security, and make it portable across things. And then we prototyped some things as experiments. We saw, yeah, we can make it work, and then we built that for real.

Swyx [00:04:06]: I’m wondering if this, let’s call it architecture,

Matei Zaharia [00:04:11]: Yeah

Swyx [00:04:11]: maps to anything in your careers in the past. Like, I always think about how a lot of things just tie back to operating systems.

Swyx [00:04:18]: A lot of operating

Matei Zaharia [00:04:19]: Yeah

Swyx [00:04:20]: systems ti

[truncated for AI cost control]