Why the Frontier Ecosystem must be Open — Matei Zaharia and Reynold Xin, Databricks
In a rare double-interview, the Databricks technical leaders riff on what it will take for every company to build Agent Clouds
Everyone is still talking about Satya’s Frontier Ecosystems post, but few have actually built a (now $175 billion) frontier ecosystem and cloud like our guests today.
From open-sourcing the layer above coding agents to rethinking databases for the agent era, Databricks cofounders Matei Zaharia and Reynold Xin are pushing the company beyond the lakehouse into a full data-and-AI operating system. In this episode, Matei and Reynold join swyx at the 2026 Data + AI Summit to unpack Omnigent, LTAP, Lakebase, agent security, open formats, Mosaic, and why databases may matter more than ever once AI agents start doing real work.
We go deep on Omnigent: Databricks’ open-source meta-harness for combining, controlling, and sharing agents across Claude Code, Codex, Cursor, Pi, custom agents, and internal tools. Matei explains why coding agents and enterprise agents run into the same problems: portability, collaboration, session history, security, spend controls, and the need for a common API above every harness.
Then Reynold walks through Databricks’ database dream: why CDC is brittle enough to joke that it means “continuous data corruption,” why HTAP has been the holy grail of database engineering, and why Databricks thinks LTAP gets most of the benefits by unifying the storage layer instead of collapsing every query engine. We also cover Databricks’ infrastructure scale, the culture behind rapid prototyping, the difference between tech and enterprise customers, Databricks vs Snowflake, whether vector databases should have ever existed, the Mosaic model strategy, Genie, AI Runtime, RL fine-tuning, and the thesis that traditional software gets rewritten once the data is in the right place and agents sit on top.
Databricks began as a company for the big data era. Spark, which originated in the Berkeley AMPLab, eventually grew into the Lakehouse product, which convinced enterprises that they didn’t need a separate data lake, warehouse, ML platform, and governance layer. They just needed one open foundation where all of their data could live and be reasoned over.
Since then a lot has changed, but data has only become more important. Data is no longer something you keep track of and analyze ad hoc; it’s the necessary context agents need in order to act. So the framing has shifted from “where do we put all of our data?” to “how do we expose the right slice of state, history, permissions, and business logic to an AI system at the exact moment it’s doing work?”
If frontier model performance becomes commoditized, the durable advantage becomes the company-specific context around the models: proprietary data, governed access, operational state, transaction logs, workflows, and feedback loops. That leaves Databricks perfectly positioned.
Fresh off the Data + AI Summit 2026, the company is moving fast to keep up, announcing Genie One, Omnigent, LTAP, and more. Together these point to a central mission in its newer work: Databricks is trying to become the operating system for enterprise agents.
Models are getting good enough, but agents are only useful if they have the right context, permissions, memory, state, cost controls, and access to live business data. Fundamentally, it appears that getting significantly better model performance in production is a systems problem, one that data guys like us are remarkably well prepared to solve!
We discuss:
Why Databricks built Omnigent as a meta-harness above existing AI agents
Why coding agents and custom enterprise agents need the same infrastructure
The common API for agent sessions, files, streams, tool calls, and cancellation
Why persistent sessions, cloud sandboxes, sharing, search, and collaboration matter
Why Databricks open-sourced Omnigent instead of keeping it proprietary
Databricks’ internal agent usage, cloud sandboxes, and coding workflows
The scale of Databricks: 50–60 million virtual machines a day and exabytes before breakfast
Why agent security needs contextual and stateful policies
How an agent could read confidential docs, install a compromised npm package, and leak data
Why spend control matters when an agent can burn $500 reading logs
Startup opportunities around coding-agent analytics, quality, skills, and spend
LTAP, Lakebase, and why Databricks wants to rethink the database stack
OLTP vs OLAP, CDC, and why data pipelines break at 3 a.m.
Why HTAP has historically been the holy grail of database engineering
Why Databricks thinks LTAP is “HTAP done right”
How writing transactional data into column-oriented formats changes analytics
Why agents need live operational context from databases, not just telemetry
How Databricks prototypes strategic systems without endless process
Enterprise vs tech customers, governance, procurement, and DIY culture
The “second system syndrome” risk of rewriting a database engine
Building a database engine from a decade of traces and quadrillions of data points
Why vector databases should never have been a separate category
Why open formats and AI changed the race with Snowflake
The Mosaic story, DBRX, Genie, document parsing models, and specialized model training
Why model customization and RL fine-tuning may become mainstream
Why “get the data there, slap some agent on top” may rewrite traditional software
Matei Zaharia
LinkedIn: https://www.linkedin.com/in/mateizaharia
X: https://x.com/matei_zaharia
Reynold Xin
LinkedIn: https://www.linkedin.com/in/rxin
X: https://x.com/rxin
Databricks
Website: https://www.databricks.com
X: https://x.com/databricks
Timestamps
00:00:00 Introduction
00:02:22 Omnigent and the Agent Infrastructure Layer
00:08:39 Agent Clouds, Common APIs, and Open Source
00:16:52 Databricks Scale and Internal AI Workflows
00:18:03 Agent Security, Governance, and Spend Controls
00:27:34 LTAP and the Database Dream
00:30:30 CDC, HTAP, and Why Data Pipelines Break
00:34:05 Lakebase, Parquet, and Live Data for Agents
00:36:47 Databricks’ Culture of Fast Prototyping
00:43:40 The Dream Engine and Rewriting the Database Stack
00:51:02 Vector Databases, Query Engines, and LTAP
00:52:36 Databricks vs Snowflake
00:57:48 Mosaic, DBRX, Genie, and Specialized Models
01:03:11 Context, AI Runtime, and RL Fine-Tuning
01:06:15 Why Data + Agents May Rewrite Software
01:07:09 Closing Thoughts
Transcript
Introduction: Databricks, Data + AI Summit, and Founder Dynamics
Swyx [00:00:00]: Matei and Reynold from Databricks, welcome to Latent Space.
Reynold Xin [00:00:06]: Hey, thanks for having us.
Swyx [00:00:07]: Yeah.
Matei Zaharia [00:00:08]: Yeah, thanks so much.
Swyx [00:00:09]: Thanks for taking time out. You have your Databricks Data + AI Summit going on. You were just telling me how the first summit that you guys ran was just 50 people
Reynold Xin [00:00:17]: Yeah, it was
Swyx [00:00:17]: in Berkeley
Reynold Xin [00:00:18]: little meetup at Berkeley, I think
Matei Zaharia [00:00:19]: Yeah
Reynold Xin [00:00:19]: put together
Matei Zaharia [00:00:20]: We were doing these tutorials and, yeah, just teach people Spark.
Swyx [00:00:23]: Yeah. Obviously now it’s like, I think the headline number’s like 100,000 people around the world, 30,000 in person.
Swyx [00:00:30]: It’s a crazy
Matei Zaharia [00:00:31]: Amazing
Swyx [00:00:31]: community. Well, I just saw the keynote.
Swyx [00:00:35]: Ali’s just... Was it obvious back then that Ali would be, like, such a great, like, CEO? Like
Reynold Xin [00:00:42]: Oh
Swyx [00:00:42]: such a great presenter?
Reynold Xin [00:00:43]: What do you think?
Matei Zaharia [00:00:44]: I think among our group of founders it was clear that he’d be the best at this.
Swyx [00:00:50]: Yeah.
Matei Zaharia [00:00:50]: And yeah, it turned out great. And he’s ramped up on so many topics growing a company. He would just go in and, like, study it and talk to all the experts. Like, even if he can’t hire the person, learn enough about, like, finance and sales and whatever it was, and go from there. Yeah.
Swyx [00:01:09]: Yeah.
Reynold Xin [00:01:10]: He’s obviously very high IQ and very high EQ, but it wasn’t... Like, Ali today is quite different from Ali from, like, 10 years ago. I think there’s a lot of work that he put in to get to this point.
Swyx [00:01:20]: Yeah. No, to me the most appealing thing about him is that he’s funny. And like, it’s...
Matei Zaharia [00:01:26]: It’s true, yeah
Swyx [00:01:26]: it’s hard to make jokes about data warehouses
Reynold Xin [00:01:30]: About serious topics
Swyx [00:01:31]: security
Matei Zaharia [00:01:32]: Yeah
Swyx [00:01:32]: what have you.
Matei Zaharia [00:01:33]: Oh, yeah. That’s for sure.
Swyx [00:01:34]: Yeah. So you guys launched a whole bunch of things. I’ll just name check the stuff briefly, because we’re not gonna cover everything. Omnigent, your baby. LTAP, your baby, your dream engine.
Swyx [00:01:47]: We’re also gonna cover Genie, cover CustomerLake, you acquired Panther
Matei Zaharia [00:01:52]: Yeah
Swyx [00:01:52]: Open Sharing, and there’s Unity AI Gateway. A lot of these, I think, like, are things that you would expect a Databricks to do. It’s, it’s like part of the roadmap. Everyone in your category has similar things. But I think, probably the two of you are leading the two most unique and differentiated initiatives
Omnigent and the Agent Infrastructure Layer
Swyx [00:02:09]: in the landscape. Maybe we’ll start with Omnigent, and we’ll go into it. I do think that a lot of people are exploring this meta harness concept.
Matei Zaharia [00:02:21]: Yeah, totally.
Swyx [00:02:21]: What led you to it?
Matei Zaharia [00:02:22]: Yeah. There were a couple of, like, converging lines, which I think is a good sign that you need something new. So on the one hand, there’s all the coding agent infra internally. We have a really great dev infra team. They built something called Isaac, that’s like a wrapper on Claude Code and Codex, and lets you use them either on the web in, like, sandboxes, or just on your dev machine or on your laptop or whatever. And then they were adding all kinds of stuff there. And we saw all the more advanced engineers were building their own workflows with tons of agents, and they were building their own UIs and stuff on top of that. And then the other one was, like, us building agents. We ship this, like, data science agent called Genie on the research team, which I lead. We also build a lot of internal ones for various things, and then we have all the customer ones. And all of them were running into this thing of like, “Oh, I need to switch model and harness and so on,” every few months. Plus the agent is, like, completely useless if you can’t share sessions with someone and have history and have search and all this, like, layer on top of it for collaboration. I thought a bit about it from both contexts and, at first, people thought it was weird. They’re like, “Why are you doing coding agents and custom agents in the same thing?” But I said it’s the same problems, and you just wanna build the stuff that lets you deliver the agent, maybe control it if you care about security, and make it portable across things. And then we prototyped some things as experiments. We saw, yeah, we can make it work, and then we built that for real.
Swyx [00:04:06]: I’m wondering if this, let’s call it architecture,
Matei Zaharia [00:04:11]: Yeah
Swyx [00:04:11]: maps to anything in your careers in the past. Like, I always think about how a lot of things just tie back to operating systems.
Swyx [00:04:18]: A lot of operating
Matei Zaharia [00:04:19]: Yeah
Swyx [00:04:20]: systems ti
[truncated for AI cost control]