
Databricks wants to merge the two databases every company runs

Databricks introduces LTAP architecture to unify transactional and analytical databases for AI agents, removing the divide between operational and analytical systems.

Source: The New Stack AI | Author: Frederic Lardinois

Databricks wants to erase the divide between the databases that run a business and the systems that analyze it. At its Data + AI Summit in San Francisco on Tuesday, the company introduced an architecture it calls Lake Transactional/Analytical Processing, or LTAP, built to collapse that split for AI agents.

Databricks started going down this path a while ago but made it concrete when it bought the serverless Postgres startup Neon and, later, Mooncake Labs in 2025. The bet here is that AI agents, not people, will become the primary users of the enterprise data stack, and that the infrastructure beneath them has to be rebuilt for them.

Credit: The New Stack.

A breakthrough 40 years in the making

“For decades, complicated data infrastructure was a tax that teams were forced to pay,” said Ali Ghodsi, co-founder and CEO of Databricks, in the announcement. “Then agents arrived. In a matter of months, organizations effectively doubled their workforce, just not with humans. Agents write code, make calls, and run loops at a pace human teams never could. The infrastructure that powered the last era of computing is now the bottleneck that no one can afford. LTAP removes it.”

LTAP, Ghodsi said in his conference keynote on Tuesday, is “a breakthrough the industry has been working on for 40 years. We think we finally pulled it off.”

Historically, companies have had to run two kinds of databases. Online transactional processing (OLTP) systems handle the live operations of a business, like orders, payments, and inventory, in row-based formats tuned for fast writes. Online analytical processing (OLAP) systems then use what is essentially the same data for reporting and analysis, in column-based formats tuned for large scans. The two were kept apart for performance and reliability, and enterprises bridged them with ETL pipelines and replicas.
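The row-versus-column distinction is easier to see with a toy example. The sketch below is illustrative only: the `orders_rows` data and the helper names are made up for this article and have nothing to do with Databricks' storage formats. It lays out the same three orders both ways and shows which access pattern each layout favors.

```python
# Illustrative only: the same three orders in a row layout and a column layout.
orders_rows = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 30.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


orders_columns = to_columns(orders_rows)


def insert_order(rows, order):
    """Transactional pattern: append one whole record in a single step."""
    rows.append(order)


def total_amount(columns):
    """Analytical pattern: scan one column and ignore all the others."""
    return sum(columns["amount"])
```

In the row layout a new order is one append, while the column layout lets an aggregate read only the `amount` values. That trade-off is why the two kinds of systems were historically tuned, and deployed, separately.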

Databricks argues that agents need a different system because they read live transactional data, reason over historical context, and act on both at once.

Earlier attempts to merge the two layers never quite worked, the company says, because hybrid transactional and analytical processing (HTAP) systems carried high costs and proprietary lock-in, while “zero-ETL” tools amounted to hidden change data capture, still leaving two copies of the data and the problem of data going stale.

What is LTAP?

LTAP unifies transactional and analytical data in a single storage layer, governed once and stored in open formats on cloud object storage, while keeping separate compute engines for each kind of work.

The design builds directly on Lakebase, the Postgres-based operational database Databricks introduced in June 2025, which the company describes as a “new category” that separates compute from storage and places the data in the lake in open formats.

Now, the company is extending Lakebase for what it calls business-critical workloads, adding native vector and full-text search, real-time event ingestion through Zerobus, part of its Lakeflow Connect ingestion service, and Git-style branching that lets an agent copy a database to experiment and then discard it.

“Agents love to just branch out and experiment with the data, try something else, and they want to do it quickly,” Ghodsi said. “They don’t want to wait ten minutes on a database to come up.”

Lakehouse//RT

The second piece is Lakehouse//RT, a real-time analytics engine, powered by a vectorized engine Databricks calls Reyden, that runs directly on Delta and Iceberg tables in the lakehouse.

Companies have long stood up separate, specialized systems to get millisecond query speeds, duplicating data into a “serving layer” that sits alongside the lakehouse. Databricks says Lakehouse//RT removes that layer, delivering millisecond-level latency on lakehouse data with no extra copies, pipelines, or governance gaps.

Databricks stresses the engine’s high concurrency. Mehrshad Setayesh, SVP of engineering at PointClickCare, says Lakehouse//RT “ran more than a third faster on average than our prior warehouse on our healthcare dataset, with 10x faster queries,” and that it removed the company’s need for a dedicated real-time system alongside its lakehouse.

Mooncake and Neon to the rescue

LTAP’s main pitch is that a single copy of the data can be stored once in open formats without the need for complex data pipelines. The Lakebase architecture, the company wrote last year, shares one storage layer across transactional and analytical workloads “without moving or duplicating it.”

Lakebase’s analytical speed comes from Mooncake, the startup Databricks bought to accelerate it. Mooncake mirrors Postgres changes into the lakehouse in real time, which is how transactions and analytics run on the same fresh data.

“Postgres changes are mirrored in real time to the lakehouse,” the company wrote when it announced the deal. Mirroring produces a second, columnar copy of the data, which is what makes the analytical queries fast.
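To make the idea of mirroring concrete, here is a minimal sketch of a columnar copy kept in step with a stream of row-level changes. It is not Mooncake's implementation, which the article does not describe. The `Change` and `ColumnarMirror` names, the tombstone approach to deletes, and the in-memory lists are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One row-level change, as a hypothetical change stream might emit it."""
    op: str                     # "insert", "update" or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new column values; None for deletes


class ColumnarMirror:
    """Toy columnar copy that applies row changes as they arrive."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.position = {}    # primary key -> index into every column list
        self.deleted = set()  # indexes of rows removed by a delete
        self.length = 0       # number of slots allocated so far

    def apply(self, change):
        if change.op == "insert":
            self.position[change.key] = self.length
            for name, values in self.columns.items():
                values.append(change.row[name])
            self.length += 1
        elif change.op == "update":
            index = self.position[change.key]
            for name, value in change.row.items():
                self.columns[name][index] = value
        elif change.op == "delete":
            # Mark the slot as dead instead of shifting every column.
            self.deleted.add(self.position.pop(change.key))
        else:
            raise ValueError(f"unknown operation: {change.op}")

    def scan(self, name):
        """Return the live values of one column, skipping deleted rows."""
        return [
            value
            for index, value in enumerate(self.columns[name])
            if index not in self.deleted
        ]
```

Each transactional write becomes a small change record, and the analytical side only ever scans the columns it needs. How fresh the analytical view is depends on how quickly those records are applied.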

Security, governance, auditing, and high availability, the company wrote, “only need to be implemented and managed once, on a single open foundation.”

Branching is the one feature here that was built specifically for agents, and it is core to Neon, too. Because the data sits on object storage, an agent can fork a full database, test against it, and discard it, the way it would a Git branch. Databricks says even petabyte-scale databases can be copied in seconds, while on a traditional database, provisioning an instance takes minutes or hours and cloning production risks taking it down.
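Why can a fork be fast regardless of database size? One common technique is copy-on-write over immutable storage: a new branch shares every existing page with its parent and stores only what it changes. The sketch below illustrates that general idea. It is not Neon's or Lakebase's actual design, and the `Branch` class and its page model are hypothetical.

```python
class Branch:
    """Toy copy-on-write branch over shared, immutable layers of pages."""

    def __init__(self, parent=None):
        self._parent = parent  # frozen layer shared with other branches, or None
        self._pages = {}       # pages written on this branch since its last fork

    def write(self, page_id, data):
        """Store a page on this branch only; shared layers are never modified."""
        self._pages[page_id] = data

    def read(self, page_id):
        """Look for the page here first, then walk back through shared layers."""
        layer = self
        while layer is not None:
            if page_id in layer._pages:
                return layer._pages[page_id]
            layer = layer._parent
        raise KeyError(page_id)

    def fork(self):
        """Create a child branch without copying any page data."""
        # Freeze everything written so far into a layer both branches share.
        frozen = Branch(self._parent)
        frozen._pages = self._pages
        # This branch continues on top of the frozen layer with a fresh page map.
        self._parent = frozen
        self._pages = {}
        return Branch(frozen)
```

Forking costs the same whether the parent holds three pages or three billion, because only references change hands. Discarding an experiment amounts to dropping the child branch, and later writes on either side stay invisible to the other. Deletes and garbage collection of unreferenced layers are left out to keep the sketch short.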

As Ghodsi noted in his keynote, agents love Postgres, but they need better tools to work with it, and maybe better databases, too. “In the next 12 months, we’re going to see more software written than ever in the history of mankind,” he said. “All that software that your organizations are going to write using LLMs and coding tools need the database behind the scenes.”

What else is new?

LTAP was only one part of the company’s three-hour keynote. Like so many other enterprise vendors, Databricks is also thinking about how to get agent sprawl — and cost — under control. Databricks’ answer is Unity AI Gateway, a single control point for every model, agent, MCP server, and skill running in an organization. Among other features, it offers spending dashboards, budgets that can be set per team or per user, rate limits, and single sign-on across MCP servers.

The company also introduced Genie One, a general-purpose agent for business teams, fed by Genie Ontology, a new layer that builds a ranked graph of a company’s data with a PageRank-style algorithm it calls OntoRank.
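The article gives no details on how OntoRank works, so the following is only a sketch of the classic PageRank power iteration that "PageRank-style" alludes to, applied to a made-up graph of tables. The `pagerank` function, the damping factor of 0.85, and the example graph are assumptions for illustration, not anything Databricks has published.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    `links` maps each node to the list of nodes it points to.
    Returns a dict of node -> score; the scores sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Every other node splits its rank evenly among the nodes it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical graph: an edge means "this table references that table".
table_links = {
    "orders": ["customers"],
    "payments": ["orders", "customers"],
    "customers": [],
}
scores = pagerank(table_links)
```

In this toy graph, `customers` ends up with the highest score because the other tables point to it, which is the intuition behind ranking heavily referenced data assets higher.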

Ghodsi also highlighted OpenSharing, a new protocol for sharing data, models, and agent skills across platforms (you may remember its predecessor Delta Sharing, but it is now a project under the Linux Foundation).

Databricks also debuted CustomerLake, a customer data platform aimed at marketing teams, and announced an agreement to acquire Panther, a Python-based security company, to feed its Lakewatch security information and event management service.

Databricks’ moat?

It’s the data layer, though, and its data science history, where Databricks can really differentiate. At this point, it feels like every enterprise vendor, no matter their expertise, is adding agent builders, agent orchestration and governance tools. Databricks can be a relatively neutral player in this space — something Ghodsi also stressed in a press conference after the keynote.

But the company also seems aware that while many other enterprise SaaS vendors can use their domain expertise, and the customer data they already hold to feed AI agents, as a moat, Databricks functions as more of a utility layer. It’s maybe no surprise, then, that it is launching an industry-specific product like CustomerLake for the marketing industry, which adds a pre-made product layer on top of data its customers already store on its platform.

The post Databricks wants to merge the two databases every company runs appeared first on The New Stack.