AI News HubLIVE
站内改写6 min read

The model is swappable, the ontology compounds

Databricks' Genie Ontology highlights the importance of ontology in enterprise AI, combining a governed semantic layer with a continuously learned enterprise context graph, using OntoRank to resolve ambiguities and marking a shift from RAG to ontology-based approaches.

SourceHacker News AIAuthor: cpard

<< goback()Opinion

The model is swappable, the ontology compounds

June 19, 2026

The model is becoming the swappable part.

The ontology is what compounds.

That, to me, is the important idea behind Databricks’ Genie Ontology. The persistent state that gets smarter with every query, every certified metric, every dashboard, every lineage edge, every business definition, and every correction from a human is the actual moat.

This is why Ali Ghodsi calling the Genie Ontology the “secret sauce” matters.

It signals that the industry is moving into a new phase. For the last couple of years, a lot of the conversation around AI products has been about models: which model is better, which model is cheaper, which model has the larger context window, which model is better at reasoning, coding, retrieval, or tool use.

That still matters, but I think the center of gravity is shifting.

For enterprise AI, the durable value will not live only in the model. It will live in the governed context layer around the model.

And that is why ontologies, which for years were treated as important but somewhat niche, are about to become much more central to how companies think about AI systems.

So, what is the Genie Ontology, and why is it interesting?

Based on what Databricks has shared so far, and what I managed to learn during the Databricks conference, here is my current mental model.

What is Databricks Genie Ontology?

The simplest way to think about Genie Ontology is this:

Humans define the canonical business meaning.

Machines learn how that meaning shows up across the company.

Humans correct the machine when it drifts.

The system gets better over time.

My current mental model is that there are two interacting layers.

The first layer is the governed semantic foundation.

The second layer is the machine-learned ontology that continuously builds enterprise context on top of that foundation.

Layer 1: The governed semantic foundation

The first layer is human-defined and curated.

This is the part that lives in and around Unity Catalog. It includes things like business glossaries, domains, and metrics.

The business glossary defines authoritative concepts, terms, and taxonomies. In theory, if a company already has some of this work done elsewhere, those definitions can be imported or migrated into the governed layer.

Domains group assets in a way that maps to the business. They create ownership and stewardship boundaries. They also help communicate which assets are trusted, certified, or relevant to a specific part of the organization.

Metrics define reusable business measures like revenue, churn, active users, gross margin, pipeline, or retention. The important part is that these are not just random SQL snippets copied across dashboards. They are governed objects that can be reused consistently by BI tools, agents, notebooks, dashboards, and downstream applications.

This layer matters because AI cannot answer business questions reliably if the business itself has not defined what the words mean.

If “revenue” means one thing to finance, another thing to sales, and a third thing to the product analytics team, the model is not going to magically fix that. At best, it will pick one definition. At worst, it will confidently mix them together.

So the governed layer provides the canonical starting point.

But that is not enough.

Because the official definition of the business is only one part of the story. The other part is how the business actually operates.

That is where the machine-learned layer comes in.

Layer 2: The machine-learned enterprise context graph

The Genie Ontology appears to use the governed semantic foundation as input into a broader, continuously learned enterprise context layer.

This is the part that tries to understand how the company actually works by looking across tables, queries, dashboards, pipelines, documents, connected applications, and workplace tools.

In other words, it is not just looking at schema, it is also looking at usage and that distinction is important.

A schema can tell you that a table has a column called customer_id.

It cannot tell you whether that column is trusted, whether the table is deprecated, whether analysts actually use it, whether executives rely on a dashboard built from it, whether another team has a better version, or whether the metric calculated from it is the one the business actually considers authoritative.

To know that, you need more than metadata, you need context.

That context comes from many places: lineage, query history, dashboards, pipelines, documents, certifications, permissions, domains, connected apps, and the way people actually interact with the data.

This is where Databricks is making a very interesting bet.

The system is not just retrieving data at query time. It is building a persistent representation of the enterprise in the background.

A living graph of business meaning.

The hard problem is authority

The hard problem in enterprise analytics is usually not that there is no answer.

The hard problem is that there are too many possible answers.

There are multiple revenue definitions. Multiple customer tables. Multiple churn calculations. Multiple dashboards that appear to answer the same question. Multiple teams using similar words in slightly different ways.

So the core problem becomes:

Which definition deserves to win?

This is where OntoRank becomes the most interesting part of the system.

Databricks describes OntoRank as similar to PageRank. Instead of ranking web pages, it ranks business definitions and relationships.

When multiple definitions of the same concept exist, OntoRank appears to weigh signals like:

Where the definition came from.

Who authored it.

How widely it is used.

How closely it connects to certified and widely used assets.

How recently it was updated.

A simple way to think about it is:

OntoRank is PageRank over business meaning, where certified, popular, fresh, well-connected, and authored-by-the-right-person definitions win.

That is a very elegant idea.

If it works, Databricks has a scalable way to resolve enterprise semantic ambiguity without requiring humans to manually curate every possible definition, relationship, and edge case.

If it does not work, Genie Ontology risks becoming another noisy metadata graph with a much better interface.

That is why I think OntoRank is the highest-leverage technical bet inside Genie Ontology.

The success of the system will not depend only on OntoRank, of course. It will also depend on governance adoption, connector coverage, lineage quality, human review workflows, metric-layer quality, permissions, and whether enough of the company’s activity is visible to the system.

But OntoRank feels like the center of gravity.

Because the real question is not whether Databricks can collect a lot of metadata.

The question is whether it can rank meaning.

Query history is behavior, not truth

One part of this that I find especially interesting is the role of usage signals.

From what Databricks has shared, query history is an important signal. Column popularity, for example, can be derived from how often historical queries read from a column.

That means the system is going a step further and is asking what do people actually use?

This makes sense. In a large company, usage is a valuable signal. If many important dashboards, workflows, and queries depend on a specific table or column, that probably tells you something.

But usage is also dangerous.

Query history is behavior, not truth.

A table can be popular because it is correct. It can also be popular because it is old, convenient, badly documented, or accidentally copied into a hundred dashboards years ago.

A metric can be widely used and still be wrong.

A query can be common and still encode a misunderstanding.

This is where the OntoRank bet becomes subtle.

The bet is not simply that query history is useful.

The bet is that usage history, when combined with governance, lineage, freshness, authorship, certification, and graph structure, can become a trustworthy authority signal.

That is very different from just retrieving old SQL queries and giving them to an agent.

And this is where the comparison with Anthropic is interesting.

The Anthropic tension

A few weeks ago, Anthropic shared how they built their internal analytics context layer. One thing that stood out was that raw query history did not help them very much when used directly. Giving the agent access to a large set of historical queries barely improved accuracy.

Their conclusion seemed to be that query history is too noisy to be treated as a direct source of truth.

But it can still be valuable as raw material.

You can mine it. You can distill it. You can turn recurring patterns into curated reference docs, reusable analysis workflows, and better semantic context.

That makes the contrast with Databricks interesting.

Anthropic seems to be saying:

Raw query history is noisy. Distill it before using it as context.

Databricks seems to be betting:

Usage history can become authority signal if it is embedded inside a governed graph and ranked properly.

Those are not necessarily contradictory positions.

In fact, they may be two versions of the same idea.

The question is not whether query history is useful.

The question is whether you can transform query history into reliable semantic signal.

That is what I want to watch closely with Genie Ontology.

If OntoRank can turn usage exhaust into trustworthy business context, that is a big deal.

If not, the human-governed layer will have to do much more of the work than the product positioning might imply.

Why this is different from RAG

This is also why I think Genie Ontology is best understood as part of a broader shift from RAG to ontology.

RAG systems usually retrieve chunks of text at query time and ask the model to reason from them.

That can work well in many situations. But in enterprise analytics, it often breaks down because the hard problem is not simply finding relevant text.

The hard problem is knowing which definition is authoritative, which data source is trusted, which metric should be used, which permissions apply, and which business context matters.

If a model retrieves three different explanations of “revenue,” it still needs to decide which one to trust.

If it retrieves an outdated dashboard, it may answer confidently with the wrong business logic.

If it retrieves SQL from a historical query, it may reproduce a pattern that was popular but incorrect.

The ontology approach front-loads more of the work.

Instead of assembling context only at query time, the system continuously builds a governed graph in the background. It ranks definitions. It connects metrics to tables. It uses lineage. It respects permissions. It routes the agent toward certified data and governed computation instead of letting it free-associate over scattered fragments.

The agent does not just ask the model to reason over random retrieved text.

The agent can use the ontology to resolve the right business concept, find the right metric, identify the right data assets, and generate SQL against governed data.

That is the shift.

RAG retrieves context while ontology maintains context.

RAG asks, “What relevant chunks can I find right now?” while ontology asks, “What does this business mean, and which meaning should be trusted?”

That is a much more powerful foundation for enterprise AI.

Why I think this matters

I am excited about Genie Ontology because it signals that the industry is entering the real building phase of AI.

We have spent a lot of time proving that LLMs can answer questions, write code, summarize documents, and use tools.

Now the question is:

What infrastructure do we need to turn models into reliable products?

I think governed context is one of the most important pillars.

A model without business context is a

[truncated for AI cost control]