What’s New in the AI Platform: Agents for ML Engineering, Our Deep Learning Platform, and New Capabilities for Real-Time ML
Databricks announces new AI platform capabilities at Data + AI Summit 2026: Genie Code for ML (a coding agent integrated with the ML stack), AI Runtime (serverless GPU training, now in Public Preview), and enhanced real-time ML support (low-latency, high-QPS Feature Store and Model Serving). These features accelerate the path from experimentation to production.
Build ML systems faster with Genie Code, a coding agent that helps data scientists and ML engineers develop, evaluate, and improve traditional machine learning systems.
Train and fine-tune AI models on serverless GPUs with AI Runtime, a unified deep learning platform optimized for large-scale GPU training and experimentation.
Power real-time ML at scale with new Feature Store and Model Serving capabilities, including streaming features and high-QPS serving for the most demanding production workloads.
There’s never been a more dynamic, exciting time to be building your own AI models and systems. From demand forecasting and fraud detection to search, recommendations, personalization, and multimodal AI, machine learning is powering critical applications across every industry.
At Data + AI Summit 2026, we're thrilled to announce the following new capabilities within the Databricks AI Platform:
Genie Code for ML: Genie Code now comes with upgraded intelligence for ML engineering and native integrations across every Databricks ML Platform component: feature engineering, model training, serving, and monitoring.
AI Runtime (Public Preview): a serverless GPU training environment, enabling research-grade deep learning and fine-tuning without complex infrastructure management.
Enhanced Support for Real-Time ML: low-latency, high-QPS support across our Feature Store and Model Serving products.
Together, these capabilities streamline the path from experimentation to production, enabling organizations to build, deploy, and scale AI applications significantly faster than ever before.
Let's take a closer look at what's new.
Genie Code for Machine Learning
Today, bringing an ML model to production can take months, with teams spending countless hours on repetitive tasks across the ML lifecycle, from feature engineering and experiment management to model evaluation and deployment. Agents have transformed how engineering and technical teams operate, and at Data + AI Summit (DAIS) this year, we’re excited to announce Genie Code’s support for the entire ML lifecycle.
Building and operating ML models takes nuanced decisions that generic coding agents can't make. Can I rely on this dataset’s freshness and quality as a feature? Will this feature leak future information into the model? Is this serving endpoint starting to drift? Getting the details in ML right requires deep context, and that context only comes from tight integration with the data and ML platform: your data and its quality, feature lineage, experiment history, training infrastructure, and production performance.
That's where Genie Code comes in:
Context on your data via Unity Catalog: Genie Code understands your data, business semantics, and governance model. Through its integration with Unity Catalog, it knows which tables and features are high quality for ML, how data flows through your ML pipelines, and what access controls and policies must be respected.
Context on the Databricks ML stack: Genie Code is built for ML on Databricks and integrates deeply with Feature Store, Serverless Compute, AI Runtime, Model Serving, and Inference Tables. It can optimize training jobs, diagnose serving issues, evaluate challenger models, and take action across the ML stack, not just generate code that interacts with it.
Context on your ML lifecycle and workflows: Through MLflow, Genie Code understands the full ML lifecycle, from feature engineering and experimentation to deployment, monitoring, drift detection, retraining, and production operations. It doesn't stop when a model is shipped; it helps ensure the business metrics that model drives, such as CTR, conversion, or revenue, stay healthy in production.
And so, with Genie Code, your ML teams can move faster than ever before.
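The leakage question raised above has a precise meaning: a training example labeled at time t may only use feature values that were known at or before t. The snippet below is a minimal, plain-Python sketch of such a point-in-time ("as-of") lookup, included only to illustrate the check. It is not Genie Code output or a Databricks Feature Store API, and `FeatureRow`, `as_of_lookup`, and `naive_latest` are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRow:
    """One timestamped feature value for an entity (hypothetical schema)."""
    entity_id: str
    ts: int      # time the value became known, e.g. epoch seconds
    value: float


def as_of_lookup(rows, entity_id, label_ts):
    """Return the latest value for entity_id with ts <= label_ts, or None.

    Restricting the lookup to values known at or before the label time
    keeps future information out of the training example.
    """
    history = sorted((r for r in rows if r.entity_id == entity_id), key=lambda r: r.ts)
    timestamps = [r.ts for r in history]
    # bisect_right gives the index of the first value strictly after label_ts.
    idx = bisect_right(timestamps, label_ts)
    return history[idx - 1].value if idx > 0 else None


def naive_latest(rows, entity_id):
    """Leaky alternative: always use the newest value, ignoring the label time."""
    history = [r for r in rows if r.entity_id == entity_id]
    return max(history, key=lambda r: r.ts).value if history else None


rows = [FeatureRow("u1", 100, 1.0), FeatureRow("u1", 200, 5.0), FeatureRow("u1", 300, 9.0)]
print(as_of_lookup(rows, "u1", 250))  # 5.0: last value known at t=250
print(naive_latest(rows, "u1"))       # 9.0: observed at t=300, leaks into a t=250 label
```

Sorting on every lookup keeps the sketch short; a real pipeline would more likely join the whole label table against pre-sorted feature history in one bulk pass.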
Genie Code handles feature engineering the way your senior ML engineer would—learning your team's existing patterns, reusing proven transformations, and building features that are consistent with what's already in production.
Genie Code doesn't just write ML code—it trains and tunes production-grade models. It automatically selects and configures the right infrastructure, whether that's CPU for lightweight experiments or GPU for distributed training, and logs every run natively to MLflow.
Genie Code takes models from notebook to production in one flow—registering to Unity Catalog, deploying to a serving endpoint, and keeping governance intact every step of the way.
Genie Code has completely changed how I work. I run upwards of 15 parallel threads scoped to different notebooks and assets every day, and managing all of that across tabs is one of the biggest sources of friction in my workflow. Full page Genie Code with concurrent sessions would give me a true workspace for running everything in parallel without constantly losing context.— Moritz Schiek, Solution Consultant, Bosch
With Genie Code, we moved from raw data to a governed, production-ready ML workflow in 90 minutes. Because it uniquely understands production ML workflows on Databricks, it helped us create Delta tables, explore the data, train and compare models, register them with MLflow and Unity Catalog, and deploy the champion model to a serving endpoint, with time left to optimize for the business outcome that mattered most.— Radu Dragusin, Principal Engineer, Data & AI, Danfoss
To learn more about Genie Code, please get started here!
Introducing AI Runtime: A Research-Grade GPU Platform within the Lakehouse
GPUs power today’s most advanced AI workloads—from forecasting and recommendations to multimodal foundation models. But deep learning teams struggle to procure and manage GPU infrastructure, configure distributed training environments, and resolve performance bottlenecks. They prefer to focus on modeling instead of infrastructure.
In March, we launched a preview of AI Runtime, and today at Data + AI Summit we’re excited to share that AI Runtime now supports high-performance multi-node training. With AI Runtime, Databricks users now have:
Serverless, on-demand NVIDIA GPUs: Configure your notebook in 2–3 clicks and quickly attach Serverless A10 and H100 GPUs to start training, with no cluster needed. Pay only for the GPUs you use, without worrying about idle time, utilization, or upfront commitments.
Robust orchestration tools: Use the full power of Databricks’ orchestration suite, with Lakeflow Jobs and Databricks Asset Bundles (DABs) support for long-running GPU workloads.
Optimized distributed training: AI Runtime (AIR) bundles distributed GPU performance enhancements, such as RDMA and high-performance data loading, to get the best performance from your GPU workloads.
Centralized governance and observability: Run, observe, and govern GPU workloads exactly where your data resides, with built-in experiment management via MLflow, access management with Unity Catalog, and Genie Code-assisted debugging.
With this launch, Databricks customers can now use the same research-grade GPU platform our own team used to train foundation models like DBRX and KARL. AI Runtime already powers frontier workloads for hundreds of Databricks customers, helping bring state-of-the-art AI from research into production enterprise applications.
Attach Serverless A10 and H100 GPUs to your notebook in 2–3 clicks. No cluster management required; only pay for what you use.
Use Genie Code to help resolve performance bottlenecks, experiment with new architectures, or debug tricky bugs around model convergence or cryptic framework errors.
AI Runtime is a production-grade platform for accelerated computing. Develop your deep learning code in interactive notebooks, and then use the full power of Lakeflow to submit and orchestrate jobs on GPU compute.
Databricks' AI Runtime greatly streamlined the process of training a custom Text To Formula (TTF) model. With no infrastructure setup or delays, it was easy to choose the right compute based on prompt size and output token generation. This allowed us to move quickly, maintain our Lakehouse workflows, and deliver a high-quality model with full governance, reducing time to setup, train and deploy our model from days to hours.— Nikhil Sunderraj, Principal Machine Learning Engineer, FactSet Research Systems, Inc.
To get started training your next model on GPUs, please see our examples and documentation here!
Real-Time ML at Scale: Feature Store and Model Serving
The most impactful machine learning applications operate in real time: serving recommendations in milliseconds, stopping fraudulent transactions before they're approved, and delivering search results that feel instantaneous.
Deploying a model to production is a delicate balance: every request needs to complete within a few milliseconds, even when traffic spikes – but your costs should stay low when traffic is quiet. Keeping that balance at scale has historically been as hard as building the model itself. Under high QPS, serving infrastructure becomes the bottleneck. Latency grows unpredictable, costs climb, and teams burden their best engineers with re-tuning replica counts, concurrency limits, and autoscaling thresholds every time a model or its traffic shifts.
At Data + AI Summit, we're announcing new capabilities that eliminate that burden and streamline low-latency, high-QPS serving on Databricks:
Declarative Feature Engineering: Define features once and automatically materialize them for training and serving.
Streaming Features: Build extremely fresh features on your event streams for ML that reacts to customer activity in real time.
High-QPS Model Serving: An enhanced inference engine and network routing for low-latency serving of both CPU and GPU models, with no knobs to tune. The platform adapts to each model and its traffic automatically, reaching 300K+ QPS with under 10ms of p99 latency overhead.
Online Feature Serving on Lakebase: Serve fresh features with low-latency access for production applications.
Genie ZeroOps for ML: Genie Code can query inference tables, debug performance issues in serving endpoints, and run root-cause analysis on alerts, bringing agentic operational observability to models in production.
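A streaming feature such as "transactions on this card in the last 60 seconds" is usually maintained incrementally as events arrive rather than recomputed on every request. The sketch below shows that idea with an in-memory sliding window in plain Python. It only illustrates the technique: `SlidingWindowCounter` is a hypothetical helper, in-order event arrival per key is assumed, and nothing here reflects how Databricks Streaming Features are implemented or stored.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Per-key count of events in the trailing window (now - window_s, now].

    Hypothetical helper; assumes each key's events arrive in
    non-decreasing timestamp order.
    """

    def __init__(self, window_s):
        self.window_s = window_s
        self._events = defaultdict(deque)  # key -> timestamps still inside the window

    def _evict(self, key, now):
        q = self._events.get(key)
        # Drop timestamps at or before the window's lower edge.
        while q and q[0] <= now - self.window_s:
            q.popleft()

    def add(self, key, ts):
        """Record one event and return the key's updated count."""
        self._evict(key, ts)
        self._events[key].append(ts)
        return len(self._events[key])

    def count(self, key, now):
        """Read the current feature value, e.g. at request time."""
        self._evict(key, now)
        return len(self._events.get(key, ()))


txns = SlidingWindowCounter(window_s=60)
for t in (0, 30, 59, 61):
    txns.add("card_1", t)
print(txns.count("card_1", 61))   # 3: the event at t=0 has aged out
print(txns.count("card_1", 120))  # 1: only t=61 remains
```

Eviction runs on both writes and reads, so the value read at request time reflects the window ending at that moment. A production system would additionally need durable state and a policy for late or out-of-order events.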
Customers running on Databricks Model Serving have cut infrastructure costs by 90% or more in the best cases versus self-managed stacks, improved p99 and p50 latency by up to 2x, and scaled past 100K QPS in production with little to no maintenance, all with enterprise-grade reliability and availability. Leading ML teams at Grammarly and GoGuardian, along with thousands of other customers, rely on Databricks to serve their real-time ML systems.
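The p99 and p50 figures above are latency percentiles: p50 is the median request latency, and p99 is the latency that 99% of requests stay at or below. Below is a minimal sketch of the nearest-rank percentile definition in plain Python. The latency values are synthetic, chosen only to show how a small slow tail barely moves the mean but dominates p99. Monitoring tools may use other estimators (interpolated or sketch-based), so treat this as one common definition, not necessarily the one Databricks reports.

```python
import math
from fractions import Fraction


def percentile_nearest_rank(samples, p):
    """Smallest sample such that at least p percent of samples are <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding in ceil(p * n / 100).
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies (ms) for illustration: 98 fast requests and a slow tail of 2.
latencies_ms = [5.0] * 98 + [80.0] * 2
print(percentile_nearest_rank(latencies_ms, 50))  # 5.0
print(percentile_nearest_rank(latencies_ms, 99))  # 80.0
print(sum(latencies_ms) / len(latencies_ms))      # 6.5
```

This is why serving targets are usually stated as tail percentiles rather than averages: the 6.5 ms mean here hides the fact that 1 in 50 requests takes 80 ms.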
Learn more at Data + AI Summit 2026!
For your next AI model, please give these new features a try! Learn more in the documentation or our detailed walkthrough blog posts:
Genie Code
AI Runtime
Model Serving
Feature Serving
See the AI Platform in action and learn how leading organizations are building and deploying AI models at scale at Data + AI Summit 2026.