Diffusion‑based LLMs that generate many parallel tokens rather than one‑by‑one
Inception builds next-generation LLMs powered by diffusion, enabling parallel token generation for faster speed and lower cost. Their Mercury models (Mercury 2 for reasoning, Mercury Edit 2 for code) achieve dramatic latency and cost reductions, deployed at Fortune 500 companies.
Inception – A new frontier in LLM speed
Mercury 2 and the Rise of Real-time Subagents
|
Learn more
Mercury 2 and the Rise of Real-time Subagents
A new frontier in LLM speed
A new frontier in LLMs speed
A new frontier in LLM speed
Inception’s breakthrough diffusion-based approach to language generation enables the world’s fastest, most efficient AI models with best-in-class quality.
Inception builds and deploys next‑generation large language models (LLMs) that are powered by diffusion rather than traditional auto‑regressive generation. By using diffusion, their models can produce many tokens in parallel, making them several times faster and less than half the cost of conventional LLMs. The diffusion framework also provides fine‑grained control over outputs, allowing adherence to specific schemas and semantic constraints. Additionally, it offers a unified paradigm for combining language with other data modalities such as audio, images, and video. The company’s team includes leading researchers and engineers from Stanford, UCLA, Cornell, Google DeepMind, Meta AI, Microsoft AI, and OpenAI, and they are currently deploying these diffusion LLMs at Fortune 500 companies.
Explain what Inception does
Inception builds and deploys next‑generation large language models (LLMs) that are powered by diffusion rather than traditional auto‑regressive generation. By using diffusion, their models can produce many tokens in parallel, making them several times faster and less than half the cost of conventional LLMs. The diffusion framework also provides fine‑grained control over outputs, allowing adherence to specific schemas and semantic constraints. Additionally, it offers a unified paradigm for combining language with other data modalities such as audio, images, and video. The company’s team includes leading researchers and engineers from Stanford, UCLA, Cornell, Google DeepMind, Meta AI, Microsoft AI, and OpenAI, and they are currently deploying these diffusion LLMs at Fortune 500 companies.
Explain what Inception does
Here are some prompts you can try with a diffusion-style LLM:
- Explain a complex topic step by step, showing intermediate reasoning.
- Generate multiple variations of a product tagline and refine them progressively.
- Write a short story that improves its wording over several iterations.
- Brainstorm startup ideas and evolve the best one through revisions.
- Refactor a piece of code and show incremental improvements.
- Describe an image concept and refine the details in stages.
- Compare two technologies with increasingly deeper analysis.
- Draft a landing page headline and iterate toward a clearer version.
- Simulate a design critique that becomes more precise each step.
- Turn rough notes into a polished summary through gradual refinement.
Suggest 10 prompts for a dLLM
Here are some prompts you can try with a diffusion-style LLM:
- Explain a complex topic step by step, showing intermediate reasoning.
- Generate multiple variations of a product tagline and refine them progressively.
- Write a short story that improves its wording over several iterations.
- Brainstorm startup ideas and evolve the best one through revisions.
- Refactor a piece of code and show incremental improvements.
- Describe an image concept and refine the details in stages.
- Compare two technologies with increasingly deeper analysis.
- Draft a landing page headline and iterate toward a clearer version.
- Simulate a design critique that becomes more precise each step.
- Turn rough notes into a polished summary through gradual refinement.
Suggest 10 prompts for a dLLM
Create a Javascript animation
Create a Javascript animation
Trusted by teams at
Trusted by teams at
The Mercury diffusion models introduce blazing fast inference with frontier quality at a fraction of the cost of other top-tier models.
The Mercury diffusion models introduce blazing fast inference with frontier quality at a fraction of the cost of other top-tier models.
Read our research
Speed Benchmark
Tokens/sec
Speed Benchmark
Tokens/sec
The diffusion difference. From sequential to parallel
All other LLMs generate text one token at a time. Mercury diffusion LLMs (dLLMs) generate tokens in parallel, increasing speed and maximizing GPU efficiency.
Parallel Generation
Mercury
zap
mango
crisp
lunar
wobble
spin
felt
droop
echo
Sequential Generation
ChatGPT
The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog
Parallel Generation
Mercury
zap
mango
crisp
lunar
wobble
spin
felt
droop
echo
Sequential Generation
ChatGPT
The
Quick
Brown
Fox
Jumps
Over
The
Lazy
Dog
Blazing-fast performance you can notice
Write code
Real-Time Voice
Instant Agents
Write code
Real-Time Voice
Instant Agents
Build the future of AI apps with Mercury
Get Started
Lightning fast agents
Automate complex coding and other business workflows with with ultra-responsive AI.
Real-time voice
Engage naturally with AI in voice-powered workflows like customer support, translation, and immersive gaming.
Instant code editing
Stay in-the-flow with responsive autocomplete, intelligent tab suggestions, and fast chat responses.
Fast, creative co-pilots
Supercharge editorial and creative work—less waiting, more creating.
Rapid search
Instantly surface the right data from across your organization’s knowledge base.
Foundational models
Meet our family of diffusion models
Mercury 2
Get Started
Docs
The fastest reasoning LLM and the first reasoning dLLM. Ideal for complex applications where performance and speed are crucial.
Input $0.25 per 1M tokens
Output $0.75 per 1M tokens
Mercury 2
The fastest reasoning LLM and the first reasoning dLLM. Ideal for complex applications where performance and speed are crucial.
Input $0.25 per 1M tokens
Output $0.75 per 1M tokens
Early Access
Read API Docs
Mercury Edit 2
Get Started
Docs
A small, coding-focused dLLM. Ideal for code editing and other extremely latency-sensitive components of coding workflows.
Input $0.25 per 1M tokens
Output $0.75 per 1M tokens
Mercury Edit 2
A small, coding-focused dLLM. Ideal for code editing and other extremely latency-sensitive components of coding workflows.
Input $0.25 per 1M tokens
Output $0.75 per 1M tokens
Early Access
Read API Docs
Research
Led by visionary AI researchers
Our founders pioneered diffusion modeling and invented cornerstone AI technologies.
Our Research
Diffusion Models
Read paper
The underlying approach for modern image and video generation, powering applications including Sora and MidJourney.
Flash Attention
Read paper
A key algorithm for efficient GPU utilization in LLM training and inference.
Direct Preference Optimization
Read paper
One of the core approaches for aligning LLMs with human feedback.
Loved by leaders and innovators
Book a Demo
Because Mercury 2 delivers the perfect threshold of intelligence at lightning speeds, the equation heavily works in our favor. We cut summarization latency by 82% and dropped costs by 90%.
Ankur Rustagi & John Mu
Because Mercury 2 delivers the perfect threshold of intelligence at lightning speeds, the equation heavily works in our favor. We cut summarization latency by 82% and dropped costs by 90%.
Ankur Rustagi & John Mu
After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents.
Oliver Silverstein, CEO
After trying Mercury, it's hard to go back. We are excited to roll out Mercury to support all of our voice agents.
Oliver Silverstein, CEO
Speed in a code editor isn't a nice-to-have. It's the difference between staying in flow and losing your train of thought. Mercury completions land fast enough to feel like part of the developer's own thinking, not an interruption to it.
Max Brunsfeld, Co-founder
Speed in a code editor isn't a nice-to-have. It's the difference between staying in flow and losing your train of thought. Mercury completions land fast enough to feel like part of the developer's own thinking, not an interruption to it.
Max Brunsfeld, Co-founder
Speed in a code editor isn't a nice-to-have. It's the difference between staying in flow and losing your train of thought. Mercury completions land fast enough to feel like part of the developer's own thinking, not an interruption to it.
Max Brunsfeld, Co-founder
Enterprise-grade privacy and reliability
Enterprise-grade privacy and reliability
Enterprise-grade privacy and reliability
We’re available through major cloud providers like AWS Bedrock and Azure Foundry. Talk with us about fine-tuning and private deployments.
Talk to Sales
Integrate in seconds
Our models are OpenAI API compatible and a drop-in replacement for traditional LLMs.
Enterprise AI partner
We’re available through major cloud providers like AWS Bedrock and Azure Foundry.
Reliability at scale
Get 99.5%+ uptime and priority support with custom SLAs.
The future of LLMs is here
Get Started
The future of LLMs is here
Get Started
Products
Get Started
Models
Pricing
Company
About Us
Research
Careers
Blog
Resources
Mercury Chat
API Platform
Documentation
Integrations
Partners
Legal
Terms of Service
Privacy Policy
Contact
Sales
Inquires
Discord
X
© 2026 Inception
Products
Get Started
Models
Pricing
Company
About Us
Research
Careers
Blog
Resources
Mercury Chat
API Platform
Documentation
Integrations
Partners
Legal
Terms of Service
Privacy Policy
Contact
Sales
Inquires
Discord
X
© 2026 Inception