The Sequence Knowledge #866: Three Text Diffusion Models You Need To Know About
Text diffusion models challenge the autoregressive paradigm by generating text through iterative denoising, treating generation as editing rather than typing. Three systems define the field, each marking one phase of a new architecture class: LLaDA (scientific proof that diffusion scales), Mercury (industrial deployment with a commercial speed advantage), and Gemini Diffusion (frontier validation).
💡 AI Concept of the Day: Three Text Diffusion Models You Need To Know About
For most of the LLM era, language generation has been built around a single assumption: text should be produced like a typewriter, one token at a time, left to right, each new symbol conditioned on a frozen history. Text diffusion models challenge that assumption at its root. They treat generation less like typing and more like editing: start from noise or masks, look at the whole canvas, and iteratively refine it into coherent language.
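To make the typewriter picture concrete, here is a minimal Python sketch of left-to-right decoding. `generate_autoregressive` and the `next_token` callable are hypothetical names standing in for a real model call. The sketch only shows the control flow: one forward pass per token, conditioning on the prefix alone, and no way to revise a token once it has been appended.

```python
from typing import Callable, List


def generate_autoregressive(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    max_new: int,
    eos: str = "<eos>",
) -> List[str]:
    """Typewriter-style decoding: one token per call, strictly left to right.

    `next_token` is a hypothetical stand-in for a model forward pass. It only
    ever sees the frozen prefix, and each token is committed permanently.
    """
    out = list(prompt)
    for _ in range(max_new):
        tok = next_token(out)  # conditioned on the history only
        if tok == eos:
            break
        out.append(tok)  # appended irreversibly; never revisited
    return out
```

With a scripted `next_token` that replays "the cat sat on the mat" from the prompt `["the"]`, producing five new tokens takes six sequential calls (the last one returns the end marker), and nothing in the loop can go back and change `cat` after `sat` has been written.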
That sounds like a stylistic tweak. It is actually a different computational worldview. Instead of factorizing language as “the next token given all previous tokens,” diffusion models define a corruption process and then learn how to reverse it. In language, that usually means masking tokens or pushing text into noisier latent states, then training a model to recover the original sequence over several denoising steps. The result is a system that can update many positions at once, use bidirectional context during generation, and revisit its own outputs rather than committing irreversibly at every step.
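The corruption-and-reversal loop can be sketched in a few lines of Python, assuming the simplest masking variant: the forward process hides each token independently with probability `t`, and the reverse process starts from a (partly) masked canvas and commits its most confident guesses, several positions per step. `toy_denoiser` is a hypothetical oracle that reads the answer from `target` instead of running a trained network. It exists only to show how a prediction's confidence can depend on context from both sides. This is a generic illustration, not the sampler used by any of the systems discussed here.

```python
import random

MASK = "[MASK]"


def corrupt(tokens, t, rng):
    """Forward process: hide each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def toy_denoiser(canvas, target):
    """Hypothetical oracle standing in for a trained network.

    For every masked slot it returns (token, confidence). The token is read
    from `target`, so nothing is learned; confidence rises when the left
    and/or right neighbour is already visible, imitating bidirectional context.
    """
    guesses = {}
    for i, tok in enumerate(canvas):
        if tok != MASK:
            continue
        left_known = i > 0 and canvas[i - 1] != MASK
        right_known = i < len(canvas) - 1 and canvas[i + 1] != MASK
        guesses[i] = (target[i], 0.5 + 0.25 * left_known + 0.25 * right_known)
    return guesses


def denoise(canvas, target, per_step):
    """Reverse process: repeatedly fill the `per_step` most confident masked
    slots, wherever they are in the sequence, until no mask remains."""
    canvas = list(canvas)
    history = []
    while MASK in canvas:
        guesses = toy_denoiser(canvas, target)
        # Highest confidence first; ties broken by position for determinism.
        ranked = sorted(guesses.items(), key=lambda kv: (-kv[1][1], kv[0]))
        for i, (tok, _conf) in ranked[:per_step]:
            canvas[i] = tok
        history.append(list(canvas))
    return canvas, history


if __name__ == "__main__":
    target = "the cat sat on the mat".split()
    print("corrupted:", corrupt(target, t=0.5, rng=random.Random(0)))
    # Infilling: the first and last words are given, the middle is masked.
    start = ["the", MASK, MASK, MASK, MASK, "mat"]
    final, history = denoise(start, target, per_step=2)
    for step, state in enumerate(history, 1):
        print(step, " ".join(state))
```

In the infilling run, the first pass fills `cat` and the second `the`, one from each end, because those are the slots with a visible neighbour; the second pass then completes `sat on`. Practical samplers may also re-mask committed tokens so the model can revisit them; this sketch only unmasks.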
If you look at the field today, three systems define the conversation more than any others: LLaDA, which proved that diffusion can scale into a real large language model; Mercury, which turned diffusion into a genuine commercial speed advantage; and Gemini Diffusion, which signaled that frontier labs see this paradigm as strategically important. Together, they outline the three phases of a new architecture class: scientific proof, industrial deployment, and frontier validation.
LLaDA: The Scientific Proof That Diffusion Can Scale