2026-07-02 00:00 UTC. Updated: 2026-07-02 21:00 UTC

Learning Unmasking Policies for Diffusion Language Models

Diffusion Large Language Models (dLLMs) now match autoregressive models on many tasks while promising more efficient inference. A key design aspect is the sampling procedure that selects which tokens to unmask at each diffusion step. Current heuristics, such as confidence thresholding, require manual tuning and degrade with larger block sizes. This work proposes training sampling policies via reinforcement learning: it formalizes masked diffusion sampling as a Markov decision process with the dLLM as the environment, and uses a lightweight single-layer transformer policy. Experiments show the trained policies match state-of-the-art heuristics in semi-autoregressive (block) generation and outperform them in the full-diffusion setting.

Source: Apple Machine Learning Research

Content type: paper. Published: July 2026

Authors: Metod Jazbec*†, Theo X. Olausson*‡, Louis Béthune, Pierre Ablin, Michael Kirchhof, João Monteiro, Victor Turrisi, Jason Ramapuram, Marco Cuturi


Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is the sampling procedure that selects which tokens to unmask at each diffusion step. Indeed, recent work has found that heuristic strategies such as confidence thresholding improve both sample quality and token throughput compared to random unmasking. However, such heuristics have downsides: they require manual tuning, and we observe that their performance degrades with larger block sizes. In this work, we instead propose to train sampling procedures using reinforcement learning. Specifically, we formalize masked diffusion sampling as a Markov decision process in which the dLLM serves as the environment, and propose a lightweight policy based on a single-layer transformer that maps dLLM token confidences to unmasking decisions. Our experiments show that these trained policies match the performance of state-of-the-art heuristics when combined with semi-autoregressive (block) generation, while outperforming them in the full-diffusion setting.
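The sampling loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_dllm` is a hypothetical stand-in for a real dLLM that emits random predictions and confidences, `threshold_policy` is one plausible form of confidence thresholding with an assumed threshold `tau`, and `sample` is a hypothetical driver. The paper's learned policy (a single-layer transformer trained with reinforcement learning) is not implemented here; in this sketch it would take the place of `threshold_policy`.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical marker for a still-masked position


def toy_dllm(tokens: List[Optional[int]], vocab_size: int,
             rng: random.Random) -> List[Tuple[int, float]]:
    """Stand-in environment: returns (predicted token, confidence) per position.

    A real dLLM would run a forward pass; here the logits are random.
    """
    preds = []
    for tok in tokens:
        if tok is MASK:
            logits = [rng.gauss(0.0, 2.0) for _ in range(vocab_size)]
            top = max(logits)
            exps = [math.exp(l - top) for l in logits]  # stable softmax
            total = sum(exps)
            probs = [e / total for e in exps]
            best = max(range(vocab_size), key=probs.__getitem__)
            preds.append((best, probs[best]))
        else:
            preds.append((tok, 1.0))  # already unmasked, kept as-is
    return preds


def threshold_policy(confidences: List[float], masked: List[int],
                     tau: float = 0.9) -> List[int]:
    """Heuristic action: unmask every masked position with confidence >= tau.

    Falls back to the single most confident position so each step makes progress.
    """
    chosen = [i for i in masked if confidences[i] >= tau]
    if not chosen:
        chosen = [max(masked, key=lambda i: confidences[i])]
    return chosen


def sample(length: int,
           policy: Callable[[List[float], List[int]], List[int]],
           vocab_size: int = 8, seed: int = 0) -> Tuple[List[int], int]:
    """Roll out one episode: state = partially masked sequence,
    action = set of positions to unmask, environment = the (toy) dLLM."""
    rng = random.Random(seed)
    tokens: List[Optional[int]] = [MASK] * length
    steps = 0
    while any(t is MASK for t in tokens):
        preds = toy_dllm(tokens, vocab_size, rng)
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        confidences = [c for _, c in preds]
        for i in policy(confidences, masked):
            tokens[i] = preds[i][0]  # commit the predicted token
        steps += 1
    return tokens, steps


if __name__ == "__main__":
    tokens, steps = sample(16, threshold_policy)
    print(f"decoded {len(tokens)} tokens in {steps} diffusion steps")
```

Because the policy is passed in as a plain function, swapping the heuristic for a trained one changes only the `policy` argument; the number of steps returned by `sample` is the kind of throughput signal that, together with sample quality, a reward could plausibly be built from.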

* Equal Contributors

† University of Amsterdam

‡ Massachusetts Institute of Technology

** Work done while at Apple

Residual Context Diffusion Language Models

July 2, 2026. Research area: Speech and Natural Language Processing. Conference: ICML

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a “remasking” mechanism that decodes only the most confident tokens and discards the rest, effectively wasting computation. We demonstrate that recycling computation from the discarded tokens is beneficial, as these tokens…

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

January 21, 2026. Research area: Speech and Natural Language Processing. Conference: ICLR

Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding,…