DiffusionGemma: Google's Open-Source High-Speed Text Generation Model
Google has released DiffusionGemma, a new open-weight model under Apache 2 license, available for free via NVIDIA's NIM cloud API. It delivers impressive generation speeds exceeding 500 tokens per second.
DiffusionGemma
Simon Willison’s Weblog
Subscribe
10th June 2026 - Link Blog
DiffusionGemma (via) Last May Google briefly released an experimental Gemini Diffusion model. I tried the preview at the time and recorded it running at 857 tokens/second. It was an exciting model, but Google made no further announcements about it.
That research has returned in the best possible way: as a new open weight (Apache 2 licensed) Gemma model, google/diffusiongemma-26B-A4B-it.
NVIDIA are currently hosting the model for free on their NIM cloud API. I used that API to generate this pelican, which took 4.4s (according to time uv run generate.py) to return 2,409 tokens - so at least 500 tokens/second.
Recent articles
Claude Fable is relentlessly proactive - 11th June 2026
Initial impressions of Claude Fable 5 - 9th June 2026
Running Python code in a sandbox with MicroPython and WASM - 6th June 2026
This is a link post by Simon Willison, posted on 10th June 2026.
google 412
ai 2,068
generative-ai 1,826
llms 1,794
nvidia 18
pelican-riding-a-bicycle 118
gemma 15
llm-release 205
llm-performance 16
Monthly briefing
Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.
Pay me to send you less!
Sponsor & subscribe
Disclosures
Colophon
©
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026