DiffusionGemma: Google's Open-Source High-Speed Text Generation Model

Google has released DiffusionGemma, a new open-weight model under the Apache 2 license. NVIDIA is currently hosting it for free on its NIM cloud API, where one test run returned 2,409 tokens in 4.4 seconds, or at least 500 tokens per second.

Simon Willison’s Weblog

10th June 2026 - Link Blog

DiffusionGemma (via) Last May Google briefly released an experimental Gemini Diffusion model. I tried the preview at the time and recorded it running at 857 tokens/second. It was an exciting model, but Google made no further announcements about it.

That research has returned in the best possible way: as a new open weight (Apache 2 licensed) Gemma model, google/diffusiongemma-26B-A4B-it.

NVIDIA are currently hosting the model for free on their NIM cloud API. I used that API to generate this pelican, which took 4.4s (according to time uv run generate.py) to return 2,409 tokens - so at least 500 tokens/second.
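
The "at least 500 tokens/second" figure follows from dividing the token count by the wall-clock time. A minimal sketch of that arithmetic, using only the two numbers quoted above (the function name is hypothetical and is not taken from generate.py):

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Return average throughput for a run, in tokens per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted in the post: 2,409 tokens returned in 4.4s of wall-clock time.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/second")
```

Because the 4.4s was measured with time around the whole script, it presumably also covers process startup and the network round trip, so the result is best read as a lower bound on the model's own generation speed.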

This is a link post by Simon Willison, posted on 10th June 2026.
