ZAYA1-8B Technical Report

ZAYA1-8B is a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, trained on AMD hardware. It matches or exceeds DeepSeek-R1-0528 on several challenging math and coding benchmarks and introduces Markovian RSA, a test-time compute method.

Article intelligence

Engineers, Advanced

Key points

  • ZAYA1-8B features 700M active parameters and 8B total parameters, trained on a full-stack AMD platform.
  • It matches or exceeds DeepSeek-R1-0528 on multiple math and coding benchmarks.
  • Post-training uses a four-stage RL cascade: reasoning warmup, RLVE-Gym, math/code RL, and behavioral RL.
  • Markovian RSA test-time compute boosts AIME'25 accuracy to 91.9% and HMMT'25 to 89.6%.
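The four-stage RL cascade listed above can be written down as an ordered pipeline. This is only an illustrative sketch: the stage names and focus strings follow the abstract, but the `RLStage` type, the `run_cascade` helper, the `"sft"` starting label, and the assumption that each stage resumes from the previous stage's checkpoint are all hypothetical and not taken from the report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RLStage:
    order: int   # position in the cascade (1 = first)
    name: str    # hypothetical identifier for the stage
    focus: str   # what the abstract says the stage trains on


# Stage descriptions paraphrase the abstract; identifiers are illustrative.
CASCADE: List[RLStage] = [
    RLStage(1, "reasoning_warmup", "math and puzzles"),
    RLStage(2, "rlve_gym", "400-task RLVE-Gym curriculum"),
    RLStage(3, "math_code_rl", "math and code RL with test-time compute traces"),
    RLStage(4, "behavioral_rl", "chat and instruction following"),
]


def run_cascade(checkpoint: str, stages: List[RLStage]) -> str:
    """Thread a checkpoint label through the stages in order.

    Assumption: each stage starts from the previous stage's output.
    No training happens here; the label just records the lineage.
    """
    for stage in sorted(stages, key=lambda s: s.order):
        checkpoint = f"{checkpoint}->{stage.name}"
    return checkpoint
```

Calling `run_cascade("sft", CASCADE)` returns a lineage string that names the stages in the order they were applied, which makes the ordering explicit.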

Why it matters

This matters because ZAYA1-8B, with under 1B active parameters out of 8B total, matches or exceeds DeepSeek-R1-0528 on several challenging math and coding benchmarks, and its core training ran on a full-stack AMD platform.
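To make the active-versus-total distinction concrete, the short sketch below computes the fraction of parameters used per token from the rounded figures quoted here (700M active, 8B total). The FLOPs estimate uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass; that rule ignores attention cost and is an assumption of this sketch, not a figure from the report. Both function names are hypothetical.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters that participate in one token's forward pass."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


def approx_forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token.

    Assumption: ignores attention and routing overhead.
    """
    return 2.0 * active_params


# Rounded figures from the article: 700M active, 8B total.
fraction = active_fraction(700e6, 8e9)             # 0.0875, i.e. 8.75%
flops = approx_forward_flops_per_token(700e6)      # about 1.4e9 FLOPs per token
```

Under these rounded numbers, fewer than one in ten parameters is exercised per token, which is why the inference cost is closer to that of a sub-1B dense model than an 8B one.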

Technical impact

May affect model selection, inference cost, product capability, and evaluation benchmarks.

[2605.05365] ZAYA1-8B Technical Report

Computer Science > Artificial Intelligence

arXiv:2605.05365 (cs)

[Submitted on 6 May 2026]

Title: ZAYA1-8B Technical Report

Authors: Robert Washbourne, Rishi Iyer, Tomas Figliolia, Henry Zheng, Ryan Lorig-Roach, Sungyeon Yang, Pritish Yuvraj, Quentin Anthony, Yury Tokpanov, Xiao Yang, Ganesh Nanduru, Stephen Ebert, Praneeth Medepalli, Skyler Szot, Srivatsan Rajagopal, Alex Ong, Bhavana Mehta, Beren Millidge

Abstract: We present ZAYA1-8B, a reasoning-focused mixture-of-experts (MoE) model with 700M active and 8B total parameters, built on Zyphra's MoE++ architecture. ZAYA1-8B's core pretraining, midtraining, and supervised fine-tuning (SFT) were performed on a full-stack AMD compute, networking, and software platform. With under 1B active parameters, ZAYA1-8B matches or exceeds DeepSeek-R1-0528 on several challenging mathematics and coding benchmarks, and remains competitive with substantially larger open-weight reasoning models. ZAYA1-8B was trained from scratch for reasoning, with reasoning data included from pretraining onward using an answer-preserving trimming scheme. Post-training uses a four-stage RL cascade: reasoning warmup on math and puzzles; a 400-task RLVE-Gym curriculum; math and code RL with test-time compute traces and synthetic code environments built from competitive-programming references; and behavioral RL for chat and instruction following. We also introduce Markovian RSA, a test-time compute method that recursively aggregates parallel reasoning traces while carrying forward only bounded-length reasoning tails between rounds. In TTC evaluation, Markovian RSA raises ZAYA1-8B to 91.9% on AIME'25 and 89.6% on HMMT'25 while carrying forward only a 4K-token tail, narrowing the gap to much larger reasoning models including Gemini-2.5 Pro, DeepSeek-V3.2, and GPT-5-High.
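The abstract describes Markovian RSA only at a high level: parallel reasoning traces are aggregated recursively, and only a bounded-length tail of each trace is carried into the next round. The sketch below illustrates that control flow under stated assumptions. It is not the authors' implementation: the `generate` callable, the `population`, `group_size`, and `rounds` parameters, and the random grouping of tails are all hypothetical choices made for illustration. Only the bounded tail (4K tokens in the reported evaluation) comes from the abstract.

```python
import random
from typing import Callable, List

Trace = List[str]  # a reasoning trace as a list of tokens
# Hypothetical model interface: (prompt, carried-over tails) -> new trace.
Generate = Callable[[str, List[Trace]], Trace]


def tail(trace: Trace, max_tail: int) -> Trace:
    """Keep only the last max_tail tokens of a trace."""
    return trace[-max_tail:] if max_tail > 0 else []


def markovian_rsa_sketch(
    prompt: str,
    generate: Generate,
    population: int = 4,    # assumed number of parallel traces per round
    group_size: int = 2,    # assumed number of tails aggregated per new trace
    rounds: int = 3,        # assumed number of aggregation rounds
    max_tail: int = 4096,   # bounded tail length carried between rounds
    seed: int = 0,
) -> List[Trace]:
    if group_size > population:
        raise ValueError("group_size cannot exceed population")
    rng = random.Random(seed)

    # Round 0: independent parallel traces with no carried-over context.
    traces = [generate(prompt, []) for _ in range(population)]

    for _ in range(rounds):
        # Only bounded tails survive the round boundary, so the context
        # passed forward does not grow with the number of rounds.
        tails = [tail(t, max_tail) for t in traces]
        traces = [
            generate(prompt, rng.sample(tails, group_size))
            for _ in range(population)
        ]
    return traces
```

The point of the design, as far as the abstract indicates, is that each round depends only on the previous round's truncated tails rather than on the full history, which keeps the per-round context bounded regardless of how many rounds are run.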

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2605.05365 [cs.AI]

(or arXiv:2605.05365v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2605.05365

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Quentin Anthony. [v1] Wed, 6 May 2026 18:44:08 UTC (1,590 KB)
