Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning
Aryabhata 2 is a reasoning-focused language model for competitive STEM exams such as JEE and NEET, post-trained from GPT-OSS-20B with reinforcement learning on PhysicsWallah's internal question banks. It outperforms its base model on competitive STEM reasoning while producing up to 64% fewer output tokens.
Article intelligence
Key points
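- Aryabhata 2 applies reinforcement-learning post-training with verifiable rewards, targeted at competitive STEM exams (JEE, NEET).
- Built on GPT-OSS-20B, using a training curriculum constructed from PhysicsWallah's internal question banks.
- Combines prolonged RL training with progressively larger rollout group sizes to broaden exploration.
- Outperforms GPT-OSS-20B on competitive STEM reasoning while using up to 64% fewer output tokens; evaluation also covers out-of-distribution sets (AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, GPQA).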
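The progressively larger rollout groups can be pictured as a staged schedule that raises the number of completions sampled per prompt as training goes on. This is a sketch under assumptions: the function `rollout_group_size`, the stage boundaries (steps 1000 and 2000), and the sizes 8/16/32 are all hypothetical, since the summary does not give the actual schedule.

```python
import bisect
from typing import List


def rollout_group_size(step: int, stage_boundaries: List[int], group_sizes: List[int]) -> int:
    """Number of rollouts sampled per prompt at a given RL training step.

    Hypothetical staged schedule: group_sizes[0] applies before
    stage_boundaries[0], group_sizes[i] between boundaries i-1 and i,
    and group_sizes[-1] from the last boundary onward.
    """
    if len(group_sizes) != len(stage_boundaries) + 1:
        raise ValueError("need exactly one more group size than stage boundaries")
    if sorted(stage_boundaries) != list(stage_boundaries):
        raise ValueError("stage boundaries must be non-decreasing")
    # bisect_right counts how many boundaries the current step has already reached.
    return group_sizes[bisect.bisect_right(stage_boundaries, step)]


# Illustrative values only; not taken from the paper.
SCHEDULE_BOUNDARIES = [1000, 2000]
SCHEDULE_SIZES = [8, 16, 32]

if __name__ == "__main__":
    for s in (0, 999, 1000, 2500):
        print(s, rollout_group_size(s, SCHEDULE_BOUNDARIES, SCHEDULE_SIZES))
```

Keying the group size to the training step keeps early training cheap and spends more samples per prompt later; whether Aryabhata 2 grows its groups in discrete stages or continuously is not stated here.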
Why it matters
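Serving millions of student doubts calls for domain-specific, consistently structured problem solving at manageable cost. Aryabhata 2 addresses this by specializing an open-weight 20B model for JEE/NEET-style problems and shortening its outputs.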
Technical impact
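Shorter outputs (up to 64% fewer tokens than GPT-OSS-20B) lower per-query inference cost, which bears on model selection and deployment for exam-preparation products; the out-of-distribution results also inform how such domain-tuned models should be benchmarked.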
[2605.28829] Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning
[Submitted on 10 Apr 2026]
Authors: Ritvik Rastogi and 3 other authors
Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student doubts demand domain-specific, consistently structured problem solving.
We introduce Aryabhata 2, a reasoning-focused language model for competitive STEM examinations, trained via reinforcement-learning post-training. Using PhysicsWallah's internal question banks, we construct a high-quality training curriculum and post-train GPT-OSS-20B through reinforcement learning with verifiable rewards. Training combines prolonged reinforcement learning with broadened exploration via progressively larger rollout group sizes.
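Reinforcement learning with verifiable rewards scores each sampled solution by checking its final answer against a known key; group-based RL methods then compare the rollouts drawn for the same prompt. The following is a minimal sketch, not the authors' implementation. It assumes the final answer appears in a LaTeX \boxed{...} expression, uses a 1% relative tolerance for numerical answers, and computes GRPO-style group-standardized advantages; none of these choices are specified in the abstract, and all function names are hypothetical.

```python
import math
import re
import statistics
from typing import List, Optional

# Hypothetical answer format: the final answer sits in the last LaTeX \boxed{...}.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last boxed expression, or None if there is none."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str, rel_tol: float = 1e-2) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference key, else 0.0.

    Numerical answers are compared with a relative tolerance (assumed 1%);
    non-numerical answers, such as MCQ option letters, use a case-insensitive
    exact match.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable final answer earns nothing
    try:
        correct = math.isclose(float(answer), float(reference), rel_tol=rel_tol)
    except ValueError:
        # At least one side is not a number: fall back to exact string match.
        correct = answer.lower() == reference.strip().lower()
    return 1.0 if correct else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its own rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rollouts = [
        "... so g = \\boxed{9.8}",
        "... hence \\boxed{9.6}",
        "I am not sure.",
    ]
    rewards = [verifiable_reward(c, "9.8") for c in rollouts]
    print(rewards)  # [1.0, 0.0, 0.0]
    print(group_relative_advantages(rewards))
```

When every rollout in a group earns the same reward, all advantages are zero and the prompt contributes no learning signal. A larger group makes it more likely that at least one rollout differs, which is one plausible reading of why larger groups broaden exploration.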
We evaluate Aryabhata 2 on competitive examination benchmarks, including JEE Main, JEE Advanced, and NEET, as well as out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. Results show that Aryabhata 2 outperforms its base model GPT-OSS-20B on competitive STEM reasoning while requiring substantially fewer output tokens (up to 64% fewer).
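"X% fewer output tokens" is the relative drop in mean output length against the base model, so a 64% reduction means roughly 360 tokens where the base model would emit 1,000. The sketch assumes, without confirmation from the abstract, that "up to" refers to the largest per-benchmark reduction. The function names are hypothetical and the numbers in the example are illustrative, not measurements from the paper.

```python
from typing import Dict, Tuple


def percent_fewer_tokens(base_tokens: float, new_tokens: float) -> float:
    """Relative reduction in mean output length, in percent (positive means shorter)."""
    if base_tokens <= 0:
        raise ValueError("base_tokens must be positive")
    return 100.0 * (base_tokens - new_tokens) / base_tokens


def max_reduction(per_benchmark: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Benchmark with the largest reduction, i.e. the figure an 'up to X%' claim would quote.

    per_benchmark maps a benchmark name to (base_mean_tokens, new_mean_tokens).
    """
    name = max(per_benchmark, key=lambda k: percent_fewer_tokens(*per_benchmark[k]))
    return name, percent_fewer_tokens(*per_benchmark[name])


if __name__ == "__main__":
    # Hypothetical mean output lengths, not figures from the paper.
    print(percent_fewer_tokens(1000, 360))  # 64.0
```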
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Cite as: arXiv:2605.28829 [cs.CL]
(or arXiv:2605.28829v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.28829 (arXiv-issued DOI via DataCite)
Submission history
From: Ritvik Rastogi
[v1] Fri, 10 Apr 2026 06:53:27 UTC (320 KB)