2026-05-29 04:00 UTC (updated 2026-06-30 13:03 UTC)

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

Aryabhata 2 is a reasoning-focused language model for competitive STEM exams such as JEE and NEET. It is post-trained from GPT-OSS-20B via reinforcement learning on a curriculum built from PhysicsWallah's internal question banks. It outperforms the base model on competitive STEM reasoning benchmarks while using up to 64% fewer output tokens.

Source: arXiv Computational Linguistics. Authors: Ritvik Rastogi, Vishal Singh, Tejas Chaudhari, Sandeep Varma

[Submitted on 10 Apr 2026]

Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student doubts demand domain-specific, consistently structured problem solving.

We introduce Aryabhata 2, a reasoning-focused language model for competitive STEM examinations, trained via reinforcement-learning post-training. Using PhysicsWallah's internal question banks, we construct a high-quality training curriculum and post-train GPT-OSS-20B through reinforcement learning with verifiable rewards. Training combines prolonged reinforcement learning with broadened exploration via progressively larger rollout group sizes.
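The abstract does not give the exact objective, so the following is only a minimal sketch of what reinforcement learning with verifiable rewards and a growing rollout group size could look like. The exact-match reward, the group-normalized advantage (a GRPO-style choice), and the schedule values `(8, 16, 32)` are all assumptions made for illustration, not details from the paper.

```python
import statistics


def verifiable_reward(predicted: str, reference: str) -> float:
    """Hypothetical verifier: 1.0 if the final answer matches exactly, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one rollout group (assumed GRPO-style baseline).

    If every rollout in the group gets the same reward, there is no
    learning signal, so all advantages are zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def group_size_schedule(step: int, sizes=(8, 16, 32), steps_per_stage=100) -> int:
    """Hypothetical schedule: use a larger rollout group in each later stage."""
    stage = min(step // steps_per_stage, len(sizes) - 1)
    return sizes[stage]


if __name__ == "__main__":
    # Score a small group of sampled answers against one reference answer.
    answers = ["42", "41", " 42 ", "7"]
    rewards = [verifiable_reward(a, "42") for a in answers]
    print(rewards)
    print(group_advantages(rewards))
    print([group_size_schedule(s) for s in (0, 150, 1000)])
```

Larger groups give more samples per question, which raises the chance that at least one rollout solves a hard problem and produces a non-zero advantage; this is one plausible reading of "broadened exploration".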

We evaluate Aryabhata 2 on competitive examination benchmarks, including JEE Main, JEE Advanced, and NEET, as well as out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. Results show that Aryabhata 2 outperforms its base model GPT-OSS-20B on competitive STEM reasoning while requiring substantially fewer output tokens (up to 64% fewer).
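To make the "fewer output tokens" figure concrete, here is a small sketch of how a relative token reduction is computed. The function name and the token counts in the usage example are hypothetical and chosen only so the arithmetic is easy to follow; they are not measurements from the paper.

```python
def token_reduction(base_tokens: float, model_tokens: float) -> float:
    """Fraction of output tokens saved relative to the base model."""
    if base_tokens <= 0:
        raise ValueError("base_tokens must be positive")
    return 1.0 - model_tokens / base_tokens


if __name__ == "__main__":
    # Hypothetical averages: 10,000 tokens for the base model, 3,600 after RL.
    saved = token_reduction(10_000, 3_600)
    print(f"{saved:.0%} fewer output tokens")
```

Under these illustrative numbers, a response that is 36% as long as the base model's corresponds to a 64% reduction.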

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)

Cite as: arXiv:2605.28829 [cs.CL]

(or arXiv:2605.28829v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2605.28829

arXiv-issued DOI via DataCite

Submission history

From: Ritvik Rastogi. [v1] Fri, 10 Apr 2026 06:53:27 UTC (320 KB)
