2026-06-30 04:00 UTCOriginal source2 min readUpdated: 2026-06-30 07:55 UTC

An AI agent for treatment reasoning over a biomedical tool universe

Researchers introduce ATHENA-R1, an AI agent trained via reinforcement learning to perform treatment reasoning across 212 biomedical tools. It outperforms GPT-5 on benchmarks and is preferred by experts and physicians.

SourcearXiv AIAuthor: Shanghua Gao, Ayush Noori, Richard Zhu, Curtis Ginder, Zhenglun Kong, Xiaorui Su, Justin Kauffman, Benjamin S. Glicksberg, Joshua Lampert, Ankit Sakhuja, Ashwin Sawant, ATHENA-R1 Evaluation Consortium, David A. Clifton, Noa Dagan, Ran Balicer, Marinka Zitnik

[2606.28692] An AI agent for treatment reasoning over a biomedical tool universe

[Submitted on 27 Jun 2026]

Title:An AI agent for treatment reasoning over a biomedical tool universe

View a PDF of the paper titled An AI agent for treatment reasoning over a biomedical tool universe, by Shanghua Gao and 15 other authors

View PDF HTML (experimental)

Abstract:Treatment reasoning underpins every therapeutic decision, integrating disease context, comorbidities, medications, contraindications, and evolving biomedical knowledge to select an appropriate therapy. It is inherently iterative: candidates are weighed against many constraints, revised as evidence emerges, and grounded in verifiable sources. Here we introduce ATHENA-R1, an AI agent for treatment reasoning across all FDA approved drugs since 1939, trained by reinforcement learning over a universe of 212 biomedical tools. At each step it identifies missing information, selects and runs relevant tools, and incorporates the evidence. To train it without human-annotated traces, we build a two-level self-learning framework: multi-agent systems construct the tools, tasks, and reasoning trajectories for supervised fine-tuning, then reinforcement learning with scientific feedback rewards reasoning quality (evidence gathering, grounded tool use, logical non-redundancy). Across five benchmarks of 3,168 drug reasoning tasks and 456 patient treatment cases, ATHENA-R1 outperforms language models and tool-use systems, reaching 94.7% accuracy on open-ended drug reasoning and 82.9% on treatment reasoning, 17.8 and 10.7 points above GPT-5. In blinded evaluations by experts from 28 rare disease organizations, it is preferred over reference models on all criteria, and physicians rated it favorably on complex hospitalized cardiovascular and infectious-disease cases. Adverse-event hypotheses it generated, tested in electronic health records from 5.4 million patients, reached adjusted odds ratios of 1.48-1.84, with no elevation among negative controls. Because it requires knowing what evidence to seek before concluding, treatment reasoning has long been hard for AI; we show it can be reframed as a learnable process of iterative evidence gathering that reinforcement learning can train AI to perform.

Comments: Project page: this https URL Code: this https URL

Subjects:

Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.28692 [cs.AI]

(or arXiv:2606.28692v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.28692

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shanghua Gao [view email] [v1] Sat, 27 Jun 2026 02:24:56 UTC (8,433 KB)

Full-text links:

Access Paper:

View a PDF of the paper titled An AI agent for treatment reasoning over a biomedical tool universe, by Shanghua Gao and 15 other authors

View PDF

HTML (experimental)

TeX Source

view license

Current browse context:

cs.AI

new | recent | 2026-06

Change to browse by:

References & Citations

NASA ADS

Google Scholar

Semantic Scholar

Data provided by:

Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle

Bibliographic Explorer (What is the Explorer?)

Connected Papers Toggle

Connected Papers (What is Connected Papers?)

Litmaps Toggle

Litmaps (What is Litmaps?)

scite.ai Toggle

scite Smart Citations (What are Smart Citations?)

Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle

alphaXiv (What is alphaXiv?)

Links to Code Toggle

CatalyzeX Code Finder for Papers (What is CatalyzeX?)

DagsHub Toggle

DagsHub (What is DagsHub?)

GotitPub Toggle

Gotit.pub (What is GotitPub?)

Huggingface Toggle

Hugging Face (What is Huggingface?)

ScienceCast Toggle

ScienceCast (What is ScienceCast?)

Demos

Replicate Toggle

Replicate (What is Replicate?)

Spaces Toggle

Hugging Face Spaces (What is Spaces?)

Spaces Toggle

TXYZ.AI (What is TXYZ.AI?)