2026-06-08 04:00 UTC | Original source | 2 min read | Updated: 2026-06-30 13:03 UTC

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

Large language models have made substantial progress on mathematical reasoning, but existing benchmarks typically evaluate well-specified problems with final answers or complete proofs, missing collaborative open-problem solving. CrowdMath is a dataset of 164 expert-annotated progress chains from the MIT PRIMES-AoPS CrowdMath program (2016-2025). Each chain tracks multi-participant forum discussions from problem statement to completed proof, with posts labeled by functional roles. Six frontier models achieve 83-88% accuracy on next-post prediction but only 0.42 macro-F1 on post-role classification, highlighting a gap in understanding collaborative mathematical progress.

Source: arXiv AI | Authors: Sherin Muckatira, Jesse Geneson, Slava Gerovitch, Pavel Etingof, Mikhail Gronas, Anna Rumshisky

Article intelligence

Engineers | Advanced

Key points

The CrowdMath dataset includes 164 expert-annotated progress chains from the MIT PRIMES-AoPS collaborative math program (2016-2025).
Posts are labeled by functional roles: partial progress, proof completion, erroneous reasoning, error identification.
Models show strong performance on next-post prediction (83-88%) but struggle with post-role classification (best macro-F1=0.42).
The dataset reveals a significant gap in AI's ability to understand collaborative problem-solving processes.
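To make the key points concrete, here is a minimal sketch of how a progress chain with role-labeled posts might be represented. The class names, field names, and placeholder posts are assumptions for illustration; they are not the dataset's actual schema, and the real annotation may include roles beyond the four listed above or allow several roles per post.

```python
from dataclasses import dataclass, field
from enum import Enum


class PostRole(Enum):
    # The four roles named in the summary; the real label set may be larger.
    PARTIAL_PROGRESS = "partial progress"
    PROOF_COMPLETION = "proof completion"
    ERRONEOUS_REASONING = "erroneous reasoning"
    ERROR_IDENTIFICATION = "error identification"


@dataclass
class Post:
    author: str      # hypothetical field
    text: str        # hypothetical field
    role: PostRole   # assumes one role per post, which may not hold


@dataclass
class ProgressChain:
    problem_statement: str
    posts: list[Post] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True if any post is labeled as completing the proof."""
        return any(p.role is PostRole.PROOF_COMPLETION for p in self.posts)

    def role_counts(self) -> dict[PostRole, int]:
        """Count how many posts carry each role."""
        counts = {role: 0 for role in PostRole}
        for p in self.posts:
            counts[p.role] += 1
        return counts


# Invented placeholder chain, not taken from the dataset.
chain = ProgressChain(
    problem_statement="(placeholder open problem)",
    posts=[
        Post("user_a", "(placeholder partial argument)", PostRole.PARTIAL_PROGRESS),
        Post("user_b", "(placeholder flawed step)", PostRole.ERRONEOUS_REASONING),
        Post("user_c", "(placeholder: points out the flaw)", PostRole.ERROR_IDENTIFICATION),
        Post("user_a", "(placeholder final proof)", PostRole.PROOF_COMPLETION),
    ],
)
print(chain.is_complete())
print(chain.role_counts())
```

Under this representation, post-role classification amounts to predicting `role` for each post given the chain so far, and next-post prediction amounts to choosing which candidate post continues the chain.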

Why it matters

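CrowdMath exposes a gap between solving well-specified mathematical problems and understanding collaborative mathematical progress as it unfolds: models can follow the local flow of a discussion but struggle to identify the functional significance of individual contributions.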

Technical impact

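May affect model selection, inference cost, and product capability; the most direct impact is on evaluation benchmarks, where CrowdMath adds next-post prediction and post-role classification tasks for collaborative mathematical discussion.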

This panel is AI-generated and reviewed for accuracy.

[2606.06526] CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

[Submitted on 2 Jun 2026]

Title: CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

Abstract: Large language models have made substantial progress on mathematical reasoning, but existing benchmarks typically evaluate well-specified problems with final answers, step-by-step solutions, or complete proofs. They do not capture collaborative open-problem solving: a setting in which participants propose partial arguments, identify gaps or errors in prior steps, repair flawed reasoning, and gradually synthesize incremental contributions into a proof. We introduce CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES-Art of Problem Solving (AoPS) CrowdMath program (2016-2025), a collaborative research initiative whose discussions have led to peer-reviewed publications. Each chain traces a multi-participant forum discussion from an open-problem statement to a completed proof. Posts are labeled by their functional roles in the evolving solution process, including partial progress, proof completion, erroneous reasoning, and error identification. We define evaluation tasks and benchmark six frontier models. Models achieve 83-88% accuracy on next-post prediction, suggesting that they can follow the local flow of mathematical discussion. However, they struggle to identify the functional significance of individual contributions, with the best model achieving only 0.42 macro-F1 on post-role classification. CrowdMath exposes a gap between solving well-specified mathematical problems and understanding collaborative mathematical progress as it unfolds.
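Since the abstract reports post-role classification as macro-F1, a short sketch of how that metric is computed may help in reading the 0.42 figure. The function below is a plain-Python illustration, not the authors' evaluation code; the label strings and the toy gold/predicted lists are invented and say nothing about the dataset's actual label distribution.

```python
def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores (a label with no hits scores 0)."""
    per_label = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            per_label.append(2 * precision * recall / (precision + recall))
        else:
            per_label.append(0.0)
    return sum(per_label) / len(per_label)


def accuracy(gold, pred):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Hypothetical label names and toy data, for illustration only.
LABELS = ["partial", "completion", "erroneous", "identification"]
gold = ["partial"] * 7 + ["completion", "erroneous", "identification"]
pred = ["partial"] * 10  # a classifier that always predicts the majority role

score_acc = accuracy(gold, pred)
score_f1 = macro_f1(gold, pred, LABELS)
print(f"accuracy={score_acc:.3f} macro_f1={score_f1:.3f}")
```

Because macro-F1 averages over labels without weighting by frequency, a classifier that does well only on a common role can score low overall, as the toy data shows. Note that the 83-88% accuracy in the abstract is for a different task (next-post prediction), so the two reported numbers are not directly comparable.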

Comments: 16 pages, 4 figures

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2606.06526 [cs.AI]

(or arXiv:2606.06526v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.06526

arXiv-issued DOI via DataCite

Submission history

From: Sherin Muckatira
[v1] Tue, 2 Jun 2026 20:38:39 UTC (1,074 KB)
