2026-06-04 04:00 UTC · Updated: 2026-06-30 13:03 UTC

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when relying on visual aids. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems, expanded with human-reviewed LLM-generated synthetic variants. Overall, we found that direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.

Source: arXiv AI

Authors: Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari, Yasamin Medghalchi, Ilker Hacihaliloglu, Mesrob Ohannessian, Lele Wang, Giuseppe Carenini

[Submitted on 2 Jun 2026]

Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc. Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we found that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.
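To make the idea of plotting "revealing intersections" concrete, the following is a minimal sketch and not the benchmark's actual tooling. It samples two curves on a grid, much as a plotting tool would, flags sign changes in their difference, and refines each crossing by bisection. The function name `graph_intersections`, the sample count, and the example curves are all assumptions chosen for illustration.

```python
from typing import Callable, List


def graph_intersections(
    f: Callable[[float], float],
    g: Callable[[float], float],
    lo: float,
    hi: float,
    samples: int = 2000,
) -> List[float]:
    """Approximate the x-values where f and g cross on [lo, hi].

    Hypothetical helper: mimics reading intersections off a plot by
    scanning a sampled grid for sign changes of f(x) - g(x).
    """

    def h(x: float) -> float:
        return f(x) - g(x)

    step = (hi - lo) / samples
    roots: List[float] = []
    for i in range(samples):
        a, b = lo + i * step, lo + (i + 1) * step
        ha, hb = h(a), h(b)
        if ha == 0.0:
            # The grid point itself is an intersection.
            roots.append(a)
        elif ha * hb < 0:
            # Sign change inside (a, b): refine with bisection.
            for _ in range(60):
                m = (a + b) / 2
                hm = h(m)
                if ha * hm <= 0:
                    b = m
                else:
                    a, ha = m, hm
            roots.append((a + b) / 2)
    if h(hi) == 0.0:
        roots.append(hi)
    return roots


# Illustrative problem (not from VAMPS): where does y = x^2 meet y = x + 2?
roots = graph_intersections(lambda x: x * x, lambda x: x + 2, -3.0, 3.0)
print(roots)
```

For this example the analytical route solves x^2 - x - 2 = 0 directly, giving x = -1 and x = 2, and the sampled estimates should land near those values. A grid scan of this kind misses tangential contacts, where the difference touches zero without changing sign, and it can mistake a sign flip across a vertical asymptote for a crossing, so the output still has to be interpreted with care.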

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Cite as: arXiv:2606.04244 [cs.AI] (or arXiv:2606.04244v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.04244

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Amirhossein Dabiriaghdam

[v1] Tue, 2 Jun 2026 21:45:21 UTC (2,354 KB)
