2026-06-29 04:00 UTC (updated 2026-06-29 08:10 UTC)

Fine-tuning a multimodal large language model for clinician-grade autism behavioral scoring from short home videos

Researchers fine-tuned Gemini 2.5 Pro on 400 clinician-rated home videos using low-rank adaptation, achieving significant improvements in inter-rater reliability and ASD diagnosis accuracy, matching or exceeding clinician performance. The approach enables scalable behavioral feature extraction for autism assessment.

Source: arXiv Computer Vision

Authors: Mohammadmahdi Honarmand, Parnian Azizian, Aaron Kline, Kae Nurge, Zerin Nasrin Tumpa, Saimourya Surabhi, Kaitlyn Dunlap, Yang Qian, Ali Kargarandehkordi, Sameer Neupane, Peter Washington, Dennis P. Wall

[2606.27484] Fine-tuning a multimodal large language model for clinician-grade autism behavioral scoring from short home videos

[Submitted on 25 Jun 2026]


Abstract: Autism spectrum disorder (ASD) affects 1 in 31 US children, yet the median age at diagnosis exceeds four years. Artificial intelligence pipelines that provide quantified diagnosis from easy-to-access observational data (e.g., home videos) could support earlier diagnosis and timely delivery of early treatments. We fine-tuned Gemini 2.5 Pro on 400 clinician-rated home videos with low-rank adaptation, training only on 30 behavioral features previously validated to produce reliable predictions when passed to various ML models. On 99 held-out children (49 ASD, 50 neurotypical), inter-rater reliability with clinicians (per-feature weighted Cohen's kappa) improved by 40% (p
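The abstract reports agreement between the model and clinicians as a per-feature weighted Cohen's kappa. As a rough illustration of that metric, here is a minimal sketch in plain Python. The abstract does not state the weighting scheme or the rating scale, so the quadratic weights, the integer scale 0..K-1, and the function name `weighted_kappa` are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, num_categories, weighting="quadratic"):
    """Weighted Cohen's kappa for two raters on an ordinal scale 0..num_categories-1.

    Hypothetical helper for illustration; the weighting scheme is an assumption.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if num_categories < 2:
        raise ValueError("need at least two rating categories")

    n = len(rater_a)
    k = num_categories
    power = 2 if weighting == "quadratic" else 1  # anything else means linear

    # Disagreement weight: 0 on exact agreement, 1 at the maximum distance.
    def weight(i, j):
        return (abs(i - j) / (k - 1)) ** power

    joint = Counter(zip(rater_a, rater_b))  # observed score pairs
    marg_a = Counter(rater_a)               # rater A's score distribution
    marg_b = Counter(rater_b)               # rater B's score distribution

    observed = sum(weight(i, j) * c / n for (i, j), c in joint.items())
    expected = sum(
        weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k)
        for j in range(k)
    )
    if expected == 0:
        # Both raters used one and the same category throughout; kappa is
        # undefined here, and this sketch returns 1.0 by convention.
        return 1.0
    return 1.0 - observed / expected
```

For a per-feature figure like the one in the abstract, a function of this kind would be called once per behavioral feature, with the clinician's scores as one rater and the model's scores as the other. Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance.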
