2026-06-25 04:00 UTC (updated 2026-06-25 07:51 UTC)

Graph-Based Phonetic Error Correction of Noisy ASR

Researchers propose G-SPIN, a structured ASR correction framework that combines phonetic graph modeling with contextual language understanding. It uses a graph neural network to generate acoustically plausible candidate sets, a masked language model for scoring, and an instruction-tuned large language model for final re-ranking, enabling lightweight, modular inference-time correction.

Source: arXiv Computational Linguistics
Authors: Pratik Rakesh Singh, Mohammadi Zaki, Aneesh Mukkamala, Pankaj Wasnik

[Submitted on 29 Apr 2026]

Abstract: Automatic speech recognition (ASR) systems, despite low overall word error rates, produce residual lexical errors that disproportionately affect semantically critical tokens such as named entities, negations, and sentiment-bearing words. These errors are often structured, arising from phonetic similarity rather than random noise, which makes naive token-level correction insufficient. We propose G-SPIN, a structured ASR correction framework that combines phonetic graph modeling with contextual language understanding. A graph neural network (GNN) first constructs acoustically plausible candidate neighborhoods for flagged tokens, explicitly restricting the correction search space to phonetic alternatives. A masked language model (MLM) then provides local contextual scoring, and an instruction-tuned large language model (LLM) performs final context-aware re-ranking over this compact candidate set. By decoupling structured phonetic reasoning from contextual semantic selection, our method avoids unconstrained generation while improving correction accuracy. The framework is lightweight, modular, and operates entirely at inference time.
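The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written phoneme lexicon, the edit-distance graph (standing in for the GNN), the bigram counter (standing in for the MLM), and the pluggable `rerank` callable (standing in for the instruction-tuned LLM) are all assumptions chosen so the example runs with the Python standard library alone. Every name below is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: word -> rough phoneme sequence (hand-written for illustration).
LEXICON = {
    "whether": ["W", "EH", "DH", "ER"],
    "weather": ["W", "EH", "DH", "ER"],
    "wetter":  ["W", "EH", "T", "ER"],
    "feather": ["F", "EH", "DH", "ER"],
    "water":   ["W", "AO", "T", "ER"],
}

# Hypothetical toy corpus used only to derive bigram counts for contextual scoring.
CORPUS = [
    "the weather is nice today",
    "the weather is cold",
    "i wonder whether it will rain",
    "the feather is light",
    "the ground is wetter today",
]

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def build_phonetic_graph(lexicon, max_dist=1):
    """Stage 1 stand-in: connect words whose phoneme sequences are within max_dist edits."""
    graph = {word: set() for word in lexicon}
    for u, v in combinations(lexicon, 2):
        if edit_distance(lexicon[u], lexicon[v]) <= max_dist:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def bigram_counts(corpus):
    """Count adjacent word pairs across all sentences in the corpus."""
    counts = Counter()
    for sentence in corpus:
        toks = sentence.split()
        counts.update(zip(toks, toks[1:]))
    return counts

def context_score(tokens, index, candidate, counts):
    """Stage 2 stand-in: score a candidate by how often it co-occurs with its neighbors."""
    left = tokens[index - 1] if index > 0 else None
    right = tokens[index + 1] if index + 1 < len(tokens) else None
    return counts[(left, candidate)] + counts[(candidate, right)]

def correct_token(tokens, index, graph, counts, top_k=3, rerank=None):
    """Restrict the search to phonetic neighbors, score them, then re-rank a shortlist."""
    flagged = tokens[index]
    candidates = {flagged} | graph.get(flagged, set())
    ranked = sorted(candidates, key=lambda c: (-context_score(tokens, index, c, counts), c))
    shortlist = ranked[:top_k]
    # Stage 3 stand-in: a real system would ask an LLM to pick from the shortlist.
    best = rerank(tokens, index, shortlist) if rerank else shortlist[0]
    return tokens[:index] + [best] + tokens[index + 1:]

GRAPH = build_phonetic_graph(LEXICON)
COUNTS = bigram_counts(CORPUS)
corrected = correct_token("the whether is nice today".split(), 1, GRAPH, COUNTS)
print(" ".join(corrected))
```

In this toy setup the flagged token "whether" is replaced by "weather", while "water" never enters the candidate set because it lies two phoneme edits away. That mirrors the abstract's point that the phonetic stage bounds the search space before any contextual model is consulted.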

Comments: Accepted at ACL Industry Track 2026

Subjects: Computation and Language (cs.CL); Sound (cs.SD)

Cite as: arXiv:2606.24889 [cs.CL]

(or arXiv:2606.24889v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2606.24889

arXiv-issued DOI via DataCite

Submission history

From: Mohammadi Zaki
[v1] Wed, 29 Apr 2026 13:57:11 UTC (392 KB)
