2026-06-29 04:00 UTC. Updated: 2026-06-29 08:04 UTC

Developmental approach reveals the statistical learning of Neural Language Models: Transformers generalize from the most abstract statistical patterns

This study uses a developmental approach to investigate the statistical learning and mental representations of neural language models (NLMs). A series of generative Transformer models is trained on a synthetic grammar, and the model states are saved at multiple stages of training. By analyzing how the internal representations change across these stages, the authors find that NLMs acquire the most abstract, global statistical knowledge at the beginning of learning and only later acquire more local statistical dependencies. This learning path involves many over-generalizations from the start, which are gradually constrained in later stages. Based on this observation, the authors propose a new framework to explain the statistical learning and language cognition of NLMs.

Source: arXiv Computational Linguistics. Authors: Wang Bojun, Holly Jenkins, Elizabeth Wonnacott


[Submitted on 25 Jun 2026]


Abstract: In this study, we use a developmental approach to investigate the statistical learning and mental representation of neural language models (NLM). A series of Generative Transformer models are trained on a synthetic grammar. The model states are saved at multiple stages in the course of training. Through analyzing how the internal representations of these models change in the developmental path, we found that NLMs acquire the most abstract global statistical knowledge at the beginning of learning and later acquire the relatively local statistical dependencies. This learning path contains many over-generalizations from the very beginning and these over-generalizations are gradually constrained in the later stage of learning. Based on this observation, we propose a new framework to explain the statistical learning and language cognition of NLMs.
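The checkpointing protocol described in the abstract (train on a synthetic grammar, save the model state at several stages, then compare the saved states) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy grammar, the names `GRAMMAR`, `sample`, `BigramCounter`, and `train_with_checkpoints`, and the use of a simple bigram counter as a stand-in for a Transformer. It is not the authors' grammar, model, or analysis.

```python
import copy
import random
from collections import Counter, defaultdict

# Hypothetical toy grammar (NOT the grammar used in the paper).
GRAMMAR = {
    "S": [["DET", "NOUN", "VERB"]],
    "DET": [["the"], ["a"]],
    "NOUN": [["dog"], ["cat"]],
    "VERB": [["runs"], ["sleeps"]],
}


def sample(symbol, rng):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    expansion = rng.choice(GRAMMAR[symbol])
    return [tok for part in expansion for tok in sample(part, rng)]


class BigramCounter:
    """Stand-in for a trainable model; a Transformer would take its place."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, sentence):
        # Count each (previous token -> next token) transition,
        # including sentence start and end markers.
        for prev, nxt in zip(["<s>"] + sentence, sentence + ["</s>"]):
            self.counts[prev][nxt] += 1


def train_with_checkpoints(n_steps, save_at, seed=0):
    """Train for n_steps and keep a frozen copy of the model at each step in save_at."""
    rng = random.Random(seed)
    model = BigramCounter()
    checkpoints = {}
    for step in range(1, n_steps + 1):
        model.update(sample("S", rng))
        if step in save_at:
            # Deep copy so later updates do not alter the saved state.
            checkpoints[step] = copy.deepcopy(model)
    return checkpoints


checkpoints = train_with_checkpoints(n_steps=100, save_at={1, 10, 100})
```

Each entry in `checkpoints` is an independent snapshot, so statistics computed on an early snapshot can be compared with the same statistics on a later one. With a real neural model, the deep copy would typically be replaced by saving the model's parameters to disk at each chosen step.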

Comments: 10 pages, 7 figures, oral presentation at Interdisciplinary Advances in Statistical Learning

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2606.27460 [cs.CL]

(or arXiv:2606.27460v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2606.27460

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Bojun Wang
[v1] Thu, 25 Jun 2026 18:34:56 UTC (1,124 KB)
