2026-06-04 04:00 UTC | Updated: 2026-06-30 13:03 UTC

End-to-End Text Line Detection and Ordering

This paper introduces Orli (Ordered Regression of Lines), an end-to-end model that unifies text line detection and reading order prediction as a single image-to-sequence task. Trained on 196,691 pages across ten writing systems, Orli marginally exceeds the previously reported state of the art on cBAD line detection without dataset-specific training, achieves near-perfect coverage and ordering on multiple reading-order benchmarks in a zero-shot setting, and adapts to specialized out-of-domain layouts with limited fine-tuning. Code and weights are released under an open license.

Source: arXiv Computer Vision
Author: Benjamin Kiessling (ALMAnaCH)

[Submitted on 2 Jun 2026]

Abstract: Practical text-recognition pipelines for historical documents typically decompose layout analysis into line detection followed by a separate reading-order step, with the latter most often handled by a hand-coded geometric heuristic that struggles with marginalia, multiple columns, tables, and source-specific editorial conventions. This article introduces Orli (Ordered Regression of Lines), an end-to-end model that casts both sub-tasks as a single image-to-sequence problem: from a page image, Orli autoregressively generates text-line baselines directly in reading order. Baselines are represented in a chord-frame parameterization that anchors a line's position, orientation, and extent while encoding local geometry through perpendicular offsets; an iterative refinement head and a local visual refiner produce the final curve. Trained on a heterogeneous corpus of 196,691 pages spanning ten writing systems, Orli marginally exceeds the previously reported state of the art for cBAD line detection without dataset-specific training, reaches near-perfect coverage and ordering on multiple reading-order benchmarks zero-shot, and adapts to more specialized out-of-domain layouts with limited fine-tuning. The method's source code and model weights are available under an open license at this https URL.
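The abstract does not spell out the chord-frame parameterization, so the following is only a minimal sketch of one plausible reading, not the paper's definition. It assumes the chord is the segment joining a baseline's first and last points, that position, orientation, and extent are the chord's midpoint, angle, and length, and that local geometry is stored as signed perpendicular offsets sampled at evenly spaced positions along the chord. The function names, the tuple layout, and the `num_offsets` parameter are hypothetical.

```python
import math


def _interp(frame, t):
    """Linearly interpolate the perpendicular offset at chord position t.

    Assumes the polyline advances monotonically along its chord.
    """
    for (t0, d0), (t1, d1) in zip(frame, frame[1:]):
        if t1 > t0 and t0 <= t <= t1:
            return d0 + (d1 - d0) * (t - t0) / (t1 - t0)
    return 0.0


def encode_chord_frame(points, num_offsets=5):
    """Encode a baseline polyline as (cx, cy, theta, length, offsets).

    Hypothetical layout chosen for illustration only.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("baseline endpoints must differ")
    ux, uy = dx / length, dy / length  # unit vector along the chord
    nx, ny = -uy, ux                   # unit normal to the chord
    # Express every point as (fraction along chord, signed distance from chord).
    frame = [
        (((px - x0) * ux + (py - y0) * uy) / length,
         (px - x0) * nx + (py - y0) * ny)
        for px, py in points
    ]
    # Sample offsets at evenly spaced interior positions of the chord.
    offsets = [_interp(frame, k / (num_offsets + 1))
               for k in range(1, num_offsets + 1)]
    return ((x0 + x1) / 2, (y0 + y1) / 2, math.atan2(dy, dx), length, offsets)


def decode_chord_frame(cx, cy, theta, length, offsets):
    """Rebuild a polyline from the chord frame; endpoints lie on the chord."""
    ux, uy = math.cos(theta), math.sin(theta)
    nx, ny = -uy, ux
    x0, y0 = cx - ux * length / 2, cy - uy * length / 2
    ds = [0.0] + list(offsets) + [0.0]
    steps = len(offsets) + 1
    return [
        (x0 + ux * length * k / steps + nx * d,
         y0 + uy * length * k / steps + ny * d)
        for k, d in enumerate(ds)
    ]


baseline = [(0, 0), (50, 10), (100, 0)]
encoded = encode_chord_frame(baseline, num_offsets=3)
decoded = decode_chord_frame(*encoded)
```

Under these assumptions, translating, rotating, or rescaling a line changes only the first four values, while the offsets describe its curvature relative to the chord. In the model described above, the refinement stages would then adjust such a coarse curve; that step is not sketched here.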

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2606.04166 [cs.CV] (or arXiv:2606.04166v1 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2606.04166 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Benjamin Kiessling [via CCSD proxy]
[v1] Tue, 2 Jun 2026 19:29:32 UTC (278 KB)
