Published: 2026-06-17. Updated: 2026-06-17.

ParkingTransformer: LLM-Enhanced End-to-End Trajectory Planning for Autonomous Parking

This paper proposes ParkingTransformer, a framework that combines multi-view perception with the scene understanding capability of Large Language Models (LLMs) for end-to-end autonomous parking. By combining trajectory queries with the LLM's implicit state features, it outputs planning trajectories directly, without dense Bird's-Eye-View (BEV) representations. It introduces 3D positional encoding to inject spatial awareness, a fixed-window streaming mechanism for historical information, and a coarse-to-fine decoding strategy. Closed-loop experiments in the CARLA simulator achieve a driving score of 61.32, and real-world experiments show an average success rate of 88.70%.

Source: arXiv Robotics

Authors: Hauteng Wu, Xu Li, Dong Kong, Zihang Wang, Xieyuanli Chen, Benwu Wang, Wenkai Zhu

[Submitted on 12 Jun 2026]

Abstract: End-to-end autonomous parking has emerged as a critical task within autonomous driving. However, existing methods behave as black boxes and lack high-level semantic understanding and interpretability, which impedes seamless long-distance autonomous parking from the road to the target spot. To address these limitations, we propose ParkingTransformer, a novel framework that leverages multi-view perception and the scene understanding capability of Large Language Models (LLMs). By combining trajectory queries with the LLM's implicit state features, our method interacts directly with historical information and raw sensor data to output planning trajectories, eliminating the need for dense Bird's-Eye-View (BEV) representations. To compensate for the inadequate spatial reasoning ability of LLMs, we introduce 3D positional encoding to explicitly inject spatial geometric awareness. Furthermore, a fixed-window streaming mechanism is designed for historical information processing, significantly improving long-term temporal processing efficiency and inference speed. Additionally, a coarse-to-fine decoding strategy is employed to progressively enhance trajectory precision. Extensive closed-loop experiments are conducted on the CARLA simulator and real-world vehicle platforms. The results demonstrate that our method achieves a driving score of 61.32 in the CARLA simulator and an average success rate of 88.70% in real-world experiments, validating the feasibility and effectiveness of the proposed algorithms.
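The abstract does not say how the fixed-window streaming mechanism is implemented. As a rough illustration of the general idea only, the sketch below keeps a bounded first-in, first-out buffer of per-frame features, so the history passed to the planner never grows beyond a fixed size. The names `FixedWindowMemory` and `run_stream`, the window size, and the list-of-floats feature type are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

# Assumption: a plain list of floats stands in for a per-frame feature vector.
Feature = List[float]


class FixedWindowMemory:
    """Hypothetical FIFO buffer keeping only the most recent `window` frames."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops its oldest item when a new one is
        # appended to a full buffer, which bounds memory and per-step cost.
        self._frames: Deque[Feature] = deque(maxlen=window)

    def push(self, feature: Sequence[float]) -> None:
        """Store the newest frame feature, evicting the oldest if full."""
        self._frames.append(list(feature))

    def context(self) -> List[Feature]:
        """Return the retained history, oldest frame first."""
        return list(self._frames)


def run_stream(
    frames: Sequence[Sequence[float]], window: int
) -> Tuple[FixedWindowMemory, List[int]]:
    """Feed frames one at a time and record the context size at each step."""
    memory = FixedWindowMemory(window)
    context_sizes: List[int] = []
    for frame in frames:
        memory.push(frame)
        context_sizes.append(len(memory.context()))
    return memory, context_sizes


if __name__ == "__main__":
    memory, sizes = run_stream([[float(i)] for i in range(6)], window=3)
    print(sizes)
    print(memory.context())
```

With a window of 3, the context size stops growing after the third frame, which is the property that would keep long-horizon inference cost constant under this assumption.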

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.17082 [cs.RO]

(or arXiv:2606.17082v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2606.17082

arXiv-issued DOI via DataCite

Submission history

From: Huateng Wu

[v1] Fri, 12 Jun 2026 05:52:01 UTC (6,965 KB)
