ParkingTransformer: LLM-Enhanced End-to-End Trajectory Planning for Autonomous Parking
This paper proposes ParkingTransformer, a novel framework that leverages multi-view perception and the scene understanding capability of Large Language Models (LLMs) for end-to-end autonomous parking. By combining trajectory queries with the LLM's implicit state features, it outputs planning trajectories directly, eliminating the need for dense Bird's-Eye-View (BEV) representations. It introduces 3D positional encoding, a fixed-window streaming mechanism, and a coarse-to-fine decoding strategy. Closed-loop experiments on the CARLA simulator achieve a driving score of 61.32, and real-world experiments show an average success rate of 88.70%.
[Submitted on 12 Jun 2026]
Authors: Huateng Wu and 6 other authors
Abstract: End-to-end autonomous parking has emerged as a critical task within the realm of autonomous driving. However, existing methods suffer from black-box characteristics and lack high-level semantic understanding and interpretability, which impedes seamless long-distance autonomous parking from the road to the target spot. To address these limitations, we propose ParkingTransformer, a novel framework that leverages multi-view perception and the scene understanding capability of Large Language Models (LLMs). By combining trajectory queries with the LLM's implicit state features, our method interacts directly with historical information and raw sensor data to output planning trajectories, eliminating the need for dense Bird's-Eye-View (BEV) representations. To compensate for the inadequate spatial reasoning ability of LLMs, we introduce 3D positional encoding to explicitly inject spatial geometric awareness. Furthermore, a fixed-window streaming mechanism is designed for historical information processing, significantly improving long-term temporal processing efficiency and inference speed. Additionally, a coarse-to-fine decoding strategy is employed to progressively enhance trajectory precision. Extensive closed-loop experiments are conducted on the CARLA simulator and real-world vehicle platforms. The results demonstrate that our method achieves a driving score of 61.32 in the CARLA simulator and an average success rate of 88.70% in real-world experiments, validating the feasibility and effectiveness of the proposed algorithms.
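The abstract does not say how the fixed-window streaming mechanism is implemented. As a minimal sketch of the general idea only (not the authors' method), the Python below keeps a bounded history of per-frame features so that memory and per-step cost stay constant however long the parking maneuver runs. The class name `FixedWindowMemory`, the window size of 3, and the two-element feature vectors are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Deque, List, Sequence


class FixedWindowMemory:
    """Hypothetical buffer that keeps only the most recent `window` frame features."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops its oldest element when a new one is appended.
        self._frames: Deque[List[float]] = deque(maxlen=window)

    def push(self, feature: Sequence[float]) -> None:
        # Store a copy so later changes by the caller do not alter the history.
        self._frames.append(list(feature))

    def context(self) -> List[List[float]]:
        # Oldest-to-newest history that a downstream planner could attend over.
        return list(self._frames)


# Stream five frames through a window of three (assumed values).
memory = FixedWindowMemory(window=3)
for t in range(5):
    memory.push([float(t), float(t) * 0.5])

history = memory.context()
print(len(history), history[0], history[-1])
```

With a window of three, only the features from the last three frames remain after five pushes; earlier frames are discarded rather than accumulated, which is the property that bounds the cost of long-horizon temporal processing in this kind of design.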
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as: arXiv:2606.17082 [cs.RO]
(or arXiv:2606.17082v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2606.17082
arXiv-issued DOI via DataCite
Submission history
From: Huateng Wu
[v1] Fri, 12 Jun 2026 05:52:01 UTC (6,965 KB)