Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection
This paper presents a lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge. A dedicated MLP (QueryMLP) is trained to explicitly predict the buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to the baseline decoder query vector, providing a direct spatial prior per buoy and reducing the geometric reasoning burden on the transformer decoder. The approach achieves an Overall score of 0.7386, F1=0.8055, and mIoU=0.6718 on the held-out test set, placing second among all submissions.
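The summary describes QueryMLP only by its inputs and outputs. Below is a minimal forward-pass sketch of that idea, written in plain Python so it runs without a deep-learning framework. Everything concrete in it is an assumption, not the author's released code. The class reuses the name QueryMLP only for readability. The feature layout (normalized chart distance, bearing as sine/cosine, IMU roll and pitch), the two hidden layers of width 32, and the sigmoid output in normalized image coordinates are all assumptions. The helpers `buoy_features` and `augment_query` are hypothetical. The weights are random, so the predicted (u, v) carries no meaning until the network is trained.

```python
import math
import random


def init_linear(n_in, n_out, rng):
    # Hypothetical init: uniform weights scaled by fan-in, zero biases.
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    # Dense layer: y = W x + b.
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class QueryMLP:
    """Sketch only (hypothetical): chart + IMU features -> normalized (u, v) waterline pixel."""

    def __init__(self, n_in=5, hidden=32, seed=0):
        rng = random.Random(seed)
        self.l1 = init_linear(n_in, hidden, rng)
        self.l2 = init_linear(hidden, hidden, rng)
        self.out = init_linear(hidden, 2, rng)

    def __call__(self, features):
        h = relu(linear(features, self.l1))
        h = relu(linear(h, self.l2))
        # Sigmoid keeps (u, v) in [0, 1], i.e. coordinates normalized by image size.
        return [sigmoid(z) for z in linear(h, self.out)]


def buoy_features(distance_m, bearing_rad, roll_rad, pitch_rad, max_range_m=2000.0):
    # Assumed layout: normalized chart distance, bearing as sin/cos, IMU roll and pitch.
    return [distance_m / max_range_m, math.sin(bearing_rad), math.cos(bearing_rad),
            roll_rad, pitch_rad]


def augment_query(baseline_query, uv):
    # Append the predicted pixel prior to the baseline decoder query vector.
    return list(baseline_query) + list(uv)


mlp = QueryMLP()
uv = mlp(buoy_features(350.0, math.radians(12.0), 0.01, -0.02))
query = augment_query([0.1] * 8, uv)
print(len(query))  # 10: 8 baseline features + predicted (u, v)
```

In a real model the MLP would be trained, for example with a regression loss against annotated waterline points (an assumption here). The augmented vector would then feed the DETR decoder. Appending rather than adding leaves the original query features untouched and gives the decoder a separate channel for the spatial prior.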
Article intelligence
Key points
- QueryMLP explicitly predicts buoy pixel coordinates from chart and IMU data, providing a spatial prior.
- Reduces the geometric reasoning burden on the transformer decoder.
- Achieves second place in the MaCVi 2026 challenge with Overall 0.7386, F1 0.8055, mIoU 0.6718.
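The results are reported as F1 and mIoU. The sketch below shows the standard definitions behind those names. It assumes that F1 is computed from true-positive, false-positive and false-negative buoy associations, and that mIoU is the mean box IoU over matched prediction/ground-truth pairs. The challenge's exact matching protocol is not given here, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def f1_score(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall over associations.
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def box_iou(a: Box, b: Box) -> float:
    # Intersection over union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(pairs: List[Tuple[Box, Box]]) -> float:
    # Average IoU over matched (prediction, ground truth) pairs.
    if not pairs:
        return 0.0
    return sum(box_iou(p, g) for p, g in pairs) / len(pairs)


print(f1_score(tp=8, fp=2, fn=2))
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

The reported numbers are consistent with Overall being the arithmetic mean of F1 and mIoU, since (0.8055 + 0.6718) / 2 = 0.73865. The source does not define the Overall score, so treat this as an observation rather than the official formula.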
Why it matters
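This matters because QueryMLP makes the world-to-image projection explicit. It predicts each buoy's pixel coordinates from chart and IMU data, so the decoder receives a spatial prior instead of having to learn the projection implicitly.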
Technical impact
May affect model selection and inference cost, because the change adds only one small MLP to the baseline. It may also affect evaluation on the MaCVi Vision-to-Chart benchmark.
[Submitted on 21 May 2026]
Title: Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection
Authors: Borja Carrillo-Perez (Arquimea Research Center)
Abstract: This report presents a lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge. The challenge baseline decoder receives per-buoy queries encoding world-space distance and bearing, forcing the transformer to implicitly learn the complex geometric projection from world coordinates to image pixels. Instead, this work trains an additional dedicated MLP, QueryMLP, to explicitly predict the buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to the baseline decoder query vector, providing a direct spatial prior per buoy and reducing the geometric reasoning burden on the transformer decoder. On the challenge leaderboard, the presented approach achieves an Overall score of 0.7386, with F1 = 0.8055 and mIoU = 0.6718, on the held-out test set, placing second among all submissions.
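The abstract argues that the baseline must learn the world-to-image projection implicitly. To give a sense of what that mapping involves, here is a hand-written pinhole projection of a buoy's waterline point. It is built on strong simplifying assumptions and is not the author's method, since QueryMLP learns the mapping from data. The assumptions are: a flat sea, a camera at a fixed height above the water, and a bearing measured relative to the camera's optical axis. It also ignores lens distortion, Earth curvature, and camera-to-IMU extrinsics. The rotation sign conventions are assumed. The function name `project_waterline` and all default parameter values (camera height, focal lengths, principal point) are hypothetical.

```python
import math


def rot_x(p, a):
    # Rotation about the camera x-axis (pitch), angle in radians.
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (x, c * y - s * z, s * y + c * z)


def rot_z(p, a):
    # Rotation about the optical z-axis (roll), angle in radians.
    x, y, z = p
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y, z)


def project_waterline(distance_m, bearing_rad, roll_rad, pitch_rad,
                      cam_height_m=3.0, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0):
    """Hypothetical analytic world-to-image projection; returns (u, v) or None."""
    # Waterline point in a level, camera-centered frame: x right, y down, z forward.
    # Flat-sea assumption: the waterline lies cam_height_m below the camera.
    p = (distance_m * math.sin(bearing_rad), cam_height_m,
         distance_m * math.cos(bearing_rad))
    # Undo vessel attitude from the IMU (assumed: positive pitch = bow up).
    p = rot_x(p, -pitch_rad)
    p = rot_z(p, -roll_rad)
    x, y, z = p
    if z <= 0:
        return None  # point is behind the camera
    # Pinhole projection with intrinsics (fx, fy, cx, cy).
    return (fx * x / z + cx, fy * y / z + cy)


print(project_waterline(1000.0, 0.0, 0.0, 0.0))
print(project_waterline(1000.0, math.radians(10.0), 0.0, 0.01))
```

Even this simplified version needs calibrated camera height, intrinsics and attitude conventions. One plausible reason to learn the mapping, as QueryMLP does, is that a network can absorb such parameters and unmodeled effects directly from annotated data.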
Comments: 5 pages, 3 figures. Technical report for the MaCVi 2026 Vision-to-Chart Data Association Challenge at the CVPR 2026 Workshop; 2nd place submission. Code: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2605.22942 [cs.CV]
(or arXiv:2605.22942v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2605.22942
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Borja Carrillo Perez
[v1] Thu, 21 May 2026 18:17:55 UTC (1,562 KB)