2026-06-25 04:00 UTC | Updated: 2026-06-25 08:08 UTC

Memory Retrieval in Visuomotor Policies for Long-Horizon Robot Control

This paper introduces HALO, a visuomotor policy with attention-based memory retrieval for long-horizon robot control, addressing spurious correlations and error accumulation in imitation learning.

Source: arXiv Robotics
Authors: Rutav Shah, Yisu Li, Femi Bello, Yuke Zhu, Roberto Martín-Martín

[Submitted on 23 Jun 2026]

Abstract: General-purpose robots operating in partially observable environments, such as homes, require memory to support autonomy. They must recall diverse information from the past, such as where objects were placed, which tasks a human partner has completed, and when an appliance was turned on. Achieving this versatility requires a general memory retrieval mechanism. Transformer architectures that use attention over long contexts for memory retrieval provide a promising approach, as they learn retrieval from data rather than relying on task-specific or hand-designed rules. However, directly incorporating them into imitation learning from offline data introduces two key challenges: (1) the policy may learn spurious correlations between past information and predicted actions, and (2) errors accumulate in memory due to prediction inaccuracies and their compounding interactions with the environment, causing model drift and cascading failures. To address both challenges, we introduce HALO, a visuomotor policy with an attention-based memory retrieval mechanism for long-horizon control. First, to suppress spurious correlations, HALO distills vision-language model (VLM) priors into the policy. It generates memory-dependent question-answer pairs from demonstration trajectories and trains jointly with a video question-answering objective, steering retrieval toward task-relevant information. Second, to reduce the impact of accumulated errors in memory during closed-loop control, HALO uses sparse attention that restricts retrieval to only the most relevant parts of the history. Together, these components enable more reliable long-horizon control by guiding the policy to retrieve task-relevant information from up to eight minutes of past experience. Project website: this https URL
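To make the idea of sparse attention over a history concrete, here is a minimal sketch of one common way to restrict retrieval: top-k selection over scaled dot-product scores. The abstract does not say how HALO implements its sparsity, so the top-k rule, the function name `topk_sparse_attention`, and the toy vectors are all assumptions chosen for illustration, not the authors' method.

```python
import math


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k history entries whose keys score highest.

    Hypothetical helper for illustration; top-k is an assumed sparsity rule.
    query:  list of d floats
    keys:   list of T lists, each with d floats (one per history step)
    values: list of T lists, each with d_v floats
    Returns (output vector, sorted indices of the history steps used).
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equally long")
    d = len(query)
    # Scaled dot-product score for every step in the history.
    scores = [sum(q * x for q, x in zip(query, key)) / math.sqrt(d) for key in keys]
    # Keep the k highest-scoring steps; all other steps get zero weight.
    k = max(1, min(k, len(keys)))
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (shifted by the max for stability).
    top = max(scores[i] for i in selected)
    exps = {i: math.exp(scores[i] - top) for i in selected}
    total = sum(exps.values())
    # Weighted sum of the selected values.
    d_v = len(values[0])
    output = [sum(exps[i] / total * values[i][j] for i in selected) for j in range(d_v)]
    return output, sorted(selected)


# Toy history of four steps; steps 0 and 2 are most similar to the query.
history_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
history_values = [[10.0], [20.0], [30.0], [40.0]]
out, used = topk_sparse_attention([1.0, 0.0], history_keys, history_values, k=2)
print(used)  # indices of the history steps that contributed to the output
```

In this toy case only the two best-matching steps contribute, so a corrupted entry elsewhere in the history (for example step 3) cannot influence the output. That is the intuition behind limiting the effect of accumulated memory errors, under the stated top-k assumption.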

Comments: 16 pages, 5 tables, 8 figures

Subjects: Robotics (cs.RO)

Cite as: arXiv:2606.25136 [cs.RO]

(or arXiv:2606.25136v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2606.25136

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Rutav Shah
[v1] Tue, 23 Jun 2026 20:07:23 UTC (10,524 KB)
