In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

This study reexamines retrieval-augmented generation (RAG) through the lens of gradient descent. It shows that a single linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective, which identifies an exact regime where retrieval-augmented prediction and in-context optimization coincide. The authors treat this result as a guide rather than a literal model of LLM computation, and use it to build a lightweight method that predicts a forward-only, context-conditioned update to the evidence-use interface of frozen RAG large language models. Across seven QA benchmarks, the method improves on a shared-interface baseline without modifying the retriever or backbone, and approaches test-time gradient adaptation at much lower per-query cost.

Article intelligence

Engineers · Advanced

Key points

  • RAG is reinterpreted as an in-context optimization process with a theoretical link to gradient descent.
  • A single linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective that covers both projection-based and dot-product retrieval interfaces.
  • A lightweight forward-only update method is proposed, requiring no changes to the retriever or backbone model.
  • Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, the method improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
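
The second key point can be checked numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses plain in-context least-squares regression in place of the unified linearized RAG objective (which this summary does not spell out), and the token layout and the matrices `W_K`, `W_Q`, and `W_V` are hypothetical choices made for the demo. It compares the prediction after one gradient-descent step from zero weights with the output of one softmax-free self-attention layer.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]


def gd_step_prediction(xs, ys, x_query, lr):
    """Predict with the weights reached by one gradient step from w = 0.

    Loss (an assumption for this demo): L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2
    """
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Predict with one linear (softmax-free) self-attention layer.

    Context tokens are [x_i, y_i]; the query token is [x_query, 0].
    Keys and queries read the x block, values read the y slot scaled by lr / n.
    """
    n, d = len(xs), len(xs[0])
    tokens = [list(x) + [y] for x, y in zip(xs, ys)]
    query_token = list(x_query) + [0.0]

    size = d + 1
    W_K = [[1.0 if (r == c and r < d) else 0.0 for c in range(size)] for r in range(size)]
    W_Q = W_K
    W_V = [[lr / n if (r == d and c == d) else 0.0 for c in range(size)] for r in range(size)]

    q = matvec(W_Q, query_token)
    update = [0.0] * size
    for token in tokens:
        score = dot(matvec(W_K, token), q)  # raw dot product, no softmax
        value = matvec(W_V, token)
        update = [u + score * v for u, v in zip(update, value)]

    # Residual connection: the prediction lands in the query token's y slot.
    output = [t + u for t, u in zip(query_token, update)]
    return output[d]


random.seed(0)
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(8)]
ys = [random.gauss(0, 1) for _ in range(8)]
x_query = [random.gauss(0, 1) for _ in range(3)]

gd_pred = gd_step_prediction(xs, ys, x_query, lr=0.1)
attn_pred = linear_attention_prediction(xs, ys, x_query, lr=0.1)
print("gradient-descent step:", gd_pred)
print("linear attention:     ", attn_pred)
```

The two values agree up to floating-point rounding because both reduce to a weighted sum of the context targets, with weights given by the dot products between each context input and the query. The abstract reports that this kind of correspondence holds under controlled linear extensions but becomes feature-distribution dependent under nonlinear architectures.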

Why it matters

This matters because retrieved documents are usually treated as static evidence; viewing them as signals for adaptation lets a frozen RAG system adjust how it uses evidence for each query with a forward-only update.

Technical impact

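The clearest impact is on inference cost: the update is forward-only and leaves the retriever and backbone frozen, so it avoids per-query gradient steps. It may also affect model selection, product capability, and evaluation on QA benchmarks.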

[2605.26356] In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

[Submitted on 25 May 2026]

Authors: Mingchen Li and 4 other authors

Abstract: In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation. We study RAG as an in-context optimization process. First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide. We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence. We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures. Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface. Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
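
For readers who want the algebra behind the link between linear self-attention and gradient descent, the short derivation below works through the standard in-context least-squares case. It is a sketch under assumptions: the loss, the zero initialization, and the symbols $x_i$, $y_i$, $x_q$, $W$, and $\eta$ are illustrative choices, and the paper's unified linearized RAG objective may differ in form.

```latex
% Context pairs (x_i, y_i), i = 1..n, and a query input x_q (assumed setup).
\begin{align*}
L(W) &= \frac{1}{2n} \sum_{i=1}^{n} \lVert W x_i - y_i \rVert^2 \\
\nabla_W L(W)\big|_{W = 0} &= -\frac{1}{n} \sum_{i=1}^{n} y_i x_i^{\top} \\
W_1 &= 0 - \eta \, \nabla_W L(W)\big|_{W = 0}
     = \frac{\eta}{n} \sum_{i=1}^{n} y_i x_i^{\top} \\
\hat{y}_q &= W_1 x_q
     = \frac{\eta}{n} \sum_{i=1}^{n} y_i \, (x_i^{\top} x_q)
\end{align*}
```

The last line has the shape of softmax-free attention: the query $x_q$ is scored against each key $x_i$ by a dot product, and the scores weight the values $\frac{\eta}{n} y_i$. In this assumed setting, one attention layer and one gradient step produce the same prediction.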

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2605.26356 [cs.CL]

(or arXiv:2605.26356v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2605.26356

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Mingchen Li
[v1] Mon, 25 May 2026 22:04:54 UTC (273 KB)
