2026-04-21 16:42 UTC | Original source | 5 min read | Updated: 2026-06-27 00:25 UTC

ReasoningBank: Enabling agents to learn from experience

Google Cloud researchers introduce ReasoningBank, a novel agent memory framework that distills generalizable reasoning strategies from both successful and failed experiences, enabling agents to continuously learn after deployment. It outperforms baselines on web and software engineering benchmarks.

Source: Google Research Blog

April 21, 2026

Jun Yan and Chen-Yu Lee, Research Scientists, Google Cloud

ReasoningBank is a novel agent memory framework that uses successful and failed experiences to distill generalizable reasoning strategies, enabling an agent to continuously learn from experience after deployment.

Agents are becoming increasingly crucial in tackling complex real-world tasks, ranging from general web navigation to assisting with extensive software engineering codebases. However, as these agents transition into persistent, long-running roles in the real world, they face a critical limitation: they struggle to analyze and learn from successful and failed experiences after deployment.

Agents that approach each new task without a memory mechanism will repeat the same strategic errors and discard valuable insights. To address this, various forms of agent memory have been introduced to store information about past interactions for reuse. However, existing methods generally either save exhaustive records of every action taken (such as the trajectory memory used in Synapse) or only document workflows summarized from successful attempts (as in Agent Workflow Memory). These approaches have two fundamental drawbacks: first, by recording detailed actions instead of tactical foresight, they fail to distill higher-level, transferable reasoning patterns; second, by over-emphasizing successful experiences, they miss out on a primary source of learning: their own failures.

To bridge this gap, in our ICLR paper, "ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory", we introduce a novel agent memory framework (code available on GitHub) that distills useful insights from both successful and failed experiences for test-time self-evolution. When evaluated on web browsing and software engineering benchmarks, ReasoningBank improves both agent effectiveness (higher success rates) and efficiency (fewer task steps) compared to baseline approaches.

Memory content comparison: existing strategies and ReasoningBank.

Distilling insights with ReasoningBank

ReasoningBank distills global reasoning patterns into high-level, structured memories. Each structured memory item contains the following:

Title: A concise identifier summarizing the core strategy.

Description: A brief summary of the memory item.

Content: The distilled reasoning steps, decision rationales, or operational insights extracted from past experiences.
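As a minimal sketch, the three fields above could be represented as a small record type. The class name `MemoryItem`, the `render` helper, and the example values are illustrative assumptions, not the schema used in the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """Hypothetical container for one structured memory item."""
    title: str        # concise identifier summarizing the core strategy
    description: str  # brief summary of the memory item
    content: str      # distilled reasoning steps, rationales, or insights

    def render(self) -> str:
        """Format the item as plain text for inclusion in an agent's context."""
        return (
            f"Title: {self.title}\n"
            f"Description: {self.description}\n"
            f"Content: {self.content}"
        )


# Illustrative values only; real items would be written by an LLM.
item = MemoryItem(
    title="Verify page before loading more",
    description="Avoid infinite scroll traps when paging through results.",
    content="Check the current page identifier before attempting to load more results.",
)
rendered = item.render()
```

Keeping the item immutable (`frozen=True`) is one possible design choice: memories are appended once and then only read during retrieval.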

The memory workflow operates in a continuous, closed loop of retrieval, extraction, and consolidation. Before taking action, the agent draws upon the ReasoningBank to gather relevant memories into its context. It then interacts with the environment and uses an LLM-as-a-judge to self-assess the resulting trajectory and extract success insights or failure reflections. Notably, this self-judgment does not need to be perfectly accurate, as we find ReasoningBank to be quite robust against judgment noise. During extraction, the agent distills workflows and generalizable insights from the trajectory into new memories. For simplicity, we directly append these to the ReasoningBank, leaving more sophisticated consolidation strategies for future work.
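The loop described above can be sketched roughly as follows. This is a toy illustration under stated assumptions: the function names (`retrieve`, `run_task`) and the callables `act`, `judge`, and `extract` are hypothetical stand-ins for LLM calls, and the word-overlap retrieval is a placeholder chosen for simplicity, not the retrieval method used by ReasoningBank.

```python
from typing import Callable, List, Tuple

Memory = Tuple[str, str, str]  # (title, description, content)


def retrieve(bank: List[Memory], query: str, k: int = 1) -> List[Memory]:
    """Rank memories by word overlap with the query (placeholder retrieval)."""
    query_words = set(query.lower().split())

    def overlap(memory: Memory) -> int:
        return len(query_words & set(" ".join(memory).lower().split()))

    return sorted(bank, key=overlap, reverse=True)[:k]


def run_task(
    bank: List[Memory],
    query: str,
    act: Callable[[str, List[Memory]], str],
    judge: Callable[[str, str], bool],
    extract: Callable[[str, str, bool], List[Memory]],
    k: int = 1,
) -> bool:
    memories = retrieve(bank, query, k)          # 1. retrieval
    trajectory = act(query, memories)            # 2. interact with the environment
    success = judge(query, trajectory)           # 3. self-assessment (may be noisy)
    new_items = extract(query, trajectory, success)  # 4. extraction
    bank.extend(new_items)                       # 5. consolidation: plain append
    return success


# Toy stand-ins so the sketch runs without any model.
def toy_act(query: str, memories: List[Memory]) -> str:
    return f"steps for {query} using {len(memories)} memories"


def toy_judge(query: str, trajectory: str) -> bool:
    return "0 memories" not in trajectory


def toy_extract(query: str, trajectory: str, success: bool) -> List[Memory]:
    kind = "Success insight" if success else "Failure reflection"
    return [(f"{kind}: {query}", "toy description", trajectory)]


bank: List[Memory] = []
first = run_task(bank, "find order total", toy_act, toy_judge, toy_extract)
second = run_task(bank, "find order date", toy_act, toy_judge, toy_extract)
```

In this toy run the first task has no memories to draw on and is judged a failure, but the failure reflection it stores is retrieved for the second task.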

Crucially, unlike existing workflow memory strategies that only focus on successful runs, ReasoningBank actively analyzes failed experiences to source counterfactual signals and pitfalls. By distilling these mistakes into preventative lessons, ReasoningBank builds powerful strategic guardrails. For example, instead of merely learning a procedural rule like "click the 'Load More' button", the agent might learn from a past failure to "always verify the current page identifier first to avoid infinite scroll traps before attempting to load more results".

Workflow of ReasoningBank integrated with an agent during test time.

Memory-aware test-time scaling (MaTTS)

Test-time scaling (TTS) — scaling compute at inference time — has shown immense effectiveness in reasoning domains like math and competitive programming. However, in agentic environments, existing TTS methods often discard the exploration trajectory and treat the final answer as the only useful outcome. This overlooked exploration is actually a rich data source that could accelerate an agent's ability to learn from experience over time.

We bridge this gap by explicitly linking memory with scaling through memory-aware test-time scaling (MaTTS). By using ReasoningBank as a powerful experience learner, MaTTS distills extensive exploration into high-quality memories via contrastive and refinement signals. We demonstrate MaTTS through two distinct forms of scaling:

Parallel scaling: The agent generates multiple distinct trajectories for the same query under the guidance of memory. Through self-contrast, ReasoningBank compares successful and spuriously reasoned trajectories to distill more robust strategies and synthesize higher-quality memories.

Sequential scaling: The agent iteratively refines its reasoning within a single trajectory to produce strong intermediate rationales. ReasoningBank captures the intermediate insights from the agent's trial and error and progressive improvement as high-quality memory items.
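The two forms of scaling can be sketched side by side. Everything below is a hypothetical illustration: `rollout`, `judge`, `contrast`, `refine`, and `summarize` stand in for LLM-driven steps, and the toy implementations exist only so the sketch runs; they are not the MaTTS implementation.

```python
from typing import Callable, List, Tuple


def parallel_scaling(
    query: str,
    memories: List[str],
    rollout: Callable[[str, List[str], int], str],
    judge: Callable[[str, str], bool],
    contrast: Callable[[str, List[Tuple[str, bool]]], List[str]],
    k: int = 5,
) -> List[str]:
    """Generate k trajectories for one query, then distill memories by self-contrast."""
    trajectories = [rollout(query, memories, seed) for seed in range(k)]
    labeled = [(t, judge(query, t)) for t in trajectories]
    return contrast(query, labeled)


def sequential_scaling(
    query: str,
    memories: List[str],
    rollout: Callable[[str, List[str], int], str],
    refine: Callable[[str, str], str],
    summarize: Callable[[str, List[str]], List[str]],
    k: int = 5,
) -> List[str]:
    """Refine a single trajectory over k passes, keeping every intermediate version."""
    history = [rollout(query, memories, 0)]
    for _ in range(k - 1):
        history.append(refine(query, history[-1]))
    return summarize(query, history)


# Toy stand-ins so the sketch runs without any model.
def toy_rollout(query: str, memories: List[str], seed: int) -> str:
    return f"attempt-{seed}"


def toy_judge(query: str, trajectory: str) -> bool:
    return int(trajectory.split("-")[1]) % 2 == 0


def toy_contrast(query: str, labeled: List[Tuple[str, bool]]) -> List[str]:
    wins = [t for t, ok in labeled if ok]
    losses = [t for t, ok in labeled if not ok]
    return [f"Prefer {wins}; avoid {losses}"]


def toy_refine(query: str, trajectory: str) -> str:
    return trajectory + "+"


def toy_summarize(query: str, history: List[str]) -> List[str]:
    return [f"Refined over {len(history)} passes: {history[-1]}"]


parallel_memories = parallel_scaling("q", [], toy_rollout, toy_judge, toy_contrast, k=5)
sequential_memories = sequential_scaling("q", [], toy_rollout, toy_refine, toy_summarize, k=3)
```

The structural difference is what gets compared: parallel scaling contrasts several independent attempts at the same query, while sequential scaling reads the learning signal from successive revisions of one attempt.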

MaTTS establishes a strong synergy: high-quality memory from ReasoningBank steers the scaled exploration towards more promising strategies, and in return, the scaled interactions generate significantly richer learning signals that feed back into an even smarter ReasoningBank to help the agent.

Comparison of memory-aware test-time scaling (MaTTS) with ReasoningBank.

Performance & emergent capabilities

We evaluated ReasoningBank across challenging benchmarks covering dynamic environments. Using the ReAct prompting strategy as the foundation for all agents, we compared ReasoningBank against three memory configurations: a memory-free baseline (Vanilla ReAct), Synapse (Trajectory Memory) and AWM (Workflow Memory). From our main evaluation results with Gemini-2.5-Flash on WebArena and SWE-Bench-Verified, we have the following key observations:

Superior success rates: ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.

Efficiency gains: Because the agent actively accesses past decision rationales, it executes commands with vastly reduced aimless exploration. On SWE-Bench-Verified, ReasoningBank saved almost 3 total execution steps per task over memory-free baselines.

MaTTS synergy: Adding MaTTS (parallel scaling with a scaling factor k=5) boosts success rates further. On WebArena, ReasoningBank with MaTTS improves on ReasoningBank alone, with a 3% higher success rate and 0.4 fewer steps.

Performance comparison (task success rates and average steps per task) of different agent memory strategies on WebArena and SWE-Bench-Verified.

Importantly, during evaluation, we observed the emergence of strategic maturity. In a web-browsing example, the agent's initial curated rules resembled simple procedural checklists (e.g., "Look for page links"). As the agent persisted through more problem sets, these memories were incorporated during execution. Building upon existing knowledge, the agent distilled new trajectories into more advanced memories. Over time, simple checklists evolved into memories with compositional, preventative logic structures (e.g., "Cross-reference tasks continuously with active page filters to ensure retrieved datasets aren't paginated prematurely"). See the paper for more details.

Conclusion

ReasoningBank provides a powerful framework for enabling LLMs to learn from experience and evolve into continuous learners at test time. We believe memory-driven experience scaling represents a crucial new frontier for agent scaling.

We are excited to share this with the broader research community.

Acknowledgements

This research was conducted by Siru Ouyang, Jun Yan, I-Hung Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long T. Le, Samira Daruki, Xiangru Tang, Vishy Tirumalashetty, George Lee, Mahsan Rofouei, Hangfei Lin, Jiawei Han, Chen-Yu Lee, and Tomas Pfister.

Labels:

Generative AI

Machine Intelligence

Natural Language Processing
