2026-06-18 · Original · 2 min read · Updated: 2026-06-18

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

SAGE is a post-hoc sanitization method that repairs the retention damage caused by LLM unlearning. By extracting dominant activation geometry from a retain proxy and solving a closed-form optimization, SAGE suppresses update components aligned with high-energy retained directions while preserving the forgetting carrier, alleviating the retain-forget trade-off. Experiments across multiple unlearning methods and model scales demonstrate consistent retention improvement.
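The summary mentions a closed-form optimization but does not state it. As a hedged illustration only (the notation and the exact objective are assumptions, not taken from the paper), here is one source-anchored, retain-aware objective of this kind. It applies to a single linear module with final unlearning update ΔW and retain-proxy inputs x_1, …, x_n:

```latex
C = \frac{1}{n}\sum_{j=1}^{n} x_j x_j^{\top} = U \Lambda U^{\top},
\qquad C_k = U_k \Lambda_k U_k^{\top} \quad (\text{top-}k \text{ eigenpairs } (\sigma_i, u_i))

\widehat{\Delta W} = \arg\min_{A}\; \lVert A - \Delta W \rVert_F^2 + \lambda \operatorname{tr}\!\left(A C_k A^{\top}\right)

2(A - \Delta W) + 2\lambda A C_k = 0
\;\Longrightarrow\; \widehat{\Delta W} = \Delta W (I + \lambda C_k)^{-1}
= \Delta W - \Delta W\, U_k \operatorname{diag}\!\left(\frac{\lambda \sigma_i}{1 + \lambda \sigma_i}\right) U_k^{\top}
```

The first term anchors the result to the source update. The second term equals (1/n) Σ_j ‖A U_k U_kᵀ x_j‖², the output shift on retain inputs inside the top-k subspace. Each component of ΔW along u_i is scaled by 1/(1 + λσ_i), so higher-energy retained directions are suppressed more. The component orthogonal to span(U_k) passes through unchanged, which is one possible reading of "preserving the forgetting carrier."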

Source: arXiv Machine Learning · Authors: Jingyuan Zhang, Yucheng Bai, Peixi Wen, Zhehao Huang, Zhengbao He, Hanling Tian, Xinwen Cheng, Haiyin Ran, Xiaolin Huang

[2606.18309] SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

[Submitted on 16 Jun 2026]


View a PDF of the paper titled SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector, by Jingyuan Zhang and 8 other authors


Abstract: Large Language Model (LLM) unlearning aims to remove undesirable knowledge or behaviors while preserving retained capabilities. Current unlearning methods all involve a trade-off between unlearning and retention. We have found that the retention activation bias can be used to quantify the damage an unlearning method inflicts on retention, independently of how the unlearning process is implemented. This makes it possible to restore retention performance for any unlearning method post hoc. We therefore propose a complementary post-hoc setting that sanitizes the final update vector without rerunning the original unlearning pipeline. In this setting, we design SAGE (Spectral Activation-GEometry Sanitization), a source-agnostic correction for final unlearning updates. SAGE collects real module inputs from a small retain proxy, extracts their dominant activation geometry, and solves a source-anchored optimization objective in closed form. The solution suppresses update components aligned with high-energy retained directions while preserving the source method's forgetting carrier. Across multiple unlearning methods, model scales, and benchmarks, SAGE consistently relieves the retain-forget trade-off, identifying post-hoc sanitization of final vectors as a practical and underexplored axis for machine unlearning.
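The abstract names the ingredients (retain-proxy module inputs, dominant activation geometry, a closed-form source-anchored objective) but not the formulas. The NumPy sketch below is a hypothetical reading for a single linear module, not the authors' implementation. It treats the top-k eigendirections of the retain inputs' uncentered second moment C as the "activation geometry" and returns the minimizer of ‖A − ΔW‖_F² + lam·tr(A C_k Aᵀ). That minimizer shrinks the update along each high-energy direction and leaves the rest intact. The sketch also measures the mean squared output shift on retain inputs as an assumed stand-in for the "retention activation bias". All function names (retain_geometry, sanitize_update, retain_shift), the parameters k and lam, and the synthetic data are illustrative assumptions.

```python
# Hypothetical sketch: not the SAGE reference implementation.
import numpy as np


def retain_geometry(X: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Top-k eigenpairs of the uncentered second moment C = X^T X / n of retain inputs.

    X has shape (n, d_in): one row per module input collected on the retain proxy.
    """
    C = X.T @ X / X.shape[0]
    evals, evecs = np.linalg.eigh(C)        # eigh returns eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]       # indices of the k largest ("high-energy") directions
    return evals[top], evecs[:, top]


def sanitize_update(delta_W: np.ndarray, X: np.ndarray, k: int, lam: float) -> np.ndarray:
    """Closed-form minimizer of ||A - delta_W||_F^2 + lam * tr(A C_k A^T).

    delta_W has shape (d_out, d_in). The component of delta_W along each top-k
    direction u_i is scaled by 1 / (1 + lam * sigma_i); the orthogonal part is untouched.
    """
    sigma, U_k = retain_geometry(X, k)
    shrink = lam * sigma / (1.0 + lam * sigma)          # fraction removed along each u_i
    return delta_W - ((delta_W @ U_k) * shrink) @ U_k.T


def retain_shift(delta_W: np.ndarray, X: np.ndarray) -> float:
    """Mean squared output change ||delta_W x_j||^2 that the update causes on retain inputs."""
    return float(np.mean(np.sum((X @ delta_W.T) ** 2, axis=1)))


# Synthetic usage example: retain inputs with k high-energy directions.
rng = np.random.default_rng(0)
d_in, d_out, n, k, lam = 64, 32, 512, 4, 10.0
basis, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))  # random orthonormal basis
scales = np.ones(d_in)
scales[:k] = 10.0                                           # first k directions carry most energy
X = (rng.standard_normal((n, d_in)) * scales) @ basis.T
delta_W = rng.standard_normal((d_out, d_in))                # stand-in for a final unlearning update

sanitized = sanitize_update(delta_W, X, k=k, lam=lam)
print("retain shift before:", retain_shift(delta_W, X))
print("retain shift after: ", retain_shift(sanitized, X))
```

In a real model, X would presumably be gathered from the inputs to each targeted linear layer while running the retain proxy, and ΔW would be the difference between unlearned and original weights. Under this reading, k and lam control how much of the update is traded away for retention repair.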

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.18309 [cs.LG]

(or arXiv:2606.18309v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2606.18309

arXiv-issued DOI via DataCite

Submission history

From: Jingyuan Zhang
[v1] Tue, 16 Jun 2026 08:29:43 UTC (1,350 KB)
