2026-06-17 | Updated: 2026-06-17

RepSelect: Robust LLM Unlearning via Representation Selectivity

RepSelect is a new LLM unlearning method that isolates forget-set-specific representations by collapsing the top principal components of weight gradients before each update. Across five popular baselines, it achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline.

Source: arXiv Computational Linguistics
Authors: Filip Sondej, Yushi Yang, Adam Mahdi

[Submitted on 15 Jun 2026]

Abstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is only shallow. We identify the root cause: existing methods target representations shared with both the retain set and the subspace recovered by a fine-tuning attacker, making unlearning both disruptive to general capabilities and easy to reverse. We propose RepSelect (Representation Selectivity), which isolates forget-set-specific representations by collapsing the top principal components of weight gradients before each update, leaving general capabilities intact while limiting what fine-tuning can recover. We evaluate across two forget categories, biohazardous knowledge and abusive tendencies, and four model families spanning dense and Mixture-of-Experts architectures (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite). Compared to five popular baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL), RepSelect achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline, and is near-perfectly robust to few-shot prompting attacks. Targeting selective representations is thus an important step towards deep and robust LLM forgetting.
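To make the phrase "collapsing top principal components of weight gradients" more concrete, here is a minimal sketch of one plausible reading. It is not taken from the paper. It assumes that the gradient of a single weight matrix is decomposed with an SVD and that the k largest singular directions are zeroed before the optimizer step. The function name collapse_top_components, the parameter k, and the toy matrices are all hypothetical, and the authors' actual procedure may differ, for example in how the components are estimated or in what "collapsing" means exactly.

```python
import numpy as np


def collapse_top_components(grad: np.ndarray, k: int) -> np.ndarray:
    """Return grad with its k largest singular directions removed.

    Illustrative assumption only: the abstract does not state how the
    principal components are computed or collapsed.
    """
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0  # singular values are sorted in descending order
    return (u * s) @ vt  # rebuild the matrix from the remaining components


# Toy gradient: one dominant direction (standing in for representations
# shared with the retain set) plus a small residual (standing in for the
# forget-set-specific part).
rng = np.random.default_rng(0)
shared = 10.0 * np.outer(rng.standard_normal(8), rng.standard_normal(6))
specific = 0.1 * rng.standard_normal((8, 6))
grad = shared + specific

filtered = collapse_top_components(grad, k=1)
print("gradient norm before:", np.linalg.norm(grad))
print("gradient norm after: ", np.linalg.norm(filtered))
```

In this toy setup the dominant direction carries almost all of the gradient norm, so removing it leaves an update driven by the small residual. Whether the dominant directions of real gradients correspond to shared representations is the paper's claim, not something this sketch demonstrates.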

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2606.17168 [cs.CL]

(or arXiv:2606.17168v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2606.17168

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yushi Yang
[v1] Mon, 15 Jun 2026 18:06:59 UTC (343 KB)
