2026-07-01 04:00 UTC | Updated: 2026-07-01 08:03 UTC

Revocable Learned State via Process Sidecars

This paper introduces process sidecars, a two-coefficient edit family for revoking learned memory state in language models after safety training. It proves that the edit is second-order accurate where naive task arithmetic leaves first-order error, and reports improved held-out refusal closure across three models.

Source: arXiv Machine Learning | Author: John Sweeney

Article intelligence

Engineers | Advanced

Key points

Language models often undergo sequential adaptation stages (skill, memory, safety) where later safety training can transform memory directions, making simple subtraction insufficient for revocation.
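Process sidecars use a two-coefficient edit: one coefficient scales the memory update itself, and the other scales a centered secant estimate of how the later AdamW safety-training process transported that update.
With the exact transported direction and both coefficients set to 1, the edit recovers the counterfactual safety-only oracle up to second order. Empirically, across three models, the validation-selected edit improves held-out refusal closure over naive task arithmetic in all trials and over the simpler process-JVP subfamily in all paired trials.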
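
A short first-order sketch may help show why simple subtraction falls short. It rests on a simplifying assumption that is not taken from the paper: safety training is treated as a smooth map $S$ of the starting parameters with Jacobian $J$, and optimizer moment state is ignored. Here $\theta_{\mathrm{A}}$ is the model before the memory phase, $\Delta_{\mathrm{M}}$ the memory update, $\theta_{\mathrm{AMS}}=S(\theta_{\mathrm{A}}+\Delta_{\mathrm{M}})$ the deployed model, and $\theta_{\mathrm{AS}}=S(\theta_{\mathrm{A}})$ the safety-only counterfactual.

```latex
\begin{align*}
\theta_{\mathrm{AMS}}
  &= \theta_{\mathrm{AS}} + J\Delta_{\mathrm{M}} + O(\|\Delta_{\mathrm{M}}\|^2), \\
(\theta_{\mathrm{AMS}} - \lambda\Delta_{\mathrm{M}}) - \theta_{\mathrm{AS}}
  &= (J - \lambda I)\,\Delta_{\mathrm{M}} + O(\|\Delta_{\mathrm{M}}\|^2), \\
\bigl(\theta_{\mathrm{AMS}} - \lambda\Delta_{\mathrm{M}} - \gamma\,(J\Delta_{\mathrm{M}} - \Delta_{\mathrm{M}})\bigr) - \theta_{\mathrm{AS}}
  &= \bigl((1-\gamma)\,J - (\lambda-\gamma)\,I\bigr)\Delta_{\mathrm{M}} + O(\|\Delta_{\mathrm{M}}\|^2).
\end{align*}
```

Under this assumption, the first-order term in the second line vanishes only when $J\Delta_{\mathrm{M}}$ happens to be parallel to $\Delta_{\mathrm{M}}$, so no single scalar $\lambda$ works once safety training bends the memory direction. In the third line the first-order term is zero at $(\lambda,\gamma)=(1,1)$ regardless of $J$.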

Why it matters

Once a safety phase has been trained on top of a memory update, subtracting that update no longer undoes it, because the safety optimizer has already transformed the memory direction. Revoking a private memory from such a model therefore needs information about the safety-training process itself, not just the original update.

Technical impact

May affect model selection, inference cost, product capability, and evaluation benchmarks.

This panel is AI-generated and reviewed for accuracy.

[Submitted on 29 Jun 2026]

Title: Revocable Learned State via Process Sidecars

Abstract: Language models are often adapted in stages: a public skill phase, a private memory phase, and a later safety phase that learns to refuse outputs tied to the remembered entities. Revoking the memory after the safety phase is not the same problem as subtracting the memory update: the later safety optimizer has transported the memory direction. We introduce process sidecars, a two-coefficient edit family $\hat{\theta}(\lambda,\gamma)=\theta_{\mathrm{AMS}}-\lambda\Delta_{\mathrm{M}}-\gamma\hat{R}_{\mathrm{S}\leftarrow\mathrm{M}}$, with $\hat{R}_{\mathrm{S}\leftarrow\mathrm{M}}=\hat{J}_{\mathrm{S},\varepsilon}(\Delta_{\mathrm{M}})-\Delta_{\mathrm{M}}$, where $\hat{J}_{\mathrm{S},\varepsilon}$ is a centered secant through the realized future AdamW safety-training process. The implementation uses $\varepsilon=1$ at the natural memory-edit scale; it reuses $\theta_{\mathrm{AMS}}$ as the positive endpoint and computes one additional safety trace at $\theta_{\mathrm{A}}-\Delta_{\mathrm{M}}$. We prove two things. First, the exact sidecar, using the true transported direction $R_{\mathrm{S}\leftarrow\mathrm{M}}$ rather than the secant estimate, at $(\lambda,\gamma)=(1,1)$ recovers the counterfactual safety-only oracle $\theta_{\mathrm{AS}}$ up to second order; the proof treats AdamW as an augmented-state map over parameters, first moments, and second moments. Second, this process information is necessary: whenever future safety training bends the memory direction, every scalar task-arithmetic edit leaves first-order counterfactual error, while the process-sidecar edit is second-order accurate. Across three models, the validation-selected 2D edit improves held-out refusal closure over naive task arithmetic in all trials, and over the $\gamma=\lambda$ process-JVP subfamily, the diagonal slice of the cached 2D grid, in all paired trials.
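
The edit family in the abstract can be sketched on a toy problem. The sketch below is illustrative only and makes several assumptions that are not from the paper: parameters are a two-element list, the safety phase is plain gradient descent on a diagonal quadratic (a hypothetical `safety_phase` stand-in, not AdamW), and all helper names are invented for this example. Because this toy safety map is affine, the centered secant is exact and the $(1,1)$ edit lands on the oracle; for a real nonlinear training process the abstract claims only second-order accuracy.

```python
# Toy sketch of the two-coefficient sidecar edit (assumptions labeled inline).

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def scale(c, a):
    return [c * x for x in a]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

def safety_phase(theta, curvatures, lr=0.1, steps=10):
    """Hypothetical stand-in for safety training: gradient descent on
    0.5 * sum(h_i * theta_i**2). NOT AdamW; chosen so the map is affine."""
    theta = list(theta)
    for _ in range(steps):
        theta = [t - lr * h * t for t, h in zip(theta, curvatures)]
    return theta

def sidecar_edit(theta_ams, delta_m, r_hat, lam, gamma):
    """theta_hat(lam, gamma) = theta_AMS - lam * Delta_M - gamma * R_hat."""
    return sub(sub(theta_ams, scale(lam, delta_m)), scale(gamma, r_hat))

# Made-up toy values: unequal curvatures make the safety phase bend Delta_M.
curvatures = [1.0, 4.0]
theta_a = [0.5, -0.3]   # model after the skill phase
delta_m = [0.2, 0.2]    # memory update

# Positive endpoint: the deployed model itself (reused, no extra training).
theta_ams = safety_phase(add(theta_a, delta_m), curvatures)
# One additional safety trace at theta_A - Delta_M (negative endpoint).
theta_neg = safety_phase(sub(theta_a, delta_m), curvatures)
# Safety-only oracle; used here only to measure error.
theta_as = safety_phase(theta_a, curvatures)

# Centered secant with eps = 1, then the transported-direction estimate.
j_hat = scale(0.5, sub(theta_ams, theta_neg))
r_hat = sub(j_hat, delta_m)

naive = sidecar_edit(theta_ams, delta_m, r_hat, lam=1.0, gamma=0.0)
sidecar = sidecar_edit(theta_ams, delta_m, r_hat, lam=1.0, gamma=1.0)

# Best possible scalar task-arithmetic edit, chosen with oracle knowledge.
residual = sub(theta_ams, theta_as)
best_lam = dot(residual, delta_m) / dot(delta_m, delta_m)
best_scalar = sidecar_edit(theta_ams, delta_m, r_hat, lam=best_lam, gamma=0.0)

naive_err = norm(sub(naive, theta_as))
best_scalar_err = norm(sub(best_scalar, theta_as))
sidecar_err = norm(sub(sidecar, theta_as))

print("naive subtraction error:", naive_err)
print("best scalar edit error: ", best_scalar_err)
print("sidecar (1, 1) error:   ", sidecar_err)
```

In this toy, even the oracle-tuned scalar coefficient leaves a visible residual, since the two coordinates are shrunk by different factors during the safety phase, while the sidecar edit needs only one extra safety run and no access to the oracle.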

Comments: 23 pages, 2 figures, 6 tables

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Cryptography and Security (cs.CR)

ACM classes: I.2.6; I.2.7

Cite as: arXiv:2606.30788 [cs.LG]

(or arXiv:2606.30788v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2606.30788

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: John Sweeney
[v1] Mon, 29 Jun 2026 18:18:36 UTC (347 KB)
