2026-05-28 04:00 UTC. Updated: 2026-06-30 13:03 UTC

Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift

This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning (CL). ADS decouples logit shift into an architecture dependency and a data dependency, and needs only a few data samples to capture shift trends. Experiments across more than 175 architectures show a strong monotonic correlation between ADS and logit shift (Spearman's r_s ≥ 0.731), and ADS serves as a lightweight proxy for expected calibration error, a metric used for reliable CL model selection, on three datasets across six scenarios.

Source: arXiv Machine Learning
Authors: Zhong Ye, Yu Hu, Ruilin Tang

Key points

Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
The proposed ADS decouples logit shift into architecture and data dependencies, is derived from three mechanistic components, and is computable with only a few data samples.
Experiments on over 175 architectures demonstrate strong correlation between ADS and logit shift, validating ADS as a lightweight model selection proxy.
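The correlation claim in the last key point can be checked on any set of candidate models with a rank correlation. The following is a minimal sketch of Spearman's r_s in plain Python, not the authors' evaluation code. The function names and the four toy score pairs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's r_s, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values for four candidate models (NOT results from the paper).
proxy_scores = [0.2, 0.5, 0.9, 1.4]     # stand-in for a cheap proxy such as ADS
measured_shift = [0.1, 0.4, 0.3, 0.8]   # stand-in for the expensive logit shift
r_s = spearman(proxy_scores, measured_shift)
```

A value of r_s close to 1 means the proxy orders the models almost the same way as the measured shift does, which is the property a selection proxy needs; the absolute scale of the proxy does not matter.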

Why it matters

Computing logit shift directly is too expensive for large-scale model selection. A proxy that needs only a few data samples makes it practical to compare many pre-trained candidates for continual learning.

Technical impact

May affect how pre-trained models are selected for continual learning and how much compute that selection requires.

[Submitted on 26 May 2026]

Abstract: Continual Learning (CL) is a practical paradigm for exploiting the power of deep pre-trained neural networks, but which pre-trained model balances "Plasticity-Stability" better and therefore deserves to be chosen? Logit shift serves as a natural proxy because it captures how much a model's logits on prior tasks change in CL scenarios. However, obtaining the logit shift incurs a huge computational cost, which hinders large-scale model selection. Existing theoretical analyses fail to offer an efficient alternative because they assume uniform hidden layer widths, which ignores the structural heterogeneity (variable width and depth) of real-world architectures. This raises a critical question: what theoretical relationship can be identified between a heterogeneous architecture and the logit shift on prior tasks (those the model has been trained on)? To answer this question, we decouple logit shift into an architecture dependency and a data dependency to establish our framework. The framework reveals that the combination of the two dependencies, defined as Architecture-driven Shift (ADS), captures the logit shift tendency well and is computable with only a few data samples. Specifically, for a model that is well optimized on prior tasks, a higher ADS is associated with a larger logit shift after training on the current task. This result is derived from three mechanistic components: (1) spectral norm scaling of weight matrix gradients with layer width, (2) the optimization path length of the new task, and (3) the asymptotic task conflict in wide networks. Extensive empirical results across more than 175 diverse architectures demonstrate a strong monotonic correlation (the weakest Spearman's $r_s=0.731$) between ADS and logit shift. Practically, we demonstrate that ADS can serve as a lightweight proxy for the expected calibration error, a widely used metric for reliable CL model selection, on three datasets across six scenarios.
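The two quantities that ADS is meant to stand in for can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: logit shift is assumed here to be the mean Euclidean distance between a model's logits on prior-task samples before and after training on the current task (the paper's exact definition may differ), and expected calibration error uses the common equal-width binning form, ECE = sum over bins of (n_b / N) * |acc_b - conf_b|. The function names are hypothetical.

```python
import math
from typing import Sequence


def mean_logit_shift(
    before: Sequence[Sequence[float]],
    after: Sequence[Sequence[float]],
) -> float:
    """Mean L2 distance between per-sample logit vectors (assumed definition)."""
    if len(before) != len(after) or len(before) == 0:
        raise ValueError("need the same non-zero number of samples in both sets")
    total = 0.0
    for old, new in zip(before, after):
        total += math.sqrt(sum((n - o) ** 2 for o, n in zip(old, new)))
    return total / len(before)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """ECE with equal-width confidence bins over [0, 1]."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need matching, non-empty confidences and labels")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Toy inputs for illustration only (not data from the paper).
shift = mean_logit_shift([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]])
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [True, True, True, False])
```

Measuring the shift this way requires actually training each candidate on the current task and re-evaluating it on prior-task samples, which is the cost the abstract describes as prohibitive for large-scale selection.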

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2605.27469 [cs.LG]

(or arXiv:2605.27469v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2605.27469

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Zhong Ye [view email] [v1] Tue, 26 May 2026 08:41:13 UTC (2,417 KB)
