Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift
This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning. ADS decouples logit shift into architecture and data dependencies, requiring only few data samples to capture shift trends. Experiments across over 175 architectures show strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error for reliable CL model selection across three datasets and six scenarios.
Article intelligence
Key points
- Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
- Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
- Proposed ADS decouples shift into architecture and data dependencies, based on three mechanistic components, computable with few data samples.
- Experiments on over 175 architectures demonstrate strong correlation between ADS and logit shift, validating ADS as a lightweight model selection proxy.
Why it matters
This matters because selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
Technical impact
May affect model selection, inference cost, product capability, and evaluation benchmarks.
[2605.27469] Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift
[Submitted on 26 May 2026]
Title:Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift
View a PDF of the paper titled Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift, by Zhong Ye and 1 other authors
View PDF HTML (experimental)
Abstract:Continual Learning (CL) is a practical paradigm to utilize power of deep pre-trained neural networks, but which pre-trained model has a better ability to balance ``Plasticity-Stability", deserving to be chosen? The logit shift serves as a natural proxy because it represents the logit shift in CL scenarios. However, obtaining the logit shift requires huge computational cost, which hinders large-scale model selection. Existing theoretical analyses fail to offer an efficient alternative because of the assumption of uniform hidden layer widths, which ignores the structural heterogeneity (variable width and depth) of real-world architectures. This raises a critical question: what theoretically relationship can be identified between heterogeneous architecture and logit shift on prior tasks (that the model has been trained on)? To answer the question, we decouple logit shift into architecture dependency and data dependency to establish our framework, which reveals that the combination of two dependency, defined as Architecture-driven Shift (ADS), that can capture the logit shift tendency well computable with few data samples. Specifically, for a well-optimized model on prior tasks, higher ADS is associated with a larger logit shift after training on the current task, which derived based on three mechanistic components: (1) spectral norm scaling of weight matrix gradients with layer width, (2) the optimization path length of the new task, and (3) the asymptotic task conflict in wide networks. Extensive empirical results across more than 175 diverse architectures demonstrate a strong monotonic correlation (the weakest Spearman's $r_s=0.731$) between ADS and logit shift. Practically, we demonstrate that ADS can serve as a lightweight proxy of the expected calibration error, which is a widely used metric for reliable CL model selection, on three datasets across six scenarios.
Subjects:
Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2605.27469 [cs.LG]
(or arXiv:2605.27469v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2605.27469
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Zhong Ye [view email] [v1] Tue, 26 May 2026 08:41:13 UTC (2,417 KB)
Full-text links:
Access Paper:
View a PDF of the paper titled Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift, by Zhong Ye and 1 other authors
View PDF
HTML (experimental)
TeX Source
view license
Current browse context:
cs.LG
new | recent | 2026-05
Change to browse by:
cs cs.AI
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Loading...
Data provided by:
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Code, Data and Media Associated with this Article
alphaXiv Toggle
alphaXiv (What is alphaXiv?)
Links to Code Toggle
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub Toggle
DagsHub (What is DagsHub?)
GotitPub Toggle
Gotit.pub (What is GotitPub?)
Huggingface Toggle
Hugging Face (What is Huggingface?)
ScienceCast Toggle
ScienceCast (What is ScienceCast?)
Demos
Demos
Replicate Toggle
Replicate (What is Replicate?)
Spaces Toggle
Hugging Face Spaces (What is Spaces?)
Spaces Toggle
TXYZ.AI (What is TXYZ.AI?)
Related Papers
Recommenders and Search Tools
Link to Influence Flower
Influence Flower (What are Influence Flowers?)
Core recommender toggle
CORE Recommender (What is CORE?)
IArxiv recommender toggle
IArxiv Recommender (What is IArxiv?)
Author
Venue
Institution
Topic
About arXivLabs
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)