Residual Modeling for High-Fidelity Learned Compression of Scientific Data
Lossy compression is crucial for massive spatiotemporal data from scientific simulations. Learned compressors achieve high compression ratios at moderate accuracy targets, but in high-fidelity regimes (block-level NRMSE from 10^-6 to 10^-4) the per-block residual correction stream dominates the bitrate. This paper proposes a residual-centric view and introduces two residual coders: LBRC, a deterministic, training-free adaptive quantization pipeline, and NGLR, which adds a causal neural predictor. On the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio over GAE by 30-60%, and NGLR adds a further 10-40% over LBRC, outperforming SZ in the evaluated high-fidelity regime.
[Submitted on 3 Jun 2026]
Authors: Liangji Zhu and 2 other authors
Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met. This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate.
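Why GAE-style correction becomes expensive can be seen in a small sketch. It assumes a range-normalized block NRMSE (RMSE divided by the block's value range, a common convention that the abstract does not spell out) and a random orthonormal basis as a hypothetical stand-in for the learned SVD/PCA basis; residual coefficients are kept in order of magnitude until the target is met. The helpers `block_nrmse` and `greedy_pca_correction` are illustrative names, not the authors' code, and coefficient quantization is omitted.

```python
import numpy as np


def block_nrmse(x, x_hat):
    """Range-normalized RMSE of one block (assumed NRMSE definition)."""
    value_range = float(x.max() - x.min())
    rmse = float(np.sqrt(np.mean((x - x_hat) ** 2)))
    return rmse / value_range if value_range > 0 else rmse


def greedy_pca_correction(x, x_hat, basis, target):
    """Add residual coefficients, largest magnitude first, until the
    corrected block meets the NRMSE target.

    x, x_hat : flattened original block and learned reconstruction
    basis    : (n, n) matrix with orthonormal columns (hypothetical stand-in
               for a learned SVD/PCA basis)
    Returns (number of retained coefficients, corrected block).
    """
    coeffs = basis.T @ (x - x_hat)          # project the residual onto the basis
    order = np.argsort(-np.abs(coeffs))     # biggest contributions first
    corrected = x_hat.copy()
    k = 0
    while k < len(order) and block_nrmse(x, corrected) > target:
        j = order[k]
        corrected = corrected + coeffs[j] * basis[:, j]
        k += 1
    return k, corrected


# Synthetic block: a smooth signal plus a small, unstructured learned residual.
rng = np.random.default_rng(0)
n = 64
x = np.sin(np.linspace(0.0, 4.0 * np.pi, n))
x_hat = x + 1e-3 * rng.standard_normal(n)
basis, _ = np.linalg.qr(rng.standard_normal((n, n)))

for target in (1e-4, 1e-5, 1e-6):
    k, corrected = greedy_pca_correction(x, x_hat, basis, target)
    print(f"target {target:.0e}: kept {k}/{n} coefficients, "
          f"NRMSE {block_nrmse(x, corrected):.2e}")
```

When the residual's energy is spread across many basis directions, as in this synthetic case, each tenfold tightening of the target forces more coefficients into the correction stream, which mirrors the growth in retained coefficients described above.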
We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should be coded with a representation designed for that residual. We introduce two residual coders. LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residual to the target NRMSE and losslessly encodes the resulting integer residual using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural predictor that outputs a normalized bias for an integer-rounded Lorenzo prediction in the same deterministic integer pipeline, reducing the entropy of the remaining residual code while preserving deterministic decoding. The predictor weights are serialized and counted in the bitstream.
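The LBRC stages named above can be sketched end to end on a single 3D block. The sketch assumes range-normalized NRMSE, a conservative fixed quantization step of 2 × target × value range (which bounds every pointwise error by target × range) rather than the paper's adaptive rule, a most-significant-first bit-plane layout, and zlib (DEFLATE) as a stand-in for the entropy coder; all function names are hypothetical. NGLR would change only the prediction step, where a causal neural predictor supplies a normalized bias for the integer-rounded Lorenzo prediction; that part is not modeled here.

```python
import zlib

import numpy as np


def quantize_residual(residual, value_range, target):
    """Uniform quantization; step = 2 * target * range keeps every pointwise
    error <= target * range, so block NRMSE <= target (assumed rule)."""
    step = 2.0 * target * value_range
    return np.rint(residual / step).astype(np.int64), step


def lorenzo_forward(q):
    """3D Lorenzo residual: first differences along all three axes,
    with zeros assumed outside the block."""
    d = q
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)
    return d


def lorenzo_inverse(d):
    """Exact integer inverse of lorenzo_forward: cumulative sums."""
    q = d
    for axis in range(3):
        q = np.cumsum(q, axis=axis)
    return q


def zigzag(d):
    """Signed -> non-negative: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return np.where(d >= 0, 2 * d, -2 * d - 1)


def unzigzag(z):
    return np.where(z % 2 == 0, z // 2, -(z + 1) // 2)


def encode(residual, value_range, target):
    q, step = quantize_residual(residual, value_range, target)
    z = zigzag(lorenzo_forward(q))
    nbits = max(int(z.max()).bit_length(), 1)
    # Pack one bit-plane at a time, most significant plane first.
    planes = b"".join(
        np.packbits(((z >> b) & 1).astype(np.uint8)).tobytes()
        for b in reversed(range(nbits))
    )
    # zlib stands in for the entropy coder.
    return zlib.compress(planes, 9), nbits, step, q.shape


def decode(payload, nbits, step, shape):
    n = int(np.prod(shape))
    plane_bytes = (n + 7) // 8
    raw = np.frombuffer(zlib.decompress(payload), dtype=np.uint8)
    z = np.zeros(n, dtype=np.int64)
    for i, b in enumerate(reversed(range(nbits))):
        bits = np.unpackbits(raw[i * plane_bytes:(i + 1) * plane_bytes])[:n]
        z |= bits.astype(np.int64) << b
    return lorenzo_inverse(unzigzag(z.reshape(shape))) * step


# Synthetic 16^3 block and a stand-in "learned" reconstruction.
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal((16, 16, 16)), axis=0)
x_hat = x + 1e-3 * rng.standard_normal(x.shape)
value_range = float(x.max() - x.min())
target = 1e-5

payload, nbits, step, shape = encode(x - x_hat, value_range, target)
x_rec = x_hat + decode(payload, nbits, step, shape)
nrmse = float(np.sqrt(np.mean((x - x_rec) ** 2))) / value_range
print(f"{len(payload)} bytes for {x.size} values, NRMSE {nrmse:.2e}")
```

Every stage after quantization is integer arithmetic with an exact inverse (cumulative sums undo the differencing, and zigzag is a bijection), so the decoder reproduces the quantized residual bit for bit; this is the deterministic-decoding property the pipeline relies on.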
Across E3SM, JHTDB, and ERA5 at block-level NRMSE targets from 10^-6 to 10^-4, LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ. NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime. These results show that residual representations tailored to learned-compressor residuals can preserve the advantage of learned compression when global residual correction becomes rate-dominant.
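The reported gains are ratios of compression ratios. One way to read them is to write the compression ratio so that it counts every stream, including the serialized NGLR predictor weights charged to the bitstream (B_weights is zero for LBRC), and to treat the two relative gains as factors that apply to the same configuration. That second step is an assumption, since the ranges are reported separately. The symbols below are our own notation.

```latex
\mathrm{CR} = \frac{B_{\mathrm{orig}}}{B_{\mathrm{latent}} + B_{\mathrm{residual}} + B_{\mathrm{weights}}}

\frac{\mathrm{CR}_{\mathrm{NGLR}}}{\mathrm{CR}_{\mathrm{GAE}}}
  = \frac{\mathrm{CR}_{\mathrm{NGLR}}}{\mathrm{CR}_{\mathrm{LBRC}}}
    \cdot \frac{\mathrm{CR}_{\mathrm{LBRC}}}{\mathrm{CR}_{\mathrm{GAE}}}
  \in [1.10 \times 1.30,\; 1.40 \times 1.60] = [1.43,\; 2.24]
```

Under that assumption, NGLR would sit roughly 43-124% above GAE. This figure comes only from combining the two stated ranges and is not a separately reported result.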
Comments: 9 pages, 3 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2606.05389 [cs.AI]
(or arXiv:2606.05389v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2606.05389
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Liangji Zhu
[v1] Wed, 3 Jun 2026 19:49:23 UTC (1,250 KB)