
Residual Modeling for High-Fidelity Learned Compression of Scientific Data

Lossy compression is crucial for the massive spatiotemporal data produced by scientific simulations. Learned compressors achieve high compression ratios at moderate accuracy, but in the high-fidelity regime (block-level NRMSE from 10^-6 to 10^-4) the residual correction stream dominates the bitrate. This paper takes a residual-centric view and introduces two residual coders: LBRC, a deterministic, training-free pipeline built on adaptive quantization, and NGLR, which adds a causal neural predictor to that pipeline. On the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio over Guaranteed Autoencoder (GAE) methods by 30-60%, and NGLR adds a further 10-40% over LBRC, outperforming SZ in this regime.

Source: arXiv AI

Authors: Liangji Zhu, Sanjay Ranka, Anand Rangarajan


[Submitted on 3 Jun 2026]


Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met. This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate.
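The gap between an aggregate loss and a per-block guarantee can be illustrated with a small sketch. It is not taken from the paper: the synthetic field, the 8x8x8 block size, the helper names, and the choice to normalize RMSE by the value range are all assumptions made for illustration.

```python
import numpy as np

def split_blocks(field, b):
    """Reshape a 3D array (sides divisible by b) into (n_blocks, b, b, b)."""
    nx, ny, nz = (s // b for s in field.shape)
    blocks = field.reshape(nx, b, ny, b, nz, b).transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, b, b, b)

def nrmse(original, reconstructed, axis=None):
    """RMSE divided by the value range of `original` (assumed normalization)."""
    rmse = np.sqrt(np.mean((original - reconstructed) ** 2, axis=axis))
    value_range = original.max(axis=axis) - original.min(axis=axis)
    return rmse / value_range

rng = np.random.default_rng(0)
field = rng.random((16, 16, 16))   # synthetic stand-in for a simulation field
recon = field.copy()
recon[:8, :8, :8] += 2e-4          # all reconstruction error lands in one block

target = 1e-4
global_nrmse = nrmse(field, recon)
per_block = nrmse(split_blocks(field, 8), split_blocks(recon, 8), axis=(1, 2, 3))

print(f"global NRMSE: {global_nrmse:.2e}")
print(f"worst block:  {per_block.max():.2e}")
print(f"blocks over target: {(per_block > target).sum()} of {per_block.size}")
```

In this toy case the global NRMSE stays under the target while one block exceeds it, which is the situation a per-block residual correction is meant to rule out.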

We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should be coded with a representation designed for that residual. We introduce two residual coders. LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residual to the target NRMSE and losslessly encodes the resulting integer residual using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural predictor that outputs a normalized bias for an integer-rounded Lorenzo prediction in the same deterministic integer pipeline, reducing the entropy of the remaining residual code while preserving deterministic decoding. The predictor weights are serialized and counted in the bitstream.
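The integer front end of an LBRC-style coder can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the quantization step is a fixed conservative choice (step = 2 x target x range, so every point error is at most target x range) rather than the adaptive search the abstract mentions, and the bit-plane and entropy coding stages are omitted.

```python
import numpy as np

def quantize(residual, value_range, target_nrmse):
    """Uniform quantization with |error| <= target_nrmse * value_range per point."""
    step = 2.0 * target_nrmse * value_range
    q = np.rint(residual / step).astype(np.int64)
    return q, step

def lorenzo3d_forward(q):
    """Subtract the 3D Lorenzo prediction, treating values outside the block as 0.
    Equivalent to a first difference applied along each of the three axes."""
    d = q
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)
    return d

def lorenzo3d_inverse(d):
    """Undo lorenzo3d_forward with a cumulative sum along each axis."""
    q = d
    for axis in range(3):
        q = np.cumsum(q, axis=axis)
    return q

def zigzag_encode(n):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return np.where(n >= 0, 2 * n, -2 * n - 1)

def zigzag_decode(u):
    return np.where(u % 2 == 0, u // 2, -(u + 1) // 2)

# Synthetic smooth residual plus noise (assumed data, for illustration only).
rng = np.random.default_rng(0)
axis_pts = np.linspace(0, np.pi, 8)
x, y, z = np.meshgrid(axis_pts, axis_pts, axis_pts, indexing="ij")
residual = 1e-3 * np.sin(x) * np.cos(y + z) + 1e-5 * rng.standard_normal((8, 8, 8))
value_range = 1.0   # assumed value range of the original block
target = 1e-5

q, step = quantize(residual, value_range, target)
codes = zigzag_encode(lorenzo3d_forward(q))   # input to bit-plane + entropy coding
q_back = lorenzo3d_inverse(zigzag_decode(codes))
decoded = q_back * step
block_nrmse = np.sqrt(np.mean((residual - decoded) ** 2)) / value_range
print(f"lossless integer round trip: {np.array_equal(q_back, q)}")
print(f"block NRMSE after decoding:  {block_nrmse:.2e}")
```

Because every stage after quantization works on integers, the decoder recovers the quantized residual exactly, so the only loss is the bounded quantization error.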

Across E3SM, JHTDB, and ERA5 at block-level NRMSE targets from 10^-6 to 10^-4, LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ. NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime. These results show that residual representations tailored to learned-compressor residuals can preserve the advantage of learned compression when global residual correction becomes rate-dominant.
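The abstract does not state NGLR's gain over GAE directly. As a rough back-of-the-envelope sketch, assuming the percentages are multiplicative gains in compression ratio and that the range endpoints can be combined (they may not occur on the same dataset or NRMSE target), the two reported ranges compound as follows:

```python
# Assumed reading: "30-60%" means 1.3x to 1.6x the GAE compression ratio,
# and "a further 10-40%" means 1.1x to 1.4x the LBRC compression ratio.
lbrc_over_gae = (1.30, 1.60)
nglr_over_lbrc = (1.10, 1.40)

# Compound the low ends together and the high ends together.
nglr_over_gae = tuple(a * b for a, b in zip(lbrc_over_gae, nglr_over_lbrc))
print(f"NGLR vs GAE: {nglr_over_gae[0]:.2f}x to {nglr_over_gae[1]:.2f}x")
```

These figures are loose bounds implied by the abstract's ranges, not results reported by the authors.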

Comments: 9 pages, 3 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.05389 [cs.AI] (or arXiv:2606.05389v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.05389

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Liangji Zhu

[v1] Wed, 3 Jun 2026 19:49:23 UTC (1,250 KB)
