2026-06-06 04:00 UTC | 2 min read | Updated: 2026-06-30 13:03 UTC

Residual Modeling for High-Fidelity Learned Compression of Scientific Data

Lossy compression is essential for the massive spatiotemporal data produced by scientific simulations. Learned compressors achieve high compression ratios at moderate accuracy targets, but in the high-fidelity regime (block-level NRMSE from 10^-6 to 10^-4) the residual correction stream dominates the bitrate. This paper takes a residual-centric view and introduces two residual coders: LBRC, a deterministic, training-free pipeline that adaptively quantizes the residual and encodes it losslessly, and NGLR, which adds a causal neural predictor to the same pipeline. On the E3SM, JHTDB, and ERA5 datasets, LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ; NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime.

Source: arXiv AI | Authors: Liangji Zhu, Sanjay Ranka, Anand Rangarajan

Article intelligence

Engineers | Advanced

Key points

Existing learned compressors suffer from high bitrate in high-fidelity regimes due to residual correction overhead
LBRC: deterministic, training-free pipeline that adaptively quantizes the learned residual to the target NRMSE and losslessly encodes the resulting integer residual using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding
NGLR: adds a causal neural predictor that biases the integer-rounded Lorenzo prediction, reducing the entropy of the remaining residual code while preserving deterministic decoding
Experiments on E3SM, JHTDB, and ERA5 show that LBRC improves compression ratio over GAE by 30-60% and that NGLR adds a further 10-40% over LBRC, outperforming SZ in the evaluated high-fidelity regime
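The lossless stages named in the LBRC key point can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the block is assumed to be zero-padded at its low boundaries, and the final entropy coder is omitted. It shows that Lorenzo differencing, zigzag mapping, and bit-plane splitting are each exactly invertible on integers.

```python
import numpy as np

def lorenzo3d_forward(q):
    """3D Lorenzo residual: a first-order difference along each axis in turn.

    With zero padding this equals q minus the 7-neighbour Lorenzo prediction.
    """
    d = q.astype(np.int64)
    for axis in range(3):
        d = np.diff(d, axis=axis, prepend=0)
    return d

def lorenzo3d_inverse(d):
    """Undo the differencing with a cumulative sum along each axis."""
    q = d.astype(np.int64)
    for axis in range(3):
        q = np.cumsum(q, axis=axis)
    return q

def zigzag_encode(d):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    d = d.astype(np.int64)
    return (d << 1) ^ (d >> 63)

def zigzag_decode(z):
    z = z.astype(np.int64)
    return (z >> 1) ^ -(z & 1)

def to_bitplanes(z):
    """Split non-negative integers into binary planes, least significant first."""
    nbits = max(1, int(z.max()).bit_length())
    return np.stack([(z >> b) & 1 for b in range(nbits)]).astype(np.uint8)

def from_bitplanes(planes):
    z = np.zeros(planes.shape[1:], dtype=np.int64)
    for b, plane in enumerate(planes):
        z |= plane.astype(np.int64) << b
    return z

# Round trip on a small hypothetical block of quantized residuals.
rng = np.random.default_rng(0)
q = rng.integers(-50, 51, size=(4, 5, 6))
planes = to_bitplanes(zigzag_encode(lorenzo3d_forward(q)))
restored = lorenzo3d_inverse(zigzag_decode(from_bitplanes(planes)))
assert np.array_equal(restored, q)
```

In a full coder the bit-planes would then be passed to an entropy coder; higher planes of a well-predicted residual are mostly zero, which is what makes that last stage effective.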

Why it matters

In the high-fidelity regime the residual correction stream dominates the total bitrate, which erodes the compression-ratio advantage of learned compressors. A residual coder designed for that stream can preserve the advantage.

Technical impact

May affect model selection, inference cost, product capability, and evaluation benchmarks.

This panel is AI-generated and reviewed for accuracy.

[2606.05389] Residual Modeling for High-Fidelity Learned Compression of Scientific Data

[Submitted on 3 Jun 2026]


Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met. This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate.
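To make the rate problem concrete, the following is a minimal sketch of a GAE-style correction loop. It rests on several assumptions not stated in the abstract: NRMSE is taken as RMSE divided by a single global value range, the basis comes from an SVD of the residual blocks themselves, and coefficients are kept in order of decreasing magnitude. The function name `gae_style_correction` is hypothetical.

```python
import numpy as np

def gae_style_correction(residual_blocks, value_range, target_nrmse):
    """Keep the fewest basis coefficients per block that meet the NRMSE target.

    residual_blocks: (n_blocks, block_size) array of flattened x - x_hat.
    Returns the approximated residual and the coefficient count per block.
    """
    # PCA-style orthonormal basis (rows of vt) from the SVD of the residuals.
    _, _, vt = np.linalg.svd(residual_blocks, full_matrices=False)
    coeffs = residual_blocks @ vt.T
    n_blocks, block_size = residual_blocks.shape
    # Squared-error budget per block implied by the NRMSE target.
    budget = (target_nrmse * value_range) ** 2 * block_size
    kept = np.zeros_like(coeffs)
    counts = np.zeros(n_blocks, dtype=int)
    for b in range(n_blocks):
        order = np.argsort(-np.abs(coeffs[b]))  # largest magnitude first
        energy = coeffs[b, order] ** 2
        # tail[k] is the squared error left after keeping the k largest coefficients.
        tail = np.concatenate([np.cumsum(energy[::-1])[::-1], [0.0]])
        k = int(np.argmax(tail <= budget))
        kept[b, order[:k]] = coeffs[b, order[:k]]
        counts[b] = k
    return kept @ vt, counts

# Hypothetical residuals: 8 blocks of 16 values each.
rng = np.random.default_rng(0)
blocks = rng.standard_normal((8, 16))
_, loose = gae_style_correction(blocks, value_range=1.0, target_nrmse=1e-1)
_, tight = gae_style_correction(blocks, value_range=1.0, target_nrmse=1e-3)
print(loose.sum(), tight.sum())  # the tighter target retains at least as many coefficients
```

Each retained coefficient costs bits, so as the target tightens and the counts approach the full basis size, the correction stream grows toward the cost of storing the residual outright.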

We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should be coded with a representation designed for that residual. We introduce two residual coders. LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residual to the target NRMSE and losslessly encodes the resulting integer residual using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural predictor that outputs a normalized bias for an integer-rounded Lorenzo prediction in the same deterministic integer pipeline, reducing the entropy of the remaining residual code while preserving deterministic decoding. The predictor weights are serialized and counted in the bitstream.
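The two ideas in this paragraph, quantizing to an NRMSE target and adding a learned bias to an integer Lorenzo prediction, can be sketched as follows. Everything specific here is an assumption for illustration: the step-size search, the range-normalized NRMSE, the one-layer tanh function standing in for the neural predictor, and the hypothetical parameters `w` and `scale`. The point is only that a bias computed from already-decoded neighbours and rounded to an integer keeps decoding exact.

```python
import numpy as np

def quantize_to_target(residual, value_range, target_nrmse):
    """Find a uniform step whose measured NRMSE meets the target (assumed search)."""
    limit = target_nrmse * value_range
    step = np.sqrt(12.0) * limit  # optimistic start: uniform-noise RMSE is step / sqrt(12)
    while True:
        q = np.rint(residual / step).astype(np.int64)
        if np.sqrt(np.mean((residual - q * step) ** 2)) <= limit:
            return q, step
        step *= 0.9  # shrink; once step/2 < limit the bound holds for every sample

def lorenzo_pred(p, i, j, k):
    """Integer 3D Lorenzo prediction; p is zero-padded by one voxel on the low sides."""
    return (p[i-1, j, k] + p[i, j-1, k] + p[i, j, k-1]
            - p[i-1, j-1, k] - p[i-1, j, k-1] - p[i, j-1, k-1]
            + p[i-1, j-1, k-1])

def bias(p, i, j, k, w, scale):
    """Stand-in for the neural predictor: normalized output in (-1, 1), then rounded."""
    ctx = np.array([p[i-1, j, k], p[i, j-1, k], p[i, j, k-1]], dtype=np.float64) / scale
    return int(np.rint(scale * np.tanh(ctx @ w)))

def encode(q, w, scale):
    p = np.zeros(tuple(s + 1 for s in q.shape), dtype=np.int64)
    p[1:, 1:, 1:] = q
    e = np.empty_like(q)
    for i, j, k in np.ndindex(q.shape):  # raster order, so only causal neighbours are read
        a, b, c = i + 1, j + 1, k + 1
        e[i, j, k] = q[i, j, k] - (lorenzo_pred(p, a, b, c) + bias(p, a, b, c, w, scale))
    return e

def decode(e, w, scale):
    p = np.zeros(tuple(s + 1 for s in e.shape), dtype=np.int64)
    for i, j, k in np.ndindex(e.shape):
        a, b, c = i + 1, j + 1, k + 1
        p[a, b, c] = e[i, j, k] + lorenzo_pred(p, a, b, c) + bias(p, a, b, c, w, scale)
    return p[1:, 1:, 1:]

# Hypothetical residual block and predictor parameters.
rng = np.random.default_rng(0)
residual = 1e-3 * rng.standard_normal((4, 4, 4))
q, step = quantize_to_target(residual, value_range=1.0, target_nrmse=1e-4)
w, scale = np.array([0.1, -0.2, 0.05]), 4.0
assert np.array_equal(decode(encode(q, w, scale), w, scale), q)
```

The decoder recomputes the same bias from values it has already reconstructed, so the predictor changes only the statistics of the coded integers, never the reconstruction. A real implementation would also need the predictor to evaluate bit-identically on the encoder and decoder, which floating-point `tanh` across different platforms does not guarantee.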

Across E3SM, JHTDB, and ERA5 at block-level NRMSE targets from 10^-6 to 10^-4, LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ. NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime. These results show that residual representations tailored to learned-compressor residuals can preserve the advantage of learned compression when global residual correction becomes rate-dominant.

Comments: 9 pages, 3 figures, 3 tables

Subjects: Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.05389 [cs.AI]

(or arXiv:2606.05389v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.05389

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Liangji Zhu [view email] [v1] Wed, 3 Jun 2026 19:49:23 UTC (1,250 KB)
