2026-06-02原文2 min readUpdated: 2026-06-02

Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

arXiv:2606.00092v1 Announce Type: new Abstract: Weakly-supervised classification of whole-slide images with attention-based multiple instance learning (ABMIL) on top of foundation features now reaches near-saturation on Camelyon16 slide-level performance, but the corresponding attention maps are an imperfect localization signal: in clinical interpretation, a model that classifies correctly without firing on the actual lesion is hard to trust. We address this gap with cellular sheaves, which equip each vertex and edge of a graph with a finite-dimensional vector space and consistent linear maps between them, providing a principled way to detect local disagreement on graph-structured data. We apply cellular sheaves to weakly-supervised tumour localization on whole-slide images, combining a sheaf disagreement field with ABMIL. The natural training objective, encouraging consistency between similar features, produces a disagreement field that tracks tissue-level texture rather than diagnostic content. We propose attention-conditional consistency, which uses the classifier's attention to define which neighbouring patches should agree. Joint training of the classifier and the sheaf under this objective produces a disagreement field with patch-level AUC 0.940 on Camelyon16 and raises the attention head from its ABMIL-alone level of 0.717 to 0.953. Two-stage ablation with the classifier frozen at its ABMIL values reaches only 0.727 on the disagreement field and leaves attention at 0.717, confirming that the gain comes from the projector co-adapting under both objectives, not from the loss change in isolation. The trained model transfers without retraining to annotated slides from Camelyon17, maintaining Delta AUC 0.932 +/- 0.083 and attention AUC 0.955 +/- 0.099. The result is an attention map and a sheaf-disagreement map that fire on the same diagnostic regions, giving clinicians two complementary explanations for each slide-level prediction.

SourcearXiv Computer VisionAuthor: Devansh Lalwani, Swapnil Bhat, Maulik Shah

[2606.00092] Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

[Submitted on 24 May 2026]

Title:Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization

View a PDF of the paper titled Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization, by Devansh Lalwani and 2 other authors

View PDF HTML (experimental)

Abstract:Weakly-supervised classification of whole-slide images with attention-based multiple instance learning (ABMIL) on top of foundation features now reaches near-saturation on Camelyon16 slide-level performance, but the corresponding attention maps are an imperfect localization signal: in clinical interpretation, a model that classifies correctly without firing on the actual lesion is hard to trust. We address this gap with cellular sheaves, which equip each vertex and edge of a graph with a finite-dimensional vector space and consistent linear maps between them, providing a principled way to detect local disagreement on graph-structured data. We apply cellular sheaves to weakly-supervised tumour localization on whole-slide images, combining a sheaf disagreement field with ABMIL. The natural training objective, encouraging consistency between similar features, produces a disagreement field that tracks tissue-level texture rather than diagnostic content. We propose attention-conditional consistency, which uses the classifier's attention to define which neighbouring patches should agree. Joint training of the classifier and the sheaf under this objective produces a disagreement field with patch-level AUC 0.940 on Camelyon16 and raises the attention head from its ABMIL-alone level of 0.717 to 0.953. Two-stage ablation with the classifier frozen at its ABMIL values reaches only 0.727 on the disagreement field and leaves attention at 0.717, confirming that the gain comes from the projector co-adapting under both objectives, not from the loss change in isolation. The trained model transfers without retraining to annotated slides from Camelyon17, maintaining Delta AUC 0.932 +/- 0.083 and attention AUC 0.955 +/- 0.099. The result is an attention map and a sheaf-disagreement map that fire on the same diagnostic regions, giving clinicians two complementary explanations for each slide-level prediction.

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.00092 [cs.CV]

(or arXiv:2606.00092v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2606.00092

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Swapnil Bhat [view email] [v1] Sun, 24 May 2026 18:37:12 UTC (3,124 KB)

Full-text links:

Access Paper:

View a PDF of the paper titled Aligning Cellular Sheaves with Classifier Attention for Interpretable Weakly-Supervised Pathology Localization, by Devansh Lalwani and 2 other authors

View PDF

HTML (experimental)

TeX Source

view license

Current browse context:

cs.CV

new | recent | 2026-06

Change to browse by:

cs cs.AI

References & Citations

NASA ADS

Google Scholar

Semantic Scholar

Data provided by:

Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle

Bibliographic Explorer (What is the Explorer?)

Connected Papers Toggle

Connected Papers (What is Connected Papers?)

Litmaps Toggle

Litmaps (What is Litmaps?)

scite.ai Toggle

scite Smart Citations (What are Smart Citations?)

Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle

alphaXiv (What is alphaXiv?)

Links to Code Toggle

CatalyzeX Code Finder for Papers (What is CatalyzeX?)

DagsHub Toggle

DagsHub (What is DagsHub?)

GotitPub Toggle

Gotit.pub (What is GotitPub?)

Huggingface Toggle

Hugging Face (What is Huggingface?)

ScienceCast Toggle

ScienceCast (What is ScienceCast?)

Demos

Replicate Toggle

Replicate (What is Replicate?)

Spaces Toggle

Hugging Face Spaces (What is Spaces?)

Spaces Toggle

TXYZ.AI (What is TXYZ.AI?)