2026-06-02 04:00 UTC | Updated: 2026-06-30 13:03 UTC

Improved Belief-Attention in Vision Task

This paper proposes Belief2-Attention, an extension of Belief-Attention that utilizes both the perpendicular and projected components from orthogonal projection. The projected component is processed via an activation function and linear mapping, functioning as a two-layer FFN. Additionally, an extra inner-product matrix ZZ^T is added to QK^T to capture richer token correlations. Experiments on image classification and segmentation demonstrate improved performance.
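The orthogonal decomposition mentioned above can be written out explicitly. This is a sketch under one reading of the summary, not the paper's own notation: $a_{ij}$ (softmax attention weights), $o_i$ (the weighted sum for token $i$), and the superscripts $\parallel$ and $\perp$ are symbols introduced here, and the projection is assumed to be taken onto each token's own value vector $v_i$.

```latex
o_i = \sum_j a_{ij}\, v_j, \qquad
o_i^{\parallel} = \frac{\langle o_i, v_i \rangle}{\langle v_i, v_i \rangle}\, v_i, \qquad
o_i^{\perp} = o_i - o_i^{\parallel}, \qquad
\langle o_i^{\perp}, v_i \rangle = 0 .
```

Under this reading, Belief-Attention keeps only $o_i^{\perp}$ as the residual signal, while Belief2-Attention additionally routes $o_i^{\parallel}$ through an activation function and a linear mapping.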

Source: arXiv Computer Vision | Author: Guoqiang Zhang

[Submitted on 22 May 2026]

Abstract: Belief-Attention \cite{Guoqiang25BeliefAttention} was recently proposed to improve Transformer performance. It first performs an orthogonal projection of the softmax-weighted sum of the $V$ vectors with respect to the original $V$ vectors, then takes the perpendicular component as the residual signal. In this paper, we first conduct an ablation study showing that the projected component also carries information about token correlation and should not be ignored. We then extend Belief-Attention to use both the perpendicular and projected components. In particular, the projected component passes through an activation function and then a linear mapping before merging with the considered token. Conceptually, the neural block for the projected component can be viewed as a two-layer feedforward network (FFN) within the new attention block. Standard attention captures token correlation via the inner-product matrix $QK^T$; we propose adding a second inner-product matrix, $ZZ^T$, to $QK^T$ to capture richer token correlation. We refer to the new module as Belief2-Attention. It can be easily shown that Belief2-Attention is more expressive than standard attention. We then verify the effectiveness of Belief2-Attention on the vision tasks of image classification and segmentation.
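The data flow described in the abstract can be illustrated with a minimal single-head NumPy sketch. It is not the author's implementation, and several details are assumptions not stated in the abstract: $Z$ is taken to be a learned linear map of the input (`w_z`), the logits are scaled by $1/\sqrt{d}$, the projection is taken per token onto its own value vector, the activation is ReLU, and the two components are merged with the token by simple addition. The function and parameter names (`belief2_attention_sketch`, `split_components`, `w_p`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_components(o, v, eps=1e-8):
    # Split each row of o into the part parallel to the matching row of v
    # (projected component) and the remainder (perpendicular component).
    coeff = (o * v).sum(-1, keepdims=True) / ((v * v).sum(-1, keepdims=True) + eps)
    proj = coeff * v
    perp = o - proj
    return proj, perp

def belief2_attention_sketch(x, w_q, w_k, w_v, w_z, w_p):
    # x: (n_tokens, d). All weight matrices are (d, d) in this sketch.
    q, k, v, z = x @ w_q, x @ w_k, x @ w_v, x @ w_z
    d = q.shape[-1]
    # Assumption: ZZ^T is added to QK^T before scaling and softmax.
    logits = (q @ k.T + z @ z.T) / np.sqrt(d)
    attn = softmax(logits)
    o = attn @ v                              # softmax-weighted sum of V vectors
    proj, perp = split_components(o, v)
    # Assumption: ReLU activation followed by a linear map on the projected part.
    branch = np.maximum(proj, 0.0) @ w_p
    # Assumption: both components merge with the token additively.
    return x + perp + branch, (attn, proj, perp)

# Small demo with random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
weights = [rng.normal(size=(8, 8)) / np.sqrt(8) for _ in range(5)]
out, _ = belief2_attention_sketch(x, *weights)
print(out.shape)  # (4, 8)
```

In this sketch, setting `w_z` to zero reduces the logits to the usual scaled $QK^T$, which shows that the extra $ZZ^T$ term only adds capacity to the correlation matrix; it does not by itself reproduce the paper's expressiveness argument.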

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.00077 [cs.CV]

(or arXiv:2606.00077v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2606.00077

arXiv-issued DOI via DataCite

Submission history

From: Guoqiang Zhang
[v1] Fri, 22 May 2026 09:12:43 UTC (123 KB)
