

Poster

CR2PQ: Continuous Relative Rotary Positional Query for Dense Visual Representation Learning

Shaofeng Zhang · Qiang Zhou · Sitong Wu · Haoru Tan · Zhibin Wang · Jinfa Huang · Junchi Yan

Hall 3 + Hall 2B #560
Fri 25 Apr midnight PDT — 2:30 a.m. PDT

Abstract: Dense visual representation learning (DRL) shows promise for learning localized information in dense prediction tasks, but struggles to establish pixel/patch correspondence across different views (cross-contrasting). Existing methods primarily rely on self-contrasting the same view with variations, which limits input variance and hinders downstream performance. This paper examines the mechanisms of self-contrasting and cross-contrasting and identifies the crux of the issue: transforming discrete positional embeddings into continuous representations. To address the correspondence problem, we propose a Continuous Relative Rotary Positional Query (CR2PQ) that enables patch-level representation learning. Extensive experiments on standard datasets demonstrate state-of-the-art (SOTA) results. Compared to the previous SOTA method (PQCL), our approach achieves significant improvements on COCO: with 300 epochs of pretraining, CR2PQ obtains 3.4% mAP^bb and 2.1% mAP^mk improvements on detection and segmentation, respectively. Furthermore, CR2PQ converges faster, achieving 10.4% mAP^bb and 7.9% mAP^mk improvements over the SOTA with only 40 epochs of pretraining.
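To make the core idea concrete, below is a minimal sketch of rotary position encoding evaluated at continuous (fractional) coordinates, which is the general mechanism the abstract alludes to: discrete patch indices of each crop are mapped to continuous positions in the shared original image, so relative offsets between patches of two views remain well defined. This is an illustrative reconstruction under stated assumptions, not the authors' released implementation; the helper names `rope` and `continuous_coords` are hypothetical, and the 1-D setting is a simplification of the 2-D patch grid.

```python
# Sketch: continuous rotary position encoding for cross-view patch queries.
# Hypothetical helpers; 1-D positions for brevity (real patches are 2-D).
import numpy as np

def rope(x: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Rotate feature pairs of x (n, d) by angles pos * theta_i.

    pos may be fractional: the rotation is a continuous function of the
    coordinate, so positions need not lie on a discrete patch grid.
    """
    n, d = x.shape
    theta = 10000.0 ** (-np.arange(0, d, 2) / d)   # (d/2,) RoPE frequencies
    ang = pos[:, None] * theta[None, :]            # (n, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]                # even/odd feature pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def continuous_coords(crop_left: float, crop_width: float,
                      n_patches: int) -> np.ndarray:
    """Map a crop's discrete patch indices to continuous coordinates in the
    original image, so corresponding patches of two crops share positions."""
    centers = (np.arange(n_patches) + 0.5) / n_patches  # in (0, 1) within crop
    return crop_left + centers * crop_width             # absolute, continuous

# Two overlapping crops of the same image: because RoPE makes the query/key
# dot product depend only on the *relative* offset of the continuous
# coordinates, cross-view patch correspondence falls out of the attention
# scores rather than requiring explicit matching.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))                 # queries from crop A
k = rng.normal(size=(8, 64))                 # keys from crop B
pos_a = continuous_coords(0.10, 0.60, 8)     # crop A spans [0.10, 0.70]
pos_b = continuous_coords(0.25, 0.50, 8)     # crop B spans [0.25, 0.75]
scores = rope(q, pos_a) @ rope(k, pos_b).T   # relative-position-aware similarity
print(scores.shape)                          # (8, 8)
```

In this sketch, the continuity of `pos` is what distinguishes the approach from standard discrete positional embeddings: two crops taken at arbitrary offsets and scales still produce geometrically consistent relative positions.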
