

Poster

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

Abhishek Aich · Yumin Suh · Samuel Schulter · Manmohan Chandraker

Hall 3 + Hall 2B #110
[ Project Page ]
Fri 25 Apr 7 p.m. PDT — 9:30 p.m. PDT

Abstract:

A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observe that the state-of-the-art method Mask2Former spends ~50% of its compute on the transformer encoder alone. This is due to the retention of a full-length, token-level representation of all backbone feature scales at each encoder layer. Based on this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged into the Mask2Former segmentation architecture to significantly reduce the computational cost. The underlying principle of PRO-SCALE is to progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computation by a large margin with minimal sacrifice in performance (~52% GFLOPs reduction with no drop in performance on the COCO dataset). We validate our framework on multiple public benchmarks. Our code will be publicly released.
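The sketch below illustrates one plausible reading of the stated principle, not the paper's actual implementation: early encoder layers attend only over tokens from the coarsest backbone scale, and tokens from progressively finer scales are appended at later layers, so full-length sequences are processed only near the end of the encoder. The class name `ProgressiveLengthEncoder`, the `introduce_at` schedule, and the use of a plain `nn.TransformerEncoderLayer` in place of Mask2Former's deformable-attention encoder are all assumptions made for illustration.

```python
# Toy sketch of progressive token-length scaling across encoder layers.
# NOT the authors' code: the scale-insertion schedule and the standard
# self-attention layers are illustrative assumptions only.
import torch
import torch.nn as nn


class ProgressiveLengthEncoder(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_layers=6, num_scales=3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        # Layer index at which each scale's tokens join the sequence:
        # the coarsest scale from the start, finer scales later.
        self.introduce_at = [0, num_layers // 3, 2 * num_layers // 3][:num_scales]

    def forward(self, scale_tokens):
        """scale_tokens: list of (B, N_s, dim) tensors, coarsest scale first."""
        tokens = scale_tokens[0]
        for i, layer in enumerate(self.layers):
            # Append the next finer scale once its scheduled layer is reached,
            # so attention cost grows only in the later layers.
            for s, start in enumerate(self.introduce_at):
                if s > 0 and i == start:
                    tokens = torch.cat([tokens, scale_tokens[s]], dim=1)
            tokens = layer(tokens)
        return tokens


# Toy usage: three scales of flattened backbone features (coarse -> fine).
feats = [torch.randn(2, n, 256) for n in (100, 400, 1600)]
out = ProgressiveLengthEncoder()(feats)
print(out.shape)  # torch.Size([2, 2100, 256])
```

Because self-attention cost grows quadratically with sequence length, keeping the finest-scale tokens out of the early layers is what drives the encoder GFLOPs savings described in the abstract.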
