In-Person Poster presentation / poster accept

3D Segmenter: 3D Transformer based Semantic Segmentation via 2D Panoramic Distillation

Zhennan Wu · Yang Li · Yifei Huang · Lin Gu · Tatsuya Harada · Hiroyuki Sato

MH1-2-3-4 #55

Keywords: [ Applications ] [ knowledge distillation ] [ 3D semantic segmentation ]


Abstract: Recently, 2D semantic segmentation has witnessed significant advances thanks to the large amount of available 2D image data. Motivated by this, we propose the first 2D-to-3D knowledge distillation strategy, which enhances a 3D semantic segmentation model with knowledge embedded in the latent space of powerful 2D models. Specifically, unlike standard knowledge distillation, where teacher and student models take the same data as input, we train the teacher network on 2D panoramas properly aligned with the corresponding 3D rooms and use the knowledge learned by the 2D teacher to guide the 3D student. To facilitate our research, we create a large-scale, finely annotated 3D semantic segmentation benchmark containing voxel-wise semantic labels and aligned panoramas for 5175 scenes. Based on this benchmark, we propose a 3D volumetric semantic segmentation network that adopts the Video Swin Transformer as its backbone and introduces a skip-connected linear decoder. Our 3D Segmenter achieves state-of-the-art performance while remaining computationally efficient, requiring only $3.8\%$ of the parameters of the prior art. Our code and data will be released upon acceptance.
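To make the cross-modal distillation idea concrete, the following is a minimal PyTorch sketch, not the authors' released code: a frozen 2D encoder processes the aligned panorama, a 3D encoder processes the voxelized room, and the student's latent features are pulled toward the teacher's with an L2 loss. All module definitions, feature shapes, and the depth-pooling projection are illustrative assumptions; the actual method uses the known panorama-to-room alignment rather than a simple mean over depth.

# Minimal sketch (assumed names and shapes) of 2D-to-3D
# latent-feature distillation with a frozen 2D teacher.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Teacher2D(nn.Module):
    """Stand-in for a pretrained 2D panorama encoder (kept frozen)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1),
        )
    def forward(self, pano):              # (B, 3, H, W)
        return self.net(pano)             # (B, dim, H/4, W/4)

class Student3D(nn.Module):
    """Stand-in for the 3D volumetric encoder."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(dim, dim, 3, stride=2, padding=1),
        )
    def forward(self, voxels):            # (B, 1, D, H, W)
        return self.net(voxels)           # (B, dim, D/4, H/4, W/4)

def distill_loss(feat3d, feat2d):
    """Collapse the student's depth axis, resize to the teacher's
    spatial resolution, and match feature maps with an L2 loss."""
    pooled = feat3d.mean(dim=2)                        # (B, dim, h, w)
    pooled = F.interpolate(pooled, size=feat2d.shape[-2:],
                           mode="bilinear", align_corners=False)
    return F.mse_loss(pooled, feat2d)

teacher, student = Teacher2D(), Student3D()
teacher.requires_grad_(False)             # teacher stays fixed

pano   = torch.randn(2, 3, 64, 128)       # aligned 2D panorama
voxels = torch.randn(2, 1, 32, 32, 32)    # corresponding 3D room

loss = distill_loss(student(voxels), teacher(pano))
loss.backward()                           # gradients reach the student only
print(loss.item())

In practice this distillation term would be added to the student's usual voxel-wise segmentation loss, so the 3D model learns its own task while also mimicking the 2D teacher's latent representation.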
