Poster
MTSAM: Multi-Task Fine-Tuning for Segment Anything Model
Xuehao Wang · Zhan Zhuang · Feiyang Ye · Yu Zhang
Hall 3 + Hall 2B #73
The Segment Anything Model (SAM), with its remarkable zero-shot capability, has the potential to be a foundation model for multi-task learning. However, adapting SAM to multi-task learning faces two challenges: (a) SAM has difficulty generating task-specific outputs with different channel numbers, and (b) how to fine-tune SAM to adapt to multiple downstream tasks simultaneously remains unexplored. To address these two challenges, in this paper we propose the Multi-Task SAM (MTSAM) framework, which enables SAM to work as a foundation model for multi-task learning. MTSAM modifies SAM's architecture by removing the prompt encoder and introducing task-specific no-mask embeddings and mask decoders, enabling the generation of task-specific outputs. Furthermore, we introduce Tensorized low-Rank Adaptation (ToRA) to perform multi-task fine-tuning on SAM. Specifically, ToRA injects an update parameter tensor into each layer of the encoder in SAM and leverages a low-rank tensor decomposition method to incorporate both task-shared and task-specific information. Extensive experiments conducted on benchmark datasets substantiate the efficacy of MTSAM in enhancing multi-task learning performance.
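As a rough illustration of the tensorized low-rank adaptation idea described above, the following is a minimal PyTorch sketch, not the authors' implementation: it factors a per-task update tensor in CP style, with factors U and V shared across tasks and per-task coefficients lam carrying task-specific information. The module name ToRALinear and the parameters num_tasks, rank, and alpha are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class ToRALinear(nn.Module):
    """Hypothetical sketch of a tensorized low-rank adapter on a frozen linear layer.

    The per-task update tensor DeltaW (num_tasks x d_out x d_in) is factored as
    DeltaW_t = sum_r lam[t, r] * U[:, r] V[:, r]^T, so U and V are task-shared
    while lam holds task-specific coefficients.
    """

    def __init__(self, base: nn.Linear, num_tasks: int, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained weight frozen
        d_out, d_in = base.weight.shape
        self.U = nn.Parameter(torch.zeros(d_out, rank))          # task-shared factor (zero-init so the update starts at 0)
        self.V = nn.Parameter(torch.randn(d_in, rank) * 0.01)    # task-shared factor
        self.lam = nn.Parameter(torch.ones(num_tasks, rank))     # task-specific coefficients
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Low-rank update for the selected task: (x V) diag(lam_t) U^T
        delta = (x @ self.V) * self.lam[task_id]
        return self.base(x) + self.scale * (delta @ self.U.t())


# Usage sketch: wrap a projection layer of an image encoder and pick a task at forward time.
layer = ToRALinear(nn.Linear(256, 256), num_tasks=3, rank=8)
out = layer(torch.randn(4, 64, 256), task_id=1)
```

In this sketch the shared factors play the role of task-shared information and the per-task coefficients the role of task-specific information; the exact decomposition used by ToRA may differ.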