Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement

AAAI 2024

1 University of Science and Technology Beijing   2 University of Central Florida

Abstract

Few-Shot Segmentation (FSS) aims to accomplish novel class segmentation with only a few annotated images. Current FSS research based on meta-learning focuses on designing complex interaction mechanisms between query and support features. However, unlike humans, who can rapidly learn new concepts from limited samples, existing approaches rely solely on fixed feature matching to tackle new tasks and lack adaptability. In this paper, we propose a novel framework based on the adapter mechanism, namely Adaptive FSS, which can efficiently adapt an existing FSS model to novel classes. In detail, we design the Prototype Adaptive Module (PAM), which utilizes the accurate category information provided by the support set to derive class prototypes, enhancing class-specific information in the multi-stage representation. In addition, our approach is compatible with diverse FSS methods built on different backbones by simply inserting PAM between the layers of the encoder. Experiments demonstrate that our method effectively improves the performance of existing FSS models (e.g., MSANet, HDMNet, FPTrans, and DCAMA) and achieves new state-of-the-art (SOTA) results (i.e., 72.4% and 79.1% mIoU on PASCAL-5i in the 1-shot and 5-shot settings, and 52.7% and 60.0% mIoU on COCO-20i in the 1-shot and 5-shot settings).
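
As a rough illustration of the adapter idea, the following PyTorch-style sketch shows how lightweight modules such as PAM could be interleaved with a frozen pretrained encoder so that only the adapters are trained on novel classes. The class names, the residual injection, and the forward signature are illustrative assumptions, not the released implementation.

import torch.nn as nn

class AdapterWrappedEncoder(nn.Module):
    """Sketch: a frozen FSS backbone with one trainable adapter per encoder stage."""

    def __init__(self, encoder_stages, adapters):
        super().__init__()
        self.stages = nn.ModuleList(encoder_stages)   # pretrained encoder blocks, kept frozen
        self.adapters = nn.ModuleList(adapters)       # lightweight trainable modules (e.g., PAM)
        for p in self.stages.parameters():
            p.requires_grad = False                   # only the adapters are tuned for novel classes

    def forward(self, feat, support_mask=None, class_id=None):
        for stage, adapter in zip(self.stages, self.adapters):
            feat = stage(feat)
            # The adapted feature is injected back into the encoder stream
            # (a residual add here is an assumption for illustration).
            feat = feat + adapter(feat, support_mask, class_id)
        return feat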

Method


The overall architecture of our proposed Adaptive FSS. Given a support set {Is, Ms}, the support image Is (query image Iq) is fed into the encoder to produce the feature Fs (Fq). In each PAM, a temporary prototype Pt is first computed from Fs and the mask Ms, and is used to select the class prototype Pi and update the prototype bank. The selected class prototype Pi is then combined with the feature Fs (Fq) to generate the class-specific feature Fs* (Fq*). Finally, Fs* (Fq*) is sent into the Learnable Adaptive Module, yielding the adapted feature F̂s (F̂q), which is injected back into the encoder.
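
To make the PAM computation concrete, here is a minimal PyTorch sketch under stated assumptions: the temporary prototype Pt is obtained by masked average pooling of the support feature, the bank prototype Pi is updated with a momentum average, the class-specific feature F* is formed by reweighting the feature with its similarity to Pi, and the Learnable Adaptive Module is approximated by a small bottleneck MLP. The momentum value, the fusion rule, and all names are assumptions rather than the authors' exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeAdaptiveModule(nn.Module):
    """Sketch of a Prototype Adaptive Module (PAM); internal details are assumptions."""

    def __init__(self, dim, num_classes, momentum=0.9, reduction=4):
        super().__init__()
        self.momentum = momentum
        # Prototype bank: one prototype vector per novel class.
        self.register_buffer("bank", torch.zeros(num_classes, dim))
        # Learnable Adaptive Module, approximated here as a bottleneck MLP.
        self.adapter = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.GELU(),
            nn.Linear(dim // reduction, dim),
        )

    def forward(self, feat, support_mask=None, class_id=None):
        # feat: (B, C, H, W); support_mask: (B, 1, H, W), binary, support branch only.
        if support_mask is not None and class_id is not None:
            mask = F.interpolate(support_mask.float(), size=feat.shape[-2:], mode="nearest")
            # Temporary prototype Pt: masked average pooling over foreground pixels.
            pt = (feat * mask).sum(dim=(0, 2, 3)) / mask.sum().clamp(min=1.0)
            # Update the selected class prototype Pi with a momentum average.
            self.bank[class_id] = self.momentum * self.bank[class_id] + (1 - self.momentum) * pt.detach()
        pi = self.bank[class_id] if class_id is not None else self.bank.mean(dim=0)
        # Class-specific feature F*: reweight the feature by similarity to the prototype.
        sim = F.cosine_similarity(feat, pi.view(1, -1, 1, 1), dim=1).unsqueeze(1)
        feat_star = feat * (1.0 + sim)
        # Learnable Adaptive Module produces the adapted feature to be injected back.
        b, c, h, w = feat_star.shape
        out = self.adapter(feat_star.flatten(2).transpose(1, 2))
        return out.transpose(1, 2).view(b, c, h, w)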

Quantitative Results


To comprehensively evaluate our approach, we conduct experiments on four few-shot segmentation networks (MSANet, HDMNet, FPTrans, and DCAMA), which adopt three popular backbones (ResNet, Vision Transformer, and Swin Transformer), as shown in the table. It is worth emphasizing that we follow the test setting of DCAMA, so the performance of the other methods differs slightly from that reported in their original papers. As expected, our method consistently improves the performance of existing FSS methods with different encoders on both benchmarks.

Qualitative Results


The figure shows a visual comparison between the baseline and our Adaptive FSS on PASCAL-5i. FPTrans without fine-tuning is chosen as the baseline. Our method achieves high-quality segmentation by effectively adapting the model to new categories.

BibTeX

@article{Wang_Li_Chen_Zhang_Shen_Zhang_2024,
      title={Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement},
      author={Wang, Jing and Li, Jiangyun and Chen, Chen and Zhang, Yisi and Shen, Haoran and Zhang, Tianxiang},
      year={2024},
      journal={Proceedings of the AAAI Conference on Artificial Intelligence},
      month={Feb.}
}