Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation

1 Nanjing University of Science and Technology, 2 Horizon Robotics
European Conference on Computer Vision (ECCV), 2024

(a) Previous adversarial erasing-based approaches typically suffer from over-expansion, which is hard to constrain. (b) Instead of removing information, we add extra object knowledge from a paired image to weaken the current object activation. The localization ability of the network is then enhanced by improving the resulting, less activated attention map with object knowledge learned from the anchor branch. (c) Comparison of results.

Abstract

Though adversarial erasing has prevailed in weakly supervised semantic segmentation to help activate integral object regions, existing approaches still suffer from the dilemma of under-activation and over-expansion because it is difficult to determine when to stop erasing. In this paper, we propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) approach for weakly supervised semantic segmentation to alleviate this problem. In contrast to existing erasing-based methods, which remove the discriminative part to discover more of the object, we propose a simulated inter-image erasing scenario that weakens the original activation by introducing extra object information. Object knowledge is then transferred from the anchor image to the resulting, less activated localization map to strengthen the localization ability of the network. Because the adopted bidirectional alignment would also weaken the anchor image activation without appropriate constraints, we propose a self-supervised regularization module to maintain reliable activation in discriminative regions and improve inter-class object boundary recognition for complex images with multiple categories of objects. In addition, we resort to intra-image erasing and propose a multi-granularity alignment module to gently enlarge the object activation and boost the object knowledge transfer. Extensive experiments and ablation studies on the PASCAL VOC 2012 and COCO datasets demonstrate the superiority of our proposed approach. Source code and models are available at https://github.com/NUST-Machine-Intelligence-Laboratory/KTSE.

Architecture of Proposed KTSE Approach

We propose a simulated inter-image erasing (SIE) scenario in which extra object information is introduced from a paired image. We then strengthen the object localization ability of the network by improving the resulting, less activated localization map with object knowledge learned from the anchor image. A self-supervised regularization (SSR) module is also proposed to keep the bidirectional alignment from weakening the anchor activation and to improve inter-class object boundary recognition for complex images. In addition, we propose a multi-granularity alignment (MGA) module that gently enlarges the object activation to further boost the object knowledge transfer.
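The idea of weakening an activation map and then aligning it with the anchor map can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the weakening is simulated by normalizing the anchor activations with a peak shared with the paired image, and the alignment is a plain mean absolute difference. It is not the KTSE implementation, which is available in the repository linked above.

```python
def peak_of(raw):
    """Largest value in a 2D list of non-negative activations."""
    return max(max(row) for row in raw)


def normalize(raw, peak):
    """Scale raw activations by a given peak so values fall in [0, 1]."""
    return [[v / peak for v in row] for row in raw]


def l1_alignment(map_a, map_b):
    """Mean absolute difference between two equally sized 2D maps."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(map_a, map_b)
                for a, b in zip(row_a, row_b))
    count = sum(len(row) for row in map_a)
    return total / count


# Toy raw activations for one shared class (hypothetical values).
anchor_raw = [[0.5, 2.0], [1.0, 4.0]]
paired_raw = [[8.0, 2.0], [1.0, 0.5]]

# Anchor branch: the map is normalized by the anchor's own peak.
anchor_map = normalize(anchor_raw, peak_of(anchor_raw))

# Simulated erasing branch (assumption): the paired image contributes a
# stronger response, so a shared peak lowers the anchor's relative activation.
joint_peak = max(peak_of(anchor_raw), peak_of(paired_raw))
weakened_map = normalize(anchor_raw, joint_peak)

# Alignment term that a training loop could minimize so that the weakened
# map learns from the anchor map.
loss = l1_alignment(anchor_map, weakened_map)
print(loss)
```

In this toy setting the weakened map is a uniformly scaled copy of the anchor map, so the loss only measures how much activation was lost. In a real network the two maps would come from separate forward passes and the loss would be minimized by gradient descent.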

Example Localization Maps on PASCAL VOC 2012 Training Set

For each (a) image, we show (b) the ground truth and the localization maps produced by (c) the previous method AEFT, (d) our baseline, (e) baseline + SIE, (f) baseline + SIE + SSR, and (g) baseline + SIE + SSR + MGA. Best viewed in color.

Quantitative Comparisons of Pseudo-Mask Accuracy and Segmentation Results on PASCAL VOC 2012

Comparison of various methods' performance on the PASCAL VOC 2012 dataset, including accuracy of pseudo-masks (Table 1) and segmentation results with different backbones (Table 2). The results highlight the effectiveness of the proposed KTSE approach in improving both pseudo-mask quality and segmentation accuracy.
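Pseudo-mask and segmentation quality on PASCAL VOC 2012 are conventionally reported as mean intersection-over-union (mIoU). Assuming that is the metric behind these tables, a minimal sketch of the computation for a single label map looks like the following. The function name and the toy label maps are hypothetical. The ignore label 255 follows the VOC annotation convention, and predictions are assumed to contain only valid class indices.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that appear in pred or target.

    pred and target are 2D lists of integer class labels of equal shape.
    Pixels whose target label equals ignore_index are skipped.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            if p == t:
                # Correct pixel: counts toward intersection and union.
                inter[t] += 1
                union[t] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


# Toy 2x3 example with two classes and one ignored pixel.
pred = [[0, 0, 1],
        [1, 1, 0]]
target = [[0, 1, 1],
          [1, 1, 255]]
score = mean_iou(pred, target, num_classes=2)
print(score)  # 0.625 (class 0 IoU = 1/2, class 1 IoU = 3/4)
```

Benchmark evaluations accumulate the intersection and union counts over the whole dataset before dividing, rather than averaging per-image scores.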

Quantitative Comparisons of Segmentation Performance on PASCAL VOC 2012 and COCO Datasets

Performance comparison of various methods on PASCAL VOC 2012 with a ResNet backbone (Table 3) and on COCO with different backbones (Tables 4 and 5). The tables show that the proposed KTSE approach achieves higher segmentation accuracy on both datasets.

BibTeX

@inproceedings{chen2024knowledge,
  title={Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation},
  author={Chen, Tao and Jiang, Xiruo and Pei, Gensheng and Sun, Zeren and Wang, Yucheng and Yao, Yazhou},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}