Coarse-to-Fine Amodal Segmentation with Shape Prior

Event: ICCV23 PARIS · Duration: 5 min

Abstract

This paper introduces C2F-Seg, a coarse-to-fine framework for amodal segmentation that leverages shape priors. A Mask-and-Predict Transformer Module first generates a coarse amodal mask, and a Convolutional Refinement Module then refines it, guided by human-imitated attention. The authors also present MOViD-Amodal (MOViD-A), a synthetic video dataset built for amodal segmentation that features multiple heavily occluded objects. C2F-Seg is reported to outperform prior methods on both image (KINS, COCOA) and video (FISHBOWL, MOViD-A) amodal segmentation tasks, segmenting occluded regions with accurate shapes.
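The coarse stage is described only as "mask-and-predict", and MaskGIT is listed among the related models. As a rough illustration, the sketch below shows a generic MaskGIT-style iterative decode: start from fully masked tokens, predict all of them, keep the most confident predictions, and repeat on a cosine schedule. It is not the authors' implementation. `MASK_ID`, `cosine_schedule`, `iterative_decode`, and the `predict_fn` stand-in for a transformer are all hypothetical names, and the conditioning on the image and visible mask is omitted.

```python
import numpy as np

MASK_ID = -1  # hypothetical sentinel marking a still-masked token


def cosine_schedule(step, total_steps):
    """Fraction of tokens left masked after round `step` (1-based)."""
    return float(np.cos(0.5 * np.pi * step / total_steps))


def iterative_decode(predict_fn, num_tokens, total_steps=4):
    """Fill a fully masked token sequence over several mask-and-predict rounds.

    predict_fn(tokens) must return a (num_tokens, vocab_size) array of
    probabilities. It stands in for a transformer and is an assumption
    of this sketch.
    """
    tokens = np.full(num_tokens, MASK_ID, dtype=np.int64)
    for step in range(1, total_steps + 1):
        masked = tokens == MASK_ID
        if not masked.any():
            break
        probs = predict_fn(tokens)
        pred = probs.argmax(axis=1)   # most likely token per position
        conf = probs.max(axis=1)      # confidence of that prediction
        # Number of tokens that should still be masked after this round.
        keep_masked = int(np.floor(cosine_schedule(step, total_steps) * num_tokens))
        num_to_fill = max(1, int(masked.sum()) - keep_masked)
        # Only masked positions compete; fixed tokens get -inf confidence.
        conf = np.where(masked, conf, -np.inf)
        fill_idx = np.argsort(-conf)[:num_to_fill]
        tokens[fill_idx] = pred[fill_idx]
    return tokens
```

In a pipeline like the one summarized here, the decoded tokens would then be turned back into a coarse mask and passed to a separate refinement network; that step is not sketched.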

Speakers

  • Jianxiong Gao — Fudan University
  • Xuelin Qian — Fudan University
  • Yikai Wang — Fudan University
  • Tianjun Xiao — Amazon Web Services
  • Tong He — Amazon Web Services
  • Zheng Zhang — Amazon Web Services
  • Yanwei Fu — Fudan University

Talks (1)

  • 00:00:00 — Jianxiong Gao: Coarse-to-Fine Amodal Segmentation with Shape Prior
    • Presentation of C2F-Seg, a novel coarse-to-fine framework for amodal segmentation using shape priors, and the MOViD-Amodal dataset, demonstrating superior performance on both image and video tasks.

Key Takeaways

  • Introduces C2F-Seg, a novel coarse-to-fine framework for amodal segmentation that effectively utilizes shape priors.
  • Proposes a Mask-and-Predict Transformer Module for coarse mask generation and a Convolutional Refinement Module for fine-grained mask refinement.
  • Presents MOViD-Amodal, a new synthetic video dataset with challenging occlusion scenarios for amodal segmentation research.
  • Achieves state-of-the-art performance on both image (KINS, COCOA) and video (FISHBOWL, MOViD-A) amodal segmentation benchmarks.
  • Extends to video amodal segmentation by incorporating spatial-temporal transformer blocks.
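To make the benchmark claim concrete, here is a minimal sketch of how an amodal prediction could be scored with intersection-over-union, both on the full amodal mask and on the occluded part alone (amodal mask minus visible mask). The summary does not state which metrics were reported, so this metric choice is an assumption, and the function names `iou` and `amodal_scores` are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection-over-union of two binary masks of the same shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(a, b).sum() / union)


def amodal_scores(pred_amodal, gt_amodal, gt_visible):
    """Score a predicted amodal mask on the full object and on its occluded part."""
    pred_amodal = np.asarray(pred_amodal, dtype=bool)
    gt_amodal = np.asarray(gt_amodal, dtype=bool)
    gt_visible = np.asarray(gt_visible, dtype=bool)
    # Occluded region = pixels of the object that are not visible.
    gt_occluded = gt_amodal & ~gt_visible
    pred_occluded = pred_amodal & ~gt_visible
    return {
        "iou_full": iou(pred_amodal, gt_amodal),
        "iou_occluded": iou(pred_occluded, gt_occluded),
    }


# Toy 4x4 example: the object spans the top two rows; its right half is hidden.
gt_amodal = np.zeros((4, 4), dtype=bool)
gt_amodal[:2, :] = True
gt_visible = np.zeros((4, 4), dtype=bool)
gt_visible[:2, :2] = True
pred_amodal = np.zeros((4, 4), dtype=bool)
pred_amodal[:2, :3] = True  # prediction recovers only half of the hidden part

scores = amodal_scores(pred_amodal, gt_amodal, gt_visible)
```

Scoring the occluded region separately matters because a method can reach a high full-mask IoU by copying the visible mask while recovering little of the hidden shape.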

Methods / Models / Datasets Mentioned

  • C2F-Seg
  • MOViD-Amodal
  • PCNet [15]
  • Mask R-CNN [12]
  • ORCNN [9]
  • VRSP [32]
  • AISformer [29]
  • Convex
  • SaVos [34]
  • RefineCNN
  • ResNet50
  • Masked Transformer
  • MaskGIT

Topics

Amodal segmentation · Shape prior · Coarse-to-fine framework · Video segmentation · Transformer models · Convolutional neural networks · Synthetic datasets · Object occlusion · Mask prediction

