Coarse-to-Fine Amodal Segmentation with Shape Prior
Event: ICCV23 PARIS · Duration: 5 min
Abstract
This paper introduces C2F-Seg, a novel coarse-to-fine framework for amodal segmentation that leverages shape priors. The framework employs a Mask-and-Predict Transformer Module for coarse mask generation and a Convolutional Refinement Module for precise mask refinement, guided by human-imitated attention. Additionally, the authors present MOViD-Amodal, a synthetic video dataset designed to address challenges in amodal segmentation, featuring multiple heavily occluded objects. C2F-Seg demonstrates superior performance on both image (KINS, COCOA) and video (FISHBOWL, MOViD-A) amodal segmentation tasks, effectively segmenting fully occluded regions with accurate shapes.
Speakers
- Jianxiong Gao — Fudan University
- Xuelin Qian — Fudan University
- Yikai Wang — Fudan University
- Tianjun Xiao — Amazon Web Services
- Tong He — Amazon Web Services
- Zheng Zhang — Amazon Web Services
- Yanwei Fu — Fudan University
Talks (1)
- 00:00:00 — Jianxiong Gao: Coarse-to-Fine Amodal Segmentation with Shape Prior
- Presentation of C2F-Seg, a novel coarse-to-fine framework for amodal segmentation using shape priors, and the MOViD-Amodal dataset, demonstrating superior performance on both image and video tasks.
Key Takeaways
- Introduces C2F-Seg, a novel coarse-to-fine framework for amodal segmentation that effectively utilizes shape priors.
- Proposes a Mask-and-Predict Transformer Module for coarse mask generation and a Convolutional Refinement Module for fine-grained mask refinement.
- Presents MOViD-Amodal, a new synthetic video dataset with challenging occlusion scenarios for amodal segmentation research.
- Achieves state-of-the-art performance on both image (KINS, COCOA) and video (FISHBOWL, MOViD-A) amodal segmentation benchmarks.
- The framework is extensible to video amodal segmentation by incorporating spatial-temporal transformer blocks.
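The mask-and-predict idea behind the coarse stage can be illustrated with a MaskGIT-style iterative decoding loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: `toy_predictor` is a random stand-in for the transformer, and `MASK`, `VOCAB`, the step count, and the cosine unmasking schedule (borrowed from MaskGIT) are hypothetical choices made for illustration only.

```python
import numpy as np

MASK = -1   # hypothetical sentinel meaning "token not predicted yet"
VOCAB = 8   # hypothetical number of discrete mask tokens


def toy_predictor(tokens, rng):
    """Random stand-in for a transformer.

    Returns per-position probabilities of shape (len(tokens), VOCAB).
    A real model would condition on the image and the visible mask.
    """
    logits = rng.normal(size=(tokens.size, VOCAB))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def mask_and_predict(num_tokens, predictor, steps=4, seed=0):
    """Start fully masked, then commit the most confident tokens each step."""
    rng = np.random.default_rng(seed)
    tokens = np.full(num_tokens, MASK, dtype=np.int64)
    history = []  # number of tokens still masked after each step

    for t in range(1, steps + 1):
        masked = tokens == MASK
        n_masked = int(masked.sum())
        if n_masked == 0:
            break

        probs = predictor(tokens, rng)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)

        # Cosine schedule: how many tokens should stay masked after step t.
        if t == steps:
            keep_masked = 0
        else:
            keep_masked = int(np.floor(num_tokens * np.cos(np.pi / 2 * t / steps)))
        n_commit = max(1, n_masked - min(keep_masked, n_masked))

        # Rank only the still-masked positions by confidence.
        conf_masked = np.where(masked, conf, -np.inf)
        commit_idx = np.argsort(-conf_masked)[:n_commit]
        tokens[commit_idx] = pred[commit_idx]

        history.append(int((tokens == MASK).sum()))

    return tokens, history


if __name__ == "__main__":
    tokens, history = mask_and_predict(16, toy_predictor, steps=4, seed=0)
    print("tokens:", tokens)
    print("masked after each step:", history)
```

In the framework described above, the decoded tokens would correspond to a coarse amodal mask that the Convolutional Refinement Module then sharpens; that refinement stage is not modeled in this sketch.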
Methods / Models / Datasets Mentioned
C2F-Seg · MOViD-Amodal · PCNet [15] · Mask R-CNN [12] · ORCNN [9] · VRSP [32] · AISformer [29] · Convex · SaVos [34] · RefineCNN · ResNet50 · Masked Transformer · MaskGIT
Topics
Amodal segmentation · Shape prior · Coarse-to-fine framework · Video segmentation · Transformer models · Convolutional neural networks · Synthetic datasets · Object occlusion · Mask prediction