Coarse-to-Fine Amodal Segmentation with Shape Prior

Event: ICCV23 PARIS · Duration: 5 min

Abstract

This paper introduces C2F-Seg, a coarse-to-fine framework for amodal segmentation that leverages shape priors. A Mask-and-Predict Transformer Module first generates a coarse amodal mask, and a Convolutional Refinement Module then refines it into a precise mask, guided by human-imitated attention. The authors also present MOViD-Amodal (MOViD-A), a synthetic video dataset featuring multiple heavily occluded objects, designed to address challenges in amodal segmentation. C2F-Seg is reported to outperform prior methods on both image (KINS, COCOA) and video (FISHBOWL, MOViD-A) amodal segmentation tasks, segmenting fully occluded regions with accurate shapes.
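To make "segmenting fully occluded regions" concrete, the sketch below scores a predicted amodal mask both on the full object and on the hidden part only. This is an illustrative evaluation under stated assumptions, not the paper's evaluation code: the function names `iou` and `occluded_iou` and the toy 4x4 masks are hypothetical, and this summary does not say which metrics the authors report.

```python
import numpy as np

def iou(pred, target):
    """Intersection over union of two boolean masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, target).sum() / union)

def occluded_iou(pred_amodal, gt_amodal, visible):
    """IoU restricted to pixels outside the visible mask (hypothetical helper)."""
    hidden = ~visible.astype(bool)
    return iou(pred_amodal.astype(bool) & hidden, gt_amodal.astype(bool) & hidden)

# Toy example: the object spans rows 1-2 and all 4 columns,
# but only its left half (columns 0-1) is visible.
gt_amodal = np.zeros((4, 4), dtype=bool)
gt_amodal[1:3, :] = True
visible = np.zeros((4, 4), dtype=bool)
visible[1:3, :2] = True

# A prediction that recovers only one of the two hidden columns.
pred_amodal = np.zeros((4, 4), dtype=bool)
pred_amodal[1:3, :3] = True

full_score = iou(pred_amodal, gt_amodal)                       # 6 / 8
hidden_score = occluded_iou(pred_amodal, gt_amodal, visible)   # 2 / 4
print(full_score, hidden_score)
```

The full-object score stays fairly high because the visible half is easy to get right, while the hidden-region score isolates how well the occluded shape was completed.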

Speakers

Jianxiong Gao — Fudan University
Xuelin Qian — Fudan University
Yikai Wang — Fudan University
Tianjun Xiao — Amazon Web Services
Tong He — Amazon Web Services
Zheng Zhang — Amazon Web Services
Yanwei Fu — Fudan University

Talks (1)

00:00:00 — Jianxiong Gao: Coarse-to-Fine Amodal Segmentation with Shape Prior
- Presentation of C2F-Seg, a novel coarse-to-fine framework for amodal segmentation using shape priors, and the MOViD-Amodal dataset, demonstrating superior performance on both image and video tasks.

Key Takeaways

Introduces C2F-Seg, a novel coarse-to-fine framework for amodal segmentation that effectively utilizes shape priors.
Proposes a Mask-and-Predict Transformer Module for coarse mask generation and a Convolutional Refinement Module for fine-grained mask refinement.
Presents MOViD-Amodal, a new synthetic video dataset with challenging occlusion scenarios for amodal segmentation research.
Achieves state-of-the-art performance on both image (KINS, COCOA) and video (FISHBOWL, MOViD-A) amodal segmentation benchmarks.
The framework is extensible to video amodal segmentation by incorporating spatial-temporal transformer blocks.
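The mask-and-predict idea behind the coarse stage can be sketched as iterative decoding in the style of MaskGIT, which is listed among the methods below. This is a minimal sketch under assumptions, not the authors' implementation: `toy_predictor` is a hypothetical stand-in for the transformer that returns random logits, the cosine schedule and the `MASK` sentinel are illustrative choices, and the real module conditions on image features and the visible mask, which are omitted here.

```python
import numpy as np

MASK = -1  # sentinel id for a token that has not been predicted yet

def toy_predictor(tokens, vocab_size, rng):
    """Hypothetical stand-in for the transformer: random logits, shape (N, vocab_size)."""
    return rng.normal(size=(tokens.shape[0], vocab_size))

def mask_and_predict(num_tokens, vocab_size, steps, predictor, seed=0):
    """Start fully masked, then fix the most confident predictions step by step."""
    rng = np.random.default_rng(seed)
    tokens = np.full(num_tokens, MASK, dtype=np.int64)
    for t in range(steps):
        masked = tokens == MASK
        if not masked.any():
            break
        logits = predictor(tokens, vocab_size, rng)
        # Numerically stable softmax over the vocabulary axis.
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)
        # Cosine schedule: how many tokens remain masked after this step.
        keep_masked = int(np.floor(num_tokens * np.cos(np.pi / 2 * (t + 1) / steps)))
        # Tokens fixed in earlier steps get infinite confidence so they are never re-masked.
        conf = np.where(masked, conf, np.inf)
        tokens = np.where(masked, pred, tokens)
        if keep_masked > 0:
            # Re-mask the least confident of the freshly predicted tokens.
            remask = np.argsort(conf)[:keep_masked]
            tokens[remask] = MASK
    return tokens

coarse_tokens = mask_and_predict(num_tokens=16, vocab_size=8, steps=4, predictor=toy_predictor)
print(coarse_tokens)
```

In a full pipeline the resulting tokens would be decoded into a coarse mask and passed to a refinement network; that second stage is not sketched here.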

Methods / Models / Datasets Mentioned

C2F-Seg
MOViD-Amodal
PCNet [15]
Mask R-CNN [12]
ORCNN [9]
VRSP [32]
AISformer [29]
Convex
SaVos [34]
RefineCNN
ResNet50
Masked Transformer
MaskGIT

Topics

Amodal segmentation · Shape prior · Coarse-to-fine framework · Video segmentation · Transformer models · Convolutional neural networks · Synthetic datasets · Object occlusion · Mask prediction
