Virtual Try-On Workshop

Event: CVPR 2024 Workshop · Duration: 192 min

Abstract

This workshop surveys recent advances in virtual try-on (VTO) technology, with approaches ranging from 3D human modeling to diffusion-based image synthesis. Speakers discuss methods for accurate garment recovery, material estimation, and realistic cloth simulation, with an emphasis on combining physics-inspired models with learning-based frameworks. The main challenges addressed are high-fidelity rendering, preservation of garment detail, handling of complex poses and body shapes, and real-time performance across devices. The presentations cover generating animatable layered assets, virtual try-on with latent diffusion models, and 3D facial makeup estimation.

Speakers

Javier Romero — Amazon
Gerard Pons-Moll — University of Tübingen and MPII
Sunil — Amazon
Ming C. Lin — UMD
Hanbyul Joo — Seoul National University
Jeongho Kim — KAIST
Mehmet Saygin Seyfioglu — University of Washington
Xingchao Yang — CyberAgent AI Lab and University of Tsukuba
Katie Lewis — ModiFace

Talks (9)

00:00:00 — Javier Romero: Virtual Try-On CVPR 2024 workshop
- Introduces the CVPR 2024 Virtual Try-On workshop, acknowledges the co-organizers, and places virtual try-on technology in historical context, from 2010 to recent advances.
00:52:00 — Gerard Pons-Moll: What do foundation models know about 3D humans in clothing?
- Traces the evolution of clothed 3D human modeling from the mesh-based SMPL model to neural implicits and Gaussian splats, and introduces Human 3Diffusion as an approach to extracting 3D information from foundation models.
01:09:00 — Sunil: Garment Layering and Material Estimation for Virtual Try-On
- Presents a two-stage pipeline for disentangling garment style and texture, using parsing-based style editing and texture inpainting modules, and leveraging CLIP features for photorealistic texture transfer.
01:16:01 — Ming C. Lin: Physics-Inspired Fit-Aware Virtual Try-On
- Introduces a physics-inspired, fit-aware virtual try-on system that accurately reconstructs human bodies, faithfully estimates garment materials, and performs real-time cloth simulation, addressing challenges in scalability and realism.
01:36:56 — Hanbyul Joo: GALA: Generating Animatable Layered Assets from a Single Scan
- Presents GALA, a method for generating animatable and layered 3D assets from a single scan by decomposing objects and humans, canonicalizing poses, and refining misaligned assets through penetration handling.
01:47:51 — Jeongho Kim: Virtual Try-on with Latent Diffusion Model
- Explores virtual try-on using latent diffusion models, introducing StableVITON to address limitations of clothing tokenization and warped clothing input, and demonstrating improved performance in garment reconstruction.
02:00:01 — Mehmet Saygin Seyfioglu: Diffuse2Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
- Proposes Diffuse2Choose, a novel approach for virtual try-all that enriches image-conditioned inpainting in latent diffusion models, focusing on fast inference, preserving product details, and achieving high-quality results.
02:10:01 — Xingchao Yang: Makeup Prior Models for 3D Facial Makeup Estimation and Applications
- Develops two makeup prior models (PCA-based and StyleGAN2-based) for efficient and accurate 3D facial makeup estimation and transfer, demonstrating robustness in handling self-occluded faces and improving 3D face reconstruction.
02:20:01 — Katie Lewis: Integrating Learning-based VTO: Challenges and Advances
- Discusses challenges and advances in integrating learning-based virtual try-on, highlighting the need for hyper-fidelity, image aesthetics, and style diversity, and introducing a mobile fitting room application.

Key Takeaways

Virtual try-on technology has evolved significantly, moving from early 2D image manipulation to sophisticated 3D modeling and diffusion-based synthesis, offering increasingly realistic and personalized experiences.
Integrating physics-inspired simulations with deep learning models is crucial for achieving high-fidelity garment draping, material estimation, and realistic motion in virtual try-on applications.
The development of large-scale synthetic datasets and multi-view learning frameworks is essential to overcome limitations of small-scale real-world data and improve generalization capabilities of VTO models across diverse body shapes, poses, and garment types.
Future advancements in VTO will focus on enhancing control over garment reconstruction, enabling seamless animation, and improving the efficiency and scalability of models to support real-time applications on various devices while addressing privacy concerns.
Novel approaches leveraging latent diffusion models and explicit 3D representations are demonstrating promising results in generating high-quality, consistent, and customizable virtual try-on images, paving the way for more interactive and personalized online shopping experiences.

Methods / Models / Datasets Mentioned

SMPL
ClothCap
Video-Avatars
PIFu
NeRF
SMPLicit
Gaussian Splats
Human 3Diffusion
FX Mirror
Wanna Kicks
Amazon Virtual Try-On
Google Virtual Try-On
Codec Avatars
Neural-GIF
Multi-Garment Net (MGN)
ImageDream
Neural Surface Fields (NSF)
ControlNet
StableVITON
DreamPaint
PBE (Paint by Example)
Diffuse2Choose
PCA
StyleGAN2
FLAME model
DECA
HMR
ARCSim
Tiny-CNN
S2GAN
KD GAN
Sparsifiner
Controllable GAN
VITON-HD
HR-VITON
GP-VTON
LaDI-VTON
DCI-VTON
PSGAN
SCGAN
SpMT
LADN
SSAT
EleGANt
CSD-MT
PPO (Proximal Policy Optimization)
PointNet
AtlasNet
HumanSGD
MAE (Masked Autoencoder)
DiT (Diffusion Transformer)
U-Net
LSTM
CNN
AlexNet
VPoser
CloSe-D
DreamBooth
TADA
GALA

Topics

Virtual Try-On (VTO) · 3D Human Modeling · Garment Recovery · Cloth Simulation · Diffusion Models · Material Estimation · Pose Estimation · Image Synthesis · Facial Makeup Estimation · Scalability
