Virtual Try-On Workshop
Event: CVPR 2024 Workshop · Duration: 192 min
Abstract
This workshop presents recent advances in virtual try-on (VTO), spanning approaches from 3D human modeling to diffusion-based image synthesis. Speakers discuss garment recovery, material estimation, and cloth simulation, with an emphasis on combining physics-inspired models with learning-based frameworks. The challenges addressed include high-fidelity rendering, preservation of garment details, complex poses and body shapes, and real-time performance across devices. The presentations cover generating animatable layered assets, virtual try-on with latent diffusion models, and 3D facial makeup estimation, with the shared aim of photorealistic, user-friendly VTO applications.
Speakers
- Javier Romero — Amazon
- Gerard Pons-Moll — University of Tübingen and MPII
- Sunil — Amazon
- Ming C. Lin — UMD
- Hanbyul Joo — Seoul National University
- Jeongho Kim — KAIST
- Mehmet Saygin Seyfioglu — University of Washington
- Xingchao Yang — CyberAgent AI Lab and University of Tsukuba
- Katie Lewis — ModiFace
Talks (9)
- 00:00:00 — Javier Romero: Virtual Try-On CVPR 2024 workshop
- Introduces the CVPR 2024 Virtual Try-On workshop, acknowledges the co-organizers, and traces the history of virtual try-on technology from 2010 to recent advances.
- 00:52:00 — Gerard Pons-Moll: What do foundation models know about 3D humans in clothing?
- Discusses the evolution of 3D human modeling in clothing, from mesh-based SMPL to neural implicits and Gaussian Splats, and introduces Human 3-Diffusion as a new approach to extract 3D information from foundation models.
- 01:09:00 — Sunil: Garment Layering and Material Estimation for Virtual Try-On
- Presents a two-stage pipeline for disentangling garment style and texture, using parsing-based style editing and texture inpainting modules, and leveraging CLIP features for photorealistic texture transfer.
- 01:16:01 — Ming C. Lin: Physics-Inspired Fit-Aware Virtual Try-On
- Introduces a physics-inspired, fit-aware virtual try-on system that accurately reconstructs human bodies, faithfully estimates garment materials, and performs real-time cloth simulation, addressing challenges in scalability and realism.
- 01:36:56 — Hanbyul Joo: GALA: Generating Animatable Layered Assets from a Single Scan
- Presents GALA, a method for generating animatable and layered 3D assets from a single scan by decomposing objects and humans, canonicalizing poses, and refining misaligned assets through penetration handling.
- 01:47:51 — Jeongho Kim: Virtual Try-on with Latent Diffusion Model
- Explores virtual try-on using latent diffusion models, introducing StableVITON to address limitations of clothing tokenization and warped clothing input, and demonstrating improved performance in garment reconstruction.
- 02:00:01 — Mehmet Saygin Seyfioglu: Diffuse2Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
- Proposes Diffuse2Choose, a novel approach for virtual try-all that enriches image-conditioned inpainting in latent diffusion models, focusing on fast inference, preserving product details, and achieving high-quality results.
- 02:10:01 — Xingchao Yang: Makeup Prior Models for 3D Facial Makeup Estimation and Applications
- Develops two makeup prior models (PCA-based and StyleGAN2-based) for efficient and accurate 3D facial makeup estimation and transfer, demonstrating robustness in handling self-occluded faces and improving 3D face reconstruction.
- 02:20:01 — Katie Lewis: Integrating Learning-based VTO: Challenges and Advances
- Discusses challenges and advances in integrating learning-based virtual try-on, highlighting the need for hyper-fidelity, image aesthetics, and style diversity, and introducing a mobile fitting room application.
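Of the methods summarized above, the PCA-based prior mentioned in Xingchao Yang's talk is the easiest to illustrate in a few lines. The sketch below is not the speaker's implementation. It assumes, purely for illustration, that each makeup sample is a flattened vector of texture values, and it uses synthetic stand-in data; the function names `fit_pca_prior`, `encode`, and `decode` are hypothetical. It shows only the general idea of a linear prior: fit a mean and a small orthonormal basis, then describe any sample by a handful of coefficients.

```python
import numpy as np


def fit_pca_prior(samples, n_components):
    """Fit a linear PCA prior to row-vector samples (one sample per row)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def encode(sample, mean, basis):
    """Project a sample onto the prior to get low-dimensional coefficients."""
    return basis @ (sample - mean)


def decode(coeffs, mean, basis):
    """Reconstruct a sample from its coefficients."""
    return mean + basis.T @ coeffs


# Hypothetical stand-in data (an assumption, not real makeup textures):
# 200 flattened vectors of 48 values, generated from 5 latent factors,
# so a 5-component prior can represent them without loss.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 48))
textures = latent @ mixing + 0.5

mean, basis = fit_pca_prior(textures, n_components=5)
coeffs = encode(textures[0], mean, basis)
reconstruction = decode(coeffs, mean, basis)
max_error = float(np.max(np.abs(reconstruction - textures[0])))
```

In a setup like this, estimation reduces to recovering a short coefficient vector rather than a full texture, which is one plausible reason a low-dimensional prior can help with partially observed inputs such as self-occluded faces.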
Key Takeaways
- Virtual try-on technology has evolved significantly, moving from early 2D image manipulation to sophisticated 3D modeling and diffusion-based synthesis, offering increasingly realistic and personalized experiences.
- Integrating physics-inspired simulations with deep learning models is crucial for achieving high-fidelity garment draping, material estimation, and realistic motion in virtual try-on applications.
- Large-scale synthetic datasets and multi-view learning frameworks are needed to overcome the limitations of small-scale real-world data and to improve how well VTO models generalize across diverse body shapes, poses, and garment types.
- Future advancements in VTO will focus on enhancing control over garment reconstruction, enabling seamless animation, and improving the efficiency and scalability of models to support real-time applications on various devices while addressing privacy concerns.
- Novel approaches leveraging latent diffusion models and explicit 3D representations are demonstrating promising results in generating high-quality, consistent, and customizable virtual try-on images, paving the way for more interactive and personalized online shopping experiences.
Methods / Models / Datasets Mentioned
SMPL · ClothCap · Video-Avatars · PIFu · NeRF · SMPLicit · Gaussian Splats · Human 3-Diffusion · FX Mirror · Wannakicks · Amazon Virtual Try-On · Google Virtual Try-On · Codec Avatars · Neural-Gif · Multi-Garment Net (MGN) · ImageDream · Neural Surface Fields (NSF) · ControlNet · StableVITON · DreamPaint · PBE (Paint by Example) · Diffuse2Choose · PCA · StyleGAN2 · FLAME model · DECA · HMR · ARCsim · Tiny-CNN · S2GAN · KD GAN · Sparsifiner · Controllable GAN · VTON-HD · HR-VITON · GP-VITON · LADI-VITON · DCI-VITON · PSGAN · SCGAN · SpMT · LADN · SSAT · EleGANt · CSD-MT · PPO (Proximal Policy Optimization) · PointNet · AtlasNet · HumanSGD · MAE (Masked Autoencoder) · DIT Transformer · U-Net · LSTM · CNN · AlexNet · VPoser · CLoSE-D · DreamBooth · TADA · GALA
Topics
Virtual Try-On (VTO) · 3D Human Modeling · Garment Recovery · Cloth Simulation · Diffusion Models · Material Estimation · Pose Estimation · Image Synthesis · Facial Makeup Estimation · Scalability
Notes
Open for commentary — connections to other work, critiques, follow-up reading.