The 13th Women in Computer Vision (WiCV) Workshop

Event: CVPR 2024 Workshop · Duration: 240 min

Abstract

The 13th Women in Computer Vision (WiCV) Workshop, held in conjunction with CVPR 2024, brought together researchers and professionals to discuss advancements and challenges in computer vision. The workshop aimed to raise the visibility of female researchers, provide opportunities for junior female students and researchers to present their work, and share career advice. Presentations covered a wide range of topics including affective computing, language-based video understanding, remote sensing imagery analysis, hand gesture recognition, retinal feature segmentation, weed segmentation, generalist biomedical AI, diffusion image synthesis, and interactive robot task planning. The event highlighted the importance of diversity and inclusion in STEM fields and provided a platform for networking and collaboration.

Speakers

Asra Aslam — University of Leeds
Deblina Bhattacharjee — EPFL, Switzerland; The University of Bath, UK
Guoying Zhao — Academy Professor, Finland and Head of Research Unit, University of Oulu
Elisa Ricci — Associate Professor, University of Trento and Head of Research Unit, Fondazione Bruno Kessler
Xinyi Wanyan — University of Melbourne
Mallika Garg — University of Melbourne
Mehwish Mahmood — Queen’s University Belfast
Yingchao Huang — University of Regina, Saskatchewan, Canada
Shekoofeh Azizi — Staff Research Scientist and Research Lead, Google DeepMind
Gemma Canet Tarres — University of Surrey, Adobe Research
Boyi Li — Research Scientist, NVIDIA Research, UC Berkeley
Cornelia Fermüller — Co-founder, Autonomy Robotics Cognition (ARC) Lab, UMD
Krystle de Mesa — University of Regina
Sasha — Wave
Anna Maria — Wave

Talks (15)

00:00:00 — Asra Aslam: Introduction to WiCV Workshop
- Asra Aslam introduces the 13th Women in Computer Vision (WiCV) Workshop, highlighting its purpose and acknowledging the organizing committee.
00:00:42 — Deblina Bhattacharjee: Motivation: Overview
- Deblina Bhattacharjee discusses the motivation behind WiCV, focusing on the underrepresentation of women in STEM subjects and careers, particularly in computer science, and the ‘leaky pipeline’ phenomenon in academia.
03:00:00 — Guoying Zhao: Computer Vision in Affective Computing
- Guoying Zhao presents her research on computer vision in affective computing, discussing challenges in recognizing subtle facial expressions and physiological signals, and introducing new datasets and methods for micro-expression analysis.
03:12:00 — Elisa Ricci: Harnessing Language for Video Understanding without Training
- Elisa Ricci explores methods for video understanding that leverage language models without extensive training, focusing on anomaly detection and temporal action localization using vision-language models.
03:16:00 — Xinyi Wanyan: Extending global-local view alignment for self-supervised learning with remote sensing imagery
- Xinyi Wanyan presents a self-supervised learning approach for remote sensing imagery analysis that uses global-local view alignment and knowledge distillation to learn representations for downstream remote sensing tasks.
03:20:00 — Mallika Garg: GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition
- Mallika Garg introduces GestFormer, a transformer-based model designed for dynamic hand gesture recognition, which addresses computational complexity and scale variability using multiscale wavelet pooling.
03:25:00 — Mehwish Mahmood: RetinaLiteNet: A Lightweight Transformer based CNN for Retinal Feature Segmentation
- Mehwish Mahmood presents RetinaLiteNet, a lightweight transformer-based CNN for segmenting retinal features like blood vessels and optic disc, achieving high accuracy with reduced computational complexity.
03:32:00 — Yingchao Huang: Unsupervised Domain Adaptation for Weed Segmentation Using Greedy Pseudo-labelling
- Yingchao Huang discusses unsupervised domain adaptation for weed segmentation, proposing a greedy pseudo-labelling method to improve model performance across different agricultural environments and robot systems.
03:37:00 — Shekoofeh Azizi: Generalist Biomedical AI
- Shekoofeh Azizi introduces the concept of Generalist Biomedical AI, highlighting the development of medical large language and multimodal models such as Med-PaLM, Med-PaLM M, and Med-Gemini, which can process multimodal medical data and perform various diagnostic and therapeutic tasks.
03:42:00 — Gemma Canet Tarres: PARASOL: Parametric Style Control for Diffusion Image Synthesis
- Gemma Canet Tarres presents PARASOL, a diffusion model-based approach for fine-grained style control in image synthesis, using separate parametric embeddings for content and style to enhance controllability.
03:46:00 — Boyi Li: Vision and Language for Interactive Robot Task Planning
- Boyi Li discusses the integration of vision and language models for interactive robot task planning, enabling robots to understand human instructions and execute complex tasks in a human-like manner.
03:50:00 — Cornelia Fermüller: Robotics
- Cornelia Fermüller presents on robotics, emphasizing the importance of robots operating safely and intelligently in human environments, and highlighting research in dexterous manipulation and human-robot collaboration.
03:55:00 — Krystle de Mesa: Robotics
- Krystle de Mesa discusses advancements in robotics, focusing on developing robust and adaptable robotic systems for various tasks, including food preparation and object manipulation in unstructured environments.
03:59:00 — Sasha: Robotics
- Sasha presents on robotics, highlighting the development of intelligent robotic systems capable of learning from human activities and adapting to complex, dynamic environments.
04:03:00 — Anna Maria: Robotics
- Anna Maria discusses robotics research, emphasizing the development of advanced robotic manipulation capabilities for tasks requiring dexterity and precise interaction with objects.
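
The GestFormer summary above mentions multiscale wavelet pooling without describing the layer itself. As a rough illustration of the general idea only, the sketch below applies a 1D Haar low-pass step repeatedly to a feature sequence, halving its length at each scale. The function names and the sample input are hypothetical, and this is not the GestFormer architecture.

```python
import math


def haar_pool(signal):
    """One level of 1D Haar wavelet pooling: keep only the low-pass band."""
    signal = list(signal)
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last value.
        signal.append(signal[-1])
    return [
        (signal[i] + signal[i + 1]) / math.sqrt(2)
        for i in range(0, len(signal), 2)
    ]


def multiscale_haar_pool(signal, levels):
    """Return the low-pass band at each of `levels` successive scales."""
    scales = []
    current = list(signal)
    for _ in range(levels):
        current = haar_pool(current)
        scales.append(current)
    return scales


# Hypothetical feature sequence of length 8, pooled at three scales.
features = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
for level, band in enumerate(multiscale_haar_pool(features, 3), start=1):
    print(level, len(band), [round(v, 3) for v in band])
```

Each scale keeps a coarser summary of the same sequence, which is one common way to expose a model to features at several resolutions while shortening the sequence it has to attend over.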

Key Takeaways

The WiCV workshop provides a crucial platform for promoting diversity and inclusion in computer vision by showcasing the work of female researchers and fostering mentorship.
Advancements in large language models (LLMs) and vision-language models (VLMs) are enabling new approaches to complex computer vision tasks, including training-free methods and generalist AI systems.
The development of multimodal and generalist AI models, such as Med-PaLM M and Med-Gemini, is rapidly transforming the landscape of biomedical AI, offering capabilities for diverse medical applications.
Self-supervised learning and domain adaptation techniques are proving effective in addressing challenges like limited annotated data and improving model robustness in real-world scenarios.
Future directions in computer vision research include leveraging transformer-based models, addressing class imbalance, and exploring adaptive mechanisms for enhanced efficiency and scalability.
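
The domain adaptation takeaway above refers to pseudo-labelling on unlabelled target data. The sketch below shows a generic confidence-thresholded variant: a pixel keeps its predicted class only when the model is confident, and is otherwise marked to be ignored during training. The threshold, the `IGNORE` value, and the sample probabilities are assumptions for illustration; this is not the greedy method presented in the talk.

```python
IGNORE = -1  # hypothetical marker for pixels excluded from training


def pseudo_label(probs, threshold=0.9):
    """Turn per-pixel class probabilities into pseudo-labels.

    probs: list of per-pixel probability lists, one entry per class.
    Returns the argmax class where its probability reaches `threshold`,
    and IGNORE elsewhere.
    """
    labels = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)
        labels.append(best if p[best] >= threshold else IGNORE)
    return labels


# Hypothetical predictions for three target-domain pixels, two classes.
target_probs = [[0.95, 0.05], [0.55, 0.45], [0.08, 0.92]]
print(pseudo_label(target_probs))  # [0, -1, 1]
```

Raising the threshold trades coverage for label quality: fewer target pixels receive a pseudo-label, but the ones that do are less likely to be wrong.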

Methods / Models / Datasets Mentioned

GestFormer
RetinaLiteNet
DINO-TP
DINO-MC
CycleGAN
Med-PaLM
Med-PaLM 2
Med-PaLM M
Med-Gemini
Tx-LLM
REMEDIS/MICLe
AMIE
CLIP
VGG18
ABC-CapsNet
PaLM
PaLM 2
BERT
GPT-2
GPT-3
LaMDA
ChatGPT (GPT-3.5)
GPT-4
Gemini
LLaMA
UNet
UNet++
Attention UNet

Topics

Computer Vision · Women in STEM · Affective Computing · Video Understanding · Remote Sensing · Hand Gesture Recognition · Medical Imaging · Weed Segmentation · Generalist AI · Robotics · Diffusion Models · Task Planning
