IEEE CVPR workshop on Fair, Data Efficient and Trusted Computer Vision
Event: CVPR 2024 Workshop · Duration: 421 min
Abstract
This segment features several talks from the IEEE CVPR 2024 Workshop on Fair, Data Efficient and Trusted Computer Vision. It begins with an introduction to the workshop and its organizers, followed by a presentation on DIA, a diffusion-based inverse network attack. The keynote speaker then provides a 15-year review of scale learning in image semantics, covering the evolution from ImageNet to CLIP. Subsequent talks introduce Fast-NTK for parameter-efficient machine unlearning and AR-CP, an uncertainty-aware perception system for assisted driving in adverse conditions.
This segment introduces a novel regularization method designed to enforce conditional independence for fair representation learning and causal image generation. The core idea is to make a classifier’s predictions, such as sex classification, invariant to confounding attributes like skin type. The method leverages a Jensen-Shannon divergence-based approach to measure and minimize the similarity between probability distributions, effectively decoupling the prediction from the sensitive attribute. This technique is integrated into existing image encoders, acting as a regularizer to improve fairness and reduce bias in machine learning models.
This segment features four distinct talks on critical aspects of fair, data-efficient, and trusted computer vision. The first talk introduces a method for conditionally independent causal representation learning, applying it to diffusion autoencoders for bias mitigation in image generation. The second talk explores strategies for enabling opt-out and incentivizing opt-in for text-to-image models, presenting ‘Custom Diffusion’ for concept addition and ‘Concept Ablation’ for concept removal, alongside methods for data attribution. The third talk delves into the origins and mitigation of biases in diffusion models, proposing ‘Distribution-Guided Debiasing’ to achieve fair generation by guiding the denoising process with reference attribute distributions and analyzing geographical inclusivity. Finally, the fourth talk presents ‘Guided Loss-Increasing (GLI)’, a novel data augmentation method for efficient machine unlearning that prevents catastrophic model utility drop while ensuring compliance with privacy regulations.
This segment introduces a new benchmark, Human 3.6M-C (H3.6M-C), for evaluating the robustness of 3D human pose estimation (HPE) models on corrupted videos. The benchmark is derived from the Human 3.6M dataset using six types of RGB video corruption operators. The talk proposes two methods: Temporal Additive Gaussian Noise (TAGN) for improving generalization under unforeseen corruption (Scenario 1) and Confidence-Aware Convolution (CAC) for enhancing learning efficiency with foreseen corruption (Scenario 2). Experimental results demonstrate that both TAGN and CAC effectively reduce MPJPE on the H3.6M-C dataset.
This segment features multiple presentations on various topics in computer vision. One talk introduces RLNet for efficient private inference, focusing on reducing ReLU operations and enhancing robustness. Another presentation details a Dual-Carriageway Framework for robust and explainable fine-grained visual classification, addressing data challenges in industrial settings. Additionally, a data-free defense mechanism against adversarial attacks on black-box models is presented, utilizing wavelet noise removal and a regenerator network. The use of fractals as pre-training datasets for anomaly detection is also explored, showing improved model performance. Finally, a novel method called SkipPLUS is introduced to enhance the interpretability of Vision Transformers by skipping initial layers, and a Brain Inspired Vision Transformer (CP-ViT) is presented, integrating brain network organization principles for improved performance and interpretability.
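The Jensen-Shannon divergence mentioned in the abstract can be illustrated with a minimal sketch. This is not the speakers' implementation: the abstract does not say exactly which distributions are compared, so the example assumes the penalty is computed between a classifier's average predicted class distributions for two skin-type groups. The names `pred_group_a`, `pred_group_b`, and `penalty`, and their values, are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical average predicted sex-class probabilities within two skin-type groups.
pred_group_a = [0.7, 0.3]
pred_group_b = [0.4, 0.6]

# A regularizer of this kind would add the divergence (times a weight) to the task loss,
# pushing the two group-wise prediction distributions toward each other.
penalty = js_divergence(pred_group_a, pred_group_b)
print(f"JS penalty: {penalty:.4f}")
```

The divergence is symmetric, equals zero when the two distributions match, and is bounded above by ln 2 when measured in nats, which makes it a convenient bounded penalty term.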
Speakers
- Nalini Ratha — University at Buffalo
- Chenghao Li — University of Southern California
- Liangliang Cao — Apple
- Guihong Li — UT Austin
- Achref Doula — Technical University of Darmstadt, Germany
- Jensen Hwa — Stanford University
- Aditya Lahiri — Stanford University
- Qingyu Zhao — Stanford University
- Adnan Masood — Stanford University
- Babak Salimi — Stanford University
- Ehsan Adeli — Stanford University
- Richard Zhang — Adobe Research, SF
- R. Venkatesh Babu — IISc Bangalore
- Dasol Choi — Yonsei University
- Trung-Hieu Hoang — University of Illinois at Urbana-Champaign
- Phan Nhat Huy — VinUniversity, Ha Noi, Vietnam
- Mona Zehni — University of Illinois at Urbana-Champaign
- Duc Minh Vo — The University of Tokyo, Japan
- Minh N. Do — University of Illinois at Urbana-Champaign
- Sreetama Sarkar — University of Southern California, Los Angeles, USA
- Souvik Kundu — Intel Labs, San Diego, USA
- Peter A. Beerel — University of Southern California, Los Angeles, USA
- Zheming Zuo — Newcastle University, UK
- Joseph Smith — Newcastle University, UK
- Jonathan Stonehouse — Procter and Gamble, UK
- Boguslaw Obara — Newcastle University, UK
- Gaurav Kumar Nayak — Indian Institute of Science, Bangalore, India
- Inder Khatri — Indian Institute of Science, Bangalore, India
- Ruchit Rawal — Indian Institute of Science, Bangalore, India
- Anirban Chakraborty — Indian Institute of Science, Bangalore, India
- Cynthia Ugwu — Free University of Bozen-Bolzano
- Sofia Casarin — Free University of Bozen-Bolzano
- Oswald Lanz — Free University of Bozen-Bolzano
- Feraidoon Mehri — Sharif University of Technology
- Lu Zhang — Indiana University Indianapolis
Talks (16)
- 00:05:09 — Chenghao Li: DIA: Diffusion based Inverse Network Attack on Collaborative Inference
- This talk introduces DIA, a diffusion-based inverse network attack designed to reconstruct private client data from intermediate features in collaborative inference systems, highlighting vulnerabilities in Vision Transformers and the need for improved privacy-preserving technologies.
- 00:22:35 — Liangliang Cao: Scale Learning in Image Semantics: A 15-Year Review
- This keynote provides a 15-year review of scale learning in image semantics, tracing the evolution from ImageNet to CLIP, discussing advancements in datasets, models, and loss functions, and exploring the challenges and opportunities in leveraging computation and data for more efficient and robust models.
- 00:53:15 — Guihong Li: Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models
- This talk introduces Fast-NTK, a novel algorithm for efficient machine unlearning in large-scale neural networks. It leverages parameter-efficient fine-tuning (PEFT) by focusing on key parameters like Batch Normalization layers and visual prompts, demonstrating higher scalability and comparable accuracy to traditional retraining-based approaches.
- 01:07:30 — Achref Doula: AR-CP: Uncertainty-Aware Perception in Adverse Conditions with Conformal Prediction and Augmented Reality For Assisted Driving
- This presentation introduces AR-CP, an approach for driver assistance in adverse weather conditions that combines uncertainty-aware classification using Conformal Prediction (CP) with Augmented Reality (AR) to reduce driver confusion and improve perception. The method reduces prediction set size using a class taxonomy and demonstrates significant improvements in accuracy, cognitive load, and situation awareness in user studies.
- 01:24:12 — Jensen Hwa: Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation
- This talk introduces a regularization method to enforce conditional independence for fair representation learning and causal image generation, focusing on making sex classification invariant to skin type.
- 02:48:25 — Speaker not listed: From Label Space to Latent Space
- This talk introduces a method for conditionally independent causal representation learning, applying it to diffusion autoencoders for bias mitigation in image generation by controlling attributes like skin type and sex.
- 02:55:25 — Richard Zhang: Incentivizing Opt-in & Enabling Opt-out for Text-to-Image Models
- This talk explores methods for enabling opt-out and incentivizing opt-in for text-to-image models, introducing ‘Custom Diffusion’ for concept addition, ‘Concept Ablation’ for concept removal, and ‘Attribution by Customization/Unlearning’ for data attribution.
- 03:14:05 — R. Venkatesh Babu: Uncovering and Addressing Biases in Diffusion Models
- This talk addresses the origins and mitigation of biases in diffusion models, proposing ‘Distribution-Guided Debiasing’ to achieve fair generation by guiding the denoising process with reference attribute distributions and analyzing geographical inclusivity.
- 03:24:00 — Dasol Choi: Towards Efficient Machine Unlearning with Data Augmentation: Guided Loss-Increasing (GLI) to Prevent the Catastrophic Model Utility Drop
- This talk presents ‘Guided Loss-Increasing (GLI)’, a novel data augmentation method for efficient machine unlearning that prevents catastrophic model utility drop while ensuring compliance with privacy regulations.
- 04:12:38 — Trung-Hieu Hoang: Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input
- This talk introduces a new benchmark (H3.6M-C) for 3D human pose estimation on corrupted videos and proposes two methods, TAGN and CAC, to improve robustness and learning efficiency under unforeseen and foreseen corruption scenarios, respectively.
- 05:36:51 — Sreetama Sarkar: RLNet: Robust Linearized Networks for Efficient Private Inference
- This talk introduces RLNet, a class of models designed to improve latency and model performance in private inference by reducing high-latency ReLU operations, while also enhancing robustness against adversarial and naturally perturbed images.
- 05:42:41 — Zheming Zuo: Robust and Explainable Fine-Grained Visual Classification with Transfer Learning: A Dual-Carriageway Framework
- This presentation introduces a Dual-Carriageway Framework (DCF) for robust and explainable fine-grained visual classification, addressing challenges like varying illuminance, imbalanced data, noisy textures, and camera bias in industrial datasets.
- 05:50:46 — Gaurav Kumar Nayak: Data-free Defense of Black Box Models Against Adversarial Attacks
- This talk presents a data-free defense mechanism against adversarial attacks on black-box models, utilizing a wavelet noise remover and a regenerator network to reconstruct images and improve robustness without access to original training data.
- 05:58:46 — Cynthia Ugwu: Fractals as Pre-training Datasets for Anomaly Detection and Localization
- This presentation explores the use of fractals as a novel pre-training dataset for anomaly detection and localization, demonstrating their effectiveness in improving model performance across various benchmark datasets, particularly for memory-based methods.
- 06:04:46 — Feraidoon Mehri: SkipPLUS: Skip the First Few Layers to Better Explain Vision Transformers
- This presentation introduces SkipPLUS, a novel method for improving the interpretability of Vision Transformers by skipping the first few layers during attribution, demonstrating enhanced class discriminativity and reduced noise in relevance maps compared to existing methods.
- 06:10:01 — Lu Zhang: Brain Inspired Vision Transformer (ViT)
- This talk introduces Brain Inspired Vision Transformer (CP-ViT), a novel architecture that integrates core-periphery organization principles from brain networks into Vision Transformers, demonstrating improved classification performance and interpretability across various datasets.
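The conformal prediction step behind the AR-CP talk can be sketched as follows. This is a generic split conformal classifier under stated assumptions, not the AR-CP method itself: it uses the simple score "1 minus the probability of the true class" and does not model the class-taxonomy set reduction the talk describes. The calibration data and the function names `conformal_threshold` and `prediction_set` are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set for target coverage 1 - alpha."""
    # Nonconformity score: 1 - probability the model assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank of the calibration quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: fall back to including every class.
        return 1.0
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All class indices whose score does not exceed the calibrated threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical softmax outputs and true labels for four calibration images.
cal_probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
cal_labels = [0, 1, 2, 1]

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.5, 0.45, 0.05], threshold))
```

An ambiguous input yields a larger set and a confident one a smaller set, which is why the talk's focus on shrinking prediction sets matters for what is shown to the driver.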
Key Takeaways
- Machine unlearning is crucial for addressing data erasure requirements (e.g., GDPR) in large-scale models, and Fast-NTK offers an efficient, scalable solution by leveraging parameter-efficient fine-tuning.
- Uncertainty-aware perception systems like AR-CP can significantly improve driver assistance in adverse conditions by providing guarantees over predictions and reducing cognitive load through augmented reality.
- The evolution of image semantics over the past 15 years has been driven by increasingly large datasets and sophisticated models, with a shift towards leveraging computation and data, as highlighted by the “bitter lesson.”
- Despite the trend towards larger models and datasets, strategic data filtering can lead to more efficient models with less data, as demonstrated by improved CLIP embeddings after removing text regions from training images.
- The proposed method enforces conditional independence by minimizing the Jensen-Shannon divergence between specific probability distributions, making predictions invariant to confounding attributes.
- This regularization technique can be added to any existing image encoder, providing a flexible way to improve fairness in representation learning.
- The approach aims to make classifiers robust against biases introduced by sensitive attributes, ensuring that predictions are based solely on relevant features.
- By enforcing conditional independence at the latent representation level, the method seeks to achieve a more fundamental and robust form of fairness within the model.
- Novel methods like Custom Diffusion and Concept Ablation offer fine-grained control over generative AI models, allowing for efficient concept addition and targeted removal to address creator rights and ethical concerns.
- Bias in diffusion models, stemming from real-world data, can be mitigated through techniques like Distribution-Guided Debiasing, which guides the denoising process to achieve fairer and more representative image generation across diverse attributes and geographies.
- Machine unlearning is a critical area for privacy compliance, and methods like Guided Loss-Increasing (GLI) aim to make this process efficient while preventing the loss of overall model utility.
- Data attribution techniques, such as Attribution by Customization and Unlearning, are being developed to understand the influence of specific training data on generated outputs, providing transparency and accountability in AI models.
- A new benchmark (H3.6M-C) is proposed to evaluate 3D HPE models under various video corruptions, simulating real-world ‘in-the-wild’ conditions.
- Temporal Additive Gaussian Noise (TAGN) is an effective data augmentation strategy for improving model generalization when corruption types are unforeseen during training.
- Confidence-Aware Convolution (CAC) blocks can be integrated into 3D pose lifters to improve learning efficiency when corruption types are known and can be included in the training data.
- The proposed methods significantly enhance the robustness of 2D-to-3D pose lifters against different types of video corruptions.
- RLNet improves latency and model performance in private inference by reducing high-latency ReLU operations and enhancing robustness against adversarial and naturally perturbed images.
- The Dual-Carriageway Framework (DCF) for fine-grained visual classification addresses challenges like varying illuminance, imbalanced data, noisy textures, and camera bias in industrial datasets, and improves performance by automatically searching for the best-suited training pathway.
- A data-free defense mechanism for black-box models against adversarial attacks utilizes a wavelet noise remover to filter corrupted high-frequency coefficients and a regenerator network to reconstruct missing information, achieving significant improvements in adversarial accuracy.
- Fractals can serve as effective pre-training datasets for anomaly detection and localization, offering a scalable and privacy-preserving alternative to real-world datasets, and demonstrating competitive performance across various benchmark datasets and anomaly detection methods.
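For readers unfamiliar with the metric used in the H3.6M-C takeaways above, MPJPE (Mean Per Joint Position Error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below follows that standard definition only. It does not include any alignment step or evaluation protocol the benchmark may apply, and the function name `mpjpe` and the toy poses are hypothetical.

```python
import math

def mpjpe(pred, target):
    """Mean Euclidean distance over all joints of all frames.

    pred, target: lists of frames, each frame a list of (x, y, z) joint positions.
    The result is in the same unit as the inputs (commonly millimetres).
    """
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for pred_joint, target_joint in zip(pred_frame, target_frame):
            total += math.dist(pred_joint, target_joint)
            count += 1
    return total / count

# Toy example: one frame with two joints, off by 3 and 4 units respectively.
target = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]]
print(mpjpe(pred, target))  # 3.5
```

Lower values are better, so a method that "reduces MPJPE" on corrupted videos is producing 3D poses closer to the ground truth.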
Methods / Models / Datasets Mentioned
APS · AR-CP · AUPRO · AUROC · AlexNet · Amazon Rekognition · AnyDoor · AttCAM · AttnGV · AttnGrad · Attribution by Customization (AbC) · Attribution by Unlearning (AbU) · AtttCAM · Augmented Reality (AR) · Auto Attack · Autoformer · Autoregressive · BIM · BLIPDiffusion · Batch Normalization (BN) layers · CFLOW · CLIP · CNN · CP-ViT · CP-ViT-S · Caption Loss · Class Taxonomy · Collaborative Inference · Concept Ablation · Conditional Independence Formulation · Confidence-Aware Convolution (CAC) · Conformal Prediction (CP) · Contrastive Loss · Cross-entropy loss · Custom Diffusion · CutPaste · DALL-E 2 · DecompX · Diffusion Autoencoder · Diffusion models · Distribution-Guided Debiasing (DGD) · DreamBooth · Dual-Carriageway Framework (DCF) · Dynamic Sampling Technique · E4T-Diffusion · EWC · Fast-NTK · FastComposer · FastFlow · Fractals · FullGrad · GANs · GenAtt · Gender Shades · GigaGAN · GlobEnc · GradCAM++ · Guided Loss-Increasing (GLI) · HOG · HiResCAM · Human 3.6M (H3.6M) · Human 3.6M-C (H3.6M-C) · HyperDreamBooth · IP-Adapter · ITI Gen · ImageGPT · ImageNet · Imagen · Inception-v3 · Instagram3.5B · Inverse Network Attack · JFT300M · Jensen-Shannon Divergence · LAION · LAION-400M · LAION-5B · MPJPE (Mean Per Joint Position Error) · MS-COCO · MURA VIT · MUSE · MVTec AD · MaskGIT · Masked Transformer · Mixer-S · MobileNet · MobileNetV2 · NegGrad · Neural Tangent Kernel (NTK) · PANDA · PGD · PaDiM · Parameter-Efficient Fine-Tuning (PEFT) · Parti · PatchCore · Prompt-based fine-tuning · RAPS · RD · RLNet · ROAD dataset · ReLU operations · Regenerator Network (Rn) · ResNet · ResNet-110 · ResNet-18 · ResNet-50 · ResNet34 · SIFT · SISA · STFPM · SUTI · SVDiff · Safe Latent Diffusion · SkipPLUS · Softmax · StyleGAN · T2T-ViT-12 · Temporal Additive Gaussian Noise (TAGN) · Textual Inversion · UNSIR · Ultra Fine-Grained Embedding · ViT-B · ViT-S · VisA · Vision Transformers (ViT) · Visual Tokenizer · Wavelet Coefficient Selection Module (WCSM) · Wavelet Noise Remover (WNR) · YFCC100M · ZipLoRA · iG
Topics
3D Human Pose Estimation (3D HPE) · Adversarial Robustness · Anomaly Detection · Assisted Driving · Augmented Reality · Bias Mitigation · Bias reduction · Brain-Inspired AI · Causal Representation Learning · Causal image generation · Computer Vision Workshops · Conditional independence · Data Attribution · Data Augmentation · Data-Free Defense · Deep Learning · Diffusion Models · Fair AI · Fair representation learning · Fine-Grained Visual Classification · Fractals · Generalization · Generative AI Ethics · Image Semantics · Invariance · Inverse Network Attacks · Jensen-Shannon divergence · Learning Efficiency · Machine Unlearning · Model Customization · Pose Lifter · Privacy in AI · Private Inference · Regularization · Robustness · Scale Learning · Text-to-Image Generation · Uncertainty-Aware Perception · Video Corruption · Vision Transformers · Wavelet Noise Removal
Notes
Open for commentary — connections to other work, critiques, follow-up reading.