CVPR 2024 Workshop

Event: CVPR 2024 Workshop · Duration: 548 min

Abstract

The recording opens with the introduction to the CVPR 2024 Continual Learning in Computer Vision Workshop, followed by two invited talks. The first talk, by Vineeth N Balasubramanian, explores the integration of explainability and privacy-awareness into continual learning, presenting frameworks like CLAM and POET to address challenges such as catastrophic forgetting and organic lifelong learning. The second talk, by Hava Siegelmann, critiques the “we know it all in advance” paradigm of current AI, advocating for adaptive intelligence inspired by nature, introducing Super-Turing computation, and discussing Edge Lifelong Learning for on-device AI.

The following segment opens with Hava Siegelmann, who discusses temporal networks, Super-Turing computing, and its implications for lifelong learning and intelligent edge computing. She highlights the need for small, robust, and accurate temporal networks that can learn continuously without catastrophic forgetting. Following a Q&A session, the video transitions to a break and then introduces Liyuan Wang, who presents on bio-inspired continual learning, specifically addressing forgetting and active protection mechanisms.

The next segment features two talks on continual learning in computer vision. The first talk introduces MultiIOD, a rehearsal-free multihead incremental object detector that uses a transfer learning scheme to mitigate catastrophic forgetting and background interference. The second talk delves into the practical challenges of continual learning, including computational budget constraints, the disparity between data generation speed and model training speed, and the difficulties of obtaining labeled data in real-time scenarios. It proposes a new framework for evaluating online continual learning methods under these realistic constraints.

The next segment introduces the NEVIS benchmark, a novel approach to evaluating continual learning by leveraging the historical evolution of computer vision research. It details the construction of NEVIS, which comprises over 100 tasks and 8 million images spanning three decades of CVPR, ICCV, ECCV, and BMVC proceedings. The talk then delves into the evaluation protocol, emphasizing Pareto fronts for compute vs. error, and presents baseline results showing that dynamic fine-tuning outperforms multi-tasking. Finally, it transitions to exploring online learning with Transformer models, investigating how to adapt to local distribution shifts and the trade-offs between performance and computational cost in language and vision tasks.

The next segment covers two distinct presentations on continual learning. The first speaker, from Google DeepMind, discusses the efficiency and performance trade-offs in language models using KV-caching and applies these principles to vision tasks, demonstrating meta-learning capabilities under non-stationarity. The second speaker, Elisa Ricci, introduces her work on building strong foundations for continual learning, focusing on the challenges of using large pretrained models in Class-incremental Novel Class Discovery (class-iNCD) and on computational cost. She presents novel methods like FRoST, FineR, and MULTI-LANE, emphasizing the importance of leveraging pretrained models and efficient architectures.

The next segment features a lively panel discussion on various aspects of continual learning, including the role of attributes, interpretability, and the impact of large pre-trained models. Panelists debated the importance of human-understandable attributes for model interpretability and discussed the challenges and opportunities in discovering new attributes versus combining existing ones. The conversation also touched upon the distinction between traditional supervised learning, continual learning, and curriculum learning, exploring why continual learning often underperforms when compared to models trained on full datasets. The discussion further delved into the evolving landscape of foundation models and their relationship to traditional backbones, considering how these powerful models might influence future research in continual learning, particularly in scenarios involving model incremental learning and multimodal data.

The final segment features presentations from the 5th Workshop on Continual Learning in Computer Vision at CVPR 2024. It includes a paper presentation on continual learning with weight interpolation, followed by a report on the CLVISION challenge. The report details the challenge setup and rules, and showcases presentations from the top-performing teams, highlighting their innovative approaches to incremental learning with unlabeled and repetitive data. The segment concludes with the announcement of the best paper award and closing remarks for the workshop.

Speakers

Marc Masana — TU Graz, DESLab SAL
Vineeth N Balasubramanian — Department of Computer Science and Engineering/Artificial Intelligence, Indian Institute of Technology, Hyderabad
Hava Siegelmann — University of Massachusetts Amherst
Liyuan Wang — Tsinghua University
Eden Belouadah
Kengo Machida
Amal Rannen-Triki — Senior Research Scientist, Google DeepMind
Jörg Bornschein — Google DeepMind
Razvan Pascanu — Google DeepMind
Marcus Hutter — Google DeepMind
Andras György — Google DeepMind
Alexandre Galashov — Google DeepMind
Yee Whye Teh — Google DeepMind
Michalis K. Titsias — Google DeepMind
Elisa Ricci — University of Trento & FBK
Mingxuan Liu
Subhankar Roy
Wenjing Li
Zhun Zhong
Thomas De Min
Massimiliano Mancini
Stephane Lathuilière
Nicu Sebe
Federico — University of Florence
Jedrzej Kozal — Wroclaw University of Science and Technology
Jan Wasilewski — Wroclaw University of Science and Technology
Bartosz Krawczyk — Wroclaw University of Science and Technology
Michal Wozniak — Wroclaw University of Science and Technology
Gianluca Guglielmo — TU Graz
Panagiota Moraiti — Democritus University of Thrace
Efstathios Karypidis — National Technical University of Athens
Chengkun Ling — Chinatelecom Cloud
Weiwei Zhou — Chinatelecom Cloud
Taeheon Kim — Seoul National University
San Kim — Seoul National University
Minhyeok Seo — Seoul National University
Dongjae Jeon — Seoul National University
Jonghyun Choi — Yonsei University
Sishun Pan — Nanjing University of Science and Technology
Tingmin Li — Nanjing University of Science and Technology
Yang Yang — Nanjing University of Science and Technology

Talks (16)

00:00:00 — Marc Masana
- Introduction to the CVPR 2024 Continual Learning in Computer Vision Workshop, covering its goals, topics, program, organizers, and sponsors.
00:12:40 — Vineeth N Balasubramanian: Expanding the Horizons of Continual Learning: Integrating Explainability and Privacy-Awareness
- This talk explores integrating explainability and privacy-awareness into continual learning, addressing challenges like catastrophic forgetting and proposing new frameworks like CLAM and POET.
00:54:50 — Hava Siegelmann: Lifelong Learning AI at the Edge
- This talk discusses the limitations of current AI, advocates for adaptive intelligence inspired by nature, and introduces Super-Turing computation and Edge Lifelong Learning for on-device AI.
01:18:19 — Hava Siegelmann: Small, Robust and Accurate Temporal Networks, Super-Turing Computing, and Intelligent Edge Computing
- This talk discusses the differences between static and temporal networks, highlighting the benefits of temporal networks for sequential data. It then delves into the expressivity of Super-Turing computing for lifelong learning and its application to intelligent edge computing, emphasizing the need for small, robust, and accurate models that can learn continuously without catastrophic forgetting.
02:36:38 — Eden Belouadah: MultiIOD: Rehearsal-free Multihead Incremental Object Detector
- This talk introduces MultiIOD, a rehearsal-free multihead incremental object detector based on a transfer learning scheme, designed to address catastrophic forgetting and background interference in continual learning scenarios.
02:40:06 — Kengo Machida
- This talk discusses the challenges of continual learning in real-time scenarios, focusing on computational budget constraints, the speed of data generation versus model training, and the practical limitations of data annotation.
03:31:37 — Liyuan Wang: Bio-inspired Continual Learning: Forgetting and Beyond Forgetting
- This talk explores bio-inspired continual learning, focusing on the concepts of forgetting and active protection mechanisms. It discusses how biological brains manage forgetting through active and passive processes, and how these insights can be applied to develop more robust continual learning systems that balance plasticity and memory stability.
03:54:58 — Amal Rannen-Triki: Rethinking Evaluation for Continual Learning: A Case for Efficiency
- This talk introduces the NEVIS benchmark for efficient sequential learning and explores online learning with Transformers, highlighting the need for better evaluation metrics and methods to manage knowledge accumulation and distribution shifts.
05:13:17 — Amal Rannen-Triki: Varying context size
- Discusses the trade-offs between cost and performance with varying context sizes in language models, highlighting the benefits of KV-caching over overlapping approaches and applying in-context + online learning to vision tasks for meta-learning and handling non-stationarity.
05:13:17 — Elisa Ricci: Strong Models (for Continual Learning) Need Strong Foundation
- Presents challenges and solutions for continual learning, particularly Class-incremental Novel Class Discovery (class-iNCD), when leveraging large pretrained models, addressing knowledge extraction and computational cost with methods like FRoST, FineR, and the MULTI-LANE architecture.
07:51:01 — Jedrzej Kozal: Continual Learning with Weight Interpolation
- The talk introduces a method for continual learning using weight interpolation, leveraging permutation invariance in neural networks and applying it to address catastrophic forgetting.
07:51:21 — Gianluca Guglielmo: 5th CLVISION challenge
- This segment reports on the 5th CLVISION challenge, detailing its phases, scenarios (incremental learning with unlabeled data, repetition, distractors), restrictions, and presenting the leaderboard and winners’ presentations.
07:51:47 — Panagiota Moraiti, Efstathios Karypidis: CLvision-Challenge-2024
- Team PM-EK presents their approach for the CLVISION challenge, focusing on regularization methods like Learning without Forgetting (LwF) and Less Forgetful Learning (LFL), and utilizing pseudo-labeling for unlabeled data.
07:52:46 — Chengkun Ling, Weiwei Zhou: Frequency-Aware Loader and Feature Supervise in Class-Incremental with Repetition
- Team CtyunAI presents their solution, featuring a frequency-aware replay loader, a dual strategy for unlabeled data, and distillation loss to prevent catastrophic forgetting, achieving 3rd place in the CLVISION challenge.
07:53:41 — Taeheon Kim, San Kim, Minhyeok Seo, Dongjae Jeon, Jonghyun Choi: Multi-layer KD and Dynamic SSL on Repetitive Class-incremental Scenarios
- Team SNUMPR presents their 2nd place solution, utilizing multi-layer knowledge distillation (MLKD) and dynamic self-supervised learning (SSL) with rotation prediction, along with Experience Replay with Asymmetric Cross-Entropy (ER-ACE).
07:54:41 — Sishun Pan, Tingmin Li, Yang Yang: Enhanced winning subnetworks for open-set semi-supervised class-incremental learning with repetition
- Team NJUST-KMG, the 1st place winner, presents their method involving WSN-based subnetwork partitioning, unsupervised contrastive learning, and pseudo-label classification learning, to address open-set semi-supervised class-incremental learning with repetition.
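
Several of the challenge teams above combine a distillation loss with pseudo-labeling of unlabeled data. As a rough illustration only, and not any team's actual implementation, the following Python sketch shows the two ideas in their simplest form. The function names `distillation_loss` and `select_pseudo_labels`, the temperature of 2.0, the 0.9 confidence threshold, and the convention that old classes come first in the logit vector are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between softened old-model and new-model outputs.

    Only the first len(old_logits) entries of new_logits are used, i.e.
    the classes the old model already knew (assumed to come first).
    """
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[: len(old_logits)], temperature)
    return -sum(p * math.log(q) for p, q in zip(p_old, p_new))


def select_pseudo_labels(probabilities, threshold=0.9):
    """Keep (sample_index, predicted_class) pairs with confident predictions."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


if __name__ == "__main__":
    old = [2.0, 0.5]               # old model, two known classes
    unchanged = [2.0, 0.5, 1.0]    # new model agrees on the old classes
    drifted = [0.5, 2.0, 1.0]      # new model has drifted on the old classes
    print(distillation_loss(old, unchanged), distillation_loss(old, drifted))
    print(select_pseudo_labels([[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]))
```

In this toy setup the distillation term is smallest when the new model reproduces the old model's outputs on the old classes, and only unlabeled samples whose top predicted probability reaches the threshold receive a pseudo-label. Real systems typically add this term to the usual classification loss and compute it over mini-batches with a tensor library.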

Key Takeaways

Continual learning is evolving to incorporate crucial aspects like explainability and privacy-awareness, moving beyond traditional performance metrics.
The concept of “organic lifelong learning” aims to make AI systems more adaptive and human-like, capable of learning from diverse and unlabeled data streams.
A shift from post-hoc to ante-hoc explainability is proposed, integrating interpretability directly into the model’s architecture for more reliable and robust AI.
Super-Turing computation and Edge Lifelong Learning offer new paradigms for AI development, enabling adaptive, energy-efficient, and privacy-aware on-device learning.
Temporal networks offer advantages over static models for sequential data by starting small and adapting.
Super-Turing computing, particularly with memory retention, significantly expands the expressivity of AI beyond traditional Turing machines, crucial for lifelong learning.
Bio-inspired approaches to continual learning can address catastrophic forgetting by balancing learning plasticity and memory stability through mechanisms like active forgetting and active protection.
Effective continual learning strategies, inspired by biological brains, involve active protection of old knowledge, active forgetting of conflicting information, and modular architectures for generalizability.
MultiIOD is a rehearsal-free multihead incremental object detector that uses a transfer learning scheme, replaces ResNet50 with EfficientNet-B3, and employs class-specific size and offset maps to improve efficiency and adapt to continual learning.
The absence of past class annotations leads to catastrophic forgetting and background interference, which MultiIOD addresses by separating class representations and applying transfer learning between learned and new classes.
Realistic continual learning scenarios face significant challenges, including computational budget constraints, where simpler methods like ERM can outperform more complex ones under restricted compute, and the speed mismatch between data generation and model training.
Data annotation is a major bottleneck in real-time continual learning; novel approaches are needed to leverage unlabeled data and manage computational budgets effectively, especially when dealing with sparse or zero-shot annotation scenarios.
Continual learning is already happening in ML practice, but current methods for knowledge accumulation and transfer are often inefficient and not automated.
The NEVIS benchmark, derived from 30 years of computer vision research papers, provides a diverse and non-stationary stream of tasks suitable for evaluating efficient sequential learning.
Dynamic fine-tuning, especially when applied to pre-trained models, shows promising results in improving performance over static models and multi-tasking, even with significant distribution shifts.
Online learning with Transformers, by turning model weights into states and adapting to local distributions, offers a forward-looking approach to handle long sequences and distribution shifts, with interesting trade-offs between computational cost and performance depending on context size.
Dynamic evaluation in continual learning leads to better predictions and offers favorable trade-offs between computational cost and predictive performance, especially when facing significant distribution shifts.
KV-caching streaming provides a computational advantage by encoding tokens only once, outperforming traditional overlapping approaches in language models and enabling competitive performance with shorter context models.
Combining in-context learning and online learning can effectively leverage both short-term and long-term information to solve challenging problems, leading to meta-learning behaviors and strong performance on real-world sequences.
Large-scale pretrained models offer a strong foundation for continual learning, and simple, well-designed baselines leveraging pretrained initialization and cosine normalization can achieve surprisingly good performance in novel class discovery scenarios.
Computational cost is a critical challenge for continual learning with large pretrained models, necessitating efficient architectures and summarization techniques like MULTI-LANE to manage resource requirements as models grow.
Fine-grained semantic category reasoning can be achieved by combining visual question answering models, large language models, and vision-language models to discover and name novel concepts, even making ‘better mistakes’ through semantic awareness.
Human-understandable attributes are crucial for model interpretability, especially when shared semantics (like language) are desired between the model and the user.
The quality and selection of data, potentially through active learning, are increasingly recognized as primary factors for improving model performance, including for foundation models.
The field is moving towards a future with a multitude of foundation models, raising questions about how to effectively manage and transition between them in a continual learning setting, especially when data from previous models is not retained.
Compatible learning, which focuses on aligning different models and continually learning without forgetting, is an emerging area of research that could address challenges in model incremental learning.
Weight interpolation can be effectively used in continual learning to mitigate catastrophic forgetting, especially when combined with memory-based algorithms and careful permutation alignment.
The CLVISION challenge highlights diverse strategies for incremental learning, including dynamic weighting of self-supervised loss, multi-layer knowledge distillation, and subnetwork partitioning.
Leveraging unlabeled data through techniques like pseudo-labeling and unsupervised contrastive learning is crucial for improving performance in continual learning scenarios with repetition and distractors.
Balancing plasticity and stability is a key challenge in continual learning, and methods that allow direct control over this trade-off are valuable.
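
To make the idea of directly controlling the plasticity-stability trade-off concrete, here is a minimal Python sketch of linear weight interpolation between a model trained on old tasks and one fine-tuned on a new task. It is an illustration under stated assumptions, not the method presented at the workshop: parameters are plain dictionaries of float lists, the names `interpolate_weights` and `alpha` are hypothetical, and the permutation alignment step that the takeaway on weight interpolation describes as important is deliberately left out.

```python
def interpolate_weights(old_params, new_params, alpha):
    """Return alpha * old + (1 - alpha) * new for every named parameter.

    alpha = 1.0 keeps the old model (maximum stability);
    alpha = 0.0 keeps the new model (maximum plasticity).
    Assumes both models share an architecture and are already aligned.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if old_params.keys() != new_params.keys():
        raise ValueError("both models must have the same parameter names")

    merged = {}
    for name, old_values in old_params.items():
        new_values = new_params[name]
        if len(old_values) != len(new_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            alpha * old + (1.0 - alpha) * new
            for old, new in zip(old_values, new_values)
        ]
    return merged


if __name__ == "__main__":
    old_model = {"w": [1.0, 2.0], "b": [0.0]}
    new_model = {"w": [3.0, 6.0], "b": [1.0]}
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, interpolate_weights(old_model, new_model, alpha))
```

The single scalar `alpha` is what gives direct control here: sweeping it from 0 to 1 moves the merged model from the newly trained weights back toward the old ones. Without aligning the two networks first, a naive average like this can land in a high-loss region, which is why alignment methods such as those listed under Git Re-Basin and the REPAIR algorithm are relevant.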

Methods / Models / Datasets Mentioned

2-Token Approach
ACE
ACM (Prabhu et al., 2023)
AF-1
AF-2
ARNN Computing
Active learning
Analog Recurrent Neural Networks (ARNN)
AutoNovel
BLIP-2
Basic Transformer on C4
BiC
Biological memory replay
CAF
CE Loss
CLAM
CLEVER
CLIP-Sinkhorn
CLOC benchmark
CODA-P
CaSED
Cdc42
CenterNet
Class-Incremental Learning with Cross-Space Clustering and Controlled Transfer
Class-incremental Novel Class Discovery (class-iNCD)
Compatible learning
Continual personalization
Convolutional Block
Cross-Entropy Loss with LabelSmooth
DER++
DINO-Sinkhorn
Distillation Loss
Dynamic Self-Supervised Learning (SSL)
ER
ER (Cai et al. 2021)
ER-ACE (Experience Replay with Asymmetric Cross-Entropy)
EWC
EfficientNet-B0
EfficientNet-B3
EfficientNet-B5
Energy-based Latent Aligner for Incremental Learning
FRoST (Feature Replay and Distillation with Self-Training)
Fine-Tuning (FT)
FineR (Fine-grained Semantic Category Reasoning)
FixMatch
GSS
GT'
Git Re-Basin
Grad-CAM
Hindsight Sequence Planner (HSP)
Hyperparameter Optimization (BHPO)
Joint Training
Jump ODE
KRT-R
KV-caching
Kalman Filter (Titsias et al., 2023)
Kalman Filter for Online Classification of Non-Stationary Data. Titsias et al. 2023
Kmeans
LIME
LRP
LTP
Large Language Models
Lateral inhibition
Learning without Forgetting (LwF)
Less Forgetful Learning (LFL)
MAS
MCL
MIR
MOSE
MULTI-LANE architecture
Meta-Consolidation for Continual Learning
Modularity
Multi-Tasking (MT)
Multi-layer Knowledge Distillation (MLKD)
Multi-step class-iNCD (MSc-iNCD)
MultiIOD
NEVIS benchmark
Nevis'22
OCDM
OCRA
Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data, Z. Cai, O. Sener, V. Koltun, 2021
Overlapping approach
PIVOT
POET
PRS
Patch Selectors
PoLRS
Prequential MDL
Pretraining (PT)
Privileged Information (PI) Transformer
Project Gutenberg (PG-19)
Pseudo-label Classification Learning
R-DFCI
REPAIR algorithm
RWalk
Rac1
Replay Streams (Bornschein et al., 2022)
Replay loader
ResNet18
ResNet50
Rotation Prediction
SCD
SCL
SDR
SHAP
SI
SID
Self-attention layer
Sequential Learning Of Neural Networks For Prequential MDL - Bornschein et al. 2023
Split-EMNIST
Super-Turing Computing
Super-Turing Test
Synaptic consolidation
Transformer-XL
Turing Test
Unsupervised Contrastive Learning
Vision Transformer
WSN-based Subnetwork Partitioning
WordNet
iCaRL
k-Truncated(ARNN)
split-Cifar10
split-Cifar100
split-TinyImageNet
time-var. CNF

Topics

Active learning · Attributes · Bio-inspired AI · Catastrophic Forgetting · Class-Incremental Learning · Computational Cost · Compute-Performance Trade-offs · Continual Learning · Curriculum learning · Data selection · Distribution Shift · Dynamic Evaluation · Efficient Sequential Learning · Explainability · Explainable AI (XAI) · Fine-grained Recognition · Foundation models · Human-understandable attributes · In-context Learning · Intelligent Edge Computing · Interpretability · KV-caching · Knowledge Accumulation · Knowledge Distillation · Large Language Models · Learning Plasticity · Lifelong Learning · Memory Stability · Meta-learning · Model incremental learning · NEVIS Benchmark · Novel Class Discovery · On-Device AI · Online Learning · Pretrained Models · Privacy-Aware AI · Self-Supervised Learning · Subnetwork Partitioning · Super-Turing Computation · Temporal Networks · Transformers · Unlabeled Data · Weight Interpolation · background interference · computational budget · data annotation · incremental learning · object detection · real-time learning · transfer learning

Notes

Open for commentary — connections to other work, critiques, follow-up reading.