CVPR 2024 Tutorial: Learning Deep Low-Dimensional Models from High-Dimensional Data: Theory to Practice

Event: CVPR 2024 Tutorial · Duration: 260 min

Abstract

This tutorial explores the fundamental principles and practical applications of learning deep low-dimensional models from high-dimensional data. It delves into the phenomenon of neural collapse in classification, demonstrating how representations become compressed and linearized across network layers. The tutorial also investigates the emergence of invariant low-dimensional subspaces during gradient descent optimization, providing a theoretical foundation for understanding the implicit biases in deep learning. Furthermore, it showcases how these theoretical insights can be leveraged to design white-box architectures for efficient low-rank training, network compression, and robust adaptation techniques like Deep LoRA, with implications for large-scale vision and language models.

Speakers

Sam Buchanan — TTIC
Yi Ma — UC Berkeley & HKU
Qing Qu — CMU
Yaodong Yu — UC Berkeley
Yuqian Zhang — Microsoft Research
Zhihui Zhu — TTIC

Talks (21)

02:02:00 — Qing Qu: Lecture 3: Invariant Low-Dim Subspaces of Learning Dynamics
- This talk focuses on progressive neural collapse across layers, invariant subspaces in gradient learning dynamics, and efficient low-rank training and network compression.
04:00:00 — Qing Qu: Neural Collapse in Classification
- Explains neural collapse as an overfitting phenomenon in which representations collapse to their class means, and introduces the NC1 metric to quantify it.
08:58:00 — Qing Qu: Towards Understanding Progressive Collapse?
- Investigates progressive collapse in deep linear and nonlinear networks, showing a geometric decay of feature compression across layers.
14:15:00 — Qing Qu: Why Deep Linear Network? (Visualization)
- Visualizes feature maps using UMAP, showing similar progressive collapse behavior in nonlinear and hybrid networks, and highlights depth’s role in generalization and compression.
18:47:00 — Qing Qu: Implications on Transfer Learning
- Discusses how neural collapse affects transfer learning, suggesting that features before projection heads are less collapsed and exhibit better transferability.
22:34:00 — Qing Qu: Invariant Subspace in Gradient Learning Dynamics
- Explores invariant subspaces in deep nonlinear networks by tracking singular values and vectors, showing low-rank updates and invariant subspaces during training and fine-tuning.
27:20:00 — Qing Qu: Main Message (Invariant Subspace)
- Presents a theorem showing progressive feature compression with geometric decay and feature discrimination with linear growth in deep linear networks under specific assumptions.
32:24:00 — Qing Qu: The Evolution of Singular Spaces in GD Iterates for DLNs
- Visualizes the evolution of singular values and vectors during training, demonstrating that updates occur within a minimal invariant subspace, leading to an implicit low-rank bias.
37:33:00 — Qing Qu: Efficient Low-rank Training & Network Compression
- Introduces deep matrix completion as a starting point, highlighting the benefits of depth and width in preventing overfitting and accelerating convergence, respectively.
42:24:00 — Qing Qu: How to Achieve the Best of Two Worlds?
- Explains how to leverage the low-rank structure in deep linear networks for efficient training by approximating full weights with compressed components, leading to significant computation reduction.
48:46:00 — Qing Qu: From Deep Matrix Factorization to Completion?
- Discusses extending the approach to deep matrix completion, addressing challenges with non-linear observation matrices and proposing a remedy to update both left and right singular vectors.
51:00:00 — Qing Qu: Comparison with Alternating Minimization
- Compares the proposed method with alternating minimization on synthetic and MovieLens data, showing superior performance in avoiding overfitting and achieving high-quality solutions.
52:00:00 — Qing Qu: Low-rank Adaptation (LoRA) of Large Language Models?
- Explains LoRA as a parameter-efficient adaptation technique for large pretrained models, highlighting its effectiveness but also its sensitivity to the hyperparameter ‘r’ and proneness to overfitting on limited data.
56:00:00 — Qing Qu: Deep Low-rank Adaptation (LoRA)
- Introduces Deep LoRA, which updates weights via deep factorization, showing improved performance on limited training examples and increased robustness to the choice of rank ‘r’ compared to vanilla LoRA.
59:55:00 — Qing Qu: Conclusion (Lecture 3)
- Summarizes that gradient descent learning occurs within a minimal invariant subspace, enabling efficient low-rank training and network compression, and aiding in understanding hierarchical representations in deep networks.
01:01:00 — Yi Ma: Panel Discussion
- Introduces the panel discussion with four questions and an open debate, inviting comments from the panelists and audience.
01:02:00 — Yi Ma: Conference on Parsimony and Learning (CPAL)
- Advertises the CPAL conference, focusing on parsimonious, low-dimensional structures in machine learning, and announces the second conference at Stanford.
01:04:00 — Yi Ma: What are the central questions that remain unanswered for us to understand deep learning? How can these be achieved via exploring low-dimensionality?
- Discusses the fundamental questions in deep learning, emphasizing the need to understand deep networks beyond shallow models and the role of low-dimensionality in achieving this understanding.
01:18:00 — Yi Ma: Richard Sutton argues in “The Bitter Lesson” that the most significant advancements in AI have come from simply scaling up computation, rather than through human ingenuity in e.g. algorithm design. How do you understand “the bitter lesson” with respect to low-dimensionality in deep learning models and AI? How do you think the research community on low-dimensionality should ‘take’ “the bitter lesson”?
- Explores “The Bitter Lesson” in the context of low-dimensionality, discussing whether scaling up computation is always the best approach or if clever architectural designs are crucial.
01:31:00 — Yi Ma: What directions do you think this line of research of low-dimensionality is headed over the next five years? a. The potential impact? b. What implications will there be for large-scale vision and language models? c. For data-rich / data-scarce training regimes?
- Discusses future research directions in low-dimensionality, including its potential impact, implications for large-scale models, and relevance for data-rich/scarce training regimes.
01:59:00 — Yi Ma: Conclusion (Panel Discussion)
- Concludes the panel discussion, thanking the audience and panelists.
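
The invariant-subspace message of Lecture 3 can be checked numerically on a toy problem. The sketch below is only an illustration, not the speaker's code. It assumes square d x d layers, an orthogonal initialization scaled by a small constant `eps`, a rank-r target matrix, and plain gradient descent on the squared factorization loss. The function names, sizes, and step size are hypothetical choices. Under these assumptions, each trained weight matrix differs from a rescaled copy of its initialization by a matrix of rank at most 2r, so learning is confined to a small subspace.

```python
import numpy as np


def random_orthogonal(d, rng):
    """Draw a d x d orthogonal matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q


def train_deep_linear(d=12, r=2, depth=3, eps=0.1, lr=0.1, steps=500, seed=0):
    """Run GD on 0.5 * ||W_L ... W_1 - target||_F^2 with a rank-r target."""
    rng = np.random.default_rng(seed)
    u = random_orthogonal(d, rng)[:, :r]
    v = random_orthogonal(d, rng)[:, :r]
    target = u @ np.diag(np.linspace(2.0, 1.0, r)) @ v.T

    init = [eps * random_orthogonal(d, rng) for _ in range(depth)]
    weights = [w.copy() for w in init]
    rho = eps  # scalar recursion for the part of each layer outside the active subspace

    for _ in range(steps):
        # prefix[k] = W_k ... W_1 and suffix[k] = W_L ... W_{k+1} (identity when empty).
        prefix = [np.eye(d)]
        for w in weights:
            prefix.append(w @ prefix[-1])
        suffix = [np.eye(d)]
        for w in reversed(weights):
            suffix.append(suffix[-1] @ w)
        suffix.reverse()

        residual = prefix[depth] - target
        grads = [suffix[i + 1].T @ residual @ prefix[i].T for i in range(depth)]
        weights = [w - lr * g for w, g in zip(weights, grads)]
        rho *= 1.0 - lr * rho ** (2 * (depth - 1))

    product = np.eye(d)
    for w in weights:
        product = w @ product
    loss = 0.5 * np.linalg.norm(product - target) ** 2
    return init, weights, rho, loss


eps, r = 0.1, 2
init, weights, rho, loss = train_deep_linear(r=r, eps=eps)

# Outside the active subspace, each layer is just its initialization rescaled by rho / eps.
deviation_ranks = [
    int(np.linalg.matrix_rank(w - (rho / eps) * w0, tol=1e-8))
    for w0, w in zip(init, weights)
]
print(f"final loss: {loss:.2e}")
print(f"rank of each layer's deviation: {deviation_ranks} (bound 2r = {2 * r})")
```

The rank bound is what makes compression possible in this setting: only a 2r-dimensional block of each d x d layer has to be stored and updated, while the rest is described by the single scalar `rho`.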

Key Takeaways

Neural collapse is a pervasive phenomenon in deep learning in which the representations of each class converge to the mean of that class, enabling efficient classification.
Deep networks exhibit progressive feature compression and linearization across layers, which can be quantified with metrics such as NC1 and is central to understanding their learning dynamics.
Gradient descent optimization in deep networks operates within low-dimensional invariant subspaces, leading to an implicit low-rank bias in the learned weights and representations.
Exploiting these low-dimensional structures enables efficient low-rank training, network compression, and adaptation methods such as Deep LoRA, which improve performance on limited training data and robustness to the choice of rank.
White-box model design, derived from first principles and unrolled optimization, offers a path towards more interpretable and robust deep learning architectures, moving beyond black-box empirical designs.
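
The NC1 metric named in the takeaways measures within-class variability relative to between-class variability. The sketch below assumes one common definition, trace(Sigma_W Sigma_B^+) / K, where Sigma_W and Sigma_B are the within-class and between-class covariances, ^+ is the pseudo-inverse, and K is the number of classes. The normalization used in the tutorial may differ. The function name `nc1` and the synthetic features are hypothetical.

```python
import numpy as np


def nc1(features, labels):
    """Within-class vs. between-class variability: trace(Sigma_W @ pinv(Sigma_B)) / K."""
    classes = np.unique(labels)
    n, d = features.shape
    global_mean = features.mean(axis=0)  # equals the mean of class means for balanced classes
    sigma_w = np.zeros((d, d))
    sigma_b = np.zeros((d, d))
    for c in classes:
        class_feats = features[labels == c]
        mu = class_feats.mean(axis=0)
        centered = class_feats - mu
        sigma_w += centered.T @ centered / n
        diff = (mu - global_mean)[:, None]
        sigma_b += diff @ diff.T / len(classes)
    # An explicit cutoff keeps numerically zero directions of Sigma_B from being inverted.
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b, rcond=1e-10)) / len(classes))


rng = np.random.default_rng(0)
num_classes, per_class, dim = 3, 50, 8
means = 5.0 * rng.standard_normal((num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)

collapsed = means[labels]                                   # every sample sits on its class mean
noisy = collapsed + rng.standard_normal(collapsed.shape)    # samples spread around the means

print("NC1, collapsed features:", nc1(collapsed, labels))
print("NC1, noisy features:    ", nc1(noisy, labels))
```

Values near zero indicate collapsed features. Tracking the same quantity layer by layer is one way to observe the progressive compression described in the talks.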

Methods / Models / Datasets Mentioned

NC1 metric
UMAP
ResNet18
CIFAR-10
MLP
VGG
ViT-B
BERT
STS-B dataset
Deep LoRA
GPT-2
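
To make the LoRA and Deep LoRA entries concrete, the sketch below contrasts the two parameterizations of a weight update. The vanilla LoRA part follows the standard form W0 + (alpha / r) * B @ A, with B initialized to zero. The deep variant is only one plausible reading of "updates weights via deep factorization": a product of three factors whose shapes and random values are illustrative assumptions, not the tutorial's specification. All names (`lora_update`, `deep_update`, `alpha`) are hypothetical.

```python
import numpy as np


def lora_update(b, a, alpha):
    """Vanilla LoRA update: (alpha / r) * B @ A, with r the inner dimension."""
    r = a.shape[0]
    return (alpha / r) * (b @ a)


def deep_update(factors):
    """Deep factorized update: factors[-1] @ ... @ factors[0]."""
    delta = factors[0]
    for f in factors[1:]:
        delta = f @ delta
    return delta


rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 4
w0 = rng.standard_normal((d_out, d_in))  # stands in for a frozen pretrained weight

# Vanilla LoRA: B starts at zero, so the adapted weight equals w0 before training.
a = 0.01 * rng.standard_normal((r, d_in))
b = np.zeros((d_out, r))
w_at_init = w0 + lora_update(b, a, alpha=8.0)

# Random values stand in for trained factors, to inspect the rank of each update.
delta_lora = lora_update(rng.standard_normal((d_out, r)), a, alpha=8.0)
delta_deep = deep_update([
    rng.standard_normal((r, d_in)),
    rng.standard_normal((r, r)),
    rng.standard_normal((d_out, r)),
])

print("LoRA update rank:", np.linalg.matrix_rank(delta_lora))
print("deep update rank:", np.linalg.matrix_rank(delta_deep))
print("trainable parameters (LoRA):", r * (d_in + d_out))
print("trainable parameters (deep):", r * (d_in + d_out) + r * r)
```

Both updates have rank at most r, so the frozen weight is modified only within a low-dimensional subspace. The extra middle factor adds just r * r parameters.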

Topics

Low-dimensionality · Deep Learning · Neural Collapse · Representation Learning · Invariant Subspaces · Gradient Descent · Network Compression · Low-Rank Adaptation (LoRA) · Transfer Learning · White-Box Models
