Workshop on Graphic Design Understanding and Generation

Event: CVPR 2024 Workshop · Duration: 168 min

Abstract

The Workshop on Graphic Design Understanding and Generation (GDUG) at CVPR 2024 brings together researchers and practitioners working at the intersection of computer vision, natural language processing, and graphic design. The program consists of invited talks, paper spotlights, and a poster session for interactive discussion. Key topics include multi-modal document understanding and generation, layout analysis, font and typography analysis, color palette recommendation, and AI-assisted design tools. The event aims to foster collaboration and advance the state of the art in intelligent systems for graphic design.

Speakers

Kota Yamaguchi — CyberAgent
Cherry (Nanxuan) Zhao — Adobe Research
Shizhao Sun — Microsoft Research Asia, Machine Learning Group
Zhouhui Lian — WICT, Peking University
Shohei Tanaka — Waseda University
Jaejung Seol — UNIST
Sanket Biswas — Adobe Research

Talks (7)

00:02:45 — Kota Yamaguchi: Opening Remarks
- The speaker welcomes attendees to the Workshop on Graphic Design Understanding and Generation, highlighting the event’s focus on graphic design and its challenges, and introducing the day’s agenda and invited speakers.
01:31:59 — Cherry (Nanxuan) Zhao: Can AI Shape Graphic Design Creation?
- The speaker discusses the challenges of graphic design creation, the evolution of AI techniques from traditional methods to deep learning, and how deep learning models are being applied to various graphic design tasks like font generation, color palette recommendation, and layout generation.
01:57:38 — Shizhao Sun: Reinventing Design Creation by Foundation Models
- The speaker discusses the challenges of graphic design and the limitations of existing layout generation methods, then presents LayoutPrompter, a framework that uses large language models to generate diverse, high-quality layouts conditioned on input content.
02:26:54 — Zhouhui Lian: Font Synthesis via Deep Generative Models
- The speaker discusses the challenges of font synthesis, particularly for Chinese characters, and presents a decade of research on developing deep generative models for high-quality font generation, including methods for few-shot learning, style transfer, and vector font synthesis.
02:55:00 — Shohei Tanaka: SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
- The speaker introduces SciPostLayout, a new dataset for layout analysis and generation of scientific posters, and presents experimental results using various models for layout analysis and generation tasks, highlighting the challenges and potential of using deep learning for scientific poster design.
03:08:30 — Jaejung Seol: PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation
- The speaker introduces PosterLlama, a novel framework that leverages large language models (LLMs) to generate content-aware layouts for posters, addressing the limitations of existing methods in handling textual content and fine-grained visual details.
03:38:15 — Sanket Biswas: DocSynthv2: A Practical Autoregressive Modeling for Document Generation
- The speaker introduces DocSynthv2, a novel autoregressive model for document generation that leverages a GPT-2 architecture to generate documents as sequences of elements, addressing the limitations of existing methods in handling diverse document types and complex layouts.
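
Several of the talks above describe layouts or documents as sequences of elements that a language model can read or emit. As a rough illustration only, the sketch below shows one way such a sequence could be written out as text and parsed back. The `Element` fields, the `|` separator, the top-to-bottom ordering, and the 128-bin quantization are assumptions made for this example; they are not the formats used by LayoutPrompter, PosterLlama, or DocSynthv2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    """Hypothetical layout element: a single-word category plus an integer box."""
    category: str  # e.g. "title", "text", "image" (no spaces)
    x: int
    y: int
    w: int
    h: int


def quantize(value: float, size: float, bins: int = 128) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, bins - 1]."""
    idx = int(value / size * bins)
    return max(0, min(bins - 1, idx))


def serialize(elements: List[Element]) -> str:
    """Write elements as one line of text, ordered top-to-bottom then left-to-right."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    return " | ".join(f"{e.category} {e.x} {e.y} {e.w} {e.h}" for e in ordered)


def deserialize(text: str) -> List[Element]:
    """Parse text back into elements, skipping chunks that are malformed."""
    elements = []
    for chunk in text.split("|"):
        parts = chunk.split()
        if len(parts) != 5:
            continue  # wrong number of fields
        category, *numbers = parts
        try:
            x, y, w, h = (int(n) for n in numbers)
        except ValueError:
            continue  # non-integer coordinate
        elements.append(Element(category, x, y, w, h))
    return elements


if __name__ == "__main__":
    layout = [
        Element("text", 10, 40, 100, 20),
        Element("title", 10, 5, 100, 30),
    ]
    line = serialize(layout)
    print(line)
    print(deserialize(line))
```

The tolerant parser reflects a practical concern with generated text: a model may emit incomplete or non-numeric chunks, and dropping them keeps the rest of the layout usable.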

Key Takeaways

AI is increasingly capable of assisting in various aspects of graphic design, from generating fonts and color palettes to creating complex layouts, but challenges remain in handling subjective aesthetic judgments and complex design workflows.
Deep learning models, particularly generative adversarial networks (GANs) and variational autoencoders (VAEs), have shown impressive results in synthesizing graphic elements and layouts, often leveraging large datasets and self-supervised learning techniques.
The field is moving towards more sophisticated representations of graphic designs, such as vector formats and structured sequences of elements, which allow for greater editability, controllability, and integration with AI systems.
Future research needs to address the limitations of current AI models in understanding and generating designs based on high-level semantic intentions, brand guidelines, and existing materials, requiring a more holistic approach to the creative workflow.
The development of robust evaluation metrics and benchmark datasets is crucial for advancing research in graphic design understanding and generation, especially for tasks involving multi-modal content and complex design principles.
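
To make the point about evaluation metrics concrete, here is a minimal sketch of one simple layout measure: the mean pairwise intersection-over-union (IoU) between element boxes, where lower values indicate less overlap. This is an illustrative choice for this note, not a metric the workshop is stated to have adopted, and the `(x, y, w, h)` box convention is an assumption.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed box convention for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def mean_pairwise_iou(boxes: List[Box]) -> float:
    """Average IoU over all unordered pairs; 0.0 when there are fewer than two boxes."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    clean = [(0, 0, 10, 5), (0, 6, 10, 5)]      # stacked, no overlap
    cluttered = [(0, 0, 10, 5), (0, 2, 10, 5)]  # second box overlaps the first
    print(mean_pairwise_iou(clean), mean_pairwise_iou(cluttered))
```

A geometric score like this captures only one aspect of layout quality; it says nothing about content fit or aesthetics, which is part of why the takeaway above calls for more robust metrics.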

Methods / Models / Datasets Mentioned

GPT-2
DeepVecFont
VecFontSDF
HFH-Font
LayoutPrompter
LayoutGAN
FlexDM
DocSynth
BLT/LayoutTransformer
DINOv2
CodeLlama
DreamSim
LayoutDM
Visual Layout Composer
COLE
OpenCOLE
AGIS-Net
MC-GAN
TET-GAN
SCFont
QT-Font

Topics

Graphic Design · Computer Vision · Natural Language Processing · Deep Learning · Font Synthesis · Layout Generation · Content-Aware Design · AI-Assisted Design · Multi-modal Understanding · Document Generation

Notes

Open for commentary — connections to other work, critiques, follow-up reading.