4th Workshop on Computer Vision in the Built Environment

Event: CVPR 2024 Workshop · Duration: 461 min

Abstract

The recording opens with an introduction to the CVPR 2024 Workshop on Computer Vision in the Built Environment and the motivation for bringing the computer vision and AEC communities together. The first keynote, by Derek Lichti, covers rigorous object precision modeling for reality capture viewpoint planning, with an emphasis on quality assurance and control in terrestrial laser scanning. The second keynote, by Francis Engelmann, introduces foundation models for open-vocabulary 3D scene understanding and shows how large visual language models can be used for instance segmentation and for searching arbitrary objects in 3D scenes.

Two talks follow. The first demonstrates a 3D scene understanding model that leverages pre-trained foundation models for object detection and functional analysis in 3D scenes, and introduces the SceneFun3D dataset for understanding interactions and functionalities. The second covers digital transformation for circular construction, highlighting the environmental and ethical challenges of the linear economy in the built environment. It showcases digital technologies, including 3D scanning, computer vision for material detection, and robotics for material processing, that facilitate the reuse of building components and promote a circular economy.

A series of shorter talks then covers zero-shot object detection in construction, real-time ergonomic risk assessment, window-to-wall ratio detection, spatial mean radiant temperature mapping, and 3D semantic reconstruction for BIM models. The presentations highlight novel datasets, frameworks, and methods that address challenges in safety, efficiency, and environmental understanding within construction and architectural contexts.

The next presentation describes a BIM-Module for deep learning-based parametric IFC reconstruction, developed by KU Leuven and Fondazione Bruno Kessler. The speaker details an 8-step framework for converting scan data into BIM models, covering semantic segmentation and the detection and reconstruction of columns, walls, and doors. The session chair then gives logistical information about the workshop’s lunch break, poster session, and reconvening time, before the video transitions to a slide displaying the results of the 2D Scan2BIM Challenge.

After the break, the results of the 2D and 3D Scan2BIM Challenges are presented, showcasing winning submissions that leverage deep learning for tasks such as floorplan generation, occlusion handling, and parametric IFC reconstruction. A keynote then explores data-driven design for sustainable built environments, highlighting the role of computational design, generative AI, digital fabrication, and circular design in addressing climate change and improving architectural performance. The keynote emphasizes integrating performance analysis early in the design process and using advanced technologies for complex, low-carbon structures.

The recording closes with a panel discussion on the potential of computer vision and 3D modeling within the built environment. Experts from academia and industry discuss challenges such as data quality, the need for robust model representations, and the complexities of tech transfer. The discussion highlights the importance of interdisciplinary collaboration, user-centric design, and ethical considerations in developing and deploying these technologies to create more efficient, sustainable, and resilient urban spaces.

Speakers

Michael Olsen — Oregon State University
Derek Lichti — University of Calgary, Department of Geomatics Engineering
Francis Engelmann — PostDoc ETH Zurich, Visiting Researcher Google
Iro Armeni — ETH Zurich
Catherine De Wolf — ETH Zurich
Maryam Soleymani — Louisiana State University
Mahdi Bonyani — Louisiana State University
Zoe De Simone — MIT Architecture
Wei Liang — Carnegie Mellon University
Ka Lung Cheung — The Chinese University of Hong Kong
Mohammad Moein Sheikholeslami — Lassonde School of Engineering, York University, Canada
Ing. Sam De Geyter — KU Leuven – Geomatics research group, MEET HET – Research & Development
Dr. ing. Maarten Bassier — KU Leuven – Geomatics research group
Ing. Heinder De Winter — KU Leuven – Geomatics research group
Prof. dr. ir. Maarten Vergauwen — KU Leuven – Geomatics research group
Roberto Battisti — Fondazione Bruno Kessler
Oscar Roman — Fondazione Bruno Kessler
Longyong Wu — Department of Real Estate and Construction, The University of Hong Kong
Ziqi Li — Department of Real Estate and Construction, The University of Hong Kong
Meng Sun — Department of Real Estate and Construction, The University of Hong Kong
Fan Xue — Department of Real Estate and Construction, The University of Hong Kong
Siyuan Meng — Faculty of Architecture, The University of Hong Kong
Sou-Han Chen — Faculty of Architecture, The University of Hong Kong
Jiajia Wang — Faculty of Architecture, The University of Hong Kong
Dr. ing. Heinder De Winter — KU Leuven / MEET HET
Dr. Jason Rambach — DFKI / HumanTech
Caitlin Mueller — Associate Professor, MIT Architecture + Civil and Environmental Engineering, Director, Digital Structures
Thomas — Apple
Amber Xiangli — Cornell Tech

Talks (18)

00:17:13 — Derek Lichti: Rigorous Object Precision Modelling for Reality Capture Viewpoint Planning
- This talk emphasizes the importance of quality assurance and control in 3D reality capture, particularly for terrestrial laser scanning, and introduces a rigorous variance-covariance propagation method for viewpoint planning to optimize data collection and ensure object positional precision meets specified quality requirements.
00:52:24 — Francis Engelmann: Foundation Models for 3D Scene Understanding
- This talk explores the use of foundation models, specifically large visual language models (VLM) like CLIP, for open-vocabulary 3D scene understanding, demonstrating how these models can be used for instance segmentation and searching arbitrary objects in 3D scenes using natural language queries.
02:33:35 — Maryam Soleymani: Zero-Shot Construction Object Detection through Knowledge-based Feature Integrator
- This talk introduces ZSCODet, a zero-shot construction object detection framework that leverages construction knowledge graphs and multi-model graph fusion to detect previously unobserved objects, aiming to improve project quality, safety, and modular design.
02:39:51 — Mahdi Bonyani: Real-time Ergonomic Risk Assessment in Construction Sites: Revolutionizing Safety and Efficiency in Construction
- This talk presents a real-time ergonomic risk assessment method for construction sites, utilizing a spatio-temporal graph convolutional network (ST-GCN) to extract 2D and 3D keypoint data from video, enabling continuous and accurate ergonomic risk assessment for worker safety.
02:49:15 — Zoe De Simone: Window To Wall Ratio Detection using Semantic Segmentation
- This talk focuses on detecting Window-to-Wall Ratios (WWR) using semantic segmentation, a crucial metric for assessing building performance, by training models to concurrently detect windows and walls and addressing challenges like image perspective distortion and varying lighting conditions.
02:57:25 — Wei Liang: SegMRT: An Expeditious Spatial Mean Radiant Temperature Mapping Framework using visual SLAM and Semantic Segmentation
- This talk introduces SegMRT, a framework for expeditious spatial Mean Radiant Temperature (MRT) mapping, which uses semantic segmentation and a TIR-RGB-D-Tracking camera array to create detailed MRT maps for thermal comfort evaluation in built environments.
03:06:05 — Ka Lung Cheung: ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds
- This talk introduces ARCH2S, a dataset, benchmark, and challenges for learning exterior architectural structures from point clouds, addressing the lack of detailed annotated outdoor 3D point cloud datasets and presenting experiments with convolutional and transformer-based segmentation methods.
03:12:25 — Ka Lung Cheung: Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment
- This talk presents SRBIM, a unified framework for 3D semantic reconstruction of the built environment, aiming to automate the retrospective generation of BIM models by processing point clouds through semantic segmentation, mesh generation, and mapping to IFC schema.
03:20:35 — Mohammad Moein Sheikholeslami: Enhancing Polygonal Building Segmentation via Oriented Corners
- This talk introduces a novel method for enhancing polygonal building segmentation using oriented corners, which are used as a mid-level auxiliary representation to predict more regularized and simplified polygons through a Graph Convolutional Network (GCN) for iterative refinement.
03:36:47 — Catherine De Wolf: Digital Transformation for Circular Construction
- Discusses the application of digital technologies like 3D scanning, computer vision, and robotics to enable the reuse and recycling of building materials, promoting a circular economy in the built environment.
03:50:22 — Ing. Sam De Geyter: BIM-Module for deep learning-based parametric IFC reconstruction
- This presentation introduces a BIM-Module framework for creating scan-to-BIM models from scanning data, detailing steps from semantic segmentation to door reconstruction using deep learning and geometric methods.
04:57:04 — Tsinghua-CBIMS: 2D Challenge Results
- This segment presents the results of the 2D challenge, showing the performance metrics (IoU, F1, warping error, Betti error, and overall score) for the Tsinghua-CBIMS team and comparing them to previous years’ winners.
05:14:04 — Longyong Wu, Ziqi Li, Meng Sun, Fan Xue: 2D Scan2BIM Challenge Submission: Floorplan Generation from Point Clouds
- Presentation on the 2D Scan2BIM Challenge submission, detailing the team’s approach to generating floorplans from point clouds, including preprocessing, line prediction, planar detection, and SAM-based room segmentation.
05:28:44 — Siyuan Meng, Sou-Han Chen, Jiajia Wang, Fan (Frank) Xue: Handling Occlusion in Scan-to-BIM automation: Space-voxel-guided Boundary Adaptation to Semantics Ensemble – Completion of Occluded/Opening Points (SBASE-CO)
- This presentation introduces the SBASE-CO method for Scan-to-BIM automation, focusing on handling occlusions and completing occluded/opening points using a space-voxel-guided boundary adaptation to semantics ensemble.
05:31:45 — Ing. Sam De Geyter, Dr. ing. Maarten Bassier, Dr. ing. Heinder De Winter, Prof. dr. ir. Maarten Vergauwen, Roberto Battisti, and Oscar Roman: BIM-Module for deep learning-based parametric IFC reconstruction
- This presentation details a BIM-Module for deep learning-based parametric IFC reconstruction, covering semantic segmentation, column detection using YOLOv8, filtering and clustering, level reconstruction, wall reconstruction, column reconstruction, and door detection/reconstruction.
05:40:22 — Dr. Jason Rambach: Scan-to-BIM: Digital Twin Generation Pipeline
- This presentation outlines a digital twin generation pipeline for Scan-to-BIM, including semantic segmentation, methods for closing doors and inverse reconstruction, and future work directions.
05:51:43 — Caitlin Mueller: Designing with data for a sustainable built environment
- Keynote speech exploring the use of data-driven design for a sustainable built environment, covering topics like high-performance design, design space exploration, generative AI, digital fabrication, robotic assembly, circular design with waste materials, and algorithmic circular design with reinforcement learning.
06:23:57 — Multiple Panelists: Panel Discussion: Computer Vision in the Built Environment
- This panel discussion explores the transformative potential of computer vision and 3D modeling within the built environment, addressing challenges in data quality, model representation, tech transfer, and industry adoption, while highlighting interdisciplinary collaboration and ethical considerations.
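
The variance-covariance propagation mentioned in the first keynote can be illustrated for a single scanned point. The sketch below is not the speaker's method, which concerns the precision of whole objects across a network of viewpoints. It only shows the first-order building block under stated assumptions: a spherical observation model (range `rho`, horizontal angle `theta`, elevation angle `alpha`), uncorrelated observation errors, and a hypothetical function name `point_covariance`.

```python
import math


def point_covariance(rho, theta, alpha, sigma_rho, sigma_theta, sigma_alpha):
    """First-order covariance of (x, y, z) for one scanner observation.

    Assumed model (not taken from the talk):
        x = rho * cos(alpha) * cos(theta)
        y = rho * cos(alpha) * sin(theta)
        z = rho * sin(alpha)
    Angles are in radians, rho and sigma_rho in metres.
    """
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)

    # Jacobian of (x, y, z) with respect to (rho, theta, alpha).
    J = [
        [ca * ct, -rho * ca * st, -rho * sa * ct],
        [ca * st, rho * ca * ct, -rho * sa * st],
        [sa, 0.0, rho * ca],
    ]

    # Observation variances (assumed uncorrelated, so the matrix is diagonal).
    var = [sigma_rho ** 2, sigma_theta ** 2, sigma_alpha ** 2]

    # Propagation law: Sigma_xyz = J * diag(var) * J^T.
    return [
        [sum(J[i][k] * var[k] * J[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


# Example: a point 10 m away along the x axis, 2 mm range noise,
# 0.1 mrad noise on both angles.
cov = point_covariance(10.0, 0.0, 0.0, 0.002, 1e-4, 1e-4)
std_xyz = [math.sqrt(cov[i][i]) for i in range(3)]
print(std_xyz)
```

In this example the standard deviation is about 2 mm along the line of sight and about 1 mm across it (10 m times 0.1 mrad). A viewpoint planner could compare such propagated values against a required tolerance, although how the talk aggregates them into object-level precision is not reproduced here.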

Key Takeaways

Integrating computer vision with AEC (Architecture, Engineering, Construction) is crucial for addressing real-world challenges in the built environment, especially with the growth of 3D point cloud data.
Rigorous quality assurance and control methods, including sensor modeling, system calibration, and network design, are essential for ensuring the reliability and accuracy of 3D reality capture data.
Object positional precision, derived from variance-covariance propagation, is a more relevant metric for evaluating 3D reality capture designs than simple radiated point precision.
Foundation models, particularly large visual language models (VLM), enable open-vocabulary 3D scene understanding, allowing detection and searching of arbitrary objects in 3D scenes using natural language, even for objects not explicitly trained on.
Foundation models can be effectively used for 3D scene understanding, including object detection and functional analysis, without extensive retraining.
Transitioning to a circular economy in the built environment requires digital transformation to address material waste, embodied energy, and ethical concerns in construction.
Advanced digital technologies like 3D scanning, computer vision, and robotics are crucial for inventorying, tracking, and processing reclaimed building materials for reuse.
Material passports and robust legal frameworks are essential to overcome challenges related to material fatigue, liability, and certification for reused building components.
Zero-shot object detection frameworks like ZSCODet can leverage knowledge graphs and multi-model graph fusion to identify previously unobserved objects in complex construction environments, improving safety and project quality.
Real-time ergonomic risk assessment using spatio-temporal graph convolutional networks (ST-GCN) offers a robust solution for monitoring worker posture and preventing musculoskeletal disorders in construction.
Semantic segmentation models, for example SegFormer or networks built on a ResNet-50 backbone, can accurately segment windows and walls to estimate window-to-wall ratios and other building features, providing crucial data for energy modeling and architectural analysis.
Novel frameworks like SegMRT integrate visual SLAM and semantic segmentation with thermal cameras to create detailed spatial maps of Mean Radiant Temperature (MRT), enhancing thermal comfort assessment in buildings.
The presented BIM-Module utilizes a multi-stage deep learning pipeline for comprehensive parametric IFC reconstruction from point cloud data.
Specific challenges like column and door detection are addressed with specialized models (YOLOv8, Grounding DINO) and conditional filtering to improve accuracy.
The framework includes steps for semantic segmentation, level reconstruction, and geometric/topological reconstruction of walls, columns, and doors.
The Scan2BIM Challenge results highlight the performance metrics used to evaluate the accuracy of 2D reconstruction, including IoU, F1 score, warping error, and Betti error.
The Scan2BIM Challenge highlights advancements in automating BIM model generation from point clouds using deep learning, with improvements in 3D reconstruction metrics and specialized techniques for handling occlusions and incomplete data.
Data-driven design, leveraging generative AI and computational tools, offers significant potential for creating high-performance, diverse, and sustainable architectural solutions by systematically exploring design spaces and integrating performance analysis.
Digital fabrication and robotic assembly are crucial for materializing geometrically complex, low-carbon structures and enabling circular design practices with irregular or upcycled waste materials, transforming construction economics.
Integrating performance analysis early in the design process, through real-time feedback and systematic design space exploration, is essential for addressing the climate crisis in the built environment and fostering human creativity in design.
Effective tech transfer from research to industry in the AEC sector requires addressing practical constraints like data quality, scalability, and user-friendliness, often necessitating collaboration and shared data resources.
The choice of 3D model representation is application-dependent, with a growing need for differentiable conversions between various representations to support diverse tasks like simulation, design, and rendering.
Robustness against data imperfections (blurriness, noise, incomplete views) is a critical challenge, with solutions potentially involving advanced deblurring, dynamic Gaussian splatting, and multi-modal data fusion.
Future trends include leveraging AI for design optimization, digital twins, automated construction, and personalized environments, while also addressing ethical implications, data privacy, and the need for explainable AI.
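
The window-to-wall ratio takeaway above can be made concrete with a small sketch. It assumes a per-pixel label mask has already been predicted by some segmentation model, that the facade image has been rectified so pixel counts are proportional to area, and that the ratio is defined as window area over gross wall area (opaque wall plus windows). The label ids and the function name `window_to_wall_ratio` are hypothetical and not taken from the talk.

```python
def window_to_wall_ratio(mask, window_id=1, wall_id=2):
    """Estimate WWR from a 2D label mask given as a list of rows.

    Assumptions (illustrative only): pixels labelled `wall_id` are opaque
    wall, pixels labelled `window_id` are glazing, and every other label
    (sky, ground, vegetation, ...) is ignored.
    """
    window_pixels = sum(row.count(window_id) for row in mask)
    wall_pixels = sum(row.count(wall_id) for row in mask)

    # Gross wall area includes the window openings themselves.
    gross_wall_pixels = window_pixels + wall_pixels
    if gross_wall_pixels == 0:
        raise ValueError("mask contains no wall or window pixels")
    return window_pixels / gross_wall_pixels


# Toy mask: 0 = background, 1 = window, 2 = wall.
toy_mask = [
    [0, 0, 0, 0],
    [2, 2, 2, 2],
    [2, 1, 1, 2],
    [2, 1, 1, 2],
    [2, 2, 2, 2],
]
print(window_to_wall_ratio(toy_mask))  # 4 window pixels / 16 facade pixels = 0.25
```

On an unrectified photo the pixel ratio is biased by perspective, which is one reason the talk lists perspective distortion as a challenge.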

Methods / Models / Datasets Mentioned

ARCH2S
Apple's RoomPlan
AutoCAD
Autodesk Revit
BIM
BLK-360
Betti Number Error
BlenderBIM
CHOMP
CLIP
CT Scanning
DBSCAN
DINO
Faro Focus3D
GCN
GPR
Gaussian Splatting
Grounding DINO
HiSup
Hungarian Algorithm
Hungarian matching
ICP
IFC
IoU
LCNN
LLMs
LiDAR
MFS Module
MLP
Mask R-CNN
Mask Transformer
Mask3D
Matterport
MinkUNet
NFC
NeRF
OpenMask3D
PTV1
PTV2
PTV3
Photogrammetry
Point Prompt Training
Point Transformer v3
PointNet
QR codes
RANSAC
RFID
RRT*
Reflectometry
ResNet-50
Revit
Robotic additive joining
Robotic plasma cutting
RoomFormer
RoomPlan
S3DIS
SAM
SBASE-CO
SDF
SLAM
SRBIM
ST-GCN
STOMP
ScanNet
SceneFun3D
SegFormer
SegMRT
SfM/MVS Photogrammetry
SigLIP
SpUNet
Structured3D
Thermal Imaging
USDC
VAE
VLM
X-ray
YOLOv8
ZSCODet

Topics

3D Instance Segmentation · 3D Modeling · 3D Reality Capture · 3D Scene Understanding · AI in Design & Construction · BIM Modeling · BIM-Module · Building Material Reuse · Built Environment · CLIP · Circular Design · Circular Economy · Column Detection · Computational Design · Computer Vision · Computer Vision in AEC · Computer Vision in Built Environment · Construction Safety · Data Quality · Deep Learning · Deep Learning in Architecture · Digital Fabrication · Digital Transformation · Digital Twin Generation · Door Detection · Ergonomic Risk Assessment · Ethical AI · Floorplan Generation · Foundation Models · Generative AI · IFC Reconstruction · Industry Adoption · Large Visual Language Models (VLM) · LiDAR Scanning · Model Representation · Non-Destructive Testing · Object Positional Precision · Open-Vocabulary Learning · Point Clouds · Quality Assurance (QA) · Quality Control (QC) · Reinforcement Learning · Robotic Assembly · Robotics · Scan-to-BIM · Scan2BIM Challenge · Semantic Segmentation · Sustainable Construction · Tech Transfer · Terrestrial Laser Scanning (TLS) · Thermal Comfort · Viewpoint Planning · Wall Reconstruction · Zero-Shot Object Detection

Notes

Open for commentary — connections to other work, critiques, follow-up reading.