4th Workshop on Computer Vision in the Built Environment

Event: CVPR 2024 Workshop · Duration: 461 min

Abstract

This recording of the CVPR 2024 Workshop on Computer Vision in the Built Environment opens with an introduction that explains the motivation for bringing the computer vision and AEC communities together. The first keynote, by Derek Lichti, covers rigorous object precision modeling for reality capture viewpoint planning, with an emphasis on quality assurance and control in terrestrial laser scanning. The second keynote, by Francis Engelmann, introduces foundation models for open-vocabulary 3D scene understanding and demonstrates how large visual language models can be used for instance segmentation and for searching arbitrary objects in 3D scenes.

Two further talks are included. The first demonstrates a 3D scene understanding model that leverages pre-trained foundation models for object detection and functional analysis in 3D scenes, and introduces the SceneFun3D dataset for understanding interactions and functionalities. The second focuses on digital transformation for circular construction and the environmental and ethical challenges of the linear economy in the built environment. It showcases digital technologies, including 3D scanning, computer vision for material detection, and robotics for material processing, that facilitate the reuse of building components and promote a circular economy.

The contributed talks cover zero-shot object detection in construction, real-time ergonomic risk assessment, window-to-wall ratio detection, spatial mean radiant temperature mapping, and 3D semantic reconstruction for BIM models. They present novel datasets, frameworks, and methods that address challenges in safety, efficiency, and environmental understanding within construction and architectural contexts.

A presentation on a BIM-Module for deep learning-based parametric IFC reconstruction, developed by KU Leuven and Fondazione Bruno Kessler, details an 8-step framework for converting scan data into BIM models, covering semantic segmentation and the detection and reconstruction of columns, walls, and doors. After the presentation, the session chair gives logistical information about the lunch break, poster session, and reconvening time.

The results of the 2D and 3D Scan2BIM Challenges follow, with winning submissions that use deep learning for tasks such as floorplan generation, occlusion handling, and parametric IFC reconstruction. A keynote then explores data-driven design for sustainable built environments, covering the role of computational design, generative AI, digital fabrication, and circular design in addressing climate change and improving architectural performance. It stresses integrating performance analysis early in the design process and using advanced technologies to build complex, low-carbon structures.

The workshop closes with a panel discussion on the potential of computer vision and 3D modeling in the built environment. Experts from academia and industry discuss data quality, the need for robust model representations, and the difficulties of tech transfer. The discussion highlights interdisciplinary collaboration, user-centric design, and ethical considerations in developing and deploying these technologies for more efficient, sustainable, and resilient urban spaces.

Speakers

  • Michael Olsen — Oregon State University
  • Derek Lichti — University of Calgary, Department of Geomatics Engineering
  • Francis Engelmann — PostDoc ETH Zurich, Visiting Researcher Google
  • Iro Armeni — ETH Zurich
  • Catherine De Wolf — ETH Zurich
  • Maryam Soleymani — Louisiana State University
  • Mahdi Bonyani — Louisiana State University
  • Zoe De Simone — MIT Architecture
  • Wei Liang — Carnegie Mellon University
  • Ka Lung Cheung — The Chinese University of Hong Kong
  • Mohammad Moein Sheikholeslami — Lassonde School of Engineering, York University, Canada
  • Ing. Sam De Geyter — KU Leuven – Geomatics research group, MEET HET – Research & Development
  • Dr. ing. Maarten Bassier — KU Leuven – Geomatics research group
  • Ing. Heinder De Winter — KU Leuven – Geomatics research group
  • Prof. dr. ir. Maarten Vergauwen — KU Leuven – Geomatics research group
  • Roberto Battisti — Fondazione Bruno Kessler
  • Oscar Roman — Fondazione Bruno Kessler
  • Longyong Wu — Department of Real Estate and Construction, The University of Hong Kong
  • Ziqi Li — Department of Real Estate and Construction, The University of Hong Kong
  • Meng Sun — Department of Real Estate and Construction, The University of Hong Kong
  • Fan Xue — Department of Real Estate and Construction, The University of Hong Kong
  • Siyuan Meng — Faculty of Architecture, The University of Hong Kong
  • Sou-Han Chen — Faculty of Architecture, The University of Hong Kong
  • Jiajia Wang — Faculty of Architecture, The University of Hong Kong
  • Dr. ing. Heinder De Winter — KU Leuven / MEET HET
  • Dr. Jason Rambach — DFKI / HumanTech
  • Caitlin Mueller — Associate Professor, MIT Architecture + Civil and Environmental Engineering, Director, Digital Structures
  • Caitlin Mueller Lochen — MIT
  • Thomas — Apple
  • Amber Xiangli — Cornell Tech

Talks (18)

  • 00:17:13 Derek Lichti: Rigorous Object Precision Modelling for Reality Capture Viewpoint Planning
    • This talk emphasizes the importance of quality assurance and control in 3D reality capture, particularly for terrestrial laser scanning, and introduces a rigorous variance-covariance propagation method for viewpoint planning to optimize data collection and ensure object positional precision meets specified quality requirements.
  • 00:52:24 Francis Engelmann: Foundation Models for 3D Scene Understanding
    • This talk explores the use of foundation models, specifically large visual language models (VLM) like CLIP, for open-vocabulary 3D scene understanding, demonstrating how these models can be used for instance segmentation and searching arbitrary objects in 3D scenes using natural language queries.
  • 02:33:35 Maryam Soleymani: Zero-Shot Construction Object Detection through Knowledge-based Feature Integrator
    • This talk introduces ZSCODet, a zero-shot construction object detection framework that leverages construction knowledge graphs and multi-model graph fusion to detect previously unobserved objects, aiming to improve project quality, safety, and modular design.
  • 02:39:51 Mahdi Bonyani: Real-time Ergonomic Risk Assessment in Construction Sites: Revolutionizing Safety and Efficiency in Construction
    • This talk presents a real-time ergonomic risk assessment method for construction sites, utilizing a spatio-temporal graph convolutional network (ST-GCN) to extract 2D and 3D keypoint data from video, enabling continuous and accurate ergonomic risk assessment for worker safety.
  • 02:49:15 Zoe De Simone: Window To Wall Ratio Detection using Semantic Segmentation
    • This talk focuses on detecting Window-to-Wall Ratios (WWR) using semantic segmentation, a crucial metric for assessing building performance, by training models to concurrently detect windows and walls and addressing challenges like image perspective distortion and varying lighting conditions.
  • 02:57:25 Wei Liang: SegMRT: An Expeditious Spatial Mean Radiant Temperature Mapping Framework using visual SLAM and Semantic Segmentation
    • This talk introduces SegMRT, a framework for expeditious spatial Mean Radiant Temperature (MRT) mapping, which uses semantic segmentation and a TIR-RGB-D-Tracking camera array to create detailed MRT maps for thermal comfort evaluation in built environments.
  • 03:06:05 Ka Lung Cheung: ARCH2S: Dataset, Benchmark and Challenges for Learning Exterior Architectural Structures from Point Clouds
    • This talk introduces ARCH2S, a dataset, benchmark, and challenges for learning exterior architectural structures from point clouds, addressing the lack of detailed annotated outdoor 3D point cloud datasets and presenting experiments with convolutional and transformer-based segmentation methods.
  • 03:12:25 Ka Lung Cheung: Towards Automating the Retrospective Generation of BIM Models: A Unified Framework for 3D Semantic Reconstruction of the Built Environment
    • This talk presents SRBIM, a unified framework for 3D semantic reconstruction of the built environment, aiming to automate the retrospective generation of BIM models by processing point clouds through semantic segmentation, mesh generation, and mapping to IFC schema.
  • 03:20:35 Mohammad Moein Sheikholeslami: Enhancing Polygonal Building Segmentation via Oriented Corners
    • This talk introduces a novel method for enhancing polygonal building segmentation using oriented corners, which are used as a mid-level auxiliary representation to predict more regularized and simplified polygons through a Graph Convolutional Network (GCN) for iterative refinement.
  • 03:36:47 Catherine De Wolf: Digital Transformation for Circular Construction
    • Discusses the application of digital technologies like 3D scanning, computer vision, and robotics to enable the reuse and recycling of building materials, promoting a circular economy in the built environment.
  • 03:50:22 Ing. Sam De Geyter: BIM-Module for deep learning-based parametric IFC reconstruction
    • This presentation introduces a BIM-Module framework for creating scan-to-BIM models from scanning data, detailing steps from semantic segmentation to door reconstruction using deep learning and geometric methods.
  • 04:57:04 Tsinghua-CBIMS: 2D Challenge Results
    • This segment presents the results of the 2D challenge, showing the performance metrics (IoU, F1, warping error, Betti error, and overall score) for the Tsinghua-CBIMS team and comparing them with previous years’ winners.
  • 05:14:04 Longyong Wu, Ziqi Li, Meng Sun, Fan Xue: 2D Scan2BIM Challenge Submission: Floorplan Generation from Point Clouds
    • Presentation on the 2D Scan2BIM Challenge submission, detailing the team’s approach to generating floorplans from point clouds, including preprocessing, line prediction, planar detection, and SAM-based room segmentation.
  • 05:28:44 Siyuan Meng, Sou-Han Chen, Jiajia Wang, Fan (Frank) Xue: Handling Occlusion in Scan-to-BIM automation: Space-voxel-guided Boundary Adaptation to Semantics Ensemble – Completion of Occluded/Opening Points (SBASE-CO)
    • This presentation introduces the SBASE-CO method for Scan-to-BIM automation, focusing on handling occlusions and completing occluded/opening points using a space-voxel-guided boundary adaptation to semantics ensemble.
  • 05:31:45 Ing. Sam De Geyter, Dr. ing. Maarten Bassier, Dr. ing. Heinder De Winter, Prof. dr. ir. Maarten Vergauwen and Roberto Battisti, Oscar Roman: BIM-Module for deep learning-based parametric IFC reconstruction
    • This presentation details a BIM-Module for deep learning-based parametric IFC reconstruction, covering semantic segmentation, column detection using YOLOv8, filtering and clustering, level reconstruction, wall reconstruction, column reconstruction, and door detection/reconstruction.
  • 05:40:22 Dr. Jason Rambach: Scan-to-BIM: Digital Twin Generation Pipeline
    • This presentation outlines a digital twin generation pipeline for Scan-to-BIM, including semantic segmentation, methods for closing doors and inverse reconstruction, and future work directions.
  • 05:51:43 Caitlin Mueller: Designing with data for a sustainable built environment
    • Keynote speech exploring the use of data-driven design for a sustainable built environment, covering topics like high-performance design, design space exploration, generative AI, digital fabrication, robotic assembly, circular design with waste materials, and algorithmic circular design with reinforcement learning.
  • 06:23:57 Multiple Panelists: Panel Discussion: Computer Vision in the Built Environment
    • This panel discussion explores the transformative potential of computer vision and 3D modeling within the built environment, addressing challenges in data quality, model representation, tech transfer, and industry adoption, while highlighting interdisciplinary collaboration and ethical considerations.
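
As a rough illustration of the window-to-wall ratio metric from the Zoe De Simone talk listed above, the sketch below computes WWR from a per-pixel label mask such as a semantic segmentation model might output. The class IDs (`WALL`, `WINDOW`), the function name, and the convention that the ratio is window area over total facade area (wall plus window) are assumptions for illustration, not details taken from the talk. The sketch also skips the perspective correction that the talk names as a challenge.

```python
import numpy as np

WALL, WINDOW = 1, 2  # hypothetical class IDs; 0 is treated as background


def window_to_wall_ratio(mask: np.ndarray) -> float:
    """Return window pixels divided by facade pixels (wall + window)."""
    window = np.count_nonzero(mask == WINDOW)
    wall = np.count_nonzero(mask == WALL)
    facade = window + wall
    if facade == 0:
        raise ValueError("mask contains no wall or window pixels")
    return window / facade


# Toy 4x5 label mask: 12 wall pixels, 4 window pixels, 4 background pixels
mask = np.array([
    [0, 1, 1, 1, 1],
    [0, 1, 2, 2, 1],
    [0, 1, 2, 2, 1],
    [0, 1, 1, 1, 1],
])
wwr = window_to_wall_ratio(mask)
print(f"WWR = {wwr:.2f}")
```

Pixel counts only approximate true area when the facade is imaged roughly head-on, which is why a rectification step would normally come first.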

Key Takeaways

  • Integrating computer vision with AEC (Architecture, Engineering, Construction) is crucial for addressing real-world challenges in the built environment, especially with the growth of 3D point cloud data.
  • Rigorous quality assurance and control methods, including sensor modeling, system calibration, and network design, are essential for ensuring the reliability and accuracy of 3D reality capture data.
  • Object positional precision, derived from variance-covariance propagation, is a more relevant metric for evaluating 3D reality capture designs than simple radiated point precision.
  • Foundation models, particularly large visual language models (VLM), enable open-vocabulary 3D scene understanding, allowing detection and searching of arbitrary objects in 3D scenes using natural language, even for objects not explicitly trained on.
  • Foundation models can be effectively used for 3D scene understanding, including object detection and functional analysis, without extensive retraining.
  • Transitioning to a circular economy in the built environment requires digital transformation to address material waste, embodied energy, and ethical concerns in construction.
  • Advanced digital technologies like 3D scanning, computer vision, and robotics are crucial for inventorying, tracking, and processing reclaimed building materials for reuse.
  • Material passports and robust legal frameworks are essential to overcome challenges related to material fatigue, liability, and certification for reused building components.
  • Zero-shot object detection frameworks like ZSCODet can leverage knowledge graphs and multi-model graph fusion to identify previously unobserved objects in complex construction environments, improving safety and project quality.
  • Real-time ergonomic risk assessment using spatio-temporal graph convolutional networks (ST-GCN) offers a robust solution for monitoring worker posture and preventing musculoskeletal disorders in construction.
  • Semantic segmentation models, for example SegFormer or networks built on a ResNet-50 backbone, can detect windows and walls to estimate window-to-wall ratios and other building features, providing crucial data for energy modeling and architectural analysis.
  • Novel frameworks like SegMRT integrate visual SLAM and semantic segmentation with thermal cameras to create detailed spatial maps of Mean Radiant Temperature (MRT), enhancing thermal comfort assessment in buildings.
  • The presented BIM-Module utilizes a multi-stage deep learning pipeline for comprehensive parametric IFC reconstruction from point cloud data.
  • Specific challenges like column and door detection are addressed with specialized models (YOLOv8, Grounding DINO) and conditional filtering to improve accuracy.
  • The framework includes steps for semantic segmentation, level reconstruction, and geometric/topological reconstruction of walls, columns, and doors.
  • The Scan2BIM Challenge results highlight the performance metrics used to evaluate the accuracy of 2D reconstruction, including IoU, F1 score, warping error, and Betti error.
  • The Scan2BIM Challenge highlights advancements in automating BIM model generation from point clouds using deep learning, with improvements in 3D reconstruction metrics and specialized techniques for handling occlusions and incomplete data.
  • Data-driven design, leveraging generative AI and computational tools, offers significant potential for creating high-performance, diverse, and sustainable architectural solutions by systematically exploring design spaces and integrating performance analysis.
  • Digital fabrication and robotic assembly are crucial for materializing geometrically complex, low-carbon structures and enabling circular design practices with irregular or upcycled waste materials, transforming construction economics.
  • Integrating performance analysis early in the design process, through real-time feedback and systematic design space exploration, is essential for addressing the climate crisis in the built environment and fostering human creativity in design.
  • Effective tech transfer from research to industry in the AEC sector requires addressing practical constraints like data quality, scalability, and user-friendliness, often necessitating collaboration and shared data resources.
  • The choice of 3D model representation is application-dependent, with a growing need for differentiable conversions between various representations to support diverse tasks like simulation, design, and rendering.
  • Robustness against data imperfections (blurriness, noise, incomplete views) is a critical challenge, with solutions potentially involving advanced deblurring, dynamic Gaussian splatting, and multi-modal data fusion.
  • Future trends include leveraging AI for design optimization, digital twins, automated construction, and personalized environments, while also addressing ethical implications, data privacy, and the need for explainable AI.
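
The variance-covariance propagation mentioned in the takeaways above can be illustrated with a minimal first-order sketch for a single scanned point. It assumes a scanner that observes range `rho`, horizontal angle `theta`, and elevation angle `alpha` with uncorrelated noise, and propagates that noise to Cartesian coordinates through the Jacobian. The angle convention, the function name, and the noise values are assumptions for illustration. This is not the object-level precision model presented in the keynote, which goes beyond per-point precision.

```python
import numpy as np


def point_covariance(rho, theta, alpha, sigma_rho, sigma_theta, sigma_alpha):
    """First-order covariance of (x, y, z) from spherical observations.

    Assumed convention: x = rho*cos(alpha)*cos(theta),
                        y = rho*cos(alpha)*sin(theta),
                        z = rho*sin(alpha).
    """
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    # Jacobian of (x, y, z) with respect to (rho, theta, alpha)
    jac = np.array([
        [ca * ct, -rho * ca * st, -rho * sa * ct],
        [ca * st,  rho * ca * ct, -rho * sa * st],
        [sa,       0.0,            rho * ca],
    ])
    # Uncorrelated observation noise (an assumption of this sketch)
    cov_obs = np.diag([sigma_rho**2, sigma_theta**2, sigma_alpha**2])
    return jac @ cov_obs @ jac.T


# Hypothetical values: 10 m range, 2 mm range noise, 0.1 mrad angular noise
cov = point_covariance(10.0, 0.0, 0.0, 0.002, 1e-4, 1e-4)
std_xyz = np.sqrt(np.diag(cov))
print(std_xyz)
```

With these inputs the range noise dominates along the line of sight, while the two angular terms scale with range. That range dependence is one reason viewpoint placement affects the precision that can be achieved.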

Methods / Models / Datasets Mentioned

  • ARCH2S
  • Apple's RoomPlan
  • AutoCAD
  • Autodesk Revit
  • BIM
  • BLK-360
  • Betti Number Error
  • BlenderBIM
  • CHOMP
  • CLIP
  • CT Scanning
  • DBSCAN
  • DINO
  • Faro Focus3D
  • GCN
  • GPR
  • Gaussian Splatting
  • Grounding DINO
  • HiSup
  • Hungarian Algorithm
  • Hungarian matching
  • ICP
  • IFC
  • IoU
  • LCNN
  • LLMs
  • LiDAR
  • MFS Module
  • MLP
  • Mask R-CNN
  • Mask Transformer
  • Mask3D
  • Matterport
  • MinkUNet
  • NFC
  • NeRF
  • OpenMask3D
  • PTV1
  • PTV2
  • PTV3
  • Photogrammetry
  • Point Prompt Training
  • Point Transformer v3
  • PointNet
  • QR codes
  • RANSAC
  • RFID
  • RRT*
  • Reflectometry
  • ResNet-50
  • Robotic additive joining
  • Robotic plasma cutting
  • RoomFormer
  • S3DIS
  • SAM
  • SBASE-CO
  • SDF
  • SLAM
  • SRBIM
  • ST-GCN
  • STOMP
  • ScanNet
  • SceneFun3D
  • SegFormer
  • SegMRT
  • SfM/MVS Photogrammetry
  • SigLIP
  • SpUNet
  • Structured3D
  • Thermal Imaging
  • USDC
  • VAE
  • VLM
  • X-ray
  • YOLOv8
  • ZSCODet

Topics

3D Instance Segmentation · 3D Modeling · 3D Reality Capture · 3D Scene Understanding · AI in Design & Construction · BIM Modeling · BIM-Module · Building Material Reuse · Built Environment · CLIP · Circular Design · Circular Economy · Column Detection · Computational Design · Computer Vision · Computer Vision in AEC · Computer Vision in Built Environment · Construction Safety · Data Quality · Deep Learning · Deep Learning in Architecture · Digital Fabrication · Digital Transformation · Digital Twin Generation · Door Detection · Ergonomic Risk Assessment · Ethical AI · Floorplan Generation · Foundation Models · Generative AI · IFC Reconstruction · Industry Adoption · Large Visual Language Models (VLM) · LiDAR Scanning · Model Representation · Non-Destructive Testing · Object Positional Precision · Open-Vocabulary Learning · Point Clouds · Quality Assurance (QA) · Quality Control (QC) · Reinforcement Learning · Robotic Assembly · Robotics · Scan-to-BIM · Scan2BIM Challenge · Semantic Segmentation · Sustainable Construction · Tech Transfer · Terrestrial Laser Scanning (TLS) · Thermal Comfort · Viewpoint Planning · Wall Reconstruction · Zero-Shot Object Detection

