31.08.2022

Storing BIM data for Machine Learning

In collaboration with UCN (University College of Northern Denmark), we at Link IO have started an initiative to map out how to store our projects, building a database of all our work for later integration into machine learning methods. In a paper we will present at ECPPM 2022 in Trondheim, we examined four formats and how well each suits machine learning purposes in architectural practice.

New machine-learning-powered tools are emerging, made more relevant than ever by the growing popularity of Midjourney, DALL·E, and Stable Diffusion. We at Link consider it of the utmost importance not only to integrate these tools into our existing processes but to adapt them to our needs. Since data is at the core of all machine learning, our first step is to create the internal infrastructure to build up a large data source that is localized to comply with local regulations and with Link's standards.

The AEC industry sits on an enormous data bank of buildings, from early sketches describing relationships between volumes and their impact on the surrounding built environment, to later stages with detailed drawings covering everything from floor plans to the way each brick is laid in a facade. This data is currently not seen as the massive asset it is; if it is structured properly and archived so that projects can easily be broken down into reusable sub-parts, it can become a real asset for AEC firms.

We developed four methods for exporting and importing highly detailed models drawn in both Revit and ArchiCAD, and then loaded the data into a neural network designed to estimate CO₂ emissions for wall elements by reading the thickness, naming convention, type, and volume of each element in the project. The actual training of the network to produce valuable results was not the point; it served as a dummy machine learning model into which the real data could be loaded.
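To make the setup concrete, here is a minimal sketch (not the paper's actual pipeline) of wall-element records carrying the four attributes mentioned above, fed to a dummy regression model. All field names and the placeholder CO₂ values are assumptions for illustration; a simple least-squares fit stands in for the neural network.

```python
import numpy as np

# Hypothetical wall records with the four attributes named in the text:
# naming convention, thickness, type, and volume.
walls = [
    {"name": "ExtWall_Brick",    "thickness_mm": 365, "type_id": 0, "volume_m3": 12.4},
    {"name": "IntWall_Gypsum",   "thickness_mm": 100, "type_id": 1, "volume_m3": 3.1},
    {"name": "ExtWall_Concrete", "thickness_mm": 200, "type_id": 2, "volume_m3": 8.7},
]
# Placeholder CO2 targets (kg CO2e); these are NOT real emission factors.
co2 = np.array([2100.0, 95.0, 3050.0])

# Dummy "model": ordinary least squares standing in for the neural network.
X = np.array([[w["thickness_mm"], w["type_id"], w["volume_m3"], 1.0] for w in walls])
coef, *_ = np.linalg.lstsq(X, co2, rcond=None)
pred = X @ coef  # one CO2 estimate per wall element
```

The point of such a dummy model is only to verify that each storage format can deliver clean feature rows like `X` above; the predictions themselves carry no meaning.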

Our results showed that the current storage standard (IFC) works well for breaking down individual projects, but that formats used more conventionally in data science, such as JSON and Petastorm, work better when aggregating a larger number of projects. We also looked at Speckle as a cloud-based storage solution, where interoperability and the opportunity to build machine learning methods that connect directly to its web API can be useful in smaller-scale projects.
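The aggregation idea can be sketched as flattening per-project element records into a single JSON Lines stream that spans many projects, so rows from any number of projects can be filtered and combined uniformly. The project and field names below are made up for illustration.

```python
import json

# Hypothetical per-project element records (names are illustrative).
projects = {
    "project_a": [{"element": "Wall", "type": "ExtWall_Brick",  "volume_m3": 12.4}],
    "project_b": [{"element": "Wall", "type": "IntWall_Gypsum", "volume_m3": 3.1}],
}

lines = []
for project_id, elements in projects.items():
    for el in elements:
        # Tag each row with its source project so rows stay traceable
        # after many projects are merged into one stream.
        record = {"project": project_id, **el}
        lines.append(json.dumps(record))

jsonl = "\n".join(lines)  # one element per line, trivial to aggregate or filter
```

Columnar stores such as Petastorm follow the same row-per-element idea but add typed schemas and efficient slicing for training loops.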

We believe the effort to build a large Nordic database of buildings is something all AEC companies in the Nordics should work towards, starting with a discussion of how we can share and distribute the information in a clear and safe way without giving away company secrets. Federated learning practices, commonly used in medical science, can keep data private while still building on a larger combined dataset for training neural networks.
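The federated idea, reduced to a toy example: each firm trains locally and shares only model weights, never raw project data, and a coordinator averages the weights (the FedAvg scheme). Firm names, weights, and sample counts below are all illustrative.

```python
import numpy as np

# Toy single round of federated averaging (FedAvg).
# Each firm's locally trained weights; raw project data never leaves the firm.
local_weights = {
    "firm_a": np.array([0.9, 1.8]),
    "firm_b": np.array([1.1, 2.2]),
    "firm_c": np.array([1.0, 2.0]),
}
# Local dataset sizes, used to weight each firm's contribution.
n_samples = {"firm_a": 50, "firm_b": 30, "firm_c": 20}

total = sum(n_samples.values())
# Weighted average of the local models becomes the next global model.
global_weights = sum((n_samples[f] / total) * w for f, w in local_weights.items())
```

In practice a real deployment would add secure aggregation or differential privacy on top, since weight updates alone can still leak information.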