Scalable Cloud-Native Pipeline for Efficient 3D Model Reconstruction from Monocular Smartphone Images

Potito Aghilar, Vito Walter Anelli, Michelantonio Trizio, Tommaso Di Noia
2024-09-28
Abstract: In recent years, 3D models have gained popularity in various fields, including entertainment, manufacturing, and simulation. However, manually creating these models can be a time-consuming and resource-intensive process, making it impractical for large-scale industrial applications. To address this issue, researchers are exploiting Artificial Intelligence and Machine Learning algorithms to automatically generate 3D models effortlessly. In this paper, we present a novel cloud-native pipeline that can automatically reconstruct 3D models from monocular 2D images captured using a smartphone camera. Our goal is to provide an efficient and easily adoptable solution that meets the Industry 4.0 standards for creating a Digital Twin model, which could enhance personnel expertise through accelerated training. We leverage machine learning models developed by NVIDIA Research Labs alongside a custom-designed pose recorder with a unique pose compensation component based on the ARCore framework by Google. Our solution produces a reusable 3D model, with embedded materials and textures, exportable and customizable in any external 3D modelling software or 3D engine. Furthermore, the whole workflow is implemented by adopting the microservices architecture standard, enabling each component of the pipeline to operate as a standalone replaceable module.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **How to reconstruct 3D models efficiently and automatically from monocular 2D images taken with a smartphone, at the scale required by Industry 4.0 applications**. Specifically, the authors propose an extensible cloud-native pipeline that aims to reduce the time and resources required for 3D model reconstruction, provide a cost-effective solution, and improve the data acquisition process through augmented reality (AR) technology.

### Problem Background

Traditionally, creating 3D models is a time-consuming and resource-intensive process. Manual modeling is effective, but the time and human effort it demands make it unsuitable for large-scale industrial applications. Existing hardware-based technologies such as Light Detection and Ranging (LiDAR) can generate high-quality 3D models, but the devices are expensive and complex to operate. To overcome these challenges, researchers have begun to use artificial intelligence (AI) and machine learning (ML) algorithms to automate the generation of 3D models.

### Main Contributions of the Paper

1. **Defined an extensible cloud-native pipeline**: The pipeline automatically generates 3D models from monocular 2D images and follows the microservices architecture standard.
2. **Designed and implemented a custom pose recorder component based on ARCore**: It captures images of an object together with the camera pose for each image.

### Key Technologies of the Solution

- **Instant NeRF**: Uses neural networks and a multi-resolution hash-encoding grid to reconstruct 3D models from 2D images.
- **nvdiffrec**: Reconstructs 3D model surfaces with textures and materials from 2D images through differentiable rendering and the Deep Marching Tetrahedra (DMTet) technique.

### Process Overview

1. **Dataset Generation Stage**: Obtain images and camera poses through the ARCore framework.
2. **Data Pre-processing Stage**: Pre-process the images and poses and generate the corresponding alpha masks.
3. **Reconstruction Stage**: Use the nvdiffrec tool to generate 3D models and provide feedback on the reconstruction progress.
4. **Architecture Design**: Adopt the microservices architecture standard and deploy in a Kubernetes cluster to support the efficient execution of resource-intensive tasks.

Through these methods, the paper provides an efficient and automated 3D model reconstruction solution suitable for large-scale industrial applications, while improving the flexibility and extensibility of data acquisition and processing.
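
To make the dataset generation stage more concrete, the following is a minimal sketch of how recorded camera poses could be turned into a NeRF-style transforms file for the reconstruction stage. It is not the authors' pose recorder and does not implement their pose compensation component. The record layout (`image`, `translation`, `quaternion`), the function names, and the sample values are all hypothetical. The sketch assumes each pose arrives as a translation plus a quaternion in (x, y, z, w) order, and it ignores any axis-convention changes a real pipeline might need.

```python
import json
import math


def quaternion_to_rotation(qx, qy, qz, qw):
    """Convert a quaternion in (x, y, z, w) order to a 3x3 rotation matrix."""
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    # Normalise so that slightly noisy sensor values still give a rotation.
    qx, qy, qz, qw = qx / norm, qy / norm, qz / norm, qw / norm
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def pose_to_matrix(translation, quaternion):
    """Build a 4x4 camera-to-world matrix from a translation and a quaternion."""
    rot = quaternion_to_rotation(*quaternion)
    tx, ty, tz = translation
    return [
        rot[0] + [tx],
        rot[1] + [ty],
        rot[2] + [tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def build_transforms(records, camera_angle_x):
    """Assemble a NeRF-style transforms dictionary from recorded poses."""
    frames = [
        {
            "file_path": rec["image"],
            "transform_matrix": pose_to_matrix(rec["translation"], rec["quaternion"]),
        }
        for rec in records
    ]
    return {"camera_angle_x": camera_angle_x, "frames": frames}


# Hypothetical sample: two frames, the second rotated 90 degrees about the Y axis.
half = math.pi / 4
records = [
    {"image": "images/0000.png", "translation": (0.0, 0.0, 1.0),
     "quaternion": (0.0, 0.0, 0.0, 1.0)},
    {"image": "images/0001.png", "translation": (1.0, 0.0, 0.0),
     "quaternion": (0.0, math.sin(half), 0.0, math.cos(half))},
]
transforms = build_transforms(records, camera_angle_x=math.radians(60))
print(json.dumps(transforms, indent=2))
```

Keeping the pose conversion in small pure functions fits the microservices design described above: a pre-processing service could call `build_transforms` on whatever the recorder uploads, without depending on the capture app itself.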