Procedural Modeling and Physically Based Rendering for Synthetic Data Generation in Automotive Applications

Apostolia Tsirikoglou, Joel Kronander, Magnus Wrenninge, Jonas Unger
DOI: https://doi.org/10.48550/arXiv.1710.06270
2017-10-18
Abstract: We present an overview and evaluation of a new, systematic approach for generation of highly realistic, annotated synthetic data for training of deep neural networks in computer vision tasks. The main contribution is a procedural world modeling approach enabling high variability coupled with physically accurate image synthesis, and is a departure from the hand-modeled virtual worlds and approximate image synthesis methods used in real-time applications. The benefits of our approach include flexible, physically accurate and scalable image synthesis, implicit wide coverage of classes and features, and complete data introspection for annotations, which all contribute to quality and cost efficiency. To evaluate our approach and the efficacy of the resulting data, we use semantic segmentation for autonomous vehicles and robotic navigation as the main application, and we train multiple deep learning architectures using synthetic data with and without fine tuning on organic (i.e. real-world) data. The evaluation shows that our approach improves the neural network's performance and that even modest implementation efforts produce state-of-the-art results.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to generate high-quality synthetic data for training deep neural networks, with semantic segmentation for autonomous vehicles and robotic navigation as the main application. It proposes a systematic method for generating highly realistic, annotated synthetic data. The main contribution is a procedural world modeling approach that enables high variability, coupled with physically accurate image synthesis; this departs from the hand-modeled virtual worlds and approximate image synthesis methods used in real-time applications. The paper points out that existing synthetic datasets are usually generated with game engines, whose rendering relies heavily on approximations, which limits the realism and diversity of the resulting images. Manually annotated real-world datasets are of high quality, but annotation is time-consuming and costly, which makes them difficult to produce at scale. The research therefore combines procedural modeling with physically based rendering to generate more diverse and realistic synthetic data and thereby improve the performance of deep learning models on computer vision tasks. To evaluate the method and the data it generates, the authors train multiple deep learning architectures on synthetic data, with and without fine-tuning on organic (i.e. real-world) data. The results show that the approach improves network performance and that even a modest implementation effort produces state-of-the-art results.