A Domain Decomposition-Based CNN-DNN Architecture for Model Parallel Training Applied to Image Recognition Problems

Axel Klawonn, Martin Lanser, Janine Weber
2024-07-02
Abstract: Deep neural networks (DNNs) and, in particular, convolutional neural networks (CNNs) have brought significant advances in a wide range of modern computer application problems. However, the increasing availability of large amounts of datasets as well as the increasing available computational power of modern computers lead to a steady growth in the complexity and size of DNN and CNN models, respectively, and thus, to longer training times. Hence, various methods and attempts have been developed to accelerate and parallelize the training of complex network architectures. In this work, a novel CNN-DNN architecture is proposed that naturally supports a model parallel training strategy and that is loosely inspired by two-level domain decomposition methods (DDM). First, local CNN models, that is, subnetworks, are defined that operate on overlapping or nonoverlapping parts of the input data, for example, sub-images. The subnetworks can be trained completely in parallel and independently of each other. Each subnetwork then outputs a local decision for the given machine learning problem which is exclusively based on the respective local input data. Subsequently, in a second step, an additional DNN model is trained which evaluates the local decisions of the local subnetworks and generates a final, global decision. In this paper, we apply the proposed architecture to image classification problems using CNNs. Experimental results for different 2D image classification problems are provided as well as a face recognition problem, and a classification problem for 3D computer tomography (CT) scans. Therefore, classical ResNet and VGG architectures are considered. The results show that the proposed approach can significantly accelerate the required training time compared to the global model and, additionally, can also help to improve the accuracy of the underlying classification problem.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the long training times and high complexity of deep neural networks (DNNs) and convolutional neural networks (CNNs). As datasets and available computing power grow, network models become larger and take longer to train. To counter this, the paper introduces a CNN-DNN architecture, loosely inspired by two-level domain decomposition methods, that supports model parallel training. The architecture first decomposes the global CNN into multiple local CNN subnetworks, which are trained independently and in parallel on overlapping or nonoverlapping parts of the input data, for example sub-images. Each subnetwork produces a local decision based solely on its local input data. An additional DNN is then trained to evaluate the local decisions of the subnetworks and generate the final, global decision. This DNN can be seen as a coarse problem, so the overall method resembles a two-level domain decomposition approach. Experiments show that the method can significantly reduce training time compared to the global model and can also improve classification accuracy. The applications considered are 2D image classification, face recognition, and classification of 3D computer tomography (CT) scans. Although the paper mainly focuses on the classical ResNet and VGG architectures, it suggests considering more modern network structures such as MobileNetV2 in the future. In summary, the paper proposes a model parallel training strategy that speeds up CNN training on GPU clusters while maintaining or improving classification accuracy.
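The data flow described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each local decision is a probability vector over the classes and that these vectors are concatenated to form the input of the global DNN. The function names (`decompose`, `local_decision`, `coarse_input`, `global_decision`) are hypothetical, `local_decision` is only a stand-in for a trained local CNN, and the global DNN is reduced to a single dense layer with softmax.

```python
import math


def decompose(image, n_rows, n_cols, overlap=0):
    """Split a 2D image (list of rows) into n_rows x n_cols sub-images.

    Each sub-image is enlarged by `overlap` pixels toward its neighbors.
    Assumes the image height and width are divisible by n_rows and n_cols.
    """
    height, width = len(image), len(image[0])
    sub_h, sub_w = height // n_rows, width // n_cols
    subimages = []
    for i in range(n_rows):
        for j in range(n_cols):
            top = max(i * sub_h - overlap, 0)
            bottom = min((i + 1) * sub_h + overlap, height)
            left = max(j * sub_w - overlap, 0)
            right = min((j + 1) * sub_w + overlap, width)
            subimages.append([row[left:right] for row in image[top:bottom]])
    return subimages


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_decision(subimage, n_classes):
    """Hypothetical stand-in for a trained local CNN.

    Returns a probability vector over n_classes computed from the mean
    pixel value; a real subnetwork would be a ResNet- or VGG-style model.
    """
    n_pixels = len(subimage) * len(subimage[0])
    mean = sum(sum(row) for row in subimage) / n_pixels
    return softmax([mean * k for k in range(n_classes)])


def coarse_input(image, n_rows, n_cols, n_classes, overlap=0):
    """Concatenate all local probability vectors (assumed DNN input)."""
    vector = []
    for sub in decompose(image, n_rows, n_cols, overlap):
        # Each call is independent, so the subnetworks can run in parallel.
        vector.extend(local_decision(sub, n_classes))
    return vector


def global_decision(vector, weights, biases):
    """Single dense layer plus softmax, standing in for the global DNN."""
    scores = [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)
```

With a decomposition into N sub-images and K classes, the vector passed to the global model has N * K entries in this sketch. Because no sub-image depends on the output of another, the local models can be trained and evaluated without communication, which is what makes the strategy model parallel.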