Factors of Performance for Application of AI Models in GPU Cloud

Vadim Tulchinsky, Serhii Lavreniuk, Viacheslav Roganov, Petro Tulchinsky, Valerii Khalimendik
DOI: https://doi.org/10.34229/2707-451x.20.1.8
2020-03-31
Cybernetics and Computer Technologies
Abstract: Introduction. Work on machine learning (ML) and artificial intelligence (AI) usually emphasizes the quality of classification or the accuracy of parameter estimation. When performance is considered, the focus is mainly on the model's training phase. However, as AI applications spread to real-world problems, ensuring high data-processing performance with ready-made models becomes more important. This problem differs fundamentally from model training: training involves intensive calculations, whereas applying a ready model involves simple calculations over large flows of data (files) arriving from the network or file system for processing. In other words, it is a typical parallel processing task with intensive input-output. Moreover, from the application's point of view, the AI module that performs classification, evaluation, or other data processing is a "black box": the cost of developing and training the model, as well as the risk of failure, is too high for such tasks to be handled in a non-professional manner. Therefore, performance optimization primarily involves selecting and balancing system parameters. Cloud systems, with their flexibility, manageability, and easy scaling, are ideal platforms for such tasks. The paper examines in more detail the factors that affect performance, using a single but notable pattern recognition example: a subset of the ImageNet image collection [1] classified by the 50-layer deep neural network ResNet-50 [2]. The purpose of the paper is to experimentally investigate the factors that influence the performance of a ready-to-use neural network model applied in GPU cloud systems of various architectures. Results. Overheads related to microservice and distributed architectures, memory, network, batch size, and synchronous and asynchronous interactions are estimated. The complex nonlinear nature of the influence of system parameters in various combinations is demonstrated.