Partitioning Convolutional Neural Networks for Inference on Constrained Internet-of-Things Devices

Fabiola Martins Campos de Oliveira, Edson Borin
DOI: https://doi.org/10.1109/cahpc.2018.8645927
2018-09-01
Abstract: With the prospect of a world in which the IoT is pervasive in the near future, the large amount of data produced by its devices will have to be processed and interpreted efficiently and intelligently. One approach is fog computing, in which the network infrastructure and the devices themselves can process data. Deep learning techniques have been successfully applied to interpreting the kind of data generated by the IoT; however, even the inference execution of convolutional neural networks may be computationally costly on resource-limited devices. To enable the execution of neural network models on resource-constrained IoT systems, the code may be partitioned and distributed among multiple devices. Different partitioning approaches are possible; nonetheless, some of them increase the amount of communication that must be performed between the IoT devices. In this work, we propose KLP, a Kernighan-and-Lin-based partitioning algorithm that partitions neural network models for efficient distributed execution on multiple IoT devices. Our results show that KLP is capable of producing partitions that require up to 4.5 times less communication than partitioning approaches used by TensorFlow and other frameworks.
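The partitioning idea in the abstract can be illustrated with a toy example. The sketch below is not the KLP algorithm from the paper. It is a simplified greedy swap refinement in the spirit of Kernighan and Lin, without the locking and tentative-move sequences of the full heuristic. The graph, its edge weights, and the names `cut_cost` and `swap_refine` are hypothetical and chosen only for illustration. Vertices stand for pieces of a neural network, edge weights for the data exchanged between them, and the cost is the total weight crossing device boundaries.

```python
from itertools import product

# Hypothetical toy graph: each vertex is a piece of the network,
# each edge weight is the amount of data exchanged per inference.
edges = {
    ("a", "b"): 4,
    ("a", "c"): 1,
    ("b", "d"): 1,
    ("c", "d"): 4,
}


def cut_cost(assign, edges):
    """Sum the weights of edges whose endpoints sit on different devices."""
    return sum(w for (u, v), w in edges.items() if assign[u] != assign[v])


def swap_refine(assign, edges):
    """Greedily apply the best cost-reducing swap until none remains.

    Swapping two vertices on different devices keeps the number of
    vertices per device unchanged, a crude stand-in for a memory limit.
    """
    assign = dict(assign)
    while True:
        base = cut_cost(assign, edges)
        best_gain, best_pair = 0, None
        for u, v in product(assign, repeat=2):
            if u < v and assign[u] != assign[v]:
                trial = dict(assign)
                trial[u], trial[v] = trial[v], trial[u]
                gain = base - cut_cost(trial, edges)
                if gain > best_gain:
                    best_gain, best_pair = gain, (u, v)
        if best_pair is None:
            return assign
        u, v = best_pair
        assign[u], assign[v] = assign[v], assign[u]


# Assumed starting point: a naive assignment of vertices to devices 0 and 1.
initial = {"a": 0, "b": 1, "c": 0, "d": 1}
refined = swap_refine(initial, edges)
print(cut_cost(initial, edges), cut_cost(refined, edges))
```

In this toy case the naive assignment separates the two heavily communicating pairs, and a single swap brings each pair onto the same device, so only the two light edges still cross devices. A greedy pass like this can stall in a local minimum; the full Kernighan and Lin heuristic reduces that risk by also accepting temporarily unfavorable swaps.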