SmartLite: A DBMS-Based Serving System for DNN Inference in Resource-Constrained Environments

Qiuru Lin, Sai Wu, Junbo Zhao, Jian Dai, Meng Shi, Gang Chen, Feifei Li
DOI: https://doi.org/10.14778/3632093.3632095
2024-01-01
Abstract: Many IoT applications require the use of multiple deep neural networks (DNNs) to perform various tasks on low-cost edge devices with limited computation resources. However, existing DNN model serving platforms, such as TensorFlow Serving and TorchServe, are resource-intensive and require high-performance GPUs that are often not available on low-cost edge devices. In this paper, we propose SmartLite, a lightweight DBMS that addresses these challenges by storing the parameters and structural information of neural networks as database tables and implementing neural network operators inside the DBMS engine. SmartLite quantizes model parameters as binarized values, applies neural pruning techniques to compress the models, and transforms tensor manipulations into value lookup operations of the DBMS to reduce computation overhead. Experimental results show that SmartLite requires 98% less memory while achieving about a 134% performance speedup compared to TorchServe. Our proposed solution addresses the challenges of running multiple DNN models on low-cost edge devices and provides a significant contribution to the field of IoT applications.
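To make the idea of "weights as database tables" more concrete, the sketch below stores a binarized weight matrix in an in-memory SQLite table and evaluates one dense layer with a join and an aggregate. It is only an illustration of the general approach described in the abstract, not SmartLite's actual schema, operators, or lookup scheme: the table names (`weights`, `activations`), the helper functions, and the use of SQLite are all assumptions made for this example.

```python
import sqlite3


def binarize(values):
    """Map real values to +1 / -1 by sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def build_db(weight_rows):
    """Store a binarized weight matrix as (out_idx, in_idx, w) rows.

    Hypothetical schema for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weights (out_idx INTEGER, in_idx INTEGER, w INTEGER)"
    )
    conn.execute(
        "CREATE TABLE activations (in_idx INTEGER PRIMARY KEY, x INTEGER)"
    )
    for out_idx, row in enumerate(weight_rows):
        conn.executemany(
            "INSERT INTO weights VALUES (?, ?, ?)",
            [(out_idx, in_idx, w) for in_idx, w in enumerate(binarize(row))],
        )
    return conn


def dense_layer(conn, inputs):
    """Evaluate one binarized dense layer entirely as a SQL query."""
    conn.execute("DELETE FROM activations")
    conn.executemany(
        "INSERT INTO activations VALUES (?, ?)",
        list(enumerate(binarize(inputs))),
    )
    rows = conn.execute(
        """
        SELECT w.out_idx, SUM(w.w * a.x)
        FROM weights AS w
        JOIN activations AS a ON a.in_idx = w.in_idx
        GROUP BY w.out_idx
        ORDER BY w.out_idx
        """
    ).fetchall()
    # Sign activation keeps the outputs binarized for a following layer.
    return [1 if total >= 0 else -1 for _, total in rows]


if __name__ == "__main__":
    conn = build_db([[0.5, -1.2, 0.3], [-0.7, 0.1, -0.9]])
    print(dense_layer(conn, [0.2, 0.4, -0.6]))
```

With the two-neuron example above, the binarized weights are `[[1, -1, 1], [-1, 1, -1]]` and the binarized input is `[1, 1, -1]`, so the script prints `[-1, 1]`. Because every stored value is +1 or -1, each weight needs only one bit of information, which hints at why binarization can reduce memory so sharply; a real system would pack those bits and add pruning rather than store one row per weight.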