CODE +: Fast and Accurate Inference for Compact Distributed IoT Data Collection
Huali Lu, Feng Lyu, Ju Ren, Huaqing Wu, Conghao Zhou, Zhongyuan Liu, Yaoxue Zhang, Xuemin Shen
DOI: https://doi.org/10.1109/tpds.2024.3453607
IF: 5.3
2024-09-20
IEEE Transactions on Parallel and Distributed Systems
Abstract: In distributed IoT data systems, full-size data collection is impractical due to energy constraints and large system scales. Our previous work investigated the advantages of integrating matrix sampling and inference for compact distributed IoT data collection, minimizing the data collection cost while guaranteeing the data benefits. This paper advances that approach by enabling fast and accurate inference for distributed IoT data systems that are sensitive to computation time, training stability, and inference accuracy. In particular, we propose CODE +, i.e., Compact Distributed IOT Data CollEction Plus, which features a cluster-based sampling module and a Convolutional Neural Network (CNN)-Transformer Autoencoders-based inference module, to reduce cost and guarantee the data benefits. The sampling component employs a cluster-based matrix sampling approach, in which data clustering is conducted first and a two-step sampling is then performed in accordance with the number of clusters and the clustering errors. The inference component uses a CNN-Transformer Autoencoders-based matrix inference model to estimate the full-size spatio-temporal data matrix; the model consists of a CNN-Transformer encoder that extracts the underlying features from the sampled data matrix and a lightweight decoder that maps the learned latent features back to the original full-size data matrix. We implement CODE + on three operational large-scale IoT systems and one synthetic Gaussian distribution dataset, and extensive experiments demonstrate its efficiency and robustness. With a 20% sampling ratio, CODE + achieves an average data reconstruction accuracy of 94% across the four datasets, outperforming our previous version (87%) and the state-of-the-art baseline (71%).
computer science, theory & methods,engineering, electrical & electronic
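
Illustrative sketch (not from the paper): the abstract describes the sampling component only at a high level, as clustering followed by a two-step sampling guided by the number of clusters and the clustering errors. The Python below is one possible reading of that idea, not the authors' algorithm. Everything specific in it is an assumption made for illustration: the function names `kmeans` and `cluster_sample`, the use of plain k-means, the choice of one representative per cluster in step one, and the error-weighted random draw in step two.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every row to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels, centroids


def cluster_sample(X, k, ratio, seed=0):
    """Select about ratio * n rows (e.g. sensors) of X in two steps.

    Assumed scheme, for illustration only:
      step 1: one representative per non-empty cluster (closest to centroid);
      step 2: spend the remaining budget at random, weighting each row by its
              squared distance to its centroid, so clusters with a larger
              clustering error tend to receive more samples.
    """
    n = len(X)
    budget = max(k, int(round(ratio * n)))
    labels, centroids = kmeans(X, k, seed=seed)
    dist = ((X - centroids[labels]) ** 2).sum(axis=1)

    chosen = []
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        if len(idx):
            chosen.append(idx[dist[idx].argmin()])

    rest = np.setdiff1d(np.arange(n), chosen)
    extra = min(budget - len(chosen), len(rest))
    if extra > 0:
        rng = np.random.default_rng(seed)
        w = dist[rest] + 1e-12  # avoid zero-probability entries
        picked = rng.choice(rest, size=extra, replace=False, p=w / w.sum())
        chosen.extend(picked)
    return np.sort(np.array(chosen, dtype=int))


if __name__ == "__main__":
    # Synthetic stand-in for a sensors-by-time data matrix.
    X = np.random.default_rng(1).normal(size=(50, 24))
    rows = cluster_sample(X, k=4, ratio=0.2)
    mask = np.zeros(X.shape, dtype=bool)
    mask[rows] = True  # sampled rows; the rest would be inferred
    print(len(rows), "of", len(X), "rows sampled")
```

The resulting boolean mask marks which rows of the data matrix are collected; in the setting the abstract describes, an inference model would then estimate the unsampled entries from the sampled ones.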