DeepAdaIn-Net: Deep Adaptive Device-Edge Collaborative Inference for Augmented Reality
Li Wang, Xin Wu, Yi Zhang, Xinyun Zhang, Lianming Xu, Zhihua Wu, Aiguo Fei
DOI: https://doi.org/10.1109/jstsp.2023.3312914
IF: 7.695
2023-01-01
IEEE Journal of Selected Topics in Signal Processing
Abstract: Object inference for augmented reality (AR) requires precise object localization within the user's physical environment and adaptability to dynamic communication conditions. Deep learning (DL) is advantageous in capturing highly nonlinear features of diverse data sources drawn from complex objects. However, existing DL techniques may stutter or become unstable when deployed on resource-constrained devices under poor communication conditions, resulting in a poor user experience. This paper addresses these issues by proposing a deep adaptive inference network called DeepAdaIn-Net for real-time device-edge collaborative object inference, aiming to reduce the feature transmission volume while ensuring high feature-fitting accuracy during inference. Specifically, DeepAdaIn-Net encompasses a partition point selection (PPS) module, a high feature compression learning (HFCL) module, a bandwidth-aware feature configuration (BaFC) module, and a feature consistency compensation (FCC) module. The PPS module minimizes the total execution latency, which comprises inference and transmission latency. The HFCL and BaFC modules decouple the training and inference processes by integrating a high-compression-ratio feature encoder with bandwidth-aware feature configuration, which ensures that the compressed data can adapt to varying communication bandwidths. The FCC module fills the information gaps among the compressed features, guaranteeing high feature expression ability. We conduct extensive experiments to validate DeepAdaIn-Net on two object inference datasets, COCO2017 and an emergency fire dataset. The results demonstrate that our approach outperforms several conventional methods, achieving an optimal 123x feature compression for $640\times 640$ images, which yields a total latency of only 63.3 ms and an accuracy loss of less than 3% when operating at a bandwidth of 16 Mbps.
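The abstract states that the PPS module minimizes total execution latency, made up of inference and transmission latency. The following is a minimal sketch of that idea only, not the authors' algorithm: it assumes per-layer latencies on the device and the edge are known, and it exhaustively tries every split point. The function name `select_partition_point`, its parameters, and all numbers in the usage example are hypothetical.

```python
from typing import Sequence, Tuple


def select_partition_point(
    device_ms: Sequence[float],
    edge_ms: Sequence[float],
    feature_kbits: Sequence[float],
    bandwidth_mbps: float,
) -> Tuple[int, float]:
    """Pick the split index k that minimizes total latency in milliseconds.

    Layers [0, k) run on the device and layers [k, n) run on the edge.
    feature_kbits[k] is the size of the data sent when splitting at k,
    so it has n + 1 entries (index 0 is the raw input, index n the output).
    All inputs are assumed to be measured offline (hypothetical setup).
    """
    n = len(device_ms)
    if len(edge_ms) != n or len(feature_kbits) != n + 1:
        raise ValueError("expected n layer latencies and n + 1 feature sizes")
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")

    best_k, best_total = 0, float("inf")
    for k in range(n + 1):
        # Inference latency: device part plus edge part.
        inference = sum(device_ms[:k]) + sum(edge_ms[k:])
        # Transmission latency: kbit / (Mbit/s) gives milliseconds.
        transmission = feature_kbits[k] / bandwidth_mbps
        total = inference + transmission
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


# Hypothetical three-layer model; numbers are illustrative only.
device_ms = [10.0, 20.0, 30.0]
edge_ms = [1.0, 2.0, 3.0]
feature_kbits = [1600.0, 800.0, 160.0, 16.0]

slow_link = select_partition_point(device_ms, edge_ms, feature_kbits, 16.0)
fast_link = select_partition_point(device_ms, edge_ms, feature_kbits, 160.0)
print(slow_link, fast_link)
```

In this toy setup the best split moves toward the device input as bandwidth grows, because transmission stops dominating the total. That is the kind of bandwidth dependence the abstract's adaptive design targets; feature compression would shrink the `feature_kbits` term further.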
engineering, electrical & electronic