Revisiting Large-Kernel CNN Design Via Structural Re-Parameterization for Sensor-Based Human Activity Recognition

Minghui Yao, Lei Zhang, Dongzhou Cheng, Lutong Qin, Xin Liu, Zenan Fu, Hao Wu, Aiguo Song
DOI: https://doi.org/10.1109/jsen.2024.3371462
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: In recent years, human activity recognition (HAR) using smart wearable sensors has become a major research focus in ubiquitous computing. Deep convolutional neural networks (CNNs) have achieved significant success in HAR owing to their ability to automatically extract features that capture local activity details. Because small kernels have performed better, most previous works apply small kernels rather than large kernels to time-series sensor data for activity recognition. However, they do not attempt to answer the key questions: why do large kernels underperform small kernels, and how can the performance gap be closed? Intuitively, with their larger receptive field (RF), larger kernels should have great potential to model long-range dependencies in time-series sensor data. So far, little effort has been devoted to larger-kernel design. In this article, we revisit the design of larger-kernel convolutions, which have long been neglected in the context of HAR. We find that both identity shortcuts and structural re-parameterization can fully unleash the potential of larger-kernel convolutions. Extensive experiments and ablation studies on four mainstream benchmark datasets (PAMAP2, USC-HAD, UniMiB-SHAR, and OPPORTUNITY) show that our larger-kernel convolutions can further push the limit of small-kernel CNN performance under similar inference time, and can be used as a drop-in replacement for small-kernel conv layers. For example, compared to the small-kernel baselines, our proposed approach consistently boosts recognition accuracy by 0.55%, 1.00%, 3.94%, and 1.64% on PAMAP2, USC-HAD, UniMiB-SHAR, and OPPORTUNITY, respectively, which is very competitive with state-of-the-art (SOTA) methods. We believe that the resulting high performance is mainly due to the larger effective RFs built via large kernels. The practical inference time is evaluated on a real hardware device.
Our code is available at: https://github.com/MinghuiYao/ELK-HAR/ .
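
The general idea behind structural re-parameterization can be illustrated with a minimal pure-Python sketch. It is not taken from the authors' repository: the kernel sizes (7 and 3), the weights, the input signal, and the helper names `conv1d_same` and `merge_branches` are all hypothetical, and batch normalization and biases are left out for brevity. It assumes a single-channel 1-D signal and odd kernel lengths, and shows that a large-kernel branch, a parallel small-kernel branch, and an identity shortcut can be folded into one large kernel that produces the same output.

```python
def conv1d_same(x, kernel):
    """Cross-correlate a 1-D signal with an odd-length kernel, zero-padded to keep the length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def merge_branches(large, small, identity=True):
    """Fold a parallel small-kernel branch (and an optional identity shortcut) into the large kernel."""
    offset = (len(large) - len(small)) // 2
    merged = list(large)
    for j, w in enumerate(small):
        merged[offset + j] += w  # the small kernel is zero-padded around the centre
    if identity:
        merged[len(large) // 2] += 1.0  # identity equals a kernel with a single 1 at the centre
    return merged


# Hypothetical weights and signal, chosen only for illustration.
large = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2]  # kernel size 7
small = [0.4, -0.3, 0.2]                        # kernel size 3
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]

# Training-time form: three parallel branches whose outputs are summed.
multi_branch = [a + b + c for a, b, c in zip(conv1d_same(x, large), conv1d_same(x, small), x)]

# Inference-time form: a single large-kernel convolution with merged weights.
single_branch = conv1d_same(x, merge_branches(large, small))

assert all(abs(a - b) < 1e-9 for a, b in zip(multi_branch, single_branch))
```

Because convolution is linear, the merge is exact up to floating-point rounding, so the extra branches add no cost at inference time in this simplified setting.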