Distillation-boosted heterogeneous architecture search for aphid counting

Shengqin Jiang, Qian Jie, Fengna Cheng, Yu Liu, Kelu Yao, Chao Li
DOI: https://doi.org/10.1016/j.eswa.2024.125936
IF: 8.5
2024-12-07
Expert Systems with Applications
Abstract: Aphids are widely recognized as one of the most detrimental agricultural pests, harming a diverse range of crops. Accurate aphid counting is thus crucial for devising effective, scientifically grounded control strategies. Recent work has shifted towards computer vision solutions, which achieve promising performance. Nevertheless, these methods often entail high computational complexity, which impedes real-time inference on edge devices. To address this challenge, we study a distillation-boosted heterogeneous architecture search for aphid counting, which uses gradient descent to explore optimal neural network architectures. Within this framework, we establish a heterogeneous channel-level search space comprising two types of cells: lightweight fiber cells and multi-scale fiber cells. The former treats low-complexity convolution operations as the basic operators of its search space. The latter primarily captures scale variations through convolution kernels of varying sizes. The two search spaces are explored simultaneously, and an efficient cell is chosen based on the learned architecture parameters to form the final network for training. Because the number of samples is relatively limited, the network is prone to overfitting, which limits its ultimate performance. To overcome this issue, we further use a teacher network with prior knowledge to explore the impact of knowledge distillation on architecture search and training. We empirically find that applying the distillation strategy after architecture search yields greater benefits for knowledge transfer. Finally, experimental results demonstrate the effectiveness of our network, which achieves a mean absolute error of 25.43 and a root mean squared error of 43.99 with a parameter size of 570.38K, surpassing other state-of-the-art methods in terms of both model size and performance. Notably, the network achieves real-time inference speeds on Jetson Nano TX2 without any accelerated inference optimization.
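The abstract describes a gradient-based search in which candidate operations are weighted by learned architecture parameters and the best-scoring choice is kept. The sketch below illustrates that general idea in the style of differentiable architecture search, together with the two counting metrics reported above. It is not the authors' implementation: the candidate operations are toy scalar functions standing in for convolutions, and the names `CANDIDATE_OPS`, `mixed_output`, `select_op`, and `mae_rmse` are hypothetical.

```python
import math

# Hypothetical stand-ins for candidate operators; a real search space
# would hold convolution layers rather than scalar functions.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def softmax(alphas):
    """Turn raw architecture parameters into weights that sum to 1."""
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_output(x, ops, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops.values()))


def select_op(ops, alphas):
    """Discretization: keep the candidate with the largest parameter."""
    names = list(ops)
    best = max(range(len(names)), key=lambda i: alphas[i])
    return names[best]


def mae_rmse(predicted, actual):
    """Mean absolute error and root mean squared error over per-image counts."""
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Assumed example values, as if produced by a finished search.
learned_alphas = [0.1, 1.5, -0.3]
chosen = select_op(CANDIDATE_OPS, learned_alphas)
```

During search, `mixed_output` would sit inside the network so that the parameters receive gradients; afterwards only the operation returned by `select_op` is retained, which is what keeps the final model small.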
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science