Lightweight robotic grasping detection network based on dual attention and inverted residual
Yuequan Yang,Wei Li,Zhiqiang Cao,Jiatong Bao,Fudong Li
DOI: https://doi.org/10.1177/01423312241247346
IF: 2.146
2024-04-24
Transactions of the Institute of Measurement and Control
Abstract: Grasping detection is one of the crucial capabilities for robot systems. Deep learning has achieved remarkable outcomes in robot grasping tasks; however, many deep neural networks achieve this at the expense of high computation cost and memory requirements, which hinders their deployment on computing-constrained devices. To solve this problem, this paper proposes an end-to-end lightweight network with dual attention and inverted residual strategies (LiDAIR), which adopts generative pixel-level prediction to achieve grasp detection. LiDAIR is composed of convolution modules (Conv), an inverted residual convolution module (IRCM), a convolutional block attention connection module (CBACM), and transposed convolution modules (TConv). The Convs are used in the downsampling phase to extract features from the input image. The IRCM then serves as a bridge between the downsampling and upsampling phases. In the upsampling phase, the CBACM is designed to focus on the valuable regions along the spatial and channel dimensions, and a skip connection is employed to attain multi-level feature fusion. Afterwards, the TConvs are used to restore the image resolution. LiDAIR is lightweight, with 704K parameters, and offers a good trade-off among lightweight structure, accuracy, and speed. Evaluated on the Cornell and Jacquard data sets, it achieved detection accuracies of 97.7% and 92.7%, respectively, with an inference time within 10 ms.
automation & control systems,instruments & instrumentation
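
The abstract describes an encoder-decoder layout: strided convolutions reduce resolution, an inverted residual module bridges the two phases, and transposed convolutions restore resolution. The following is a minimal sketch of the arithmetic behind such a layout, not the authors' LiDAIR implementation. The input size (224), kernel sizes, strides, padding, channel width (128), and expansion factor (2) are all assumed values chosen for illustration, and the inverted residual block is assumed to follow the common expand / depthwise / project pattern, which may differ from the paper's IRCM.

```python
# Illustrative arithmetic only; every numeric setting below is an assumption,
# not a value taken from the paper.

def conv_out(n, k, s, p):
    """Spatial size after a convolution (dilation 1)."""
    return (n + 2 * p - k) // s + 1


def tconv_out(n, k, s, p, output_padding=0):
    """Spatial size after a transposed convolution (dilation 1)."""
    return (n - 1) * s - 2 * p + k + output_padding


def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a convolution, ignoring bias and normalization."""
    return (c_in // groups) * c_out * k * k


def inverted_residual_params(c, expand, k=3):
    """Weights of an assumed expand -> depthwise -> project block."""
    hidden = c * expand
    return (
        conv_params(c, hidden, 1)                        # 1x1 expansion
        + conv_params(hidden, hidden, k, groups=hidden)  # kxk depthwise
        + conv_params(hidden, c, 1)                      # 1x1 projection
    )


# Two stride-2 convolutions halve the (hypothetical) 224-pixel input twice.
size = 224
for _ in range(2):
    size = conv_out(size, k=3, s=2, p=1)
bottleneck_size = size  # 56

# Two stride-2 transposed convolutions restore the original resolution.
for _ in range(2):
    size = tconv_out(size, k=3, s=2, p=1, output_padding=1)
restored_size = size  # 224

standard = conv_params(128, 128, 3)           # 147456 weights
bridge = inverted_residual_params(128, 2)     # 67840 weights
print(bottleneck_size, restored_size, standard, bridge)
```

Under these assumed settings the depthwise stage contributes only 2,304 of the 67,840 weights in the block, which illustrates why this kind of module is a common choice for lightweight networks. Whether the block is smaller than a standard convolution depends on the expansion factor, so the comparison here should not be read as a claim about LiDAIR's actual layer sizes.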