Grasp Stability Assessment Through Spatio-Temporal Attention Mechanism and Adaptive Gate Fusion

Song Li, Wei Sun, Qiaokang Liang, Jian Sun, Chongpei Liu
DOI: https://doi.org/10.1109/jsen.2024.3493117
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: In the field of robotic grasping and manipulation, accurately assessing the stability of handheld objects plays a critical role in achieving proficient manipulation. Methods relying solely on visual or tactile information for slip detection often have limited applicability across different scenarios. Relevant studies have shown that combining visual and tactile sensing can significantly improve grasping performance. This study proposes a novel deep neural network architecture, specifically adopting a spatiotemporal attention mechanism to fuse multi-level spatiotemporal features, effectively integrating deep high-level features with shallow low-level features. It extracts important slip features across temporal and spatial dimensions from both visual RGB image sequences and tactile image sequences, thereby facilitating stability prediction. The gating mechanism builds a resilient network architecture that adaptively fuses features with appropriate weights, maintaining highly accurate and robust predictive performance even when sensor signal quality degrades. Validation results from both public and custom datasets demonstrate that the proposed model is highly effective in accurately predicting grasp stability, even in the presence of missing, occluded, noisy, or corrupted visual or tactile sensor signals. The practicality of this approach extends to various downstream applications in robotics, including grasp force control, generation of grasping strategies, and proficient manipulation in challenging scenarios.
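The adaptive gate fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the gate's form, so this is not the authors' architecture: it assumes a common formulation in which a sigmoid gate, computed from the concatenated visual and tactile features, weights the two modalities per dimension. The names `gated_fusion`, `W`, and `b`, the feature size, and the random values are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(visual, tactile, W, b):
    """Fuse two feature vectors with a learned per-dimension gate.

    Assumed form (not taken from the paper):
        gate  = sigmoid(W @ [visual; tactile] + b)
        fused = gate * visual + (1 - gate) * tactile
    """
    joint = np.concatenate([visual, tactile])
    gate = sigmoid(W @ joint + b)
    fused = gate * visual + (1.0 - gate) * tactile
    return fused, gate


# Hypothetical feature vectors standing in for the outputs of the
# visual and tactile branches.
rng = np.random.default_rng(0)
dim = 4
visual = rng.normal(size=dim)
tactile = rng.normal(size=dim)

# Hypothetical gate parameters; in a trained network these are learned.
W = rng.normal(scale=0.1, size=(dim, 2 * dim))
b = np.zeros(dim)

fused, gate = gated_fusion(visual, tactile, W, b)

# A strongly negative bias drives the gate toward 0, so the fused
# feature falls back on the tactile branch. This mimics the case in
# which the visual signal is judged unreliable.
fused_tactile_only, gate_closed = gated_fusion(
    visual, tactile, W, np.full(dim, -50.0)
)
```

Because each fused value is a convex combination of the two inputs, a gate near 0 or 1 effectively discards one modality. That is one plausible way for a network to remain usable when a sensor stream is missing or corrupted.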