Abstract:

Purpose: This paper aims to use a fully convolutional network (FCN) to predict pixel-wise antipodal grasp affordances for unknown objects and to improve grasp detection performance through multi-scale feature fusion.

Design/methodology/approach: A modified FCN is used as the backbone to extract pixel-wise features from the input image. These features are then fused with multi-scale context information gathered by a three-level pyramid pooling module to make more robust predictions. Based on the proposed unified feature embedding framework, two head networks are designed to implement different grasp rotation prediction strategies (regression and classification), and their performance is evaluated and compared using a defined point metric. The regression network is further extended to predict grasp rectangles for comparison with previous methods and for real-world robotic grasping of unknown objects.

Findings: The ablation study of the pyramid pooling module shows that multi-scale information fusion significantly improves model performance. The regression approach outperforms the classification approach based on the same feature embedding framework on two data sets. The regression network achieves state-of-the-art accuracy (up to 98.9%) and speed (4 ms per image), as well as a high success rate (97% for household objects, 94.4% for adversarial objects and 95.3% for objects in clutter) in the unknown object grasping experiment.

Originality/value: A novel pixel-wise grasp affordance prediction network based on multi-scale feature fusion is proposed to improve grasp detection performance. Two prediction approaches are formulated and compared based on the proposed framework. The proposed method achieves excellent performance on three benchmark data sets and in a real-world robotic grasping experiment.
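The multi-scale fusion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bin sizes `(1, 2, 4)`, the nearest-neighbour upsampling, the single-channel input and the function names `adaptive_avg_pool`, `upsample_nearest` and `pyramid_fuse` are all assumptions chosen for illustration, and a real network would apply learned convolutions to each pooled map. The sketch only shows the general idea of pooling a feature map at three scales and concatenating the resulting context with the original pixel-wise features.

```python
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(feat: Grid, bins: int) -> Grid:
    """Average-pool an H x W map down to a bins x bins grid."""
    h, w = len(feat), len(feat[0])
    pooled = []
    for i in range(bins):
        # Row range of region i: floor for the start, ceil for the end.
        r0, r1 = (i * h) // bins, ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, ((j + 1) * w + bins - 1) // bins
            cells = [feat[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(pooled: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour upsampling of a bins x bins grid back to H x W."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]


def pyramid_fuse(feat: Grid, levels: Sequence[int] = (1, 2, 4)) -> List[List[List[float]]]:
    """Concatenate each pixel's feature with context from every pyramid level.

    The three levels (1, 2, 4) are an assumed choice; the abstract only
    states that the module has three levels.
    """
    h, w = len(feat), len(feat[0])
    context = [upsample_nearest(adaptive_avg_pool(feat, b), h, w) for b in levels]
    return [[[feat[r][c]] + [ctx[r][c] for ctx in context] for c in range(w)]
            for r in range(h)]


# Toy 4 x 4 single-channel feature map with values 0..15.
feat = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
fused = pyramid_fuse(feat)
print(fused[0][0])  # [0, 7.5, 2.5, 0.0]
```

Each output pixel carries its own feature plus one context value per pyramid level, so a prediction head operating on the fused map sees both local detail and progressively coarser summaries of the scene.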

Rotation adaptive grasping estimation network oriented to unknown objects based on novel RGB-D fusion strategy

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation

A New Robotic Grasp Detection Method Based on RGB-D Deep Fusion

Grasp Detection Via Visual Rotation Object Detection and Point Cloud Spatial Feature Scoring

Cascaded Feature Fusion Grasping Network for Real-Time Robotic Systems

Lightweight Pixel-Wise Generative Robot Grasping Detection Based on RGB-D Dense Fusion

A robot grasping detection network based on flexible selection of multi-modal feature fusion structure

Bilateral Cross-Modal Fusion Network for Robot Grasp Detection

Efficient Fully Convolutional Network and Optimization Approach for Robotic Grasping Detection Based on RGB-D Images

Efficient Grasp Detection Network with Gaussian-Based Grasp Representation for Robotic Manipulation

Real-time Pixel-Wise Grasp Affordance Prediction Based on Multi-Scale Context Information Fusion

Residual Squeeze-and-Excitation Network with Multi-scale Spatial Pyramid Module for Fast Robotic Grasping Detection

GraspFusionNet: a Two-Stage Multi-Parameter Grasp Detection Network Based on RGB–XYZ Fusion in Dense Clutter

A RGB-D Based 6D Object Pose Estimation and Its Application in Robotic Grasping

6-DoF grasp estimation method that fuses RGB-D data based on external attention

Robot Unknown Objects Instance Segmentation Based on Collaborative Weight Assignment RGB–Depth Fusion Strategy

High-performance Pixel-level Grasp Detection Based on Adaptive Grasping and Grasp-aware Network

FFBGNet: Full-Flow Bidirectional Feature Fusion Grasp Detection Network Based on Hybrid Architecture

Estimating 6D Object Poses with Temporal Motion Reasoning for Robot Grasping in Cluttered Scenes

Instance-level 6D pose estimation based on multi-task parameter sharing for robotic grasping