Hybrid dilated multilayer faster RCNN for object detection

Xin, Fangfang
DOI: https://doi.org/10.1007/s00371-023-02789-y
2023-03-08
Abstract: The Faster region-based convolutional neural network (Faster RCNN) architecture was proposed as an efficient object detection method in which a CNN extracts image features. However, CNNs require a large number of learnable parameters, and an excessive number of pooling layers leads to a loss of information about small objects, which can reduce detection efficiency. In this study, we proposed a hybrid dilated multilayer Faster RCNN (HDMF-RCNN) model to address this problem. The key contributions of this work are as follows: (1) We replaced the VGG16 network used in the original Faster RCNN architecture with a hybrid dilated CNN (HDC) model for feature extraction, which also keeps the feature extractor portable. We also used a LeakyReLU activation function to improve the mapping of negative inputs so that objects are detected quickly and accurately. (2) We used a multilayer feature spatial pyramid to convert single-scale features into multi-scale features, and obtained higher-resolution information through a deconvolutional network to achieve more accurate object detection. (3) We conducted experiments on the Microsoft COCO data set to verify the performance of the proposed HDMF-RCNN model. The results indicated that the accuracy of HDMF-RCNN was 8.12% higher than that of the traditional Faster RCNN model, and that training loss and training time were lower by 44.64% and 39.46% on average, respectively. Overall, the results verified that HDMF-RCNN can significantly improve on the efficiency of existing object detection methods. As an independent feature extraction network, HDC can be adapted to different network frameworks.
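The abstract does not state the dilation rates, kernel sizes, LeakyReLU slope, or deconvolution settings used in HDMF-RCNN, so the following is only a minimal sketch under assumed settings: 3x3 kernels analysed along one axis, the rate sequences [2, 2, 2] and [1, 2, 5] (a common illustration of hybrid dilated convolution), a negative slope of 0.01, and a stride-2 transposed convolution with kernel 4 and padding 1. It shows why mixing dilation rates avoids the "gridding" gaps that repeated equal rates leave in the receptive field, how LeakyReLU passes a scaled version of negative inputs instead of zeroing them, and how one deconvolution layer can double spatial resolution. All function names are hypothetical and are not taken from the paper.

```python
"""Illustrative sketch of ideas named in the HDMF-RCNN abstract (assumed settings)."""


def covered_offsets(rates, kernel_size=3):
    """1-D input offsets sampled by a stack of dilated convolutions.

    A layer with dilation r and odd kernel size k samples offsets
    -(k//2)*r, ..., 0, ..., (k//2)*r. Stacking layers adds offsets,
    so the covered set is the Minkowski sum of the per-layer tap sets.
    """
    if kernel_size % 2 == 0:
        raise ValueError("odd kernel_size assumed")
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        taps = [t * r for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def receptive_field(rates, kernel_size=3):
    """Width of the stacked receptive field: 1 + sum((k - 1) * r)."""
    return 1 + sum((kernel_size - 1) * r for r in rates)


def has_gridding(rates, kernel_size=3):
    """True if some position inside the receptive field is never sampled."""
    half_width = receptive_field(rates, kernel_size) // 2
    full = set(range(-half_width, half_width + 1))
    return covered_offsets(rates, kernel_size) != full


def leaky_relu(x, alpha=0.01):
    """LeakyReLU: identity for x > 0, slope alpha for x <= 0 (alpha is assumed)."""
    return x if x > 0 else alpha * x


def deconv_output_size(size, kernel=4, stride=2, padding=1, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


if __name__ == "__main__":
    for rates in ([2, 2, 2], [1, 2, 5]):
        print(rates, receptive_field(rates), len(covered_offsets(rates)), has_gridding(rates))
    # [2, 2, 2] 13 7 True
    # [1, 2, 5] 17 17 False
    print(leaky_relu(3.0), leaky_relu(-2.0))
    print(deconv_output_size(32))  # 64
```

With equal rates [2, 2, 2], the stacked taps reach only even offsets, so 6 of the 13 positions in the receptive field are never sampled. The rates [1, 2, 5] share no common factor greater than 1 and fill every position in a wider 17-wide field, which is the property a hybrid dilated design aims for.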
computer science, software engineering