Abstract: Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under partial occlusion, i.e., conditions in which objects are partially hidden from the camera's view. Methods to improve performance under occlusion, including data augmentation, part-based clustering, and more inherently robust architectures such as Vision Transformer (ViT) models, have, to some extent, been evaluated on their ability to classify objects under partial occlusion. However, evaluations of these methods have largely relied on images containing artificial occlusion, which are typically computer-generated and therefore inexpensive to label. Additionally, methods are rarely compared against each other, and many methods are compared against early, now outdated, deep learning models. We contribute the Image Recognition Under Occlusion (IRUO) dataset, based on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558). IRUO uses real-world and artificially occluded images to test and benchmark leading methods' robustness to partial occlusion in visual recognition tasks. In addition, we contribute the design and results of a human study using images from IRUO that evaluates human classification performance at multiple levels and types of occlusion. We find that modern CNN-based models show improved recognition accuracy on occluded images compared to earlier CNN-based models, and that ViT-based models are more accurate than CNN-based models on occluded images, performing only modestly worse than humans. We also find that certain types of occlusion, including diffuse occlusion, where relevant objects are seen through "holes" in occluders such as fences and leaves, can greatly reduce the accuracy of deep recognition models, especially those with CNN backbones, relative to human accuracy.

TDAPNet: Prototype Network with Recurrent Top-Down Attention for Robust Object Classification under Partial Occlusion

TDMPNet: Prototype Network with Recurrent Top-Down Modulation for Robust Object Classification Under Partial Occlusion

Now You See Me: Robust approach to Partial Occlusions

Multiclass objects detection algorithm using DarkNet-53 and DenseNet for intelligent vehicles

Attention Disturbance and Dual-Path Constraint Network for Occluded Person Re-identification

Occluded Pedestrian Attention Network: an Occluded Pedestrian Detector

Combining Compositional Models and Deep Networks For Robust Object Classification under Occlusion

Integrated Single Shot Multi-Box Detector and Efficient Pre-Trained Deep Convolutional Neural Network for Partially Occluded Face Recognition System

Breaking Immutable: Information-Coupled Prototype Elaboration for Few-Shot Object Detection

DPNet: Dual-Path Network for Real-Time Object Detection With Lightweight Attention

Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Object Detection in Remote Sensing Imagery Based on Prototype Learning Network With Proposal Relation

Mask-Guided Attention Network for Occluded Pedestrian Detection

Occluded Scene Classification via Cascade Supervised Contrastive Learning

AGO-Net: Association-Guided 3D Point Cloud Object Detection Network

DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions

Few-Shot Object Detection Based on Adaptive Attention Mechanism and Large-Margin Softmax

The attentive reconstruction of objects facilitates robust object recognition

Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion

Incremental Generative Occlusion Adversarial Suppression Network for Person ReID

Attend and Guide (AG-Net): A Keypoints-driven Attention-based Deep Network for Image Recognition