Abstract: Image classification models, including convolutional neural networks (CNNs), perform well on a variety of classification tasks but struggle under partial occlusion, i.e., conditions in which objects are partially hidden from the camera's view. Methods to improve performance under occlusion, including data augmentation, part-based clustering, and more inherently robust architectures such as Vision Transformer (ViT) models, have to some extent been evaluated on their ability to classify partially occluded objects. However, these evaluations have largely relied on images containing artificial occlusion, which are typically computer-generated and therefore inexpensive to label. Additionally, methods are rarely compared against each other, and many are compared only against early, now outdated, deep learning models. We contribute the Image Recognition Under Occlusion (IRUO) dataset, based on the recently developed Occluded Video Instance Segmentation (OVIS) dataset (arXiv:2102.01558, https://arxiv.org/abs/2102.01558). IRUO uses real-world and artificially occluded images to test and benchmark the robustness of leading methods to partial occlusion in visual recognition tasks. In addition, we contribute the design and results of a human study using images from IRUO that evaluates human classification performance at multiple levels and types of occlusion. We find that modern CNN-based models show improved recognition accuracy on occluded images compared to earlier CNN-based models, and that ViT-based models are more accurate than CNN-based models on occluded images, with accuracy only modestly below that of humans. We also find that certain types of occlusion, including diffuse occlusion, where relevant objects are seen through "holes" in occluders such as fences and leaves, can greatly reduce the accuracy of deep recognition models relative to humans, especially for models with CNN backbones.

Are Deep Learning Models Robust to Partial Object Occlusion in Visual Recognition Tasks?

Now You See Me: Robust approach to Partial Occlusions

TDMPNet: Prototype Network with Recurrent Top-Down Modulation for Robust Object Classification Under Partial Occlusion

Combining Compositional Models and Deep Networks For Robust Object Classification under Occlusion

TDAPNet: Prototype Network with Recurrent Top-Down Attention for Robust Object Classification under Partial Occlusion

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

Object Occlusion of Adding New Categories in Objection Detection

OVPT: Optimal Viewset Pooling Transformer for 3D Object Recognition.

Hard Occlusions in Visual Object Tracking

On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes

Occlusion Are Underrated: an Occlusion-Attention Strategy Assembled in 3D Object Detectors

Progress and limitations of deep networks to recognize objects in unusual poses

Partial success in closing the gap between human and machine vision

Recognizing multi-view objects with occlusions using a deep architecture

Occluded Video Instance Segmentation: A Benchmark

Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion

In-and-Out: a data augmentation technique for computer vision tasks

Robust face recognition and impostors detection with partial occlusion and small number of training samples

Occluded Scene Classification via Cascade Supervised Contrastive Learning

A survey of face recognition techniques under occlusion

OccRob: Efficient SMT-Based Occlusion Robustness Verification of Deep Neural Networks