Illicit object detection in X-ray images using Vision Transformers

Jorgen Cani, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos

2024-04-29

Abstract: Illicit object detection is a critical task performed at various high-security locations, including airports, train stations, subways, and ports. The continuous and tedious work of examining thousands of X-ray images per hour can be mentally taxing. Thus, Deep Neural Networks (DNNs) can be used to automate the X-ray image analysis process, improve efficiency and alleviate the security officers' inspection burden. The neural architectures typically utilized in relevant literature are Convolutional Neural Networks (CNNs), with Vision Transformers (ViTs) rarely employed. In order to address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures on illicit item detection in X-ray images. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR. The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper asks how Vision Transformers can be used to automatically detect illicit items in X-ray images at high-security locations such as airports, train stations, subways, and ports. Specifically, the research aims to:

1. **Reduce the burden of manual inspection**: Automating X-ray image analysis lightens the workload of security officers, who must examine thousands of X-ray images per hour, which improves efficiency and reduces wrong decisions caused by fatigue.
2. **Improve detection accuracy**: Evaluate Transformer and hybrid backbones (such as SWIN and NextViT) for detecting illicit items in X-ray images, especially in the low-data regime. This addresses a gap in the existing literature, which relies mainly on Convolutional Neural Networks (CNNs) and rarely employs Transformer architectures.
3. **Achieve real-time performance**: Explore models that deliver fast inference while preserving detection accuracy, such as YOLOv8 and RT-DETR, to meet the real-time requirements of practical applications.
4. **Address the challenges specific to X-ray images**: Handle occlusion from stacked objects, complex backgrounds, visually similar objects, and the influence of specific materials on image appearance, all of which criminals may exploit to hide contraband.

To achieve these goals, the paper systematically evaluates different Vision Transformer architectures on the illicit item detection task and verifies their effectiveness and efficiency through experiments.
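To make the real-time goal concrete, the sketch below shows one simple way to estimate a detector's throughput in images per second. It is only an illustration under stated assumptions, not the paper's benchmarking protocol: `measure_fps` and `dummy_detector` are hypothetical names, and the dummy detector merely sleeps for a millisecond in place of running an actual model such as YOLOv8 or RT-DETR.

```python
import time
from typing import Callable, Sequence


def measure_fps(detect: Callable[[object], object],
                images: Sequence[object],
                warmup: int = 2) -> float:
    """Return the average number of images `detect` processes per second."""
    if not images:
        raise ValueError("images must be non-empty")
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for image in images[:warmup]:
        detect(image)
    start = time.perf_counter()
    for image in images:
        detect(image)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading for extremely fast callables.
    return float("inf") if elapsed == 0 else len(images) / elapsed


def dummy_detector(image: object) -> list:
    """Hypothetical stand-in for a real model: returns one fake detection."""
    time.sleep(0.001)  # simulated inference cost (assumption)
    return [(0, 0, 10, 10, "knife", 0.9)]


if __name__ == "__main__":
    fps = measure_fps(dummy_detector, [None] * 20)
    print(f"{fps:.1f} images/s")
```

In practice `detect` would wrap a model's forward pass, and on a GPU the device would need to be synchronized before reading the clock so that queued work is included in the measurement.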

Visual inspection for illicit items in X-ray images using Deep Learning

Protego: Detecting Adversarial Examples for Vision Transformers Via Intrinsic Capabilities

Illicit item detection in X-ray images for security applications

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

A Novel Visual Transformer for Long-Distance Pipeline Pattern Recognition in Complex Environment

An Extendable, Efficient and Effective Transformer-based Object Detector

Detection of threat objects in baggage inspection with X-ray images using deep learning

Video Vision Transformers for Violence Detection

Aerial Image Object Detection With Vision Transformer Detector (ViTDet)

Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios

Denoising Vision Transformers

The Good, the Bad and the Ugly: Evaluating Convolutional Neural Networks for Prohibited Item Detection Using Real and Synthetically Composited X-ray Imagery

Study of Vision Transformers for Covid-19 Detection from Chest X-rays

Efficient Decoder-Free Object Detection with Transformers

Network Intrusion Detection Based on Feature Image and Deformable Vision Transformer Classification

Suspicious activities detection using spatial–temporal features based on vision transformer and recurrent neural network

Evaluating and enhancing the robustness of vision transformers against adversarial attacks in medical imaging

Automated detection of smuggled high-risk security threats using Deep Learning

Incremental convolutional transformer for baggage threat detection