Abstract: Determining whether a pedestrian of interest has been captured by another camera in a network of non-overlapping cameras, or by the same camera at a different time, is known as the problem of person re-identification and is considered one of the most fascinating challenges in computer vision. When the query image of the person of interest is concealed, blocked, obscured, or obstructed, the problem becomes considerably more challenging. Termed occluded person re-identification, this setting covers scenarios closer to real-world crowded scenes such as marketplaces, airports, shopping malls, and university campuses. Combining global pedestrian-level information with part-level local feature information has increasingly been shown to be a successful strategy for occluded person re-identification, as it captures fine-grained information from the non-occluded, visible parts. This paper proposes a Swin transformer with part-level tokenization (SwinPLT) model that uses a Swin transformer-based backbone enhanced with singular value decomposition (SVD). The proposed model leverages the hierarchical representation learning capabilities of the Swin transformer, combined with SVD to extract uncorrelated local tokens. Our approach aims to enhance the model's discriminative ability by effectively handling occlusions in person images. Trained with a combination of hard triplet loss and cross-entropy loss, the proposed SwinPLT surpasses state-of-the-art results by at least 18.14% Rank-1 accuracy and 17.28% mAP on the occluded DukeMTMC-reID dataset. On the Occluded-ReID dataset, the proposed SwinPLT model outperforms alternative approaches by 9.06% Rank-1 accuracy and 7.71% mAP. On the P-DukeMTMC-reID dataset, our model shows an improvement of 1.7% Rank-1 accuracy and 2.4% mAP, whereas on Partial-iLIDS it shows an improvement of 11.8% Rank-1 accuracy and 4.26% mAP.
We will make the code and the model publicly available at https://github.com/Ranjitkm2007/SwinPLT.
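
The two ingredients named in the abstract, SVD-based decorrelation of local tokens and a combined hard triplet plus cross-entropy objective, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say where SVD is applied, so the sketch assumes it acts directly on a matrix of part-level tokens. The function names, the margin of 0.3, the batch-hard mining variant, and the equal weighting of the two losses are likewise assumptions made for illustration.

```python
import numpy as np


def decorrelate_tokens(tokens):
    """Assumed reading of 'SVD to extract uncorrelated local tokens'.

    tokens: (k, d) array holding k part-level tokens of dimension d.
    Returns a (k, d) array whose rows are mutually orthogonal.
    """
    # tokens = U @ diag(s) @ Vt; the rows of Vt are orthonormal.
    _, s, vt = np.linalg.svd(tokens, full_matrices=False)
    # Scale each orthonormal direction by its singular value.
    return s[:, None] * vt


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of identity logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def batch_hard_triplet(features, labels, margin=0.3):
    """Batch-hard triplet loss (assumed variant; margin is hypothetical).

    For each anchor, take the farthest same-identity sample and the
    closest different-identity sample within the batch.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # pairwise Euclidean
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    return np.maximum(0.0, margin + hardest_pos - hardest_neg).mean()


def combined_loss(features, logits, labels, margin=0.3):
    """Equal-weight sum of the two losses (the weighting is an assumption)."""
    return cross_entropy(logits, labels) + batch_hard_triplet(features, labels, margin)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    local_tokens = rng.normal(size=(4, 16))      # 4 hypothetical part tokens
    clean_tokens = decorrelate_tokens(local_tokens)
    gram = clean_tokens @ clean_tokens.T         # off-diagonal entries are ~0
    print(np.round(gram, 6))
```

In this sketch the Gram matrix of the returned tokens is diagonal, which is one concrete sense in which the local tokens are uncorrelated. The triplet term is only informative when each batch contains at least two identities.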

SPT-Swin: A Shifted Patch Tokenization Swin Transformer for Image Classification

Shifted Windows Transformers for Medical Image Quality Assessment

SWIN transformer based contrastive self-supervised learning for animal detection and classification

SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

MSTRIQ: No Reference Image Quality Assessment Based on Swin Transformer with Multi-Stage Fusion

PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification

SparseSwin: Swin Transformer with Sparse Transformer Block

Spectral Swin Transformer Network for Hyperspectral Image Classification

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Swin Transformer for Robust Differentiation of Real and Synthetic Images: Intra- and Inter-Dataset Analysis

SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species

Swin transformer with part-level tokenization for occluded person re-identification

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

SwinVid: Enhancing Video Object Detection Using Swin Transformer

Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation

SwinFG: A fine-grained recognition scheme based on swin transformer

SwinIR: Image Restoration Using Swin Transformer

Spectral-Swin Transformer with Spatial Feature Extraction Enhancement for Hyperspectral Image Classification

SwinVI: 3D Swin Transformer Model with U-net for Video Inpainting