C2I-CAT: Class-to-Image Cross Attention Transformer for Out-of-Distribution Detection

Jaeho Chung, Seokho Cho, Hyunjun Choi, Daeung Jo, Yoonho Jung, Jin Young Choi
DOI: https://doi.org/10.1109/access.2024.3391808
IF: 3.9
2024-05-10
IEEE Access
Abstract: We find empirically that the Vision Transformer (ViT) fails to extract object-centric features when applied to out-of-distribution (OOD) detection. To obtain object-centric attention, we design an additional module that applies cross-attention between class-wise token proxies and the feature token sequence of an input image. For inference suited to this cross-attention structure with multiple class-wise token proxies, we propose a score ensemble that can be applied to any scoring function. The proposed inference scheme works in synergy with the cross-attention structure and outperforms ViT. Experiments show that the cross-attention structure with score ensemble inference substantially improves near-OOD detection: relative to the state-of-the-art method, FPR95 in near-OOD detection improves by 2.55% for CIFAR-10 and 2.67% for CIFAR-100, while classification accuracy remains competitive.
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic
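
The two ideas named in the abstract, cross-attention from class-wise token proxies to image feature tokens and a score ensemble over the proxies, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single attention head with no learned projections, a hypothetical linear head that scores each attended feature against every class proxy, maximum softmax probability as the example scoring function, and a plain mean as the ensemble rule. All function names and toy values are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(class_proxies, image_tokens):
    """Each class proxy (query) attends over the image tokens (keys and values).

    Returns one attended feature vector per class proxy.
    Assumption: single head, no learned projections, for brevity.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in class_proxies:
        weights = softmax([dot(query, tok) * scale for tok in image_tokens])
        feature = [
            sum(w * tok[j] for w, tok in zip(weights, image_tokens))
            for j in range(dim)
        ]
        attended.append(feature)
    return attended


def proxy_logits(attended, class_proxies):
    """Hypothetical head: score each attended feature against every proxy."""
    return [[dot(feat, p) for p in class_proxies] for feat in attended]


def msp_score(logits):
    """Maximum softmax probability, one common OOD scoring function."""
    return max(softmax(logits))


def score_ensemble(per_proxy_logits, score_fn=msp_score):
    """Apply any scoring function per proxy, then average (assumed rule)."""
    scores = [score_fn(logits) for logits in per_proxy_logits]
    return sum(scores) / len(scores)


# Toy input: 2 class proxies and 3 image feature tokens, all of dimension 2.
class_proxies = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[2.0, 0.0], [1.5, 0.5], [0.0, 0.1]]

attended = cross_attention(class_proxies, image_tokens)
logits = proxy_logits(attended, class_proxies)
ood_score = score_ensemble(logits)  # higher means more in-distribution
print(ood_score)
```

Because `score_ensemble` takes the scoring function as a parameter, another score (for example the maximum logit, via `score_fn=max`) can be substituted without touching the attention code, which mirrors the abstract's claim that the ensemble applies to any scoring function.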