Towards Efficient Cross-Modal Anomaly Detection Using Triple-Adaptive Network and Bi-Quintuple Contrastive Learning
Shu-Juan Peng,Ye Fan,Yiu-ming Cheung,Xin Liu,Zhen Cui,Taihao Li
DOI: https://doi.org/10.1109/tetci.2023.3256466
2024-01-01
IEEE Transactions on Emerging Topics in Computational Intelligence
Abstract: Cross-modal anomaly detection is a relatively new and challenging research topic in machine learning that aims to identify anomalies whose patterns are disparate across different modalities. To the best of our knowledge, this topic has yet to be well studied, and existing works often suffer from incomplete detection of anomalous data and low data utilization. To alleviate these limitations, this paper proposes an efficient deep cross-modal anomaly detection approach via Triple-adaptive Network and Bi-quintuple Contrastive Learning (TN-BCL), which is among the earliest attempts to detect various cross-modal anomalies within heterogeneous multi-modal data. Specifically, a triple-adaptive network is explicitly designed to identify various anomalies whose patterns are disparate in both the single-modal and cross-modal scenarios. On the one hand, the top branch network is utilized to adaptively detect the attribute anomalies and part of the mixed anomalies in multi-modal data samples. On the other hand, the bottom two-branch network, with shared residual blocks, is leveraged to learn discriminative cross-modal embeddings. At the same time, an efficient bi-quintuple contrastive learning method is designed to enhance the feature correlation between data with the same attribute while maximally enlarging the feature difference between data with different attributes. In addition, a bidirectional learning scheme is employed to significantly improve data utilization. Through the joint exploitation of these components, different kinds of anomalous samples can be well detected across different modalities. Extensive experiments show that the proposed framework outperforms state-of-the-art competing methods by a large margin.
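
The abstract does not spell out the bi-quintuple loss, so the following is only a minimal sketch of the general idea it describes: pull cross-modal embeddings with the same attribute together, push those with different attributes apart, and apply the objective in both directions so that every sample serves as an anchor in each modality. It uses a plain triplet hinge loss as a simplified stand-in, not the paper's quintuple formulation. The function names, the Euclidean distance, and the `margin` parameter are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_direction(anchors, targets, labels, margin=1.0):
    """Mean triplet hinge loss in one direction (hypothetical helper).

    Anchors come from one modality; positives (same label) and
    negatives (different label) come from the other modality.
    """
    total, count = 0.0, 0
    for i, anchor in enumerate(anchors):
        for p, pos in enumerate(targets):
            if labels[p] != labels[i]:
                continue  # a positive must share the anchor's attribute
            for n, neg in enumerate(targets):
                if labels[n] == labels[i]:
                    continue  # a negative must differ in attribute
                gap = euclidean(anchor, pos) - euclidean(anchor, neg)
                total += max(0.0, gap + margin)
                count += 1
    return total / count if count else 0.0


def bidirectional_triplet_loss(emb_a, emb_b, labels, margin=1.0):
    """Average the loss over both directions (A to B and B to A)."""
    a_to_b = triplet_direction(emb_a, emb_b, labels, margin)
    b_to_a = triplet_direction(emb_b, emb_a, labels, margin)
    return 0.5 * (a_to_b + b_to_a)


if __name__ == "__main__":
    # Toy paired embeddings for two modalities; labels mark the attribute.
    emb_a = [[0.0, 0.0], [3.0, 0.0]]
    emb_b = [[0.0, 0.0], [1.0, 0.0]]
    labels = [0, 1]
    print(bidirectional_triplet_loss(emb_a, emb_b, labels))
```

The two directions generally give different values because the anchor set changes, which is one way to read the abstract's claim that bidirectional learning improves data utilization: the same paired samples contribute twice as many training constraints.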