FUN-AD: Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data

Jiin Im, Yongho Son, Je Hyeong Hong
2024-11-25
Abstract: While the mainstream research in anomaly detection has mainly followed the one-class classification, practical industrial environments often incur noisy training data due to annotation errors or lack of labels for new or refurbished products. To address these issues, we propose a novel learning-based approach for fully unsupervised anomaly detection with unlabeled and potentially contaminated training data. Our method is motivated by two observations, that i) the pairwise feature distances between the normal samples are on average likely to be smaller than those between the anomaly samples or heterogeneous samples and ii) pairs of features mutually closest to each other are likely to be homogeneous pairs, which hold if the normal data has smaller variance than the anomaly data. Building on the first observation that nearest-neighbor distances can distinguish between confident normal samples and anomalies, we propose a pseudo-labeling strategy using an iteratively reconstructed memory bank (IRMB). The second observation is utilized as a new loss function to promote class-homogeneity between mutually closest pairs thereby reducing the ill-posedness of the task. Experimental results on two public industrial anomaly benchmarks and semantic anomaly examples validate the effectiveness of FUN-AD across different scenarios and anomaly-to-normal ratios. Our code is available at https://github.com/HY-Vision-Lab/FUNAD.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses fully unsupervised anomaly detection when the training data is noisy. In industrial environments, mislabeling and missing labels for new products leave the training set unlabeled and potentially contaminated with anomalies. Traditional one-class classification methods perform poorly in this setting because they rely on clean, correctly labeled normal samples. The paper therefore proposes a deep-learning-based method, FUN-AD (Fully Unsupervised Learning for Anomaly Detection with Noisy Training Data), which targets the following problems:

1. **Data Imbalance and Diversity**: Anomaly samples are scarce and difficult to obtain, which creates a significant imbalance between normal and anomaly samples. Anomalies also arise from many different causes, so their distributions are diverse.
2. **The Influence of Noisy Data**: In practice, training data may contain noise, for example from manual labeling errors or from data made obsolete by product updates. This noise seriously degrades one-class classification methods, because they cannot distinguish anomaly samples from normal samples within the training set.
3. **The Need for Fully Unsupervised Learning**: In real-world scenarios, obtaining clean and well-labeled training data is costly and time-consuming, so a method that can accurately detect anomalies without labeled data is required.

To solve these problems, FUN-AD introduces the following innovations:

- **Pseudo-Label Generation Strategy**: Using an iteratively reconstructed memory bank (IRMB), the method generates pseudo-labels from the statistics of distances between feature pairs, separating confident normal samples from anomaly samples.
- **Mutual Proximity-Smoothing Loss Function**: Based on the observation that mutually closest pairs are likely to belong to the same class, a new loss function reduces the ill-posedness of the task and encourages samples of the same class to receive similar anomaly scores.
- **Fully Unsupervised Framework**: A simple but effective iterative learning framework is proposed, which the paper reports as achieving state-of-the-art performance on two public industrial anomaly benchmarks, especially in the setting of noisy training data.

In summary, the core objective of this paper is to develop a fully unsupervised anomaly detection method that can handle noisy training data, thereby improving the robustness and practicality of anomaly detection in industrial environments.
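
The two observations behind the method can be illustrated on toy data. The sketch below is not the authors' implementation and does not reproduce the IRMB or the paper's actual loss. It assumes 2-D feature vectors, a hand-picked distance threshold, and a squared-gap penalty, all of which are hypothetical choices made for illustration. It shows that nearest-neighbor distances separate a tight normal cluster from scattered anomalies, and that mutually closest pairs tend to share a class.

```python
import math

def pairwise_distances(features):
    """Euclidean distance matrix for a list of equal-length feature vectors."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)] for i in range(n)]

def nearest_neighbors(dist):
    """Index of each sample's nearest neighbor, excluding the sample itself."""
    n = len(dist)
    return [min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
            for i in range(n)]

def mutual_nearest_pairs(nn):
    """Pairs (i, j) with i < j where i and j are each other's nearest neighbor."""
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]

def pseudo_labels(dist, nn, threshold):
    """Hypothetical rule: label 1 (anomaly) if the nearest-neighbor distance
    exceeds the threshold, else 0 (normal)."""
    return [1 if dist[i][nn[i]] > threshold else 0 for i in range(len(nn))]

def smoothness_penalty(scores, pairs):
    """Illustrative penalty: mean squared anomaly-score gap over mutual pairs."""
    if not pairs:
        return 0.0
    return sum((scores[i] - scores[j]) ** 2 for i, j in pairs) / len(pairs)

# Toy data: samples 0-3 form a tight "normal" cluster, 4-5 are scattered "anomalies".
features = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.1), (1.1, 0.1), (6.0, 6.0), (9.0, 1.0)]

dist = pairwise_distances(features)
nn = nearest_neighbors(dist)
pairs = mutual_nearest_pairs(nn)
labels = pseudo_labels(dist, nn, threshold=1.0)  # threshold chosen by hand

# Made-up anomaly scores; the (4, 5) pair disagrees, so it is penalized.
scores = [0.1, 0.1, 0.2, 0.2, 0.9, 0.3]
penalty = smoothness_penalty(scores, pairs)

print(pairs)   # [(0, 1), (2, 3), (4, 5)]
print(labels)  # [0, 0, 0, 0, 1, 1]
```

Every mutual pair here is homogeneous: two pairs of normal samples and one pair of anomalies. The penalty is nonzero only because the assumed scores for samples 4 and 5 differ, which is the kind of disagreement a smoothness term between mutually closest pairs is meant to discourage.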