Large-Scale Evaluation of Open-Set Image Classification Techniques

Halil Bisgin, Andres Palechor, Mike Suter, Manuel Günther
2024-06-13
Abstract: The goal of classification is to correctly assign labels to unseen samples. However, most methods misclassify samples of unseen classes and assign them to one of the known classes. Open-Set Classification (OSC) algorithms aim to maximize both closed-set and open-set recognition capabilities. Recent studies showed the utility of such algorithms on small-scale data sets, but such limited experimentation makes it difficult to assess their performance on real-world problems. Here, we provide a comprehensive comparison of various OSC algorithms, including training-based methods (SoftMax, Garbage, EOS) and post-processing methods (Maximum SoftMax Scores, Maximum Logit Scores, OpenMax, EVM, PROSER); the post-processing methods are applied to features from the training-based networks. We perform our evaluation on three large-scale protocols that mimic real-world challenges, where we train on known and negative open-set samples and test on known and unknown instances. Our results show that EOS helps to improve the performance of almost all post-processing algorithms. In particular, OpenMax and PROSER are able to exploit better-trained networks, demonstrating the utility of hybrid models. However, while most algorithms work well on negative test samples (samples of open-set classes seen during training), they tend to perform poorly when tested on samples of previously unseen unknown classes, especially in challenging conditions.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
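The abstract separates training-based methods from post-processing scores. As a rough illustration only (not the authors' code), the NumPy sketch below implements the Entropic Open-Set (EOS) loss, which keeps ordinary cross-entropy for known samples and pushes negative samples toward a uniform softmax over the known classes, together with the Maximum SoftMax Score (MSS) and Maximum Logit Score (MLS) rules. The encoding of negative samples as label -1 and all function names are assumptions made for this example; the Garbage class, OpenMax, EVM, and PROSER are not covered.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def entropic_open_set_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean EOS loss over a batch of logits with shape (N, C).

    labels[i] >= 0 is a known class index; labels[i] == -1 marks a negative
    (known-unknown) training sample, an encoding assumed for this sketch.
    """
    log_p = log_softmax(logits)
    losses = np.empty(len(labels))
    known = np.flatnonzero(labels >= 0)
    negative = np.flatnonzero(labels < 0)
    # Known samples: standard cross-entropy on the target class.
    losses[known] = -log_p[known, labels[known]]
    # Negative samples: cross-entropy against the uniform target 1/C,
    # i.e. the negated mean log-probability over all C known classes.
    losses[negative] = -log_p[negative].mean(axis=1)
    return float(losses.mean())


def max_softmax_score(logits: np.ndarray):
    """MSS: predicted class and its softmax probability."""
    probs = np.exp(log_softmax(logits))
    return probs.argmax(axis=1), probs.max(axis=1)


def max_logit_score(logits: np.ndarray):
    """MLS: predicted class and its raw (unnormalized) logit."""
    return logits.argmax(axis=1), logits.max(axis=1)


logits = np.array([[4.0, 0.0, 0.0],    # confident prediction for class 0
                   [0.1, 0.0, -0.1]])  # nearly uniform, likely unknown
labels = np.array([0, -1])             # one known sample, one negative
print(entropic_open_set_loss(logits, labels))
print(max_softmax_score(logits))
print(max_logit_score(logits))
```

A negative sample with perfectly uniform logits reaches the smallest possible EOS loss, log C. MSS and MLS only produce scores; rejecting a sample as unknown still requires choosing a threshold on those scores after training.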
What problem does this paper attempt to address?
The paper primarily aims to address key issues in Open-Set Image Classification (OSC), specifically:
1. **Limitations of Existing Methods**: Most existing classification methods misclassify samples of unseen classes into one of the known categories. Open-set classification algorithms aim to maximize recognition capabilities on both the closed set (known categories) and the open set (unknown categories).
2. **Limitations of Small-Scale Dataset Evaluation**: Although some studies have demonstrated the practicality of open-set classification algorithms on small-scale datasets, such limited experimental setups make it difficult to assess how these algorithms perform on real-world problems.
3. **Lack of Large-Scale Evaluation**: There is currently a lack of large-scale evaluations of open-set classification algorithms, especially comprehensive comparisons on datasets that simulate real-world challenges.
4. **Issues with Evaluation Metrics**: Most evaluation metrics used in prior research are either unsuitable for open-set classification or do not accurately reflect its effectiveness.

To address these issues, the paper undertakes the following work:
- **Comprehensive Comparison**: For the first time, a large-scale comparative evaluation is conducted, covering training-based methods (SoftMax, Garbage Class, EOS) and post-processing methods (Maximum SoftMax Scores, Maximum Logit Scores, OpenMax, EVM, PROSER).
- **Method Combination**: For the first time, the two complementary families are combined, with post-processing methods applied to features from the training-based networks, to further enhance performance.
- **Evaluation Protocols and Datasets**: The paper uses three previously defined large-scale evaluation protocols (P1, P2, P3), which are based on the ImageNet dataset and designed with different levels of semantic similarity between known and unknown classes to simulate real-world challenges.
- **Evaluation Metrics**: The paper adopts the Open-Set Classification Rate (OSCR) curve, which plots the Correct Classification Rate on known test samples against the False Positive Rate on unknown test samples. Because it handles known and unknown samples separately, it better reflects the practical needs of open-set classification.

Through this work, the paper aims to fill current gaps in open-set classification research and promote further development in the field.
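The OSCR curve can be traced by sweeping a score threshold: a known sample counts toward the Correct Classification Rate (CCR) only if it is classified correctly and its score passes the threshold, while an unknown sample counts toward the False Positive Rate (FPR) whenever its score passes. A minimal sketch, assuming each test sample has already been reduced to one confidence score (for instance an MSS value) and that known samples carry a flag for whether their top-1 prediction is correct; the function name and inputs are hypothetical.

```python
import numpy as np


def oscr_curve(known_scores, known_correct, unknown_scores):
    """Return (fpr, ccr) arrays with one point per candidate threshold.

    known_scores:   confidence of the top-1 prediction for each known test sample
    known_correct:  boolean flags, True where the top-1 prediction is correct
    unknown_scores: confidence of the top-1 prediction for each unknown sample
    """
    known_scores = np.asarray(known_scores, dtype=float)
    known_correct = np.asarray(known_correct, dtype=bool)
    unknown_scores = np.asarray(unknown_scores, dtype=float)
    # Every observed score is a candidate threshold; sweep from high to low.
    thresholds = np.unique(np.concatenate([known_scores, unknown_scores]))[::-1]
    # CCR: fraction of known samples that are correctly classified AND accepted.
    ccr = np.array([np.mean(known_correct & (known_scores >= t)) for t in thresholds])
    # FPR: fraction of unknown samples that are (wrongly) accepted.
    fpr = np.array([np.mean(unknown_scores >= t) for t in thresholds])
    return fpr, ccr


fpr, ccr = oscr_curve(known_scores=[0.9, 0.8, 0.4],
                      known_correct=[True, False, True],
                      unknown_scores=[0.7, 0.3])
for f, c in zip(fpr, ccr):
    print(f"FPR={f:.2f}  CCR={c:.2f}")
```

Because thresholds are swept from high to low, both FPR and CCR are non-decreasing along the returned arrays, so the points can be plotted in order as a curve.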