Open world long-tailed data classification through active distribution optimization

Min Wang, Lei Zhou, Qian Li, An-an Zhang
DOI: https://doi.org/10.1016/j.eswa.2022.119054
IF: 8.5
2023-01-01
Expert Systems with Applications
Abstract: Real-world data exhibits a long-tailed label distribution, which leads to classification bias. Popular re-sampling and re-weighting methods usually require the categories to be known in advance, so learning from long-tailed data with open categories remains challenging. In this paper, we propose an active distribution optimization algorithm (DALC) to address this problem. By iterating over clustering, querying, and classification, we explore new categories and balance the label distribution. For clustering, we present an exploration technique that adaptively obtains the optimal data distribution with minimal total distance/cost. For querying, we design a critical instance selection strategy that uses the cluster information. For classification, we establish an ensemble model that continuously balances the label distribution. We conducted experiments on synthetic, benchmark, and domain datasets. Significance tests verified the effectiveness of DALC and its superiority over state-of-the-art long-tailed data classification and open set classification algorithms.
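The cluster, query, and classify iteration described in the abstract can be illustrated with a minimal sketch. This is not the DALC algorithm: plain k-means stands in for the paper's distance/cost-based exploration, "query the unlabeled instance nearest each cluster centre" stands in for its critical instance selection, and a 1-nearest-neighbour rule stands in for its ensemble classifier. The names `make_long_tailed`, `kmeans`, and `active_loop`, the synthetic data, and the label oracle are all hypothetical and serve only to show the shape of the loop.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def make_long_tailed(rng):
    """Synthetic 2-D data with a head, a medium, and a tail class (assumed sizes)."""
    spec = [("head", (0.0, 0.0), 60), ("mid", (5.0, 5.0), 15), ("tail", (10.0, 0.0), 4)]
    points, truth = [], []
    for name, (cx, cy), n in spec:
        for _ in range(n):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            truth.append(name)
    return points, truth

def kmeans(points, k, rng, iters=20):
    """Plain k-means; a stand-in for the paper's clustering step."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, assign

def active_loop(points, oracle, k, rounds, rng):
    """Cluster, query one instance per cluster, repeat; then classify with 1-NN."""
    labeled = {}  # instance index -> label revealed by the oracle
    for _ in range(rounds):
        centers, assign = kmeans(points, k, rng)
        for c in range(k):
            candidates = [i for i, a in enumerate(assign) if a == c and i not in labeled]
            if candidates:
                # Assumed query rule: the unlabeled instance closest to the centre.
                i = min(candidates, key=lambda j: sq_dist(points[j], centers[c]))
                labeled[i] = oracle(i)  # a new category may appear here

    def predict(p):
        # 1-NN over the queried instances; the paper uses an ensemble instead.
        nearest = min(labeled, key=lambda i: sq_dist(p, points[i]))
        return labeled[nearest]

    return labeled, predict

rng = random.Random(0)
points, truth = make_long_tailed(rng)
labeled, predict = active_loop(points, oracle=lambda i: truth[i], k=3, rounds=3, rng=rng)
print("categories discovered so far:", sorted(set(labeled.values())))
```

The category set is never given to the loop; it only grows as the oracle answers queries, which is the open-category aspect. Whether the tail class is found depends on the clustering, which is the kind of weakness the paper's exploration and selection strategies are meant to address.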