LERI: Local Exploration for Rare-Category Identification

Hao Huang, Qian Yan, Wei Lu, Huaizhong Lin, Yunjun Gao, Lei Chen
DOI: https://doi.org/10.1109/TKDE.2019.2911941
2020-01-01
Abstract: To identify the data examples of rare categories that form small compact clusters in large data sets, existing approaches mostly require enough labeled data examples as a training set to learn a classifier, and they assume that the rare-category clusters are spherical or nearly spherical. Nonetheless, a large enough training set is usually difficult to obtain in practice, and rare categories in many real-world applications often form small compact clusters with arbitrary shapes. In this paper, we investigate how to identify all data examples of a rare category with an arbitrary shape based on only one seed (i.e., a labeled rare-category data example). Instead of finding a compact and spherical local region around the seed, we locally explore the data set from the seed by continuously searching and visiting the $k$-nearest neighbors of each newly visited data example. The local exploration connects the data examples in the objective rare category by the relationship of $k$-nearest neighbors, and meanwhile, suspected external data examples are filtered out if they are not close enough to any visited data example. Experiments on both synthetic and real-world data sets verify the effectiveness and efficiency of our approach.
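The local exploration described in the abstract can be illustrated with a minimal sketch. This is not the authors' LERI algorithm: the function name `knn_explore`, the parameters `k` and `radius`, and the fixed distance threshold used for filtering are all assumptions made for illustration, since the abstract does not specify how "close enough" is decided. The sketch starts from a single seed, repeatedly visits the k-nearest neighbors of each newly visited point, and rejects a candidate that is farther than `radius` from the visited point being expanded.

```python
import math
from collections import deque


def knn_explore(points, seed, k=3, radius=1.5):
    """Explore outward from `seed` along k-nearest-neighbor links.

    `radius` is a hypothetical fixed threshold standing in for the
    paper's (unspecified here) rule for filtering external examples.
    Returns the set of indices reached from the seed.
    """
    visited = {seed}
    queue = deque([seed])
    while queue:
        i = queue.popleft()
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(len(points))
            if j != i
        )
        for d, j in dists[:k]:
            # Accept a k-nearest neighbor only if it is close enough
            # to the visited point currently being expanded.
            if j not in visited and d <= radius:
                visited.add(j)
                queue.append(j)
    return visited


# Toy data: three points on a line (a non-spherical chain) plus one
# distant point that should be filtered out.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 5.0)]
print(sorted(knn_explore(points, seed=0, k=3, radius=1.5)))  # [0, 1, 2]
```

In this toy run the chain is recovered link by link even though it is not spherical around the seed, while the distant point appears among the 3-nearest neighbors but fails the distance check. The brute-force neighbor search is quadratic in the number of points; a spatial index would be the natural replacement on large data sets.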