Online Semi-Supervised Classification on Multilabel Evolving High-Dimensional Text Streams

Jay Kumar, Junming Shao, Rajesh Kumar, Salah Ud Din, Cobbinah B. Mawuli, Qinli Yang
DOI: https://doi.org/10.1109/tsmc.2023.3275298
2023-01-01
Abstract: The multilabel learning task aims to predict the multiple classes associated with a given example simultaneously. The task becomes more challenging when data arrive in a stream, since it requires an algorithm that is robust, fast, and adaptive to concept drift. In this article, we present an online semi-supervised classification algorithm (OSMTS) for multilabel text streams. By leveraging a few labeled instances, OSMTS dynamically maintains the subspace of terms for each label with a set of evolving micro-clusters. For multilabel classification, the k nearest micro-clusters are used for prediction through a nonparametric Dirichlet model. To handle gradual concept drift in the term space, a triangular time function is adopted to calculate the difference between a term's arrival time and the cluster life span. Abrupt concept drift, in contrast, is handled by two procedures: 1) deleting outdated micro-clusters by exploiting an exponential decay function and 2) creating new micro-clusters by adopting the Chinese restaurant process based on the Dirichlet process. The experimental study compares OSMTS with 12 state-of-the-art algorithms on nine datasets in terms of classification performance, runtime, and memory consumption.
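The two procedures for abrupt drift can be illustrated with a minimal sketch. This is not the OSMTS implementation: the function names, the dictionary layout of a micro-cluster, and the parameters `decay_rate`, `min_weight`, and `alpha` are assumptions chosen for illustration, and the formulas are the textbook forms of exponential decay and the Chinese restaurant process prior rather than the paper's exact equations.

```python
import math


def decayed_weight(weight, elapsed, decay_rate=0.01):
    """Textbook exponential decay of a micro-cluster weight (assumed form)."""
    return weight * math.exp(-decay_rate * elapsed)


def prune_outdated(clusters, now, decay_rate=0.01, min_weight=0.1):
    """Keep only clusters whose decayed weight is at least min_weight.

    `clusters` maps an id to a dict with hypothetical keys
    "weight" and "last_update".
    """
    return {
        cid: c
        for cid, c in clusters.items()
        if decayed_weight(c["weight"], now - c["last_update"], decay_rate)
        >= min_weight
    }


def crp_new_cluster_probability(n_seen, alpha=1.0):
    """Standard CRP prior probability that the next item opens a new cluster."""
    return alpha / (n_seen + alpha)


if __name__ == "__main__":
    clusters = {
        "stale": {"weight": 1.0, "last_update": 0},
        "fresh": {"weight": 1.0, "last_update": 990},
    }
    print(sorted(prune_outdated(clusters, now=1000)))
    print(crp_new_cluster_probability(n_seen=9, alpha=1.0))
```

In this sketch a cluster that has not been updated for a long time decays below the threshold and is dropped, while the chance of opening a new cluster shrinks as more instances are seen and grows with the concentration parameter `alpha`.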