Difficult Novel Class Detection in Semisupervised Streaming Data
Peng Zhou, Ni Wang, Shu Zhao, Yanping Zhang, Xindong Wu
DOI: https://doi.org/10.1109/tnnls.2022.3213682
IF: 14.255
2022-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Streaming data mining has many practical applications, such as social media, market analysis, and sensor networks. Most previous work on novel class detection in streaming data assumes that all training instances, except those of the novel class, are completely labeled. A more realistic situation is that only a few instances in the data stream are labeled. In addition, most existing algorithms implicitly depend on strong cohesion between known classes or on a large separation between the novel class and the known classes in the feature space. Unfortunately, this dependence is usually not an inherent characteristic of streaming data. Therefore, to classify data streams and detect novel classes, an algorithm should satisfy two requirements: 1) it can handle any degree of separation between the novel class and the known classes (both easy and difficult novel class detection) and 2) it can build its models from a limited number of labeled instances. In this article, we tackle these issues with a new framework called semisupervised streaming learning for difficult novel class detection (SSLDN), which consists of three major components: an effective novel class detector based on random trees, a classifier that uses the information of nearest neighbors, and an efficient updating process. Empirical studies on several datasets validate that SSLDN can accurately handle different degrees of separation between the novel and known classes in semisupervised streaming data.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
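
The three components named in the abstract (a random-tree detector, a nearest-neighbor classifier, and an updating step) can be illustrated with a toy sketch. This is not the SSLDN algorithm: the abstract does not give its details, so everything below is an assumption made for illustration. In particular, the leaf-ball test in `outside_leaf`, the majority-vote k-NN in `knn_label`, the rebuild-from-scratch `update`, the threshold of 0.5, and all function names are hypothetical. The sketch also covers only the easy case of a well-separated novel class, not the difficult case the paper targets.

```python
import math
import random
from collections import Counter

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def make_leaf(points):
    # Assumption: each leaf keeps a ball (center, radius) around its training points.
    if not points:
        return {"leaf": True, "center": None, "radius": 0.0}
    d = len(points[0])
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(d))
    return {"leaf": True, "center": center,
            "radius": max(dist(center, p) for p in points)}

def build_tree(points, rng, depth=0, max_depth=6):
    # Completely random tree: random feature, random split value in its range.
    if len(points) <= 1 or depth >= max_depth:
        return make_leaf(points)
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return make_leaf(points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"leaf": False, "dim": dim, "split": split,
            "left": build_tree(left, rng, depth + 1, max_depth),
            "right": build_tree(right, rng, depth + 1, max_depth)}

def build_forest(points, n_trees, rng):
    return [build_tree(points, rng) for _ in range(n_trees)]

def outside_leaf(tree, x):
    # Route x to a leaf, then check whether it falls outside that leaf's ball.
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return tree["center"] is None or dist(tree["center"], x) > tree["radius"]

def novelty_score(forest, x):
    # Fraction of trees in which x lies outside the ball of the leaf it reaches.
    return sum(outside_leaf(t, x) for t in forest) / len(forest)

def knn_label(labeled, x, k=3):
    # Majority vote among the k nearest labeled instances only.
    nearest = sorted(labeled, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def process(x, forest, labeled, threshold=0.5, k=3):
    if novelty_score(forest, x) > threshold:
        return "novel"
    return knn_label(labeled, x, k)

def update(known, labeled, buffer, new_label, n_trees, rng):
    # Naive update: absorb buffered novel instances and rebuild the forest.
    known = known + buffer
    labeled = labeled + [(p, new_label) for p in buffer]
    return known, labeled, build_forest(known, n_trees, rng)

rng = random.Random(0)
class_a = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
class_b = [(3 + 0.1 * i, 3 + 0.1 * j) for i in range(5) for j in range(4)]
known = class_a + class_b
# Semisupervised setting: only three instances per known class carry a label.
labeled = [(p, "A") for p in class_a[:3]] + [(p, "B") for p in class_b[:3]]
forest = build_forest(known, 25, rng)

buffer = [(50.0, 50.0), (50.1, 50.0), (50.0, 50.1)]
result_known = process(class_a[0], forest, labeled)
results_far = [process(p, forest, labeled) for p in buffer]
known, labeled, forest = update(known, labeled, buffer, "C", 25, rng)
result_after_update = process(buffer[0], forest, labeled)
```

Rebuilding the whole forest in `update` keeps the sketch short; the abstract describes the paper's own updating process as efficient, so it presumably avoids a full rebuild.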