Incremental Feature Spaces Learning with Label Scarcity

Shilin Gu,Yuhua Qian,Chenping Hou
DOI: https://doi.org/10.1145/3516368
IF: 4.157
2022-01-01
ACM Transactions on Knowledge Discovery from Data
Abstract:Recently, learning and mining from data streams with incremental feature spaces have attracted extensive attention, where data may dynamically expand over time in both volume and feature dimensions. Existing approaches usually assume that the incoming instances can always receive true labels. However, in many real-world applications, e.g., environment monitoring, acquiring the true labels is costly due to the need of human effort in annotating the data. To tackle this problem, we propose a novel incremental Feature spaces Learning with Label Scarcity (FLLS) algorithm, together with its two variants. When data streams arrive with augmented features, we first leverage the margin-based online active learning to select valuable instances to be labeled and thus build superior predictive models with minimal supervision. After receiving the labels, we combine the online passive-aggressive update rule and margin-maximum principle to jointly update the dynamic classifier in the shared and augmented feature space. Finally, we use the projected truncation technique to build a sparse but efficient model. We theoretically analyze the error bounds of FLLS and its two variants. Also, we conduct experiments on synthetic data and real-world applications to further validate the effectiveness of our proposed algorithms.
What problem does this paper attempt to address?