Web and Big Data - 4th International Joint Conference, APWeb-WAIM 2020, Tianjin, China, September 18-20, 2020, Proceedings, Part II.

Xin Wang, Rui Zhang, Young-Koo Lee, Le Sun, Yang-Sae Moon
DOI: https://doi.org/10.1007/978-3-030-60290-1
2020-01-01
Abstract: Many applications need to perform classification on large sparse datasets. Classifying cold-start users, who have provided very little feedback, remains a challenging task. Previous work has applied active learning to classification with partially observed data. However, for large and sparse data, the number of candidate feedback queries is huge, and many of them are invalid (the user cannot answer them). In this paper, we develop an active classification framework that addresses these challenges by leveraging online Matrix Factorization models. We first identify a step-wise data acquisition heuristic that is useful for active classification. We then use the estimates of online Probabilistic Matrix Factorization to compute this heuristic function. To reduce the number of invalid queries, we further estimate, with online Poisson Factorization, the probability that a query can be answered by the cold-start user. During active learning, a query is selected based on the current knowledge learned in these two online factorization models. We demonstrate on real-world movie rating datasets that our framework is highly effective: it not only yields larger gains in classification performance but also reduces the number of invalid queries.
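The query selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the heuristic's formula or say how the two model outputs are combined, so everything here is an assumption made for illustration: the function name `select_query`, the per-item `utility` scores (standing in for the heuristic computed from the Probabilistic Matrix Factorization estimates), the per-item `answer_prob` values (standing in for the Poisson Factorization estimates), and the choice to rank items by their product.

```python
from typing import Optional, Sequence, Set


def select_query(
    utility: Sequence[float],
    answer_prob: Sequence[float],
    asked: Set[int],
) -> Optional[int]:
    """Pick the unasked item with the highest expected gain.

    Assumption (not from the paper): expected gain is modelled as
    utility[i] * answer_prob[i], i.e. the benefit of an answer weighted
    by the chance that the user can actually give one.
    """
    best_item: Optional[int] = None
    best_score = float("-inf")
    for item, (u, p) in enumerate(zip(utility, answer_prob)):
        if item in asked:
            continue  # never ask the same item twice
        score = u * p
        if score > best_score:
            best_item, best_score = item, score
    return best_item  # None when every item has already been asked


# Hypothetical scores for three items (illustrative values only).
utility = [0.9, 0.5, 0.7]
answer_prob = [0.1, 0.8, 0.6]
chosen = select_query(utility, answer_prob, asked=set())
```

In this toy example item 0 has the highest utility but is unlikely to be answered, so the weighted score prefers an item the user has probably seen. In an online setting, both score lists would be recomputed after each answer before the next call.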