fBGD: Learning Embeddings from Positive Unlabeled Data with BGD

Fajie Yuan, Xin, Xiangnan He, Guibing Guo, Weinan Zhang, Tat-Seng Chua, Joemon M. Jose
2018-01-01
Abstract: Learning sparse features from only positive and unlabeled (PU) data is a fundamental task in several domains, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). Because the amount of unlabeled data is very large, most prevalent methods rely on negative sampling (NS) to improve computational efficiency. However, sampling only a fraction of the unlabeled data as negatives for training may ignore other important examples and thus lead to suboptimal prediction performance. To address this, we present a fast and generic batch gradient descent optimizer (fBGD) that learns from all training examples without sampling. By leveraging the sparsity of PU data, we accelerate fBGD by several orders of magnitude, bringing its time complexity to the same level as that of NS-based stochastic gradient descent. Meanwhile, we observe that the standard batch gradient method suffers from gradient instability caused by this sparsity. A theoretical analysis of this potential cause leads naturally to an intuitive solution. To verify its efficacy, we perform experiments on multiple tasks with PU data across domains and show that fBGD consistently outperforms NS-based models on all tasks with comparable efficiency.
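To make the efficiency claim concrete, here is a minimal sketch of how sparsity can make an all-examples loss cheap to evaluate. It is an illustration under stated assumptions, not the paper's formulation: it assumes a plain dot-product (matrix factorization) model, a squared loss, a single uniform weight `alpha` on every unlabeled pair, and unique positive pairs. The names `P`, `Q`, `pos`, `alpha`, `naive_loss`, and `fast_loss` are hypothetical.

```python
import numpy as np

def naive_loss(P, Q, pos, alpha):
    """Reference loss that visits every (row, column) pair: O(M * N * d)."""
    scores = P @ Q.T                      # dense M x N score matrix
    labels = np.zeros_like(scores)        # unlabeled pairs are treated as 0
    weights = np.full_like(scores, alpha) # assumed uniform weight on unlabeled pairs
    rows, cols = pos[:, 0], pos[:, 1]
    labels[rows, cols] = 1.0              # positive pairs have target 1
    weights[rows, cols] = 1.0             # and weight 1
    return float(np.sum(weights * (labels - scores) ** 2))

def fast_loss(P, Q, pos, alpha):
    """Same value without the dense matrix: O((M + N) * d^2 + |pos| * d)."""
    # Sum of squared scores over ALL pairs, computed from two d x d Gram matrices.
    all_pairs = alpha * np.sum((P.T @ P) * (Q.T @ Q))
    # Correct only the positive pairs: swap their "unlabeled" term for the true one.
    rows, cols = pos[:, 0], pos[:, 1]
    s = np.sum(P[rows] * Q[cols], axis=1)
    correction = np.sum((1.0 - s) ** 2 - alpha * s ** 2)
    return float(all_pairs + correction)
```

The second function never materializes the M x N score matrix, so its cost depends on the number of observed positives rather than on the number of unlabeled pairs. This is the general kind of saving that makes full-batch training over all examples comparable in cost to sampling-based training.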