Representation Learning for Imbalanced Cross-Domain Classification

Lu Cheng, Ruocheng Guo, K. Selcuk Candan, Huan Liu
DOI: https://doi.org/10.1137/1.9781611976236.54
2020-01-01
Abstract: Deep architectures are trained on massive amounts of labeled data to guarantee the performance of classification. In the absence of labeled data, domain adaptation often provides an attractive option, given that labeled data of a similar nature but from a different domain is available. Previous work has chiefly focused on learning domain-invariant representations but has overlooked the issues of label imbalance in a single domain or across domains, which are common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem where data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of positive to negative samples varies across domains (domain imbalance). This cross-domain problem is challenging as it entails covariate bias in the input feature space and representation bias in the latent space where domain-invariant representations are learned. To address the challenge, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain-invariant representations. To this end, the proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations on two important real-world applications corroborate the effectiveness of the proposed framework.
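The two notions of imbalance defined in the abstract can be illustrated with a minimal sketch. The toy label lists, the helper name `pos_neg_ratio`, and the variable names below are assumptions made for illustration only; they are not the authors' data or method, and the sketch shows only how the two imbalance conditions might be checked, not the doubly balancing strategy itself.

```python
from collections import Counter


def pos_neg_ratio(labels):
    """Return (#positive / #negative) for binary labels (1 = positive, 0 = negative)."""
    counts = Counter(labels)
    if counts[0] == 0:
        raise ValueError("no negative samples; ratio is undefined")
    return counts[1] / counts[0]


# Hypothetical toy labels for a labeled source domain and a target domain.
source_labels = [1] * 20 + [0] * 80   # 20 positive, 80 negative
target_labels = [1] * 5 + [0] * 95    # 5 positive, 95 negative

source_ratio = pos_neg_ratio(source_labels)
target_ratio = pos_neg_ratio(target_labels)

# Data imbalance: within a domain, the classes are not evenly distributed.
data_imbalanced = {
    "source": source_ratio != 1.0,
    "target": target_ratio != 1.0,
}

# Domain imbalance: the positive-to-negative ratio differs across domains.
domain_imbalanced = source_ratio != target_ratio

print(source_ratio, target_ratio, data_imbalanced, domain_imbalanced)
```

In this toy setting both domains are imbalanced and their ratios differ, which is the situation the abstract describes as combining data imbalance with domain imbalance.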