Generalizable Feature Learning in the Presence of Data Bias and Domain Class Imbalance with Application to Skin Lesion Classification

Chris Yoon, Ghassan Hamarneh, Rafeef Garbi
DOI: https://doi.org/10.1007/978-3-030-32251-9_40
2019-01-01
Abstract: Training generalizable data-driven models for medical imaging applications is especially challenging as acquiring and accessing sufficiently large medical datasets is often unfeasible. When trained on limited datasets, a high-capacity model, as most leading neural network architectures are, is likely to overfit and thus generalize poorly to unseen data. Further aggravating the problem, data used to train models in medicine are typically collected in silos and from narrow data distributions that are determined by specific acquisition hardware, imaging protocols, and patient demographics. In addition, class imbalance within and across datasets is a common complication as disease conditions or sub-types have varying degrees of prevalence. In this paper, we motivate the need for generalizable training in the context of skin lesion classification by evaluating the performance of ResNet across 7 public datasets with dataset bias and class imbalance. To mitigate dataset bias, we extend the classification and contrastive semantic alignment (CCSA) loss that aims to learn domain-invariant features. As the CCSA loss requires labelled data from two domains, we propose a strategy to dynamically sample paired data in a setting where the set of available classes varies across domains. To encourage learning from underrepresented classes, the sampled class probabilities are used to weight the classification and alignment losses. Experimental results demonstrate improved generalizability as measured by the mean macro-average recall across the 7 datasets when training using the weighted CCSA loss and dynamic sampler.
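The evaluation metric named in the abstract, mean macro-average recall, can be illustrated with a minimal sketch. Macro-average recall is the unweighted mean of per-class recall, so a rare class counts as much as a common one. The function names and toy labels below are hypothetical, and the sketch assumes that classes with no ground-truth samples in a dataset are simply left out of that dataset's average; the abstract does not say how the authors handle that case.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # ground-truth count per class
    correct = defaultdict(int)  # correctly predicted count per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Assumption: only classes that occur in y_true contribute to the mean.
    return sum(correct[c] / total[c] for c in total) / len(total)


def mean_macro_recall(per_dataset):
    """Average the macro recall over several (y_true, y_pred) pairs, one per dataset."""
    scores = [macro_recall(t, p) for t, p in per_dataset]
    return sum(scores) / len(scores)


# Toy imbalanced example: a model that always predicts the majority class.
y_true = ["nevus", "nevus", "nevus", "nevus", "melanoma"]
y_pred = ["nevus"] * 5
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, macro_recall(y_true, y_pred))
```

In this toy case accuracy is 0.8 while macro recall is 0.5, which shows why a class-balanced metric is a natural choice when disease prevalence varies within and across datasets.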