Boost AI Power: Data Augmentation Strategies with Unlabeled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose
Li Liu, Xianghao Zhan, Rumeng Wu, Xiaoqing Guan, Zhan Wang, Wei Zhang, Mert Pilanci, You Wang, Zhiyuan Luo, Guang Li
DOI: https://doi.org/10.1109/jsen.2021.3102488
IF: 4.3
2021-01-01
IEEE Sensors Journal
Abstract: The electronic nose has been proven effective in alternative herbal medicine classification, but because it relies on supervised learning, previous research depends heavily on labelled training data, which are time-consuming and labor-intensive to collect. To reduce this critical dependency on training data in real-world applications, this study aims to improve classification accuracy through data augmentation strategies. The effectiveness of five data augmentation strategies under different degrees of training-data inadequacy is investigated in two scenarios: a noise-free scenario, in which different availabilities of unlabelled data are considered, and a noisy scenario, in which different levels of Gaussian noise and translational shift are added to represent sensor drift. The five strategies are compared against supervised learning baselines, with Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) as the classifiers:

- noise-adding data augmentation
- semi-supervised learning
- classifier-based online learning
- Inductive Conformal Prediction (ICP) online learning
- the novel ensemble ICP online learning (EICP) proposed in this study

This study provides a systematic analysis of these augmentation strategies. Our novel strategy, EICP, outperforms the others. It shows non-decreasing classification accuracy on all tasks and a significant improvement on most simulated tasks (25 out of 36 tasks, $p \leq 0.05$), which demonstrates both effectiveness and robustness in improving the generalizability of the classification model. The strategy can also be employed in other machine learning applications.
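The following is a minimal, dependency-free sketch of two ideas named in the abstract. The first is sensor drift simulated as Gaussian noise plus a translational shift. The second is plain ICP online learning, where unlabelled samples are pseudo-labelled and added to the training set only when the conformal prediction is confident and credible. This is not the authors' EICP, because the abstract does not describe how its ensemble works. Several elements are illustrative assumptions and do not come from the paper:

- A nearest-centroid classifier stands in for LDA or SVM.
- The nonconformity measure is a distance ratio.
- The drift model adds the same shift to every feature.
- The thresholds `min_confidence` and `min_credibility` are hypothetical.
- The two-class data are synthetic.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def simulate_drift(x: Sequence[float], sigma: float, shift: float,
                   rng: random.Random) -> Vector:
    """Assumed drift model: Gaussian noise plus a constant shift per feature."""
    return [v + rng.gauss(0.0, sigma) + shift for v in x]


def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, Vector]:
    """Per-class mean vectors; a nearest-centroid stand-in for LDA or SVM."""
    groups: Dict[int, List[Vector]] = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {k: [sum(col) / len(rows) for col in zip(*rows)]
            for k, rows in groups.items()}


def nonconformity(x: Sequence[float], label: int,
                  cents: Dict[int, Vector]) -> float:
    """Distance to the candidate class centroid over the distance to the
    nearest other centroid; larger values mean x fits `label` worse."""
    own = math.dist(x, cents[label])
    other = min(math.dist(x, c) for k, c in cents.items() if k != label)
    return own / (other + 1e-12)


def p_values(x: Sequence[float], cal_scores: List[float],
             cents: Dict[int, Vector]) -> Dict[int, float]:
    """ICP p-value of every candidate label for x, using calibration scores."""
    n = len(cal_scores)
    out: Dict[int, float] = {}
    for label in cents:
        a = nonconformity(x, label, cents)
        out[label] = (sum(s >= a for s in cal_scores) + 1) / (n + 1)
    return out


def icp_online(X_train, y_train, X_cal, y_cal, stream,
               min_confidence=0.9, min_credibility=0.1):
    """Pseudo-label unlabelled samples one at a time; keep only those whose
    ICP confidence and credibility clear the (hypothetical) thresholds."""
    X_tr, y_tr = [list(v) for v in X_train], list(y_train)
    added: List[Tuple[Vector, int]] = []
    for x in stream:
        cents = fit_centroids(X_tr, y_tr)  # refit on the augmented set
        cal = [nonconformity(xc, yc, cents) for xc, yc in zip(X_cal, y_cal)]
        p = p_values(x, cal, cents)
        ranked = sorted(p.values(), reverse=True)
        credibility, confidence = ranked[0], 1.0 - ranked[1]
        if confidence >= min_confidence and credibility >= min_credibility:
            label = max(p, key=p.get)
            X_tr.append(list(x))
            y_tr.append(label)
            added.append((list(x), label))
    return fit_centroids(X_tr, y_tr), added


def predict(x: Sequence[float], cents: Dict[int, Vector]) -> int:
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda k: math.dist(x, cents[k]))


# Synthetic two-class example: few labelled samples, a labelled calibration
# set, and a drifted unlabelled stream (noise sigma 0.5, shift 0.3).
rng = random.Random(0)
centres = {0: [0.0, 0.0], 1: [10.0, 10.0]}
X_train = [simulate_drift(centres[k], 0.5, 0.0, rng) for k in (0, 1) for _ in range(3)]
y_train = [k for k in (0, 1) for _ in range(3)]
X_cal = [simulate_drift(centres[k], 0.5, 0.0, rng) for k in (0, 1) for _ in range(10)]
y_cal = [k for k in (0, 1) for _ in range(10)]
stream = [simulate_drift(centres[k], 0.5, 0.3, rng) for k in (0, 1) for _ in range(5)]

cents, added = icp_online(X_train, y_train, X_cal, y_cal, stream)
print(len(added), "of", len(stream), "unlabelled samples pseudo-labelled")
print(predict([0.0, 0.0], cents), predict([10.0, 10.0], cents))
```

The credibility threshold matters in this design. When a drifted sample is stranger than every calibration sample, all of its p-values collapse to the minimum $1/(n+1)$. Confidence alone would then look high even though the label is ambiguous. Rejecting low-credibility samples keeps such outliers from being pseudo-labelled.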