Comparison of semi-supervised and supervised approaches for classification of e-nose datasets: Case studies of tomato juices

Xuezhen Hong, Jun Wang, Guande Qi
DOI: https://doi.org/10.1016/j.chemolab.2015.07.001
IF: 4.175
2015-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: Supervised classification, a fundamental classification approach for e-nose data, requires sufficient labeled data for training, and obtaining such data is costly in money, materials, energy and time. In this paper, a semi-supervised approach, Cluster-then-Label, which uses labeled and unlabeled data simultaneously to build a better classifier from fewer labeled training samples, was applied to e-nose data for the first time. A novel clustering algorithm, spectral clustering, was also introduced to improve this semi-supervised approach. Three experiments (discriminating storage shelf life (SL), identifying pretreatments, and authenticating juices) were conducted on cherry tomato juices using a PEN 2 e-nose, generating three datasets with different data structures. For each dataset, only 20% of the data were used for training. The classification results of this semi-supervised approach were compared with those of four supervised approaches: linear discriminant analysis (LDA), quadratic discriminant analysis, multi-class support vector machine and back-propagation neural network. The results indicate that the spectral-clustering-based semi-supervised approach outperforms the supervised approaches in all cases, making it possible to build reliable classifiers from only a few labeled samples. However, its advantage over LDA is not remarkable, so we plan to test the approach on more e-nose datasets.
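The abstract does not describe how Cluster-then-Label is implemented, so the following is only a minimal sketch of the general idea, not the authors' code. It assumes a Gaussian (RBF) affinity with a hand-picked width `sigma`, the Ng-Jordan-Weiss normalized spectral embedding, a simple k-means on that embedding, and majority voting among the labeled points of each cluster, with a 1-nearest-neighbor fallback for clusters that contain no labeled point. The function names (`kmeans`, `spectral_clustering`, `cluster_then_label`), the `-1` convention for unlabeled samples, and the synthetic 10-dimensional data are all illustrative assumptions; only the 20% labeling ratio comes from the abstract.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        d2 = ((points[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1)
        centers.append(points[np.argmax(d2.min(axis=1))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        new_centers = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def spectral_clustering(X, k, sigma=1.0):
    """Ng-Jordan-Weiss style spectral clustering (assumed variant)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))      # RBF affinity; sigma is an assumption
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    U = vecs[:, -k:]                          # top-k eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)  # row-normalise
    return kmeans(U, k)


def cluster_then_label(X, y_partial, n_clusters, sigma=1.0):
    """Label every sample; y_partial uses -1 for unlabeled samples."""
    labeled = y_partial >= 0
    X_lab, y_lab = X[labeled], y_partial[labeled]
    clusters = spectral_clustering(X, n_clusters, sigma)
    y_pred = np.empty(len(X), dtype=y_partial.dtype)
    for c in np.unique(clusters):
        members = clusters == c
        lab_in_c = members & labeled
        if lab_in_c.any():
            # Majority vote among the labeled points inside this cluster.
            values, counts = np.unique(y_partial[lab_in_c], return_counts=True)
            y_pred[members] = values[np.argmax(counts)]
        else:
            # Hypothetical fallback: 1-nearest labeled neighbor.
            idx = np.flatnonzero(members)
            d2 = ((X[idx][:, None, :] - X_lab[None, :, :]) ** 2).sum(-1)
            y_pred[idx] = y_lab[d2.argmin(axis=1)]
    y_pred[labeled] = y_partial[labeled]      # known labels are kept as given
    return y_pred


# Synthetic demo: three well-separated classes, 20% of samples labeled.
rng = np.random.default_rng(0)
centers = np.eye(3, 10) * 8.0
X = np.vstack([c + rng.normal(0.0, 0.5, size=(30, 10)) for c in centers])
y_true = np.repeat(np.arange(3), 30)
y_partial = np.full_like(y_true, -1)
lab_idx = rng.choice(len(X), size=int(0.2 * len(X)), replace=False)
y_partial[lab_idx] = y_true[lab_idx]

pred = cluster_then_label(X, y_partial, n_clusters=3)
unl = y_partial < 0
acc = (pred[unl] == y_true[unl]).mean()
print(f"accuracy on unlabeled samples: {acc:.3f}")
```

Majority voting is the simplest per-cluster predictor; a supervised model such as LDA could instead be trained on the labeled points inside each cluster. Real e-nose data are unlikely to form clusters as cleanly separated as this synthetic example, so `sigma` and the number of clusters would need tuning.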