Classification of Volatile Organic Compounds by Differential Mobility Spectrometry Based on Continuity of Alpha Curves

Anton Rauhameri, Angelo Robiños, Osmo Anttalainen, Timo Salpavaara, Jussi Rantala, Veikko Surakka, Pasi Kallio, Antti Vehkaoja, Philipp Müller
2024-03-14
Abstract: Background: Classification of volatile organic compounds (VOCs) is of interest in many fields. Examples include but are not limited to medicine, detection of explosives, and food quality control. Measurements collected with electronic noses can be used for classification and analysis of VOCs. One type of electronic nose that has seen considerable development in recent years is Differential Mobility Spectrometry (DMS). DMS yields measurements that are visualized as dispersion plots that contain traces, also known as alpha curves. Current methods used for analyzing DMS dispersion plots do not usually utilize the information stored in the continuity of these traces, which suggests that alternative approaches should be investigated.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is the classification of volatile organic compounds (VOCs) from differential mobility spectrometry (DMS) measurements, using the continuity of the alpha curves. Current methods for analyzing DMS dispersion plots usually do not use the continuity information of these traces (the alpha curves), which may lead to poor classification results. The paper therefore proposes interpreting a dispersion plot as a series of measurements evolving over time, and hypothesizes that time-series classification algorithms can be used effectively to classify and analyze dispersion plots.

### Specific Problem Description

1. **Limitations of Existing Methods**:
   - Current methods for analyzing DMS dispersion plots usually do not use the continuity information of alpha curves.
   - Such methods may overlook potentially important features, resulting in low classification accuracy.
2. **Research Objectives**:
   - Propose a new classification method that interprets dispersion plots as time-series data.
   - Verify whether time-series classification algorithms, such as long short-term memory (LSTM) networks, can improve classification accuracy.
   - Create a public data set that other researchers can use to compare and improve classification algorithms.

### Method Overview

- **Data Collection**: Collected 900 dispersion plots covering measurements of 5 chemical substances at 5 different flow rates.
- **Pre-processing**: Cropped and normalized the dispersion plots and applied principal component analysis (PCA) to reduce redundant information and improve classification performance.
- **Classification Algorithms**: Used six different classification algorithms, including an LSTM neural network, K-nearest neighbors (KNN), linear discriminant analysis (LDA), and an extra-trees classifier.
- **Validation and Evaluation**: Evaluated classification performance with repeated stratified K-fold cross-validation (RSCV); the highest classification accuracy reached 88%.

### Main Contributions

1. **High-Accuracy Classification**: Achieved a classification accuracy of up to 88%, reported as a first for multi-label machine learning problems using DMS measurement data.
2. **Time-Series Deep Learning Application**: Demonstrated how to apply LSTM to DMS data and showed that it outperformed the other classifiers tested.
3. **Public Data Set**: Shared a dispersion plot data set covering multiple chemical substances and flow-rate conditions for other researchers to compare against and improve upon.

Through this work, the authors hope to provide a new perspective on the classification of DMS dispersion plots and to promote further development in related fields.
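
The time-series interpretation and the repeated stratified K-fold evaluation mentioned in the summary can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The function names (`normalize_plot`, `plot_to_sequence`, `repeated_stratified_kfold`), the toy data, and the choice to treat each row of a dispersion plot as one time step are all assumptions made for illustration. Cropping, PCA, and the LSTM itself are omitted.

```python
import random
from collections import defaultdict


def normalize_plot(plot):
    """Min-max scale a 2D dispersion plot (list of rows) to [0, 1]."""
    flat = [v for row in plot for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant plot
    return [[(v - lo) / span for v in row] for row in plot]


def plot_to_sequence(plot):
    """Turn a plot into a sequence of feature vectors.

    Assumption: each row is one step along the sweep axis and is treated
    as one time step, so a sequence model can follow how traces evolve.
    """
    return [tuple(row) for row in normalize_plot(plot)]


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class's shuffled indices round-robin into the folds.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
            yield train, test


# Toy data (hypothetical): a 3x4 "dispersion plot" and 10 labelled samples.
toy_plot = [[0.0, 2.0, 4.0, 2.0], [0.0, 1.0, 4.0, 3.0], [0.0, 0.0, 3.0, 4.0]]
sequence = plot_to_sequence(toy_plot)
labels = ["a"] * 5 + ["b"] * 5
splits = list(repeated_stratified_kfold(labels, n_splits=5, n_repeats=2))
```

In this sketch every test fold contains the same share of each class as the full data set, and repeating the split with fresh shuffles gives several accuracy estimates to average. A real evaluation would train a classifier on each `train` index set and score it on the matching `test` set.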