Towards robust machine olfaction: debiasing GC-MS data enhances prostate cancer diagnosis from urine volatiles

Adan Rotteveel, Wen-Yee Lee, Zoi Kountouri, Nikolas Stefanou, Howard Kivell, Clifford Gluck, Shuguang Zhang, Andreas Mershin
DOI: https://doi.org/10.1101/2024.11.19.624273
2024-11-21
Abstract: We present a generalizable approach to de-biasing and de-trending complex datasets before clustering, using prostate cancer diagnosis from gas chromatography-mass spectrometry (GC-MS) ion chromatograms as a specific example. We endeavor to mimic the olfactory cancer-diagnostic prowess of trained dogs to establish, in perceptual space (as opposed to analytical chemistry space), a robustly recognizable signal that can be associated with the presence or absence of prostate cancer in the person who provided the urine sample. After principal component analysis, unsupervised clustering showed that the data had a strong tendency to cluster by the medical center from which each urine sample came, so we had to build a bias detector and baseline-drift remover as part of our pre-processing. Our machine olfaction approach departs from conventional analytical chemistry, which identifies a sample's constituent compounds by name and concentration, and moves towards a generalizable, adaptable biomarker that in this case we call the emergent scent character. These scent characters are informed by measurements performed on volatile organic compounds (VOCs) but are not in themselves lists of VOCs, nor linear sums of their physicochemical parameters. Ultimately, we repurpose techniques prevalent in machine vision for machine olfaction, recognizing and categorizing our scent character patterns as if they were images.
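The abstract does not say how the bias detector or the baseline-drift remover works, so the following is only a minimal sketch of the general idea on synthetic data, not the authors' pipeline. It assumes a polynomial baseline fit for de-trending, per-site mean centering for de-biasing, and a simple centroid-gap score in principal component space as the "bias detector". All function names (`remove_baseline`, `center_by_site`, `pca_scores`, `site_separation`) and all data values are hypothetical.

```python
import numpy as np

def remove_baseline(X, degree=2):
    """Subtract a low-order polynomial fitted to each row (one chromatogram per row).
    Assumption: drift is smooth enough for a polynomial; real pipelines may differ."""
    t = np.linspace(-1.0, 1.0, X.shape[1])
    V = np.vander(t, degree + 1)                     # (n_points, degree + 1)
    coef, *_ = np.linalg.lstsq(V, X.T, rcond=None)   # (degree + 1, n_samples)
    return X - (V @ coef).T

def center_by_site(X, sites):
    """Subtract each site's mean chromatogram from the samples of that site."""
    out = X.astype(float).copy()
    for s in np.unique(sites):
        mask = sites == s
        out[mask] -= out[mask].mean(axis=0)
    return out

def pca_scores(X, n_components=2):
    """Project mean-centered samples onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def site_separation(scores, sites):
    """Hypothetical bias score for two sites: distance between site centroids
    in PC space divided by the mean within-site spread."""
    labels = np.unique(sites)
    centroids = np.array([scores[sites == s].mean(axis=0) for s in labels])
    spread = np.mean([scores[sites == s].std(axis=0).mean() for s in labels])
    return np.linalg.norm(centroids[0] - centroids[1]) / spread

# Synthetic stand-in for ion chromatograms from two medical centers (made-up values).
rng = np.random.default_rng(0)
n_per_site, n_points = 20, 200
t = np.linspace(0.0, 1.0, n_points)
sites = np.repeat(np.array(["A", "B"]), n_per_site)
peak = np.exp(-((t - 0.4) ** 2) / 0.001)             # one shared peak
X = rng.normal(0.0, 0.05, (2 * n_per_site, n_points)) + peak
X[sites == "B"] += 0.5 + 1.0 * t                     # site-specific offset and drift

X_clean = center_by_site(remove_baseline(X), sites)
before = site_separation(pca_scores(X), sites)
after = site_separation(pca_scores(X_clean), sites)
print(f"site separation before: {before:.2f}, after: {after:.2e}")
```

Per-site centering is a blunt instrument: if the proportion of cancer-positive samples differs between centers, it can also remove part of the diagnostic signal, so a score like this is better used to detect site bias than to certify its removal.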
Biology