A flexible framework for minimal biomarker signature discovery from clinical omics studies without library size normalisation

Daniel P Rawlinson, Chenxi P Zhou, Kim-Anh Le Cao, Lachlan J M Coin
DOI: https://doi.org/10.1101/2024.07.03.601811
2024-07-03
Abstract: Application of transcriptomics, proteomics and metabolomics technologies to clinical cohorts has uncovered a variety of signatures for predicting disease. Many of these signatures require the full omics data for evaluation on unseen samples, either explicitly or implicitly through library size normalisation. Translation to low-cost point-of-care tests requires development of signatures that measure as few analytes as possible without relying on direct measurement of library size. To achieve this, we have developed a feature selection method, Forward Selection-Partial Least Squares (FS-PLS), which generates minimal disease signatures from high-dimensional omics datasets with applicability to continuous, binary or multi-class outcomes. Through extensive benchmarking, we show that FS-PLS has comparable performance to commonly used signature discovery methods while delivering signatures that are an order of magnitude smaller. We show that FS-PLS can be used to select features predictive of library size, and that these features can be used to normalise unseen samples, meaning that the features in the complete model can be measured in isolation for making new predictions. By enabling discovery of small, high-performance signatures, FS-PLS addresses an important impediment to the further development of precision medical care.
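The abstract does not specify the FS-PLS algorithm itself, so the following is only a generic illustration of the forward-selection idea it builds on, not the authors' method. It is a minimal sketch assuming a continuous outcome and ordinary least squares in place of partial least squares; the function name `forward_select`, the parameters `max_features` and `min_gain`, and the simulated data are all hypothetical.

```python
import numpy as np

def forward_select(X, y, max_features=5, min_gain=0.01):
    """Generic greedy forward selection (illustrative, not FS-PLS).

    At each step, add the feature whose inclusion gives the largest R^2 for
    an ordinary least-squares fit; stop when the gain falls below min_gain.
    """
    n, p = X.shape
    selected = []
    tss = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    best_r2 = 0.0
    while len(selected) < max_features:
        best_j, candidate_r2 = None, best_r2
        for j in range(p):
            if j in selected:
                continue
            # Fit intercept + already-selected features + candidate feature j.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r2 = 1.0 - float(np.sum((y - A @ coef) ** 2)) / tss
            if r2 > candidate_r2:
                best_j, candidate_r2 = j, r2
        # Stop if no candidate improves the fit by at least min_gain.
        if best_j is None or candidate_r2 - best_r2 < min_gain:
            break
        selected.append(best_j)
        best_r2 = candidate_r2
    return selected, best_r2

# Simulated data (hypothetical): 200 samples, 50 features, 2 truly informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 3.0 * X[:, 4] - 2.0 * X[:, 17] + rng.normal(scale=0.1, size=200)
selected, r2 = forward_select(X, y)
```

In this toy setting the stopping rule keeps the signature small: once the informative features are in the model, no remaining feature improves the fit enough to be added. The same loop could in principle be pointed at library size as the outcome, which is the use the abstract describes for normalising unseen samples.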
Bioinformatics