GFPrint™: A MACHINE LEARNING TOOL FOR TRANSFORMING GENETIC DATA INTO CLINICAL INSIGHTS

Guillermo Sanz-Martín, Daniela Paula Migliore, Pablo Gómez del Campo, José del Castillo-Izquierdo, Juan Manuel Domínguez
DOI: https://doi.org/10.1101/2024.03.08.584090
2024-04-30
Abstract: The increasing availability of massive genetic sequencing data in the clinical setting has created a need for tools that can fully exploit the information these data contain. GFPrint is a proprietary streaming algorithm designed to meet that need. By extracting the most relevant functional features, GFPrint transforms high-dimensional, noisy genetic sequencing data into an embedded representation, allowing unsupervised models to create data clusters that can be re-mapped to the original clinical information. Ultimately, this allows the identification of genes and pathways relevant to disease onset and progression. GFPrint has been tested and validated using two publicly available cancer genomic datasets. Analysis of the TCGA dataset identified panels of genes whose mutations appear to negatively influence survival in patients with non-metastatic colorectal cancer (15 genes), epidermoid non-small cell lung cancer (167 genes) and pheochromocytoma (313 genes). Likewise, analysis of the Broad Institute dataset identified 75 genes involved in pathways related to extracellular matrix reorganization whose mutations appear to dictate a worse prognosis for breast cancer patients. GFPrint is accessible through a secure web portal and can be used in any therapeutic area where the genetic profile of patients influences disease evolution.
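GFPrint is proprietary and the abstract does not disclose its internals, so the workflow it describes (embed, cluster without supervision, re-map clusters to clinical data, then look for genes that distinguish the clusters) can only be illustrated with stand-ins. The sketch below is a toy under stated assumptions, not the authors' method: the raw binary mutation vector stands in for the embedded representation, plain k-means stands in for the unsupervised model, and all gene names, patients, survival values and the 0.5 threshold are invented for illustration.

```python
# Toy illustration only: NOT the GFPrint algorithm, whose internals are proprietary.
# All gene names, patients, survival values and thresholds are invented.

GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical gene panel

# One row per patient: 1 = gene mutated, 0 = not mutated.
# Assumption: this raw binary vector stands in for an embedded representation.
MUTATIONS = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
SURVIVAL_MONTHS = [12, 10, 14, 40, 36, 44]  # invented clinical outcome


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: centroid = column-wise mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mutation_rate(gene_idx, cluster, labels):
    """Fraction of patients in `cluster` carrying a mutation in the given gene."""
    rows = [m for m, lab in zip(MUTATIONS, labels) if lab == cluster]
    return sum(r[gene_idx] for r in rows) / len(rows)


# 1. Unsupervised clustering on the (stand-in) embedding.
labels = kmeans(MUTATIONS, k=2)

# 2. Re-map clusters to the clinical information: mean survival per cluster.
mean_survival = {}
for c in set(labels):
    vals = [s for s, lab in zip(SURVIVAL_MONTHS, labels) if lab == c]
    mean_survival[c] = sum(vals) / len(vals)
worse = min(mean_survival, key=mean_survival.get)
better = max(mean_survival, key=mean_survival.get)

# 3. Flag genes mutated much more often in the worse-survival cluster.
candidate_genes = [
    g for i, g in enumerate(GENES)
    if mutation_rate(i, worse, labels) - mutation_rate(i, better, labels) >= 0.5
]

print(labels, mean_survival, candidate_genes)
```

A real analysis would compare clusters with a proper survival method (for example Kaplan-Meier curves and a log-rank test) rather than a difference in means, and would work on far larger cohorts and gene sets than this six-patient toy.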
Bioinformatics