Protein function prediction through multi-view multi-label latent tensor reconstruction

Robert Ebo Armah-Sekum,Sandor Szedmak,Juho Rousu
DOI: https://doi.org/10.1186/s12859-024-05789-4
IF: 3.307
2024-05-05
BMC Bioinformatics
Abstract: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of a vast majority of them remain unknown. As a result, computational methods offering accurate, fast and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
What problem does this paper attempt to address?
The paper addresses the problem of protein function prediction. Specifically, it targets the following key challenges:

1. **High-throughput sequencing has accelerated protein discovery**: A large number of proteins have been discovered and sequenced, but the functions of most of them have not yet been experimentally confirmed.
2. **Time and resource constraints of experimental functional characterization**: Because experimental methods are time-consuming and costly, only a very small proportion (<1%) of discovered proteins have been functionally characterized.
3. **Need for efficient and accurate large-scale computational methods**: To compensate for the limitations of experimental characterization, computational methods are needed that can annotate the functions of newly discovered or unannotated proteins quickly, accurately, and at large scale.

To address these issues, the authors propose GO-LTR (Gene Ontology - Latent Tensor Reconstruction), a multi-view multi-label prediction model that uses high-order tensor approximation to model the high-order relationships between protein features and predicts the functional categories of proteins. The main contributions of GO-LTR include:

- Proposing an automatic function annotation method based on latent tensor reconstruction;
- Demonstrating the performance improvement of GO-LTR in function prediction tasks;
- Validating that integrating information from multiple protein modalities (such as sequence embeddings, interaction fingerprints, etc.) can further improve prediction performance;
- Conducting a detailed performance analysis, including studies on sequence similarity to the training set, the depth and frequency of Gene Ontology categories, and prediction thresholds.
Through these contributions, the research aims to advance the field of protein function annotation, especially for proteins with low sequence similarity to the training set and for rare or highly specific Gene Ontology terms.
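
To make the idea of a multi-view, multi-label model built on a low-rank tensor approximation more concrete, the following is a minimal illustrative sketch and not the authors' GO-LTR implementation. It assumes two hypothetical views (a sequence embedding and an interaction fingerprint), a rank-`R` factorization in which each view is projected into a shared latent space and the projections are multiplied elementwise, and a sigmoid output per Gene Ontology term. All function names, dimensions, and the random parameters are assumptions made for illustration; in practice the factors would be learned from annotated proteins.

```python
import numpy as np

def predict_scores(views, factors, label_weights):
    """Score every label for every protein with a rank-R multilinear model.

    views:         list of (n_proteins, d_v) feature matrices, one per view
    factors:       list of (d_v, R) projection matrices, one per view (hypothetical)
    label_weights: (R, n_labels) matrix mapping latent components to labels
    """
    n_proteins = views[0].shape[0]
    rank = label_weights.shape[0]
    latent = np.ones((n_proteins, rank))
    for X, P in zip(views, factors):
        # Elementwise product across views captures cross-view interactions
        # without materializing the full high-order weight tensor.
        latent *= X @ P
    logits = latent @ label_weights
    return 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label

def assign_terms(scores, threshold=0.5):
    """Turn per-label scores into binary annotations at a chosen threshold."""
    return scores >= threshold

# Toy data: 4 proteins, two views, rank 3, 5 candidate GO terms (all assumed).
rng = np.random.default_rng(0)
X_seq = rng.normal(size=(4, 8))       # stand-in for sequence embeddings
X_ppi = rng.normal(size=(4, 6))       # stand-in for interaction fingerprints
P_seq = 0.1 * rng.normal(size=(8, 3))
P_ppi = 0.1 * rng.normal(size=(6, 3))
Q = rng.normal(size=(3, 5))

scores = predict_scores([X_seq, X_ppi], [P_seq, P_ppi], Q)
annotations = assign_terms(scores, threshold=0.5)
```

The factorized form is equivalent to contracting the two views with a full third-order weight tensor of shape (8, 6, 5), but it stores only the three small factor matrices. Varying `threshold` trades precision against recall, which is the kind of trade-off a prediction-threshold analysis examines.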