Efficient Motif Discovery in Protein Sequences Using a Branch and Bound Algorithm
Rahele Mohammadi, Peyman Neamatollahi, Morteza Moradi, Mahmoud Naghibzadeh, Abdorreza Savadi
DOI: https://doi.org/10.1109/jbhi.2024.3355964
IF: 7.7
2024-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: Identifying motifs within sets of protein sequences is a pivotal challenge in proteomics, offering insights into protein evolution, function prediction, and structural attributes. Motifs can reveal crucial protein features such as transcription factor binding sites and protein-protein interaction regions. However, prevailing techniques for identifying motifs in large protein collections are often time-consuming, and ensuring the accuracy of the results remains a persistent challenge in motif discovery. This paper introduces a branch and bound algorithm for exact motif identification across diverse lengths. The algorithm constructs a comprehensive tree structure encompassing potential motif evolution pathways, then prunes the tree based on motif length and targeted similarity thresholds. It efficiently identifies all potential motif subsequences with maximal similarity within large protein sequence datasets. Experimental findings show that the algorithm outperforms prevalent practical techniques in runtime, motif count, and accuracy.
computer science, interdisciplinary applications,mathematical & computational biology,medical informatics, information systems
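
The tree-and-prune idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose tree construction and similarity measure the abstract does not specify. The sketch assumes that similarity is measured by Hamming distance (substitutions only), that a motif must occur in every sequence, and that the tree is a prefix tree over candidate motifs. The names `find_motifs`, `max_mismatches`, and `AMINO_ACIDS` are hypothetical.

```python
from typing import Dict, List

# Assumption: the 20 standard amino acids form the candidate alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_motifs(sequences: List[str], length: int, max_mismatches: int,
                alphabet: str = AMINO_ACIDS) -> List[str]:
    """Return every candidate of `length` that matches some window of each
    sequence with at most `max_mismatches` substitutions."""
    if not sequences or any(len(s) < length for s in sequences):
        return []

    # For each sequence: window start -> mismatches accumulated by the prefix.
    initial = [{start: 0 for start in range(len(s) - length + 1)}
               for s in sequences]
    results: List[str] = []

    def extend(prefix: str, alive: List[Dict[int, int]]) -> None:
        depth = len(prefix)
        if depth == length:
            results.append(prefix)  # leaf: a full-length motif survived
            return
        for letter in alphabet:  # branch: one child per residue
            next_alive = []
            for seq, windows in zip(sequences, alive):
                survivors = {}
                for start, mismatches in windows.items():
                    cost = mismatches + (seq[start + depth] != letter)
                    if cost <= max_mismatches:
                        survivors[start] = cost
                if not survivors:
                    # Bound: prefix mismatches never decrease, so no
                    # extension of this child can match this sequence.
                    break
                next_alive.append(survivors)
            else:
                extend(prefix + letter, next_alive)

    extend("", initial)
    return results


if __name__ == "__main__":
    toy = ["MKTAYIAK", "GGTAYIQQ", "TAYIWWWW"]
    print(find_motifs(toy, length=4, max_mismatches=0))  # ['TAYI']
```

Pruning is safe here because the mismatch count of a prefix is a lower bound on the mismatch count of any full-length extension, so a discarded subtree cannot contain a valid motif. Under these assumptions the search stays exact while visiting only a fraction of the 20^length candidates.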