Characterising Protein Search Drift using exhaustive protein search and Alphafold2

Daniel WA Buchan
DOI: https://doi.org/10.1101/2024.11.14.623594
2024-11-15
Abstract: In this paper we present the first exhaustive analysis of iterative protein search drift and show how such results may impact downstream modelling. Assembling and extracting evolutionary information from families of related proteins is a core challenge in the study of molecular evolution. Iterative protein search is a common first step in a wide variety of bioinformatics tools and pipelines, and the output of such searches often forms the input for modelling tools such as Alphafold2. Here we characterise profile drift: the tendency for some searches to become contaminated with sequences outside of the intended evolutionary family. We observe that drift occurs in nearly 15% of searches and has measurable impacts on downstream predictive tasks such as structure prediction.
Bioinformatics