AlphaFold predictions on whole genomes at a glance

Frederic Cazals, Edoardo Sarti
DOI: https://doi.org/10.1101/2024.11.16.623929
2024-11-27
Abstract: For model organisms, AlphaFold predictions show that 30% to 40% of amino acids have a low or very low pLDDT confidence score. This observation, combined with the method's high complexity, calls for an investigation of potential hallucinations and difficult cases. We do so via three contributions. First, we leverage the 3D atomic packing properties of predictions to represent a structure as a distribution. This distribution is then mapped into the so-called 2D arity map, which simultaneously performs dimensionality reduction and clustering, effectively summarizing all structural elements across all predictions. Second, using the ECOD database of domains, we study potential biases in AlphaFold predictions at the sequence and structural levels, identifying a specific region of the arity map populated with low-quality 3D domains. Third, focusing on proteins with intrinsically disordered regions (IDRs), we identify another specific region of the arity map enriched for false positives in IDRs. In summary, the arity map sheds light on the accuracy of AlphaFold predictions, both in terms of 3D domains and IDRs.
Bioinformatics
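The abstract's opening statistic (the share of amino acids with low or very low pLDDT) can be illustrated with a minimal Python sketch. It is not the authors' code. The thresholds are an assumption taken from the AlphaFold Database confidence bands (pLDDT below 50 is "very low", 50 to 70 is "low"); the function name `low_confidence_fraction` and the sample scores are hypothetical and serve only as illustration.

```python
from typing import Sequence

# Assumed thresholds, following the AlphaFold Database confidence bands:
# pLDDT < 50 is "very low", 50 <= pLDDT < 70 is "low".
VERY_LOW_MAX = 50.0
LOW_MAX = 70.0


def low_confidence_fraction(plddt: Sequence[float], cutoff: float = LOW_MAX) -> float:
    """Return the fraction of residues whose pLDDT is strictly below `cutoff`."""
    if not plddt:
        raise ValueError("empty pLDDT list")
    n_low = sum(1 for score in plddt if score < cutoff)
    return n_low / len(plddt)


# Hypothetical per-residue scores for a 10-residue toy protein (made-up data).
scores = [92.1, 88.4, 71.0, 65.3, 48.9, 33.2, 95.0, 69.9, 50.0, 12.5]

low_or_very_low = low_confidence_fraction(scores)            # cutoff 70
very_low_only = low_confidence_fraction(scores, VERY_LOW_MAX)  # cutoff 50
print(f"low or very low: {low_or_very_low:.0%}, very low: {very_low_only:.0%}")
```

Applied to every residue of every predicted structure in a proteome, the same ratio would give a genome-wide figure comparable to the 30% to 40% range quoted in the abstract, provided the same cutoff is used.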