Predicting the DNA binding specificity of mutated transcription factors using family-level biophysically interpretable machine learning

Shaoxun Liu, Pilar Gomez-Alcala, Christ Leemans, William J. Glassford, Richard S. Mann, Harmen J. Bussemaker
DOI: https://doi.org/10.1101/2024.01.24.577115
2024-01-29
Abstract: Sequence-specific interactions of transcription factors (TFs) with genomic DNA underlie many cellular processes. High-throughput binding assays coupled with computational analysis have made it possible to accurately define such sequence recognition in a biophysically interpretable yet mechanism-agnostic way for individual TFs. The fact that such sequence-to-affinity models are now available for hundreds of TFs provides new avenues for predicting how the DNA binding specificity of a TF changes when its protein sequence is mutated. To this end, we developed an analytical framework based on a tetrahedron embedding that can be applied at the level of a given structural TF family. Using the basic helix-loop-helix (bHLH) family as a test case, we demonstrate that we can systematically map dependencies between the protein sequence of a TF and base preference within the DNA binding site. We also develop a regression approach to predict the quantitative energetic impact of mutations in the DNA binding domain of a TF on its DNA binding specificity, and perform SELEX-seq assays on mutated TFs to experimentally validate our results. Our results point to the feasibility of predicting the functional impact of disease mutations and allelic variation in the cell-wide TF repertoire by leveraging high-quality functional information across sets of homologous wild-type proteins.
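The abstract does not define the tetrahedron embedding, so the following is only a minimal sketch of one common construction, assuming it works like this: each of the four bases is assigned a vertex of a regular tetrahedron centred at the origin, and a binding-site position's four base preferences (for example free-energy differences or log relative affinities) are combined as a weighted sum of those vertex vectors. The vertex coordinates, the base-to-vertex assignment, and the function names `embed` and `unembed` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, Sequence, Tuple

BASES = "ACGT"

# Hypothetical choice: unit vectors to the vertices of a regular tetrahedron
# centred at the origin. Any rotation or relabelling would work equally well.
_S = 1.0 / math.sqrt(3.0)
TETRA: Dict[str, Tuple[float, float, float]] = {
    "A": (_S, _S, _S),
    "C": (_S, -_S, -_S),
    "G": (-_S, _S, -_S),
    "T": (-_S, -_S, _S),
}


def embed(weights: Dict[str, float]) -> Tuple[float, float, float]:
    """Map four per-base weights at one position to a single 3-D point.

    The four vertex vectors sum to zero, so adding the same constant to
    every weight leaves the point unchanged: only relative preferences
    between bases are encoded.
    """
    x = y = z = 0.0
    for base in BASES:
        vx, vy, vz = TETRA[base]
        w = weights[base]
        x += w * vx
        y += w * vy
        z += w * vz
    return (x, y, z)


def unembed(point: Sequence[float]) -> Dict[str, float]:
    """Recover mean-centred per-base weights from a 3-D point.

    For these unit vectors e_b . e_b' = 1 if b == b' and -1/3 otherwise,
    which gives w_b - mean(w) = (3/4) * (e_b . v).
    """
    return {
        base: 0.75 * sum(p * e for p, e in zip(point, TETRA[base]))
        for base in BASES
    }


if __name__ == "__main__":
    # Made-up example weights for a single position, for illustration only.
    w = {"A": 0.0, "C": -1.2, "G": -0.4, "T": -2.0}
    print(embed(w))
    print(unembed(embed(w)))
```

The zero-sum property of the vertices is the design point here: free-energy parameters are only defined up to an additive constant per position, and the embedding discards exactly that offset, leaving a 3-D vector whose direction indicates the preferred base and whose length reflects how strongly that position discriminates between bases.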
Genomics