Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors
Arttu Jolma, Kaitlin U Laverty, Ali Fathi, Ally WH Yang, Isaac Yellan, Ilya E Vorontsov, Sachi Inukai, Judith Franziska Kribelbauer, Antoni Jakub Gralak, Rozita Razavi, Mihai Albu, Alexander Brechalov, Zain M Patel, Vladimir Nozdrin, Georgy Meshcheryakov, Ivan Kozin, Sergey Abramov, Alexandr Boytsov, The Codebook Consortium, Oriol Fornes, Vsevolod J Makeev, Jan Grau, Ivo Grosse, Philipp Bucher, Bart Deplancke, Ivan V Kulakovskiy, Timothy R Hughes
DOI: https://doi.org/10.1101/2024.11.11.622097
2024-11-12
Abstract: We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), most of which are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and in trans, and identify tens of thousands of conserved, base-level binding sites in the human genome. The use of multiple assays provides an unprecedented opportunity to benchmark and analyze TF sequence specificity, function, and evolution, as further explored in accompanying manuscripts. With these results, 1,421 human TFs are now associated with a DNA-binding motif. Extrapolation from the Codebook benchmarking, however, suggests that many of the currently known binding motifs for well-studied TFs may inaccurately describe the TFs' true sequence preferences.
Genomics
What problem does this paper attempt to address?