Prediction and functional interpretation of inter-chromosomal genome architecture from DNA sequence with TwinC

Anupama Jha, Borislav Hristov, Xiao Wang, Sheng Wang, William Greenleaf, Anshul Kundaje, Erez Lieberman Aiden, Alessandro Bertero, William Stafford Noble
DOI: https://doi.org/10.1101/2024.09.16.613355
2024-09-23
Abstract: Three-dimensional nuclear DNA architecture comprises well-studied intra-chromosomal (cis) folding and less characterized inter-chromosomal (trans) interfaces. Current predictive models of 3D genome folding can effectively infer pairwise cis-chromatin interactions from the primary DNA sequence but generally ignore trans contacts. There is an unmet need for robust models of trans-genome organization that provide insights into their underlying principles and functional relevance. We present TwinC, an interpretable convolutional neural network model that reliably predicts trans contacts measurable through genome-wide chromatin conformation capture (Hi-C). TwinC uses a paired sequence design from replicate Hi-C experiments to learn single base pair relevance in trans interactions across two stretches of DNA. The method achieves high predictive accuracy (AUROC=0.80) on a cross-chromosomal test set from Hi-C experiments in heart tissue. Mechanistically, the neural network learns the importance of compartments, chromatin accessibility, clustered transcription factor binding and G-quadruplexes in forming trans contacts. In summary, TwinC models and interprets trans genome architecture, shedding light on this poorly understood aspect of gene regulation.
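For readers unfamiliar with the AUROC figure quoted in the abstract, the following is a minimal sketch of how the metric can be computed for a binary contact/no-contact classifier. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical, and the rank-sum (Mann-Whitney U) identity used here is simply one standard way to obtain the area under the ROC curve.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 (hypothetically, 1 = trans contact, 0 = no contact)
    scores: iterable of predicted scores, higher meaning more likely a contact
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # Sum of ranks of positives, minus its minimum possible value, counts the
    # (positive, negative) pairs ranked correctly (ties count as one half).
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example with made-up labels and scores (not data from the paper).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
print(round(auroc(toy_labels, toy_scores), 3))  # 0.889
```

Under this definition, an AUROC of 0.80 means that a randomly chosen positive pair of loci receives a higher score than a randomly chosen negative pair about 80% of the time, while 0.5 corresponds to chance.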
Bioinformatics