Pairtools: From sequencing data to chromosome contacts
Open2C, Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra A. Galitsyna, Anton Goloborodko, Maxim Imakaev, Sergey V. Venev
DOI: https://doi.org/10.1371/journal.pcbi.1012164
2024-05-30
PLoS Computational Biology
Abstract: The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present pairtools, a flexible suite of tools for contact extraction from sequencing data. Pairtools provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. The core operations provided by pairtools are parsing of .sam alignments into Hi-C pairs, sorting, and removal of PCR duplicates. In addition, pairtools provides auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking pairtools against popular 3C+ data pipelines shows its advantages for high-performance and flexible 3C+ analysis. Finally, pairtools provides protocol-specific tools for restriction-based protocols, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes pairtools a versatile foundation for a broad range of 3C+ pipelines.

Author summary: Our study introduces pairtools, a computational suite for extracting pairwise contacts from Hi-C and the rapidly expanding constellation of chromosome conformation protocols (3C+). These experiments use DNA sequencing to measure the 3D structure of chromosomes inside cells. However, specialized software is needed to extract chromosome contacts from the raw sequencing data. Pairtools provides fast, flexible, and modular command-line tools and a Python framework to bridge this gap. We show that pairtools can process data from many Hi-C protocol variants beyond standard Hi-C and is easily integrated into pipelines for high-throughput 3D genome data processing.
By converting sequence data into tables of chromosome contacts, pairtools facilitates statistical analysis and visualization. Pairtools represents a versatile new foundation for studying principles of 3D genome organization and their impacts on gene regulation and cellular phenotypes.
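Two of the core operations named in the abstract, sorting contact pairs and removing PCR duplicates, can be illustrated with a toy sketch. The `Pair` record, the `sort_pairs` and `dedup_pairs` functions, and the `max_mismatch` tolerance are hypothetical names chosen for illustration. This is not the pairtools API or its deduplication algorithm. It only assumes that two pairs count as duplicates when their chromosomes and strands match and each of their two positions differs by at most a few base pairs.

```python
from typing import Iterable, List, NamedTuple


class Pair(NamedTuple):
    # Hypothetical minimal contact record, loosely modeled on .pairs-style columns
    read_id: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    strand1: str
    strand2: str


def sort_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    # Order by chromosomes, then positions, so that likely duplicates end up adjacent.
    # Python's sort is stable, so ties keep their input order.
    return sorted(pairs, key=lambda p: (p.chrom1, p.chrom2, p.pos1, p.pos2))


def dedup_pairs(pairs: Iterable[Pair], max_mismatch: int = 3) -> List[Pair]:
    """Keep the first pair of each duplicate group (simplified O(n^2) illustration).

    A pair is treated as a duplicate of an already kept pair when chromosomes
    and strands are identical and both positions differ by <= max_mismatch bp.
    """
    kept: List[Pair] = []
    for p in sort_pairs(pairs):
        is_dup = any(
            q.chrom1 == p.chrom1 and q.chrom2 == p.chrom2
            and q.strand1 == p.strand1 and q.strand2 == p.strand2
            and abs(q.pos1 - p.pos1) <= max_mismatch
            and abs(q.pos2 - p.pos2) <= max_mismatch
            for q in kept
        )
        if not is_dup:
            kept.append(p)
    return kept


pairs = [
    Pair("r1", "chr1", 100, "chr1", 5000, "+", "-"),
    Pair("r2", "chr1", 102, "chr1", 5001, "+", "-"),  # within 3 bp of r1: duplicate
    Pair("r3", "chr1", 100, "chr1", 5000, "-", "-"),  # different strand1: kept
    Pair("r4", "chr2", 10, "chr3", 20, "+", "+"),
]
print([p.read_id for p in dedup_pairs(pairs)])  # ['r1', 'r3', 'r4']
```

Sorting first mirrors the order of the core operations listed in the abstract: once pairs are sorted, candidate duplicates sit close together. A production tool would therefore compare only nearby records rather than scanning every kept pair, as this sketch does for simplicity.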
biochemical research methods,mathematical & computational biology