PyCoM: a python library for large-scale analysis of residue-residue coevolution data
Philipp Bibik, Sabriyeh Alibai, Alessandro Pandini, Sarath Chandra Dantu
DOI: https://doi.org/10.1093/bioinformatics/btae166
IF: 5.8
2024-03-26
Bioinformatics
Abstract: Motivation: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-residue protein contacts, protein structures, and the effects of mutations on protein stability and function. While there are many tools and web servers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection that leverage the structural and biological annotation already available in UniProt. Results: We present a Python library, PyCoM, which enables users to query and analyse coevolution matrices and sequence alignments of 457,622 proteins, selected from the UniProtKB/Swiss-Prot database (length ≤ 500 residues), from a pre-compiled coevolution matrix database (PyCoMdb). PyCoM facilitates the development of statistical analyses of residue coevolution patterns using filters on structural and biological annotation from UniProtKB/Swiss-Prot, with simple access to PyCoMdb for both novice and advanced users, supporting Jupyter Notebooks, Python scripts, and web API access. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structures, stability, function, and design. Availability and implementation: PyCoM code is freely available from https://github.com/scdantu/pycom, and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology