Unraveling Networks of Gene Co-occurrence and Avoidance in Bacterial Pangenomes

Athina Gavriilidou, Emilian Paulitz, Christian Resl, Nadine Ziemert, Anne Kupczok, Franz Baumdicker
DOI: https://doi.org/10.1101/2024.04.29.591652
2024-05-02
Abstract: The pangenome is the set of all genes present in a prokaryotic species. Most pangenomes contain many accessory genes that are present in only some of the species members. Genes need to function together, and it has been suggested that selection for certain gene combinations affects the structure of prokaryotic pangenomes. Nevertheless, genes might also co-occur simply due to being linked on the genome, and efficient tools are needed to distinguish linkage from co-selection. Here we present Goldfinder, an approach to infer co-occurrence and co-avoidance between gene pairs by taking the phylogenetic relationships of the species into account. The approach is implemented in an efficient Python script available at . We also provide scripts for clustering co-occurring genes and for visualizing the resulting co-occurrence and co-avoidance networks in Cytoscape. In comparison to the co-occurrence inference tool Coinfinder, Goldfinder finds fewer co-occurring pairs in a real species pangenome, suggesting that fewer spurious associations due to phylogenetic dependencies are detected. We conclude that Goldfinder is a fast and accurate tool to infer gene co-occurrence and co-avoidance, which will enable large-scale analyses to infer co-selected genes across bacterial pangenomes.
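To make the notion of pairwise co-occurrence concrete, the following is a minimal sketch of a naive baseline test on gene presence/absence vectors. It is not Goldfinder's method: it treats genomes as independent samples and therefore ignores the phylogenetic dependencies that the abstract says must be accounted for. The function name `cooccurrence_pvalue` and the toy vectors are hypothetical, chosen only for illustration.

```python
from math import comb


def cooccurrence_pvalue(gene_a, gene_b):
    """One-sided hypergeometric p-value for an excess of co-occurrence.

    gene_a, gene_b: equal-length sequences of 0/1, one entry per genome
    (1 = gene present). Assumption: genomes are treated as independent,
    which is exactly what a phylogeny-aware method does NOT assume.
    """
    n = len(gene_a)
    if n != len(gene_b):
        raise ValueError("presence/absence vectors must have equal length")
    a = sum(gene_a)  # genomes carrying gene A
    b = sum(gene_b)  # genomes carrying gene B
    both = sum(1 for x, y in zip(gene_a, gene_b) if x and y)
    # P(X >= both), where X counts genomes with both genes if B's
    # b occurrences were placed at random among the n genomes.
    tail = sum(comb(a, k) * comb(n - a, b - k)
               for k in range(both, min(a, b) + 1))
    return tail / comb(n, b)


# Toy example: two genes with identical presence patterns in 8 genomes.
p_same = cooccurrence_pvalue([1, 1, 1, 1, 0, 0, 0, 0],
                             [1, 1, 1, 1, 0, 0, 0, 0])
# Toy example: two genes that never appear together.
p_disjoint = cooccurrence_pvalue([1, 1, 1, 1, 0, 0, 0, 0],
                                 [0, 0, 0, 0, 1, 1, 1, 1])
```

A small p-value here only says the two genes overlap more than random placement would predict. In closely related genomes such overlap can arise from shared ancestry alone, which is the kind of spurious association the abstract attributes to phylogenetic dependencies.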
Bioinformatics