Abstract: The study of protein interactions remains a promising route to a better understanding of diseases. At first glance, the term protein interaction may suggest, for instance, the interaction between a host protein and a pathogen protein. Such interactions are studied mainly with machine learning algorithms, since it is difficult to derive explicit rules for them. However, interactions among proteins encoded in the same genome are also critical, for instance, to understand how a pathogen survives and how it starts or maintains an infection. Interactions within a genome can be analyzed deterministically, at the cost of substantial computing resources. The first version of our software, GenPPI, explores interaction networks in genomes mainly through the known rules of conserved neighborhood and conserved phylogenetic profiles. However, despite its speed, it underrepresented the core pangenome because the simplistic algorithm used to build it missed protein pairs sharing less than 90% amino acid identity. The present work describes enhancements to how GenPPI determines homology between protein pairs, which is one of the principal bottlenecks in creating ab initio interaction networks from genomes and the primary step in inferring conserved neighborhood and conserved phylogenetic profiles across all genomes under analysis. The improvement relies on the Random Forest algorithm applied to biophysical features: ten amino acid propensity indexes are used to calculate sixty features for each protein in a genome. We built a training data set of homologous and non-homologous proteins from nine complete proteomes of critical bacteria. Training on this substantial set of representative genomes allowed the classifier to recognize as similar those proteins sharing more than 65% amino acid identity, an average result obtained over dozens of validation runs.
This strategy resulted in more comprehensive and accurate protein interaction networks and can be applied to genomes of different organisms. We tested the improved GenPPI on the bacterium Buchnera aphidicola and obtained a 62% overlap with the interactions documented in the STRING database, surpassing the previous GenPPI version, whose overlap with STRING stayed below 50%. More notably, alternative GenPPI parameters yielded a full overlap, albeit at the cost of predicting additional interactions that are absent from STRING. These results underscore the software's potential as a flexible tool for research in biomedicine and other scientific fields, balancing precision, completeness, and a lower density of interaction networks. GenPPI is available at \url{https://genppi.facom.ufu.br/}
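
As a rough illustration of two ideas mentioned in the abstract, the Python sketch below derives a handful of summary features from one amino acid propensity index and measures how much of a reference network a predicted network recovers. It is not the GenPPI implementation. The Kyte-Doolittle hydropathy scale stands in for one of the ten indexes, the six statistics are a hypothetical choice that merely matches the ratio of sixty features to ten indexes, and the function names `index_features` and `overlap_percent` are invented for this example. The overlap is assumed to be the share of reference (for example, STRING) interactions that also appear in the predicted network.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale, used here as a stand-in for one
# propensity index (assumption: the abstract does not name the ten indexes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_features(sequence, index=KYTE_DOOLITTLE):
    """Return six hypothetical summary statistics of one index over a protein."""
    values = [index[aa] for aa in sequence.upper() if aa in index]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    half = len(values) // 2
    return [
        mean(values),                    # overall average propensity
        pstdev(values),                  # spread along the sequence
        min(values),
        max(values),
        mean(values[:half] or values),   # N-terminal half (whole sequence if length 1)
        mean(values[half:]),             # C-terminal half
    ]


def overlap_percent(predicted, reference):
    """Percentage of reference interactions also present in the prediction.

    Interactions are treated as undirected protein pairs.
    """
    ref = {frozenset(pair) for pair in reference}
    if not ref:
        return 0.0
    pred = {frozenset(pair) for pair in predicted}
    return 100.0 * len(ref & pred) / len(ref)


if __name__ == "__main__":
    print(index_features("MKVLAAGIVG"))
    print(overlap_percent([("p1", "p2"), ("p3", "p4")],
                          [("p2", "p1"), ("p5", "p6")]))
```

In a pipeline of this kind, feature vectors for two proteins could be combined (for instance, by taking their element-wise difference) and passed to a Random Forest classifier that labels the pair as similar or not. That step is omitted here because the abstract does not specify how the features are paired.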

CAPRIB: a user-friendly tool to study amino acid changes and selection for the exploration of intra-genus evolution

MyBASE: a Database for Genome Polymorphism and Gene Function Studies of Mycobacterium

GEnView: a gene-centric, phylogeny-based comparative genomics pipeline for bacterial genomes and plasmids

BLOOD FROM TRAUMATIZED LIMBS.

Identifying the Genetic Basis of Functional Protein Evolution Using Reconstructed Ancestors

mBARq: a versatile and user-friendly framework for the analysis of DNA barcodes from transposon insertion libraries, knockout mutants, and isogenic strain populations

The GenPPI tool enhanced Protein Interaction Network Generation with Machine Learning-Based Protein Similarity Inference

PhyloAcc-GT: A Bayesian method for inferring patterns of substitution rate shifts on targeted lineages accounting for gene tree discordance

EDGAR 2.0: an enhanced software platform for comparative gene content analyses

Multicentric granulocytic sarcoma of the breast: mammographic, sonographic, and MR findings.

A fast comparative genome browser for diverse bacteria and archaea

antibacTR: dynamic antibacterial-drug-target ranking integrating comparative genomics, structural analysis and experimental annotation

RIBAP: a comprehensive bacterial core genome annotation pipeline for pangenome calculation beyond the species level

A phylogenetic method linking nucleotide substitution rates to rates of continuous trait evolution

DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family

ResFinder – an open online resource for identification of antimicrobial resistance genes in next-generation sequencing data and prediction of phenotypes from genotypes

Detecting gene innovations for phenotypic diversity across multiple genomes

ParallelEvolCCM: Quantifying co-evolutionary patterns among genomic features

Bayesian identification of bacterial strains from sequencing data

Genebe.net: Implementation and validation of an automatic ACMG variant pathogenicity criteria assignment

Augur: a bioinformatics toolkit for phylogenetic analyses of human pathogens