An Integer Linear Programming Solution for the Domain-Gene-Species Reconciliation Problem

Lei Li, Mukul S. Bansal
DOI: https://doi.org/10.1145/3233547.3233603
2018-01-01
Abstract: It is well understood that most eukaryotic genes contain one or more protein domains and that the domain content of a gene can change over time. This change in domain content, through domain duplications, transfers, or losses, has important evolutionary and functional consequences. Recently, a powerful new reconciliation framework, called Domain-Gene-Species (DGS) reconciliation, was introduced to simultaneously model the evolution of a domain family inside one or more gene families and the evolution of those gene families inside a species tree. The underlying computational problem in DGS reconciliation is NP-hard, and a heuristic algorithm is currently used to estimate optimal DGS reconciliations. However, this heuristic has several undesirable limitations. First, it offers no guarantee of optimality or near-optimality. Second, it can result in biologically unrealistic evolutionary scenarios. Third, it computes only a single DGS reconciliation even though there can be multiple optimal DGS reconciliations. In this work, we introduce the first exact algorithm for computing optimal DGS reconciliations that addresses all three limitations. Our algorithm is based on an integer linear programming formulation of the problem, which we solve iteratively by solving a series of linear programming relaxations. Our experimental results on over 3,400 domain trees and over 7,000 gene trees from 12 fly species show that our new algorithm is highly scalable and that it leads to significant improvement in DGS reconciliation inference. An implementation of our exact algorithm is freely available from http://compbio.engr.uconn.edu/software/seadog/.