Reconstruction of Ancestral Gene Order Following Large Scale Genome Duplication and Gene Loss

Jun Huan, J. Prins, Wei Wang, T. Vision
2003-01-01
Abstract: Gene order evolves through gross chromosomal rearrangements, small-scale inversions and transpositions, gene duplication, and gene loss. Much research has been done on the calculation of edit distance and on sorting algorithms under a variety of rearrangement models in which the genome may be represented as conserved segments with permuted order and orientation. However, gene loss within otherwise conserved segments, as typically occurs following large-scale genome duplication, has not been well studied algorithmically. This has been a major impediment to comparative genomics in certain taxa, such as plants and fish. When large-scale genome duplication and gene loss are occurring, how well can we infer both the true gene order within ancestral chromosomal segments and the ancestral ordering of those segments? We propose a heuristic algorithm for the inference of ancestral gene order in a set of genomes for which at least some genomic segments are partially related by common ancestry to two or more different segments. It does not require gene content and order to be perfectly conserved among segments. First, conserved chromosomal regions are identified using existing pairwise genomic alignment algorithms. Second, segments are iteratively clustered under the control of two parameters: (1) the minimal required number of shared genes between two segments or clusters and (2) the maximal allowed number of rearrangement breakpoints along the lineage leading to each descendant segment. Finally, we compute the estimated ancestral gene order for each cluster. We evaluate the performance of this algorithm on simulated data that models a genome evolving by large-scale duplication, duplicate gene loss, transposition, translocation, and inversion. The results suggest that ancestral gene orders may be estimated with sufficient accuracy to substantially improve the detection sensitivity of pairwise genomic alignment algorithms.
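To make the two clustering parameters concrete, here is a minimal Python sketch of how a shared-gene count and a breakpoint count might be computed for a pair of segments. It is an illustration under stated assumptions, not the authors' implementation: the function names (`shared_genes`, `breakpoints`, `can_merge`) are hypothetical, genes are treated as unsigned and unique within a segment, and breakpoints are counted directly between two segments rather than along the lineage from an inferred ancestor, which simplifies the criterion the abstract describes.

```python
def shared_genes(seg_a, seg_b):
    """Return the set of genes present in both segments."""
    return set(seg_a) & set(seg_b)


def breakpoints(descendant, reference):
    """Count adjacencies in `descendant` that are not adjacent in `reference`.

    Both segments are first restricted to their shared genes, so gene loss
    alone does not create breakpoints. Genes are unsigned here (an assumption),
    so a reversal of the whole segment also counts as zero breakpoints.
    """
    common = shared_genes(descendant, reference)
    d = [g for g in descendant if g in common]
    r = [g for g in reference if g in common]
    pos = {g: i for i, g in enumerate(r)}
    return sum(1 for x, y in zip(d, d[1:]) if abs(pos[x] - pos[y]) != 1)


def can_merge(seg_a, seg_b, min_shared, max_breakpoints):
    """Hypothetical merge test combining the two thresholds."""
    return (len(shared_genes(seg_a, seg_b)) >= min_shared
            and breakpoints(seg_a, seg_b) <= max_breakpoints)


# Toy data: one reference order and two descendant segments.
reference = ["g1", "g2", "g3", "g4", "g5", "g6"]
lost_only = ["g1", "g2", "g4", "g6"]   # gene loss, order preserved
rearranged = ["g1", "g4", "g2", "g6"]  # gene loss plus a rearrangement

print(breakpoints(lost_only, reference))
print(breakpoints(rearranged, reference))
print(can_merge(lost_only, rearranged, min_shared=3, max_breakpoints=1))
```

In this toy case the segment that has only lost genes shows no breakpoints against the reference, while the rearranged one shows two, so a strict breakpoint threshold would keep the two descendants from being merged even though they share four genes.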