CS 374 Paper Summary – Wissam Kazan

Wissam Kazan, Xin Chen, Jie Zheng, Zheng Fu, Peng Nan, Yang Zhong, Stefano Lonardi, Tao Jiang
2006-01-01
Abstract: The paper presents a new algorithm that assigns orthologous genes between a pair of genomes using both sequence similarity and evolutionary events. Many ortholog assignment algorithms have already been developed, but most of them are based on DNA/protein sequence similarity or use a homology search algorithm. The paper divides the problem into two phases: the signed reversal distance with duplicates (SRDD) computation phase, and the ortholog assignment phase, which uses an algorithm (SOAR) developed by the authors. The former is solved using two techniques: minimum common partition of the two given genomes and maximum cycle decomposition on a complete graph. SOAR was then tested on real and simulated genome data, and its results were compared with those returned by INPARANOID and by an iterative version of the exemplar algorithm presented in [1].

Summary

The paper describes a system for ortholog assignment called SOAR (System for Ortholog Assignment based on sorting by Reversal). The system uses an efficient heuristic algorithm for SRDD. It takes into account both gene-level local mutations and genome-level global rearrangements, which are measured by sequence similarity and by the minimum number of rearrangement events, respectively.

SOAR first takes two annotated genomes as input and constructs gene families by applying a homology search. This involves performing an all-versus-all comparison with BLASTp and then summing up the HSPs (high-scoring segment pairs generated by BLASTp) for each pair of genes. It then assigns orthologs using the SRDD algorithm described in the paper: first it applies three suboptimal rules, then it applies the minimum common partition, and finally it decomposes the resulting complete graph into cycles using maximum cycle decomposition.
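To make the rearrangement event concrete, here is a minimal sketch of a single signed reversal, the operation whose minimum count SRDD measures. It is not SOAR's implementation: the function name `apply_reversal`, the list-of-signed-integers encoding (gene family identifier, with the sign giving the strand), and the toy genomes are all assumptions made for illustration.

```python
from typing import List


def apply_reversal(genome: List[int], i: int, j: int) -> List[int]:
    """Return a copy of `genome` with the segment genome[i..j] (inclusive)
    reversed in order and every gene in it flipped in sign.

    Hypothetical helper for illustration; genes are signed integers,
    where the absolute value names the gene family and the sign the strand.
    """
    if not (0 <= i <= j < len(genome)):
        raise ValueError("need 0 <= i <= j < len(genome)")
    flipped = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + flipped + genome[j + 1:]


# Toy example without duplicates: one reversal turns `source` into `target`.
source = [1, -3, -2, 4]
target = [1, 2, 3, 4]
result = apply_reversal(source, 1, 2)
print(result == target)

# With duplicated genes (family 1 appears twice), it is no longer obvious
# which copy in one genome corresponds to which copy in the other.
# Choosing that correspondence is the ortholog assignment itself.
with_duplicates = [1, 2, -1, 3]
print(apply_reversal(with_duplicates, 0, 2))
```

In the duplicate-free case the correspondence between genes is fixed, so only the reversals have to be found. Once a family has several copies, the pairing of copies and the reversal scenario have to be chosen together, which is the part the heuristic steps described above address.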
The performance of the SOAR system was tested on simulated data and on real genome sequence data, and the results were compared with those of two other methods: INPARANOID and an iterative version of the exemplar algorithm [1]. SOAR outperformed the exemplar algorithm and achieved slightly better results than INPARANOID.

Discussion

In most cases, SOAR’s sensitivity is higher than INPARANOID’s, but the specificity of the latter is always better. Because the difference in sensitivity is small, the two approaches can reasonably be said to perform almost the same. In my opinion, the importance of the paper doesn’t lie in the results per se, but rather in the approach and in the algorithms developed: MCP (Minimum Common Partition) and MCD (Maximum Cycle Decomposition). These two algorithms proved to be very efficient at finding the signed reversal distance with duplicates (SRDD) between two genomes.

[1] D. Sankoff, “Genome Rearrangement with Gene Families,” Bioinformatics, vol. 15, no. 11, pp. 909-917, 1999.