Linear Algebraic Tag SNP Selection and Haplotype Reconstruction

J. He, K. Westbrooks, A. Zelikovsky
2005-01-01
Abstract: Constructing a complete human haplotype map is helpful for associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large, and it is costly to sequence many individuals. It is therefore desirable to reduce the number of SNPs that must be sequenced to a small set of informative representatives called tag SNPs. Tag SNP selection may also reduce the noise introduced by SNPs that are irrelevant to disease association. In this paper, we propose a new linear algebraic method for tag SNP selection and haplotype reconstruction. Our new haplotype reconstruction method is purely combinatorial and can be applied to any set of tag SNPs. We compare the quality of our new linear algebraic methods with several previously known methods, using the data sets, evaluation methodology, and, in some cases, the tag SNPs suggested by the respective authors. In these comparisons, the proposed linear algebraic algorithm considerably improves the quality of haplotype reconstruction. For example, for the LPL [5] and Chromosome 21 [15] data, when 10% of SNPs are used as tags, the new linear algebraic algorithm reaches 80% accuracy, while the methods of Halldorsson et al. [9] and Zhang et al. [19] reach only 20% accuracy.
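To make the general idea of linear algebraic tag SNP selection concrete, here is a minimal sketch. It is not the authors' algorithm and it does not implement the combinatorial reconstruction method described above. The sketch makes three assumptions. Haplotypes are 0/1 rows of a matrix with one column per SNP. A SNP becomes a tag when its column is not a linear combination (over the rationals) of the tags chosen so far. Each non-tag SNP is then predicted as that same linear combination of tag values, rounded and clamped to 0/1. The names `solve_in_span`, `select_tags`, and `reconstruct` are hypothetical.

```python
from fractions import Fraction


def solve_in_span(cols, target):
    """Return x with sum(x[i] * cols[i]) == target, or None if target is not
    in the span of cols. Exact Gauss-Jordan elimination over the rationals."""
    m, k = len(target), len(cols)
    # Augmented matrix: one row per haplotype, k tag columns plus the target.
    rows = [[Fraction(cols[i][r]) for i in range(k)] + [Fraction(target[r])]
            for r in range(m)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, m) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: free variable, set to 0 later
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(m):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # Rows below the last pivot are zero on the left; a nonzero right side
    # means the system is inconsistent, i.e. target is not in the span.
    if any(rows[i][k] != 0 for i in range(r, m)):
        return None
    x = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        x[c] = rows[i][k]
    return x


def column(haplotypes, j):
    return [row[j] for row in haplotypes]


def select_tags(haplotypes):
    """Scan SNPs left to right, keeping a SNP as a tag when its column is
    independent of the tags so far. Returns the tag indices and, for every
    SNP, its coefficients over the tag columns."""
    n = len(haplotypes[0])
    tags, tag_cols = [], []
    for j in range(n):
        col = column(haplotypes, j)
        if solve_in_span(tag_cols, col) is None:
            tags.append(j)
            tag_cols.append(col)
    coeffs = {j: solve_in_span(tag_cols, column(haplotypes, j))
              for j in range(n)}
    return tags, coeffs


def reconstruct(tag_values, tags, coeffs, n):
    """Predict all n SNPs from tag values (ordered as in tags): each SNP is its
    linear combination of tag values, rounded and clamped to {0, 1}."""
    full = []
    for j in range(n):
        est = sum(c * v for c, v in zip(coeffs[j], tag_values))
        full.append(min(1, max(0, round(est))))
    return full


if __name__ == "__main__":
    H = [[0, 0, 0, 0],
         [1, 0, 1, 1],
         [0, 1, 1, 0]]
    tags, coeffs = select_tags(H)
    print(tags)                               # [0, 1]
    print(reconstruct([1, 0], tags, coeffs, 4))  # [1, 0, 1, 1]
```

In this toy example, SNP 2 is the sum of SNPs 0 and 1, and SNP 3 equals SNP 0, so two tags suffice. Because the sketch keeps a full column basis, the number of tags always equals the rank of the haplotype matrix. Reducing the tag set further (for example to the 10% of SNPs mentioned above) would need a different selection rule.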