Optimization Methods for Genotype Data Analysis in Epidemiological Studies

Dumitru Brinza, Jingwu He, Alexander Zelikovsky
DOI: https://doi.org/10.1002/9780470253441.ch18
2007-01-01
Abstract: Recent improvements in the accessibility of high-throughput DNA sequencing have brought a great deal of attention to disease association and susceptibility studies. Successful genome-wide searches for disease-associated gene variations have recently been reported [18,26]. However, complex diseases can be caused by combinations of several unlinked gene variations. This chapter addresses computational challenges of genotype data analysis in epidemiological studies, including selecting informative SNPs, searching for disease-associated SNPs, and predicting genotype susceptibility. Disease association studies analyze genetic variation across individuals exposed to a disease (diseased) and healthy (non-diseased) individuals. Differences between individual DNA sequences occur at single-base sites at which more than one allele is observed across the population. Such variations are called single nucleotide polymorphisms (SNPs). The number of SNPs typed simultaneously for association and linkage studies is reaching 10^6 for SNP Mapping Arrays [1]. High-density SNP maps, as well as massive DNA data sets with large numbers of individuals and SNPs, have become publicly available [12]. Diploid organisms, such as humans, have two nearly identical copies of each chromosome. Most genotyping techniques (e.g., SNP Mapping Arrays [1]) do not provide separate SNP sequences (haplotypes) for each of the two
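To make the definitions of SNPs, haplotypes, and genotypes concrete, here is a minimal Python sketch under stated assumptions: the input sequences are already aligned to equal length, every SNP is biallelic, and alleles are coded 0/1 relative to the first sequence. A genotype is written with 0 or 1 at homozygous sites and 2 at heterozygous sites; this is a common convention in haplotype-inference work but is an assumption here, not something stated in this abstract. The function names `snp_sites`, `encode_haplotypes`, and `to_genotype` are hypothetical and are not taken from the chapter.

```python
from typing import List


def snp_sites(sequences: List[str]) -> List[int]:
    """Return 0-based positions where more than one allele (base) is observed."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in sequences}) > 1]


def encode_haplotypes(sequences: List[str], sites: List[int]) -> List[List[int]]:
    """Encode each sequence at the SNP sites as 0/1.

    Assumes biallelic SNPs: 0 = same base as the first sequence, 1 = the other base.
    """
    ref = sequences[0]
    return [[0 if s[i] == ref[i] else 1 for i in sites] for s in sequences]


def to_genotype(hap_a: List[int], hap_b: List[int]) -> List[int]:
    """Merge the two 0/1 haplotypes of a diploid individual into one genotype.

    Homozygous sites keep their allele (0 or 1); heterozygous sites become 2,
    so the information about which copy carries which allele is lost.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have equal length")
    return [a if a == b else 2 for a, b in zip(hap_a, hap_b)]


if __name__ == "__main__":
    seqs = ["ACGTA", "ACGTT", "AGGTA"]
    sites = snp_sites(seqs)                 # positions 1 and 4 are polymorphic
    haps = encode_haplotypes(seqs, sites)
    print(sites, haps)                      # [1, 4] [[0, 0], [0, 1], [1, 0]]
    # Two different haplotype pairs collapse to the same genotype [2, 2]:
    print(to_genotype([0, 1], [1, 0]), to_genotype([0, 0], [1, 1]))
```

Because distinct haplotype pairs can produce the same genotype, haplotypes cannot in general be read back from genotype data alone, which is the limitation of genotyping techniques that the abstract's final sentence refers to.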