Shrunken Dissimilarity Measure for Genome-wide SNP Data Classification

Haiyong Liao, Yang Liu, M. Ng
Abstract: Recent development of high-resolution single-nucleotide polymorphism (SNP) arrays allows detailed genome-wide assessment of human genome variation. However, SNP data typically contain a large number of SNPs (e.g., 400 thousand SNPs in genome-wide Parkinson disease SNP data) but only a few hundred samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data. In this paper, we propose a shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples using HapMap data and Parkinson disease data demonstrate the effectiveness of the proposed method and illustrate its potential to become a useful analysis tool for SNP data sets. In particular, we find some SNPs on chromosome 2 that are contained in genes relevant to Parkinson disease.