Understanding the characteristics of single nucleotide variants

Ting Chen, Kjong-Van Lehmann
2012-01-01
Abstract: This thesis explores a variety of characteristics for use in ranking single nucleotide polymorphisms. The 1000 Genomes Project and many similar ongoing large-scale sequencing efforts require new methods to predict functional variants in both coding and non-coding regions in order to understand the relationship between genotype and phenotype. The thesis describes the development of SInBaD (Sequence-Information-Based-Decision-model), which relies on nucleotide conservation information to evaluate any annotated human variant in all known exons, introns, splice junctions, and promoter regions. SInBaD builds separate mathematical models for promoters, exons, and introns, using human disease mutations annotated in HGMD as the training data set for functional variants. Ten-fold cross-validation shows high prediction accuracy. Validation on test datasets demonstrates that variants predicted as functional have a significantly higher occurrence in cancer patients. Though the main analysis has been performed on human genomes, the importance of model organisms for scientific advancement should not be underestimated. Therefore, a model for functional variant discovery in Drosophila melanogaster, SinBaD-Fly, has been developed within the framework of this thesis. Population genomics information has been studied in Drosophila melanogaster to determine to what extent individual next-generation full-genome sequences might contribute to the task of prioritizing functional variants. Rigorous validation of the characteristics used in both organisms provides significant insight into the features shaping such variants, which hopefully can be used in the future to improve current models and guide experimental research.