Identification of heterozygous point mutation events in DNA sequencing chromatograms

A. S. Guerrero
Abstract: The recent discovery of activating somatic mutations in cancer that correlate with phenotypes such as drug responsiveness has generated renewed interest in sequencing the genomes of tumor samples and cancer cell lines, with the goal of identifying the set of mutations that produce those phenotypes [1]. The two most popular strategies for discovering these events are array CGH [2] and direct sequencing of tumor samples and cells at specific loci of genes suspected a priori to be involved in tumor proliferation and survival. The latter technique uses PCR amplification of the loci of interest and standard capillary electrophoresis DNA sequencing to generate chromatograms and sequences, which are then compared to a normal reference sequence to reveal mutations. The detection of homozygous events is relatively straightforward, but the identification of heterozygous point events is problematic. Detecting heterozygous events involves detecting "peaks within peaks" in chromatogram waveforms and is plagued by a variety of artifacts in these signals, which can potentially generate false positives. Proposed herein is a detection algorithm based on a classifier that distinguishes candidate "peaks within peaks" that are heterozygous point mutations from those that are false positives, based on statistics about the candidate event and a representation of these artifacts as interval-scale input variables to a machine learning algorithm.
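To make the notion of "peaks within peaks" concrete, the following is a minimal sketch of how candidate events might be located in a four-channel chromatogram. It is not the algorithm proposed in the paper. The function names (`local_maxima`, `find_candidates`), the parameters `window` and `min_ratio`, and the use of a secondary-to-primary height ratio as a candidate statistic are all assumptions made for illustration. The classification step is omitted because the abstract does not specify it.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate event: a secondary peak under a primary peak."""
    position: int          # sample index of the primary peak
    primary_base: str      # channel with the tallest signal at that index
    secondary_base: str    # channel holding the smaller, co-located peak
    height_ratio: float    # secondary height / primary height (assumed feature)


def local_maxima(signal: List[float]) -> List[int]:
    """Return interior indices where the signal rises and then does not rise."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def find_candidates(
    traces: Dict[str, List[float]],
    window: int = 2,         # assumed tolerance, in samples, for peak alignment
    min_ratio: float = 0.25  # assumed minimum secondary/primary height ratio
) -> List[Candidate]:
    """List secondary peaks lying within `window` samples of a primary peak.

    `traces` maps each base ("A", "C", "G", "T") to an equal-length list of
    non-negative intensities. Every returned candidate would still need to be
    accepted or rejected by a classifier, which is not implemented here.
    """
    candidates = []
    for base, signal in traces.items():
        for i in local_maxima(signal):
            # A primary peak must be positive and the tallest channel at i.
            if signal[i] <= 0:
                continue
            if any(traces[b][i] > signal[i] for b in traces if b != base):
                continue
            lo, hi = max(0, i - window), min(len(signal), i + window + 1)
            for other, other_signal in traces.items():
                if other == base:
                    continue
                for j in local_maxima(other_signal):
                    if lo <= j < hi:
                        ratio = other_signal[j] / signal[i]
                        if ratio >= min_ratio:
                            candidates.append(Candidate(i, base, other, ratio))
    return sorted(candidates)
```

In a fuller pipeline, each candidate's statistics, together with numeric descriptions of nearby signal artifacts, would form the input vector to the classifier. The height ratio here stands in for whatever statistics the paper actually uses.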