Codon-Based Sequence Alignment for Mutation Analysis by High-Throughput Sequencing

Sing-Hoi Sze, Craig D. Kaplan
DOI: https://doi.org/10.1109/iccabs.2018.8542085
2018-10-01
Abstract: Advances in high-throughput sequencing have made it possible to perform large-scale mutation analysis by altering codons in a sequence and investigating the effects of the changes. While sequence alignment algorithms can be applied to compare the reads to the original unaltered sequence, neither nucleotide-based nor protein-based alignment is fully suited to analyzing changes at the codon level. We develop a codon-based sequence alignment algorithm by modifying the dynamic programming equations so that an exact match of all three letters in a codon is assigned a positive score, a mismatch of at least one letter in a codon is assigned a negative score, and an indel of one, two, or three letters is assigned a constant negative score. This strategy directly models what can happen within each codon. The algorithm has the same time complexity as nucleotide-based dynamic programming. We apply our algorithm to analyze the effects of mutations within the RNA Polymerase II trigger loop, using high-throughput sequencing libraries generated in our lab, and compare the results to those obtained with nucleotide-based alignment. We show that our algorithm avoids systematic errors that nucleotide-based alignment can make.
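The scoring scheme described in the abstract can be illustrated with a minimal sketch: a global alignment of a read against an in-frame reference, where the dynamic programming table is indexed by reference codons rather than reference letters. The abstract does not give the actual recurrences or score values. The function name `codon_align_score`, the hypothetical parameters `MATCH = 2`, `MISMATCH = -1` and `INDEL = -3`, the choice of global (rather than local or semi-global) alignment, and the way indels are attached to reference codons are therefore all assumptions for illustration, not the authors' equations.

```python
# Sketch of codon-based global alignment scoring.
# ASSUMPTIONS (not from the paper): score values, global alignment mode,
# and the exact transitions used to model 1-, 2- and 3-letter indels.

MATCH = 2        # hypothetical: all three letters of a codon agree
MISMATCH = -1    # hypothetical: at least one of the three letters differs
INDEL = -3       # hypothetical: constant penalty for an indel of 1, 2, or 3 letters

NEG_INF = float("-inf")


def codon_align_score(ref: str, read: str,
                      match: int = MATCH, mismatch: int = MISMATCH,
                      indel: int = INDEL) -> float:
    """Return the best global codon-based alignment score of `read` vs `ref`.

    `ref` is assumed to be in frame with length a multiple of 3, so its
    codons are ref[0:3], ref[3:6], ...  D[i][j] holds the best score for
    aligning the first i reference codons to the first j read letters.
    """
    if len(ref) % 3 != 0:
        raise ValueError("reference length must be a multiple of 3")
    n, m = len(ref) // 3, len(read)
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            # Insertion of 1, 2, or 3 read letters that consume no reference codon.
            for k in (1, 2, 3):
                if j >= k:
                    best = max(best, D[i][j - k] + indel)
            if i >= 1:
                codon = ref[3 * (i - 1):3 * i]
                # Codon aligned to exactly three read letters: match or mismatch.
                if j >= 3:
                    s = match if read[j - 3:j] == codon else mismatch
                    best = max(best, D[i - 1][j - 3] + s)
                # Codon aligned to 2, 1, or 0 read letters,
                # i.e. a deletion of 1, 2, or 3 letters at a constant penalty.
                for k in (2, 1, 0):
                    if j >= k:
                        best = max(best, D[i - 1][j - k] + indel)
            D[i][j] = best
    return D[n][m]


if __name__ == "__main__":
    print(codon_align_score("ATGGCC", "ATGGCC"))  # identical codons
    print(codon_align_score("ATGGCC", "ATGGCA"))  # one codon with a substitution
    print(codon_align_score("ATGGCC", "ATGCC"))   # one letter deleted
```

Because each cell of the (len(ref)/3 + 1) × (len(read) + 1) table is filled from a constant number of predecessors, the running time in this sketch stays O(len(ref) · len(read)). That is consistent with the abstract's statement that the codon-based method has the same time complexity as nucleotide-based dynamic programming. Note that under the constant-penalty assumption, a substitution inside a codon costs a single `MISMATCH` no matter how many of its letters differ.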