Probabilistic Model Based Error Correction in a Set of Various Mutant Sequences Analyzed by Next-Generation Sequencing

Takuyo Aita, Norikazu Ichihashi, Tetsuya Yomo
DOI: https://doi.org/10.1016/j.compbiolchem.2013.09.006
IF: 3.737
2013-01-01
Computational Biology and Chemistry
Abstract: To analyze the evolutionary dynamics of a mutant population in an evolutionary experiment, a vast number of mutants must be sequenced with high-throughput (next-generation) sequencing technologies, which enable rapid, parallel analysis of multikilobase sequences. However, the observed sequences contain many base-calling errors. Therefore, when next-generation sequencing is applied to a heterogeneous population of various mutant sequences, true bases representing point mutations must be distinguished from base-calling errors in the observed sequences, and the sequences must then be error-corrected. To address this issue, we have developed a novel error-correction method based on the Potts model and a maximum a posteriori probability (MAP) estimate of its parameters, which correspond to the "true sequences". Our method uses (1) the "quality scores" assigned to individual bases in the observed sequences and (2) the neighborhood relationships among the observed sequences mapped in sequence space. Computer experiments on error correction of artificially generated sequences supported the effectiveness of our method, showing that 50-90% of errors were removed. Interestingly, this method is analogous to a probabilistic model-based method of image restoration developed in the field of information engineering.
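As a rough illustration of the two ingredients named in the abstract, the sketch below combines a per-base likelihood derived from Phred quality scores with a Potts-style prior that rewards agreement between neighboring sequences. It then finds an approximate MAP estimate of the true sequences by iterated conditional modes (ICM). This is not the authors' implementation. The uniform substitution-error model, the Hamming-distance neighborhood rule (`max_dist`), the coupling constant (`coupling`), and the choice of ICM as the optimizer are all assumptions made for illustration. Reads are also assumed to be pre-aligned, of equal length, and free of indels.

```python
import math

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability, e = 10^(-Q/10).

    Clamped to [1e-10, 0.75] so the log-likelihoods stay finite (assumption).
    """
    return min(max(10 ** (-q / 10), 1e-10), 0.75)


def log_likelihood(observed, quality, true_base):
    """log P(observed base | true base), assuming errors are uniform over the 3 other bases."""
    e = phred_to_error(quality)
    return math.log(1 - e) if observed == true_base else math.log(e / 3)


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(reads, max_dist):
    """Hypothetical neighborhood rule: reads within max_dist of each other are linked."""
    nbrs = {i: [] for i in range(len(reads))}
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if hamming(reads[i], reads[j]) <= max_dist:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def map_correct_icm(reads, quals, coupling=1.0, max_dist=2, sweeps=10):
    """Approximate MAP correction under a Potts prior, optimized by ICM (a swapped-in optimizer).

    log posterior = sum of per-base log-likelihoods
                  + coupling * (number of neighbor pairs agreeing at each site)
    """
    truth = [list(r) for r in reads]  # start from the observed reads
    nbrs = neighbors(reads, max_dist)
    for _ in range(sweeps):
        changed = False
        for i, read in enumerate(reads):
            for s, obs in enumerate(read):
                # Local conditional score of setting true base t at (read i, site s).
                def score(t):
                    agree = sum(truth[j][s] == t for j in nbrs[i])
                    return log_likelihood(obs, quals[i][s], t) + coupling * agree

                best = max(BASES, key=score)
                if best != truth[i][s]:
                    truth[i][s] = best
                    changed = True
        if not changed:  # ICM has reached a local maximum of the posterior
            break
    return ["".join(t) for t in truth]
```

For example, with five copies of `ACGTACGT` and one read `ACGAACGT` whose mismatched base has quality 5 (error probability about 0.32), the low-quality `A` is corrected to `T` because the neighbors' agreement outweighs its likelihood. If that same base has quality 30, it is kept as a putative point mutation. This trade-off between quality scores and neighborhood agreement is the intuition behind the approach; ICM only finds a local optimum, so a real implementation may prefer a different estimator.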