Alignment of protein-coding sequences with frameshift extension penalties

François Bélanger, Aïda Ouangraoua
DOI: https://doi.org/10.48550/arXiv.1508.04783
2015-08-20
Abstract: We introduce an algorithm for the alignment of protein-coding sequences accounting for frameshifts. The main specificity of this algorithm as compared to previously published protein-coding sequence alignment methods is the introduction of a penalty cost for frameshift extensions. Previous algorithms have only used constant frameshift penalties. This is similar to the use of scoring schemes with affine gap penalties in classical sequence alignment algorithms. However, the overall penalty of a frameshift portion in an alignment cannot be formulated as an affine function, because it should also incorporate varying codon substitution scores. The second specificity of the algorithm is its search space being the set of all possible alignments between two coding sequences, under the classical definition of an alignment between two DNA sequences. Previous algorithms have introduced constraints on the length of the alignments, and additional symbols for the representation of frameshift openings in an alignment. The algorithm has the same asymptotic space and time complexity as the classical Needleman-Wunsch algorithm.
Data Structures and Algorithms; Computational Engineering, Finance, and Science; Genomics
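
The abstract compares frameshift extension penalties to affine gap penalties in classical sequence alignment. As background only, the sketch below shows that classical scheme: a global alignment score computed with an opening cost and a separate extension cost for gaps, in time and space proportional to the product of the sequence lengths. It is not the paper's algorithm and does not model codons or frameshifts. The function name `affine_gap_score` and all scoring values are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gap penalties (illustrative sketch).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All parameter values are hypothetical defaults, not taken from the paper.
    """
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns b[j-1] with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    return max(M[n][m], X[n][m], Y[n][m])
```

Here a gap's total cost depends only on its length. According to the abstract, the penalty of a frameshift portion cannot be written this way, because it also has to incorporate varying codon substitution scores.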