Abstract:

1 Algorithm

We introduce a new multiple sequence alignment method for protein sequences. We name our method HSA (Horizontal Sequence Alignment) because it slides a window horizontally across all the protein sequences simultaneously. Because it considers all the proteins at once, HSA is superior to existing methods whose results depend on the order of the proteins. Unlike most existing multiple alignment methods, HSA takes secondary structure information into account to find a biologically relevant alignment. HSA uses a scoring matrix, such as BLOSUM62, to capture the substitution probabilities of amino acids. HSA runs in four steps:

Step 1: (Initialization) We start by building a directed graph from the input proteins as follows. Each residue maps to a vertex in the graph. If it is available, the Secondary Structure Element (SSE) type (α-helix, β-sheet) of each residue is stored along with the vertex. A directed edge from vertex i to vertex j is added if residue j immediately follows residue i in the same sequence, or if residues i and j have a substitution score higher than a given threshold. A weight is assigned to each edge based on the substitution score and the SSE type: if two residues belong to the same SSE type, the edge receives a larger weight. All sequences are then scanned to find fragments with known SSE types. These fragments guide the alignment later. The fragments are then clustered into groups, each consisting of one fragment from every sequence, provided the fragments satisfy the following four criteria:
1) They have the same SSE type.
2) They have a similar number of residues.
3) Their positions in the original sequences are close.
4) The substitution score for every fragment pair is greater than a given threshold.

Step 2: (Pre-alignment Adjustment) The graph constructed in Step 1 is adjusted by inserting gap vertices as follows. The number of residues in each fragment and the number of residues between consecutive fragments are calculated first.
The count of gap vertices is then computed as a function of these two numbers. For each sequence, gap vertices are inserted between consecutive fragments to bring the fragments within the same group together. This pre-alignment adjustment moves similar fragments vertically closer to each other, so they have a higher probability of being aligned together in the next step.

Step 3: (Alignment) In this step, the sequences are actually aligned. We start by placing a window of length w at the beginning of each sequence. Typically we use w = 4 or 6. This window defines a subgraph of the graph constructed in Step 2. Next, we greedily choose the clique with the best expectation score from this subgraph. A clique here is defined as a complete subgraph with the constraint that it consists of exactly one vertex from each sequence. In other words, if K sequences are to be aligned, a clique corresponds to the alignment of one letter from each of the K sequences. The score of a clique is defined as the SP (Sum-of-Pairs) score of the corresponding column. The expectation score is computed as follows. For each clique, we align the letters of that clique and iteratively find the next best clique that 1) does not conflict with this clique, and 2) has at least one letter next to a letter in this clique. This iteration is repeated t times to find t columns. Typically, t = 4. These t cliques define a local alignment of the input sequences. The expectation score of the original clique is defined as the SP score of this local alignment. We then slide the window by one position and repeat the same process until the window reaches the end of the sequences.

Step 4: (Post-alignment Adjustment) In this step, the alignment obtained in the previous step is adjusted by examining the gaps. After the columns are concatenated, many short gaps may be scattered across the sequences. Rearranging the gaps may therefore be required to construct fewer but longer gaps. Sequences are scanned again to find
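
The clique selection described in Step 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it ranks each candidate clique by the plain SP score of its single column instead of the look-ahead expectation score over t columns, it enumerates all candidates in the window exhaustively instead of choosing greedily, and it uses a toy scoring function in place of BLOSUM62. The names `sub_score`, `sp_score`, and `best_clique` are hypothetical.

```python
from itertools import combinations, product


def sub_score(a, b):
    """Toy substitution score (an assumption; a real run would use BLOSUM62)."""
    return 2 if a == b else -1


def sp_score(column):
    """Sum-of-Pairs score: sum of substitution scores over all residue pairs."""
    return sum(sub_score(a, b) for a, b in combinations(column, 2))


def best_clique(seqs, starts, w, threshold):
    """Return (score, indices) of the best-scoring clique inside the windows.

    A clique takes exactly one residue from each sequence, and every pair of
    chosen residues must score above `threshold` (i.e. an edge exists between
    them). Returns None if the windows contain no clique.
    """
    # One window of length w per sequence, clipped at the sequence end.
    windows = [range(s, min(s + w, len(seq))) for seq, s in zip(seqs, starts)]
    best = None
    # Exhaustive enumeration: w**K candidates for K sequences.
    for idx in product(*windows):
        column = [seq[i] for seq, i in zip(seqs, idx)]
        is_clique = all(
            sub_score(a, b) > threshold for a, b in combinations(column, 2)
        )
        if is_clique:
            score = sp_score(column)
            if best is None or score > best[0]:
                best = (score, idx)
    return best


if __name__ == "__main__":
    seqs = ["ACDE", "CADE", "AACD"]
    # Windows of length 2 at the start of each sequence.
    print(best_clique(seqs, [0, 0, 0], w=2, threshold=0))  # (6, (0, 1, 0))
```

Exhaustive enumeration is practical only for small w and a small number of sequences, since the number of candidate columns grows as w to the power K. Ties are broken in favour of the first clique found.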
