Alignment Metric Accuracy

Ariel S. Schwartz, Eugene W. Myers, Lior Pachter

DOI: https://doi.org/10.48550/arXiv.q-bio/0510052

2005-10-28

Abstract: We propose a metric for the space of multiple sequence alignments that can be used to compare two alignments to each other. In the case where one of the alignments is a reference alignment, the resulting accuracy measure improves upon previous approaches, and provides a balanced assessment of the fidelity of both matches and gaps. Furthermore, in the case where a reference alignment is not available, we provide empirical evidence that the distance from an alignment produced by one program to predicted alignments from other programs can be used as a control for multiple alignment experiments. In particular, we show that low accuracy alignments can be effectively identified and discarded. We also show that in the case of pairwise sequence alignment, it is possible to find an alignment that maximizes the expected value of our accuracy measure. Unlike previous approaches based on expected accuracy alignment that tend to maximize sensitivity at the expense of specificity, our method is able to identify unalignable sequence, thereby increasing overall accuracy. In addition, the algorithm allows for control of the sensitivity/specificity tradeoff via the adjustment of a single parameter. These results are confirmed with simulation studies that show that unalignable regions can be distinguished from homologous, conserved sequences. Finally, we propose an extension of the pairwise alignment method to multiple alignment. Our method, which we call AMAP, outperforms existing protein sequence multiple alignment programs on benchmark datasets. A webserver and software downloads are available at http://bio.math.berkeley.edu/amap/.

Quantitative Methods, Statistics Theory

What problem does this paper attempt to address?

This paper addresses the accuracy of sequence alignment and how that accuracy is evaluated. Specifically, the authors propose a new metric for comparing two multiple sequence alignments and address the following key issues:

1. **Alignment accuracy evaluation**:
   - A new measure, Alignment Metric Accuracy (AMA), is proposed to quantify the similarity between two alignments.
   - When one of the alignments is a reference alignment, the measure gives a balanced assessment of the predicted alignment that accounts for the fidelity of both matches and gaps.
2. **Control in the absence of a reference alignment**:
   - When no reference alignment is available, the authors provide empirical evidence that the distance from one program's alignment to the alignments predicted by other programs can serve as a control for multiple alignment experiments.
   - Low-accuracy alignments can be effectively identified and discarded, improving overall alignment quality.
3. **Optimization in pairwise sequence alignment**:
   - For pairwise alignment, the authors show how to find an alignment that maximizes the expected value of AMA. Unlike earlier expected-accuracy approaches, which tend to maximize sensitivity at the expense of specificity, this method can identify unalignable sequence, which increases overall accuracy.
   - A single parameter, the gap factor, controls the trade-off between sensitivity and specificity.
4. **Extension to multiple sequence alignment**:
   - The pairwise method is extended to multiple sequence alignment, yielding the AMAP algorithm.
   - Experimental results show that AMAP outperforms existing protein multiple sequence alignment programs on benchmark datasets.

### Formula summary

- **Metric definition**:
  \[ d(h_i, h_j) = n + m - 2\,| h_i^H \cap h_j^H | - | h_i^I \cap h_j^I | - | h_i^D \cap h_j^D | \]
  where \( n \) and \( m \) are the lengths of the two sequences, and \( h_i^H \), \( h_i^I \) and \( h_i^D \) are the sets of matched pairs, insertions and deletions in alignment \( h_i \), respectively.
- **AMA definition**:
  \[ g(h_p, h_r) = 1 - \frac{d(h_p, h_r)}{n + m} \]
- **AMAP algorithm objective function**:
  \[ h_p = \arg\max_{h \in A_{n,m}} \left( \sum_{(i,j) \in h^H} P(\sigma_1^i \diamond \sigma_2^j \mid \sigma_1, \sigma_2, \theta) + G_f \sum_{i \in h^D} P(\sigma_1^i \diamond - \mid \sigma_1, \sigma_2, \theta) + G_f \sum_{j \in h^I} P(\sigma_2^j \diamond - \mid \sigma_1, \sigma_2, \theta) \right) \]
  where \( G_f \) is the gap factor.

### Conclusion

By introducing a new metric and an optimization algorithm built on it, this paper aims to improve the accuracy and reliability of existing alignment methods, especially for sequences that contain unalignable regions. These improvements matter for bioinformatics research, particularly in genomics and protein structure analysis.
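As a worked illustration of the metric and AMA definitions in the formula summary, the following is a minimal Python sketch for the pairwise case. It assumes an alignment is given as two gapped rows of equal length with `-` as the gap character. The function names (`alignment_sets`, `distance`, `ama`) and this input representation are illustrative assumptions, not part of the AMAP software.

```python
def alignment_sets(row1, row2, gap="-"):
    """Split a pairwise alignment (two gapped rows) into the sets h^H, h^D, h^I.

    h^H: matched index pairs (i, j); h^D: positions of sequence 1 aligned to a
    gap; h^I: positions of sequence 2 aligned to a gap. Indices are 0-based.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    matches, deletions, insertions = set(), set(), set()
    i = j = 0  # current positions in the ungapped sequences
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            matches.add((i, j))
            i += 1
            j += 1
        elif a != gap:
            deletions.add(i)
            i += 1
        elif b != gap:
            insertions.add(j)
            j += 1
        # a column with two gaps carries no information and is skipped
    return matches, deletions, insertions


def distance(aln_a, aln_b):
    """d(h_a, h_b) = n + m - 2|H_a & H_b| - |I_a & I_b| - |D_a & D_b|."""
    h_a, d_a, i_a = alignment_sets(*aln_a)
    h_b, d_b, i_b = alignment_sets(*aln_b)
    n, m = len(h_a) + len(d_a), len(h_a) + len(i_a)
    if (n, m) != (len(h_b) + len(d_b), len(h_b) + len(i_b)):
        raise ValueError("alignments must be over sequences of the same lengths")
    return n + m - 2 * len(h_a & h_b) - len(i_a & i_b) - len(d_a & d_b)


def ama(predicted, reference):
    """g(h_p, h_r) = 1 - d(h_p, h_r) / (n + m)."""
    h, d, i = alignment_sets(*reference)
    total = 2 * len(h) + len(d) + len(i)  # equals n + m
    return 1 - distance(predicted, reference) / total


# Hypothetical example: sequences ACGT and ACTG aligned two different ways.
reference = ("ACGT-", "AC-TG")
predicted = ("ACGT", "ACTG")
print(distance(predicted, reference))  # 4
print(ama(predicted, reference))       # 0.5
```

In this example the two alignments share only the first two matched pairs and no gap positions, so half of the `n + m = 8` residues are placed consistently and the AMA is 0.5. An alignment compared with itself has distance 0 and AMA 1.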
