Ground and Free-Variable Tableaux for Variants of Quantified Modal Logics

M. C. Mayer,S. Cerrito

DOI: https://doi.org/10.1023/A:1013838528631

2001-10-01

Abstract:

Alignment Metric Accuracy

Ariel S. Schwartz,Eugene W. Myers,Lior Pachter

DOI: https://doi.org/10.48550/arXiv.q-bio/0510052

2005-10-28

Abstract:We propose a metric for the space of multiple sequence alignments that can be used to compare two alignments to each other. In the case where one of the alignments is a reference alignment, the resulting accuracy measure improves upon previous approaches, and provides a balanced assessment of the fidelity of both matches and gaps. Furthermore, in the case where a reference alignment is not available, we provide empirical evidence that the distance from an alignment produced by one program to predicted alignments from other programs can be used as a control for multiple alignment experiments. In particular, we show that low accuracy alignments can be effectively identified and discarded. We also show that in the case of pairwise sequence alignment, it is possible to find an alignment that maximizes the expected value of our accuracy measure. Unlike previous approaches based on expected accuracy alignment that tend to maximize sensitivity at the expense of specificity, our method is able to identify unalignable sequence, thereby increasing overall accuracy. In addition, the algorithm allows for control of the sensitivity/specificity tradeoff via the adjustment of a single parameter. These results are confirmed with simulation studies that show that unalignable regions can be distinguished from homologous, conserved sequences. Finally, we propose an extension of the pairwise alignment method to multiple alignment. Our method, which we call AMAP, outperforms existing protein sequence multiple alignment programs on benchmark datasets. A webserver and software downloads are available at http://bio.math.berkeley.edu/amap/.

Quantitative Methods,Statistics Theory
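
The accuracy measure above compares alignments by asking, for every character, whether both alignments place it with the same partner (another character or a gap). The minimal pairwise sketch below assumes that reading. It is not the authors' AMAP code, and the function names `partner_maps` and `alignment_agreement` are hypothetical.

```python
from typing import List, Optional, Tuple

GAP = "-"


def partner_maps(row1: str, row2: str, gap: str = GAP) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """For each residue of each sequence, return the index of its aligned partner (None = gap)."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    map1: List[Optional[int]] = []
    map2: List[Optional[int]] = []
    i = j = 0  # next residue index in sequence 1 and sequence 2
    for a, b in zip(row1, row2):
        if a != gap and b != gap:  # residue i of seq 1 aligned to residue j of seq 2
            map1.append(j)
            map2.append(i)
            i += 1
            j += 1
        elif a != gap:  # residue i of seq 1 aligned to a gap
            map1.append(None)
            i += 1
        elif b != gap:  # residue j of seq 2 aligned to a gap
            map2.append(None)
            j += 1
        # an all-gap column carries no information and is skipped
    return map1, map2


def alignment_agreement(aln_a: Tuple[str, str], aln_b: Tuple[str, str], gap: str = GAP) -> float:
    """Fraction of residues (over both sequences) that the two alignments align identically."""
    a1, a2 = partner_maps(*aln_a, gap=gap)
    b1, b2 = partner_maps(*aln_b, gap=gap)
    if len(a1) != len(b1) or len(a2) != len(b2):
        raise ValueError("both alignments must be of the same two sequences")
    same = sum(x == y for x, y in zip(a1, b1)) + sum(x == y for x, y in zip(a2, b2))
    total = len(a1) + len(a2)
    return same / total if total else 1.0
```

Take the reference `("AC-GT", "ACTGT")` and the prediction `("ACG-T", "ACTGT")`. These agree on 6 of the 9 characters, because only the placement of G and of the inserted T differs. Extending the idea to multiple alignments, for example by summing over sequence pairs, is left out here.
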
Hyalinizing spindle cell tumor with giant rosettes of the omentum

Asako Koishi,H. Gomibuchi,J. Inoue,S. Minoura,Eisaku Itoh,Masumi Saito

DOI: https://doi.org/10.1111/j.1341-8076.2003.00133.x

2003-12-01

Abstract:We report the first case of a hyalinizing spindle cell tumor with giant rosettes of the omentum. The mesenchymal tumor arises from a multiplication of fibroblastic cells containing large rosette‐like structures composed of a central collagen core surrounded by plump oval to spindle tumor cells. A 38‐year‐old woman presented with right-sided abdominal pain and a sensation of a mass in the same area. A tumor with both solid and cystic cytologic features was subsequently diagnosed on the right side of the uterus. Her serum level of CA‐125 was only slightly elevated. Surgery showed that the tumor originated from the lower pole of the omentum, and the histological diagnosis was hyalinizing spindle cell tumor with giant rosettes. The metastatic potential of this type of tumor is considered similar to that of metastatic low‐grade fibromyxoid sarcoma, which indicates the need for careful clinical follow-up of this case.
Letter Change Bias and Local Uniqueness in Optimal Sequence Alignments

Raphael Hauser,Heinrich Matzinger

DOI: https://doi.org/10.1007/s10955-013-0819-4

2013-04-24

Abstract:Considering two optimally aligned random sequences, we investigate the effect on the alignment score caused by changing a random letter in one of the two sequences. Using this idea in conjunction with large deviations theory, we show that in alignments with a low proportion of gaps the optimal alignment is locally unique in most places with high probability. This has implications in the design of recently pioneered alignment methods that use the local uniqueness as a homology indicator.

Probability,Statistics Theory
Sequence Alignment As Hypothesis Testing

Lu Meng,Fengzhu Sun,Xuegong Zhang,Michael S. Waterman

DOI: https://doi.org/10.1089/cmb.2010.0328

2011-01-01

Abstract:Sequence alignment depends on the scoring function that defines similarity between pairs of letters. For local alignment, the computational algorithm searches for the most similar segments in the sequences according to the scoring function. The choice of this scoring function is important for correctly detecting segments of interest. We formulate sequence alignment as a hypothesis testing problem, and conduct extensive simulation experiments to study the relationship between the scoring function and the distribution of aligned pairs within the aligned segment under this framework. We cut through the many ways to construct scoring functions and show that any scoring function with negative expectation used in local alignment corresponds to a hypothesis test between the background distribution of sequence letters and a statistical distribution of letter pairs determined by the scoring function. The results indicate that the log-likelihood ratio scoring function is statistically most powerful and has the highest accuracy for detecting the segments of interest that are defined by the statistical distribution of aligned letter pairs.
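
The link between a scoring function and a pair distribution can be illustrated with the standard log-odds construction. Let p be the background letter frequencies and q a target distribution of aligned pairs. The score s(a, b) = ln(q_ab / (p_a p_b)) then has expected value -KL(p⊗p || q) ≤ 0 under the background, which is the negative expectation that local alignment requires. The sketch below assumes that q lists every letter pair. The helper names are hypothetical.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[str, str]


def llr_scores(q: Dict[Pair, float], p: Dict[str, float]) -> Dict[Pair, float]:
    """Log-likelihood ratio score s(a, b) = ln(q_ab / (p_a * p_b)) for each letter pair."""
    return {(a, b): math.log(q_ab / (p[a] * p[b])) for (a, b), q_ab in q.items()}


def expected_background_score(s: Dict[Pair, float], p: Dict[str, float]) -> float:
    """Expected per-pair score when both letters are drawn independently from p.

    Assumes s covers every letter pair, so this equals -KL(p x p || q) <= 0.
    """
    return sum(p[a] * p[b] * s_ab for (a, b), s_ab in s.items())


p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
s = llr_scores(q, p)
```

For this two-letter example, identical pairs score ln 1.6 > 0 and mismatches score ln 0.4 < 0. The background expectation is 0.5 ln 0.64 ≈ -0.223.
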
Multiple Alignment-Free Sequence Comparison

Jie Ren,Kai Song,Fengzhu Sun,Minghua Deng,Gesine Reinert

DOI: https://doi.org/10.1093/bioinformatics/btt462

IF: 5.8

2013-01-01

Bioinformatics

Abstract:Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, $C_1^*$ and $C_1^S$, extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, $\bar{C}_2^*$, $\bar{C}_2^S$ and $\bar{C}_2^{geo}$, averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data and cis-regulatory module data, where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although all of our statistics show a similar performance on real data, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics.
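
The statistics in this entry build on comparisons of k-tuple counts. The simplest such statistic is the classical D2, the number of matching k-tuple occurrences between two sequences. The sketch below computes plain D2 and an unstandardized average over all pairs of sequences. It does not implement the centred or standardized variants that the abstract calls star-type and Shepp-type, nor the paper's C statistics. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-tuples (words of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """Classical D2: sum over words w of count_x(w) * count_y(w)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items())  # Counter returns 0 for absent words


def mean_pairwise_d2(seqs: Sequence[str], k: int) -> float:
    """Average D2 over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(d2(x, y, k) for x, y in pairs) / len(pairs)
```

For example, `d2("ACGT", "ACGT", 2)` counts the shared 2-tuples AC, CG and GT once each, giving 3.
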
Alignment of multiple protein sequences without using amino acid frequencies.

Roman Shirokov,Veronika Shelyekhova

DOI: https://doi.org/10.1101/2024.06.05.597668

2024-06-09

Abstract:Current algorithms for aligning protein sequences use substitutability scores that combine the probability to find an amino acid in a specific pair of amino acids and marginal probability to find this amino acid in any pair. However, the positional probability of finding the amino acid at a place in alignment is also conditional on the amino acids at the sequence itself. Content-dependent corrections overparameterize protein alignment models. Here, we propose an approach that is based on (dis)similarity measures, which do not use the marginal probability, and score only probabilities of finding amino acids in pairs. The dissimilarity scoring matrix endows a metric space on the set of aligned sequences. This allowed us to develop new heuristics. Our aligner does not use guide trees and treats all sequences uniformly. We suggest that such alignments that are done without explicit evolution-based modeling assumptions should be used for testing hypotheses about evolution of proteins (e.g., molecular phylogenetics).

Bioinformatics
Small Coupling Expansion for Multiple Sequence Alignment

Louise Budzynski,Andrea Pagnani

DOI: https://doi.org/10.1103/PhysRevE.107.044125

2023-04-28

Abstract:The alignment of biological sequences such as DNA, RNA, and proteins is one of the basic tools that allow the detection of evolutionary patterns, as well as functional/structural characterizations, between homologous sequences in different organisms. Typically, state-of-the-art bioinformatics tools are based on profile models that assume the statistical independence of the different sites of the sequences. Over recent years, it has become increasingly clear that homologous sequences show complex patterns of long-range correlations over the primary sequence as a consequence of the natural evolution process that selects genetic variants under the constraint of preserving the functional/structural determinants of the sequence. Here, we present a new alignment algorithm based on message passing techniques that overcomes the limitations of profile models. Our method is based on a new perturbative small-coupling expansion of the free energy of the model that assumes a linear chain approximation as the $0^\mathrm{th}$-order of the expansion. We test the potential of the algorithm against standard competing strategies on several biological sequences.

Quantitative Methods,Disordered Systems and Neural Networks,Biological Physics,Biomolecules
Detecting the homology of DNA-sequences based on the variety of optimal alignments: a case study

Erik Hirmo,Jüri Lember,Heinrich Matzinger

DOI: https://doi.org/10.48550/arXiv.1210.3771

2012-10-14

Applications

Abstract:We consider a novel approach to measuring the homology of DNA sequences based on the variety of optimal alignments in the longest common subsequence sense. The proposed approach is compared with BLAST in measuring the homology of four genes.
Chronic Ocular Hypertensive Rat Model using Microbead Injection: Comparison of Polyurethane, Polymethylmethacrylate, Silica and Polystyrene Microbeads

S. Rho,Insung Park,G. Seong,N. Lee,Chang-Kyu Lee,Samin Hong,C. Kim

DOI: https://doi.org/10.3109/02713683.2014.884597

2014-08-14

Current Eye Research

Abstract:Purpose: To establish and assess an ocular hypertensive rat model using intracameral injection with various microbeads of different sizes and materials. Methods: Chronic elevation of intraocular pressure (IOP) was induced by the injection of various microbeads into the anterior chamber of Sprague-Dawley rat eyes. We compared the IOPs induced by the injection of different microbeads [7- and 17-µm polyurethane (PU), 7- and 15-µm polymethylmethacrylate (PMMA), 13-µm silica, and 15-µm polystyrene (PS)] and selected the appropriate microbeads for a chronic ocular hypertensive model in terms of IOP elevation and adverse events. IOP changes were observed for 4 weeks after microbead injections. Axonal degeneration was assessed with transmission electron microscopic photographs, and retinal ganglion cell (RGC) loss was assessed with retrograde labeling. Results: Seventy-eight rats were included. Three days after a single injection of microbeads, IOPs were increased by 24.0% by 7-µm PU microbeads, 101.8% by 17-µm PU microbeads, 56.6% by 7-µm PMMA microbeads, 22.0% by 15-µm PMMA microbeads, 153.0% by 13-µm silica microbeads, and 34.7% by 15-µm PS microbeads. 17-µm PU microbeads produced constant IOP elevation with good reproducibility (standard deviation of <6.5 mmHg). Silica-injected eyes showed severe inflammation. Sustained IOP elevation by two injections of 17-µm PU microbeads resulted in a 42% axon loss and 36.5% RGC loss (p < 0.05, Mann–Whitney U test). Conclusions: PU microbead injections offer an applicable and versatile chronic ocular hypertensive model in rats. Among several biomaterials, PU microbeads produced a more stable IOP elevation without adverse events.
Sequence alignment and mutual information

Orion Penner,Peter Grassberger,Maya Paczuski

DOI: https://doi.org/10.48550/arXiv.0810.4355

2008-10-24

Abstract:Background: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. All existing alignment algorithms rely on heuristic scoring schemes based on biological expertise. Therefore, these algorithms do not provide model-independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure, the mutual information (MI), previous attempts to connect sequence alignment and information theory have not produced realistic estimates for the MI from a given alignment. Results: Here we describe a simple and flexible approach to get robust estimates of MI from global alignments. For mammalian mitochondrial DNA, our approach gives pairwise MI estimates for commonly used global alignment algorithms that are strikingly close to estimates obtained by an entirely unrelated approach: concatenating and zipping the sequences. Conclusions: This remarkable consistency may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments. We expect that our approach can be extended to establish further connections between information theory and sequence alignment, including applications to local and multiple alignment procedures.

Genomics
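
For orientation, the sketch below computes only the textbook plug-in estimate of MI. It is based on letter co-occurrence in the gap-free columns of one pairwise alignment. It is not the estimator described in the abstract, which is not reproduced here. Plug-in estimates of this kind are known to be biased on short sequences. The function name is hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(row1: str, row2: str, gap: str = "-") -> float:
    """Naive plug-in MI (in bits) between the letters of gap-free alignment columns."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    pairs = [(a, b) for a, b in zip(row1, row2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no gap-free columns")
    n = len(pairs)
    joint = Counter(pairs)                 # empirical p(a, b) * n
    left = Counter(a for a, _ in pairs)    # empirical p(a) * n
    right = Counter(b for _, b in pairs)   # empirical p(b) * n
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi
```

Two identical rows over four equiprobable letters give log2(4) = 2 bits. A row made of a single repeated letter carries no information about its partner and gives 0.
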
Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

Stefano Iantorno,Kevin Gori,Nick Goldman,Manuel Gil,Christophe Dessimoz

DOI: https://doi.org/10.1007/978-1-62703-646-7_4

2012-11-09

Abstract:Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies--based on simulation, consistency, protein structure, and phylogeny--and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark depending on the context of application--with a keen awareness of the assumptions underlying each benchmarking strategy.

Quantitative Methods
An efficient Z-score algorithm for assessing sequence alignments

Hilary S Booth,John H Maindonald,Susan R Wilson,Jill E Gready

DOI: https://doi.org/10.1089/cmb.2004.11.616

Abstract:We describe an alternative method for scoring of the pairwise alignment of two biological sequences. Designed to overcome the bias due to the composition of the alignment, it measures the distance (in standard deviations) between the given alignment and the mean value of all other alignments that can be obtained by a permutation of either sequence. We demonstrate that the standard deviation can be calculated efficiently. By concentrating upon the ungapped case, the mean and standard deviation can be calculated exactly and in two steps, the first being O(N) time, where N is the length of the sequence, the second in a fixed number of calculations, i.e., in O(1) time. We argue that this statistic is a more consistent measure than a similarity score based upon a standard scoring matrix. Even in the ungapped case, the statistic proves in many cases to be more accurate than the commonly used FASTA gapped Z-score (Pearson and Lipman, 1988), in which the sequence is matched against a random sample of the database. We demonstrate the use of the POZ-score as a secondary filter which screens out several well-known types of false positive, reducing the amount of manual screening to be done by the biologist.
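
The abstract says that the exact mean and standard deviation over permutations need only O(N) work plus a constant-time step. One way to see why, assuming one sequence is permuted uniformly at random, uses Hoeffding's formula for a linear assignment statistic. With score matrix $m_{ij} = s(x_i, y_j)$, the mean is $\frac{1}{n}\sum_{i,j} m_{ij}$ and the variance is $\frac{1}{n-1}\sum_{i,j} d_{ij}^2$, where $d_{ij} = m_{ij} - \bar m_{i\cdot} - \bar m_{\cdot j} + \bar m_{\cdot\cdot}$. Each entry depends only on a letter pair, so every sum collapses to letter counts. The sketch below is that derivation, not the authors' code, and the name `permutation_z_score` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def permutation_z_score(x: str, y: str, s: Dict[str, Dict[str, float]]) -> float:
    """Z-score of the ungapped alignment x[i] ~ y[i] against all n! permutations of y.

    Mean and variance are exact for a uniformly random permutation (Hoeffding's
    formula) and are computed from letter counts only.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    cx, cy = Counter(x), Counter(y)                   # O(N) composition counts
    observed = sum(s[a][b] for a, b in zip(x, y))     # O(N) observed score
    # Row, column and grand means of m[i][j] = s[x_i][y_j]; they depend only on
    # letters, so these loops cost O(alphabet^2), independent of N.
    row = {a: sum(cy[b] * s[a][b] for b in cy) / n for a in cx}
    col = {b: sum(cx[a] * s[a][b] for a in cx) / n for b in cy}
    grand = sum(cx[a] * cy[b] * s[a][b] for a in cx for b in cy) / n ** 2
    mean = n * grand
    var = sum(cx[a] * cy[b] * (s[a][b] - row[a] - col[b] + grand) ** 2
              for a in cx for b in cy) / (n - 1)
    if var <= 0:
        raise ValueError("score is identical for every permutation")
    return (observed - mean) / math.sqrt(var)
```

On short inputs the result can be checked against brute-force enumeration of all permutations, which is how the formula is validated in the test.
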
A simple method for finding related sequences by adding probabilities of alternative alignments

Martin C. Frith

DOI: https://doi.org/10.1101/gr.279464.124

IF: 9.438

2024-09-27

Genome Research

Abstract:The main way of analyzing genetic sequences is by finding sequence regions that are related to each other. There are many methods to do that, usually based on this idea: Find an alignment of two sequence regions, which would be unlikely to exist between unrelated sequences. Unfortunately, it is hard to tell if an alignment is likely to exist by chance. Also, the precise alignment of related regions is uncertain. One alignment does not hold all evidence that they are related. We should consider alternative alignments too. This is rarely done, because we lack a simple and fast method that fits easily into practical sequence-search software. Described here is the simplest-conceivable change to standard sequence alignment, which sums probabilities of alternative alignments and makes it easier to tell if a similarity is likely to occur by chance. This approach is better than standard alignment at finding distant relationships, at least in a few tests. It can be used in practical sequence-search software, with minimal increase in implementation difficulty or run time. It generalizes to different kinds of alignment, for example, DNA-versus-protein with frameshifts. Thus, it can widely contribute to finding subtle relationships between sequences.

genetics & heredity,biochemistry & molecular biology,biotechnology & applied microbiology
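
The generic form of "summing probabilities of alternative alignments" is a Forward-style recursion. It takes the usual local-alignment dynamic programme, exponentiates the scores, and replaces the maximum with a sum. The sketch below assumes a linear gap cost `gap` and a scale factor `lam`, both hypothetical parameters. It is not the paper's exact scoring scheme. Alignments here start and end with an aligned letter pair. A gap in one sequence followed by a gap in the other is counted separately from the reverse order.

```python
import math
from typing import Callable


def forward_local_score(x: str, y: str, sub: Callable[[str, str], float],
                        gap: float, lam: float = 1.0) -> float:
    """Return (1/lam) * ln(sum over all local alignments of exp(lam * score)).

    No rescaling is done, so this is only suitable for short sequences.
    """
    n, m = len(x), len(y)
    g = math.exp(-lam * gap)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]  # alignments ending with x[i-1] ~ y[j-1]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]  # partial alignments ending with a gap column
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Start a new alignment here (the 1.0) or extend one that reached (i-1, j-1).
            M[i][j] = math.exp(lam * sub(x[i - 1], y[j - 1])) * (
                1.0 + M[i - 1][j - 1] + D[i - 1][j - 1])
            # Append a gap column: x[i-1] against a gap, or y[j-1] against a gap.
            D[i][j] = g * (M[i - 1][j] + D[i - 1][j] + M[i][j - 1] + D[i][j - 1])
            total += M[i][j]  # an alignment may end at any aligned pair
    if total == 0.0:
        raise ValueError("empty input")
    return math.log(total) / lam
```

The summed score is never below the best single local alignment score, because the best alignment is one term of the sum. Whether the extra mass from alternatives helps to separate related from unrelated sequences is the empirical question the paper addresses.
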
Characterization of pairwise and multiple sequence alignment errors

Giddy Landan,Dan Graur

DOI: https://doi.org/10.1016/j.gene.2008.05.016

IF: 3.913

2009-07-15

Gene

Abstract:We characterize pairwise and multiple sequence alignment (MSA) errors by comparing true alignments from simulations of sequence evolution with reconstructed alignments. The vast majority of reconstructed alignments contain many errors. Error rates rapidly increase with sequence divergence, thus, for even intermediate degrees of sequence divergence, more than half of the columns of a reconstructed alignment may be expected to be erroneous. In closely related sequences, most errors consist of the erroneous positioning of a single indel event and their effect is local. As sequences diverge, errors become more complex as a result of the simultaneous mis-reconstruction of many indel events, and the lengths of the affected MSA segments increase dramatically. We found a systematic bias towards underestimation of the number of gaps, which leads to the reconstructed MSA being on average shorter than the true one. Alignment errors are unavoidable even when the evolutionary parameters are known in advance. Correct reconstruction can only be guaranteed when the likelihood of true alignment is uniquely optimal. However, true alignment features are very frequently sub-optimal or co-optimal, with the result that optimal albeit erroneous features are incorporated into the reconstructed MSA. Progressive MSA utilizes a guide-tree in the reconstruction of MSAs. The quality of the guide-tree was found to affect MSA error levels only marginally.
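
A column-level error rate of the kind discussed above needs a way to tell whether a reconstructed column also appears in the true alignment. The sketch below assumes one such way. Each column is identified by the residue index it holds in every sequence, and the error rate is the fraction of reconstructed columns absent from the true alignment. This is an illustrative definition, not necessarily the paper's exact measure, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Column = Tuple[Optional[int], ...]


def msa_columns(rows: Sequence[str], gap: str = "-") -> List[Column]:
    """Describe each column by the residue index it holds in every sequence (None = gap)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all alignment rows must have the same, nonzero count and length")
    next_index = [0] * len(rows)
    columns: List[Column] = []
    for k in range(len(rows[0])):
        col: List[Optional[int]] = []
        for r, row in enumerate(rows):
            if row[k] == gap:
                col.append(None)
            else:
                col.append(next_index[r])
                next_index[r] += 1
        if any(c is not None for c in col):  # ignore all-gap columns
            columns.append(tuple(col))
    return columns


def _ungapped(rows: Sequence[str], gap: str) -> List[str]:
    return [r.replace(gap, "") for r in rows]


def column_error_rate(true_rows: Sequence[str], rec_rows: Sequence[str], gap: str = "-") -> float:
    """Fraction of reconstructed columns that do not occur in the true alignment."""
    if _ungapped(true_rows, gap) != _ungapped(rec_rows, gap):
        raise ValueError("alignments must contain the same sequences in the same order")
    true_set = set(msa_columns(true_rows, gap))
    rec = msa_columns(rec_rows, gap)
    if not rec:
        raise ValueError("reconstructed alignment has no columns")
    return sum(col not in true_set for col in rec) / len(rec)
```

For example, if the truth is `["AC", "AC"]` and the reconstruction is `["AC-", "A-C"]`, two of the three reconstructed columns are wrong. The reconstruction is also longer here, whereas the abstract reports that reconstructions tend on average to be shorter.
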
GUIDANCE: a web server for assessing alignment confidence scores

Osnat Penn,Eyal Privman,Haim Ashkenazy,Giddy Landan,Dan Graur,Tal Pupko

DOI: https://doi.org/10.1093/nar/gkq443

Abstract:Evaluating the accuracy of multiple sequence alignment (MSA) is critical for virtually every comparative sequence analysis that uses an MSA as input. Here we present the GUIDANCE web-server, a user-friendly, open access tool for the identification of unreliable alignment regions. The web-server accepts as input a set of unaligned sequences. The server aligns the sequences and provides a simple graphic visualization of the confidence score of each column, residue and sequence of an alignment, using a color-coding scheme. The method is generic and the user is allowed to choose the alignment algorithm (ClustalW, MAFFT and PRANK are supported) as well as any type of molecular sequences (nucleotide, protein or codon sequences). The server implements two different algorithms for evaluating confidence scores: (i) the heads-or-tails (HoT) method, which measures alignment uncertainty due to co-optimal solutions; (ii) the GUIDANCE method, which measures the robustness of the alignment to guide-tree uncertainty. The server projects the confidence scores onto the MSA and points to columns and sequences that are unreliably aligned. These can be automatically removed in preparation for downstream analyses. GUIDANCE is freely available for use at http://guidance.tau.ac.il.
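
Scores in the spirit of GUIDANCE ask how often the residue pairs in a column are reproduced across alternative alignments, such as those built from perturbed guide trees. The sketch below assumes the alternatives are already computed. It scores each column of a base alignment by the average, over alternatives, of the fraction of its residue pairs found anywhere in that alternative. This is a simplification, not the server's exact formula, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Column = Tuple[Optional[int], ...]
ResiduePair = Tuple[int, int, int, int]  # (seq1, residue1, seq2, residue2) with seq1 < seq2


def msa_columns(rows: Sequence[str], gap: str = "-") -> List[Column]:
    """Describe each column by the residue index it holds in every sequence (None = gap)."""
    next_index = [0] * len(rows)
    columns: List[Column] = []
    for k in range(len(rows[0])):
        col: List[Optional[int]] = []
        for r, row in enumerate(rows):
            if row[k] == gap:
                col.append(None)
            else:
                col.append(next_index[r])
                next_index[r] += 1
        if any(c is not None for c in col):
            columns.append(tuple(col))
    return columns


def residue_pairs(col: Column) -> Set[ResiduePair]:
    """All pairs of residues that a column aligns with each other."""
    present = [(s, r) for s, r in enumerate(col) if r is not None]
    return {(s1, r1, s2, r2) for k, (s1, r1) in enumerate(present) for s2, r2 in present[k + 1:]}


def column_support(base: Sequence[str], alternatives: Sequence[Sequence[str]],
                   gap: str = "-") -> List[Optional[float]]:
    """Per-column support in [0, 1]; None for columns with fewer than two residues."""
    alt_sets = []
    for alt in alternatives:
        pairs: Set[ResiduePair] = set()
        for col in msa_columns(alt, gap):
            pairs |= residue_pairs(col)
        alt_sets.append(pairs)
    scores: List[Optional[float]] = []
    for col in msa_columns(base, gap):
        pairs = residue_pairs(col)
        if not pairs or not alt_sets:
            scores.append(None)
            continue
        fractions = [len(pairs & alt) / len(pairs) for alt in alt_sets]
        scores.append(sum(fractions) / len(fractions))
    return scores
```

Columns with low support are the candidates that the abstract describes removing before downstream analyses.
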
A simple theory for finding related sequences by adding probabilities of alternative alignments

Martin C. Frith

DOI: https://doi.org/10.1101/2023.09.26.559458

2024-04-14

Abstract:The main way of analyzing genetic sequences is by finding sequence regions that are related to each other. There are many methods to do that, usually based on this idea: find an alignment of two sequence regions, which would be unlikely to exist between unrelated sequences. Unfortunately, it is hard to tell if an alignment is likely to exist by chance. Also, the precise alignment of related regions is uncertain. One alignment does not hold all evidence that they are related. We should consider alternative alignments too. This is rarely done, because we lack a simple and fast method that fits easily into practical sequence-search software. Here is described a simplest-possible change to standard sequence alignment, which sums probabilities of alternative alignments. Remarkably, this makes it easier to tell if a similarity is likely to occur by chance. This approach is better than standard alignment at finding distant relationships, at least in a few tests. It can be used in practical sequence-search software, with minimal increase in implementation difficulty or run time. It generalizes to different kinds of alignment, e.g. DNA-versus-protein with frameshifts. Thus, it can widely contribute to finding subtle relationships between sequences.

Bioinformatics
Trimethoprim/sulfamethoxazole therapy of Pasteurella multocida infection.

M. Sands,R. Ashley,R. Brown

DOI: https://doi.org/10.1093/INFDIS/160.2.353

1989-08-01

Abstract:
SAlign–a structure aware method for global PPI network alignment

Umair Ayub,Imran Haider,Hammad Naveed

DOI: https://doi.org/10.1186/s12859-020-03827-5

IF: 3.307

2020-11-04

BMC Bioinformatics

Abstract:Background: High-throughput experiments have generated a large amount of protein interaction data, which is being used to study protein networks. Studying complete protein networks can reveal more about healthy and disease states than studying proteins in isolation. Similarly, a comparative study of the protein–protein interaction (PPI) networks of different species reveals important insights that may help in disease analysis and drug design. The study of PPI network alignment can also help in understanding the biological systems of different species, and it can be used to transfer knowledge across species. Different aligners have been introduced in the last decade, but developing an accurate and scalable global alignment algorithm that ensures the biological significance of the alignment is still challenging. Results: This paper presents a novel global pairwise network alignment algorithm, SAlign, which uses topological and biological information in the alignment process. The proposed algorithm incorporates sequence and structural information for computing biological scores, whereas previous algorithms only use sequence information. The alignment based on the proposed technique shows that the combined effect of structure and sequence results in significantly better pairwise alignments. We have compared SAlign with state-of-the-art algorithms on the basis of the semantic similarity of the alignment and the number of aligned nodes on multiple PPI network pairs. The results of SAlign on the network pairs that have a high percentage of proteins with available structure are 3–63% semantically better than all existing techniques. Furthermore, it also aligns 5–14% more nodes of these network pairs than existing aligners. The results of SAlign on other PPI network pairs are comparable to or better than all existing techniques.
We also introduce SAlign$^{\mathrm{mc}}$, a Monte Carlo-based alignment algorithm that produces multiple network alignments with similar semantic similarity. This helps the user to pick biologically meaningful alignments. Conclusion: The proposed algorithm has the ability to find alignments that are more biologically significant/relevant than the alignments of existing aligners. Furthermore, the proposed method is able to generate alternate alignments that help in studying different genes/proteins of the species.

biochemical research methods,biotechnology & applied microbiology,mathematical & computational biology
Similarity analysis of DNA sequences through local distribution of nucleotides in strategic neighborhood

Probir Mondal,Pratyay Banerjee,Debranjan Pal,Krishnendu Basuli

2024-09-19

Abstract:We propose a new alignment-free algorithm by constructing a compact vector representation on $\mathbb{R}^{24}$ of a DNA sequence of arbitrary length. Each component of this vector is obtained from a representative sequence, the elements of which are the values realized by a function $\Gamma$. This function $\Gamma$ acts on neighborhoods of arbitrary radius that are located at strategic positions within the DNA sequence and carries complete information about the local distribution of frequencies of the nucleotides as a consequence of the uniqueness of the prime factorization of integers. The algorithm exhibits linear time complexity and requires very little memory. The two natural parameters characterizing the radius and location of the neighborhoods are fixed by comparing the phylogenetic tree with the benchmark for full genome sequences of fish mtDNA datasets. Using these fitting parameters, the method is applied to analyze a number of genome sequences from benchmark and other standard datasets. Our algorithm proves to be computationally efficient compared to other well-known algorithms when applied to a simulated dataset.

Data Structures and Algorithms
Scoring Functions Sensitive to Alignment Error Have a More Difficult Search - A Paradox for Threading

Jeffrey Chang,Michelle Whirl Carrillo,Allison Waugh,Liping Wei,Russ B Altman

2002-01-01

Abstract:In this paper, we studied diverse globin-like structures with sequence identities ranging from 9 to 25 percent. The set contains proteins that are closely related in terms of function, as well as two that are functionally diverse. We studied the ability of our scoring function to distinguish the correct alignments from shifted alignments
