Abstract:

Background: Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein-coding sequences: functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences, the collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and the text output of dynamic programming algorithms fail to provide an intuitive means of visualizing DNA alignments.

Results: To address some of these issues, we created a stand-alone, platform-independent graphic alignment tool for comparative sequence analysis (GATA, http://gata.sourceforge.net/). GATA uses the NCBI-BLASTN program and extensive post-processing to identify all small sub-alignments above a low cut-off score. Each sub-alignment is drawn as two shaded boxes, one for each sequence, connected by a line and positioned using the coordinate system of its parent sequence. Shading and colour indicate score and orientation. A variety of options exist for querying, modifying, and retrieving conserved sequence elements. Extensive gene annotation can be added to both sequences using a standardized General Feature Format (GFF) file.

Conclusions: GATA uses the NCBI-BLASTN program in conjunction with post-processing to exhaustively align two DNA sequences. It provides researchers with a fine-grained alignment and visualization tool well suited to pairwise analysis of non-coding sequences of 0-200 kb. It functions independently of sequence feature ordering or orientation, and readily visualizes both large and small sequence inversions, duplications, and segment shuffling. Since the alignment is visual and does not contain gaps, gene annotation can be added to both sequences to create a thoroughly descriptive picture of DNA conservation for comparative sequence analysis.
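The idea of collecting sub-alignments above a cut-off score and recording their orientation can be illustrated with a short Python sketch. This is not GATA's implementation and does not reproduce its post-processing. It assumes the BLASTN hits were written in the standard 12-column tabular format (`-outfmt 6`), in which a minus-strand hit is reported with the subject start greater than the subject end. The names `SubAlignment` and `parse_hits`, and the choice of bit score as the cut-off quantity, are hypothetical and chosen for illustration only.

```python
from typing import List, NamedTuple


class SubAlignment(NamedTuple):
    """One local hit: a box on each sequence plus score and orientation."""
    q_start: int
    q_end: int
    s_start: int      # always the lower subject coordinate
    s_end: int        # always the higher subject coordinate
    bitscore: float
    reverse: bool     # True when the hit lies on the subject's minus strand


def parse_hits(tabular_text: str, min_bitscore: float) -> List[SubAlignment]:
    """Parse BLAST tabular output (assumed -outfmt 6) and keep hits at or
    above min_bitscore. Column order assumed: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        q_start, q_end, s_start, s_end = (int(x) for x in fields[6:10])
        bitscore = float(fields[11])
        if bitscore < min_bitscore:
            continue
        # BLASTN reports minus-strand hits with sstart > send.
        reverse = s_start > s_end
        low, high = sorted((s_start, s_end))
        hits.append(SubAlignment(q_start, q_end, low, high, bitscore, reverse))
    return hits
```

Each returned record holds what a box-and-line drawing of this kind needs: an interval on each sequence in that sequence's own coordinates, a score that could drive shading, and an orientation flag that could drive colour.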

cPlot: Contig-Plotting Visualization for the Analysis of Short-Read Nucleotide Sequence Alignments

Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using Ngs.plot.

GATA: a graphic alignment tool for comparative sequence analysis

Sashimi plots: Quantitative visualization of RNA sequencing read alignments

CGAP-align: a high performance DNA short read alignment tool.

D-GENIES: dot plot large genomes in an interactive, efficient and simple way

ModDotPlot-rapid and interactive visualization of tandem repeats

Co-Phylog: an Assembly-Free Phylogenomic Approach for Closely Related Organisms

SeqLengthPlot: An easy-to-use Python-based Tool for Visualizing and Retrieving Sequence Lengths from fasta files with a Tunable Splitting Point

DiGAlign: Versatile and Interactive Visualization of Sequence Alignment for Comparative Genomics

Fast and accurate short read alignment with hybrid hash-tree data structure

BOAT: Basic Oligonucleotide Alignment Tool

MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads

Klumpy: A Tool to Evaluate the Integrity of Long-Read Genome Assemblies and Illusive Sequence Motifs

GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality

An efficient Burrows-Wheeler transform-based aligner for short read mapping

COATi: statistical pairwise alignment of protein-coding sequences

Sap-A Sequence Mapping And Analyzing Program For Long Sequence Reads Alignment And Accurate Variants Discovery

Minimap2: pairwise alignment for nucleotide sequences

Fast and Accurate Read Alignment for Resequencing.

Comparative linkage analysis and visualization of high-density oligonucleotide SNP array data