GalterGuides: Sequence Similarity Searching: Multiple Sequence Alignment

Choices for Multiple Sequence Alignment

BLAST is the preferred platform for sequence similarity searching, and most users employ BLAST at the NCBI site (although BLAST is available at a number of other bioinformatics sites). When aligning multiple sequences, however, many options are available.

For years, most researchers used ClustalW for multiple sequence alignment, and the program worked well with good speed. However, a number of newer multiple alignment programs are both more accurate and faster than ClustalW, including its direct replacement, Clustal Omega, a greatly improved update to ClustalW.

You can choose from several multiple alignment tools at the European Bioinformatics Institute's (EBI) website, at the Max-Planck Institute's Bioinformatics Toolkit website (click on the "Alignment" tab at the top of the page), or at the online programs page of Phylogeny.fr. The Max-Planck site has a good selection of multiple alignment tools, along with a number of other bioinformatics tools, but you cannot launch your multiple alignments directly in Jalview upon completion of the alignment. You can, however, save your file and use it in other applications, such as phylogenetic analysis programs.

The EBI and Phylogeny.fr servers allow you to open your multiple alignment in Jalview (a Java-based multiple alignment editor and viewer), which is useful for visualization and manual editing of multiple alignments.

The sequence alignment tools at EBI are located under either the "Proteins" or the "DNA & RNA" sections on the EBI's Data Resources and Tools page:

https://www.ebi.ac.uk/services/data-resources-and-tools

Or just do a search for "multiple" in the search box at the top of this page.

Useful MSA tools here are Clustal Omega, Kalign, Lalign, MAFFT, MUSCLE, T-Coffee, and UniProt Align.

Let's try an example using MUSCLE.
MUSCLE performs well for most protein alignments, and its results typically need less manual editing than those from some other programs.

Find and click on the MUSCLE link.

You can paste your sequences in FASTA format in the box, or upload your .fasta file. FASTA is the most common file format for most sequences retrieved from sources like the NCBI, UCSC Genome Browser, Ensembl, etc., but there are several other file formats you can use. You can change the output format and request to have an output tree made from your sequences.
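Before pasting or uploading sequences, it can help to confirm that the file really is valid FASTA. The following is a minimal sketch, not part of the EBI tools: `parse_fasta` is a hypothetical helper name, and the two example sequences are invented for illustration. It assumes each record starts with a `>` header line and that the first word of the header is the identifier.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {identifier: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("header line has no identifier")
            # The identifier is the first whitespace-delimited token.
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines may be wrapped, so join them per record.
            records[current_id] += line.upper()
    return records


# Invented example input (not real database entries).
fasta_text = "\n".join([
    ">seq1 hypothetical protein A",
    "MKTAYIAKQR",
    "QISFVKSHFS",
    ">seq2 hypothetical protein B",
    "MKTAYIAKQRQISF",
])

fasta_records = parse_fasta(fasta_text.splitlines())
for name, seq in fasta_records.items():
    print(name, len(seq))
```

A check like this catches missing headers and duplicate identifiers, two common reasons an alignment server may reject or misread an input file.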

Note: Even though ClustalW is no longer a preferred multiple sequence alignment tool, the ClustalW output format is a good format for results. Many phylogenetic programs accept this output file type.
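If you want to process a ClustalW-format result yourself, the layout is simple enough to read with a short script. This is a rough sketch under stated assumptions: `parse_clustal` is a hypothetical helper, the alignment text is invented, and the parser assumes the first line begins with "CLUSTAL" or "MUSCLE", that each sequence line is a name followed by an alignment chunk, and that the conservation line under each block starts with whitespace. For real work, an established library parser is a safer choice.

```python
from typing import Dict


def parse_clustal(text: str) -> Dict[str, str]:
    """Return {name: aligned sequence} from ClustalW-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith(("CLUSTAL", "MUSCLE")):
        raise ValueError("missing CLUSTAL-style header line")
    aligned: Dict[str, str] = {}
    for line in lines[1:]:
        # Skip blank lines and the conservation line (starts with spaces).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"unexpected line: {line!r}")
        name, chunk = fields[0], fields[1]  # any trailing residue count is ignored
        # Each block carries the next chunk of every sequence.
        aligned[name] = aligned.get(name, "") + chunk
    return aligned


# Invented two-block alignment for illustration.
clustal_text = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "seq1      MKTAYIAKQR",
    "seq2      MKTA-IAKQR",
    "          **** *****",
    "",
    "seq1      QISF",
    "seq2      QLSF",
    "          *:**",
])

clustal_alignment = parse_clustal(clustal_text)
for name, seq in clustal_alignment.items():
    print(name, seq)
```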

Most results are returned quickly using the interactive format, unless you have many sequences of great length. If you have a larger job, click the checkbox to be notified by email when the job is done. Click the Submit button.

Example: MUSCLE Output

Your results will be returned with a number of buttons across the top to access various output formats:

Alignment (this is the default output view that you'll see when your job is done)
Result Summary
Guide Tree
Phylogenetic Tree (if you requested it)
Submission Details
Download Alignment File
Show Colors
View Result with Jalview
Send to Simple Phylogeny
Send to MView

Click on the Jalview button. This may require you to allow Java permission to access the file (check the downloaded file's permissions). You'll need a current version of Java installed on your computer. After launching Jalview, you'll see your sequences aligned in a graphic format and a number of options at the top to change the formatting.

There are many ways you can annotate and format the Jalview display. Try changing the color to Clustal to see highly-conserved sites.
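To make the idea of a highly-conserved site concrete, here is a small sketch that scores each alignment column by the fraction of sequences sharing the most common residue. This is not the rule Jalview uses for its Clustal coloring; it is a deliberately simple measure, and `column_conservation` and the three aligned sequences are invented for illustration.

```python
from collections import Counter
from typing import List, Sequence


def column_conservation(aligned: Sequence[str]) -> List[float]:
    """Fraction of sequences sharing the most common residue, per column."""
    if not aligned:
        return []
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores: List[float] = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]  # gaps do not count as residues
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Divide by the number of sequences so gaps lower the score.
        scores.append(top_count / len(column))
    return scores


# Invented aligned sequences, all the same length.
msa_rows = ["MKTAYIAK", "MKTA-IAR", "MRTAYLAK"]
conservation = column_conservation(msa_rows)
fully_conserved = [i for i, score in enumerate(conservation) if score == 1.0]
print(fully_conserved)
```

Columns listed in `fully_conserved` are those where every sequence has the same residue, which is roughly what stands out visually when a conservation-based color scheme is applied.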

Comments on Other Multiple Alignment Programs

COBALT - from the NCBI BLAST platform - pairwise construction of multiple alignments of protein sequences. The benefit of COBALT is that you can take your BLAST results directly into the tool without leaving the NCBI site.
MultAlin - while many authors used this for DNA sequence multiple alignment in the past, it is not as fast or accurate as more modern multiple alignment programs.
ProbCons - best used at the Phylogeny.fr platform, so you can use Jalview to edit and view alignments or take results directly into a tree-building tool. ProbCons performs quite well for proteins and is a little-known aligner. It is especially good for aligning sequences with low identity for phylogenetic analysis.
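When deciding whether your sequences count as "low identity", a quick percent-identity calculation on an existing alignment can help. The sketch below is one simple convention among several: it counts only columns where both sequences have a residue (no gap), and `percent_identity` and the example pair are invented for illustration. Other tools may use a different denominator, so treat the number as a rough guide.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Invented aligned pair: one gap column and one mismatch.
identity = percent_identity("MKTAYIAKQRQISF", "MKTA-IAKQRQLSF")
print(f"{identity:.1f}% identity")
```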

Northwestern University Feinberg School of Medicine
