
Sequence Similarity Searching

A guide to sequence similarity searching using BLAST and other tools.

Choices for Multiple Sequence Alignment

BLAST is the preferred platform for sequence similarity searching, and most users employ BLAST at the NCBI site (although BLAST is available at a number of other bioinformatics sites).  When aligning multiple sequences, however, many options are available.

For years, most researchers used ClustalW for multiple alignment of sequences, and this program worked well with good speed. However, a number of newer multiple alignment programs are both more accurate and faster than ClustalW, including its direct replacement, Clustal Omega, which is a greatly improved update to ClustalW.

You can choose from several multiple alignment tools at the European Bioinformatics Institute's (EBI) website, at the Max-Planck Institute's Bioinformatics Toolkit website (click on the "Alignment" tab at the top of the page), or at the online programs page of . The Max-Planck toolkit has a good selection of multiple alignment tools, along with a number of other bioinformatics tools, but you cannot launch your multiple alignment directly in Jalview when the alignment completes. You can, however, save your alignment file and use it in other applications, such as phylogenetic analysis programs.

EBI and servers allow you to open your multiple alignment in Jalview (a Java-based multiple alignment editor and viewer), which is useful for visualizing and manually editing multiple alignments.

The sequence alignment tools at EBI are located under either the "Proteins" or the "DNA & RNA" sections on the EBI's Services page:

Not all tools listed on these pages are for multiple alignment, but useful ones include Clustal Omega, Kalign, Lalign, MAFFT, MUSCLE, and T-Coffee. MUSCLE performs well for most protein alignments and usually requires less manual editing of the resulting alignment than some other programs.

Find and click on the MUSCLE link.

You can paste your sequences in FASTA format in the box, or upload your .fasta file.  You can change the output format and request to have an output tree made from your sequences.
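Before pasting or uploading, it can help to confirm that your file really is well-formed FASTA. The following is a minimal Python sketch, not part of the EBI service: the function names parse_fasta and check_for_alignment are hypothetical, the example sequences are made up, and the checks (at least two sequences, unique identifiers, no empty records) are assumptions about what commonly causes trouble rather than the server's own validation rules.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_for_alignment(records):
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if len(records) < 2:
        problems.append("a multiple alignment needs at least two sequences")
    seen = set()
    for header, seq in records:
        # Treat the first word of the header as the sequence identifier.
        seq_id = header.split()[0] if header else ""
        if seq_id in seen:
            problems.append(f"duplicate sequence id: {seq_id}")
        seen.add(seq_id)
        if not seq:
            problems.append(f"empty sequence: {header}")
    return problems


# Made-up example input (hypothetical sequences).
fasta_text = "\n".join([
    ">seq1 hypothetical protein A",
    "MKTAYIAKQR",
    "QISFVKSHFS",
    ">seq2 hypothetical protein B",
    "MKTAYLAKQRQISFVKSHFS",
])

fasta_records = parse_fasta(fasta_text)
print(len(fasta_records), "sequences;", check_for_alignment(fasta_records) or "no problems found")
```

If the returned list is empty, the text should be safe to paste into the input box; otherwise each entry names something to fix first.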

Note: Even though ClustalW is no longer a preferred multiple sequence alignment tool, the ClustalW output format is a good choice for results, because many phylogenetic programs accept this file type.
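To show how simple the ClustalW layout is to work with, here is a rough Python sketch of a reader for it. It assumes the usual layout (a program header on the first line, then blocks of "name, whitespace, sequence chunk" lines, with consensus lines that begin with whitespace) and ignores edge cases. The function name parse_clustal and the example alignment are hypothetical. For real work, an established parser such as Biopython's AlignIO (which reads the "clustal" format) is a safer choice.

```python
def parse_clustal(text):
    """Read a ClustalW-style alignment into a dict of {name: aligned sequence}.

    Simplified sketch: assumes a header line followed by interleaved blocks.
    """
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError("expected a program header on the first line")
    sequences = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # blank separator between blocks
        if line[0].isspace():
            continue  # consensus line made of '*', ':' and '.' symbols
        fields = line.split()
        if len(fields) < 2:
            continue  # not a "name sequence" line; skipped in this sketch
        name, chunk = fields[0], fields[1]
        # Each block contributes the next chunk of every sequence.
        sequences[name] = sequences.get(name, "") + chunk
    return sequences


# Made-up two-block alignment (hypothetical sequences).
clustal_text = "\n".join([
    "CLUSTAL O(1.2.4) multiple sequence alignment",
    "",
    "seqA      MKTAYIAKQR",
    "seqB      MKTAYLAKQR",
    "          *****:****",
    "",
    "seqA      QISFV",
    "seqB      QI-FV",
    "          ** **",
])

clustal_alignment = parse_clustal(clustal_text)
for seq_name, aligned_seq in clustal_alignment.items():
    print(seq_name, aligned_seq)
```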

Most results are returned quickly using the interactive format, unless you have many sequences of great length. If you have a larger job, click the checkbox to be notified by email when the job is done. Click the Submit button.

Example: MUSCLE Output

Your results will be returned with a number of buttons across the top to access various output formats:

  • Alignment (this is the default output view that you'll see when your job is done)
  • Result Summary
  • Guide Tree
  • Phylogenetic Tree (if you requested it)
  • Submission Details
  • Download Alignment File
  • Show Colors
  • View Result with Jalview
  • Send to Simple Phylogeny
  • Send to MView

Click on the Jalview button. This may require you to give Java permission to access the file (check the downloaded file's permissions). You will need a current version of Java installed on your computer. After launching Jalview, you'll see your sequences aligned in a graphic format, with a number of options at the top to change the formatting.

There are many ways you can annotate and format the Jalview display. Try changing the color to Clustal to see highly-conserved sites.
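As a rough illustration of what "conserved" means at the column level, the Python sketch below lists the alignment columns in which every sequence carries the same residue. This is not Jalview's Clustal coloring algorithm, which also considers residue type and conservation thresholds; it only finds fully identical, gap-free columns. The function name conserved_columns and the example sequences are hypothetical.

```python
def conserved_columns(aligned):
    """Return 0-based indices of columns where all sequences share one non-gap residue."""
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    # zip(*aligned) walks the alignment column by column.
    for index, column in enumerate(zip(*aligned)):
        residues = {residue.upper() for residue in column}
        if len(residues) == 1 and "-" not in residues:
            columns.append(index)
    return columns


# Made-up aligned sequences (hypothetical); '-' marks a gap.
example_alignment = [
    "MKTAYIAKQR",
    "MKTAYLAKQR",
    "MKT-YIAKQR",
]

identical = conserved_columns(example_alignment)
print("fully conserved columns:", identical)
```

In this example, column 3 is excluded because one sequence has a gap there, and column 5 is excluded because the residues differ (I versus L).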

Comments on Other Multiple Alignment Programs

  • COBALT  - from the NCBI BLAST platform - pairwise construction of multiple alignments of protein sequences. The benefit of COBALT is that you can take your BLAST results directly into the tool without leaving the NCBI site.
  • MultAlin - many authors used this for multiple alignment of DNA sequences in the past, but it is not as fast or accurate as more modern multiple alignment programs.
  • ProbCons - best used at the platform , so you can use Jalview to edit and view alignments or take results directly into a tree-building tool.  ProbCons performs quite well for proteins and is a little-known aligner. It is especially good for aligning sequences with low identity for phylogenetic analysis.
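
To judge whether a pair of sequences falls into the low-identity range mentioned above, you can estimate percent identity from any finished alignment. The Python sketch below uses one common definition, identical residues divided by the number of columns where neither sequence has a gap; other definitions exist and give different numbers, so treat the choice as an assumption. The function name percent_identity and the example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    # Return 0.0 when there are no comparable columns at all.
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned pair (hypothetical): 8 matches over 9 gap-free columns.
pair_identity = percent_identity("MKTAYIAKQR", "MKT-YLAKQR")
print(f"{pair_identity:.1f}% identity")
```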