
Sequence Similarity Searching

A guide to sequence similarity searching using BLAST and other tools.

Issues with Nucleotide Searching

Whenever possible, it's usually best to BLAST with amino acid sequences using BLASTp (protein BLAST). There are two reasons for this:

Unequal substitution rates. BLASTn for nucleotide sequences treats all base substitutions as equal, but they do not occur at equal rates. Transition mutations (purine to purine or pyrimidine to pyrimidine) occur approximately 1.5-5 times as often as transversion mutations (purine to pyrimidine or vice versa) in all genomes where this has been measured (see Wakeley, Mol Biol Evol 11(3):436-42, 1994).
Code degeneracy. Most amino acids are encoded by more than one codon (e.g., serine is encoded by UCU, UCC, UCA, UCG, AGU, and AGC). As a result, two nucleotide sequences that encode the same protein can differ considerably, so a nucleotide search may miss similarity that is clear at the protein level.
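
Both points can be illustrated with a small Python sketch. It is not part of BLAST: the function names, the partial codon table, and the two example sequences are made up for illustration. It classifies a substitution as a transition or a transversion, and it compares two DNA sequences that encode the same short peptide using different codons.

```python
# Illustrative sketch only; names and sequences are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'match', 'transition', or 'transversion' for two DNA bases."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "match"
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Partial standard codon table (DNA alphabet): only the codons used below.
CODON_TABLE = {
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "CTG": "L", "TTA": "L",
    "CGT": "R", "AGA": "R",
}


def translate(dna):
    """Translate a DNA string, codon by codon, using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(x, y):
    """Percentage of identical positions in two equal-length sequences."""
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return 100.0 * matches / len(x)


# Two made-up sequences that both encode Ser-Leu-Arg with different codons.
seq_a = "TCTCTGCGT"
seq_b = "AGCTTAAGA"

if __name__ == "__main__":
    print(classify_substitution("A", "G"))  # purine to purine
    print(classify_substitution("A", "C"))  # purine to pyrimidine
    print(translate(seq_a), translate(seq_b))
    print(round(percent_identity(seq_a, seq_b), 1))
```

The two sequences translate to the same peptide but share only 2 of 9 nucleotide positions, which is the kind of divergence a protein-level search is not affected by.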

However, it's still useful to run BLAST on nucleotide sequences. Treat it like an experiment: try blastn, megablast, or discontiguous megablast within the nucleotide BLAST page, and also try blastx or tblastx (translated BLAST).
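
For readers who prefer the command line to the web page, the same experiment can be sketched with the NCBI BLAST+ programs. The Python sketch below only builds the command lines and does not run them. It assumes BLAST+ is installed and that local databases named "nt" (nucleotide) and "nr" (protein) are available. The function name, the query file name, and the output file names are hypothetical.

```python
# Sketch under stated assumptions: builds BLAST+ command lines, runs nothing.
def build_blast_commands(query_fasta, nucl_db="nt", prot_db="nr"):
    """Return a dict mapping a search label to a BLAST+ argument list."""
    searches = {
        # The three nucleotide-vs-nucleotide modes are tasks of blastn.
        "blastn": ["blastn", "-task", "blastn", "-db", nucl_db],
        "megablast": ["blastn", "-task", "megablast", "-db", nucl_db],
        "dc-megablast": ["blastn", "-task", "dc-megablast", "-db", nucl_db],
        # blastx: translated nucleotide query against a protein database.
        "blastx": ["blastx", "-db", prot_db],
        # tblastx: translated query against a translated nucleotide database.
        "tblastx": ["tblastx", "-db", nucl_db],
    }
    commands = {}
    for label, base in searches.items():
        # -outfmt 6 requests tabular output; one output file per search.
        commands[label] = base + [
            "-query", query_fasta,
            "-out", f"{label}.tsv",
            "-outfmt", "6",
        ]
    return commands


if __name__ == "__main__":
    # "my_gene.fasta" is a placeholder file name.
    for label, cmd in build_blast_commands("my_gene.fasta").items():
        print(label, "->", " ".join(cmd))
```

Each argument list could be passed to subprocess.run once the databases are in place. Comparing the hits across the five searches shows how much the choice of program affects what is found.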