GalterGuides: Sequence Similarity Searching: Issues with Nucleotide Searching

Issues with Nucleotide Searching

Whenever possible, it's usually best to BLAST with amino acid sequences using BLASTp (protein BLAST). There are two reasons for this:

1. Unequal substitution rates. BLASTn for nucleotide sequences scores all base substitutions equally, which does not reflect how sequences actually evolve. The rate of transition mutations (purine to purine or pyrimidine to pyrimidine) is approximately 1.5 to 5 times that of transversion mutations (purine to pyrimidine or vice versa) in all genomes where it has been measured (see Wakeley, Mol Biol Evol 11(3):436-42, 1994).
2. Code degeneracy. Most amino acids are coded by more than one codon (e.g., serine is coded by both UCU and AGC). Two nucleotide sequences can therefore differ considerably while encoding the same protein, so a nucleotide search may score related genes poorly or miss them altogether.
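Both points can be illustrated with a short Python sketch. It is not part of BLAST; the function names `classify_substitution` and `nucleotide_identity` are hypothetical helpers introduced here. The sketch classifies a base change as a transition or a transversion, then compares the two serine codons mentioned above position by position.

```python
# Purines and pyrimidines (T for DNA, U for RNA).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def classify_substitution(ref, alt):
    """Label a single-base change as 'match', 'transition', or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unexpected base: {base!r}")
    if ref == alt:
        return "match"
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def nucleotide_identity(seq_a, seq_b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / len(seq_a)


# Two codons that both encode serine in the standard genetic code.
codon_a, codon_b = "UCU", "AGC"

for a, b in zip(codon_a, codon_b):
    print(a, "->", b, classify_substitution(a, b))

print("nucleotide identity:", nucleotide_identity(codon_a, codon_b))
```

The two codons share no identical position even though they translate to the same amino acid, which is the kind of similarity a protein-level search keeps and a nucleotide-level search loses.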

However, it's still useful, and often necessary, to run BLAST on nucleotide sequences, especially for potentially new genes or regions found in your genome sequencing experiments. Treat it like an experiment: try blastn, megablast, or discontiguous megablast within the nucleotide BLAST page, or try blastx or tblastx (translated BLAST).
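The guide describes the NCBI web pages. If you work with a local installation of the NCBI BLAST+ command-line tools instead, the same experiment could be sketched as below. This is an assumption-laden illustration: the file name `query.fasta`, the database names `nt` and `nr`, and the helper `build_commands` are placeholders introduced here, and the script only prints the command lines rather than running them.

```python
# blastn search strategies selectable with the BLAST+ "-task" option.
BLASTN_TASKS = ["blastn", "megablast", "dc-megablast"]


def build_commands(query, nucl_db, prot_db):
    """Return one BLAST+ command (as an argument list) per search strategy."""
    commands = []
    for task in BLASTN_TASKS:
        commands.append([
            "blastn", "-task", task,
            "-query", query, "-db", nucl_db,
            "-outfmt", "6", "-out", f"{task}.tsv",
        ])
    # blastx: translated nucleotide query against a protein database.
    commands.append([
        "blastx", "-query", query, "-db", prot_db,
        "-outfmt", "6", "-out", "blastx.tsv",
    ])
    # tblastx: translated nucleotide query against a translated nucleotide database.
    commands.append([
        "tblastx", "-query", query, "-db", nucl_db,
        "-outfmt", "6", "-out", "tblastx.tsv",
    ])
    return commands


# Placeholder inputs; substitute your own query file and databases.
for cmd in build_commands("query.fasta", "nt", "nr"):
    print(" ".join(cmd))
```

Writing each search to its own tabular file (`-outfmt 6`) makes it easy to compare which strategy recovers which hits for the same query.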

Northwestern University Feinberg School of Medicine
