
Sequence Similarity Searching

A guide to sequence similarity searching using BLAST and other tools.

Complete Statistics of BLAST

The NCBI has a detailed description of all the statistical properties of BLAST scores.
NCBI's Statistics of Sequence Similarity Scores

This guide outlines the statistics in a BLAST report that you need to understand in order to evaluate the hits returned by a search.

What's an "E value"?

Expectation value. The number of alignments with a score equal to or better than the BLAST-calculated raw score S that are expected to occur by chance in a database search. It is an expected count rather than a probability, so it can exceed 1. The lower the E value, the more significant the score.
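
As a rough illustration of how an E value responds to the score and to the size of the search, the sketch below uses the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length and n is the total database length. The function names are hypothetical, and the lambda and K values in the example are illustrative placeholders only; BLAST prints the values it actually used at the end of each report. BLAST also adjusts m and n for edge effects, which this sketch ignores.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring >= raw_score.

    Implements E = K * m * n * exp(-lambda * S) without the
    length corrections that BLAST itself applies.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e_value):
    """Probability of at least one chance hit, P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Placeholder parameters (assumptions, not taken from a real report).
    lam, k = 0.267, 0.041
    for score in (50, 100, 150):
        e = expect_value(score, query_len=300, db_len=10**9, lam=lam, k=k)
        print(score, e, p_value(e))
```

Two things are worth noticing when you run it: E falls exponentially as the raw score rises, and E grows in direct proportion to the database size, which is why the same alignment can look less significant in a larger database.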

What's "S"?

Raw Score. The score of an alignment, S, calculated as the sum of substitution and gap scores.

Substitution scores are given by a look-up table (such as PAM or BLOSUM). Gap scores are typically calculated from two penalties: G, the gap opening penalty, and L, the gap extension penalty. A gap of length n costs G + Ln. The choice of G and L is empirical, but it is customary to choose a high value for G (10-15) and a low value for L (1-2).
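
The arithmetic above can be sketched in a few lines of code. This is a minimal illustration, not BLAST's own implementation: the function names are hypothetical, the toy match/mismatch scores stand in for a real PAM or BLOSUM table, and G = 11 and L = 1 are example values picked from the customary ranges. Gaps are written as "-" in the aligned sequences.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of one gap of `length` positions: G + L * n."""
    return gap_open + gap_extend * length


def toy_substitution_score(a, b):
    """Stand-in for a PAM/BLOSUM look-up (assumed values)."""
    return 5 if a == b else -4


def raw_score(aligned_query, aligned_subject, substitution_score,
              gap_open=11, gap_extend=1):
    """Sum substitution scores and subtract gap costs column by column."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    previous_gap_row = None  # row that held '-' in the previous column
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if q == "-" or s == "-":
            gap_row = "query" if q == "-" else "subject"
            if gap_row != previous_gap_row:
                score -= gap_open   # pay G once when a new gap starts
            score -= gap_extend     # pay L for every gapped position
            previous_gap_row = gap_row
        else:
            score += substitution_score(q, s)
            previous_gap_row = None
    return score


if __name__ == "__main__":
    # Four matches (4 * 5) minus one gap of length 2 (11 + 1 * 2).
    print(raw_score("ACGTTA", "AC--TA", toy_substitution_score))
```

Because G is paid once per gap and L once per gapped position, one long gap is cheaper than several short gaps of the same total length, which is the reason for pairing a high G with a low L.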

What's a "Bit score"?

The bit score S' is derived from the raw score S by taking into account the statistical properties of the scoring system used. Because bit scores are normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
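
The normalization can be written as S' = (lambda * S - ln K) / ln 2, where lambda and K are the statistical parameters of the scoring system, and the E value then follows from the bit score alone as E = m * n * 2^(-S'). The sketch below shows both steps. The function names are hypothetical and the lambda and K values are illustrative placeholders; use the values printed in your own BLAST report.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """E value from a bit score: E = m * n * 2 ** (-S')."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Placeholder parameters (assumptions, not taken from a real report).
    lam, k = 0.267, 0.041
    bits = bit_score(100, lam, k)
    print(bits, expect_from_bits(bits, query_len=300, db_len=10**9))
```

Once a score is expressed in bits, lambda and K no longer appear in the E value formula. That is what makes bit scores from searches run with different scoring matrices or gap costs comparable.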