Most sequence similarity search tools are based on foundations created by seminal pattern matching algorithms. Some of these are described here. Even though this area of bioinformatics is older, in comparison to data analysis for sequencing experiments, these tools still hold value in determining evolutionary change and laid the foundations for genome alignment tools.
This algorithm finds the best global alignment between any 2 sequences.
In this example of a Needleman-Wunsch matrix:
Match = +1
Mismatch = -1
Gap = -1
Scores are filled in beginning in the upper left corner at the beginning of both sequences. When matrix is complete, the best value is found in the right-most column, and a path is tracked backwards to the beginning of the two sequences to find the best alignment.
Smith-Waterman is an extension of Needleman-Wunsch that compares segments of all possible lengths (thus creating local alignments) between two sequences to maximize alignment
Above is an example of a S/W matrix
Match = + 1
MisMatch = 0.3
Gap = (1 + 0.3k*)
(* where k = number of residues included in gap)
All cell values start at zero and are not allowed to fall below zero (so a new alignment path can begin at any point). Values in cells, like in Needleman-Wunsch, are based upon value of cell plus the highest value in row, column or direct diagonal using gap penalties. FASTA and BLAST are based on this type of comparison matrix.
FASTA generates local alignments. The algorithm uses lookup tables (hash tables) to increase speed. Sensitivity and speed are determined by the size of the "word" used for the initial lookup table. FASTA builds diagonals in conjunction with the results of the lookup tables.
BLAST stands for Basic Local Alignment Search Tool. It generates local alignments. It begins a search by indexing all character strings of a certain wordsize within the query sequence by their starting position in the query. BLAST then scans the target database looking for matches between the "words" indexed in the query and strings found within the database sequences. When a word match is found (two nearby words in the case of protein searches), BLAST attempts to extend both forward and backward from the match to produce an alignment. BLAST will continue this extension as long as the alignment score continues to increase or until it drops by a critical amount owing to the negative scores given by mismatches.