Sequence alignment

5. Multiple sequence alignment: why?

Pairwise alignments are fundamental and useful, but they have limitations. For instance, when using one of the popular sequence searching programs (FASTA, BLAST), which perform pairwise alignments to find similar sequences in a database, one very often obtains many sequences that are significantly similar to the query sequence. Comparing each and every sequence to every other may be possible when one has just a few sequences, but it quickly becomes impractical as the number of sequences increases: n sequences give n(n-1)/2 distinct pairs, so 10 sequences already require 45 pairwise alignments and 100 sequences require 4950.

What we need is multiple sequence alignment, where all similar sequences can be compared in a single figure or table. The basic idea is that the sequences are aligned on top of each other, so that a coordinate system is set up, where each row is the sequence for one protein, and each column is the 'same' position in each sequence. Each column corresponds to a specific residue in the 'prototypical' protein.
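The row-and-column coordinate system can be illustrated with a minimal Python sketch. The three sequences and their names (seqA, seqB, seqC) are made up for illustration and were aligned by hand; they are not taken from any real protein family.

```python
# Hypothetical toy alignment: three invented sequences, aligned by hand.
# Each row is one sequence; '-' marks a gap.
alignment = {
    "seqA": "MKT-LLV",
    "seqB": "MKTALLV",
    "seqC": "MR--LLI",
}

def columns(aln):
    """Return the alignment columns as a list of strings, one per position."""
    rows = list(aln.values())
    # All rows of a multiple alignment must have the same length.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    # zip(*rows) walks the rows in parallel, yielding one column at a time.
    return ["".join(col) for col in zip(*rows)]

cols = columns(alignment)
for position, col in enumerate(cols, start=1):
    print(position, col)
```

Reading down the printed list, each line is one column of the alignment, that is, the residues (or gaps) that all sequences carry at the 'same' position.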

As with pairwise alignment, there will be gaps in some sequences, most often shown by the dash '-' or dot '.' character. Note that to construct a multiple alignment, one may have to introduce gaps in sequences at positions where there were no gaps in the corresponding pairwise alignment, because a gap needed to accommodate one sequence must be carried through all the other rows. This means that the rows of a multiple alignment typically contain more gaps than the same two sequences would in a pairwise alignment of their own.
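The effect on gap counts can be seen in a small Python sketch. It takes two rows out of a made-up multiple alignment and drops every column in which both rows have a gap, which gives the pairwise alignment implied by the multiple alignment (not necessarily the optimal pairwise alignment). The sequences and the helper names project_pair and count_gaps are hypothetical, chosen only for this illustration.

```python
GAP_CHARS = "-."

def project_pair(row_a, row_b, gap_chars=GAP_CHARS):
    """Restrict two rows of a multiple alignment to a pairwise alignment.

    Columns where both rows have a gap carry no information for this pair,
    so they are dropped.
    """
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a in gap_chars and b in gap_chars)]
    new_a = "".join(a for a, _ in kept)
    new_b = "".join(b for _, b in kept)
    return new_a, new_b

def count_gaps(row, gap_chars=GAP_CHARS):
    """Count gap characters in one aligned row."""
    return sum(ch in gap_chars for ch in row)

# Two rows from a hypothetical three-sequence alignment. The gap in
# column 4 exists only because a third sequence has an extra residue there.
row_a = "MKT-LLV"
row_c = "MR--LLI"

pair_a, pair_c = project_pair(row_a, row_c)

gaps_in_msa = count_gaps(row_a) + count_gaps(row_c)
gaps_in_pair = count_gaps(pair_a) + count_gaps(pair_c)
print(pair_a, pair_c, gaps_in_msa, gaps_in_pair)
```

In this toy case the two rows carry three gap characters inside the multiple alignment but only one once the shared gap column is removed.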