Stockholm Bioinformatics Center, SBC
Lecture notes: Structural biochemistry and bioinformatics 2001

Lecture 13 Nov 2001, Per Kraulis

Hidden Markov models

7. PSI-BLAST

PSI-BLAST (Altschul et al. 1997) is a program that searches for homologs of a probe sequence (usually a protein) and uses a profile strategy to achieve high sensitivity. The name stands for Position-Specific Iterated BLAST.

The basic strategy underlying PSI-BLAST is as follows. Starting from the probe sequence, an ordinary BLAST search of the database is performed. The hits that are found are used to set up a sequence profile, which is then used to search the database again for similar sequences; this cycle can be repeated, which is the "iterated" part of the name. The idea is that instead of having to build a profile (HMM or other) oneself, the program builds it automatically from the single sequence one gives it.
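The search-then-rebuild cycle described above can be sketched in a few lines of Python. This is only a toy illustration, not the PSI-BLAST algorithm itself: it assumes that all sequences are ungapped and of equal length, it uses a uniform background distribution and a fixed pseudocount, and it accepts hits by a raw score threshold rather than a statistical significance measure. All names (build_profile, score, iterated_search) and all parameter values are invented for this example.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(seqs, pseudocount=0.1):
    """Column-wise log-odds scores (uniform background) from aligned,
    equal-length sequences. Toy stand-in for a real sequence profile."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for i in range(len(seqs[0])):
        counts = {a: pseudocount for a in ALPHABET}
        for s in seqs:
            counts[s[i]] += 1.0
        total = sum(counts.values())
        profile.append(
            {a: math.log2(counts[a] / total / background) for a in ALPHABET}
        )
    return profile


def score(profile, seq):
    """Sum of the per-column profile scores for an ungapped sequence."""
    return sum(column[residue] for column, residue in zip(profile, seq))


def iterated_search(probe, database, threshold, max_rounds=5):
    """Search, rebuild the profile from the hits, and search again,
    until the set of hits stops changing or max_rounds is reached."""
    hits = {probe}
    for _ in range(max_rounds):
        profile = build_profile(sorted(hits))
        new_hits = {probe} | {s for s in database if score(profile, s) >= threshold}
        if new_hits == hits:
            break  # converged: no change in the hit set
        hits = new_hits
    return hits


# Hypothetical toy data: a probe, a close homolog, a more distant
# homolog that shares residues with the close one, and an unrelated sequence.
probe = "ACDEFGHI"
database = ["ACDEFGKL", "ACDQWGKL", "WWWWWWWW"]

single_pass = iterated_search(probe, database, threshold=12.0, max_rounds=1)
iterated = iterated_search(probe, database, threshold=12.0)
```

In this toy data set the distant sequence scores below the threshold against the probe alone, but it is picked up once the close homolog has been folded into the profile. The same mechanism is what makes the iterated search more sensitive, and also what lets a profile drift if a wrong hit is accepted.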

PSI-BLAST is considerably more sensitive than the ordinary BLAST program, but it takes more CPU time, since every iteration is another search of the database. To a certain extent it competes with the HMM-based approaches.

PSI-BLAST has some important drawbacks. If the probe sequence contains a strongly conserved domain, the profile may become weighted towards that domain and away from the rest of the sequence as the iterations proceed. There is then a danger of missing weak homologies to parts of the sequence outside the strongly conserved domain. Of course, this may happen with other methods as well.

A particular problem is posed by the so-called low-complexity regions in proteins (typically regions rich in Gln, Ser, Thr or Pro). These regions should be masked out before the search; otherwise the search will most likely just pick up other proteins that contain the same kind of sequence, which is almost certainly not what one wants.
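To make the idea of masking concrete, here is a minimal Python sketch that replaces low-complexity stretches with X before a search. It is an assumption-laden illustration, not the filter used by BLAST: it simply computes the Shannon entropy of the residue composition in a sliding window and masks every window that falls below a cutoff. The function names, the window length of 12 and the cutoff of 2.2 bits are arbitrary choices made for this example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=2.2):
    """Replace every residue that lies in a low-entropy window with 'X'.
    The window length and cutoff are illustrative values only."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


# Hypothetical example: a Gln-rich stretch between two ordinary segments.
example = "MKVLADERTGHW" + "Q" * 15 + "YFNCIPSMKVLA"
masked_example = mask_low_complexity(example)

# A sequence of 20 different residues has no low-complexity window.
diverse = "MKVLADERTGHWYFNCIPSQ"
masked_diverse = mask_low_complexity(diverse)
```

With these settings the whole Gln run is masked, together with a few flanking residues whose windows are still dominated by Gln, while the diverse sequence is left untouched. A stricter or looser cutoff shifts that boundary, so in practice the parameters would have to be tuned.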


Copyright © 2001 Per Kraulis $Date: 2001/11/09 15:34:26 $