Lecture 13 Nov 2001, Per Kraulis
We have already mentioned the Pfam database as a source of multiple alignments for more than 2400 protein domain families. But what really makes Pfam useful is its tight relationship with a program package that implements hidden Markov models: HMMER, which is written by Sean Eddy.
The Pfam multiple alignments have been used to build HMMs for each and every protein domain family. Using one of the programs in the HMMER package, it is possible to search for the presence of a known domain in a sequence of your own. The web sites for Pfam all have facilities to do this via the Web. This is very useful, and the web sites contain several different tools and search possibilities.
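The same kind of domain search can be run on a local machine. The following is a minimal sketch in Python, not a definitive recipe. It assumes HMMER 3 is installed, where the program that compares a sequence against a library of HMMs is called hmmscan (HMMER 2 called it hmmpfam). The file names are hypothetical placeholders, and the library is assumed to have been indexed with hmmpress beforehand.

```python
import shutil
import subprocess


def scan_command(hmm_library, query_fasta, table_out):
    """Return the argument list for scanning sequences against an HMM library.

    Usage pattern: hmmscan [options] <hmmdb> <seqfile>
    The --tblout option writes a tabular summary of the hits to a file.
    """
    return ["hmmscan", "--tblout", table_out, hmm_library, query_fasta]


def run_scan(hmm_library, query_fasta, table_out):
    """Run hmmscan if it is on the PATH; return True if it was run."""
    if shutil.which("hmmscan") is None:
        return False  # HMMER is not installed, nothing to do
    subprocess.run(scan_command(hmm_library, query_fasta, table_out), check=True)
    return True


# Hypothetical file names, used for illustration only.
command = scan_command("Pfam-A.hmm", "my_protein.fasta", "domain_hits.tbl")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect the exact command before anything is executed.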
However, if one wishes to work on a multiple alignment of one's own, and use that for searches, one must download the HMMER package and set it up, which is pretty straightforward. The HMMER package contains several different programs.
Pfam itself is built using HMMER. The seed alignment of each Pfam family is used to create an HMM, which is then used to identify all other sequences that contain the domain in question. These sequences are then used to build the full alignment automatically.
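The seed-to-full procedure, which is also the procedure one would follow with an alignment of one's own, can be sketched as two HMMER commands. This is an illustration under stated assumptions and not Pfam's actual build pipeline, which the lecture does not describe. It assumes HMMER 3 command names and uses the -A option of hmmsearch (save an alignment of all hits) as a stand-in for the full-alignment step. HMMER 2 workflows also included a calibration step that is not shown. All file names are hypothetical.

```python
import shutil
import subprocess


def seed_to_full_commands(seed_alignment, sequence_db, prefix):
    """Return the commands that go from a seed alignment to an alignment of hits.

    Step 1: hmmbuild <hmmfile_out> <msafile> builds an HMM from the seed alignment.
    Step 2: hmmsearch -A <file> <hmmfile> <seqdb> searches the database with the
            HMM and saves a multiple alignment of the hits.
    """
    hmm_file = prefix + ".hmm"
    hits_alignment = prefix + ".full.sto"
    return [
        ["hmmbuild", hmm_file, seed_alignment],
        ["hmmsearch", "-A", hits_alignment, hmm_file, sequence_db],
    ]


def run_pipeline(commands):
    """Run the commands in order if HMMER is installed; return True if run."""
    if shutil.which("hmmbuild") is None or shutil.which("hmmsearch") is None:
        return False  # HMMER is not on the PATH
    for command in commands:
        subprocess.run(command, check=True)  # stop at the first failure
    return True


# Hypothetical inputs: a seed alignment and a FASTA sequence database.
steps = seed_to_full_commands("my_domain.seed.sto", "proteins.fasta", "my_domain")
for step in steps:
    print(" ".join(step))
```

The order matters: the HMM file written by the first command is the query for the second.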
There are several subtle issues when using HMMs, such as how to handle proteins with multiple domains, what to do about proteins with long, uninteresting inserts before the proper domains appear, and how to treat partial sequences containing only fragments of domains. The HMMER package has its own particular ways of dealing with these problems, and it is necessary to read its documentation if one needs to use it extensively for some project.