Stockholm Bioinformatics Center, SBC
Lecture notes: Structural biochemistry and bioinformatics 2001
Lecture 23 Nov 2001
Per Kraulis
4. Analysing a genome
So, what does one do with a complete genome? After all, a sequenced
genome consists only of so many bases in a defined order. Analysis is
necessary to extract biologically interesting information from it.
The analysis of a genome covers many different aspects. The most
common ones are listed below, but entirely novel ways of analysing a
complete genome can clearly still be invented. The potential for
interesting discoveries in complete genomes is great; we have probably
only scratched the surface so far.
- Define the location of genes (coding sequences, regulatory
regions): gene prediction (identification).
  - Ab initio gene prediction, using software based on rules and
    patterns: find Open Reading Frames (ORFs) that satisfy some
    additional criteria. Fairly simple for bacteria; very difficult
    for eukaryotes, whose genes are interrupted by introns.
  - Gene identification through alignment with known proteins and
    EST sequences.
  - Gene prediction through similarity with proteins or ESTs from
    other organisms.
  - Gene prediction through comparison with other genomes; conserved
    regions may be coding or regulatory regions. Synteny (conservation
    of gene order between genomes) helps locate corresponding regions.
- Annotation of the genes: identify them with known genes, or by
similarity with genes in other organisms. Essentially: labelling each
gene.
- Functional classification: assignment to broad functional
categories, such as 'ribosomal proteins', 'nucleotide metabolism',
'signal transduction'.
- Metabolic pathways.
  - Are any common pathways missing?
  - Are there 'gaps' (missing enzymes) in some pathways?
  - Compare the identified pathways with the lifestyle of the
    organism.
- Evolutionary history.
  - Internal genome duplications can sometimes be detected.
  - Gene decay can sometimes be characterized: genes that are on
    their 'way out', either after a duplication or because the
    lifestyle of the organism has changed.
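The ab initio approach in the list above (find ORFs, then apply additional criteria) can be illustrated with a minimal Python sketch. It assumes the only extra criterion is a minimum ORF length (the default `min_codons=100` is an arbitrary choice), and for each stop codon it reports only the ORF starting at the first in-frame ATG after the previous stop. Real bacterial gene finders add statistical criteria such as codon usage. The function names are hypothetical.

```python
# Minimal ab initio ORF scan (sketch): ATG start to the next in-frame
# stop codon, in all three frames on both strands.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, start, end) for ORFs of at least min_codons codons.

    Coordinates are 0-based and end-exclusive, and refer to the string
    actually scanned (for strand '-', the reverse complement).
    The stop codon is included in the ORF and in the codon count.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the current open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # the ORF is closed; look for a new ATG
    return orfs
```

A hit on strand '-' with coordinates (start, end) on the reverse complement corresponds to the forward-strand interval (len(seq) - end, len(seq) - start).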
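Gene identification and annotation by similarity, as listed above, rest on sequence alignment. As a minimal sketch, the code below scores local alignments with the Smith-Waterman algorithm and labels a query with the best-scoring known sequence. The scoring scheme (match +2, mismatch -1, linear gap -2) and the threshold `min_score=20` are illustrative assumptions; real annotation uses substitution matrices such as BLOSUM62 and fast heuristic tools such as BLAST or FASTA. The names `annotate`, `protein_A` and `protein_B` are hypothetical.

```python
from typing import Dict, Optional


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def annotate(query: str, known: Dict[str, str],
             min_score: int = 20) -> Optional[str]:
    """Return the name of the best-scoring known sequence, or None."""
    best_name, best_score = None, 0
    for name, seq in known.items():
        score = smith_waterman_score(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


# Hypothetical toy "database" of known protein sequences.
KNOWN = {"protein_A": "MKTAYIAKQR", "protein_B": "GGGGGGGGGG"}
```

The dynamic programming takes time proportional to the product of the two sequence lengths, which is why searches of whole genomes against large databases rely on heuristics.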
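The pathway questions in the list above ('Are any common pathways missing?', 'Are there gaps?') reduce, in the simplest case, to set comparisons between the enzymes a pathway requires and the enzymes found by annotation. The sketch below assumes each pathway is given as a plain set of enzyme names. The glycolysis set is a textbook list of glycolytic enzymes, `pathway_X` is a hypothetical placeholder, and `pathway_report` is an illustrative name, not a tool from these notes.

```python
from typing import Dict, List, Set, Tuple

# Toy pathway definitions as sets of enzyme names (assumption: real
# analyses would use curated pathway databases and EC numbers).
PATHWAYS: Dict[str, Set[str]] = {
    "glycolysis": {
        "hexokinase", "phosphoglucose isomerase", "phosphofructokinase",
        "aldolase", "triose phosphate isomerase",
        "glyceraldehyde-3-phosphate dehydrogenase",
        "phosphoglycerate kinase", "phosphoglycerate mutase",
        "enolase", "pyruvate kinase",
    },
    "pathway_X": {"enzyme_X1", "enzyme_X2"},  # hypothetical placeholder
}


def pathway_report(pathways: Dict[str, Set[str]],
                   annotated: Set[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Classify each pathway as 'complete', 'gapped' or 'absent'.

    Returns, per pathway, the status and the sorted list of enzymes
    that were not found among the annotated genes.
    """
    report = {}
    for name, enzymes in pathways.items():
        found = enzymes & annotated
        missing = enzymes - annotated
        if not found:
            status = "absent"    # no enzyme of the pathway was annotated
        elif missing:
            status = "gapped"    # some, but not all, enzymes are present
        else:
            status = "complete"
        report[name] = (status, sorted(missing))
    return report
```

A 'gap' found this way only means that no annotated gene matched; the step may be carried out by an unrecognized or non-homologous enzyme, which is one reason to compare the result with the lifestyle of the organism.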
Copyright © 2001
Per Kraulis
$Date: 2001/11/19 13:49:33 $