The genomes

4. Analysing a genome

So, what does one do with a complete genome? A sequenced genome is, after all, nothing more than a long string of bases in a defined order. It must be analysed before it yields any biologically interesting information.

The analysis of a genome covers many different aspects. The most common ones are listed below, but entirely new ways of analysing a complete genome can certainly still be invented. The potential for interesting discoveries in complete genomes is great; we have probably only scratched the surface so far.

Locate the genes (coding sequences and regulatory regions): gene prediction, or gene identification.
- Ab initio gene prediction, using software based on rules and patterns: find Open Reading Frames (ORFs) that also satisfy some additional criteria. This is fairly simple for bacteria, but very difficult for eukaryotes, whose coding sequences are interrupted by introns.
- Gene identification through alignment with known proteins and EST (expressed sequence tag) sequences of the same organism.
- Gene prediction through similarity with proteins or ESTs in other organisms.
- Gene prediction through comparison with other genomes: conserved regions may be coding or regulatory regions. Conservation of gene order between related genomes (synteny) can support the predictions.
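The ab initio ORF step can be sketched as a scan of all six reading frames (three on each strand). This is a minimal sketch, not a real gene finder: it assumes a clean uppercase A/C/G/T sequence, ATG as the only start codon, the standard stop codons TAA, TAG and TGA, and a hypothetical length cut-off `min_codons`. The "additional criteria" used by real bacterial gene finders, such as ribosome-binding sites and codon usage, are left out.

```python
# Minimal six-frame ORF scan (a sketch, not a full gene finder).
# Assumptions: uppercase A/C/G/T input, ATG as the only start codon,
# standard stop codons, and a hypothetical minimum length min_codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Return (strand, start, end) for every ATG...stop ORF of at least
    min_codons codons (stop codon not counted).

    Coordinates are 0-based and end-exclusive on the strand being read;
    for '-' they index into the reverse complement of seq.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return orfs


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG", min_codons=2))
```

For example, `find_orfs("ATGAAATAG", min_codons=2)` returns `[("+", 0, 9)]`. The default of 100 codons is only an illustrative choice; some length threshold is needed because short ORFs arise frequently by chance in non-coding sequence.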
Annotation of the genes: identify each gene with a known gene, or find similarity to genes in other organisms. Essentially, this means labelling the gene.
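Labelling a gene by similarity can be sketched as transferring the name of the most similar known protein. In this sketch a crude shared-k-mer (Jaccard) score stands in for a proper alignment score, such as one from BLAST; the reference proteins, the threshold `min_score` and the names used are hypothetical, for illustration only.

```python
# Sketch: label a protein with the name of its most similar known protein.
# ASSUMPTION: a shared 3-mer Jaccard score stands in for a real alignment
# score (e.g. from BLAST); reference names and min_score are hypothetical.


def kmers(seq: str, k: int = 3) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Return |a & b| / |a | b|, or 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate(query: str, known: dict[str, str],
             min_score: float = 0.3, k: int = 3) -> str:
    """Return the label of the best-scoring known protein, or
    'hypothetical protein' if no score reaches min_score."""
    q = kmers(query, k)
    best_label, best_score = None, 0.0
    for label, ref_seq in known.items():
        score = jaccard(q, kmers(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None or best_score < min_score:
        return "hypothetical protein"
    return best_label


if __name__ == "__main__":
    known = {  # hypothetical reference proteins
        "known_protein_A": "MKVLAAGIVG",
        "known_protein_B": "MSTNPKPQRK",
    }
    print(annotate("MKVLAAGIVA", known))  # shares most 3-mers with A
    print(annotate("WWWWWWWW", known))    # shares no 3-mers with either
```

Genes without a sufficiently similar match fall back to "hypothetical protein", the label commonly used in genome annotations for genes with no recognisable homologue.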
Functional classification: assign the genes to broad functional categories, such as 'ribosomal proteins', 'nucleotide metabolism' and 'signal transduction'.
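Once the genes carry labels, a first functional classification can be sketched as keyword rules that map each annotation to a broad category, followed by a tally. The keyword table below is hypothetical and far too small for real use; curated schemes such as COG categories or Gene Ontology terms are normally used instead.

```python
from collections import Counter

# HYPOTHETICAL keyword rules mapping annotation text to broad categories.
# Rules are tried in order and the first match wins.
CATEGORY_KEYWORDS = {
    "ribosomal proteins": ["ribosomal protein"],
    "nucleotide metabolism": ["purine", "pyrimidine"],
    "signal transduction": ["histidine kinase", "response regulator"],
}


def classify(annotation: str) -> str:
    """Return the first category whose keywords occur in the annotation."""
    text = annotation.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "unclassified"


def category_counts(annotations: list[str]) -> Counter:
    """Count how many annotated genes fall into each category."""
    return Counter(classify(a) for a in annotations)


if __name__ == "__main__":
    labels = [
        "50S ribosomal protein L2",
        "sensor histidine kinase",
        "purine nucleoside phosphorylase",
        "hypothetical protein",
    ]
    print(category_counts(labels))
```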
Metabolic pathways.
- Are any common pathways missing?
- Are there 'gaps' (missing enzymes) in some pathways?
- Compare the identified pathways with the lifestyle of the organism.
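Checking for missing pathways and gaps can be sketched as set comparisons between the enzymes a pathway requires and the enzymes found among the annotated genes. The pathway lists below are a toy, abridged assumption (real analyses draw on curated databases such as KEGG or MetaCyc), and the sketch assumes annotations have already been mapped to these enzyme names. A reported gap need not mean a lost function: the step may be performed by an unrelated enzyme, or the gene may have been missed during annotation, which is one reason to compare the result with the organism's lifestyle.

```python
# Sketch of pathway gap detection. PATHWAYS is a toy, abridged set of
# enzyme lists; real analyses use curated databases such as KEGG or MetaCyc.
# Assumption: gene annotations have already been mapped to these enzyme names.
PATHWAYS = {
    "glycolysis": {
        "hexokinase", "glucose-6-phosphate isomerase", "phosphofructokinase",
        "fructose-bisphosphate aldolase", "triose-phosphate isomerase",
        "glyceraldehyde-3-phosphate dehydrogenase", "phosphoglycerate kinase",
        "phosphoglycerate mutase", "enolase", "pyruvate kinase",
    },
    "TCA cycle": {
        "citrate synthase", "aconitase", "isocitrate dehydrogenase",
        "2-oxoglutarate dehydrogenase", "succinyl-CoA synthetase",
        "succinate dehydrogenase", "fumarase", "malate dehydrogenase",
    },
}


def pathway_report(annotated: set[str]) -> dict[str, tuple[str, list[str]]]:
    """Classify each pathway as 'complete', 'gaps' or 'absent' and list
    the enzymes not found among the annotated genes."""
    report = {}
    for name, required in PATHWAYS.items():
        missing = sorted(required - annotated)
        if len(missing) == len(required):
            status = "absent"    # no enzyme of the pathway was found
        elif missing:
            status = "gaps"      # some, but not all, enzymes were found
        else:
            status = "complete"
        report[name] = (status, missing)
    return report


if __name__ == "__main__":
    found = PATHWAYS["glycolysis"] - {"pyruvate kinase"}
    for name, (status, missing) in pathway_report(found).items():
        print(name, status, missing)
```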
Evolutionary history.
- Internal genome duplications can sometimes be detected.
- Gene decay can sometimes be characterized: genes that are on their 'way out', either after a duplication or because the lifestyle of the organism has changed.
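One simple signal of gene decay can be sketched as a check for in-frame premature stop codons, or a length that is not a whole number of codons (a possible frameshift). This is a crude heuristic under stated assumptions: the input is taken to be the stretch corresponding to an intact homologue (for example the other copy after a duplication), read in frame from its first base, with the standard stop codons. Real pseudogene detection compares the sequences by alignment.

```python
# Crude sketch for flagging possible gene decay (pseudogenes).
# Assumptions: cds is the stretch corresponding to an intact homologue
# (for example its duplicate), read in frame from position 0, with the
# standard stop codons. Real analyses compare the sequences by alignment.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons before the last codon."""
    n_full = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, n_full, 3)]
    return [idx for idx, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_decayed(cds: str) -> bool:
    """Flag a gene whose length is not a whole number of codons (a possible
    frameshift) or that has an in-frame premature stop codon."""
    return len(cds) % 3 != 0 or bool(premature_stops(cds))


if __name__ == "__main__":
    intact = "ATGAAACCCTAA"
    decayed = "ATGTAACCCTAA"  # AAA -> TAA creates a premature stop
    print(looks_decayed(intact), looks_decayed(decayed))
```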