Lecture 26 Jan 2001 Per Kraulis
The recent complete genome projects, and the analyses and comparisons built on their results, have driven home the idea that the completeness of the data gives an entirely new dimension to biological science. New types of analysis can be made, and new kinds of experiments can be designed, that simply could not be done before.
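One concrete consequence of completeness is that absence becomes informative. If a genome sequence is complete, a gene that cannot be found in it is genuinely missing from the organism. If the sequence is partial, the gene may simply not have been sequenced yet. The following is a minimal Python sketch of that reasoning. It assumes each genome has already been reduced to a set of gene labels. The function name `genes_absent` and the example labels are hypothetical and used only for illustration.

```python
from typing import Optional, Set


def genes_absent(query: Set[str], reference: Set[str],
                 reference_complete: bool) -> Optional[Set[str]]:
    """Return the genes in `query` that are absent from `reference`.

    Absence can only be concluded when the reference genome is complete.
    For an incomplete reference, None is returned, because a "missing"
    gene might just lie in an unsequenced region.
    """
    if not reference_complete:
        return None  # no valid conclusion about absence from partial data
    return query - reference  # set difference: present in query, not in reference


# Hypothetical gene labels for two organisms (illustrative only).
organism_a = {"geneX", "geneY", "geneZ"}
organism_b = {"geneX", "geneZ"}

print(genes_absent(organism_a, organism_b, reference_complete=True))   # {'geneY'}
print(genes_absent(organism_a, organism_b, reference_complete=False))  # None
```

The comparison itself is a simple set difference. What completeness adds is the right to interpret an empty match as a real biological absence rather than a gap in the data.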
The realization that the completeness of a data set can be extremely useful has made the biological scientific community look into the possibilities of obtaining other such complete data sets. It has even spawned a number of new buzzwords, some of which are less elegant than others:
Data set | Description | Time dependence | Manipulation in the cell | Measurement in the cell
---|---|---|---|---
Genome | The complete DNA sequence of the organism. | First approximation: none. | Genetic manipulation: site-directed mutagenesis, genetic screens. | Routine. High accuracy and completeness.
Transcriptome | The complete set of mRNA transcripts in a cell or tissue of the organism. | Reflects the state of gene expression. | Indirectly, via genetic manipulation. | Routine, or nearly so: cDNA microarrays, SAGE, Northern blots. Quantification inaccurate. Approximate completeness.
Proteome | The complete set of proteins present in a cell or tissue, including variants due to covalent modifications. Sometimes also the associations with other molecular components (complexes). | Responsive to all manner of influences throughout the life cycle. | Indirectly, through genetic manipulation. Small molecules as inhibitors of enzymes. | Not routine: 2D gels with mass spectrometry, fusions with reporter proteins (GFP), the yeast two-hybrid system. Quantification very difficult. Completeness far away, and very difficult.
Metabolome | All metabolites and their concentrations, where metabolites are defined as small organic molecules (not protein, RNA or DNA). | Responsive to all manner of influences throughout the life cycle. | Difficult. | Not routine: NMR, mass spectrometry, a few other special methods for specific molecules. Quantification and resolution difficult. Completeness far away, but maybe reachable.
Of these four kinds of data sets, only the genome and the transcriptome can be measured routinely today. The proteome and the metabolome are much more difficult: where they can be measured at all, doing so requires specialist expertise and well-behaved experimental systems. There is a severe lack of good experimental strategies for these problems, and the interest in developing new methods is correspondingly great.
The future development of biology will to a large extent be based on new contributions to the characterization and analysis of these data sets. There is also a vast potential for new applications and clever experimental designs that use the already established techniques. Finally, there is the challenge of synthesizing an overall picture from the mass of data, and it is up to bioinformatics to meet it.