Copyright © 2003 Per Kraulis, Stockholm Bioinformatics Center, SBC

Classical papers in bioinformatics

Ph.D. course, Spring 2003, DBB, SU.

Per Kraulis

Course description

The subject of the course is a set of classical articles that contain fundamental contributions to bioinformatics. This includes papers describing results or novel approaches that have influenced the thinking in bioinformatics, or recent papers that are considered particularly important.

Course format

All students must attend every meeting. All students must read the given article(s) before the meeting. Two students will be responsible for presenting the articles, in order to start the discussion about them. These two students are supposed to meet and prepare the presentation together. The two students should be responsible for a subject that is different from the field of their Ph.D. projects.

All students must prepare one question about the articles for the presenting students to consider. Send this question to the course organizer (Per Kraulis, email address below) at least two days before the meeting; he will forward it to the presenting students.

The senior scientist(s) will not lead the discussion, only participate. That's why the senior scientist is called 'sponsor', not teacher. The presenting students are encouraged to contact the sponsor well in advance of the meeting to discuss the paper(s).

You will have to get copies of the articles yourself. In a few cases (some very old papers), we may provide paper copies.

New rule from 2003-04-11! A student who is unable to attend a meeting will have to write a one-page summary (A4, own words) of each article and send it to Per Kraulis. Several students appear likely to be away from some of the last sessions; to avoid having to reduce the number of credits (points) given, these students must show that they have put some effort into studying the relevant material.

Meeting date

The meetings are held bi-weekly, Thursdays, 15:00-18:00, at SBC (AlbaNova). There will be about 10 meetings during the spring of 2003.

Note: Change of date! Instead of 2003-05-22, we will meet on 2003-05-15 (Thursday, as usual).


  • Per Kraulis
  • Arne Elofsson
  • Gunnar von Heijne
  • Bengt Sennblad
  • Jens Lagergren

  • Erik Granseth, SBC, DBB, SU
  • Olivia Eriksson, SBC, DBB, SU
  • Johannes Frey-Skött, SBC, DBB, SU
  • Johan Nilsson, SBC, MBB, KI
  • Olof Emanuelsson, SBC, DBB, SU
  • Albin Sandelin, CGB, KI
  • Jesper Lundström, SBC, NADA, KTH
  • Jan Lindqvist, SBC, NADA, KTH
  • Lukas Käll, CGB, KI
  • Markus Wistrand, CGB, KI
  • Sara Light, SBC, DBB, SU
  • Örjan Svensson, SBC, NADA, KTH
  • Abhiman Saraswathi, CGB, KI
  • Karin Melen, SBC, DBB, SU
  • Martin Enge, CCK, KI


Date Article(s) Presenting students Sponsor


Linus Pauling and Emile Zuckerkandl, Chemical Paleogenetics. Molecular "Restoration Studies" of Extinct Forms of Life. Acta Chem. Scand. 17 (1963) Suppl 1, pp 9-16.

Langley, C.H. and Fitch, W.M., An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3 (1974) 161-177.

Kimura, M. Evolutionary rate at the molecular level. Nature 217 (1968) 624-626.

Sara Light

Johan Nilsson

Bengt Sennblad


Bowie JU, Luthy R, Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 1991 Jul 12;253(5016):164-70, PubMed

Jones DT, Taylor WR, Thornton JM. A new approach to protein fold recognition. Nature. 1992 Jul 2;358(6381):86-9, PubMed

Olivia Eriksson

Karin Melen

Arne Elofsson


Tsoka S, Ouzounis CA. Prediction of protein interactions: metabolic enzymes are frequently involved in gene fusion. Nat Genet. 2000 Oct;26(2):141-2, PubMed

Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D. A combined algorithm for genome-wide prediction of protein function. Nature. 1999 Nov 4;402(6757):83-6 PubMed

Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D. Detecting protein function and protein-protein interactions from genome sequences. Science. 1999 Jul 30;285(5428):751-3. PubMed

Additional papers of interest:

Doolittle RF. Do you dig my groove? Nature Genetics, 1999, 23, 6-8 PDF

Enright AJ, Ouzounis CA. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biology, 2001, 2, 0034.1-0034.7 PDF

Snel B, Bork P, Huynen M. Genome evolution: Gene fusion versus gene fission TIG, 2000, 16, 9-11 PDF

Yanai I, DeLisi C. The society of genes: networks of functional links between genes from comparative genomics Genome Biology, 2002, 3, 0064.1-0064.12 PDF

von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P. Comparative assessment of large-scale data sets of protein-protein interactions Nature, 2002, 417, 399-403. PDF

Jan Lindqvist

Jesper Lundström

Arne Elofsson


Ideker TE, Thorsson V, Karp RM. Discovery of regulatory interactions through perturbation: inference and experimental design. Pac Symp Biocomput 2000;:305-16. PDF, PubMed

Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001 May 4;292(5518):929-34. PDF, PubMed

Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 2002 Jul;18 Suppl 1:S233-40 PDF, PubMed

Lukas Käll

Albin Sandelin

Per Kraulis


Steel, M. and Penny, D., Parsimony, likelihood and the role of models in molecular phylogenetics. Molecular Biology and Evolution 17(6) 839-850. PDF

Fitch, W.M., Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Zoology 20 (1971) 406-416. (hardcopy available)

Felsenstein, J. Evolutionary Trees from DNA-Sequences - a Maximum-Likelihood Approach. J. Mol. Evol. (1981) 17:368-376. (hardcopy available)

Martin Enge

Markus Wistrand

Jens Lagergren
Bengt Sennblad


Chou PY, Fasman GD. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry. 1974 Jan 15;13(2):211-22. (hardcopy available)

Chou PY, Fasman GD. Prediction of protein conformation. Biochemistry. 1974 Jan 15;13(2):222-45. PubMed (hardcopy available)

Rost B, Sander C. Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993 Jul 20;232(2):584-99, PubMed (hardcopy available)

Abhiman Saraswathi

Erik Granseth

Gunnar von Heijne


Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443-53.

Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389-402. Review. PDF, PubMed

Örjan Svensson

Johannes Frey-Skött

Bob MacCallum


Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL. The large-scale organization of metabolic networks. Nature 2000 Oct 5;407(6804):651-4 PDF, PubMed

Barabasi AL, Albert R. Emergence of scaling in random networks. Science 1999 Oct 15;286(5439):509-12 PDF, PubMed

Olivia Eriksson

Albin Sandelin

Per Kraulis


Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., & Wootton, J. C. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 1993, 262(5131), 208-14. PubMed (hardcopy available)

Gary D. Stormo, DNA binding sites: representation and discovery Bioinformatics 2000 16: 16-23. PDF

Olof Emanuelsson

Lukas Käll

Jens Lagergren

Note: New date!


Fitch, W. M. Homology: a personal view on some of the problems. Trends in Genetics (2000) 16:227-231. (hardcopy available)

Goodman, M., Czelusniak, J., Moore, G. W., Romero-Herrera, A. E., and Matsuda, G. Fitting the gene lineage into its species lineage: A parsimony strategy illustrated by cladograms constructed from globin sequences. Syst. Zool. (1979) 28:132-168. (hardcopy available)

Ohno, S., Wolf, U. and Atkin, N.B. Evolution from fish to mammals by gene duplication. Hereditas (1968) 59:169-187. (hardcopy available)

Markus Wistrand

Karin Melen

Bengt Sennblad

Excluded topics

This is a list of the topics that were suggested, but which I decided (with some input from others) to exclude from the list. This can be used to create one or two more meetings, if needed.

X1 Orengo CA, Jones DT, Thornton JM. Protein superfamilies and domain superfolds. Nature. 1994 Dec 15;372(6507):631-4. PubMed

Park J, Karplus K, Barrett C, Hughey R, Haussler D, Hubbard T, Chothia C. Sequence Comparisons Using Multiple Sequences Detect Three Times as Many Remote Homologues as Pairwise Methods. J. Mol. Biol. (1998) 284, 1201-1210. PubMed

Arne Elofsson

X2 Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997 Oct 3;278(5335):82-7. PubMed

Arne Elofsson

X3 Chothia, C. Hydrophobic bonding and accessible surface area in proteins. Nature 1974 Mar 22;248(446):338-9

Gunnar von Heijne

X4 Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977 Dec;74(12):5463-7.

Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE. Fluorescence detection in automated DNA sequence analysis. Nature. 1986 Jun 12-18;321(6071):674-9.

Gunnar von Heijne

X5 H. Kaplan, R. Shamir, R. E. Tarjan, Faster and simpler algorithm for sorting signed permutations by reversals. SIAM J. Comput. 29:3 (1999) 880-892 PostScript

Bafna V., Pevzner P.A. Sorting by transpositions. (1998) SIAM J. Discrete Math., 11, 224-240 (preliminary version appeared in Proceedings of the Sixth Annual Symposium on Discrete Algorithms (SODA 95), San Francisco, CA, 614-623)

Jens Lagergren