Gold Standard of Protein Expression in Yeast

Smriti Ramakrishnan
Computer Science Department, University of Texas at Austin, Austin, USA
Christine Vogel
Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, USA
Email: smriti at cs utexas edu, cvogel at mail utexas edu

Protein expression is one of the most important variables describing the 'dimensions' of cellular systems. In contrast to the plethora of mRNA expression data, only a limited number of protein expression datasets exist for most organisms; yeast is the exception. For wild-type yeast grown in rich medium to log phase, a number of proteomics datasets are available that define expression with respect to protein presence and/or protein concentration.

We assembled these datasets to derive what we call a 'gold standard' of protein expression in yeast, although in reality it may be closer to a silver or bronze standard.

Published datasets on protein expression

We combined publicly available MS-based and non-MS-based experimental datasets into a reference set for the presence (expression) of proteins in wild-type yeast grown in rich medium to log phase. The file names list the first (and last) author, journal and publication year.

Individual datasets

MS-based datasets

1. de Godoy LM, Olsen JV, de Souza GA, Li G, Mortensen P, et al. (2006), Genome Biol 7: R50. File
2. Washburn MP, Wolters D, Yates JR, 3rd (2001), Nat Biotechnol 19: 242-247. File
3. Peng J, Elias JE, Thoreen CC, Licklider LJ, Gygi SP (2003), J Proteome Res 2: 43-50. File
4. Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JE, et al. (2007), Proc Natl Acad Sci U S A 104: 2193-2198. File

The dataset Data_02, described below, is a fifth list of proteins detected in yeast by mass spectrometry. Data_02 was collected in-house.

Non-MS-based datasets

1. Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, et al. (2006), Nature 441: 840-846. File
2. Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, et al. (2003), Nature 425: 737-741. File
3. Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI (1999), Mol Cell Biol 19: 7357-7368. File


Assembled by Smriti Ramakrishnan.
  1. All proteins in the yeast genome considered:
    6330 proteins.
  2. Proteins are considered present in yeast cytosol if they are in at least two of the four MS-based datasets (excluding the yeast Orbitrap data):
    1648 proteins.
  3. Proteins are considered present in yeast cytosol if they are in at least two of the five MS-based datasets (including the yeast Orbitrap data):
    2060 proteins.
  4. Proteins are considered present in yeast cytosol if they are in any of the three non-MS-based datasets:
    4097 proteins.
  5. Proteins are considered present in yeast cytosol if they are in either set (2) or set (4) or both:
    4265 proteins.
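
The counting rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to assemble the lists: the function names `present_in_at_least` and `gold_standard` and the toy protein identifiers `P1` to `P5` are hypothetical, and each dataset is assumed to have already been reduced to a set of protein identifiers.

```python
from collections import Counter


def present_in_at_least(datasets, k):
    """Return the identifiers that occur in at least k of the given datasets."""
    # set(ds) ensures a protein is counted at most once per dataset.
    counts = Counter(p for ds in datasets for p in set(ds))
    return {p for p, n in counts.items() if n >= k}


def gold_standard(ms_sets, non_ms_sets, min_ms=2):
    """Combine MS-based and non-MS-based evidence as in rules (2), (4) and (5)."""
    ms_present = present_in_at_least(ms_sets, min_ms)  # rule (2): >= 2 MS datasets
    non_ms_present = set().union(*non_ms_sets)         # rule (4): any non-MS dataset
    return ms_present | non_ms_present                 # rule (5): either or both


# Toy input (hypothetical identifiers, not real data).
ms = [{"P1", "P2"}, {"P2", "P3"}, {"P3"}, {"P4"}]
non_ms = [{"P5"}, {"P1"}, set()]

print(sorted(present_in_at_least(ms, 2)))  # proteins seen in >= 2 MS datasets
print(sorted(gold_standard(ms, non_ms)))   # combined reference set
```

Passing five MS-based sets instead of four to `present_in_at_least` would correspond to rule (3), which includes the Orbitrap data.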

Details on Mass Spectrometry dataset: Yeast - Orbitrap - Wild-type grown in rich medium (YPD)

See MS Data Repository, Dataset Data_02.

LC/LC-MS/MS data were collected on a Thermo LTQ Orbitrap, using a total of 8 injections with varying conditions. The sample was prepared from yeast cellular extract, as described in the standard protocols in the MS Data Repository and here.

Primary authors of this dataset are: Dr. John Prince (Boulder, Colorado) and Dr. Zhihua Li (Austin, Texas). Secondary analysis of the MS/MS data was also performed by: Dr. Christine Vogel (Austin, Texas).


