Empirical Bayes Protein identification algorithm
EBP is the first generally applicable protein identification method that can combine the results of multiple search algorithms and replicated LC MS/MS experiments. EBP is compatible with the Trans-Proteomic Pipeline (TPP) from http://systemsbiology.org and can be used as an alternative to the ProteinProphet algorithm.
Price TS, Lucitt MB, Wu W, Austin DJ, Pizarro A, Yocum AK, Blair IA, FitzGerald GA, Grosser T.
Click here to download and install EBP.
EBP calculates protein expression probabilities based on peptide sequence identifications from search algorithms such as Mascot and Sequest. Protein lists can be generated by choosing proteins whose expression probabilities exceed a threshold value. Varying the probability threshold allows the sensitivity of protein identification to be balanced against the false positive error rate.
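The trade-off between sensitivity and false positives can be illustrated with a minimal sketch. It assumes the expression probabilities are well calibrated, so that each listed protein contributes an expected (1 - p) false positives. The function name `select_proteins` and the example values are hypothetical and are not part of EBP.

```python
def select_proteins(probabilities, threshold):
    """Return proteins whose probability exceeds the threshold, plus the
    expected false positive rate of the resulting list.

    Illustrative only: assumes calibrated probabilities.
    """
    selected = {name: p for name, p in probabilities.items() if p > threshold}
    # Each listed protein contributes (1 - p) expected false positives.
    expected_false = sum(1.0 - p for p in selected.values())
    error_rate = expected_false / len(selected) if selected else 0.0
    return sorted(selected), error_rate


# Hypothetical expression probabilities for four proteins.
example = {"protA": 0.99, "protB": 0.90, "protC": 0.60, "protD": 0.20}
for threshold in (0.5, 0.95):
    names, rate = select_proteins(example, threshold)
    print(threshold, names, round(rate, 3))
```

Raising the threshold shortens the list and lowers the expected error rate; lowering it does the opposite.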
The statistical model assumes that every peptide sequence that could theoretically result from enzymatic digestion of a protein in the search database has a chance of being identified in the search results, whether correctly or incorrectly. The probabilities of correct identification are combined across multiple peptide searches using a function that returns the maximum probability for consensus identifications and penalizes non-consensus identifications.
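The set of peptides that could theoretically result from digestion can be enumerated in silico. The sketch below assumes trypsin and the commonly used rule of cleaving after K or R unless the next residue is P. The function name and parameters are hypothetical, and this is not EBP's own digestion code.

```python
def tryptic_peptides(sequence, missed_cleavages=0, min_length=1):
    """Enumerate peptides from an in-silico tryptic digest.

    Assumed rule: cleave after K or R, except when followed by P.
    """
    # Cleavage sites are stored as the index just after the cleaved residue.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(sites):
                break
            peptide = sequence[sites[start]:sites[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


print(tryptic_peptides("MKAAARPGGKCC"))  # ['MK', 'AAARPGGK', 'CC']
```

Longer proteins yield more candidate peptides, which is one reason protein length appears among the model parameters.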
Both correct and incorrect peptide sequence identifications are assumed to occur at random in this "space" of peptides, at rates that are governed by model parameters including protein length, estimated protein abundance, the size of the search database, and the number of peptide sequence identifications in the dataset. Degenerate peptides whose sequence matches multiple proteins are treated using "Occam's Razor", a principle by which the smallest set of probable proteins is chosen that is sufficient to explain the peptide sequence identifications.
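The "Occam's Razor" idea can be sketched as a set-cover problem. The example below uses a greedy heuristic, which approximates the smallest explaining set but does not guarantee it. It ignores probabilities entirely, so it should be read as an illustration of the principle rather than as EBP's actual procedure. All names are hypothetical.

```python
def minimal_protein_set(peptide_to_proteins):
    """Greedy set cover: choose proteins until every peptide is explained."""
    # Invert the mapping: protein -> set of peptides it could explain.
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    # Only peptides that match at least one protein can be explained.
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    chosen = []
    while unexplained:
        # Pick the protein explaining the most remaining peptides;
        # sorting makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & unexplained),
        )
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


# pep1 and pep3 are degenerate: each matches two proteins.
matches = {
    "pep1": ["A", "B"],
    "pep2": ["A"],
    "pep3": ["B", "C"],
    "pep4": ["C"],
}
print(minimal_protein_set(matches))  # ['A', 'C']
```

Here protein B is dropped because every peptide it could explain is already accounted for by A or C.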
For each protein in the database, a likelihood ratio is calculated for the possibility that the peptide identifications whose sequence matches the protein are all incorrect. These likelihood ratios are used to estimate the expression probabilities, from which updated parameter estimates are obtained. The procedure is iterated until the algorithm converges at the maximum likelihood estimates. Replicated datasets can be analyzed by estimating multiple sets of model parameters simultaneously. In this way, hypotheses about protein expression can be tested using the results of replicate experiments.
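The iterate-until-convergence structure can be sketched with a deliberately simplified model. The sketch assumes a single parameter, the prior fraction of expressed proteins, and fixed, positive likelihood ratios defined here as P(data | expressed) / P(data | not expressed). In EBP the likelihood ratios themselves depend on several model parameters, so this is an illustration of the empirical Bayes loop, not the published algorithm. All names and values are hypothetical.

```python
def estimate_expression(likelihood_ratios, prior=0.5, tol=1e-9, max_iter=1000):
    """Alternate between posterior probabilities and the prior estimate.

    Simplifying assumption: likelihood ratios are fixed and positive, and the
    only parameter re-estimated is the prior fraction of expressed proteins.
    """
    if not likelihood_ratios:
        raise ValueError("at least one likelihood ratio is required")
    posteriors = []
    for _ in range(max_iter):
        # Bayes' rule: posterior probability that each protein is expressed.
        posteriors = [
            prior * lr / (prior * lr + (1.0 - prior)) for lr in likelihood_ratios
        ]
        # Re-estimate the prior as the mean posterior probability.
        new_prior = sum(posteriors) / len(posteriors)
        converged = abs(new_prior - prior) < tol
        prior = new_prior
        if converged:
            break
    return prior, posteriors


# Hypothetical ratios: two well-supported proteins, three poorly supported.
prior, posteriors = estimate_expression([100.0, 50.0, 0.01, 0.02, 0.05])
print(round(prior, 3), [round(p, 3) for p in posteriors])
```

Because the prior is estimated from the data themselves, strongly supported proteins raise the expression probabilities of all proteins slightly, and vice versa.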
Application of EBP to the Zebrafish proteome
Three biological replicate protein samples from 5-day-old Zebrafish (Danio rerio) larvae were digested with trypsin and subjected to two-dimensional liquid chromatography prior to electrospray ionization tandem mass spectrometry using a Thermo Finnigan LTQ ion trap mass spectrometer. The resulting product ion spectra were searched against an IPI Zebrafish protein sequence database using both the Sequest and Mascot algorithms. The results below relate to the combined analysis of all three samples, using both Sequest and Mascot search results.
Click here for EBP output in interactive ebpXML format
Click here for EBP output in Excel format
*) Current affiliation: Institute of Psychiatry, King's College London, UK.
Institute for Translational Medicine and Therapeutics (ITMAT), University of Pennsylvania School of Medicine.