A statistical model of proteolytic digestion

We present a stochastic model of the proteolytic digestion of a proteome. The model takes three inputs: the distribution of parent protein lengths in the proteome, the relative abundances of the 20 amino acids in the proteome, and the cleavage "rules" of the enzyme used for the digestion. From these we derive a closed-form expression for the fragment mass distribution that holds for a large class of enzymes, including the widely used trypsin. The expression is written in terms of the length distribution of a protein mixture drawn from the proteome and the relative abundances of the 20 amino acids. The agreement between the theoretical distribution and an in silico digest is excellent.
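
The three model inputs can be made concrete with a small Monte Carlo in silico digest. The sketch below is not the closed-form derivation described above. It assumes that residues are drawn independently according to the proteome-wide abundances, that protein lengths are sampled from a list of observed lengths (an empirical length distribution), and that trypsin follows the common rule of cleaving C-terminal to K or R except before P. All function names, the uniform abundances and the protein lengths in the usage lines are hypothetical placeholders, not data from this work.

```python
import random
from collections import Counter

# Standard monoisotopic residue masses (Da) of the 20 amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide for the free termini


def trypsin_cleave(seq):
    """Cut after K or R unless the next residue is P (common trypsin rule)."""
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])  # C-terminal fragment of the protein
    return fragments


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def site_probability(abundances):
    """P(a position is a trypsin site), ASSUMING i.i.d. residues."""
    total = sum(abundances.values())
    f = {aa: w / total for aa, w in abundances.items()}
    return (f.get("K", 0.0) + f.get("R", 0.0)) * (1.0 - f.get("P", 0.0))


def random_protein(length, abundances, rng):
    """ASSUMPTION: residues drawn independently with the given abundances."""
    residues = list(abundances)
    weights = [abundances[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


def simulate_digest(lengths, abundances, n_proteins, seed=0):
    """Digest n_proteins random proteins; each length is drawn from `lengths`."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_proteins):
        protein = random_protein(rng.choice(lengths), abundances, rng)
        fragments.extend(trypsin_cleave(protein))
    return fragments


def mass_histogram(fragments, bin_width=100.0):
    """Fragment counts per mass bin; key k covers [k*bin_width, (k+1)*bin_width)."""
    return Counter(int(peptide_mass(f) // bin_width) for f in fragments)


if __name__ == "__main__":
    # Hypothetical inputs: uniform abundances and three protein lengths.
    uniform = {aa: 1.0 for aa in RESIDUE_MASS}
    frags = simulate_digest([200, 350, 500], uniform, n_proteins=1000)
    hist = mass_histogram(frags)
    for k in sorted(hist)[:10]:
        print(f"{k * 100:6.0f}-{(k + 1) * 100:.0f} Da: {hist[k]}")
    print("mean fragment length:", sum(map(len, frags)) / len(frags))
    print("1 / site probability:", 1 / site_probability(uniform))
```

Under the same independence assumption a given position is a trypsin site with probability (f_K + f_R)(1 - f_P), so for long proteins the mean fragment length approaches the reciprocal of `site_probability`; finite proteins come out slightly shorter because each one contributes an extra C-terminal fragment. A histogram from `simulate_digest` is the kind of in silico reference against which an analytical fragment mass distribution can be checked.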