Sample classification from protein mass spectrometry, by 'peak probability contrasts'

MOTIVATION Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower-stage tumors, more treatable disease and ultimately higher cure rates with less treatment-related morbidity. Protein mass spectrometry is a potentially powerful tool for early cancer detection. We propose a novel method for sample classification from protein mass spectrometry data. When applied to spectra from both diseased and healthy patients, the 'peak probability contrast' technique provides a list of all common peaks among the spectra, their statistical significance and their relative importance in discriminating between the two groups. We illustrate the method on matrix-assisted laser desorption and ionization mass spectrometry data from a study of ovarian cancers.

RESULTS Compared to other statistical approaches for class prediction, the peak probability contrast method performs as well as or better than several methods that require the full spectra rather than just labelled peaks. It is also much more interpretable biologically. The peak probability contrast method is a potentially useful tool for sample classification from protein mass spectrometry data.
