Canonical polyadic decomposition for unsupervised linear feature extraction from protein profiles

We propose a method for unsupervised linear feature extraction based on tensor decomposition. Linear feature extraction can be formulated as a canonical polyadic decomposition (CPD) of a third-order tensor when the transformation matrix is constrained to equal the Khatri-Rao product of two matrices. Standard algorithms for computing the CPD can therefore be used for feature extraction. The proposed method is validated on publicly available low-resolution mass spectra of cancerous and non-cancerous samples. The results indicate that this approach could be of practical importance in the analysis of protein expression profiles.
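
The link between the Khatri-Rao constraint and the CPD can be illustrated with a minimal NumPy sketch. It assumes that each profile of length I1*I2 is folded row-major into an I1 x I2 matrix, so that N samples form an N x I1 x I2 tensor. The sizes, the rank, the synthetic data, the helper names (khatri_rao, solve_factor, cpd_als) and the plain alternating least squares solver are illustrative assumptions, not the setup or the algorithm used by the authors.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R), shape (I*J, R)."""
    I, R = A.shape
    J = B.shape[0]
    return np.einsum("ir,jr->ijr", A, B).reshape(I * J, R)

def solve_factor(unfolding, K):
    """Least-squares solution M of unfolding ~ M @ K.T."""
    return np.linalg.lstsq(K, unfolding.T, rcond=None)[0].T

def cpd_als(T, rank, n_iter=100, seed=0):
    """Plain ALS for a rank-`rank` CPD of a third-order tensor T (N x I1 x I2)."""
    rng = np.random.default_rng(seed)
    N, I1, I2 = T.shape
    A = rng.standard_normal((I1, rank))
    B = rng.standard_normal((I2, rank))
    # Unfoldings consistent with row-major reshaping.
    X1 = T.reshape(N, I1 * I2)
    X2 = T.transpose(1, 0, 2).reshape(I1, N * I2)
    X3 = T.transpose(2, 0, 1).reshape(I2, N * I1)
    for _ in range(n_iter):
        F = solve_factor(X1, khatri_rao(A, B))
        A = solve_factor(X2, khatri_rao(F, B))
        B = solve_factor(X3, khatri_rao(F, A))
    F = solve_factor(X1, khatri_rao(A, B))  # final update of the sample-mode factor
    return F, A, B

# Synthetic data (hypothetical sizes): N profiles of length I1*I2, rank R.
rng = np.random.default_rng(1)
N, I1, I2, R = 20, 6, 5, 2
F_true = rng.standard_normal((N, R))
A_true = rng.standard_normal((I1, R))
B_true = rng.standard_normal((I2, R))

# Data matrix whose basis is constrained to a Khatri-Rao product.
X = F_true @ khatri_rao(A_true, B_true).T   # shape (N, I1*I2)
T = X.reshape(N, I1, I2)                    # the same data as a third-order tensor

# Fit the CPD, then read off features through the constrained linear map.
F, A, B = cpd_als(T, R)
W = khatri_rao(A, B)
features = X @ np.linalg.pinv(W).T          # one R-dimensional feature vector per sample
rel_err = np.linalg.norm(X - F @ W.T) / np.linalg.norm(X)
print("relative reconstruction error:", rel_err)
```

In this sketch the sample-mode factor F plays the role of the extracted features, and the same linear map built from A and B could be applied to rows of a held-out data matrix. Choosing the folding sizes and the rank is left open here.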
