Common and Individual Features Analysis: Beyond Canonical Correlation Analysis

Very often data we encounter in practice is a collection of matrices rather than a single matrix. These multi-block data are naturally linked and hence often share some common features and at the same time they have their own individual features, due to the background in which they are measured and collected. In this study we proposed a new scheme of common and individual feature analysis (CIFA) that processes multi-block data in a linked way aiming at discovering and separating their common and individual features. According to whether the number of common features is given or not, two efficient algorithms were proposed to extract the common basis which is shared by all data. Then feature extraction is performed on the common and the individual spaces separately by incorporating the techniques such as dimensionality reduction and blind source separation. We also discussed how the proposed CIFA can significantly improve the performance of classification and clustering tasks by exploiting common and individual features of samples respectively. Our experimental results show some encouraging features of the proposed methods in comparison to the state-of-the-art methods on synthetic and real data.

[1]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[2]  Yi Ma,et al.  Robust principal component analysis? , 2009, JACM.

[3]  Minje Kim,et al.  Nonnegative Matrix Partial Co-Factorization for Spectral and Temporal Drum Source Separation , 2011, IEEE Journal of Selected Topics in Signal Processing.

[4]  Daniela M Witten,et al.  Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data , 2009, Statistical applications in genetics and molecular biology.

[5]  Feiping Nie,et al.  Improved MinMax Cut Graph Clustering with Nonnegative Relaxation , 2010, ECML/PKDD.

[6]  J. Friedman,et al.  Estimating Optimal Transformations for Multiple Regression and Correlation. , 1985 .

[7]  Tommy Löfstedt,et al.  OnPLS—a novel multiblock method for the modelling of predictive and orthogonal variation , 2011 .

[8]  Eric Moulines,et al.  A blind source separation technique using second-order statistics , 1997, IEEE Trans. Signal Process..

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  Fernando De la Torre,et al.  A Least-Squares Framework for Component Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[12]  Johan Trygg,et al.  O2‐PLS, a two‐block (X–Y) latent variable regression (LVR) method with an integral OSC filter , 2003 .

[13]  A. Tenenhaus,et al.  Regularized Generalized Canonical Correlation Analysis , 2011, Eur. J. Oper. Res..

[14]  David J. Kriegman,et al.  From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[16]  Kyuwan Choi,et al.  Detecting the Number of Clusters in n-Way Probabilistic Clustering , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Allen S. Mandel Comment … , 1978, British heart journal.

[18]  John Shawe-Taylor,et al.  Convergence analysis of kernel Canonical Correlation Analysis: theory and practice , 2008, Machine Learning.

[19]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003 .

[20]  Gene H. Golub,et al.  Matrix computations , 1983 .

[21]  Andrzej Cichocki,et al.  Nonnegative Matrix and Tensor Factorization T , 2007 .

[22]  Michael Biehl,et al.  A General Framework for Dimensionality-Reducing Data Visualization Mapping , 2012, Neural Computation.

[23]  Eric F Lock,et al.  JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES. , 2011, The annals of applied statistics.

[24]  Bernt Schiele,et al.  Analyzing appearance and contour based methods for object categorization , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[25]  Andrzej Cichocki,et al.  Fast Nonnegative Matrix/Tensor Factorization Based on Low-Rank Approximation , 2012, IEEE Transactions on Signal Processing.

[26]  Jean Ponce,et al.  Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Sheng Luo,et al.  Population Value Decomposition, a Framework for the Analysis of Image Populations , 2011, Journal of the American Statistical Association.

[28]  Chong-sun Kim Canonical Analysis of Several Sets of Variables , 1973 .

[29]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[30]  Jieping Ye,et al.  Canonical Correlation Analysis for Multilabel Classification: A Least-Squares Formulation, Extensions, and Analysis , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[32]  Vince D. Calhoun,et al.  Joint Blind Source Separation by Multiset Canonical Correlation Analysis , 2009, IEEE Transactions on Signal Processing.

[33]  Xiaojun Wu,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  David J. Kriegman,et al.  Acquiring linear subspaces for face recognition under variable lighting , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Chong-Yung Chi,et al.  Nonnegative Least-Correlated Component Analysis for Separation of Dependent Sources by Volume Maximization , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Andrzej Cichocki,et al.  Adaptive Blind Signal and Image Processing - Learning Algorithms and Applications , 2002 .

[37]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[38]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .