Sparse Component Analysis: a New Tool for Data Mining

In many practical data mining problems, the data X under consideration (given as an m × N matrix) has the form X = AS, where the matrices A and S, of dimensions m × n and n × N respectively (often called the mixing matrix or dictionary and the source matrix), are unknown (m ≤ n < N). We formulate conditions (SCA-conditions) under which A and S can be recovered uniquely (up to scaling and permutation), with S sparse in the sense that each column of S has at least one zero element. We call this the Sparse Component Analysis (SCA) problem. We present new algorithms for identifying the mixing matrix (under the SCA-conditions) and for recovering the sources (under identifiability conditions). The methods are illustrated with examples showing good performance of the algorithms. Typical examples are EEG and fMRI data sets, in which the SCA algorithm allows us to detect some features of the brain signals. Special attention is given to applying our method to the transposed system X^T = S^T A^T, which exploits the sparseness of the mixing matrix A in appropriate situations. We note that the sparseness conditions can be obtained with some preprocessing methods and that, in contrast to Independent Component Analysis, no independence conditions are imposed on the source signals. We applied our method to fMRI data sets of dimension 128 × 128 × 98 and to EEG data sets from a 256-channel EEG machine.
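To make the model X = AS concrete, here is a minimal Python sketch of source recovery in the simplest setting. It is not the algorithm of the paper. It assumes that the mixing matrix A is already known and that every column of S has at most one nonzero entry, which is a stronger sparsity assumption than the one stated in the abstract. Under that assumption each column of X is a multiple of a single column of A, so the source column can be read off by finding the matching column. The function names `matmul` and `recover_one_sparse_sources` and the toy matrices are hypothetical, chosen only for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def recover_one_sparse_sources(A, X, tol=1e-9):
    """Recover S from X = A S, assuming A is known and each column of S
    has at most one nonzero entry (an illustrative assumption)."""
    n = len(A[0])
    cols_A = list(zip(*A))
    S_cols = []
    for x in zip(*X):
        best_j, best_c, best_res = 0, 0.0, float("inf")
        for j, a in enumerate(cols_A):
            # Least-squares coefficient of x along column a_j of A.
            c = sum(ai * xi for ai, xi in zip(a, x)) / sum(ai * ai for ai in a)
            # Squared residual of approximating x by c * a_j.
            res = sum((xi - c * ai) ** 2 for ai, xi in zip(a, x))
            if res < best_res:
                best_j, best_c, best_res = j, c, res
        if best_res > tol:
            raise ValueError("column is not a multiple of any column of A")
        s = [0.0] * n
        s[best_j] = best_c
        S_cols.append(s)
    # Turn the list of recovered columns back into a list of rows.
    return [list(row) for row in zip(*S_cols)]


# Toy example with m = 2, n = 3, N = 4 (so m <= n < N).
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
S = [[2.0, 0.0, 0.0, 0.0],
     [0.0, -3.0, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.0]]
X = matmul(A, S)
S_hat = recover_one_sparse_sources(A, X)
```

In this toy case the columns of A are pairwise non-collinear, so the matching column is unique for every nonzero column of X and `S_hat` reproduces `S`. With unknown A, or with more than one nonzero entry per source column, a different procedure is needed.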
