Analysis of gene expression profiles: class discovery and leaf ordering

We approach the class discovery and leaf ordering problems using spectral graph partitioning methods. For class discovery (clustering), we present a min-max cut hierarchical clustering method and show that it produces subtypes quite close to human expert labeling on the lymphoma dataset with 6 classes. For optimal leaf ordering when displaying gene expression data, we present a sequential ordering method that can be computed in O(n^2) time and that also preserves the cluster structure. We also show that well-known statistical methods such as the F-statistic test and principal component analysis are very useful in gene expression analysis.
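
As a rough illustration of the spectral idea behind both tasks, the sketch below orders and bipartitions samples using the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). It is a generic textbook construction and not the min-max cut or sequential ordering method of the paper; the function names `fiedler_order` and `spectral_bipartition` and the similarity matrix `W` are assumptions made for the example. The dense eigendecomposition used here costs O(n^3), so it does not reproduce the O(n^2) bound stated above.

```python
import numpy as np

def fiedler_order(W):
    """Return (order, fiedler) for a symmetric similarity matrix W.

    Hypothetical helper: sorts items by their Fiedler vector entry,
    which tends to place strongly similar items next to each other.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W          # unnormalized Laplacian L = D - W
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]                   # eigenvector of 2nd-smallest eigenvalue
    order = np.argsort(fiedler)
    return order, fiedler

def spectral_bipartition(W):
    """Split items into two groups by the sign of the Fiedler vector."""
    _, fiedler = fiedler_order(W)
    return fiedler >= 0

def two_block_similarity(n_per_block=3, within=1.0, between=0.05):
    """Toy similarity matrix with two tight groups (illustrative data only)."""
    n = 2 * n_per_block
    W = np.full((n, n), between)
    W[:n_per_block, :n_per_block] = within
    W[n_per_block:, n_per_block:] = within
    np.fill_diagonal(W, 0.0)
    return W
```

Applied to `two_block_similarity()`, the sign split should recover the two groups, and the ordering should keep each group contiguous. Applying the split recursively to each group would give a divisive hierarchy, which is one way to read "hierarchical clustering" here, again as an assumption rather than the paper's procedure.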
