A Multiclass Classification Tool Using Cloud Computing Architecture

Multiclass classification is an important technique for many complex biomedical problems. Genetic algorithms (GA) have proven effective for selecting features prior to multiclass classification by support vector machines (SVM). However, they are computationally intensive. Based on SOA (Service Oriented Architecture) design principles, this paper proposes a cloud computing framework that exploits the inherent parallelism of GA-SVM classification to speed up computation. Performance evaluations on a benchmark mRNA cancer dataset demonstrate the effectiveness and efficiency of the framework. With a user-friendly web interface, the framework gives researchers an easy way to mine the fast-growing repositories of biomedical data for new insights.
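The abstract does not spell out where the parallelism in GA-SVM classification lies. One natural source in GA-based feature selection is that each chromosome's fitness can be evaluated independently of every other chromosome in the same generation. The following is a minimal sketch of that structure under stated assumptions, not the paper's implementation. Chromosomes are bit masks over features. Fitness is leave-one-out accuracy of a nearest-centroid classifier, used here as a dependency-free stand-in for SVM accuracy. A local thread pool stands in for the cloud workers. The function names `evolve` and `nearest_centroid_loo_accuracy`, and all GA parameters (population size, mutation rate, tournament selection, one-point crossover), are hypothetical choices for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]  # hypothetical encoding: 1 = feature selected


def nearest_centroid_loo_accuracy(X: Sequence[Sequence[float]],
                                  y: Sequence[int], mask: Mask) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier restricted
    to the features selected by mask (a stand-in for SVM accuracy)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class feature sums over every sample except i.
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for k in range(len(X)):
            if k == i:
                continue
            acc = sums.setdefault(y[k], [0.0] * len(cols))
            for t, j in enumerate(cols):
                acc[t] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1

        def sq_dist(c: int) -> float:
            # Squared Euclidean distance from sample i to the centroid of c.
            return sum((X[i][j] - s / counts[c]) ** 2
                       for j, s in zip(cols, sums[c]))

        correct += min(sums, key=sq_dist) == y[i]
    return correct / len(X)


def evolve(X, y, n_features: int, pop_size: int = 20, generations: int = 15,
           p_mut: float = 0.05, seed: int = 0, workers: int = 4):
    """Generational GA over feature bit masks; returns (best_mask, fitness).
    Requires n_features >= 2 for one-point crossover."""
    rng = random.Random(seed)
    pop: List[Mask] = [tuple(rng.randint(0, 1) for _ in range(n_features))
                       for _ in range(pop_size)]
    best_mask, best_fit = pop[0], -1.0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluations are independent of one another, so they can
            # run concurrently; a cloud deployment would send each one to a
            # separate worker instead of a local thread.
            fits = list(pool.map(
                lambda m: nearest_centroid_loo_accuracy(X, y, m), pop))
            for m, f in zip(pop, fits):
                if f > best_fit:
                    best_mask, best_fit = m, f

            def tournament() -> Mask:
                # Binary tournament selection: the fitter of two random picks.
                a, b = rng.sample(range(pop_size), 2)
                return pop[a] if fits[a] >= fits[b] else pop[b]

            children: List[Mask] = [best_mask]  # elitism keeps the best so far
            while len(children) < pop_size:
                p1, p2 = tournament(), tournament()
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
                # Bit-flip mutation with per-gene probability p_mut.
                children.append(tuple(b ^ (rng.random() < p_mut) for b in child))
            pop = children
    return best_mask, best_fit
```

Because Python threads share the global interpreter lock, this sketch shows the shape of the parallelism rather than a real speedup. An actual deployment would gain time by dispatching each evaluation to a separate process or remote service. It would also replace the stand-in fitness function with cross-validated SVM accuracy, for example computed with LIBSVM.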