Machine Learning in Bioinformatics

This article reviews machine learning methods for bioinformatics. It presents modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization. Applications in genomics, proteomics, systems biology, evolution and text mining are also shown.


[270]  David Hinkley,et al.  Bootstrap Methods: Another Look at the Jackknife , 2008 .

[271]  Richard Scheines,et al.  Constructing Bayesian Network Models of Gene Expression Networks from Microarray Data , 2000 .

[272]  Saejoon Kim Protein beta-turn prediction using nearest-neighbor method. , 2004, Bioinformatics.

[273]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[274]  Michael I. Jordan,et al.  Feature selection for high-dimensional genomic microarray data , 2001, ICML.

[275]  Geoffrey J. McLachlan,et al.  Discriminant Analysis and Statistical Pattern Recognition: McLachlan/Discriminant Analysis & Pattern Recog , 2005 .

[276]  Kathleen Marchal,et al.  Advances in Cluster Analysis of Microarray Data , 2005, Data Analysis and Visualization in Genomics and Proteomics.

[277]  Rainer Spang,et al.  Reconstructing gene regulation networks from passive observations and active interventions , 2003 .

[278]  Adam B. Olshen,et al.  Deriving quantitative conclusions from microarray expression data , 2002, Bioinform..

[279]  Petra Mutzel,et al.  Computational Molecular Biology , 1996 .

[280]  Subhash C. Bagui,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2005, Technometrics.

[281]  Tommi S. Jaakkola,et al.  Using Graphical Models and Genomic Expression Data to Statistically Validate Models of Genetic Regulatory Networks , 2000, Pacific Symposium on Biocomputing.

[282]  Joe Whittaker,et al.  Edge Exclusion Tests for Graphical Gaussian Models , 1999, Learning in Graphical Models.

[283]  Michael Q. Zhang,et al.  Current Topics in Computational Molecular Biology , 2002 .

[284]  Ludmila I. Kuncheva,et al.  Combining Pattern Classifiers: Methods and Algorithms , 2004 .

[285]  Jacek Blazewicz,et al.  Tabu search algorithm for DNA sequencing by hybridization with isothermic libraries , 2004, Comput. Biol. Chem..

[286]  Ilya Shmulevich,et al.  Binary analysis and optimization-based normalization of gene expression data , 2002, Bioinform..

[287]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[288]  Yvan Saeys,et al.  Feature selection for splice site prediction: A new method using EDA-based feature ranking , 2004, BMC Bioinformatics.

[289]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[290]  Alfonso Valencia,et al.  Text-mining approaches in molecular biology and biomedicine. , 2005, Drug discovery today.

[291]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[292]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[293]  Paul Terry,et al.  Application of the GA/KNN method to SELDI proteomics data , 2004, Bioinform..