Outside the Machine Learning Blackbox: Supporting Analysts Before and After the Learning Algorithm

Applying machine learning to real problems is non-trivial because many important steps are needed to prepare for learning and to interpret the results afterward. This dissertation investigates four problems that arise before and after applying learning algorithms. First, how can we verify that a dataset contains “good” information? I propose cross-data validation for quantifying the quality of a dataset relative to a benchmark dataset, and I define a data efficiency ratio that measures how efficiently the dataset in question collects information relative to the benchmark. Using these methods, I demonstrate the quality of bird observations collected by the eBird citizen-science project, which has few quality controls. Second, can off-the-shelf algorithms learn a model with good task-specific performance, or must the user have expertise in both the domain and machine learning? In many applications, standard performance metrics are inappropriate, and most analysts lack the expertise or time to customize algorithms to optimize task-specific metrics. Ensemble selection offers a potential solution: build an ensemble that optimizes the desired metric. I evaluate ensemble selection’s ability to optimize domain-specific metrics on natural language processing tasks and show that ensemble selection usually improves performance but sometimes overfits. Third, how can we understand complex models? Understanding a model is often as important as its accuracy. I propose and evaluate statistics for measuring the importance of the inputs used by a decision tree ensemble. The statistics agree with sensitivity analysis and, in an application to bird distribution models, are 500 times faster to compute than sensitivity analysis. They have been used to study hundreds of bird distribution models. Fourth, how should data be pre-processed when learning a high-performing ensemble? I examine the behavior of variable selection and bagging using a bias-variance analysis of error.
The results show that the most accurate variable subset corresponds to the best bias-variance trade-off point, which is often not the point separating relevant from irrelevant inputs. Variable selection should therefore be viewed as a variance-reduction method, and it is often redundant for low-variance methods such as bagging. Bagged models usually perform best when given all available inputs.
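Ensemble selection is described above in a single sentence. The sketch below follows the greedy forward-selection-with-replacement procedure of Caruana et al. (2004, "Ensemble selection from libraries of models"): start from an empty ensemble, repeatedly add whichever library model most improves the chosen metric on a held-out "hillclimb" set when its predictions are averaged in, and return the best ensemble seen. It is an illustration, not the dissertation's implementation; the model names, the toy predictions, and the 0.5-threshold accuracy metric are hypothetical stand-ins for a real model library and a task-specific metric.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A metric maps (ensemble predictions, true labels) to a score; higher is better.
Metric = Callable[[Sequence[float], Sequence[int]], float]


def ensemble_selection(
    library: Dict[str, List[float]],  # model name -> predictions on the hillclimb set
    labels: List[int],                # true labels on the hillclimb set
    metric: Metric,
    n_steps: int = 20,
) -> Tuple[Dict[str, int], float]:
    """Greedy forward selection with replacement, returning the best ensemble seen."""
    if not library:
        raise ValueError("model library is empty")
    counts: Dict[str, int] = {}       # how many times each model has been added
    summed = [0.0] * len(labels)      # running sum of the selected models' predictions
    size = 0
    best_counts: Dict[str, int] = {}
    best_score = float("-inf")
    for _ in range(n_steps):
        # Score every model as a candidate addition; a model may be added repeatedly.
        step_name, step_score = "", float("-inf")
        for name, preds in library.items():
            averaged = [(s + p) / (size + 1) for s, p in zip(summed, preds)]
            score = metric(averaged, labels)
            if score > step_score:
                step_name, step_score = name, score
        summed = [s + p for s, p in zip(summed, library[step_name])]
        size += 1
        counts[step_name] = counts.get(step_name, 0) + 1
        # Keep the best ensemble seen so far, since later additions can lower the score.
        if step_score > best_score:
            best_score, best_counts = step_score, dict(counts)
    return best_counts, best_score


def accuracy(preds: Sequence[float], labels: Sequence[int]) -> float:
    # Hypothetical stand-in for a task-specific metric: accuracy at a 0.5 threshold.
    return sum(int(p >= 0.5) == y for p, y in zip(preds, labels)) / len(labels)


# Toy hillclimb-set predictions from three hypothetical models.
labels = [1, 0, 1, 0]
library = {
    "a": [0.9, 0.6, 0.4, 0.1],
    "b": [0.4, 0.2, 0.9, 0.3],
    "c": [0.8, 0.1, 0.7, 0.6],
}
print(ensemble_selection(library, labels, accuracy))  # → ({'b': 1, 'a': 1}, 1.0)
```

In this toy run, model "a" scores only 0.5 on its own, yet averaging it with "b" reaches perfect accuracy, which is why the procedure scores candidates by their effect on the ensemble rather than individually. Because every step is chosen by scoring on the same hillclimb set, a small hillclimb set gives the greedy search room to fit noise, one plausible route to the overfitting the abstract reports.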
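The abstract does not say which bias-variance decomposition the analysis uses. The sketch below assumes the standard squared-error decomposition (the dissertation may use a different, classification-oriented one) and estimates it by retraining a learner on many independently drawn training sets: at each test point, bias² is the squared gap between the average prediction and the noiseless target, variance is the spread of the predictions around that average, and the two sum to the mean squared error against that target. The target function, the k-nearest-neighbour learner, the bagging wrapper, and all sizes are hypothetical choices for illustration. The loop contrasts a learner restricted to the relevant input with one that also sees an irrelevant input, with and without bagging, loosely mirroring the comparison described above.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Data = List[Tuple[Point, float]]
Learner = Callable[[Data, Point], float]  # (training set, test point) -> prediction


def true_f(x: Point) -> float:
    # Hypothetical target: input 0 is relevant, input 1 is irrelevant.
    return 2.0 * x[0]


def draw_training_set(n: int, rng: random.Random, noise: float = 0.3) -> Data:
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, true_f(x) + rng.gauss(0.0, noise)))
    return data


def knn(k: int, features: Sequence[int]) -> Learner:
    # k-nearest-neighbour regression that measures distance on the chosen inputs only.
    def predict(train: Data, x: Point) -> float:
        def dist(pair):
            return sum((pair[0][i] - x[i]) ** 2 for i in features)
        return mean(y for _, y in sorted(train, key=dist)[:k])
    return predict


def bagged(learner: Learner, n_bags: int = 10, seed: int = 0) -> Learner:
    # Average the learner over bootstrap resamples. Re-seeding on every call makes
    # the resamples identical across test points, so each training set yields one model.
    def predict(train: Data, x: Point) -> float:
        rng = random.Random(seed)
        return mean(learner([rng.choice(train) for _ in train], x) for _ in range(n_bags))
    return predict


def bias_variance(learner: Learner, test_points: List[Point], n_sets: int = 50,
                  n_train: int = 40, seed: int = 0) -> Tuple[float, float, float]:
    """Return (bias^2, variance, mse) averaged over test points, against the noiseless target."""
    rng = random.Random(seed)
    train_sets = [draw_training_set(n_train, rng) for _ in range(n_sets)]
    bias2, var, mse = [], [], []
    for x in test_points:
        preds = [learner(train, x) for train in train_sets]
        avg, target = mean(preds), true_f(x)
        bias2.append((avg - target) ** 2)
        var.append(mean((p - avg) ** 2 for p in preds))
        mse.append(mean((p - target) ** 2 for p in preds))
    return mean(bias2), mean(var), mean(mse)


test_rng = random.Random(1)
test_points = [(test_rng.random(), test_rng.random()) for _ in range(25)]
for label, feats in [("relevant input only", [0]), ("all inputs", [0, 1])]:
    for name, learner in [("3-NN", knn(3, feats)), ("bagged 3-NN", bagged(knn(3, feats)))]:
        b2, v, m = bias_variance(learner, test_points)
        print(f"{name}, {label}: bias^2={b2:.3f} variance={v:.3f} mse={m:.3f}")
```

The printed numbers depend on the simulated data and are not reproduced here. What the procedure offers is a way to see whether dropping an input changes error mainly through bias or through variance, and whether bagging already absorbs that change.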