Model compression

Often the best-performing supervised learning models are ensembles of hundreds or thousands of base-level classifiers. Unfortunately, the space required to store this many classifiers, and the time required to execute them at run-time, prohibit their use in applications where test sets are large (e.g. Google), where storage space is at a premium (e.g. PDAs), and where computational power is limited (e.g. hearing aids). We present a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.
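The compression idea described above can be sketched in code. The following is a minimal, self-contained illustration (not the paper's actual method): it builds a toy "ensemble" of weighted random linear classifiers, uses the ensemble to pseudo-label fresh unlabeled points, and then trains a single small logistic-regression "student" on those pseudo-labels so that one cheap model mimics the ensemble. All data, model choices, and function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs in 2-D.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def make_ensemble(X, y, n=50):
    """Build an 'ensemble' of random linear classifiers weighted by accuracy."""
    members = []
    for _ in range(n):
        w, b = rng.normal(size=2), rng.normal()
        pred = (X @ w + b > 0).astype(int)
        acc = (pred == y).mean()
        if acc < 0.5:                  # flip members worse than chance
            w, b, acc = -w, -b, 1 - acc
        members.append((w, b, acc))
    return members

def ensemble_predict(members, X):
    """Accuracy-weighted majority vote of all ensemble members."""
    votes, total = np.zeros(len(X)), 0.0
    for w, b, acc in members:
        votes += acc * (X @ w + b > 0)
        total += acc
    return (votes / total > 0.5).astype(int)

ensemble = make_ensemble(X, y)

# Compression step: label fresh synthetic points with the ensemble,
# then fit a single small model (the "student") on those pseudo-labels.
X_unlabeled = rng.normal(0, 1.5, (2000, 2))
y_pseudo = ensemble_predict(ensemble, X_unlabeled)

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression: the compressed model."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(X)
        b -= lr * g.mean()
    return w, b

w, b = fit_logreg(X_unlabeled, y_pseudo)
student_pred = ((X @ w + b) > 0).astype(int)

# How faithfully does the single model mimic the 50-member ensemble?
agreement = (student_pred == ensemble_predict(ensemble, X)).mean()
```

At test time only the single `(w, b)` pair is stored and evaluated, replacing all fifty members, which is the storage and run-time saving the abstract refers to.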
