Positive-versus-Negative Classification for Model Aggregation in Predictive Data Mining

The process of constructing several base models that are then combined into a single classification model for prediction is called model aggregation or ensemble classification. Positive-versus-negative (pVn) classification is a new method for implementing base models for aggregation. pVn classification decomposes a k-class prediction task into m subproblems, where m < k. One base model is constructed for each subproblem to predict a subset of the k classes, and the base models are then combined into one aggregate model for prediction. This paper reports studies conducted to demonstrate the performance of pVn classification when large volumes of data are available for modeling, as is commonly the case in data mining. The studies show that pVn modeling makes it possible to use a large amount of the data available in a large data set for base model training. They also show that pVn models created from large data sets achieve higher predictive performance than single k-class models.
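The decomposition and recombination described above can be sketched in Python. This is a minimal illustration under assumptions that the abstract does not fix. As the name suggests, each base model is assumed to separate its own positive classes from a single merged negative class, labelled with the hypothetical constant `NEG`. A 1-nearest-neighbour learner stands in for the base learner. Each subproblem is given its own training sample through the hypothetical `training_sets` argument. The fusion rule is illustrative only: return the positive prediction with the smallest neighbour distance, and fall back to the nearest positive training point when every base model predicts "negative". The names `relabel`, `OneNN` and `PvNEnsemble` are hypothetical, and `sample_a` and `sample_b` are invented toy data.

```python
import math

NEG = -1  # hypothetical label for the merged "negative" class of a subproblem


def relabel(y, positives):
    """Keep the classes in `positives`; map every other class to NEG."""
    return [c if c in positives else NEG for c in y]


class OneNN:
    """1-nearest-neighbour base learner (a stand-in; any classifier could be used)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        # Returns (predicted label, distance to the nearest training point).
        dist, label = min((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))
        return label, dist


class PvNEnsemble:
    """Hypothetical pVn ensemble: one base model per group of positive classes."""

    def __init__(self, class_groups):
        # m groups of positive classes; m < k when a group holds several classes.
        self.class_groups = [set(g) for g in class_groups]

    def fit(self, training_sets):
        # One (X, y) training set per subproblem, e.g. separate samples drawn
        # from a large data set (an assumed interface, not from the source).
        self.models = [
            OneNN().fit(X, relabel(y, group))
            for group, (X, y) in zip(self.class_groups, training_sets)
        ]
        return self

    def predict_one(self, x):
        # Assumed fusion rule: among base models that predict a positive class,
        # return the prediction with the smallest neighbour distance.
        votes = [model.predict_one(x) for model in self.models]
        positive = [(d, c) for c, d in votes if c != NEG]
        if positive:
            return min(positive)[1]
        # Every base model said "negative": fall back to the nearest positive
        # training point seen by any base model.
        return min(
            (math.dist(x, xi), yi)
            for model in self.models
            for xi, yi in zip(model.X, model.y)
            if yi != NEG
        )[1]


# Usage: k = 4 classes split into m = 2 subproblems, {0, 1} and {2, 3}.
sample_a = ([(0, 0), (1, 0), (6, 0), (7, 0), (0, 6), (6, 6)], [0, 0, 1, 1, 2, 3])
sample_b = ([(0, 1), (6, 1), (0, 6), (1, 6), (6, 6), (7, 6)], [0, 1, 2, 2, 3, 3])
ens = PvNEnsemble([{0, 1}, {2, 3}]).fit([sample_a, sample_b])

print(ens.predict_one((6.5, 5.5)))  # 3
print(ens.predict_one((0.2, 0.3)))  # 0
print(ens.predict_one((6, 3.4)))    # 3 (both base models predict NEG; fallback)
```

Separate samples per subproblem matter in this sketch. If every base model were a 1-NN trained on the same data, exactly one of them would report a positive class (barring distance ties). The ensemble would then reduce to a single 1-NN model.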
