Instance-based random forest with rotated feature space

Random Forest is a competitive ensemble method in machine learning, offering efficiency, robustness, good generalization, and ease of implementation. This study increases the diversity between pairs of individual trees in the forest by growing each tree in a rotated feature space. In addition, we propose an instance-based method that, for each test instance, selects several superior trees to perform the voting. The proposed method is evaluated on 28 datasets from the UCI Repository.
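The abstract does not give the algorithm itself, but the two ideas combine naturally. Below is a minimal sketch, not the authors' exact method: each tree is trained on a bootstrap sample projected through a random orthogonal rotation (to increase pairwise diversity), and at prediction time the trees that score best on a query's nearest training neighbours form the voting committee. All names and parameters here (`RotatedInstanceForest`, `n_trees`, `k_neighbours`, `top_m`) are illustrative assumptions.

```python
# Sketch only: illustrates rotated feature spaces + instance-based tree
# selection; it is NOT the paper's algorithm, whose details are not given here.
import numpy as np
from scipy.stats import ortho_group
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import NearestNeighbors

class RotatedInstanceForest:
    def __init__(self, n_trees=50, k_neighbours=10, top_m=15, random_state=0):
        # Hypothetical parameters: forest size, neighbourhood size used to
        # rank trees per instance, and how many top trees vote.
        self.n_trees, self.k, self.m = n_trees, k_neighbours, top_m
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        n, d = X.shape
        self.trees, self.rotations = [], []
        self.X_train, self.y_train = X, y
        for _ in range(self.n_trees):
            # Random orthogonal rotation of the full feature space,
            # plus a bootstrap sample, so trees see different geometries.
            R = ortho_group.rvs(d, random_state=int(self.rng.integers(2**31)))
            idx = self.rng.integers(0, n, n)
            tree = DecisionTreeClassifier(random_state=0).fit(X[idx] @ R, y[idx])
            self.trees.append(tree)
            self.rotations.append(R)
        self.nn = NearestNeighbors(n_neighbors=self.k).fit(X)
        return self

    def predict(self, X_test):
        # For each test instance, rank trees by accuracy on its k nearest
        # training neighbours and let only the top_m trees vote.
        _, nbr_idx = self.nn.kneighbors(X_test)
        preds = []
        for x, nbrs in zip(X_test, nbr_idx):
            scores = [tree.score(self.X_train[nbrs] @ R, self.y_train[nbrs])
                      for tree, R in zip(self.trees, self.rotations)]
            best = np.argsort(scores)[-self.m:]
            votes = [self.trees[i].predict((x @ self.rotations[i]).reshape(1, -1))[0]
                     for i in best]
            preds.append(max(set(votes), key=votes.count))  # majority vote
        return np.array(preds)
```

The design trade-off is typical of dynamic ensemble selection: scoring every tree on each query's neighbourhood costs O(n_trees * k) extra work per prediction, in exchange for a committee tuned to the local region of the test instance rather than a single global vote.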
