Towards Automatic Complex Feature Engineering

Feature engineering is one of the most difficult and time-consuming tasks in data mining projects, and it requires strong expert knowledge. Existing feature engineering techniques tend to use a limited number of simple feature transformation methods and to validate on simple datasets (small volume, simple structure), which limits the benefits of feature engineering. In this paper, we propose a general Automatic Feature Engineering Machine framework (AFEM for short), which defines families of complex features and introduces them one family at a time (block bottom-up). We show that this framework covers most of the existing features used in the literature and allows us to efficiently generate complex feature families: in particular, local time, social network and representation-based families for relational and graph datasets, as well as compositions of features. We validate our approach on two large, realistic competition datasets and on a recommender-system task with social network data. In the first two tasks, AFEM automatically reached ranks 15 and 12 when compared with human teams; in the last task, it achieved a 1.5% reduction in regression error compared with the best results in the literature. Furthermore, in the context of big data and web applications, by trading off computation time against the number of features and predictive performance, we could in one case cut computation time by two thirds with only a 0.2% loss in AUC. Our code is publicly available on GitHub (https://github.com/TjuJianyu/AFEM).
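To make the idea of introducing features one family at a time more concrete, the following is a minimal sketch under stated assumptions, not the AFEM implementation. The family functions (`group_mean_family`, `product_family`), the toy data, the correlation-based `score` and the `min_gain` acceptance threshold are all hypothetical choices made for illustration; the abstract does not specify how a family is evaluated or accepted.

```python
from math import sqrt
from statistics import mean

# Toy relational-style rows: two numeric columns and a group key (hypothetical data).
rows = [
    {"a": 1.0, "b": 4.0, "g": 0},
    {"a": 2.0, "b": 3.0, "g": 0},
    {"a": 3.0, "b": 1.0, "g": 1},
    {"a": 4.0, "b": 2.0, "g": 1},
]
target = [r["a"] * r["b"] for r in rows]  # the signal is a feature interaction


def group_mean_family(rows):
    """Aggregation family: mean of column a within each group g."""
    groups = {}
    for r in rows:
        groups.setdefault(r["g"], []).append(r["a"])
    means = {g: mean(vals) for g, vals in groups.items()}
    return {"mean_a_by_g": [means[r["g"]] for r in rows]}


def product_family(rows):
    """Composition family: product of columns a and b."""
    return {"a*b": [r["a"] * r["b"] for r in rows]}


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 for a constant column."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return abs(cov) / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def score(features, target):
    """Assumed stand-in for a validated model score: best single-feature correlation."""
    return max(abs_corr(col, target) for col in features.values())


def add_families(rows, target, base, families, min_gain=0.01):
    """Try each family as a block; keep it only if the score improves by min_gain."""
    kept, best, accepted = dict(base), score(base, target), []
    for name, family in families:
        candidate = {**kept, **family(rows)}
        s = score(candidate, target)
        if s - best >= min_gain:
            kept, best = candidate, s
            accepted.append(name)
    return accepted, kept


base = {"a": [r["a"] for r in rows], "b": [r["b"] for r in rows]}
families = [("group_mean", group_mean_family), ("product", product_family)]
accepted, kept = add_families(rows, target, base, families)
print(accepted, sorted(kept))
```

In this sketch each family is generated and evaluated as a whole block, so a family that does not improve the score is discarded without its features ever entering the working set. In practice the score would come from a model evaluated on held-out data rather than from a single-feature correlation.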
