Cross-data Automatic Feature Engineering via Meta-learning and Reinforcement Learning

Feature Engineering (FE) is one of the most beneficial, yet most difficult and time-consuming, tasks in machine learning projects, and it requires strong expert knowledge. It is therefore important to design general methods for automating FE. The primary difficulties arise from the diverse forms of information that must be considered, the potentially infinite number of candidate features, and the high computational cost of generating and evaluating features. We present a framework called Cross-data Automatic Feature Engineering Machine (CAFEM), which formalizes FE as an optimization problem over a Feature Transformation Graph (FTG). CAFEM contains two components. The first is an FE learner (FeL), which learns fine-grained FE strategies on a single dataset using Double Deep Q-learning (DDQN). The second is a Cross-data Component (CdC), which speeds up FE learning on an unseen dataset by applying generalized FE policies learned through meta-learning on a collection of datasets. We compare FeL with several state-of-the-art automatic FE techniques on a large collection of datasets. The results show that FeL outperforms existing approaches and is robust to the choice of learning algorithm. Further experiments show that CdC not only speeds up FE learning but also improves learning performance.
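The abstract names two mechanisms without giving their details: a Feature Transformation Graph whose edges apply feature transformations, and Double Deep Q-learning. The Python sketch below illustrates both under stated assumptions. The operator set (`log`, `sqrt`, `square`), the names `FTGNode` and `expand`, and the rule that a transformation is applied to every column of a node are hypothetical and are not taken from CAFEM. The `double_dqn_target` function follows the standard Double Q-learning rule, in which the online network selects the next action and the target network evaluates it; here the Q-values are passed in as plain lists rather than produced by a neural network.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical unary transformations (assumption: CAFEM's real operator set
# is not given in the abstract). abs() keeps log/sqrt defined for negatives.
TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "log": lambda x: math.log1p(abs(x)),
    "sqrt": lambda x: math.sqrt(abs(x)),
    "square": lambda x: x * x,
}


@dataclass
class FTGNode:
    """Hypothetical FTG node: a named set of feature columns plus its provenance."""
    columns: Dict[str, List[float]]
    parent: Optional["FTGNode"] = None
    action: Optional[str] = None  # transformation that produced this node


def expand(node: FTGNode, action: str) -> FTGNode:
    """Follow one FTG edge: apply `action` to every column and append the results."""
    f = TRANSFORMS[action]
    new_cols = dict(node.columns)  # keep the original features
    for name, values in node.columns.items():
        new_cols[f"{action}({name})"] = [f(v) for v in values]
    return FTGNode(new_cols, parent=node, action=action)


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Double Q-learning target: select argmax with the online Q-values,
    evaluate that action with the target Q-values."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


if __name__ == "__main__":
    root = FTGNode({"x": [0.0, 3.0]})
    child = expand(root, "square")
    print(child.columns["square(x)"])
    # Online net picks action 1; target net values it at 0.5.
    print(double_dqn_target(1.0, [0.2, 0.9, 0.5], [0.4, 0.5, 0.8], gamma=0.5))
```

Expanding the node `{"x": [0.0, 3.0]}` with `square` adds the column `square(x) = [0.0, 9.0]` while keeping `x`. In the target example, the online values select action 1, so the target is 1.0 + 0.5 × 0.5 = 1.25; a vanilla DQN target would instead take the maximum target value 0.8. Decoupling action selection from evaluation in this way is what reduces the overestimation bias of plain Q-learning.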
