Towards Feature Engineering at Scale for Data from Massive Open Online Courses

We examine the process of engineering features for developing models that improve our understanding of learners' online behavior in MOOCs. Because feature engineering relies so heavily on human insight, we argue that extra effort should be made to engage the crowd for feature proposals and even their operationalization. We show two approaches where we have started to engage the crowd. We also show how features can be evaluated for their relevance in predictive accuracy. When we examined crowd-sourced features in the context of predicting stopout, not only were they nuanced, but they also considered more than one interaction mode between the learner and platform and how the learner was relatively performing. We were able to identify different influential features for stop out prediction that depended on whether a learner was in 1 of 4 cohorts defined by their level of engagement with the course discussion forum or wiki. This report is part of a compendium which considers different aspects of MOOC data science and stop out prediction.