Data science foundry for MOOCs

In this paper, we present the concept of a data science foundry for data from Massive Open Online Courses (MOOCs). The foundry consists of a series of software modules that transform the data into different representations. Ultimately, each online learner is represented by a set of variables that capture their online behavior, recorded longitudinally over an interval. Using this representation, we build a predictive analytics stack that predicts online learners' behavior in real time as the course progresses. To demonstrate the efficacy of the foundry, we attempt to solve an important prediction problem for MOOCs: who is likely to stop out? Across a multitude of courses, using our complex per-student behavioral variables, we achieve an AUCROC of 0.7 or higher on a one-week-ahead prediction problem. On a two-to-three-weeks-ahead prediction problem, we achieve an AUCROC of 0.6. We validate, via transfer learning, that these predictive models can be used in real time. We also demonstrate that the models can be protected with privacy-preserving mechanisms without losing any predictive accuracy.
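The weeks-ahead prediction setup and the AUCROC metric mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weekly activity counts, and the working definition of stopout (the first week after a learner's last recorded activity) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def stopout_week(activity: Sequence[int]) -> int:
    """Assumed definition: the first week after the learner's last active week."""
    last_active = -1
    for week, count in enumerate(activity):
        if count > 0:
            last_active = week
    return last_active + 1


def make_example(activity: Sequence[int], current_week: int, lead: int) -> Tuple[List[int], int]:
    """Build one (features, label) pair for a `lead`-weeks-ahead problem.

    Features use only weeks 0..current_week (hypothetical weekly activity counts).
    The label is 1 if the learner has stopped out by week current_week + lead.
    """
    features = list(activity[: current_week + 1])
    label = 1 if stopout_week(activity) <= current_week + lead else 0
    return features, label


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUCROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUCROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, a learner with weekly activity `[3, 2, 0, 0]` is labeled 0 for a one-week-ahead prediction made at week 0 and 1 for a two-weeks-ahead prediction made at the same week. An AUCROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.6 and 0.7 figures reported above.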
