Technology for Mining the Big Data of MOOCs

Because MOOCs bring big data to the forefront, they confront learning science with technology challenges. We describe an agenda for developing technology that enables MOOC analytics. Such an agenda needs to efficiently address the detailed, low-level, high-volume nature of MOOC data. It also needs to help exploit the data's capacity to reveal, in detail, how students behave and how learning takes place. We chart an agenda that starts with data standardization. It identifies crowdsourcing as a means to speed up both the analysis of forum data and the predictive analytics of student behavior. It also points to open source platforms that allow software to be shared and visualization analytics to be discussed.

Massive Open Online Courses (MOOCs) are college courses offered on the Internet. Lectures are conveyed by videos, textbooks are digitized, and problem sets, quizzes, and practice questions are web-based. Students communicate with one another and with faculty via discussion forums. Grading, albeit constrained by somewhat restrictive assessment design, is automated. The popularity of MOOCs has made a high volume of learner data available for analytic purposes.

Some MOOC data is just like the data that comes from the classroom. It can include teaching material, student demographics and background data, enrollment information, assessment scores, and grades. But very important differences arise between MOOCs and classrooms in how behavioral data is collected and what is observable. Through input capture, the platform unobtrusively records every mouse click, every use of the video player controls, and every submission to the platform, such as the selection of a solution choice for a problem, the composition of a solution, or the text entered in a forum discussion. The level of behavioral detail recorded in a MOOC vastly surpasses that recorded in conventional settings. Most directly, this data can provide a count of problem attempts and video replays.
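To make the simplest of these derivations concrete, the following is a minimal Python sketch that counts problem attempts and video replays per student from a clickstream. The event tuples, the event names `play_video` and `problem_check`, and the rule that a replay is any play event on a video the student has already started are all illustrative assumptions, not the log schema of any particular MOOC platform.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream: (student_id, event_type, resource_id) tuples.
# Event names and fields are assumptions made for illustration only.
EVENTS = [
    ("s1", "play_video", "v1"),
    ("s1", "play_video", "v1"),
    ("s1", "problem_check", "p1"),
    ("s1", "problem_check", "p1"),
    ("s1", "problem_check", "p1"),
    ("s2", "play_video", "v1"),
    ("s2", "problem_check", "p1"),
]


def count_attempts_and_replays(events):
    """Count problem submissions and video replays per (student, resource).

    Assumption: a replay is any play_video event on a video that the same
    student has already played at least once earlier in the stream.
    """
    attempts = Counter()
    replays = Counter()
    seen = defaultdict(set)  # student_id -> videos the student has played
    for student, event_type, resource in events:
        if event_type == "problem_check":
            attempts[(student, resource)] += 1
        elif event_type == "play_video":
            if resource in seen[student]:
                replays[(student, resource)] += 1
            seen[student].add(resource)
    return attempts, replays


attempts, replays = count_attempts_and_replays(EVENTS)
print(dict(attempts))  # {('s1', 'p1'): 3, ('s2', 'p1'): 1}
print(dict(replays))   # {('s1', 'v1'): 1}
```

Keying the counts by (student, resource) preserves individual-level detail; summing over students would give the aggregated, per-resource view.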
The data can also reveal how long a student stayed on a textbook page, or the presence of brief, rapid bursts of resource consultation. It can inform a portrait of how students solve problems or access resources, either for an individual or in aggregate. It makes it possible to identify and compare large cohorts of students, thus enabling us to personalize how content is delivered. It allows us to study learner activities beyond problem solving, such as forum interactions and video-watching habits (Thille et al., 2014). It also supports predictive analytics based on modeling and machine learning. The data also provides large samples. Large sample sizes enable us to …