How Deep is Knowledge Tracing?

In theoretical cognitive science, there is a tension between highly structured models whose parameters have a direct psychological interpretation and highly complex, general-purpose models whose parameters and representations are difficult to interpret. The former typically provide more insight into cognition, but the latter often perform better. This tension has recently surfaced in the realm of educational data mining, where a deep learning approach to predicting students' performance as they work through a series of exercises---termed deep knowledge tracing or DKT---has demonstrated a stunning performance advantage over the mainstay of the field, Bayesian knowledge tracing or BKT. In this article, we attempt to understand the basis for DKT's advantage by considering the sources of statistical regularity in the data that DKT can leverage but which BKT cannot. We hypothesize four forms of regularity that BKT fails to exploit: recency effects, the contextualized trial sequence, inter-skill similarity, and individual variation in ability. We demonstrate that when BKT is extended to allow it more flexibility in modeling statistical regularities---using extensions previously proposed in the literature---BKT achieves a level of performance indistinguishable from that of DKT. We argue that while DKT is a powerful, useful, general-purpose framework for modeling student learning, its gains do not come from the discovery of novel representations---the fundamental advantage of deep learning. To answer the question posed in our title, knowledge tracing may be a domain that does not require 'depth'; shallow models like BKT can perform just as well and offer greater interpretability and explanatory power.
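To ground the discussion, the sketch below implements the predict-and-update cycle of standard BKT (a two-state hidden Markov model per skill, in the style of Corbett & Anderson), which is the baseline the article's extensions build on. The function name and parameter values here are illustrative assumptions, not code from the paper; the extensions discussed (e.g., a forgetting transition for recency effects, per-item guess/slip probabilities, or student-specific parameters) would each modify some piece of this basic loop.

```python
def bkt_predict_and_update(p_know, p_guess, p_slip, p_learn, correct):
    """One trial of standard Bayesian knowledge tracing.

    p_know  : probability the student knows the skill before this trial
    p_guess : probability of a correct response when the skill is unknown
    p_slip  : probability of an incorrect response when the skill is known
    p_learn : probability the skill transitions from unknown to known
              after the practice opportunity
    correct : observed response (True = correct)

    Returns (p_correct, p_know_next): the model's prediction for this
    trial and the updated knowledge estimate for the next trial.
    """
    # Predicted probability of a correct response on this trial.
    p_correct = p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess

    # Posterior probability of knowing the skill, given the response.
    if correct:
        posterior = p_know * (1.0 - p_slip) / p_correct
    else:
        posterior = p_know * p_slip / (1.0 - p_correct)

    # Learning transition: an unknown skill may become known after practice.
    # (Standard BKT assumes no forgetting; a recency extension would also
    # allow p_know to decay here.)
    p_know_next = posterior + (1.0 - posterior) * p_learn

    return p_correct, p_know_next


# Example: trace one student's knowledge estimate across a short sequence
# of responses on a single skill (parameter values are arbitrary).
p_know = 0.2  # initial knowledge estimate (the BKT prior parameter)
for correct in [False, True, True, True]:
    p_correct, p_know = bkt_predict_and_update(
        p_know, p_guess=0.25, p_slip=0.1, p_learn=0.15, correct=correct)
    print(f"predicted P(correct)={p_correct:.3f}, updated P(know)={p_know:.3f}")
```

Because the model's state is a single scalar per skill, every parameter has a direct psychological reading (guessing, slipping, learning rate), which is the interpretability advantage the abstract contrasts with DKT's distributed recurrent-network representations.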
