Corrective feedback and persistent learning for information extraction

To successfully embed statistical machine learning models in real-world applications, two post-deployment capabilities must be provided: (1) the ability to solicit user corrections and (2) the ability to update the model from these corrections. We refer to the former capability as corrective feedback and the latter as persistent learning. While these capabilities have a natural implementation for simple classification tasks such as spam filtering, we argue that a more careful design is required for structured classification tasks. One example of a structured classification task is information extraction, in which raw text is analyzed to automatically populate a database. In this work, we augment a probabilistic information extraction system with corrective feedback and persistent learning components to assist the user in building, correcting, and updating the extraction model. We describe methods of guiding the user to incorrect predictions, suggesting the most informative fields to correct, and incorporating corrections into the inference algorithm. We also present an active learning framework that minimizes not only how many examples a user must label, but also how difficult each example is to label. We empirically validate each of the technical components in simulation and quantify the user effort saved. We conclude that more efficient corrective feedback mechanisms lead to more effective persistent learning.
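The phrase "incorporating corrections into the inference algorithm" suggests constrained decoding: once the user fixes a field, the corrected labels are pinned in place and the most likely label sequence is recomputed around them. The sketch below illustrates that idea for a linear-chain sequence model, assuming log-space emission and transition score arrays; the function name constrained_viterbi and its interface are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def constrained_viterbi(emissions, transitions, constraints=None):
    """Viterbi decoding with user corrections pinned in place.

    emissions:   (T, K) log-scores for each of K labels at T positions
    transitions: (K, K) log-scores for moving between labels
    constraints: dict {position: corrected label index} from user feedback
    """
    constraints = constraints or {}
    T, K = emissions.shape

    def pin(t, scores):
        # A correction at position t rules out every other label there.
        if t in constraints:
            pinned = np.full(K, -np.inf)
            pinned[constraints[t]] = scores[constraints[t]]
            return pinned
        return scores

    delta = pin(0, emissions[0].copy())   # best score ending in each label
    backptr = np.zeros((T, K), dtype=int)

    for t in range(1, T):
        # cand[j, k]: best path score reaching label k at t via label j at t-1
        cand = delta[:, None] + transitions
        backptr[t] = cand.argmax(axis=0)
        delta = pin(t, cand.max(axis=0) + emissions[t])

    # Trace back the highest-scoring path that honors every correction.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Example: re-decode after the user corrects position 2 to label 0.
rng = np.random.default_rng(0)
scores, trans = rng.normal(size=(5, 3)), rng.normal(size=(3, 3))
print(constrained_viterbi(scores, trans))          # unconstrained prediction
print(constrained_viterbi(scores, trans, {2: 0}))  # position 2 pinned to label 0
```

Pinning labels in log space leaves the decoder itself unchanged, so corrections compose: each additional correction adds one entry to the constraints dict, and the surrounding labels remain free to change in response, which is how a single correction can propagate to repair neighboring fields.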
