
Online decoding of Markov models under latency constraints

The Viterbi algorithm is an efficient and optimal method for decoding linear-chain Markov models. However, the entire input sequence must be observed before the label for any time step can be generated, so Viterbi cannot be applied directly to online, interactive, or streaming scenarios without incurring significant (possibly unbounded) latency. A widely used approach is to break the input stream into fixed-size windows and apply Viterbi to each window. Larger windows lead to higher accuracy but also to higher latency.

We propose several alternatives to fixed-size window decoding. These approaches compute a certainty measure on predicted labels that allows us to trade off latency for expected accuracy dynamically, without having to choose a fixed window size up front. Not surprisingly, this more principled approach gives a substantial improvement over choosing a fixed window. We show the effectiveness of the approach on the task of spotting semi-structured information in large documents. Compared to full Viterbi, the approach suffers a 0.1 percent error degradation with an average latency of 2.6 time steps (versus the potentially infinite latency of Viterbi). Compared to fixed-window Viterbi, we achieve a 40x reduction in error and a 6x reduction in latency.
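The fixed-size window baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not cover the proposed certainty-based algorithms. The function names `viterbi` and `fixed_window_decode`, the toy two-state model, and the choice to restart each window from the initial distribution are all assumptions made for illustration.

```python
import math

def viterbi(log_init, log_trans, log_emit, obs):
    """Most likely state sequence for a list of observation indices."""
    n = len(log_init)
    # score[s]: best log-probability of any path ending in state s
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n)]
    back = []  # one list of best predecessors per step after the first
    for o in obs[1:]:
        prev, score, step_back = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + log_trans[p][s])
            step_back.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(step_back)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for step_back in reversed(back):
        state = step_back[state]
        path.append(state)
    return path[::-1]

def fixed_window_decode(log_init, log_trans, log_emit, stream, window):
    """Decode a stream one window at a time.

    Assumption: each window is decoded independently, restarting from
    log_init. Labels are emitted only when a window fills or the stream ends.
    """
    labels, buf = [], []
    for o in stream:
        buf.append(o)
        if len(buf) == window:
            labels.extend(viterbi(log_init, log_trans, log_emit, buf))
            buf = []
    if buf:  # flush the final partial window
        labels.extend(viterbi(log_init, log_trans, log_emit, buf))
    return labels

def logs(rows):
    """Element-wise log of a nested list of probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Hypothetical toy model: two "sticky" states with noisy emissions.
log_init = [math.log(0.6), math.log(0.4)]
log_trans = logs([[0.95, 0.05], [0.05, 0.95]])
log_emit = logs([[0.9, 0.1], [0.1, 0.9]])
stream = [0, 1, 0, 0]

full = viterbi(log_init, log_trans, log_emit, stream)                     # [0, 0, 0, 0]
win1 = fixed_window_decode(log_init, log_trans, log_emit, stream, 1)      # [0, 1, 0, 0]
win2 = fixed_window_decode(log_init, log_trans, log_emit, stream, 2)      # [0, 0, 0, 0]
```

On this toy input, a window of size 1 labels the isolated observation `1` as state 1, while a window of size 2 already agrees with full Viterbi. This mirrors the trade-off described above: a larger window uses more context but delays the emission of labels.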

Paul A. Viola | Mukund Narasimhan | Michael Shilman
