In this paper we describe a generalized classification method for HMM-based speech recognition systems that uses free energy as a discriminant function in place of conventional probabilities. The discriminant function incorporates a single adjustable temperature parameter T. The computation of free energy can be motivated by entropy regularization, where the entropy grows monotonically with the temperature. In the resulting generalized classification scheme, the values T → 0 and T = 1 recover the conventional Viterbi and forward algorithms, respectively, as special cases. We show experimentally that when the test data are mismatched with the classifier, classification at temperatures higher than one can yield significant improvements in recognition performance. The temperature parameter is far more effective at improving performance on mismatched data than a variance scaling factor, an alternative single adjustable parameter with a very similar analytical form.
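The interpolation between Viterbi and forward scoring can be illustrated with a short sketch. The function below is a hypothetical illustration, not the paper's implementation: it computes a temperature-scaled log-sum-exp, T · log Σ exp(ℓ/T), over the log-probabilities ℓ of the individual state paths, so that T → 0 reduces to the best-path (Viterbi) score and T = 1 to the total-probability (forward) score.

```python
import math

def free_energy_score(path_log_probs, T):
    """Temperature-generalized discriminant over path log-probabilities.

    Computes T * log(sum_s exp(l_s / T)) in a numerically stable way.
    T = 0 is treated as the limit T -> 0, i.e. the Viterbi (max) score;
    T = 1 gives the forward (log-sum) score.
    """
    m = max(path_log_probs)          # best single path
    if T == 0:
        return m                     # Viterbi limit
    # Stable T-scaled log-sum-exp: factor out the maximum before summing.
    s = sum(math.exp((lp - m) / T) for lp in path_log_probs)
    return m + T * math.log(s)
```

At T = 1 this is the ordinary log-sum-exp of the path scores; as T grows, the lower-probability paths contribute more to the discriminant, which is the regime the paper exploits for mismatched test data.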