Excited commentator speech detection with unsupervised model adaptation for soccer highlight extraction

Soccer highlight detection has been an active research topic in recent years. In this paper, we present our work on detecting an important audio keyword, excited commentator speech, which contributes to a state-of-the-art soccer highlight extraction system. We propose a statistical classifier based on Gaussian mixture models (GMMs) with unsupervised model adaptation. Excited speech and normal speech are each modeled by a GMM. Starting from the pre-trained GMMs, both models are updated via maximum a posteriori (MAP) adaptation to compensate for the acoustic mismatch between training and test data. The adaptation is unsupervised because the correct classification of the test data is not known: a first detection pass with the pre-trained GMMs produces hypothesized class labels, which then serve as the adaptation targets. Experimental results demonstrate the effectiveness of the proposed approach. Based on excited speech detection alone, we can recall 87% of the goal events.
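The two-pass procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes diagonal-covariance GMMs, frame-level decisions, adaptation of the component means only, and a prior weight `tau`; none of these details are stated in the abstract. Audio feature extraction is not shown, and all function names are hypothetical.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def component_log_joint(X, gmm):
    # gmm = (weights (K,), means (K, D), variances (K, D)); X is (N, D).
    # Returns log(w_k * N(x_n | mu_k, var_k)) with shape (N, K).
    weights, means, variances = gmm
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    return np.log(weights) + log_gauss

def frame_loglik(X, gmm):
    # Per-frame log-likelihood under the whole mixture, shape (N,).
    return logsumexp(component_log_joint(X, gmm), axis=1)

def map_adapt_means(X, gmm, tau=16.0):
    # MAP update of the means only (an assumption of this sketch):
    # mu_k <- (tau * mu_k + sum_n p(k | x_n) x_n) / (tau + sum_n p(k | x_n))
    weights, means, variances = gmm
    if len(X) == 0:
        return gmm  # no frames hypothesized for this class; keep the prior model
    lj = component_log_joint(X, gmm)
    post = np.exp(lj - logsumexp(lj, axis=1)[:, None])  # responsibilities (N, K)
    n_k = post.sum(axis=0)
    new_means = (tau * means + post.T @ X) / (tau + n_k)[:, None]
    return weights, new_means, variances

def detect_with_adaptation(X, excited_gmm, normal_gmm, tau=16.0):
    # Pass 1: hypothesize labels with the pre-trained models.
    hyp = frame_loglik(X, excited_gmm) > frame_loglik(X, normal_gmm)
    # Unsupervised adaptation: each model is adapted on the frames
    # that the first pass assigned to it.
    excited_adapted = map_adapt_means(X[hyp], excited_gmm, tau)
    normal_adapted = map_adapt_means(X[~hyp], normal_gmm, tau)
    # Pass 2: re-classify with the adapted models (True = excited speech).
    return frame_loglik(X, excited_adapted) > frame_loglik(X, normal_adapted)
```

A larger `tau` keeps the adapted means closer to the pre-trained ones, which limits the damage when the first-pass labels are wrong; a smaller `tau` lets the test data dominate.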
