Getting the last laugh: automatic laughter segmentation in meetings

Our goal in this work was to develop an accurate method for identifying laughter segments, ultimately for use in speaker recognition. Our previous work used multilayer perceptrons (MLPs) to perform frame-level laughter detection from short-term features, including MFCCs and pitch, and achieved a 7.9% equal error rate (EER) on our test set. We improved on those results by adding high-level and long-term features, applying median filtering, and performing segmentation with a hybrid MLP/HMM system using Viterbi decoding. With the long-term features and median filtering, the EER dropped to 5.4% on our test set and to 2.7% on an equal-prior test set used by others. With the hybrid MLP/HMM system and Viterbi decoding, segmentation reached 78.5% precision and 85.3% recall on our test set. To our knowledge, these are the best laughter detection results reported on the ICSI Meeting Recorder Corpus to date.
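The post-processing steps named above (median filtering of frame-level MLP outputs, then Viterbi decoding into segments) can be illustrated with a minimal sketch. This is not the authors' implementation. The filter width, the two-state HMM topology, the self-transition probability `stay`, the laughter prior, and all function names are assumptions chosen for illustration, since the abstract does not give these details. The emission scores use posteriors divided by class priors (scaled likelihoods), which is the usual convention in hybrid MLP/HMM systems.

```python
import math
from statistics import median


def median_filter(posteriors, width=5):
    """Smooth per-frame laughter posteriors with a sliding median.

    `width` is a hypothetical setting; windows are truncated at the edges.
    """
    half = width // 2
    n = len(posteriors)
    return [median(posteriors[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def viterbi_two_state(posteriors, stay=0.9, prior_laugh=0.5):
    """Most likely non-laughter (0) / laughter (1) state path.

    Assumes a fully connected two-state HMM with self-transition
    probability `stay` (a hypothetical value).
    """
    if not posteriors:
        return []
    eps = 1e-10
    priors = [1.0 - prior_laugh, prior_laugh]
    log_trans = [[math.log(stay), math.log(1.0 - stay)],
                 [math.log(1.0 - stay), math.log(stay)]]

    def emit(p):
        # Scaled log-likelihoods: log(posterior / prior) for each state.
        return [math.log(max(1.0 - p, eps) / priors[0]),
                math.log(max(p, eps) / priors[1])]

    first = emit(posteriors[0])
    score = [math.log(priors[s]) + first[s] for s in (0, 1)]
    back = []
    for p in posteriors[1:]:
        e = emit(p)
        new_score, ptr = [], []
        for s in (0, 1):
            best = max((0, 1), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + e[s])
        score = new_score
        back.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_segments(path):
    """Collapse a 0/1 path into (start_frame, end_frame) pairs, end exclusive."""
    segments, start = [], None
    for i, s in enumerate(list(path) + [0]):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            segments.append((start, i))
            start = None
    return segments


# Toy posteriors: one spurious spike (frame 1) and one dropout (frame 8)
# inside a laughter run spanning frames 5 to 11.
raw = [0.05, 0.95, 0.05, 0.05, 0.05,
       0.95, 0.95, 0.95, 0.05, 0.95, 0.95, 0.95,
       0.05, 0.05, 0.05, 0.05]
smoothed = median_filter(raw, width=5)
segments = to_segments(viterbi_two_state(smoothed))
```

On this toy input, both the isolated spike and the single-frame dropout are absorbed, leaving one laughter segment. Frame indices can be converted to seconds by multiplying by the frame step of the feature extraction, which the abstract does not state.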
