Audio signal classification with temporal envelopes

The conventional approach to audio processing, based on the short-time power spectrum model, is not adequate for general audio signals. We propose an approach, justified by studies from psycho-acoustics and neuroimaging, that represents the magnitude and frequency envelopes of the audio signal in the form of AM-FM modulations and uses them to build an autoregressive moving-average (ARMA) model; the model is then fed to a Gaussian mixture model (GMM) classifier that assigns the signal to one of several audio classes. We show that this representation makes explicit certain aspects of the signal that are overlooked when processing is confined to the spectral domain.
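The pipeline described above (AM-FM demodulation, a parametric time-series model of the envelopes, then GMM classification) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The AM and FM envelopes are taken from the Hilbert analytic signal because the abstract does not say which demodulation method is used. The ARMA model is simplified to a pure AR model fitted by least squares. The frame length, model order, number of mixture components and all function names are hypothetical choices. Only NumPy is assumed.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal x + j*H{x}, built in the frequency domain.
    Assumption: Hilbert-based demodulation stands in for the paper's AM-FM method."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def am_fm_envelopes(x, fs):
    """Amplitude envelope and instantaneous frequency (Hz), both of length len(x) - 1."""
    z = analytic_signal(x)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return np.abs(z)[1:], inst_freq


def ar_coefficients(env, order):
    """Least-squares AR(order) fit of the mean-removed series: env[t] ~ sum_k a[k] * env[t-k-1].
    Assumption: AR only; the moving-average part of the ARMA model is omitted."""
    env = env - env.mean()
    lags = [env[order - k - 1:len(env) - k - 1] for k in range(order)]
    X = np.stack(lags, axis=1)
    a, *_ = np.linalg.lstsq(X, env[order:], rcond=None)
    return a


def frame_features(x, fs, frame_len=1024, order=4):
    """One feature vector per frame: AR coefficients of the AM and FM envelopes (hypothetical sizes)."""
    feats = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        amp, freq = am_fm_envelopes(x[start:start + frame_len], fs)
        feats.append(np.concatenate([ar_coefficients(amp, order), ar_coefficients(freq, order)]))
    return np.array(feats)


def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


class DiagonalGMM:
    """Gaussian mixture with diagonal covariances, trained by EM."""

    def __init__(self, n_components=2, n_iter=50, seed=0):
        self.k, self.n_iter, self.seed = n_components, n_iter, seed

    def _log_joint(self, X):
        # log(w_j) + log N(x | mu_j, diag(var_j)) for every sample/component pair: shape (n, k)
        diff = X[:, None, :] - self.means[None, :, :]
        log_norm = np.sum(np.log(2.0 * np.pi * self.vars), axis=1)
        return np.log(self.weights) - 0.5 * (np.sum(diff ** 2 / self.vars, axis=2) + log_norm)

    def fit(self, X):
        n, _ = X.shape
        rng = np.random.default_rng(self.seed)
        self.means = X[rng.choice(n, self.k, replace=False)]
        self.vars = np.tile(X.var(axis=0) + 1e-6, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            log_joint = self._log_joint(X)
            resp = np.exp(log_joint - logsumexp(log_joint, axis=1)[:, None])  # E-step
            nk = resp.sum(axis=0) + 1e-10  # M-step: weights, means, floored variances
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            self.vars = np.maximum((resp.T @ X ** 2) / nk[:, None] - self.means ** 2, 1e-6)
        return self

    def score(self, X):
        """Total log-likelihood of the frames in X (frames treated as independent)."""
        return float(logsumexp(self._log_joint(X), axis=1).sum())


def train(features_by_class, n_components=2):
    """Fit one GMM per class on that class's stacked frame features."""
    return {label: DiagonalGMM(n_components).fit(X) for label, X in features_by_class.items()}


def classify(models, X):
    """Label whose GMM gives the frames X the highest total log-likelihood."""
    return max(models, key=lambda label: models[label].score(X))
```

In use, one would stack `frame_features(clip, fs)` over the training clips of each class, pass the resulting dictionary to `train`, and label a new clip with `classify(models, frame_features(clip, fs))`. Summing frame log-likelihoods treats frames as independent, which is a further simplification; computing the analytic signal per frame also introduces edge effects at frame boundaries.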