Perceptually motivated temporal modeling of footsteps in a cross-environmental detection task

Real-world sounds are ubiquitous and form an important part of the edifice of our cognitive abilities. Their perception combines signatures from the spectral and temporal domains, among others, yet their analysis has traditionally focused on frame-based spectral properties. We consider the problem of sound analysis from a perceptual perspective and investigate the temporal properties of the “footsteps” sound, which is particularly challenging from the time-frequency analysis viewpoint. We identify the irregular repetition of self-similarity and the sense of duration as significant to its perceptual quality, and we extract features using the Teager-Kaiser energy operator. We build an acoustic event detection system for “footsteps” that shows promising results for detection in cross-environmental conditions when compared with a conventional approach.
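
For readers unfamiliar with the Teager-Kaiser energy operator, the following is a minimal sketch of its standard discrete form, psi[n] = x[n]^2 - x[n-1] x[n+1]. It illustrates only the operator itself. The function name `teager_kaiser_energy`, the synthetic test tone, and its parameters are assumptions made for illustration; the abstract does not describe the features derived from the operator or the rest of the detection pipeline.

```python
import math


def teager_kaiser_energy(x):
    """Return the discrete Teager-Kaiser energy of a sequence of samples.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples only,
    so the output is two samples shorter than the input.
    """
    if len(x) < 3:
        return []
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (an assumption, not data from the paper):
# a 100 Hz cosine sampled at 8 kHz with amplitude 0.5.
amplitude = 0.5
omega = 2.0 * math.pi * 100.0 / 8000.0  # digital frequency in rad/sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]

psi = teager_kaiser_energy(tone)

# For a pure sinusoid the operator output is constant:
# psi[n] = amplitude^2 * sin(omega)^2 for every interior sample.
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the operator uses only three neighbouring samples, it responds to changes in amplitude and frequency almost instantaneously, which is one reason it is often considered for tracking short transient events such as individual footfalls.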
