Ubiquitous speech communication interface

The Holy Grail of telecommunication is to bring people thousands miles apart, anytime, anywhere, together to communicate as if they were having a face-to-face conversation in a ubiquitous telepresence scenario. One key component necessary to reach this Holy Grail is the technology that supports hands-free speech communication. Hands-free telecommunication (both telephony and teleconferencing) refers to a communication mode in which the participants interact with each other over a communication network, without having to wear or hold any special device. For speech communications, we normally need a loudspeaker, a microphone or a headset. The goal of hands-free speech communication is thus to provide the users with an intelligent voice interface, which provides high quality communication and is safe, convenient, and natural to use. This goal stipulates many challenging technical issues, such as multiple sound sources, echo and reverberation in the room, and natural human-machine interaction, the resolution of which needs to be integrated into a working system before the benefit of hands-free telecommunication can be realized. We analyze these issues and review progress made in the last two decades, particularly from the viewpoint of signal acquisition, restoration and enhancement. We lay out new technical dimensions that may lead to further advances towards realization of a truly ubiquitous speech communication interface to an intelligent information source, be it a human or a machine.

[1]  Neviano Dal Degan,et al.  Acoustic noise analysis and speech enhancement techniques for mobile radio applications , 1988 .

[2]  Jacob Benesty,et al.  Frequency-domain adaptive filtering revisited, generalization to the multi-channel case, and application to acoustic echo cancellation , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[3]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[4]  Fumitada Itakura,et al.  An approach to dereverberation using multi-microphone sub-band envelope estimation , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[5]  Jacob Benesty,et al.  Double-talk detection schemes for acoustic echo cancellation , 2000 .

[6]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[7]  Biing-Hwang Juang,et al.  A block least squares approach to acoustic echo cancellation , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Biing-Hwang Juang,et al.  On the application of hidden Markov models for enhancing noisy speech , 1989, IEEE Trans. Acoust. Speech Signal Process..

[9]  F. Itakura,et al.  Speech Enhancement Using Nonlinear Microphone Array Based on Complementary Beamforming , 1999 .

[10]  Masato Miyoshi,et al.  Inverse filtering of room acoustics , 1988, IEEE Trans. Acoust. Speech Signal Process..