ITR: Listen and Learn – Artificial Intelligence in Auditory Environments

Most academic departments in computer science and electrical engineering provide an intellectual umbrella that integrates research in machine learning, computer vision, robotics, and artificial intelligence (AI). For largely historical and now outmoded reasons, however, research in auditory computation and machine listening has not been included in these efforts. We hope to rectify this situation.
