On the applications of multimedia processing to communications

The challenge of multimedia processing is to provide services that seamlessly integrate text, sound, image, and video information, and to do so in a way that preserves the ease of use and interactivity of conventional plain old telephone service (POTS). Achieving this goal requires addressing a number of technological problems: compression and coding of multimedia signals, including algorithmic, standards, and transmission issues; synthesis and recognition of multimedia signals, including speech, images, handwriting, and text; organization, storage, and retrieval of multimedia signals, including the appropriate method and speed of delivery, resolution, and quality of service; access methods to the multimedia signal, including spoken natural language interfaces, agent interfaces, and media conversion tools; searching by text, speech, and image queries; and browsing via text, voice, or indexed images. In each of these areas, a great deal of progress has been made in the past few years, driven in part by the relentless growth in multimedia personal computers and in part by the promise of broadband access from the home and from wireless connections. Standards have also played a key role in driving new multimedia services, both on the POTS network and on the Internet. The purpose of this paper is to review the status of the technology in each of the areas listed above and to illustrate current capabilities by describing several multimedia applications that have been implemented at AT&T Labs over the past several years.
