Boosting-Based Multimodal Speaker Detection for Distributed Meetings
Paul A. Viola | Yong Rui | Ross Cutler | Pei Yin | Cha Zhang
[1] A. Adjoudani,et al. On the Integration of Auditory and Visual Parameters in an HMM-based ASR , 1996 .
[2] Alex Pentland,et al. Pfinder: real-time tracking of the human body , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.
[3] Hong Wang,et al. Voice source localization for automatic camera pointing system in videoconferencing , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[4] Alex Pentland,et al. Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[5] Alex Pentland,et al. Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..
[6] Steven George Goodridge. Multimedia sensor fusion for intelligent camera control and human-computer interaction , 1997 .
[7] Michael S. Brandstein,et al. A robust method for speech signal time-delay estimation in reverberant rooms , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[8] Yoram Singer,et al. Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.
[9] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[10] Kentaro Toyama,et al. Wallflower: principles and practice of background maintenance , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.
[11] Takeo Kanade,et al. A System for Video Surveillance and Monitoring , 2000 .
[12] Vladimir Pavlovic,et al. Multimodal speaker detection using error feedback dynamic Bayesian networks , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000.
[13] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[14] Giridharan Iyengar,et al. Speaker change detection using joint audio-visual statistics , 2000, RIAO.
[15] J. Friedman. Additive logistic regression: A statistical view of boosting , 2000 .
[16] Larry S. Davis,et al. Look who's talking: speaker detection using video and audio correlation , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia.
[17] Juergen Luettin,et al. Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..
[18] Patrick Pérez,et al. Sequential Monte Carlo Fusion of Sound and Vision for Speaker Tracking , 2001, ICCV.
[19] Milind R. Naphade,et al. Duration dependent input output markov models for audio-visual event detection , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..
[20] B.D. Rao,et al. Source localization in reverberant environments: performance bounds and ML estimation , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers.
[21] Gopal Sarma Pingali,et al. A multimodal speaker detection and tracking system for teleconferencing , 2002, MULTIMEDIA '02.
[22] Anoop Gupta,et al. Distributed meetings: a meeting capture and broadcasting system , 2002, MULTIMEDIA '02.
[23] Nebojsa Jojic,et al. Audio-Video Sensor Fusion with Probabilistic Graphical Models , 2002, ECCV.
[24] Larry S. Davis,et al. Joint Audio-Visual Tracking Using Particle Filters , 2002, EURASIP J. Adv. Signal Process..
[25] Harriet J. Nock,et al. Speaker Localisation Using Audio-Visual Synchrony: An Empirical Study , 2003, CIVR.
[26] Shih-Fu Chang,et al. Discovery and fusion of salient multimodal features toward news story segmentation , 2003, IS&T/SPIE Electronic Imaging.
[27] Michael R. M. Jenkin,et al. Audiovisual localization of multiple speakers in a video teleconferencing setting , 2003, Int. J. Imaging Syst. Technol..
[28] Yong Rui,et al. Real-time speaker tracking using particle filter sensor fusion , 2004, Proceedings of the IEEE.
[29] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.
[30] Yoram Singer,et al. Logistic Regression, AdaBoost and Bregman Distances , 2000, Machine Learning.
[31] Paul A. Viola,et al. Detecting Pedestrians Using Patterns of Motion and Appearance , 2005, International Journal of Computer Vision.
[32] Yong Rui,et al. Sound source localization for circular arrays of directional microphones , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[33] Carlos Busso,et al. Smart room: participant and speaker localization and identification , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[34] John W. McDonough,et al. A joint particle filter for audio-visual speaker tracking , 2005, ICMI '05.
[35] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[36] Murat Kunt,et al. Information Theoretic Optimization of Audio Features for Multimodal Speaker Detection , 2022 .
[37] Zhengyou Zhang,et al. Maximum Likelihood Sound Source Localization for Multiple Directional Microphones , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[38] Paul A. Viola,et al. Multiple-Instance Pruning For Learning Efficient Cascade Detectors , 2007, NIPS.
[39] Peter L. Bartlett,et al. Boosting Algorithms as Gradient Descent in Function Space , 2007 .