Heiga Zen | Quan Wang | Nando de Freitas | Oriol Vinyals | David Budden | Aäron van den Oord | Yutian Chen | Ben Laurie | Çaglar Gülçehre | Brendan Shillingford | Andrew Trask | Scott E. Reed | Yannis M. Assael | Luis C. Cobo
[1] H. Harlow,et al. The formation of learning sets , 1949, Psychological review.
[2] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[3] T. Dutoit. An introduction to text-to-speech synthesis , 1997.
[4] Hideki Kawahara,et al. YIN, a fundamental frequency estimator for speech and music , 2002, The Journal of the Acoustical Society of America.
[5] Tara N. Sainath,et al. Fundamental Technologies in Modern Speech Recognition , 2012, DOI: 10.1109/MSP.2012.2205597.
[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[7] Hui Jiang,et al. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Richard Sproat,et al. The Kestrel TTS text normalization system , 2014, Natural Language Engineering.
[9] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Alex Graves,et al. DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.
[11] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Daan Wierstra,et al. One-Shot Generalization in Deep Generative Models , 2016, ICML.
[13] Heiga Zen,et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices , 2016, INTERSPEECH.
[14] Daan Wierstra,et al. Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.
[15] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[16] Koray Kavukcuoglu,et al. Pixel Recurrent Neural Networks , 2016, ICML.
[17] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[18] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.
[19] Junichi Yamagishi,et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016.
[20] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[21] Yoshihiko Nankaku,et al. Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and Parsing , 2016, INTERSPEECH.
[22] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[23] Hugo Larochelle,et al. Optimization as a Model for Few-Shot Learning , 2016, ICLR.
[24] Sergey Levine,et al. One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.
[25] C A Nelson,et al. Learning to Learn , 2017, Encyclopedia of Machine Learning and Data Mining.
[26] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[27] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[28] Sercan Ömer Arik,et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech , 2017, ICLR 2018.
[29] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[30] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[31] Dmitry P. Vetrov,et al. Fast Adaptation in Generative Models with Generative Matching Networks , 2016, ICLR.
[32] Ambedkar Dukkipati,et al. Attentive Recurrent Comparators , 2017, ICML.
[33] Tor Lattimore,et al. Online Learning with Gated Linear Networks , 2017, ArXiv.
[34] Misha Denil,et al. Learning to Learn without Gradient Descent by Gradient Descent , 2016, ICML.
[35] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.
[36] Jörg Bornschein,et al. Variational Memory Addressing in Generative Models , 2017, NIPS.
[37] Dong Wang,et al. Deep Speaker Feature Learning for Text-Independent Speaker Verification , 2017, INTERSPEECH.
[38] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[39] Nando de Freitas,et al. Playing hard exploration games by watching YouTube , 2018, NeurIPS.
[40] Lior Wolf,et al. Fitting New Speakers Based on a Short Untranscribed Sample , 2018, ICML.
[41] Lior Wolf,et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop , 2017, ICLR.
[42] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[43] Rémi Munos,et al. Observe and Look Further: Achieving Consistent Performance on Atari , 2018, ArXiv.
[44] Thomas Paine,et al. Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions , 2017, ICLR.
[45] Sercan Ömer Arik,et al. Neural Voice Cloning with a Few Samples , 2018, NeurIPS.
[46] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Sergey Levine,et al. One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning , 2018, Robotics: Science and Systems.
[48] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.