Data generation approaches for topic classification in multilingual spoken dialog systems

The design of spoken dialog systems (SDS) often faces the problem of extending or adapting the system to multiple languages. This requires building modules specifically for each new language, which is a time-consuming process. In this paper, we propose two methods to reduce the time needed to extend an SDS to other languages. Our methods target the topic classification and semantic tagging tasks, and we evaluate their effectiveness on topic classification for three languages: English, Spanish, and French.
