Multi-Column Convolutional Neural Networks with Causality-Attention for Why-Question Answering

Why-question answering (why-QA) is the task of retrieving answers (or answer passages) to why-questions (e.g., "Why are tsunamis generated?") from a text archive. Several previously proposed why-QA methods improved their performance by automatically recognizing causalities expressed with explicit cues such as "because" in answer passages and using the recognized causalities as clues for finding proper answers. However, causalities in answer passages may be expressed implicitly, i.e., without any explicit cue: "An earthquake suddenly displaced sea water and a tsunami was generated." Previous work did not deal with such implicitly expressed causalities and thus failed to find proper answers containing them. We improve why-QA based on the following two ideas. First, a causality that is expressed only implicitly in one text may be expressed with explicit cues in other texts. If we can automatically recognize such explicitly expressed causalities in a text archive and use them to complement the implicitly expressed causalities in an answer passage, we can improve why-QA. Second, the causes of similar events tend to be described with a similar set of words (e.g., "seismic energy" and "tectonic plates" for "the Great East Japan Earthquake" and "the 1906 San Francisco Earthquake"). Hence, even if we cannot find in a text archive any explicitly expressed cause of the event mentioned in a question (e.g., "Why did the Great East Japan Earthquake happen?"), we may still be able to identify its implicitly expressed causes through words (e.g., "tectonic plates") that appear in the explicitly expressed cause of a similar event (e.g., "the 1906 San Francisco Earthquake"). We implemented these two ideas in multi-column convolutional neural networks with a novel attention mechanism, which we call causality attention. Through experiments on Japanese why-QA, we confirmed that our proposed method outperformed state-of-the-art systems.
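
To make the architecture concrete, below is a minimal sketch (not the authors' implementation) of a multi-column CNN whose columns read the question and the answer passage both as-is and reweighted by causality attention. The class names, the four-column layout, and the precomputed word-level `causality_score` inputs (e.g., association scores between words and explicit causal expressions mined from an archive) are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch (assumed structure, not the authors' code) of a
# multi-column CNN with causality attention for scoring why-QA pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalityAttentionColumn(nn.Module):
    """One convolutional column over a word-embedding sequence.

    If `use_attention` is True, each word embedding is scaled by a
    causality-attention weight before convolution, so words that often
    appear in explicitly expressed causes (e.g., "tectonic plates")
    contribute more to the pooled representation.
    """

    def __init__(self, emb_dim, num_filters=100, kernel_size=3, use_attention=False):
        super().__init__()
        self.use_attention = use_attention
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, emb, causality_score=None):
        # emb: (batch, seq_len, emb_dim); causality_score: (batch, seq_len),
        # a hypothetical precomputed word-causality association score.
        if self.use_attention and causality_score is not None:
            attn = torch.softmax(causality_score, dim=1)      # normalize over words
            emb = emb * attn.unsqueeze(-1)                    # reweight embeddings
        h = F.relu(self.conv(emb.transpose(1, 2)))            # (batch, filters, seq_len)
        return F.max_pool1d(h, h.size(2)).squeeze(2)          # max-over-time pooling


class MultiColumnCNN(nn.Module):
    """Scores a (question, answer-passage) pair for why-QA.

    Four columns: plain and causality-attended views of the question and
    the answer passage; pooled features are concatenated and fed to a
    linear layer that predicts answer vs. non-answer.
    """

    def __init__(self, vocab_size, emb_dim=300, num_filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.q_plain = CausalityAttentionColumn(emb_dim, num_filters)
        self.q_attn = CausalityAttentionColumn(emb_dim, num_filters, use_attention=True)
        self.a_plain = CausalityAttentionColumn(emb_dim, num_filters)
        self.a_attn = CausalityAttentionColumn(emb_dim, num_filters, use_attention=True)
        self.out = nn.Linear(4 * num_filters, 2)

    def forward(self, q_ids, a_ids, q_causality, a_causality):
        q_emb, a_emb = self.embed(q_ids), self.embed(a_ids)
        feats = torch.cat([
            self.q_plain(q_emb),
            self.q_attn(q_emb, q_causality),
            self.a_plain(a_emb),
            self.a_attn(a_emb, a_causality),
        ], dim=1)
        return self.out(feats)  # logits: answer / non-answer
```

The intent of the attention columns mirrors the paper's second idea: words that frequently occur in explicitly expressed causes of similar events receive larger weights, so an answer passage whose causality is only implicit can still score highly.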
