Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces. Specifically, we introduce a new compressed sensing algorithm, named IK-OMP, which can be seen as an extension of Orthogonal Matching Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning setting and show that the combined approach (Sparse Imitation Learning, Sparse-IL) solves the entire text-based game of Zork1, which has an action space of approximately 10 million actions, given both perfect and noisy demonstrations.
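For orientation, the following is a minimal sketch of standard OMP, the baseline that IK-OMP extends; it is not IK-OMP itself, and the abstract does not specify how the extension works. The setting is an assumption made for illustration: an action is treated as a small bag of words, the observed vector `y` is the sum of the corresponding word embeddings, and the columns of the dictionary `D` are the embeddings. The names `omp`, `D`, `y`, and `k`, the dimensions, and the random embeddings are all hypothetical.

```python
import numpy as np

def omp(D, y, k):
    """Plain Orthogonal Matching Pursuit (illustrative baseline, not IK-OMP).

    D: (dim, vocab) dictionary whose columns are word embeddings (assumed).
    y: (dim,) observed vector, assumed to be a sum of a few columns of D.
    k: number of atoms (words) to select.
    Returns the selected column indices and their least-squares coefficients.
    """
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Score each atom by its absolute correlation with the residual.
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -np.inf  # never select the same atom twice
        support.append(int(np.argmax(scores)))
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return support, coef

# Hypothetical demo: random unit-norm "embeddings" and a 3-word action.
rng = np.random.default_rng(0)
D_demo = rng.standard_normal((200, 30))
D_demo /= np.linalg.norm(D_demo, axis=0)
true_words = [2, 11, 25]
y_demo = D_demo[:, true_words].sum(axis=1)
found, weights = omp(D_demo, y_demo, k=3)
print("true:", true_words, "recovered:", sorted(found))
```

Each iteration commits greedily to a single best-matching atom, so one early mistake cannot be undone. That limitation is a common motivation for OMP variants that keep several candidate supports in play.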
