Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization

I review unsupervised or self-supervised neural networks playing minimax games in game-theoretic settings. (i) Adversarial Curiosity (AC, 1990) is based on two such networks. One network learns to probabilistically generate outputs, the other learns to predict effects of the outputs. Each network minimizes the objective function maximized by the other. (ii) Generative Adversarial Networks (GANs, 2010-2014) are an application of AC where the effect of an output is 1 if the output is in a given set, and 0 otherwise. (iii) Predictability Minimization (PM, 1990s) models data distributions through a neural encoder that maximizes the objective function minimized by a neural predictor of the code components. I also correct a previously published claim that PM is not based on a minimax game.
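The two-network minimax game can be sketched in its GAN form as a toy 1-D example: a generator produces outputs, and a predictor (discriminator) estimates the probability that an output lies in the "given set" (here, samples of a real data distribution), with each player updating against the objective the other optimizes. Everything below is illustrative, not the setup of any of the cited papers: the Gaussian "real" set, the scalar shift generator, the logistic predictor, and all hyperparameters are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

REAL_MEAN = 3.0   # the "given set": samples drawn from N(3, 1)
theta = 0.0       # generator parameter: G(z) = z + theta, z ~ N(0, 1)
a, b = 0.1, 0.0   # predictor parameters: D(x) = sigmoid(a*x + b)
lr = 0.05
batch = 128

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(3000):
    # Predictor ascends V = E[log D(real)] + E[log(1 - D(fake))],
    # i.e. it learns to output 1 on the given set and 0 elsewhere.
    real = REAL_MEAN + rng.standard_normal(batch)
    fake = theta + rng.standard_normal(batch)
    d_real = sigmoid(a * real + b)
    d_fake = sigmoid(a * fake + b)
    a += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator plays the opposing side of the game; this sketch uses
    # the common non-saturating variant, ascending E[log D(fake)].
    fake = theta + rng.standard_normal(batch)
    d_fake = sigmoid(a * fake + b)
    theta += lr * np.mean((1 - d_fake) * a)

print(f"generator mean ~ {theta:.2f} (target {REAL_MEAN})")
```

After training, the generator's output distribution drifts toward the real one (theta approaches 3), at which point the predictor can no longer separate the two samples and its gradients vanish: the minimax equilibrium of the game described above.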
