Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization

I review unsupervised or self-supervised neural networks playing minimax games in game-theoretic settings. (i) Artificial Curiosity (AC, 1990) is based on two such networks. One network learns to generate a probability distribution over outputs; the other learns to predict the effects of those outputs. Each network minimizes the objective function maximized by the other. (ii) Generative Adversarial Networks (GANs, 2010-2014) are an application of AC where the effect of an output is 1 if the output is in a given set, and 0 otherwise. (iii) Predictability Minimization (PM, 1990s) models data distributions through a neural encoder that maximizes the objective function minimized by a neural predictor of the code components. I correct a previously published claim that PM is not based on a minimax game.
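
The minimax relation in (i) and (ii) can be illustrated with a toy sketch. It is not taken from the reviewed papers: the tabular parametrization, the cross-entropy objective, the example distribution `p_data`, the learning rates, and all function names are assumptions made for illustration. A generator holds a distribution `q` over four discrete outputs; a predictor estimates the binary effect of an output (1 if it came from the given set, 0 if it was generated). The predictor ascends the shared objective `V` (that is, it minimizes its prediction error), while the generator descends the same `V`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def value(p_data, q, d):
    """Shared objective V: maximized by the predictor, minimized by the generator.
    -V is the predictor's cross-entropy error on the binary effect (assumed loss)."""
    return float(np.sum(p_data * np.log(d)) + np.sum(q * np.log(1.0 - d)))

def train(p_data, outer_steps=1000, inner_steps=20, lr_gen=0.5, lr_pred=2.0):
    k = len(p_data)
    theta = np.zeros(k)  # generator logits (hypothetical tabular generator)
    phi = np.zeros(k)    # predictor logits (hypothetical tabular predictor)
    for _ in range(outer_steps):
        q = softmax(theta)
        # Predictor: several gradient-ASCENT steps on V (reduces prediction error).
        for _ in range(inner_steps):
            d = sigmoid(phi)
            phi += lr_pred * (p_data * (1.0 - d) - q * d)
        # Generator: one gradient-DESCENT step on the same V (raises that error).
        d = sigmoid(phi)
        log_fake = np.log(1.0 - d)
        grad_theta = q * (log_fake - np.sum(q * log_fake))
        theta -= lr_gen * grad_theta
    return softmax(theta), sigmoid(phi)

p_data = np.array([0.1, 0.2, 0.3, 0.4])  # assumed "given set" distribution
q, d = train(p_data)
print("generator distribution:", np.round(q, 3))
print("predictor outputs:     ", np.round(d, 3))
```

In this toy setting the generator's distribution should drift toward `p_data`, at which point the predictor can do no better than outputting about 0.5 for every output. The predictor is given many more updates per round than the generator; that schedule is a design choice of this sketch, not a claim about how the original systems were trained.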
