Paper information - Unsupervised Minimax: Adversarial Curiosity, Generative Adversarial Networks, and Predictability Minimization

I review unsupervised or self-supervised neural networks playing minimax games in game-theoretic settings: (i) Artificial Curiosity (AC, 1990) is based on two such networks. One network learns to generate a probability distribution over outputs, the other learns to predict effects of the outputs. Each network minimizes the objective function maximized by the other. (ii) Generative Adversarial Networks (GANs, 2010-2014) are an application of AC where the effect of an output is 1 if the output is in a given set, and 0 otherwise. (iii) Predictability Minimization (PM, 1990s) models data distributions through a neural encoder that maximizes the objective function minimized by a neural predictor of the code components. I correct a previously published claim that PM is not based on a minimax game.
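The shared-objective structure described in the abstract ("each network minimizes the objective function maximized by the other") can be illustrated with a toy sketch. This is not the paper's method: two scalars, `x` and `y`, stand in for the two networks, and the objective `value`, the learning rate, and the starting points are all hypothetical choices made so that simultaneous gradient descent-ascent settles at the saddle point.

```python
def value(x, y):
    """Hypothetical toy objective: convex in x, concave in y, saddle point at (0, 0)."""
    return x ** 2 - y ** 2 + x * y


def gradients(x, y):
    """Analytic partial derivatives of value() with respect to x and y."""
    return 2 * x + y, x - 2 * y


def descent_ascent(x=1.0, y=-1.0, lr=0.1, steps=200):
    """Simultaneous updates: x descends on value(), y ascends on the same value()."""
    for _ in range(steps):
        gx, gy = gradients(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y


if __name__ == "__main__":
    x_final, y_final = descent_ascent()
    print(x_final, y_final, value(x_final, y_final))
```

With these assumed settings both players approach the saddle point at the origin. In the settings the abstract reviews, the scalars would be replaced by network weights and `value` by a prediction error or classification loss, and convergence is generally not guaranteed.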

Juergen Schmidhuber | J. Schmidhuber
