Gradient-Based Manipulation of

[1] Nicol Norbert Schraudolph. Optimization of entropy with neural networks, 1996.

[2] N. N. Schraudolph, et al. Processing images by semi-linear predictability minimization, 1999, Network.

[3] Lucas C. Parra. Symplectic Nonlinear Component Analysis, 1995, NIPS.

[4] Jagat Narain Kapur, et al. Measures of information and their applications, 1994.

[5] Terrence J. Sejnowski, et al. Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain, 1992, NIPS.

[6] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.

[7] Shin Ishii, et al. On-line EM Algorithm for the Normalized Gaussian Network, 2000, Neural Computation.

[8] Jeff A. Bilmes, et al. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, 1998.

[9] Kari Torkkola, et al. Feature Extraction by Non-Parametric Mutual Information Maximization, 2003, J. Mach. Learn. Res.

[10] Paul A. Viola, et al. Empirical Entropy Manipulation for Real-World Problems, 1995, NIPS.

[11] Jürgen Schmidhuber, et al. Semilinear Predictability Minimization Produces Well-Known Feature Detectors, 1996, Neural Computation.

[12] John S. Bridle, et al. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters, 1989, NIPS.

[13] A. Rényi. On Measures of Entropy and Information, 1961.

[14] Geoffrey E. Hinton, et al. Self-organizing neural network that discovers surfaces in random-dot stereograms, 1992, Nature.

[15] Deniz Erdogmus, et al. Information Theoretic Learning, 2009, Encyclopedia of Artificial Intelligence.

[16] John W. Tukey, et al. A Projection Pursuit Algorithm for Exploratory Data Analysis, 1974, IEEE Transactions on Computers.

[17] Deniz Erdogmus, et al. Generalized information potential criterion for adaptive system training, 2002, IEEE Trans. Neural Networks.

[18] Pierre Comon, et al. Independent component analysis, A new concept?, 1994, Signal Process.

[19] Jürgen Schmidhuber, et al. Learning Factorial Codes by Predictability Minimization, 1992, Neural Computation.

[20] Ralph Linsker, et al. Self-organization in a perceptual network, 1988, Computer.

[21] Thibault Langlois, et al. Parameter adaptation in stochastic optimization, 1999.

[22] Marc M. Van Hulle. The Formation of Topographic Maps That Maximize the Average Mutual Information of the Output Responses to Noiseless Input Signals, 1997, Neural Computation.

[23] A. F. Smith, et al. A Quasi-Bayes Sequential Procedure for Mixtures, 1978.

[24] Paul A. Viola, et al. Alignment by maximization of mutual information, 1995, Proceedings of IEEE International Conference on Computer Vision.

[25] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[26] L. Györfi, et al. Nonparametric entropy estimation. An overview, 1997.

[27] Terrence J. Sejnowski, et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[28] J. N. Kapur, et al. Entropy optimization principles with applications, 1992.

[29] Manfred K. Warmuth, et al. Additive versus exponentiated gradient updates for linear prediction, 1995, STOC '95.

[30] Tom Tollenaere, et al. SuperSAB: Fast adaptive back propagation with good scaling properties, 1990, Neural Networks.

[31] Deniz Erdoğmuş, et al. Online entropy manipulation: stochastic information gradient, 2003, IEEE Signal Processing Letters.