E1 Analysing Iterative Machine Learning Algorithms with Information Geometric Methods

Research Associate (level A, step 6)
The research associate is essential for adequate progress on the project. We expect the RA to be involved in both aspects of the project (theoretical and algorithmic), depending to an extent on the background of the person we are able to recruit. We have asked for level A step 6, which is the lowest level of appointment the ANU makes for someone with a PhD. The salary for that position would be topped up with a market loading by RSISE because of the very high starting salaries offered in the machine learning field (US$100,000/yr for PhD graduates is not uncommon). We have built in standard annual increments.

Programmer (ANU06)
Although the majority of the work in this project will be theoretical, the eventual goals are practical ones: the development of improved learning algorithms. In order to make honest assessments of the algorithms, and more significantly to attempt their deployment on a range of practical problems, we will need the services of a part-time programmer. We are asking for a programmer for 5 days a month at ANU06 level. This is the salary typically commanded by a good programmer of the sort we seek (we are looking to final-year computer science students who would be interested in this kind of part-time work).

PhD Scholarships
Machine learning is an attractive area for PhD students, and we expect to be able to make considerable use of at least two students on this project. Their presence is necessary in order to make satisfactory progress on the range of topics to be investigated. We have used the standard ARC figure for the cost. Note that RSISE will provide an $8000 top-up per year in order to better attract high-quality candidates.

Equipment
We have sought funds to provide a standard desktop computing environment for the RA and the programmer(s) (a total of two machines will suffice). We used a figure of $2000 each, based on our recent experience in sourcing suitable machines for other projects. (There is little point in getting a quote now because of the rapidity of change; $2000 is a modest figure.) The university provides such facilities for the CIs, but it does not fund the provision of such equipment for staff appointed on specific research grants. We have also sought $2000/yr for maintenance of existing computing equipment (our Beowulf cluster) (a consequence of using …
