Model-based machine learning

Several decades of research in the field of machine learning have resulted in a multitude of algorithms for solving a broad range of problems. To tackle a new application, a researcher typically tries to map their problem onto one of these existing methods, often influenced by their familiarity with specific algorithms and by the availability of corresponding software implementations. In this paper, we describe an alternative methodology for applying machine learning, in which a bespoke solution is formulated for each new application. The solution is expressed through a compact modelling language, and the corresponding custom machine learning code is then generated automatically. This model-based approach offers several major advantages, including the opportunity to create highly tailored models for specific scenarios, as well as rapid prototyping and comparison of a range of alternative models. Furthermore, newcomers to the field of machine learning need not learn about the huge range of traditional methods, but can instead focus on understanding a single modelling environment. We show how probabilistic graphical models, coupled with efficient inference algorithms, provide a very flexible foundation for model-based machine learning, and we outline a large-scale commercial application of this framework involving tens of millions of users. We also describe the concept of probabilistic programming as a powerful software environment for model-based machine learning, and we discuss a specific probabilistic programming language called Infer.NET, which has been widely used in practical applications.
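
The following sketch illustrates the central idea in miniature, using Python rather than Infer.NET (whose modelling language is embedded in C#): the model is written once as an ordinary generative program, and inference is carried out by a generic engine, here plain rejection sampling. The names coin_model and infer_posterior are invented for this illustration; this is not the paper's framework, only an analogy for it.

```python
import random

# Toy model-based workflow: the model is an ordinary generative
# program, and a single generic inference routine is applied to it
# without any model-specific derivation.

def coin_model(bias, n_flips=10):
    """Generative model: number of heads in n_flips tosses of a coin
    with the given (unknown) bias."""
    return sum(random.random() < bias for _ in range(n_flips))

def infer_posterior(model, observed, n_samples=100_000):
    """Generic inference by rejection sampling: draw the parameter
    from its prior, run the model, and keep draws whose simulated
    data exactly match the observation."""
    accepted = []
    for _ in range(n_samples):
        bias = random.random()       # uniform prior over the bias
        if model(bias) == observed:  # likelihood handled by simulation
            accepted.append(bias)
    return accepted

# Observing 8 heads in 10 flips yields a posterior mean near 0.75.
samples = infer_posterior(coin_model, observed=8)
print(f"posterior mean bias: {sum(samples) / len(samples):.2f}")
```

Swapping in a different generative program requires no new inference code; this decoupling of model specification from inference mirrors, in miniature, the automatic code generation that the abstract describes.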
