Learning Invariant Representations of Molecules for Atomization Energy Prediction

The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach in which quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve on the state of the art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy.
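The contrast between enforcing invariance structurally and enforcing it stochastically can be illustrated with a small sketch. It assumes each molecule is encoded as a Coulomb matrix built from nuclear charges Z and positions R (diagonal 0.5·Z_i^2.4, off-diagonal Z_i·Z_j/|R_i − R_j|). This encoding is an assumption made for illustration and is not stated in the abstract above. Sorting atoms by row norm fixes one canonical ordering, so invariance to atom indexing is built into the representation (structural). Sampling orderings from noisy row norms instead yields many permuted copies of the same molecule, and a flexible model has to learn to map all of them to the same energy (stochastic). The function names and the noise scale are hypothetical choices, not the paper's code.

```python
import math
import random


def coulomb_matrix(charges, positions):
    """Coulomb matrix: 0.5*Z_i**2.4 on the diagonal, Z_i*Z_j/|R_i - R_j| off it."""
    n = len(charges)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                C[i][j] = 0.5 * charges[i] ** 2.4
            else:
                C[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return C


def permute(C, order):
    """Reindex rows and columns of C so that atom order[k] becomes atom k."""
    return [[C[i][j] for j in order] for i in order]


def row_norms(C):
    return [math.sqrt(sum(x * x for x in row)) for row in C]


def sorted_representation(C):
    """Structural invariance: one canonical ordering by decreasing row norm."""
    norms = row_norms(C)
    order = sorted(range(len(C)), key=lambda i: -norms[i])
    return permute(C, order)


def random_representations(C, k, noise=1.0, rng=None):
    """Stochastic invariance: k orderings obtained by sorting noisy row norms.

    Hypothetical helper; `noise` is the std. dev. of Gaussian noise added to the
    row norms before sorting, so nearby norms are more likely to swap places.
    """
    if rng is None:
        rng = random.Random(0)
    norms = row_norms(C)
    samples = []
    for _ in range(k):
        noisy = [v + rng.gauss(0.0, noise) for v in norms]
        order = sorted(range(len(C)), key=lambda i: -noisy[i])
        samples.append(permute(C, order))
    return samples
```

With `noise=0.0` every sample coincides with the sorted representation. Larger noise spreads the samples over more atom orderings, which is the sense in which invariance is learned from augmented data rather than hard-coded.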
