High-Dimensional Bayesian Optimisation with Variational Autoencoders and Deep Metric Learning

We introduce a method combining variational autoencoders (VAEs) and deep metric learning to perform Bayesian optimisation (BO) over high-dimensional and structured input spaces. By adapting ideas from deep metric learning, we use label guidance from the black-box function to structure the VAE latent space, facilitating the Gaussian process fit and yielding improved BO performance. Importantly for BO problem settings, our method operates in semi-supervised regimes where only a few labelled data points are available. We run experiments on three real-world tasks, achieving state-of-the-art results on the penalised logP molecule generation benchmark using just 3% of the labelled data required by previous approaches. As a theoretical contribution, we present a proof of vanishing regret for VAE BO.

* Equal contribution. Correspondence to <firstname.name@huawei.com>
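The core idea can be illustrated with a minimal sketch (ours, not the authors' implementation): a standard VAE objective is augmented with a triplet-style metric-learning loss on the latent means, where triplets are formed from the black-box labels of the few labelled points. All names and hyperparameters below (MetricVAE, triplets_from_labels, beta, gamma, margin, tol) are illustrative assumptions.

```python
# Sketch of label-guided metric learning in a VAE latent space for latent-space BO.
# Not the paper's implementation; a minimal illustration of the idea.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricVAE(nn.Module):
    def __init__(self, x_dim=256, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick
        return self.dec(z), mu, logvar

def triplets_from_labels(y, tol=0.1):
    """Build (anchor, positive, negative) index triplets from black-box values y:
    positives have similar objective values, negatives dissimilar (simple thresholding)."""
    d = (y[:, None] - y[None, :]).abs()
    a, p, n = [], [], []
    for i in range(len(y)):
        pos = ((d[i] < tol) & (torch.arange(len(y)) != i)).nonzero().flatten()
        neg = (d[i] > 2 * tol).nonzero().flatten()
        if len(pos) and len(neg):
            a.append(i); p.append(pos[0].item()); n.append(neg[0].item())
    return torch.tensor(a), torch.tensor(p), torch.tensor(n)

def loss_fn(model, x, y, beta=1.0, gamma=1.0, margin=1.0):
    """ELBO terms plus a triplet metric term on the latent means of labelled points."""
    recon, mu, logvar = model(x)
    rec = F.mse_loss(recon, x, reduction='mean')
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    a, p, n = triplets_from_labels(y)
    metric = (F.triplet_margin_loss(mu[a], mu[p], mu[n], margin=margin)
              if len(a) else torch.zeros(()))
    return rec + beta * kl + gamma * metric
```

After training, a Gaussian process surrogate can be fitted on the latent codes of the labelled points and BO run in the latent space (e.g. with a framework such as BoTorch); the metric term pulls points with similar objective values together, which is what the abstract argues makes the Gaussian process fit easier.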
