Prediction of Molecular Substructure Using Mass Spectral Data Based on Deep Learning

In this paper, several metric learning algorithms are used to predict molecular substructures from mass spectral features: Discriminative Component Analysis (DCA), Large Margin Nearest Neighbor (LMNN), Information-Theoretic Metric Learning (ITML), Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and Isometric Mapping (ISOMAP). The experimental results show that the metric learning algorithms achieve better prediction performance than algorithms based on the Euclidean distance. Compared with the other metric learning algorithms, LMNN achieves the best performance in predicting the eleven substructures.
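The abstract contrasts predictors built on the Euclidean distance with predictors built on a learned metric. As a minimal sketch (not the paper's implementation), assume the substructure predictor is a k-nearest-neighbour vote and that a linear method such as LMNN, ITML or DCA supplies a matrix L, so that distances become ||L(x − y)||, i.e. a Mahalanobis metric with M = LᵀL. The toy features, the helper names `transformed_distance` and `knn_predict`, and the matrix `L_SCALED` are all hypothetical; `L_SCALED` is chosen by hand to stand in for a learned metric and is not produced by any of the named algorithms.

```python
import math
from collections import Counter

# Hypothetical toy data: each "spectrum" is reduced to two features.
# Feature 0: relative intensity of a diagnostic peak (informative).
# Feature 1: an unrelated, large-scale feature (noise for this task).
# Label 1 = substructure present, 0 = absent.
train_X = [(0.9, 0.0), (0.8, 12.0), (0.9, 14.0),
           (0.1, 4.0), (0.2, 6.0), (0.1, 5.5)]
train_y = [1, 1, 1, 0, 0, 0]


def transformed_distance(x, y, L):
    """Return ||L (x - y)||_2. With L = identity this is the Euclidean
    distance; in general it is a Mahalanobis-type distance with M = L^T L."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in L]
    return math.sqrt(sum(v * v for v in projected))


def knn_predict(X, y, query, L, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points under transformed_distance with matrix L."""
    order = sorted(range(len(X)), key=lambda i: transformed_distance(X[i], query, L))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]    # plain Euclidean distance
L_SCALED = [[1.0, 0.0], [0.0, 0.01]]   # hand-picked, NOT learned: shrinks feature 1

query = (0.85, 5.0)  # strong diagnostic peak, so the substructure is likely present
print(knn_predict(train_X, train_y, query, IDENTITY))  # -> 0 (noise feature dominates)
print(knn_predict(train_X, train_y, query, L_SCALED))  # -> 1
```

With the identity matrix, the large-scale but uninformative feature dominates, so the three nearest neighbours all lack the substructure; down-weighting that feature lets the diagnostic peak decide. The supervised methods (LMNN, ITML, DCA) aim to learn such a transformation from labelled spectra instead of hand-picking it.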
