Combine multiple mass spectral similarity measures for compound identification

Compound identification in gas chromatography-mass spectrometry (GC-MS) is usually achieved by comparing a query mass spectrum with the spectra in a reference spectral library. As spectral libraries grow rapidly, a more powerful spectral similarity measure is needed to achieve the best identification performance. In this study, seven spectral similarity measures were combined to improve identification accuracy. To reduce computation time, the absolute value distance (ABS_VD) similarity measure was first used to construct a sub-library, which was then searched by all of the similarity measures. A particle swarm optimisation (PSO) algorithm was used to find optimised weights for the similarity score of each measure on the training data, and the optimised weights were then applied to the test data. A simulation study using the NIST/EPA/NIH Mass Spectral Library 2005 indicates that the combination of multiple similarity measures performs better than any single similarity measure, with identification accuracy improved by 2.2% for the training data and 1.7% for the test data.
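
The workflow described above (pre-filter with ABS_VD, score the sub-library with a weighted sum of measures, tune the weights by PSO on training data) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only two measures instead of seven, and the score formulas, the toy spectra, the function names, and the PSO coefficients (inertia 0.7, acceleration 1.5) are all assumptions chosen for illustration.

```python
import math
import random

def normalise(spectrum):
    """Scale intensities to unit sum (assumed preprocessing, not from the paper)."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total else list(spectrum)

def abs_vd_similarity(query, ref):
    """Assumed absolute-value-distance score: 1 / (1 + sum |q_i - r_i|)."""
    q, r = normalise(query), normalise(ref)
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(q, r)))

def cosine_similarity(query, ref):
    """Cosine of the angle between the two intensity vectors."""
    dot = sum(a * b for a, b in zip(query, ref))
    norm = math.sqrt(sum(a * a for a in query)) * math.sqrt(sum(b * b for b in ref))
    return dot / norm if norm else 0.0

def build_sublibrary(query, library, size):
    """Keep the `size` library entries with the highest ABS_VD score."""
    ranked = sorted(library, key=lambda n: abs_vd_similarity(query, library[n]),
                    reverse=True)
    return ranked[:size]

def identify(query, library, measures, weights, size):
    """Return the sub-library entry with the highest weighted combined score."""
    def combined(name):
        return sum(w * m(query, library[name]) for w, m in zip(weights, measures))
    return max(build_sublibrary(query, library, size), key=combined)

def accuracy(weights, labelled_queries, library, measures, size):
    """Fraction of labelled queries whose top hit is the true compound."""
    hits = sum(identify(q, library, measures, weights, size) == truth
               for q, truth in labelled_queries)
    return hits / len(labelled_queries)

def pso_weights(fitness, dim, particles=10, iters=20, seed=0):
    """Basic global-best PSO maximising `fitness` over weights in [0, 1]."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (g_pos[d] - pos[i][d]))
                # Clamp each weight to the assumed search range [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return g_pos, g_fit

# Toy data (hypothetical): intensities on a shared four-bin m/z grid.
library = {"A": [100, 50, 0, 10], "B": [10, 100, 40, 0], "C": [0, 20, 100, 60]}
train = [([90, 55, 5, 10], "A"), ([12, 95, 45, 2], "B"), ([1, 25, 95, 65], "C")]
measures = [abs_vd_similarity, cosine_similarity]

# Fit the weights on the training queries, then apply them to a new query.
weights, fit = pso_weights(
    lambda w: accuracy(w, train, library, measures, size=2), dim=len(measures))
best = identify([95, 45, 0, 12], library, measures, weights, size=2)
print(weights, fit, best)
```

In this sketch the training accuracy serves directly as the PSO fitness, so the weights learned on `train` are reused unchanged on the unseen query, mirroring the train-then-test split described in the abstract. On a real library the sub-library size would trade speed against the risk of discarding the true match during pre-filtering.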
