The Representation of Chemical Spectral Data for Classification

The classification of unknown samples is among the most common problems in chemometrics, and a proper representation of the data is very important for it. At present, chemical spectral data are analyzed as vectors of discretized values in which the variables are treated as unconnected, so other aspects of their functional nature, such as (structural) shape differences, are also ignored. In this paper, we study more advanced representations for chemical spectral datasets. To do so, we compare the classification results obtained on 4 datasets using their traditional representation and two alternatives: Functional Data Analysis and the Dissimilarity Representation. These approaches take into account information that is missing from the traditional representation, so better classification results can be achieved. We also make some suggestions about which dissimilarity measures are most suitable for chemical spectral data.
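The abstract does not say which dissimilarity measures or classifiers were used. The following is therefore only a minimal sketch of the general idea behind the Dissimilarity Representation. Each spectrum is described by its dissimilarities to a small set of prototype spectra, and a classifier is trained in that new space. The sketch is paired with a shape-sensitive measure in the spirit of the functional view. Everything specific here is a hypothetical illustrative choice, not the paper's method: the derivative-based measure `shape_dissim`, the one-prototype-per-class selection, the nearest-mean classifier, and the synthetic single-band spectra with random baseline offsets.

```python
from functools import partial

import numpy as np

def euclidean_dissim(a, b):
    """Traditional view: spectra as plain vectors compared point by point."""
    return float(np.linalg.norm(a - b))

def shape_dissim(a, b, dx):
    """Hypothetical shape-aware measure: Euclidean distance between first
    derivatives, so a constant baseline offset contributes (almost) nothing."""
    return float(np.linalg.norm(np.gradient(a, dx) - np.gradient(b, dx)))

def dissimilarity_space(X, prototypes, dissim):
    """Map each spectrum to the vector of its dissimilarities to the prototypes."""
    return np.array([[dissim(x, r) for r in prototypes] for x in X])

def fit_nearest_mean(D, y):
    """Store the mean dissimilarity vector of each class."""
    classes = np.unique(y)
    means = np.array([D[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(D, classes, means):
    """Assign each row of D to the class with the closest mean vector."""
    dists = np.linalg.norm(D[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Hypothetical synthetic data: one Gaussian band whose position depends on
# the class, plus a random constant baseline offset for every spectrum.
wavelengths = np.linspace(0.0, 100.0, 201)
dx = wavelengths[1] - wavelengths[0]
rng = np.random.default_rng(0)

def make_spectra(n, centre):
    band = np.exp(-((wavelengths - centre) ** 2) / (2 * 5.0 ** 2))
    offsets = rng.uniform(0.0, 5.0, size=(n, 1))
    return band + offsets  # broadcasts to shape (n, 201)

X_train = np.vstack([make_spectra(10, 40.0), make_spectra(10, 60.0)])
y_train = np.array([0] * 10 + [1] * 10)
X_test = np.vstack([make_spectra(5, 40.0), make_spectra(5, 60.0)])
y_test = np.array([0] * 5 + [1] * 5)

# Hypothetical prototype choice: the first training spectrum of each class.
prototypes = X_train[[0, 10]]
shape = partial(shape_dissim, dx=dx)

D_train = dissimilarity_space(X_train, prototypes, shape)
D_test = dissimilarity_space(X_test, prototypes, shape)
classes, means = fit_nearest_mean(D_train, y_train)
accuracy = float(np.mean(predict_nearest_mean(D_test, classes, means) == y_test))
print(f"shape-based accuracy: {accuracy:.2f}")
```

In this toy setting, the baseline offsets dominate `euclidean_dissim`, while `shape_dissim` responds only to the band position. That is the kind of structural information the plain vector representation ignores. On real, noisy spectra, derivatives amplify noise, so some smoothing (for example a basis expansion, as is usual in Functional Data Analysis) would normally come first.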
