From molecular model to sparse representation of chromatographic signals with an unknown number of peaks

Analysis of a fluid mixture using a chromatographic system is a standard technique in many biomedical and environmental applications, such as in-vitro diagnostics of body fluids or air and water quality assessment. The analysis is often dedicated to a set of molecules or biomarkers. However, because of the complexity of the fluid, the number of mixture components is often larger than the list of targeted molecules. To obtain an analysis that is as exhaustive as possible and to account for possible interferences, it is important to identify and quantify all the components present in the chromatographic signal. The signal processing therefore aims to reconstruct a list of an unknown number of components together with their relative concentrations. We address this question as a problem of sparse representation of a chromatographic signal. The innovative representation is based on a stochastic forward model that describes the transport of elementary molecules in the chromatography column as a molecular random walk. We investigate three methods: two probabilistic Bayesian approaches, one parametric and one non-parametric, and a deterministic approach based on a parsimonious (sparse) decomposition over a dictionary. We examine the performance of these three approaches on an experimental case dedicated to the analysis of mixtures of polycyclic aromatic hydrocarbon (PAH) micro-pollutants in a methanol solution, under two conditions: high and low signal-to-noise ratio (SNR).
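To make the idea of "a molecular random walk as forward model, then a sparse decomposition over a dictionary" concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' model or method. The random walk is a simplified Giddings-Eyring-type model chosen for illustration: a molecule spends a fixed dead time `t_m` in the mobile phase, is adsorbed a Poisson(`k_a * t_m`) number of times, and each stay in the stationary phase is exponential with rate `k_d`. Each candidate compound (one value of the hypothetical adsorption rate `k_a`) yields one dictionary atom, the histogram of simulated exit times. The sparse decomposition uses orthogonal matching pursuit (OMP), a standard greedy solver swapped in here for the paper's deterministic approach. OMP stops on a residual-norm threshold rather than a fixed count, so the number of peaks is not given in advance. All parameter values (`t_m`, `k_d`, the `k_a` grid, noise level, concentrations) are invented for the example.

```python
import numpy as np

def simulate_retention_times(k_a, k_d, t_m, n_molecules, rng):
    """Hypothetical random-walk model of one compound in the column.

    Each molecule makes N ~ Poisson(k_a * t_m) adsorptions; each stationary-phase
    stay is Exp(rate=k_d), so the total retained time is Gamma(N, scale=1/k_d).
    """
    n_ads = rng.poisson(k_a * t_m, size=n_molecules)
    # Gamma draws need shape > 0; molecules never adsorbed spend no extra time.
    stays = np.where(n_ads > 0, rng.gamma(np.maximum(n_ads, 1), 1.0 / k_d), 0.0)
    return t_m + stays

def build_dictionary(k_a_grid, k_d, t_m, edges, n_molecules, rng):
    """One unit-norm atom (simulated peak shape) per candidate adsorption rate."""
    atoms = []
    for k_a in k_a_grid:
        times = simulate_retention_times(k_a, k_d, t_m, n_molecules, rng)
        hist, _ = np.histogram(times, bins=edges)
        atom = hist.astype(float)
        atoms.append(atom / np.linalg.norm(atom))
    return np.column_stack(atoms)

def omp(D, y, tol, max_atoms=None):
    """Orthogonal matching pursuit; stops when the residual norm drops below tol."""
    max_atoms = max_atoms or D.shape[1]
    support, coef = [], np.zeros(0)
    residual = y.copy()
    while np.linalg.norm(residual) > tol and len(support) < max_atoms:
        scores = np.abs(D.T @ residual)
        scores[support] = -np.inf  # never reselect an atom already in the support
        support.append(int(np.argmax(scores)))
        # Re-fit all selected amplitudes jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return support, coef

rng = np.random.default_rng(0)
t_m, k_d = 1.0, 50.0                              # hypothetical dead time and desorption rate
k_a_grid = np.arange(25, 501, 25, dtype=float)    # 20 hypothetical candidate compounds
edges = np.linspace(0.0, 14.0, 701)               # 700 time bins
D = build_dictionary(k_a_grid, k_d, t_m, edges, 50_000, rng)

# Synthetic chromatogram: three compounds (unknown to the solver) plus white noise.
true_idx = [3, 11, 16]
true_conc = np.array([1.0, 0.6, 0.8])
sigma = 0.005
y = D[:, true_idx] @ true_conc + rng.normal(0.0, sigma, D.shape[0])

# Threshold slightly above the expected norm of pure noise, sigma * sqrt(n).
support, coef = omp(D, y, tol=1.5 * sigma * np.sqrt(D.shape[0]))
print("estimated number of peaks:", len(support))
for idx, c in sorted(zip(support, coef)):
    print(f"k_a = {k_a_grid[idx]:.0f}, relative concentration ~ {c:.3f}")
```

Because the synthetic signal is built from the same dictionary atoms, this only illustrates the mechanics. With real data, model mismatch, overlapping peaks and low SNR make the choice of stopping rule (or, in a Bayesian treatment, a prior on the number of components) much more delicate.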