Smoothing noisy data with spline functions

Summary

Smoothing splines are well known to provide nice curves which smooth discrete, noisy data. We obtain a practical, effective method for estimating the optimum amount of smoothing from the data. Derivatives can be estimated from the data by differentiating the resulting (nearly) optimally smoothed spline.

We consider the model $y_i = g(t_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$, $t_i \in [0, 1]$, where $g \in W_2^{(m)} = \{f : f, f', \ldots, f^{(m-1)} \text{ absolutely continuous},\ f^{(m)} \in \mathcal{L}_2[0,1]\}$, and the $\{\varepsilon_i\}$ are random errors with $E\varepsilon_i = 0$, $E\varepsilon_i\varepsilon_j = \sigma^2\delta_{ij}$. The error variance $\sigma^2$ may be unknown. As an estimate of $g$ we take the solution $g_{n,\lambda}$ to the problem: Find $f \in W_2^{(m)}$ to minimize $$\frac{1}{n}\sum\limits_{j = 1}^n (f(t_j) - y_j)^2 + \lambda \int\limits_0^1 (f^{(m)}(u))^2 \, du.$$ The function $g_{n,\lambda}$ is a smoothing polynomial spline of degree $2m-1$. The parameter $\lambda$ controls the tradeoff between the “roughness” of the solution, as measured by $$\int\limits_0^1 [f^{(m)}(u)]^2 \, du,$$ and the infidelity to the data, as measured by $$\frac{1}{n}\sum\limits_{j = 1}^n (f(t_j) - y_j)^2,$$ and so governs the average square error $R(\lambda; g) = R(\lambda)$ defined by $$R(\lambda) = \frac{1}{n}\sum\limits_{j = 1}^n (g_{n,\lambda}(t_j) - g(t_j))^2.$$

We provide an estimate $\hat\lambda$, called the generalized cross-validation estimate, for the minimizer of $R(\lambda)$. The estimate $\hat\lambda$ is the minimizer of $V(\lambda)$ defined by $$V(\lambda) = \frac{1}{n}\parallel (I - A(\lambda))y \parallel^2 \Big/ \left[ \frac{1}{n}\,\text{Trace}(I - A(\lambda)) \right]^2,$$ where $y = (y_1, \ldots, y_n)^t$ and $A(\lambda)$ is the $n \times n$ matrix satisfying $(g_{n,\lambda}(t_1), \ldots, g_{n,\lambda}(t_n))^t = A(\lambda)y$. We prove that there exists a sequence of minimizers $\tilde\lambda = \tilde\lambda(n)$ of $EV(\lambda)$ such that, as the (regular) mesh $\{t_i\}_{i=1}^n$ becomes finer, $$\mathop{\lim}\limits_{n \to \infty} ER(\tilde\lambda) \Big/ \mathop{\min}\limits_\lambda ER(\lambda) \downarrow 1.$$
A Monte Carlo experiment with several smooth $g$'s was tried with $m = 2$, $n = 50$ and several values of $\sigma^2$, and typical values of $R(\hat\lambda) / \min_\lambda R(\lambda)$ were found to be in the range 1.01–1.4. The derivative $g'$ of $g$ can be estimated by $g'_{n,\hat\lambda}(t)$. In the Monte Carlo examples tried, the minimizer of $$R_D(\lambda) = \frac{1}{n}\sum\limits_{j = 1}^n (g'_{n,\lambda}(t_j) - g'(t_j))^2$$ tended to be close to the minimizer of $R(\lambda)$, so that $\hat\lambda$ was also a good value of the smoothing parameter for estimating the derivative.
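The generalized cross-validation criterion described in the summary can be sketched numerically for the cubic case $m = 2$. The sketch below is illustrative only and rests on assumptions not stated in the summary: it uses the standard Reinsch-type representation of the cubic smoothing spline with knots at the data points, for which the fitted values are $A(\lambda)y = (I + n\lambda K)^{-1}y$ with $K$ a roughness penalty matrix built from the mesh spacings. The function names (`roughness_matrix`, `spline_fit`, `gcv_curve`, `demo`), the test function $\sin(2\pi t)$, the noise level, and the grid of $\lambda$ values are all hypothetical choices, not the authors' experiment.

```python
import numpy as np

def roughness_matrix(t):
    """Penalty matrix K (assumed standard form) such that y^T K y equals the
    integral of (f'')^2 for the natural cubic spline f interpolating (t_i, y_i)."""
    t = np.asarray(t, dtype=float)
    n = t.size
    h = np.diff(t)                      # mesh spacings h_i = t_{i+1} - t_i
    Q = np.zeros((n, n - 2))            # second-divided-difference operator
    R = np.zeros((n - 2, n - 2))        # tridiagonal Gram matrix
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j < n - 3:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

def spline_fit(t, y, lam):
    """Fitted values A(lam) y = (I + n*lam*K)^{-1} y at the data points."""
    n = len(y)
    return np.linalg.solve(np.eye(n) + n * lam * roughness_matrix(t), y)

def gcv_curve(t, y, lambdas):
    """Evaluate V(lam) on a grid using one eigendecomposition of K."""
    n = len(y)
    d, U = np.linalg.eigh(roughness_matrix(t))
    d = np.clip(d, 0.0, None)           # remove tiny negative round-off
    z = U.T @ np.asarray(y, dtype=float)
    V = np.empty(len(lambdas))
    for k, lam in enumerate(lambdas):
        one_minus_a = 1.0 - 1.0 / (1.0 + n * lam * d)   # eigenvalues of I - A(lam)
        # (1/n)||(I - A)y||^2 divided by [(1/n) Trace(I - A)]^2
        V[k] = np.mean((one_minus_a * z) ** 2) / np.mean(one_minus_a) ** 2
    return V

def demo(n=50, sigma=0.2, seed=0):
    """One simulated replicate: returns (lam_hat, R(lam_hat) / min R) on the grid."""
    rng = np.random.default_rng(seed)
    t = (np.arange(n) + 0.5) / n        # regular mesh on [0, 1]
    g = np.sin(2 * np.pi * t)           # hypothetical smooth test function
    y = g + sigma * rng.standard_normal(n)
    lambdas = np.logspace(-8, 0, 81)    # hypothetical search grid
    V = gcv_curve(t, y, lambdas)
    R = np.array([np.mean((spline_fit(t, y, lam) - g) ** 2) for lam in lambdas])
    k_hat = int(np.argmin(V))
    return lambdas[k_hat], R[k_hat] / R.min()

if __name__ == "__main__":
    lam_hat, ratio = demo()
    print("GCV lambda:", lam_hat, " R(lam_hat)/min R:", ratio)
```

Because $V(\lambda)$ depends only on $y$ and $A(\lambda)$, it can be evaluated without knowing $\sigma^2$ or $g$; the true $g$ enters the sketch only to compute $R(\lambda)$ for comparison. The grid minimization is a simplification, and a one-dimensional optimizer over $\log\lambda$ could be substituted.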
