Fisher Kernels and Semidefinite Programming

In this lecture we introduce Fisher kernels, continuing the discussion of kernels based on probability models. We then switch topics to an introduction to semidefinite programming, motivated by an interest in combining multiple kernels.

Consider a parametric family of probability models $M = \{p_\theta(x) : \theta \in \Theta\}$. In statistics, the score function is the first derivative of the log likelihood with respect to the parameters. To calculate the score, fix the parameters at $\theta = \theta_0$ and take the gradient of the log likelihood with respect to the model parameters,

$$g(\theta_0, x) = \frac{\partial \log p_{\theta_0}(x)}{\partial \theta},$$

where $x$ is a specific value of the observed data. Note that the score is a vector, with one component for each parameter, and that the log likelihood is evaluated at a single data point $x$ with the parameter fixed at $\theta_0$. The score vector captures how the fixed parameter would change to accommodate the new data point $x$; in general, each $x$ induces a different change.

The Fisher information matrix $I$ is the expectation of the negative second derivative of the log likelihood, that is, of the negative Hessian matrix. Equivalently, it can be computed as the expected outer product of the score function,

$$I = \mathbb{E}_{\theta_0}\!\left[\, g(\theta_0, x)\, g(\theta_0, x)^T \right],$$

which in the discrete case is the sum

$$I = \sum_{x \in \mathcal{X}} p_{\theta_0}(x)\, g(\theta_0, x)\, g(\theta_0, x)^T.$$
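To make the score concrete, here is a minimal sketch in Python (an illustration, not code from the lecture) for a Bernoulli model $p_\theta(x) = \theta^x (1-\theta)^{1-x}$; the function names and the finite-difference gradient are assumptions chosen for the example.

```python
import numpy as np

# A minimal sketch (assumed, not from the lecture): the score of a Bernoulli
# model p_theta(x) = theta^x * (1 - theta)^(1 - x), estimated by finite differences.

def log_likelihood(theta, x):
    """log p_theta(x) for a single observation x in {0, 1}."""
    return x * np.log(theta) + (1 - x) * np.log(1 - theta)

def score(theta0, x, eps=1e-6):
    """Gradient of the log likelihood at the fixed parameter theta0 (central difference)."""
    return (log_likelihood(theta0 + eps, x) - log_likelihood(theta0 - eps, x)) / (2 * eps)

theta0 = 0.3
for x in (0, 1):
    print(f"g(theta0, {x}) = {score(theta0, x):+.4f}")
# Analytic score: x/theta0 - (1 - x)/(1 - theta0), i.e. -1.4286 for x=0, +3.3333 for x=1.
```

As the printout shows, the two possible observations pull the parameter in opposite directions, matching the interpretation of the score as the change in $\theta$ needed to accommodate $x$.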

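The discrete-case formula for $I$ can likewise be checked numerically. This sketch (again an assumption, not lecture code) computes $I$ as the expected outer product of the scores for the same Bernoulli model and compares it with the known closed form $1/(\theta_0(1-\theta_0))$.

```python
import numpy as np

theta0 = 0.3
p = {0: 1 - theta0, 1: theta0}             # p_theta0(x) over the sample space {0, 1}
g = {0: -1 / (1 - theta0), 1: 1 / theta0}  # analytic scores; theta is scalar, so I is 1x1

# I = sum over x of p_theta0(x) * g(theta0, x) g(theta0, x)^T
I = sum(p[x] * np.outer(g[x], g[x]) for x in (0, 1))
print(I)                                   # [[4.7619...]]
print(1 / (theta0 * (1 - theta0)))         # Bernoulli closed form: 4.7619...
```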