Paper information - Constant Time EXPected Similarity Estimation using Stochastic Optimization

Constant Time EXPected Similarity Estimation using Stochastic Optimization

A new algorithm named EXPected Similarity Estimation (EXPoSE) was recently proposed to solve the problem of large-scale anomaly detection. It is a non-parametric and distribution-free kernel method based on the Hilbert space embedding of probability measures. Given a dataset of $n$ samples, EXPoSE needs only $\mathcal{O}(n)$ (linear time) to build a model and $\mathcal{O}(1)$ (constant time) to make a prediction. In this work we improve on the linear computational complexity of model building and show that an $\epsilon$-accurate model can be estimated in constant time, which has significant implications for large-scale learning problems. To achieve this goal, we cast the original EXPoSE formulation as a stochastic optimization problem. Crucially, this approach allows us to determine the number of iterations from a desired accuracy $\epsilon$, independent of the dataset size $n$. We show that the proposed stochastic gradient descent algorithm works in general (possibly infinite-dimensional) Hilbert spaces, is easy to implement, and requires no additional step-size parameters.
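
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the model is taken to be the kernel mean embedding $w = \mathbb{E}[\phi(X)]$, the score of a query $y$ is $\langle \phi(y), w \rangle$, the model is the minimizer of $\frac{1}{2}\mathbb{E}\|w - \phi(X)\|^2$, and the feature map $\phi$ is approximated by random Fourier features for a Gaussian kernel. The names `make_rff`, `sgd_mean_embedding`, `expose_score`, and the parameters `gamma` and `n_iter` are hypothetical. With the step size $1/t$, the update reduces to a running mean over the sampled points, so no step-size parameter has to be tuned, and `n_iter` is chosen by the caller without reference to the dataset size.

```python
import numpy as np


def make_rff(dim, n_features, gamma, rng):
    """Return an approximate feature map for the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2) using random Fourier features.
    (Assumption: the abstract does not prescribe a kernel or feature map.)"""
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):
        # Accept a single point or a batch; always return shape (m, n_features).
        return np.sqrt(2.0 / n_features) * np.cos(np.atleast_2d(X) @ W + b)

    return phi


def sgd_mean_embedding(X, phi, n_iter, rng):
    """Estimate w = E[phi(X)] by SGD on 0.5 * E||w - phi(X)||^2.
    n_iter is fixed by the caller, not by len(X)."""
    w = np.zeros(phi(X[0]).shape[1])
    for t in range(1, n_iter + 1):
        x = X[rng.integers(len(X))]   # draw one sample uniformly at random
        grad = w - phi(x)[0]          # stochastic gradient of the objective
        w -= grad / t                 # step size 1/t gives a running mean
    return w


def expose_score(Y, phi, w):
    """Score each query as <phi(y), w>; low scores suggest anomalies."""
    return phi(Y) @ w
```

A caller would build `phi` once, run `sgd_mean_embedding` for a number of iterations chosen from the desired accuracy, and then score each query with a single inner product, which is the constant-time prediction step described above.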
