Learning Algorithm for Enclosing Points in Bregmanian Spheres

We discuss the problem of finding a generalized sphere that encloses points originating from a single source. The points contained in such a sphere lie within a maximal divergence of a center point. The divergences we study are the Bregman divergences, which include as special cases both the squared Euclidean distance and the relative entropy. We cast the learning task as an optimization problem and show that it yields a simple dual form with interesting algebraic properties. We then discuss a general algorithmic framework for solving the optimization problem. Our training algorithm employs an auxiliary function that bounds the dual's objective function and can be used with a broad class of Bregman functions. As a specific application of the algorithm we give a detailed derivation for the relative entropy. We analyze the generalization ability of the algorithm by adopting margin-style proof techniques. We also describe and analyze two schemes of online algorithms for the case in which the radius of the sphere is set in advance.
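To make the two special cases concrete, the sketch below computes the Bregman divergence D_F(x, y) = F(x) - F(y) - ⟨∇F(y), x - y⟩ for two choices of the convex function F. This is a minimal illustration of the general definition, not code from the paper; all function names here are ours.

```python
import math

def bregman(F, grad_F, x, y):
    """Bregman divergence D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    g = grad_F(y)
    return F(x) - F(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

# F(x) = 0.5 * ||x||^2 recovers the squared Euclidean distance:
# D_F(x, y) = 0.5 * ||x - y||^2.
def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def half_sq_norm_grad(x):
    return list(x)

# F(x) = sum_i x_i log x_i (negative entropy) recovers, for probability
# vectors x and y, the relative entropy D_F(x, y) = sum_i x_i log(x_i / y_i).
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]
```

For example, `bregman(half_sq_norm, half_sq_norm_grad, x, y)` agrees with half the squared Euclidean distance between `x` and `y`, and `bregman(neg_entropy, neg_entropy_grad, x, y)` agrees with the KL divergence when `x` and `y` sum to one.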
