On the Learnability and Design of Output Codes for Multiclass Problems

Output coding is a general framework for solving multiclass categorization problems. Previous research on output codes has focused on building multiclass machines given predefined output codes. In this paper we discuss, for the first time, the problem of designing output codes for multiclass problems. For the design problem of discrete codes, which have been used extensively in previous work, we present mostly negative results. We then introduce the notion of continuous codes and cast the design problem of continuous codes as a constrained optimization problem. We describe three optimization problems corresponding to three different norms of the code matrix. Interestingly, for the l2 norm our formalism results in a quadratic program whose dual does not depend on the length of the code. A special case of our formalism provides a multiclass scheme for building support vector machines that can be solved efficiently. We give a time- and space-efficient algorithm for solving the quadratic program. Preliminary experiments with synthetic data show that our algorithm is often two orders of magnitude faster than standard quadratic programming packages. We conclude with a discussion of the generalization properties of the algorithm.
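
The multiclass support vector machine special case can be made concrete with a short sketch. Each row of a real-valued code matrix `M` serves as the code word of one class. A point is decoded to the row with the largest inner product, and `M` is fit by trading an l2 penalty on `M` against a multiclass hinge loss. This is our reading of that special case, not the authors' code. We assume the binary learners are the identity map, and we substitute plain stochastic subgradient descent for the paper's quadratic-programming algorithm. The function names, toy data, step size `eta`, and penalty `beta` are all hypothetical.

```python
"""Sketch of an l2-regularized linear multiclass SVM viewed as a continuous
output code: row M[r] of the code matrix is the code word of class r.

Assumptions (hypothetical, not taken from the paper): the binary learners are
the identity map h(x) = x, similarity is the inner product, and the primal
objective is minimized by plain stochastic subgradient descent rather than by
the paper's quadratic-programming algorithm.
"""
from typing import List, Sequence

Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def predict(M: Matrix, x: Sequence[float]) -> int:
    """Decode x as the class whose code row has the largest inner product."""
    scores = [dot(row, x) for row in M]
    return max(range(len(M)), key=lambda r: scores[r])


def multiclass_hinge(M: Matrix, x: Sequence[float], y: int) -> float:
    """max_r (M_r.x + 1 - delta(y, r)) - M_y.x, i.e. the slack xi for (x, y).

    It is zero exactly when every wrong class scores at least 1 below class y.
    """
    scores = [dot(row, x) for row in M]
    worst = max(scores[r] + (0.0 if r == y else 1.0) for r in range(len(M)))
    return worst - scores[y]


def objective(M: Matrix, X, Y, beta: float) -> float:
    """(beta / 2) * ||M||_F^2 + sum of the per-example slacks."""
    reg = 0.5 * beta * sum(v * v for row in M for v in row)
    return reg + sum(multiclass_hinge(M, x, y) for x, y in zip(X, Y))


def train(X, Y, k: int, beta: float = 0.01, eta: float = 0.01,
          epochs: int = 500) -> Matrix:
    """Stochastic subgradient descent on objective(); defaults are arbitrary."""
    n, d = len(X), len(X[0])
    M = [[0.0] * d for _ in range(k)]
    for _ in range(epochs):
        for x, y in zip(X, Y):
            scores = [dot(row, x) for row in M]
            # Class attaining the max in the margin-augmented hinge term.
            r_star = max(range(k),
                         key=lambda r: scores[r] + (0.0 if r == y else 1.0))
            # Regularizer: each example carries 1/n of its gradient beta * M.
            shrink = 1.0 - eta * beta / n
            for row in M:
                for j in range(d):
                    row[j] *= shrink
            # Hinge term: pull the true code row toward x, push r_star away.
            if r_star != y:
                for j in range(d):
                    M[y][j] += eta * x[j]
                    M[r_star][j] -= eta * x[j]
    return M


if __name__ == "__main__":
    # Toy, linearly separable data; the last feature is a constant bias term.
    X = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [-2.0, -2.0, 1.0]]
    Y = [0, 1, 2]
    M = train(X, Y, k=3)
    print("predictions:", [predict(M, x) for x in X])
    print("objective:", round(objective(M, X, Y, beta=0.01), 4))
```

Stochastic subgradient descent was chosen here only for brevity. With a constant step it settles near the optimum of the objective rather than solving the quadratic program exactly. Swapping in an exact solver would change how `M` is found, not how `predict` decodes with it.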