Handwritten Character Classification Using Nearest Neighbor in Large Databases

This paper shows that systems built on a simple statistical technique and a large training database can be automatically optimized to produce classification accuracies of 99% in the domain of handwritten digits. It also shows that the performance of these systems scales consistently with the size of the training database: the error rate is cut by more than half for every tenfold increase in the size of the training set, from 10 to 100,000 examples. Three distance metrics for the standard nearest neighbor classification system are investigated: a simple Hamming distance metric, a pixel distance metric, and a metric based on the extraction of penstroke features. Systems employing these metrics were trained and tested on a standard, publicly available database of nearly 225,000 digits provided by the National Institute of Standards and Technology. Additionally, a confidence measure is introduced by the authors and is also discovered and optimized automatically by the system. The new confidence measure proves to be superior to the commonly used nearest neighbor distance.
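As an illustration of the simplest of the three metrics, the following is a minimal sketch of nearest neighbor classification under Hamming distance. It is not the authors' implementation: the function names, the 3x3 toy images, and the use of NumPy are all assumptions made for this example, and real digit images would be far larger.

```python
import numpy as np

def hamming_distances(query, train):
    """Count the differing pixels between one binary image and each training image."""
    return np.count_nonzero(train != query, axis=1)

def nearest_neighbor_classify(query, train, labels):
    """Return (label, distance) of the training example closest to the query."""
    distances = hamming_distances(query, train)
    best = int(np.argmin(distances))
    return int(labels[best]), int(distances[best])

# Hypothetical toy data: flattened 3x3 binary "images" (assumption, not NIST data).
train = np.array([
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # crude "0"
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # crude "1"
], dtype=np.uint8)
labels = np.array([0, 1])

# A noisy "1" with one extra pixel set in the bottom-right corner.
query = np.array([0, 1, 0, 0, 1, 0, 0, 1, 1], dtype=np.uint8)

label, dist = nearest_neighbor_classify(query, train, labels)
print(label, dist)
```

In this sketch the nearest neighbor distance is returned alongside the label because it is the commonly used confidence signal that the abstract compares against; the new confidence measure itself is not described in the abstract and is not reproduced here.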
