Similarity computing model of high dimension data for symptom classification of Chinese traditional medicine

In recent years, researchers have paid increasing attention to data mining in practical applications. To address the problem of symptom classification in traditional Chinese medicine, this paper proposes a novel computing model that uses the similarities among the attributes of high-dimensional data to compute the similarity between any two tuples. The model treats data attributes as basis vectors in an m-dimensional space and each tuple as the vector sum of its attribute vectors. Based on prior (a priori) concept-similarity information among attributes, it introduces a novel distance algorithm to compute the similarity distance between any pair of attribute vectors. In this method, the similarity between two tuples is expressed through formulas over the attribute vectors and their projections onto one another, so the similarity between any pair of tuples can be obtained by evaluating these vectors and formulas. The paper also presents a novel classification algorithm based on the similarity computing model and applies it to symptom classification in traditional Chinese medicine. Extensive experiments demonstrate the efficiency of the algorithm.
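
The abstract does not give the model's formulas, so the following Python sketch is only one possible reading of the idea, not the paper's algorithm. It assumes the prior attribute similarities are supplied as a symmetric matrix S, where S[i][j] plays the role of the inner product between the unit vectors of attributes i and j. A tuple is then the weighted sum of its attribute vectors, and the similarity of two tuples is taken to be the cosine of the angle between their sum vectors. The names gram_inner, tuple_similarity, and classify_nearest are hypothetical, and the nearest-neighbour rule is a stand-in for the classification algorithm, which the abstract does not specify.

```python
import math

def gram_inner(x, y, S):
    """Inner product of two tuples viewed as sums of attribute vectors.

    x, y: lists of attribute weights (one per attribute).
    S:    assumed prior similarity matrix; S[i][j] is treated as the
          inner product of the unit vectors for attributes i and j.
    """
    return sum(x[i] * S[i][j] * y[j]
               for i in range(len(x))
               for j in range(len(y)))

def tuple_similarity(x, y, S):
    """Cosine similarity between the sum vectors of two tuples (assumed form)."""
    norm_x = math.sqrt(gram_inner(x, x, S))
    norm_y = math.sqrt(gram_inner(y, y, S))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an all-zero tuple has no direction to compare
    return gram_inner(x, y, S) / (norm_x * norm_y)

def classify_nearest(query, labeled, S):
    """Return the label of the most similar labeled tuple (illustrative rule)."""
    best_tuple, best_label = max(
        labeled, key=lambda item: tuple_similarity(query, item[0], S))
    return best_label

# Hypothetical example: attributes 0 and 1 are closely related concepts,
# attribute 2 is unrelated to both.
S = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
labeled = [([0, 1, 0], "A"), ([0, 0, 1], "B")]
print(tuple_similarity([1, 0, 0], [0, 1, 0], S))
print(classify_nearest([1, 0, 0], labeled, S))
```

Under this assumption, two tuples that share no attribute can still be similar when their attributes are related concepts, which a plain Euclidean or cosine measure over independent axes would miss.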
