Adaptive imputation of missing values for incomplete pattern classification

In the classification of incomplete patterns, the missing values can either play a crucial role in determining the class or have little (possibly no) influence on the result, depending on the context. We propose a credal classification method for incomplete patterns with adaptive imputation of missing values, based on belief function theory. We first try to classify the object (the incomplete pattern) using only the available attribute values. The underlying principle is that the missing information is not crucial for classification if a specific class can be found for the object from the available information alone; in this case, the object is committed to that class. If the object cannot be classified without ambiguity, however, the missing values are taken to play a major role in achieving an accurate classification. The missing values are then imputed using K-nearest neighbor (K-NN) and Self-Organizing Map (SOM) techniques, and the edited pattern with the imputed values is classified. The (original or edited) pattern is classified with respect to each training class, and the classification results, represented as basic belief assignments, are fused with appropriate combination rules to produce the credal classification. The object may belong, with different masses of belief, to specific classes and to meta-classes (particular disjunctions of several single classes). Credal classification captures the uncertainty and imprecision of the classification and, thanks to the introduction of meta-classes, reduces the misclassification rate. The effectiveness of the proposed method relative to classical methods is demonstrated in several experiments on artificial and real data sets.
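The adaptive strategy described in the abstract (classify on the available attributes first, impute only when the result is ambiguous) can be sketched roughly as follows. This is a simplified illustration, not the method of the paper: it replaces the belief-function ambiguity test with a hypothetical distance-ratio threshold (`ratio`), uses class prototypes (`prototypes`) instead of per-class classifiers, and imputes with a plain K-NN mean over pooled training rows without any SOM step. All function and parameter names are assumptions made for this example.

```python
import math

def class_distances(x, prototypes):
    """Distance from x to each class prototype over the observed attributes only."""
    observed = [i for i, v in enumerate(x) if v is not None]
    return {
        label: math.sqrt(sum((x[i] - proto[i]) ** 2 for i in observed))
        for label, proto in prototypes.items()
    }

def knn_impute(x, training, k=3):
    """Fill missing attributes with the mean of the k nearest complete training rows."""
    observed = [i for i, v in enumerate(x) if v is not None]

    def dist(row):
        return math.sqrt(sum((x[i] - row[i]) ** 2 for i in observed))

    neighbors = sorted(training, key=dist)[:k]
    return [
        v if v is not None else sum(n[i] for n in neighbors) / len(neighbors)
        for i, v in enumerate(x)
    ]

def adaptive_classify(x, prototypes, training, ratio=0.5, k=3):
    """Return (label, pattern); the pattern is imputed only if the first pass is ambiguous."""
    d = class_distances(x, prototypes)
    ranked = sorted(d, key=d.get)
    best, second = ranked[0], ranked[1]
    # Hypothetical ambiguity test: the best class must be clearly closer than the runner-up.
    if d[best] < ratio * d[second]:
        return best, x
    # Ambiguous: the missing values are assumed to matter, so impute and classify again.
    filled = knn_impute(x, training, k)
    d = class_distances(filled, prototypes)
    return min(d, key=d.get), filled

# Toy data (invented for illustration): two classes, two attributes.
prototypes = {"A": [1.0, 0.0], "B": [3.0, 10.0]}
training = [[0.5, 0.0], [1.0, 1.0], [1.5, 0.0], [2.5, 10.0], [3.0, 9.0], [3.5, 10.0]]

print(adaptive_classify([0.9, None], prototypes, training))  # decided without imputation
print(adaptive_classify([2.2, None], prototypes, training))  # ambiguous, so imputed first
```

In the first call the observed attribute alone separates the classes, so the pattern is returned with its missing value untouched; in the second call the two classes are about equally close, so the missing attribute is estimated before the final decision.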
Highlights
Missing values are adaptively imputed in classification according to context.
SOM and K-NN are used for the imputation with admissible computation burden.
Ensemble classifier is introduced for credal classification.
The imprecision of classification can be well captured using belief functions.
The proposed method has been tested by artificial and real data sets.
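To make the role of basic belief assignments and meta-classes concrete, here is a minimal sketch of fusing two assignments with Dempster's rule of combination. The abstract only says that "proper combination rules" are used, so choosing Dempster's rule here is an assumption for illustration; the class names `w1` and `w2` and the mass values are invented. Each focal element is a `frozenset` of class labels, and a set with more than one label plays the role of a meta-class.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two basic belief assignments (dict: frozenset of classes -> mass)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    # Normalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

W1 = frozenset({"w1"})
W2 = frozenset({"w2"})
W12 = frozenset({"w1", "w2"})  # meta-class: "w1 or w2"

m1 = {W1: 0.6, W12: 0.4}  # first source leans toward w1, partly undecided
m2 = {W2: 0.5, W12: 0.5}  # second source leans toward w2, partly undecided

fused = dempster_combine(m1, m2)
for focal, mass in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(mass, 4))
```

The mass left on the meta-class after fusion expresses that the evidence cannot separate `w1` from `w2`, which is the kind of imprecision a credal classification keeps explicit instead of forcing a single-class decision.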
