Label-Driven Learning Framework: Towards More Accurate Bayesian Network Classifiers through Discrimination of High-Confidence Labels

Bayesian network classifiers (BNCs) have demonstrated competitive classification accuracy in a variety of real-world applications. However, BNCs are prone to error when discriminating among high-confidence labels. To address this issue, we propose the label-driven learning framework, which incorporates instance-based learning and ensemble learning. For each testing instance, high-confidence labels are first selected by a generalist classifier, e.g., the tree-augmented naive Bayes (TAN) classifier. Conditional mutual information is then redefined over these labels to measure the mutual dependence between attributes more precisely, yielding a refined generalist with a more reasonable network structure. To enable finer discrimination, an expert classifier is tailored to each high-confidence label. Finally, the predictions of the refined generalist and the experts are aggregated. We extend TAN to LTAN (Label-driven TAN) by applying the proposed framework. Extensive experimental results demonstrate that LTAN achieves higher classification accuracy than not only several state-of-the-art single-structure BNCs but also some established ensemble BNCs, at the cost of reasonable computational overhead.
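To make the first two steps of the pipeline concrete, below is a minimal Python sketch. It assumes, as illustrative choices not taken from the paper, that the high-confidence labels are the smallest set covering 95% of the TAN posterior mass, and that the redefined conditional mutual information simply restricts the class variable's support to that set; `select_high_confidence` and `restricted_cmi` are hypothetical helper names.

```python
# A minimal sketch of the label-driven steps, not the paper's exact design.
# Assumptions: the 0.95 posterior-mass rule for selecting high-confidence
# labels, and restricting the class support in the conditional mutual
# information estimate to the selected labels.
import numpy as np

def select_high_confidence(posterior, mass=0.95):
    """Smallest label set whose generalist (TAN) posterior mass exceeds `mass`."""
    order = np.argsort(posterior)[::-1]  # labels, most probable first
    k = int(np.searchsorted(np.cumsum(posterior[order]), mass)) + 1
    return order[:k]

def restricted_cmi(X, y, i, j, labels):
    """Estimate I(X_i; X_j | C) with C restricted to the high-confidence
    `labels` (an assumed reading of the abstract's redefinition)."""
    mask = np.isin(y, labels)
    Xi, Xj, yc = X[mask, i], X[mask, j], y[mask]
    cmi = 0.0
    for c in labels:
        sel = yc == c
        if not sel.any():
            continue
        pc = sel.mean()  # p(c) within the restricted label set
        for a in np.unique(Xi[sel]):
            for b in np.unique(Xj[sel]):
                pab = np.mean((Xi == a) & (Xj == b) & sel)  # p(a, b, c)
                pa = np.mean((Xi == a) & sel)               # p(a, c)
                pb = np.mean((Xj == b) & sel)               # p(b, c)
                if pab > 0:
                    # p(a,b|c) / (p(a|c) p(b|c)) simplifies to pab*pc/(pa*pb)
                    cmi += pab * np.log(pab * pc / (pa * pb))
    return cmi
```

The refined generalist would re-run TAN's Chow-Liu structure-learning step with `restricted_cmi` in place of the usual conditional mutual information, and one expert classifier would then be trained per selected label before aggregating predictions; those steps depend on TAN internals and are omitted here.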
