Safety-aware Graph-based Semi-Supervised Learning

Abstract In the machine learning field, Graph-based Semi-Supervised Learning (GSSL) has recently attracted much attention, and researchers have proposed many different methods. GSSL typically constructs a k-nearest-neighbor graph to exploit the manifold structure of the data, which may improve learning performance. However, if an inappropriate graph is used to learn a semi-supervised classifier, the classifier may perform worse than a supervised learning (SL) classifier trained only on the labeled samples. Hence, it is worthwhile to design a safe version that broadens the application area of GSSL. In this paper, we introduce a Safety-aware GSSL (SaGSSL) method which adaptively selects good graphs and learns a safe semi-supervised classifier simultaneously. The basic assumption is that a graph is of high quality if the sample margin obtained by GSSL with that graph is larger than the margin obtained by SL. By identifying the high-quality graphs and assigning them large weights, the predictions of our algorithm approach those of GSSL with these graphs. Meanwhile, the weights of the low-quality graphs are kept small, so that the predictions of our algorithm stay close to those of SL. Hence the probability of performance degeneration is reduced, and our algorithm is expected to achieve safe exploitation of different graphs. Experimental results on several datasets show that our algorithm can simultaneously perform graph selection and safely exploit the unlabeled samples.
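The weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the SaGSSL optimization itself: the function names `mean_margin` and `safe_combine`, the use of the average margin on labeled samples only, and the rule of normalizing positive margin gains into weights are all assumptions made for illustration. Labels are assumed to be in {-1, +1}, and the SL and GSSL decision scores are assumed to be computed elsewhere.

```python
import numpy as np


def mean_margin(scores, labels):
    """Average signed margin y * f(x); labels are assumed to be in {-1, +1}."""
    return float(np.mean(np.asarray(labels) * np.asarray(scores)))


def safe_combine(sl_scores, graph_scores, labeled_idx, labels):
    """Hypothetical safe blend of per-graph GSSL scores with an SL baseline.

    sl_scores:    decision scores of the supervised baseline, shape (n_samples,)
    graph_scores: GSSL decision scores, one row per graph, shape (n_graphs, n_samples)
    labeled_idx:  indices of the labeled samples
    labels:       their labels in {-1, +1}
    """
    sl_scores = np.asarray(sl_scores, dtype=float)
    graph_scores = np.asarray(graph_scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Margin of the supervised baseline on the labeled samples.
    base = mean_margin(sl_scores[labeled_idx], labels)

    # A graph counts as "high quality" only if its margin exceeds the baseline.
    gains = np.array(
        [mean_margin(g[labeled_idx], labels) - base for g in graph_scores]
    )
    gains = np.maximum(gains, 0.0)

    total = gains.sum()
    if total == 0.0:
        # No graph beats SL: fall back to the supervised predictions.
        return np.zeros(len(graph_scores)), sl_scores.copy()

    # Low-quality graphs get weight 0; the rest share weight by margin gain.
    weights = gains / total
    return weights, weights @ graph_scores


# Toy usage with made-up scores: one helpful graph and one harmful graph.
sl = np.array([0.2, -0.2, 0.1, -0.1])
good = np.array([0.8, -0.8, 0.5, -0.5])
bad = np.array([-0.5, 0.5, -0.3, 0.3])
weights, pred = safe_combine(sl, np.vstack([good, bad]), [0, 1], [1, -1])
```

In this toy case the harmful graph receives zero weight, so the blended scores coincide with those of the helpful graph; if every graph had a smaller margin than SL, the function would return the SL scores unchanged.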
