Learning from Proportions of Positive and Unlabeled Examples

Weakly supervised classification learns from data sets whose labels are uncertain or incomplete. Many problems, each with its own form of partial labeling, fit this description. In this paper, the novel problem of learning from positive-unlabeled proportions is presented. The provided examples are unlabeled, and the only class information available consists of the proportions of positive and unlabeled examples in different subsets of the training data set. We present a methodology that adapts to different levels of class uncertainty and learns Bayesian network classifiers using an expectation-maximization strategy. The method is tested in a variety of artificial scenarios with different degrees of class uncertainty and compared with two naive strategies that do not use all the available class information. Finally, it is successfully applied to real data collected from the embryo selection problem in assisted reproduction.
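To make the expectation-maximization idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. It assumes discrete features, a naive Bayes structure rather than a general Bayesian network classifier, that the k known positives of each subset (bag) are a uniformly random size-k subset of its examples, and that the remaining examples are unlabeled draws from the population. All names (`fit_pu_proportions`, `labeled_membership`, `predict_pos`, etc.) are hypothetical. The E-step computes, for every example, the probability that it is one of the k labeled positives (via elementary symmetric polynomials) and, otherwise, its posterior probability of being positive; the M-step refits Laplace-smoothed tables from those soft counts.

```python
import math
from typing import Dict, List, Sequence, Tuple

Example = Tuple[int, ...]        # discrete feature values; feature j lies in range(card[j])
Bag = Tuple[List[Example], int]  # (examples, k = how many of them are known positives)
Resp = Tuple[float, float, float]  # (P(labeled), P(unlabeled & +), P(unlabeled & -))
Tables = Dict[int, List[List[float]]]  # tables[c][j][v] = P(x_j = v | y = c)


def elementary_symmetric(values: Sequence[float], k: int) -> float:
    """e_k(values): sum over all size-k subsets of the product of their members."""
    if k < 0:
        return 0.0
    e = [1.0] + [0.0] * k
    for v in values:
        for j in range(k, 0, -1):
            e[j] += v * e[j - 1]
    return e[k]


def labeled_membership(ratios: List[float], k: int) -> List[float]:
    """P(example i is one of the k labeled positives) when P(subset) is
    proportional to the product of its ratios P(x_i | +) / P(x_i)."""
    scale = max(ratios)  # the result is scale-invariant; rescaling avoids overflow
    r = [v / scale for v in ratios]
    denom = elementary_symmetric(r, k)
    return [r[i] * elementary_symmetric(r[:i] + r[i + 1:], k - 1) / denom
            for i in range(len(r))]


def likelihood(x: Example, tables: Tables, c: int) -> float:
    return math.prod(tables[c][j][v] for j, v in enumerate(x))


def m_step(xs: List[Example], resp: List[Resp], card: List[int], alpha: float = 1.0):
    """Refit the naive Bayes tables and the class prior from soft counts."""
    u_pos = sum(r[1] for r in resp)
    u_all = sum(r[1] + r[2] for r in resp)
    prior = (u_pos + alpha) / (u_all + 2 * alpha)  # P(+) among unlabeled examples
    weights = {1: [r[0] + r[1] for r in resp], 0: [r[2] for r in resp]}
    tables: Tables = {}
    for c, ws in weights.items():
        tables[c] = []
        for j, size in enumerate(card):
            counts = [alpha] * size  # Laplace smoothing
            for x, w in zip(xs, ws):
                counts[x[j]] += w
            total = sum(counts)
            tables[c].append([n / total for n in counts])
    return prior, tables


def e_step(bags: List[Bag], prior: float, tables: Tables) -> List[Resp]:
    resp: List[Resp] = []
    for examples, k in bags:
        pos = [likelihood(x, tables, 1) for x in examples]
        neg = [likelihood(x, tables, 0) for x in examples]
        marg = [prior * p + (1 - prior) * q for p, q in zip(pos, neg)]
        p_lab = labeled_membership([p / m for p, m in zip(pos, marg)], k)
        for p, m, pl in zip(pos, marg, p_lab):
            post = prior * p / m  # P(y = + | x) for an unlabeled example
            resp.append((pl, (1 - pl) * post, (1 - pl) * (1 - post)))
    return resp


def fit_pu_proportions(bags: List[Bag], card: List[int], iters: int = 100):
    xs = [x for examples, _ in bags for x in examples]
    resp: List[Resp] = []
    for examples, k in bags:  # start from uniform membership k/n, 50/50 otherwise
        pl = k / len(examples)
        resp += [(pl, (1 - pl) / 2, (1 - pl) / 2)] * len(examples)
    for _ in range(iters):
        prior, tables = m_step(xs, resp, card)
        resp = e_step(bags, prior, tables)
    return m_step(xs, resp, card)


def predict_pos(x: Example, prior: float, tables: Tables) -> float:
    p, q = likelihood(x, tables, 1), likelihood(x, tables, 0)
    return prior * p / (prior * p + (1 - prior) * q)


# Toy data: one binary feature; three bags with 5, 0 and 3 known positives.
bags = [([(1,)] * 10, 5), ([(0,)] * 10, 0), ([(1,)] * 6 + [(0,)] * 4, 3)]
prior, tables = fit_pu_proportions(bags, card=[2])
print(predict_pos((1,), prior, tables), predict_pos((0,), prior, tables))
```

The known count k acts only as a lower bound on the positives in a bag: examples outside the k labeled ones can still be positive through the unlabeled branch, which is what distinguishes this setting from plain label proportions. A richer Bayesian network structure would mainly change `likelihood` and `m_step` in this sketch.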
