A Novel Weakly Supervised Problem: Learning from Positive-Unlabeled Proportions

Standard supervised classification learns a classifier from a set of fully labeled examples. In weakly supervised classification, by contrast, several frameworks have been proposed in which the training data cannot be labeled with certainty. This paper presents the novel problem of learning from positive-unlabeled proportions: the training examples themselves are unlabeled, and the only class information available is the proportion of positive and unlabeled examples in different subsets of the training dataset. An expectation-maximization method that learns Bayesian network classifiers from this kind of data is proposed, and a set of experiments is designed to shed light on the feasibility of learning from such data across scenarios of increasing complexity.
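The expectation-maximization idea described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm: it assumes a simple naive Bayes model over binary features, treats each subset's known proportion as the fraction of positive examples in that bag, and uses synthetic data. The E-step computes per-example posteriors of the positive class using the bag proportion as a prior; the M-step re-estimates the conditional feature probabilities from those soft assignments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not the paper's experimental design):
# binary features generated from a naive Bayes model with two classes.
n_feat = 5
theta = {0: rng.uniform(0.1, 0.4, n_feat),   # true P(x_j=1 | y=0)
         1: rng.uniform(0.6, 0.9, n_feat)}   # true P(x_j=1 | y=1)

def sample_bag(n, pos_frac):
    """Draw a bag of n unlabeled examples with a known positive proportion."""
    y = (rng.random(n) < pos_frac).astype(int)
    probs = np.where(y[:, None] == 1, theta[1], theta[0])
    X = (rng.random((n, n_feat)) < probs).astype(int)
    return X, pos_frac  # labels y are discarded; only the proportion is kept

bags = [sample_bag(200, p) for p in (0.2, 0.5, 0.8)]

# EM for a naive Bayes classifier when only per-bag proportions are known.
p1 = np.full(n_feat, 0.5)  # estimate of P(x_j=1 | y=1)
p0 = np.full(n_feat, 0.5)  # estimate of P(x_j=1 | y=0)
for _ in range(50):
    resp_all, X_all = [], []
    for X, pos_frac in bags:
        # E-step: posterior of y=1, with the bag proportion as the prior.
        l1 = np.prod(np.where(X == 1, p1, 1 - p1), axis=1) * pos_frac
        l0 = np.prod(np.where(X == 1, p0, 1 - p0), axis=1) * (1 - pos_frac)
        resp_all.append(l1 / (l1 + l0))
        X_all.append(X)
    r = np.concatenate(resp_all)
    Xs = np.vstack(X_all)
    # M-step: responsibility-weighted ML updates with Laplace smoothing.
    p1 = (r @ Xs + 1) / (r.sum() + 2)
    p0 = ((1 - r) @ Xs + 1) / ((1 - r).sum() + 2)

print(np.round(p1, 2))  # should approach theta[1]
print(np.round(p0, 2))  # should approach theta[0]
```

Note that the differing proportions across bags are what break the symmetry of the uniform initialization: already in the first E-step, examples in high-proportion bags receive larger positive responsibilities, which pulls `p1` toward the positive-class parameters.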
