A Note on the Behavior of Majority Voting in Multi-Class Domains with Biased Annotators

Majority voting is a popular and robust strategy for aggregating different opinions in learning from crowds, where each worker labels examples according to their own criteria. Although it has been studied extensively in the binary case, its behavior with multiple classes is not completely clear, particularly when annotations are biased. This paper attempts to fill that gap. The behavior of the majority voting strategy is studied in depth in multi-class domains, with emphasis on the effect of annotation bias. Through a comprehensive experimental setting, we show the limitations of the standard majority voting strategy. Three simple techniques that infer global information from the annotations and the annotators allow us to put the performance of majority voting in context.
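As an illustration only, the basic aggregation rule can be sketched as follows. This is a minimal sketch of plain majority voting, not the experimental code of the paper. The annotation data, the class names, and the tie-breaking rule (smallest label wins) are assumptions made for this example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label (assumed rule)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


# Hypothetical annotations: one row per example, one column per annotator.
annotations = [
    ["a", "a", "b"],
    ["b", "c", "b"],
    ["c", "a", "a"],  # suppose the true class is "c" and two annotators lean towards "a"
]

consensus = [majority_vote(row) for row in annotations]
print(consensus)
```

In the third row, the shared bias of two annotators outvotes the single annotator who chose "c", which is the kind of failure that annotation bias can cause with more than two classes.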
