Learning to Screen

Imagine a large firm with multiple departments that plans a large-scale recruitment. Candidates arrive one by one, and for each candidate the firm decides, based on her data (CV, skills, experience, etc.), whether to invite her for an interview. The firm wants to recruit the best candidates while minimizing the number of interviews. We model such scenarios as an assignment problem between items (candidates) and categories (departments): the items arrive one by one in an online manner, and upon processing each item the algorithm decides, based on its value and the categories it can be matched with, whether to retain or discard it (this decision is irrevocable). The goal is to retain as few items as possible while guaranteeing that the set of retained items contains an optimal matching. We consider two variants of this problem: (i) in the first variant, it is assumed that the $n$ items are drawn independently from an unknown distribution $D$; (ii) in the second variant, it is assumed that before the process starts, the algorithm has access to a training set of $n$ items drawn independently from the same unknown distribution (e.g., data of candidates from previous recruitment seasons). We give tight bounds on the minimum possible number of retained items in each of these variants. These results demonstrate that one can retain exponentially fewer items in the second variant (with the training set).
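
As a minimal illustration of the online model (not the algorithm from this work), consider the hypothetical special case of a single department with a single opening, so that an optimal matching is just one item of maximum value. One simple rule that always keeps an optimal item is to retain an item exactly when its value exceeds every value seen so far; any discarded item is then dominated by an earlier retained one. For $n$ values drawn i.i.d. from a continuous distribution, the expected number of such running maxima is the harmonic number $H_n \approx \ln n$. The function names `screen_records` and `harmonic` are hypothetical and used only for this sketch.

```python
import random
from typing import List


def screen_records(values: List[float]) -> List[int]:
    """Hypothetical special case: one category with one slot.

    Processes values online and retains an item (returns its index) iff its
    value is strictly larger than every value seen so far. The last retained
    item therefore has the maximum value, so the retained set always contains
    an optimal single-slot assignment. Decisions are never revised.
    """
    retained = []
    best = float("-inf")  # value of the most recently retained item
    for i, v in enumerate(values):
        if v > best:
            # New running maximum: it might be the optimum, so keep it.
            retained.append(i)
            best = v
        # Otherwise discard irrevocably: an already retained item is at least as good.
    return retained


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, the expected number of running maxima
    among n i.i.d. draws from a continuous distribution."""
    return sum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 1000, 1000
    avg = sum(
        len(screen_records([rng.random() for _ in range(n)])) for _ in range(trials)
    ) / trials
    print(f"average retained: {avg:.2f}, H_n = {harmonic(n):.2f}")
```

This baseline uses no information about the distribution; the point of the sketch is only to make the retain/discard model and the "retained set contains an optimum" requirement concrete for the simplest case.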
