Bundle Selling by Online Estimation of Valuation Functions

We consider the problem of online selection of a bundle of items when the cost of each item changes arbitrarily from round to round and the valuation function is initially unknown and revealed only through the noisy values of selected bundles (the bandit feedback setting). We are interested in learning schemes that have small regret compared to an agent who knows the true valuation function. Since the number of possible bundles is exponential in the number of items, further assumptions on the valuation function are needed. We assume that the valuation function is supermodular and that its non-linear interactions are of low degree in a certain sense. We develop efficient learning algorithms that balance exploration and exploitation to achieve low regret in this setting.
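
The abstract does not spell out the model, so the following is only a minimal sketch of the setting under stated assumptions, not the authors' algorithm. It assumes a degree-2 valuation, v(S) = sum of a[i] over items in S plus sum of b[(i, j)] over pairs in S, with every b[(i, j)] >= 0; non-negative pairwise terms make v supermodular, and this is our illustrative stand-in for "low degree". Costs are per item and per round, and regret is the cumulative gap between the best net utility max_S (v(S) - c_t(S)) and the net utility of the bundle actually chosen. The names valuation, all_bundles, is_supermodular, utility, best_utility and regret are hypothetical. Brute force over all 2^n bundles is used only to define the benchmark for tiny n.

```python
import itertools


# Hypothetical model (an assumption, not the paper's definition): item i has
# stand-alone value a[i]; each pair (i, j), i < j, adds b[(i, j)] >= 0.
def valuation(bundle, a, b):
    """Value of a bundle given as a frozenset of item indices."""
    singles = sum(a[i] for i in bundle)
    pairs = sum(b.get(pair, 0.0)
                for pair in itertools.combinations(sorted(bundle), 2))
    return singles + pairs


def all_bundles(n):
    """Enumerate all 2**n bundles (exponential, so only for tiny n)."""
    for r in range(n + 1):
        for combo in itertools.combinations(range(n), r):
            yield frozenset(combo)


def is_supermodular(n, v):
    """Check v(A) + v(B) <= v(A | B) + v(A & B) for every pair of bundles."""
    bundles = list(all_bundles(n))
    return all(v(A) + v(B) <= v(A | B) + v(A & B) + 1e-12
               for A in bundles for B in bundles)


def utility(bundle, v, cost):
    """Net utility of buying `bundle` at this round's per-item costs."""
    return v(bundle) - sum(cost[i] for i in bundle)


def best_utility(n, v, cost):
    """Best net utility for an agent that knows v (brute-force benchmark)."""
    return max(utility(s, v, cost) for s in all_bundles(n))


def regret(n, v, costs, chosen):
    """Cumulative regret of bundles chosen[t] under per-round costs[t]."""
    return sum(best_utility(n, v, cost) - utility(s, v, cost)
               for cost, s in zip(costs, chosen))


# Usage on a toy instance: items 0 and 1 are complements.
a = [1.0, 1.0, 1.0]
b = {(0, 1): 2.0}


def v(s):
    return valuation(s, a, b)


costs = [[1.5, 1.5, 1.5], [0.5, 3.5, 0.25]]
chosen = [frozenset({0}), frozenset({0, 2})]
print(is_supermodular(3, v))       # True
print(regret(3, v, costs, chosen))  # 1.5
```

In round 1 the best bundle is {0, 1} (net utility 1.0) while {0} earns -0.5; in round 2 the chosen bundle {0, 2} is optimal, so the total regret is 1.5. Observe also that a degree-2 valuation is linear in the n + n(n-1)/2 indicator features x_i and x_i x_j, which is one plausible reason a low-degree assumption can make estimation from noisy bandit feedback tractable; the sketch does not implement a learner.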
