Dueling Convex Optimization

We address the problem of convex optimization with preference (dueling) feedback. Like the traditional optimization objective, the goal is to find the optimal point with the least possible query complexity, however, without the luxury of even a zeroth order feedback. Instead, the learner can only observe a single noisy bit which is win-loss feedback for a pair of queried points based on their function values. The problem is certainly of great practical relevance as in many real-world scenarios, such as recommender systems or learning from customer preferences, where the system feedback is often restricted to just one binary-bit preference information. We consider the problem of online convex optimization (OCO) solely by actively querying {0, 1} noisy-comparison feedback of decision point pairs, with the objective of finding a near-optimal point (function minimizer) with the least possible number of queries. For the non-stationary OCO setup, where the underlying convex function may change over time, we prove an impossibility result towards achieving the above objective. We next focus only on the stationary OCO problem, and our main contribution lies in designing a normalized gradient descent based algorithm towards finding a -best optimal point. Towards this, our algorithm is shown to yield a convergence rate of Õ(dβ/ ν) (ν being the noise parameter) when the underlying function is β-smooth. Further we show an improved convergence rate of just Õ(dβ/αν log 1 ) when the function is additionally also α-strongly convex.

[1]  Amin Karbasi,et al.  Projection-Free Bandit Convex Optimization , 2018, AISTATS.

[2]  Ohad Shamir,et al.  An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback , 2015, J. Mach. Learn. Res..

[3]  Yin Tat Lee,et al.  Kernel-based methods for bandit convex optimization , 2016, STOC.

[4]  Thorsten Joachims,et al.  Reducing Dueling Bandits to Cardinal Bandits , 2014, ICML.

[5]  Elad Hazan,et al.  Introduction to Online Convex Optimization , 2016, Found. Trends Optim..

[6]  Anit Kumar Sahu,et al.  Towards Gradient Free and Projection Free Stochastic Optimization , 2018, AISTATS.

[7]  Eyke Hüllermeier,et al.  A Survey of Preference-Based Online Learning with Bandit Algorithms , 2014, ALT.

[8]  Robert D. Nowak,et al.  Query Complexity of Derivative-Free Optimization , 2012, NIPS.

[9]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[10]  Hiroshi Nakagawa,et al.  Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem , 2015, COLT.

[11]  Wataru Kumagai Regret Analysis for Continuous Dueling Bandit , 2017, NIPS.

[12]  Yuval Peres,et al.  Bandit Convex Optimization: \(\sqrt{T}\) Regret in One Dimension , 2015, COLT.

[13]  Ambuj Tewari,et al.  Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback , 2011, AISTATS.

[14]  Adam Tauman Kalai,et al.  Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.

[15]  Huasen Wu,et al.  Double Thompson Sampling for Dueling Bandits , 2016, NIPS.

[16]  Elad Hazan,et al.  An optimal algorithm for stochastic strongly-convex optimization , 2010, 1006.2425.

[17]  Elad Hazan,et al.  Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.

[18]  Lin Xiao,et al.  Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. , 2010, COLT 2010.

[19]  Saeed Ghadimi,et al.  Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..

[20]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[21]  Mehryar Mohri,et al.  Optimistic Bandit Convex Optimization , 2016, NIPS.

[22]  Elad Hazan,et al.  Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.

[23]  Yuanzhi Li,et al.  An optimal algorithm for bandit convex optimization , 2016, ArXiv.

[24]  Sham M. Kakade,et al.  Stochastic Convex Optimization with Bandit Feedback , 2011, SIAM J. Optim..

[25]  Ronen Eldan,et al.  Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff , 2015, NIPS.

[26]  Sivaraman Balakrishnan,et al.  Stochastic Zeroth-order Optimization in High Dimensions , 2017, AISTATS.

[27]  Thorsten Joachims,et al.  Interactively optimizing information retrieval systems as a dueling bandits problem , 2009, ICML '09.

[28]  Sébastien Bubeck,et al.  Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..

[29]  Yurii Nesterov,et al.  Random Gradient-Free Minimization of Convex Functions , 2015, Foundations of Computational Mathematics.

[30]  Vianney Perchet,et al.  Highly-Smooth Zero-th Order Online Optimization , 2016, COLT.