Battle of Bandits: Online Learning from Relative Preferences


Aditya Gopalan


Indian Institute of Science, Bangalore


Tuesday, 12 May 2020, 14:00


  • A-201 (STCS Seminar Room)


We consider the problem of sequentially learning good alternatives from among a pool, but with only relative utility feedback from adaptively chosen subsets. At each round, the learner chooses a subset of alternatives and can observe which ones are preferred over the others in the subset. This type of feedback is natural in several domains, especially where human preferences are elicited in a repeated fashion ("Which of A, B, C, D do you prefer?"), e.g., the design of surveys and expert reviews, web search and recommender systems, and other settings like ranking in multiplayer games. Tranditional approaches such as the multi-armed bandit model only absolute utility feedback, and are thus inadequate to express relative choices. The dueling bandit (Yue-Joachims'09) is a more recent attempt to model online learning with pairwise preferences, but the more general, realistic, and combinatorially harder case of working with preferences expressed over subsets has largely been unexplored. We take a step in this direction and formulate what we call the battling bandit problem, where one seeks to learn an optimal item or ranking of n items by sequentially choosing up to size-k subsets at each round and exploiting relative preferences arising from a choice model such as the well-known Plackett-Luce probability model. We study variants of learning objectives from subsetwise feedback: Identifying the best item, the set of top-k items, full ranking etc., in both the probably approximately correct (PAC) or regret optimization setting, and design algorithms with optimality properties. This is joint work with Aadirupa Saha (Indian Institute of Science).

Aditya Gopalan is an Assistant Professor and INSPIRE Faculty Fellow at the Indian Institute of Science, Dept. of Electrical Communication Engineering. He received the Ph.D. degree in electrical engineering from The University of Texas at Austin, and the B.Tech. and M.Tech. degrees in electrical engineering from the Indian Institute of Technology Madras. He was an Andrew and Erna Viterbi Post-Doctoral Fellow at the Technion-Israel Institute of Technology. His research interests include machine learning and statistical inference, control, and resource allocation algorithms.