Regret-minimization algorithms typically rely on constructing tight confidence intervals for means of the underlying distributions. We will also look at new anytime-valid confidence intervals for means, which are based on the empirical likelihood principle. Algorithms using the MGF-based concentration of the empirical mean for bounded support distributions (Auer et al., 2002), or of the robust estimators for mean, like the truncated empirical mean in the case of heavy-tailed distributions (Bubeck et al., 2013), exist in the literature. We will see exactly where the framework of the optimal algorithm gains over these existing algorithms that use MGF-based confidence intervals for the mean.
If time permits, we will also look at a mixture-martingale-based proof for the validity of the proposed confidence intervals.
This talk is based on joint work with Sandeep Juneja and Wouter M. Koolen (the paper can be found here). I will not assume any prerequisites beyond a basic understanding of probability theory.