Skip to Main content Skip to Navigation
Conference papers

Thompson Sampling : an asymptotically optimal finite time analysis

Abstract : The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
Complete list of metadata

https://hal.telecom-paris.fr/hal-02286442
Contributor : Telecomparis Hal Connect in order to contact the contributor
Submitted on : Friday, September 13, 2019 - 3:45:17 PM
Last modification on : Tuesday, October 19, 2021 - 11:16:12 AM

Identifiers

  • HAL Id : hal-02286442, version 1

Collections

Citation

Emilie Kaufmann, Nathaniel Korda, Rémi Munos. Thompson Sampling : an asymptotically optimal finite time analysis. International Conference on Algorithmic Learning Theory, Nov 2012, Lyon, France. pp.199-213. ⟨hal-02286442⟩

Share

Metrics

Record views

73