Thompson Sampling: an asymptotically optimal finite time analysis

Emilie Kaufmann; Nathaniel Korda; Rémi Munos

Conference paper, Year: 2012


Emilie Kaufmann

Role: Author
PersonId: 10422
IdHAL: emilie-kaufmann
ORCID: 0000-0002-5496-824X
IdRef: 197040810

Signal, Statistique et Apprentissage

Département Traitement du Signal et des Images

Nathaniel Korda

Role: Author

Rémi Munos

Role: Author

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933. In this paper we answer it positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret. The proof is accompanied by a numerical comparison with other optimal policies, experiments that have been lacking in the literature until now for the Bernoulli case.
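For readers unfamiliar with the policy discussed in the abstract, the following is a minimal sketch of Thompson Sampling for Bernoulli rewards, not the authors' code. It assumes the usual uniform Beta(1, 1) prior on each arm's mean; the function name `thompson_sampling`, its parameters, and the regret bookkeeping are illustrative choices. At each round the policy draws one sample from every arm's Beta posterior, plays the arm with the largest sample, and updates that arm's success or failure count.

```python
import random


def thompson_sampling(means, horizon, seed=0):
    """Simulate Thompson Sampling on a Bernoulli bandit (illustrative sketch).

    means   : true success probabilities of the arms (unknown to the policy)
    horizon : number of rounds to play
    Returns the cumulative (pseudo-)regret and the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # observed rewards equal to 1, per arm
    failures = [0] * k   # observed rewards equal to 0, per arm
    best = max(means)
    regret = 0.0

    for _ in range(horizon):
        # Posterior of arm a under a Beta(1, 1) prior is
        # Beta(successes[a] + 1, failures[a] + 1); draw one sample per arm.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(k), key=samples.__getitem__)

        # Observe a Bernoulli reward from the chosen arm and update counts.
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

        # Accumulate the gap between the best arm and the arm played.
        regret += best - means[arm]

    pulls = [s + f for s, f in zip(successes, failures)]
    return regret, pulls


if __name__ == "__main__":
    total_regret, pulls = thompson_sampling([0.1, 0.9], horizon=2000)
    print("cumulative regret:", total_regret)
    print("pulls per arm:", pulls)
```

The regret tracked here is the sum of gaps between the best arm's mean and the mean of each arm played, which is the quantity the Lai and Robbins lower bound constrains to grow at least logarithmically in the horizon.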

Keywords

Thompson Sampling; Bandits

Domains

Machine Learning [stat.ML]; Theory [stat.TH]


https://telecom-paris.hal.science/hal-02286442

Submitted on: Friday, September 13, 2019, 15:45:17

Last modified on: Monday, October 9, 2023, 12:49:39

Dates and versions

hal-02286442, version 1 (13-09-2019)

Identifiers

HAL Id: hal-02286442, version 1

Cite

Emilie Kaufmann, Nathaniel Korda, Rémi Munos. Thompson Sampling: an asymptotically optimal finite time analysis. International Conference on Algorithmic Learning Theory, Nov 2012, Lyon, France. pp. 199-213. ⟨hal-02286442⟩


Collections

INSTITUT-TELECOM CNRS PARISTECH LTCI IDS S2A
