Abstract: Thompson Sampling has been demonstrated in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case. Here we extend them by proving asymptotic optimality of the algorithm using the Jeffreys prior for one-dimensional exponential family bandits. Our proof builds on previous work, but also makes extensive use of closed forms for the Kullback-Leibler divergence and the Fisher information (through the Jeffreys prior) available in an exponential family. This allows us to give a finite-time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right. Moreover, our analysis covers some distributions for which no optimistic algorithm has yet been proposed, including heavy-tailed exponential families.
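To make the setting concrete, the sketch below (not code from the paper) runs Thompson Sampling with the Jeffreys prior on Bernoulli arms, the simplest one-dimensional exponential family covered by the analysis: for a Bernoulli mean the Jeffreys prior is Beta(1/2, 1/2), so posterior sampling reduces to drawing from Beta distributions. The arm means, horizon and random seed are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # hypothetical Bernoulli arm means (illustrative only)
horizon = 10000

successes = np.zeros(len(true_means))
failures = np.zeros(len(true_means))

for t in range(horizon):
    # Jeffreys prior for a Bernoulli mean is Beta(1/2, 1/2); after s successes
    # and f failures the posterior is Beta(s + 1/2, f + 1/2).
    theta = rng.beta(successes + 0.5, failures + 0.5)
    arm = int(np.argmax(theta))                  # play the arm with the largest posterior sample
    reward = rng.binomial(1, true_means[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

pulls = successes + failures
pseudo_regret = horizon * true_means.max() - pulls @ true_means
print(f"pseudo-regret after {horizon} rounds: {pseudo_regret:.1f}")

For other one-dimensional exponential families (e.g. exponential or Poisson rewards), only the prior/posterior update changes; the sampling-and-argmax loop is identical.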
https://hal.telecom-paris.fr/hal-02288407
Nathaniel Korda, Emilie Kaufmann, Rémi Munos. Thompson Sampling for one-dimensional exponential family bandits. NIPS 2013 - Neural Information Processing Systems Conference, Dec 2013, Lake Tahoe, United States. ⟨hal-02288407⟩