M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, An introduction to variational methods for graphical models, Machine Learning, vol.37, pp.183-233, 1999.

M. J. Beal, Variational algorithms for approximate Bayesian inference, PhD thesis, University College London, 2003.

M. Opper and O. Winther, Gaussian processes for classification: Mean-field algorithms, Neural Computation, vol.12, issue.11, pp.2655-2684, 2000.

T. P. Minka, Expectation propagation for approximate Bayesian inference, Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, UAI'01, pp.362-369, 2001.

S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statist., vol.22, issue.1, pp.79-86, 1951.

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, Variational inference: A review for statisticians, Journal of the American Statistical Association, vol.112, issue.518, pp.859-877, 2017.

C. Zhang, J. Bütepage, H. Kjellström, and S. Mandt, Advances in variational inference, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.41, issue.8, pp.2008-2026, 2019.

J. Paisley, D. Blei, and M. Jordan, Variational Bayesian inference with stochastic search, Proceedings of the 29th International Conference on Machine Learning, pp.1363-1370, 2012.

R. Ranganath, S. Gerrish, and D. Blei, Black box variational inference, Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, vol.33, 2014.

R. Ranganath, D. Tran, and D. Blei, Hierarchical variational models, Proceedings of The 33rd International Conference on Machine Learning, vol.48, 2016.

M. Yin and M. Zhou, Semi-implicit variational inference, Proceedings of the 35th International Conference on Machine Learning, vol.80, 2018.

H. Zhu and R. Rohwer, Bayesian invariant measurements of generalization, Neural Processing Letters, vol.2, pp.28-31, 1995.

H. Zhu and R. Rohwer, Information geometric measurements of generalisation, 1995.

A. Rényi, On measures of entropy and information, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol.1, pp.547-561, 1961.

T. van Erven and P. Harremoës, Rényi divergence and Kullback-Leibler divergence, IEEE Transactions on Information Theory, vol.60, issue.7, pp.3797-3820, 2014.

T. Minka, Divergence measures and message passing, Technical report, Microsoft Research, 2005.

T. Minka, Power EP, Technical report, Microsoft Research, 2004.

J. M. Hernández-Lobato, Y. Li, M. Rowland, T. Bui, D. Hernández-Lobato et al., Black-box alpha divergence minimization, Proceedings of The 33rd International Conference on Machine Learning, vol.48, 2016.

Y. Li and R. E. Turner, Rényi divergence variational inference, Advances in Neural Information Processing Systems, vol.29, pp.1073-1081, 2016.

A. B. Dieng, D. Tran, R. Ranganath, J. Paisley, and D. Blei, Variational inference via $\chi$ upper bound minimization, Advances in Neural Information Processing Systems, vol.30, pp.2732-2741, 2017.

R. Bamler, C. Zhang, M. Opper, and S. Mandt, Perturbative black box variational inference, Advances in Neural Information Processing Systems, vol.30, pp.5079-5088, 2017.

D. Wang, H. Liu, and Q. Liu, Variational inference with tail-adaptive f-divergence, Advances in Neural Information Processing Systems, vol.31, pp.5737-5747, 2018.

M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley, Stochastic variational inference, Journal of Machine Learning Research, vol.14, issue.4, pp.1303-1347, 2013.

L. Bottou, Large-scale machine learning with stochastic gradient descent, Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT'2010), pp.177-187, 2010.

H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., vol.22, issue.3, pp.400-407, 1951.

Y. Li, J. M. Hernández-Lobato, and R. E. Turner, Stochastic expectation propagation, Advances in Neural Information Processing Systems, vol.28, pp.2323-2331, 2015.

G. Dehaene and S. Barthelmé, Expectation propagation in the large data limit, Journal of the Royal Statistical Society: Series B, vol.80, pp.197-217, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01235066

D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, vol.3, pp.993-1022, 2003.

I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud. Akad. Mat. Kutató Int. Közl., vol.8, pp.85-108, 1963.

T. Morimoto, Markov processes and the H-theorem, Journal of the Physical Society of Japan, vol.18, issue.3, pp.328-331, 1963.

Y. Hsieh, C. Liu, and V. Cevher, Finding mixed Nash equilibria of generative adversarial networks, Proceedings of the 36th International Conference on Machine Learning, 2019.

T. S. Jaakkola and M. I. Jordan, Improving the mean field approximation via the use of mixture distributions, Learning in Graphical Models, NATO ASI Series (Series D: Behavioural and Social Sciences), vol.89, 1998.

S. Gershman, M. D. Hoffman, and D. M. Blei, Nonparametric variational inference, Proceedings of the 29th International Conference on Machine Learning, 2012.

A. Cichocki and S. Amari, Families of alpha-, beta- and gamma-divergences: Flexible and robust measures of similarities, Entropy, vol.12, issue.6, pp.1532-1568, 2010.

A. Cichocki, S. Cruces, and S. Amari, Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization, Entropy, vol.13, issue.1, pp.134-170, 2011.

I. Sason, On f-divergences: Integral representations, local behavior, and inequalities, Entropy, vol.20, issue.5, p.383, 2018.

E. Hellinger, Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen, Journal für die reine und angewandte Mathematik, vol.136, pp.210-271, 1909.

B. G. Lindsay, Efficiency versus robustness: The case for minimum Hellinger distance and related methods, Ann. Statist., vol.22, issue.2, pp.1081-1114, 1994.

S. Bubeck, Convex optimization: Algorithms and complexity. Foundations and Trends® in Machine Learning, vol.8, pp.231-357, 2015.

A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003.

A. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems, SIAM Journal on Optimization, vol.15, pp.229-251, 2004.

A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, vol.19, issue.4, pp.1574-1609, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00976649

B.-E. Chérief-Abdellatif, P. Alquier, and M. E. Khan, A generalization bound for online variational inference, Proceedings of The Eleventh Asian Conference on Machine Learning, vol.101, pp.662-677, 2019.

R. Douc, A. Guillin, J. Marin, and C. P. Robert, Convergence of adaptive mixtures of importance sampling schemes, Ann. Statist, vol.35, issue.1, pp.420-448, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00432955

C. J. Stone, Optimal global rates of convergence for nonparametric regression, Ann. Statist, vol.10, issue.4, pp.1040-1053, 1982.

M. Oh and J. Berger, Adaptive importance sampling in Monte Carlo integration, Journal of Statistical Computation and Simulation, vol.41, issue.3-4, pp.143-168, 1992.

T. Kloek and H. K. van Dijk, Bayesian estimates of equation system parameters: An application of integration by Monte Carlo, Econometrica: Journal of the Econometric Society, pp.1-19, 1978.

N. Chopin, Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference, Ann. Statist., vol.32, issue.6, pp.2385-2411, 2004.
URL : https://hal.archives-ouvertes.fr/hal-02403337

B. Delyon and F. Portier, Safe and adaptive importance sampling: a mixture approach, 2019.

H. L. Royden and P. Fitzpatrick, Real Analysis, Prentice Hall, 2010.

C.2. General Dominated Convergence Theorem. We state and prove a generalized version of the Dominated Convergence Theorem.

Theorem 15 (General Dominated Convergence Theorem). Let $\mu \in \mathrm{M}_1(\mathsf{T})$. Assume there exist three sequences $(a_M)$, $(b_M)$, $(c_M)$ of $(\mathcal{T}, \mathcal{B}(\mathbb{R}))$-measurable functions such that the limits $\lim_{M \to \infty} a_M(\theta)$, $\lim_{M \to \infty} b_M(\theta)$ and $\lim_{M \to \infty} c_M(\theta)$ exist for $\mu$-almost all $\theta \in \mathsf{T}$.
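In the standard form of this result (Royden and Fitzpatrick, 2010), the assumptions are completed, in the notation above, by requiring that
$$
a_M \leqslant b_M \leqslant c_M \quad \mu\text{-a.e. for all } M, \qquad
\lim_{M \to \infty} \int_{\mathsf{T}} a_M \,\mathrm{d}\mu = \int_{\mathsf{T}} \lim_{M \to \infty} a_M \,\mathrm{d}\mu < \infty, \qquad
\lim_{M \to \infty} \int_{\mathsf{T}} c_M \,\mathrm{d}\mu = \int_{\mathsf{T}} \lim_{M \to \infty} c_M \,\mathrm{d}\mu < \infty,
$$
and the conclusion is then
$$
\lim_{M \to \infty} \int_{\mathsf{T}} b_M \,\mathrm{d}\mu = \int_{\mathsf{T}} \lim_{M \to \infty} b_M \,\mathrm{d}\mu .
$$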

LTCI, Télécom Paris, Place Marguerite Perey, Palaiseau.
E-mail: kamelia.daudel@telecom-paris.fr; francois.portier@telecom-paris.fr

SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, 9 rue Charles Fourier, 91120.