F. Alvarez, On the minimizing property of a second order dissipative system in Hilbert spaces, SIAM Journal on Control and Optimization, vol.38, pp.1102-1119, 2000.

V. Apidopoulos, J. Aujol, C. Dossal, and A. Rondepierre, Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01965095

H. Attouch, Z. Chbani, J. Peypouquet, and P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming, vol.168, pp.123-175, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01821929

H. Attouch, X. Goudou, and P. Redont, The heavy ball with friction method, I. The continuous dynamical system: Global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system, Communications in Contemporary Mathematics, vol.2, pp.1-34, 2000.

J. Aujol, C. Dossal, and A. Rondepierre, Optimal convergence rates for Nesterov acceleration, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01786117

L. Balles and P. Hennig, Dissecting Adam: The sign, magnitude and variance of stochastic gradients, Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.

A. Barakat and P. Bianchi, Convergence of the Adam algorithm from a dynamical system viewpoint, 2018.

A. Basu, S. De, A. Mukherjee, and E. Ullah, Convergence guarantees for RMSProp and Adam in non-convex optimization and their comparison to Nesterov acceleration on autoencoders, 2018.

M. Benaïm, Dynamics of stochastic approximation algorithms, Séminaire de Probabilités, XXXIII, vol.1709, pp.1-68, 1999.

M. Benaïm and M. W. Hirsch, Asymptotic pseudotrajectories and chain recurrent flows, with applications, J. Dynam. Differential Equations, vol.8, pp.141-176, 1996.

M. Benaïm and S. J. Schreiber, Ergodic properties of weak asymptotic pseudotrajectories for semiflows, J. Dynam. Differential Equations, vol.12, pp.579-598, 2000.

J. Bernstein, Y. Wang, K. Azizzadenesheli, and A. Anandkumar, signSGD: Compressed optimisation for non-convex problems, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.560-569, 2018.

P. Bianchi, W. Hachem, and A. Salim, Constant step stochastic approximations involving differential inclusions: Stability, long-run convergence and applications, Stochastics, pp.288-320, 2019.

A. Cabot, H. Engler, and S. Gadat, On the long time behavior of second order differential equations with asymptotically small dissipation, Transactions of the American Mathematical Society, vol.361, pp.5983-6017, 2009.

A. Cabot, H. Engler, and S. Gadat, Second-order differential equations with asymptotically small dissipation and piecewise flat potentials, Electronic Journal of Differential Equations, vol.17, pp.33-38, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00628516

X. Chen, S. Liu, R. Sun, and M. Hong, On the convergence of a class of Adam-type algorithms for non-convex optimization, International Conference on Learning Representations, 2019.

A. Belotto da Silva and M. Gazeau, A general system of differential equations to model first order adaptive algorithms, 2018.

J. Duchi, E. Hazan, and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.

R. Fletcher, A new approach to variable metric algorithms, The Computer Journal, vol.13, pp.317-322, 1970.

J. Fort and G. Pagès, Asymptotic behavior of a Markovian stochastic algorithm with constant step, SIAM J. Control Optim, vol.37, pp.1456-1482, 1999.

S. Gadat and F. Panloup, Long time behaviour and stationary regime of memory gradient diffusions, Annales de l'IHP Probabilités et statistiques, vol.50, pp.564-601, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00757068

S. Gadat, F. Panloup, and S. Saadane, Stochastic heavy ball, Electronic Journal of Statistics, vol.12, pp.461-529, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01402683

A. Haraux, Systèmes dynamiques dissipatifs et applications, vol.17, 1991.

P. Hartman, Ordinary Differential Equations: Second Edition, Classics in Applied Mathematics, 1982.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.

H. J. Kushner and G. G. Yin, Stochastic approximation and recursive algorithms and applications, vol.35, 2003.

L. Ljung, Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control, vol.22, pp.551-575, 1977.

M. C. Mukkamala and M. Hein, Variants of RMSProp and Adagrad with logarithmic regret bounds, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2545-2553, 2017.

Y. E. Nesterov, A method for solving the convex programming problem with convergence rate O(1/k^2), Dokl. Akad. Nauk SSSR, vol.269, pp.543-547, 1983.

B. Polyak and P. Shcherbakov, Lyapunov functions: An optimization theory perspective, IFAC-PapersOnLine, vol.50, pp.7456-7461, 2017.

B. T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, pp.1-17, 1964.

S. J. Reddi, S. Kale, and S. Kumar, On the convergence of Adam and beyond, International Conference on Learning Representations, 2018.

H. Robbins and S. Monro, A stochastic approximation method, Herbert Robbins Selected Papers, pp.102-109, 1985.

G. Roth and W. H. Sandholm, Stochastic approximations with constant step size and differential inclusions, SIAM J. Control Optim, vol.51, pp.525-555, 2013.

T. Schaul, S. Zhang, and Y. LeCun, No more pesky learning rates, International Conference on Machine Learning, pp.343-351, 2013.

B. Shi, S. Du, M. I. Jordan, and W. J. Su, Understanding the acceleration phenomenon via high-resolution differential equations, 2018.

W. Su, S. Boyd, and E. J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Journal of Machine Learning Research, vol.17, pp.1-43, 2016.

T. Tieleman and G. Hinton, Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, vol.4, pp.26-31, 2012.

R. Ward, X. Wu, and L. Bottou, AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018.

A. Wibisono, A. C. Wilson, and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proceedings of the National Academy of Sciences, vol.113, pp.7351-7358, 2016.

A. C. Wilson, B. Recht, and M. I. Jordan, A Lyapunov analysis of momentum methods in optimization, 2016.

M. Zaheer, S. J. Reddi, D. Sachan, S. Kale, and S. Kumar, Adaptive methods for nonconvex optimization, Advances in Neural Information Processing Systems, pp.9793-9803, 2018.

D. Zhou, Y. Tang, Z. Yang, Y. Cao, and Q. Gu, On the convergence of adaptive gradient methods for nonconvex optimization, 2018.