On the minimizing property of a second order dissipative system in Hilbert spaces, SIAM Journal on Control and Optimization, vol.38, pp.1102-1119, 2000.
Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01965095
Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming, vol.168, pp.123-175, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01821929
The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system, Communications in Contemporary Mathematics, vol.2, pp.1-34, 2000.
Optimal convergence rates for Nesterov acceleration, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01786117
Dissecting Adam: The sign, magnitude and variance of stochastic gradients, Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
Convergence of the Adam algorithm from a dynamical system viewpoint, 2018.
Convergence guarantees for RMSProp and Adam in non-convex optimization and their comparison to Nesterov acceleration on autoencoders, 2018.
Dynamics of stochastic approximation algorithms, Séminaire de Probabilités, XXXIII, vol.1709, pp.1-68, 1999.
Asymptotic pseudotrajectories and chain recurrent flows, with applications, J. Dynam. Differential Equations, vol.8, pp.141-176, 1996.
Ergodic properties of weak asymptotic pseudotrajectories for semiflows, J. Dynam. Differential Equations, vol.12, pp.579-598, 2000.
signSGD: Compressed optimisation for non-convex problems, Proceedings of the 35th International Conference on Machine Learning, vol.80, pp.560-569, 2018.
Constant step stochastic approximations involving differential inclusions: Stability, long-run convergence and applications, Stochastics, pp.288-320, 2019.
On the long time behavior of second order differential equations with asymptotically small dissipation, Transactions of the American Mathematical Society, vol.361, pp.5983-6017, 2009.
Second-order differential equations with asymptotically small dissipation and piecewise flat potentials, Electronic Journal of Differential Equations, vol.17, pp.33-38, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00628516
On the convergence of a class of Adam-type algorithms for non-convex optimization, International Conference on Learning Representations, 2019.
A general system of differential equations to model first order adaptive algorithms, p.31, 2018.
Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, vol.12, pp.2121-2159, 2011.
A new approach to variable metric algorithms, The Computer Journal, vol.13, pp.317-322, 1970.
Asymptotic behavior of a Markovian stochastic algorithm with constant step, SIAM J. Control Optim., vol.37, pp.1456-1482, 1999.
Long time behaviour and stationary regime of memory gradient diffusions, Annales de l'IHP Probabilités et statistiques, vol.50, pp.564-601, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00757068
Stochastic heavy ball, Electronic Journal of Statistics, vol.12, pp.461-529, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01402683
Systèmes dynamiques dissipatifs et applications, vol.17, 1991.
Ordinary Differential Equations: Second Edition, Classics in Applied Mathematics, 1982.
Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.
Stochastic approximation and recursive algorithms and applications, vol.35, 2003.
Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control, vol.22, pp.551-575, 1977.
Variants of RMSProp and Adagrad with logarithmic regret bounds, Proceedings of the 34th International Conference on Machine Learning, vol.70, pp.2545-2553, 2017.
A method for solving the convex programming problem with convergence rate O(1/k^2), Dokl. Akad. Nauk SSSR, vol.269, pp.543-547, 1983.
Lyapunov functions: An optimization theory perspective, IFAC-PapersOnLine, vol.50, pp.7456-7461, 2017.
Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, pp.1-17, 1964.
On the convergence of Adam and beyond, International Conference on Learning Representations, 2018.
A stochastic approximation method, Herbert Robbins Selected Papers, pp.102-109, 1985.
Stochastic approximations with constant step size and differential inclusions, SIAM J. Control Optim., vol.51, pp.525-555, 2013.
No more pesky learning rates, International Conference on Machine Learning, pp.343-351, 2013.
Understanding the acceleration phenomenon via high-resolution differential equations, 2018.
A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Journal of Machine Learning Research, vol.17, pp.1-43, 2016.
Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude, Coursera: Neural Networks for Machine Learning, vol.4, pp.26-31, 2012.
Adagrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018.
A variational perspective on accelerated methods in optimization, Proceedings of the National Academy of Sciences, vol.113, pp.7351-7358, 2016.
A Lyapunov analysis of momentum methods in optimization, 2016.
Adaptive methods for nonconvex optimization, Advances in Neural Information Processing Systems, pp.9793-9803, 2018.
On the convergence of adaptive gradient methods for nonconvex optimization, 2018.