A. C. Aitken, On Bernoulli's numerical solution of algebraic equations, Proceedings of the Royal Society of Edinburgh, vol.46, pp.289-305, 1927.

Z. Allen-Zhu, Katyusha: The first direct acceleration of stochastic gradient methods, 2016.

Z. Allen-Zhu and L. Orecchia, Linear coupling: An ultimate unification of gradient and mirror descent, Proceedings of the 8th Innovations in Theoretical Computer Science, vol.17, 2017.

D. G. Anderson, Iterative procedures for nonlinear integral equations, Journal of the ACM (JACM), vol.12, issue.4, pp.547-560, 1965.
DOI : 10.1145/321296.321305

U. M. Ascher, S. J. Ruuth, and B. T. Wetton, Implicit-explicit methods for time-dependent partial differential equations, SIAM Journal on Numerical Analysis, vol.32, issue.3, pp.797-823, 1995.
DOI : 10.1137/0732037

A. Beck and M. Teboulle, Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003.
DOI : 10.1016/s0167-6377(02)00231-6

A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542

URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf

A. Ben-Tal and A. Nemirovski, Lectures on modern convex optimization: analysis, algorithms, and engineering applications, 2001.

R. Bollapragada, D. Mudigere, J. Nocedal, H. M. Shi, and P. T. Tang, A progressive batching L-BFGS method for machine learning, 2018.

J. Bolte, A. Daniilidis, and A. S. Lewis, The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM Journal on Optimization, vol.17, issue.4, pp.1205-1223, 2007.

S. Boyd and L. Vandenberghe, Convex optimization, 2004.

C. Brezinski, Accélération de la convergence en analyse numérique, vol.584, 2006.

S. Bubeck, Y. T. Lee, and M. Singh, A geometric alternative to Nesterov's accelerated gradient descent, 2015.

J. Butcher, Thirty years of G-stability, BIT Numerical Mathematics, vol.46, issue.3, pp.479-489, 2006.

S. Cabay and L. Jackson, A polynomial extrapolation method for finding limits and antilimits of vector sequences, SIAM Journal on Numerical Analysis, vol.13, issue.5, pp.734-752, 1976.

G. Dahlquist, G-stability is equivalent to A-stability, BIT Numerical Mathematics, vol.18, issue.4, pp.384-401, 1978.

G. Dahlquist, On one-leg multistep methods, SIAM Journal on Numerical Analysis, vol.20, issue.6, pp.1130-1138, 1983.

A. Defazio, F. Bach, and S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843

L. Deng, J. Li, J. Huang, K. Yao, D. Yu et al., Recent advances in deep learning for speech research at Microsoft, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.8604-8608, 2013.

Y. Drori and M. Teboulle, Performance of first-order methods for smooth convex minimization: a novel approach, Mathematical Programming, pp.451-482, 2014.

J. C. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari, Composite objective mirror descent, COLT, pp.14-26, 2010.

J. Durbin, The fitting of time-series models, Revue de l'Institut International de Statistique, pp.233-244, 1960.

R. Eddy, Extrapolating to the limit of a vector sequence, Information Linkage between Applied Mathematics and Industry, pp.387-396, 1979.

H. Fang and Y. Saad, Two classes of multisecant methods for nonlinear acceleration, Numerical Linear Algebra with Applications, vol.16, issue.3, pp.197-221, 2009.

O. Fercoq and Z. Qu, Restarting accelerated gradient methods with a rough strong convexity estimate, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02287730

N. Flammarion and F. Bach, From averaging to acceleration, there is only a step-size, Conference on Learning Theory, pp.658-695, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01136945

J. Frank, W. Hundsdorfer, and J. Verwer, On the stability of implicit-explicit linear multistep methods, Applied Numerical Mathematics, vol.25, issue.2-3, pp.193-205, 1997.

W. Gautschi, Numerical analysis, 2011.

G. H. Golub and C. F. Van Loan, Matrix computations, vol.3, 2012.

G. H. Golub and R. S. Varga, Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Numerische Mathematik, vol.3, issue.1, pp.157-168, 1961.

R. P. Gorman and T. J. Sejnowski, Analysis of hidden units in a layered network trained to classify sonar targets, Neural networks, vol.1, issue.1, pp.75-89, 1988.

P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski et al., Accurate, large minibatch SGD: Training ImageNet in 1 hour, 2017.

I. Guyon, SIDO: A pharmacology dataset, 2008.

I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Feature extraction: foundations and applications, vol.207, 2008.

E. Hazan, 2014.

G. Heinig and K. Rost, Fast algorithms for Toeplitz and Hankel matrices, Linear Algebra and its Applications, vol.435, issue.1, pp.1-59, 2011.

P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, Averaging weights leads to wider optima and better generalization, 2018.

R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, pp.315-323, 2013.

J. Kiefer and J. Wolfowitz, Stochastic estimation of the maximum of a regression function, The Annals of Mathematical Statistics, pp.462-466, 1952.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

W. Krichene, A. Bayen, and P. L. Bartlett, Accelerated mirror descent in continuous and discrete time, Advances in Neural Information Processing Systems, pp.2845-2853, 2015.

J. B. Lasserre, Global optimization with polynomials and the problem of moments, SIAM Journal on Optimization, vol.11, issue.3, pp.796-817, 2001.

L. Lessard, B. Recht, and A. Packard, Analysis and design of optimization algorithms via integral quadratic constraints, SIAM Journal on Optimization, vol.26, issue.1, p.101, 2016.

N. Levinson, The Wiener RMS error criterion in filter design and prediction, Appendix B of N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, 1949.

H. Lin, J. Mairal, and Z. Harchaoui, A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems, pp.3384-3392, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728

M. Massias, J. Salmon, and A. Gramfort, Celer: a fast solver for the lasso with dual extrapolation, International Conference on Machine Learning, pp.3321-3330, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01833398

M. Mešina, Convergence acceleration for the iterative solution of the equations X = AX + f, Computer Methods in Applied Mechanics and Engineering, vol.10, issue.2, pp.165-173, 1977.

E. Moulines and F. Bach, Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041

A. Nedić and D. Bertsekas, Convergence rate of incremental subgradient algorithms, Stochastic Optimization: Algorithms and Applications, pp.223-264, 2001.

A. Nemirovskii and Y. E. Nesterov, Optimal methods of smooth convex minimization, USSR Computational Mathematics and Mathematical Physics, vol.25, issue.2, pp.21-30, 1985.
DOI : 10.1016/0041-5553(85)90100-4

A. S. Nemirovskii and B. T. Polyak, Iterative methods for solving linear ill-posed problems under precise information, ENG. CYBER, issue.4, pp.50-56, 1984.

Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k²), Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.

Y. Nesterov, Squared functional systems and optimization problems, High performance optimization, pp.405-440, 2000.
DOI : 10.1007/978-1-4757-3216-0_17

Y. Nesterov, Gradient methods for minimizing composite objective function, 2007.
DOI : 10.1007/s10107-012-0629-5

Y. Nesterov, Introductory lectures on convex optimization: A basic course, vol.87, 2013.
DOI : 10.1007/978-1-4419-8853-9

Y. Nesterov, Universal gradient methods for convex optimization problems, Mathematical Programming, vol.152, issue.1-2, pp.381-404, 2015.
DOI : 10.1007/s10107-014-0790-0

Y. Nesterov and B. T. Polyak, Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.108, issue.1, pp.177-205, 2006.
DOI : 10.1007/s10107-006-0706-8

P. A. Parrilo, Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization, 2000.

B. T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, issue.5, pp.1-17, 1964.

B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046

S. J. Reddi, S. Kale, and S. Kumar, On the convergence of Adam and beyond, International Conference on Learning Representations, 2018.

R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization, vol.14, issue.5, pp.877-898, 1976.
DOI : 10.1137/0314056

URL : http://www.math.washington.edu/~rtr/papers/rtr-MonoOpProxPoint.pdf

M. Schmidt, N. Le Roux, and F. Bach, Minimizing finite sums with the stochastic average gradient, Mathematical Programming, pp.1-30, 2013.
DOI : 10.1007/s10107-016-1030-6

URL : https://hal.archives-ouvertes.fr/hal-00860051

D. Scieur, A. d'Aspremont, and F. Bach, Regularized nonlinear acceleration, Advances in Neural Information Processing Systems, pp.712-720, 2016.
DOI : 10.1007/s10107-018-1319-8

URL : https://hal.archives-ouvertes.fr/hal-01384682

D. Scieur, F. Bach, and A. d'Aspremont, Nonlinear acceleration of stochastic algorithms, Advances in Neural Information Processing Systems, pp.3985-3994, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01618379

D. Scieur, V. Roulet, F. Bach, and A. d'Aspremont, Integration methods and optimization algorithms, Advances in Neural Information Processing Systems, pp.1109-1118, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01474045

D. Scieur, E. Oyallon, A. d'Aspremont, and F. Bach, Nonlinear acceleration of CNNs, Workshop track of the International Conference on Learning Representations (ICLR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01805251

D. Scieur, E. Oyallon, A. d'Aspremont, and F. Bach, Nonlinear acceleration of deep neural networks, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01799269

S. Shalev-Shwartz and T. Zhang, Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013.

S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, ICML, pp.64-72, 2014.

D. Shanks, Non-linear transformations of divergent and slowly convergent sequences, Studies in Applied Mathematics, vol.34, issue.1-4, pp.1-42, 1955.

A. Sidi, W. F. Ford, and D. A. Smith, Acceleration of convergence of vector sequences, SIAM Journal on Numerical Analysis, vol.23, issue.1, pp.178-196, 1986.

D. A. Smith, W. F. Ford, and A. Sidi, Extrapolation methods for vector sequences, SIAM review, vol.29, issue.2, pp.199-233, 1987.

W. Su, S. Boyd, and E. Candes, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Advances in Neural Information Processing Systems, pp.2510-2518, 2014.

E. Süli and D. F. Mayers, An introduction to numerical analysis, 2003.

T. Tieleman and G. Hinton, Lecture 6.5 - RMSprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, vol.4, pp.26-31, 2012.

E. E. Tyrtyshnikov, How bad are Hankel matrices? Numerische Mathematik, vol.67, pp.261-269, 1994.

A. Wibisono, A. C. Wilson, and M. I. Jordan, A variational perspective on accelerated methods in optimization, Proceedings of the National Academy of Sciences, p.201614734, 2016.

A. C. Wilson, B. Recht, and M. I. Jordan, A Lyapunov analysis of momentum methods in optimization, 2016.

P. Wynn, On a device for computing the e_m(S_n) transformation, Mathematical Tables and Other Aids to Computation, pp.91-96, 1956.

G. Zhang and A. Xiao, Stability and convergence analysis of implicit-explicit one-leg methods for stiff delay differential equations, International Journal of Computer Mathematics, vol.93, issue.11, pp.1964-1983, 2016.

Z. Zhou, J. Wu, and W. Tang, Ensembling neural networks: many could be better than all, Artificial Intelligence, vol.137, issue.1-2, pp.239-263, 2002.