On Bernoulli's numerical solution of algebraic equations, Proceedings of the Royal Society of Edinburgh, vol.46, pp.289-305, 1927.
The first direct acceleration of stochastic gradient methods, 2016.
Linear coupling: An ultimate unification of gradient and mirror descent, Proceedings of the 8th Innovations in Theoretical Computer Science, vol.17, 2017.
Iterative procedures for nonlinear integral equations, Journal of the ACM (JACM), vol.12, issue.4, pp.547-560, 1965.
DOI : 10.1145/321296.321305
Implicit-explicit methods for time-dependent partial differential equations, SIAM Journal on Numerical Analysis, vol.32, issue.3, pp.797-823, 1995.
DOI : 10.1137/0732037
Mirror descent and nonlinear projected subgradient methods for convex optimization, Operations Research Letters, vol.31, issue.3, pp.167-175, 2003.
DOI : 10.1016/s0167-6377(02)00231-6
A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, vol.2, issue.1, pp.183-202, 2009.
DOI : 10.1137/080716542
URL : http://ie.technion.ac.il/%7Ebecka/papers/finalicassp2009.pdf
Lectures on modern convex optimization: analysis, algorithms, and engineering applications, 2001.
A progressive batching L-BFGS method for machine learning, 2018.
The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems, SIAM Journal on Optimization, vol.17, issue.4, pp.1205-1223, 2007.
Convex optimization, 2004.
Accélération de la convergence en analyse numérique, vol.584, 2006.
A geometric alternative to Nesterov's accelerated gradient descent, 2015.
Thirty years of G-stability, BIT Numerical Mathematics, vol.46, issue.3, pp.479-489, 2006.
A polynomial extrapolation method for finding limits and antilimits of vector sequences, SIAM Journal on Numerical Analysis, vol.13, issue.5, pp.734-752, 1976.
G-stability is equivalent to A-stability, BIT Numerical Mathematics, vol.18, issue.4, pp.384-401, 1978.
On one-leg multistep methods, SIAM Journal on Numerical Analysis, vol.20, issue.6, pp.1130-1138, 1983.
SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, Advances in Neural Information Processing Systems, pp.1646-1654, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01016843
2013 IEEE International Conference on, pp.8604-8608, 2013.
Performance of first-order methods for smooth convex minimization: a novel approach, Mathematical Programming, pp.451-482, 2014.
Composite objective mirror descent, COLT, pp.14-26, 2010.
The fitting of time-series models, Revue de l'Institut International de Statistique, pp.233-244, 1960.
Information linkage between applied mathematics and industry, pp.387-396, 1979.
Two classes of multisecant methods for nonlinear acceleration, Numerical Linear Algebra with Applications, vol.16, issue.3, pp.197-221, 2009.
Restarting accelerated gradient methods with a rough strong convexity estimate, 2016.
URL : https://hal.archives-ouvertes.fr/hal-02287730
From averaging to acceleration, there is only a step-size, Conference on Learning Theory, pp.658-695, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01136945
On the stability of implicit-explicit linear multistep methods, Applied Numerical Mathematics, vol.25, issue.2-3, pp.193-205, 1997.
Numerical analysis, 2011.
Matrix computations, vol.3, 2012.
Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Numerische Mathematik, vol.3, issue.1, pp.157-168, 1961.
Analysis of hidden units in a layered network trained to classify sonar targets, Neural Networks, vol.1, issue.1, pp.75-89, 1988.
Accurate, large minibatch SGD: training ImageNet in 1 hour, 2017.
SIDO: A pharmacology dataset, 2008.
Feature extraction: foundations and applications, vol.207, 2008.
Fast algorithms for Toeplitz and Hankel matrices, Linear Algebra and its Applications, vol.435, issue.1, pp.1-59, 2011.
Averaging weights leads to wider optima and better generalization, 2018.
Accelerating stochastic gradient descent using predictive variance reduction, Advances in Neural Information Processing Systems, pp.315-323, 2013.
Stochastic estimation of the maximum of a regression function, The Annals of Mathematical Statistics, pp.462-466, 1952.
Adam: A method for stochastic optimization, 2014.
Accelerated mirror descent in continuous and discrete time, Advances in Neural Information Processing Systems, pp.2845-2853, 2015.
Global optimization with polynomials and the problem of moments, SIAM Journal on Optimization, vol.11, issue.3, pp.796-817, 2001.
Analysis and design of optimization algorithms via integral quadratic constraints, SIAM Journal on Optimization, vol.26, issue.1, p.101, 2016.
The Wiener RMS error criterion in filter design and prediction, appendix B of Wiener, N. (1949), Extrapolation, Interpolation, and Smoothing of Stationary Time Series, 1949.
A universal catalyst for first-order optimization, Advances in Neural Information Processing Systems, pp.3384-3392, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01160728
Celer: a fast solver for the Lasso with dual extrapolation, International Conference on Machine Learning, pp.3321-3330, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01833398
Convergence acceleration for the iterative solution of the equations X = AX + f, Computer Methods in Applied Mechanics and Engineering, vol.10, issue.2, pp.165-173, 1977.
Non-asymptotic analysis of stochastic approximation algorithms for machine learning, Advances in Neural Information Processing Systems, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00608041
Convergence rate of incremental subgradient algorithms, Stochastic Optimization: Algorithms and Applications, pp.223-264, 2001.
Optimal methods of smooth convex minimization, USSR Computational Mathematics and Mathematical Physics, vol.25, issue.2, pp.21-30, 1985.
DOI : 10.1016/0041-5553(85)90100-4
Iterative methods for solving linear ill-posed problems under precise information, Engineering Cybernetics, issue.4, pp.50-56, 1984.
A method of solving a convex programming problem with convergence rate O(1/k²), Soviet Mathematics Doklady, vol.27, pp.372-376, 1983.
Squared functional systems and optimization problems, High Performance Optimization, pp.405-440, 2000.
DOI : 10.1007/978-1-4757-3216-0_17
Gradient methods for minimizing composite objective function, 2007.
DOI : 10.1007/s10107-012-0629-5
Introductory lectures on convex optimization: A basic course, vol.87, 2013.
DOI : 10.1007/978-1-4419-8853-9
Universal gradient methods for convex optimization problems, Mathematical Programming, vol.152, issue.1-2, pp.381-404, 2015.
DOI : 10.1007/s10107-014-0790-0
Cubic regularization of Newton method and its global performance, Mathematical Programming, vol.108, issue.1, pp.177-205, 2006.
DOI : 10.1007/s10107-006-0706-8
Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization, 2000.
Some methods of speeding up the convergence of iteration methods, USSR Computational Mathematics and Mathematical Physics, vol.4, issue.5, pp.1-17, 1964.
Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization, vol.30, issue.4, pp.838-855, 1992.
DOI : 10.1137/0330046
On the convergence of Adam and beyond, International Conference on Learning Representations, 2018.
Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization, vol.14, issue.5, pp.877-898, 1976.
DOI : 10.1137/0314056
URL : http://www.math.washington.edu/~rtr/papers/rtr-MonoOpProxPoint.pdf
Minimizing finite sums with the stochastic average gradient, Mathematical Programming, pp.1-30, 2013.
DOI : 10.1007/s10107-016-1030-6
URL : https://hal.archives-ouvertes.fr/hal-00860051
Regularized nonlinear acceleration, Advances in Neural Information Processing Systems, pp.712-720, 2016.
DOI : 10.1007/s10107-018-1319-8
URL : https://hal.archives-ouvertes.fr/hal-01384682
Nonlinear acceleration of stochastic algorithms, Advances in Neural Information Processing Systems, pp.3985-3994, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01618379
Integration methods and optimization algorithms, Advances in Neural Information Processing Systems, pp.1109-1118, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01474045
Nonlinear acceleration of CNNs, Workshop track of International Conference on Learning Representations (ICLR), 2018.
URL : https://hal.archives-ouvertes.fr/hal-01805251
Nonlinear acceleration of deep neural networks, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01799269
Stochastic dual coordinate ascent methods for regularized loss minimization, Journal of Machine Learning Research, vol.14, pp.567-599, 2013.
Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, ICML, pp.64-72, 2014.
Non-linear transformations of divergent and slowly convergent sequences, Studies in Applied Mathematics, vol.34, issue.1-4, pp.1-42, 1955.
Acceleration of convergence of vector sequences, SIAM Journal on Numerical Analysis, vol.23, issue.1, pp.178-196, 1986.
Extrapolation methods for vector sequences, SIAM Review, vol.29, issue.2, pp.199-233, 1987.
A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Advances in Neural Information Processing Systems, pp.2510-2518, 2014.
An introduction to numerical analysis, 2003.
Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, vol.4, pp.26-31, 2012.
How bad are Hankel matrices?, Numerische Mathematik, vol.67, pp.261-269, 1994.
A variational perspective on accelerated methods in optimization, Proceedings of the National Academy of Sciences, p.201614734, 2016.
A Lyapunov analysis of momentum methods in optimization, 2016.
On a device for computing the e_m(S_n) transformation, Mathematical Tables and Other Aids to Computation, pp.91-96, 1956.
Stability and convergence analysis of implicit-explicit one-leg methods for stiff delay differential equations, International Journal of Computer Mathematics, vol.93, issue.11, pp.1964-1983, 2016.
Ensembling neural networks: many could be better than all, Artificial Intelligence, vol.137, issue.1-2, pp.239-263, 2002.