
Acceleration methods for fixed-point iterations

Published online by Cambridge University Press:  01 July 2025

Yousef Saad*
Affiliation:
Department of Computer Science and Engineering, University of Minnesota, Twin Cities, MN 55455, USA. E-mail: saad@umn.edu

Abstract


A pervasive approach in scientific computing is to express the solution to a given problem as the limit of a sequence of vectors or other mathematical objects. In many situations these sequences are generated by slowly converging iterative procedures, and this led practitioners to seek faster alternatives to reach the limit. ‘Acceleration techniques’ comprise a broad array of methods specifically designed with this goal in mind. They started as a means of improving the convergence of general scalar sequences by various forms of ‘extrapolation to the limit’, i.e. by extrapolating the most recent iterates to the limit via linear combinations. Extrapolation methods of this type, the best-known of which is Aitken’s delta-squared process, require only the sequence of vectors as input.
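
To make the idea concrete, here is a minimal Python sketch (an illustration added for this overview, not code from the article) of Aitken's delta-squared formula, x'_k = x_k - (x_{k+1} - x_k)^2 / (x_{k+2} - 2 x_{k+1} + x_k), applied to a linearly converging scalar fixed-point iteration:

```python
import numpy as np

def aitken_delta_squared(x):
    """Aitken's delta-squared extrapolation of a scalar sequence.

    x : iterates x_0, ..., x_{n-1} (n >= 3); returns the accelerated
    sequence of length n - 2. No safeguard against a vanishing second
    difference is included in this sketch.
    """
    x = np.asarray(x, dtype=float)
    dx = x[1:] - x[:-1]        # forward differences  (Delta x_k)
    d2x = dx[1:] - dx[:-1]     # second differences   (Delta^2 x_k)
    return x[:-2] - dx[:-1] ** 2 / d2x

# Example: x_{k+1} = cos(x_k) converges linearly to ~0.7390851.
x = [0.5]
for _ in range(10):
    x.append(np.cos(x[-1]))
print(x[-1], aitken_delta_squared(x)[-1])  # the extrapolated value is far closer
```

Note that only the sequence itself is used: the map generating the iterates never appears, which is the defining feature of pure extrapolation methods.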

However, limiting methods to use only the iterates is too restrictive. Accelerating sequences generated by fixed-point iterations by utilizing both the iterates and the fixed-point mapping itself has proved highly successful across various areas of physics. A notable example of these fixed-point accelerators (FP-accelerators) is a method developed by Donald Anderson in 1965 and now widely known as Anderson acceleration (AA). Furthermore, quasi-Newton and inexact Newton methods can also be placed in this category since they can be invoked to find limits of fixed-point iteration sequences by employing exactly the same ingredients as those of the FP-accelerators.
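
As a rough illustration of these shared ingredients, the following Python sketch implements a basic version of AA in the form often attributed to Walker and Ni (2011); the window size m, the unregularized least-squares solve and the function names are choices made here, not prescriptions from the article:

```python
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, maxiter=100):
    """Basic Anderson acceleration for the fixed-point problem x = g(x).

    Keeps the last m differences of iterates (dX) and residuals (dF) and
    combines them through a small least-squares problem at every step.
    """
    x = np.asarray(x0, dtype=float)
    dX, dF = [], []                 # histories of iterate/residual differences
    gx = g(x)
    f = gx - x                      # residual of the fixed-point map
    for _ in range(maxiter):
        if np.linalg.norm(f) < tol:
            break
        x_new = gx.copy()           # plain fixed-point step when no history yet
        if dX:
            X = np.column_stack(dX[-m:])
            F = np.column_stack(dF[-m:])
            gamma, *_ = np.linalg.lstsq(F, f, rcond=None)
            x_new -= (X + F) @ gamma   # x_{k+1} = g(x_k) - (X_k + F_k) gamma_k
        gx_new = g(x_new)
        f_new = gx_new - x_new
        dX.append(x_new - x)
        dF.append(f_new - f)
        x, gx, f = x_new, gx_new, f_new
    return x

# Example: solve x = cos(x) componentwise in R^3.
print(anderson(np.cos, np.zeros(3)))
```

Both the iterates and evaluations of the fixed-point map g enter the update, which is precisely what distinguishes FP-accelerators from extrapolation methods that see only the sequence.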

This paper presents an overview of these methods – with an emphasis on those, such as AA, that are geared toward accelerating fixed-point iterations. We will navigate through existing variants of accelerators, their implementations and their applications, to unravel the close connections between them. These connections were often not recognized by the originators of certain methods, who sometimes stumbled on slight variations of already established ideas. Furthermore, even though new accelerators were invented in different corners of science, the underlying principles behind them are strikingly similar or identical.

The plan of this article will approximately follow the historical trajectory of extrapolation and acceleration methods, beginning with a brief description of extrapolation ideas, followed by the special case of linear systems, the application to self-consistent field (SCF) iterations, and a detailed view of Anderson acceleration. The last part of the paper is concerned with more recent developments, including theoretical aspects, and a few thoughts on accelerating machine learning algorithms.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

References

Aitken, A. (1926), On Bernoulli's numerical solution of algebraic equations, Proc. Roy. Soc. Edinburgh 46, 289–305.
Anderson, D. G. (1965), Iterative procedures for non-linear integral equations, J. Assoc. Comput. Mach. 12, 547–560.
Axelsson, O. (1980), Conjugate gradient type methods for unsymmetric and inconsistent systems of linear equations, Linear Algebra Appl. 29, 1–16.
Blair, A., Metropolis, N., von Neumann, J., Taub, A. H. and Tsingou, M. (1959), A study of a numerical solution to a two-dimensional hydrodynamical problem, Math. Comp. 13, 145–184.
Bottou, L. and Le Cun, Y. (2005), On-line learning for very large data sets, Appl. Stoch. Models Bus. Ind. 21, 137–151.
Bottou, L., Curtis, F. and Nocedal, J. (2018), Optimization methods for large-scale machine learning, SIAM Rev. 60, 223–311.
Brezinski, C. (1975), Généralisation de la transformation de Shanks, de la table de Padé et de l'ε-algorithme, Calcolo 12, 317–360.
Brezinski, C. (1977), Accélération de la Convergence en Analyse Numérique, Vol. 584 of Lecture Notes in Mathematics, Springer.
Brezinski, C. (1980), Padé-Type Approximation and General Orthogonal Polynomials, Birkhäuser.
Brezinski, C. (2000), Convergence acceleration during the 20th century, J. Comput. Appl. Math. 122, 1–21.
Brezinski, C. and Redivo-Zaglia, M. (1991), Extrapolation Methods: Theory and Practice, North-Holland.
Brezinski, C. and Redivo-Zaglia, M. (2019), The genesis and early developments of Aitken's process, Shanks' transformation, the ε-algorithm, and related fixed point methods, Numer. Algorithms 80, 11–133.
Brezinski, C. and Redivo-Zaglia, M. (2020), Extrapolation and Rational Approximation: The Works of the Main Contributors, Springer.
Brezinski, C., Cipolla, S., Redivo-Zaglia, M. and Saad, Y. (2022), Shanks and Anderson-type acceleration techniques for systems of nonlinear equations, IMA J. Numer. Anal. 42, 3058–3093.
Brezinski, C., Redivo-Zaglia, M. and Salam, A. (2023), On the kernel of vector ε-algorithm and related topics, Numer. Algorithms 92, 207–221.
Brown, P. N. and Saad, Y. (1990), Hybrid Krylov methods for nonlinear systems of equations, SIAM J. Sci. Statist. Comput. 11, 450–481.
Brown, P. N. and Saad, Y. (1994), Convergence theory of nonlinear Newton–Krylov algorithms, SIAM J. Optim. 4, 297–330.
Bubeck, S. (2015), Convex optimization: Algorithms and complexity, Found. Trends Mach. Learn. 8, 231–357.
Cabay, S. and Jackson, L. W. (1976), A polynomial extrapolation method for finding limits and antilimits of vector sequences, SIAM J. Numer. Anal. 13, 734–752.
Cauchy, A. L. (1847), Méthode générale pour la résolution des systèmes d'équations simultanées, Comp. Rend. Acad. Sci. Paris 25, 536–538.
Chupin, M., Dupuy, M.-S., Legendre, G. and Séré, E. (2021), Convergence analysis of adaptive DIIS algorithms with application to electronic ground state calculations, ESAIM Math. Model. Numer. Anal. 55, 2785–2825.
Degroote, J., Bathe, K.-J. and Vierendeels, J. (2009), Performance of a new partitioned procedure versus a monolithic procedure in fluid–structure interaction, Computers & Structures 87, 793–801.
Dembo, R. S., Eisenstat, S. C. and Steihaug, T. (1982), Inexact Newton methods, SIAM J. Numer. Anal. 19, 400–408.
Dirac, P. A. M. (1929), Quantum mechanics of many-electron systems, Proc. Roy. Soc. Lond. A 123, 714–733.
Eddy, R. P. (1979), Extrapolating to the limit of a vector sequence, in Information Linkage Between Applied Mathematics and Industry (Wang, P. C. C., ed.), Academic Press, pp. 387–396.
Eisenstat, S. C. and Walker, H. F. (1994), Globally convergent inexact Newton methods, SIAM J. Optim. 4, 393–422.
Eisenstat, S. C., Elman, H. C. and Schultz, M. H. (1983), Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal. 20, 345–357.
Evans, C., Pollock, S., Rebholz, L. G. and Xiao, M. (2020), A proof that Anderson acceleration improves the convergence rate in linearly converging fixed-point methods (but not in those converging quadratically), SIAM J. Numer. Anal. 58, 788–810.
Eyert, V. (1996), A comparative study on methods for convergence acceleration of iterative vector sequences, J. Comput. Phys. 124, 271–285.
Fang, H. and Saad, Y. (2009), Two classes of multisecant methods for nonlinear acceleration, Numer. Linear Algebra Appl. 16, 197–221.
Forsythe, G. E. (1951), Gauss to Gerling on relaxation, Mathematical Tables and Other Aids to Computation 5, 255–258.
Forsythe, G. E. (1953), Solving linear algebraic equations can be interesting, Bull. Amer. Math. Soc. 59, 299–329.
Gamow, G. (1966), Thirty Years That Shook Physics: The Story of Quantum Theory, Dover.
Germain-Bonne, B. (1978), Estimation de la limite de suites et formalisation de procédés d'accélération de convergence. PhD thesis, Université des Sciences et Techniques de Lille.
Golub, G. H. and Van Loan, C. F. (2013), Matrix Computations, fourth edition, Johns Hopkins University Press.
Golub, G. H. and Varga, R. S. (1961), Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Numer. Math. 3, 157–168.
Gower, R. M., Loizou, N., Qian, X., Sailanbayev, A., Shulgin, E. and Richtárik, P. (2019), SGD: General analysis and improved rates, in International Conference on Machine Learning, PMLR, pp. 5200–5209.
Haelterman, R., Degroote, J., Heule, D. V. and Vierendeels, J. (2010), On the similarities between the quasi-Newton inverse least squares method and GMRES, SIAM J. Numer. Anal. 47, 4660–4679.
Hardt, M., Recht, B. and Singer, Y. (2016), Train faster, generalize better: Stability of stochastic gradient descent, in Proceedings of the 33rd International Conference on Machine Learning (Balcan, M. F. and Weinberger, K. Q., eds), Vol. 48 of Proceedings of Machine Learning Research, PMLR, pp. 1225–1234.
He, H., Tang, Z., Zhao, S., Saad, Y. and Xi, Y. (2024), nlTGCR: A class of nonlinear acceleration procedures based on conjugate residuals, SIAM J. Matrix Anal. Appl. 45, 712–743.
Hestenes, M. R. and Stiefel, E. L. (1952), Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards 49, 409–436.
Hestenes, M. R. and Todd, J. (1991), Mathematicians learning to use computers: NBS-INA, the Institute for Numerical Analysis, UCLA 1947–1954. NIST Special Publication 730.
Higham, N. J. and Strabić, N. (2016), Anderson acceleration of the alternating projections method for computing the nearest correlation matrix, Numer. Algorithms 72, 1021–1042.
Jacobsen, J. and Schmitt, K. (2002), The Liouville–Bratu–Gelfand problem for radial operators, J. Differential Equations 184, 283–298.
Jbilou, K. (1988), Méthodes d'extrapolation et de projection: Applications aux suites de vecteurs. PhD thesis, Université des Sciences et Techniques de Lille.
Jbilou, K. and Sadok, H. (1991), Some results about vector extrapolation methods and related fixed point iterations, J. Comput. Appl. Math. 36, 385–398.
Jbilou, K. and Sadok, H. (2000), Vector extrapolation methods: Applications and numerical comparison, J. Comput. Appl. Math. 122, 149–165.
Jea, K. C. and Young, D. M. (1980), Generalized conjugate gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl. 34, 159–194.
Kaniel, S. and Stein, J. (1974), Least-square acceleration of iterative methods for linear equations, J. Optim. Theory Appl. 14, 431–437.
Kelley, C. T. (1995), Iterative Methods for Linear and Nonlinear Equations, Vol. 16 of Frontiers in Applied Mathematics, SIAM.
Kerkhoven, T. and Saad, Y. (1992), Acceleration techniques for decoupling algorithms in semiconductor simulation, Numer. Math. 60, 525–548.
Kingma, D. P. and Ba, J. (2015), Adam: A method for stochastic optimization, in 3rd International Conference on Learning Representations (ICLR 2015), Conference Track Proceedings.
Kittel, C. (1986), Introduction to Solid State Physics, Wiley.
Kohn, W. and Sham, L. J. (1965), Self-consistent equations including exchange and correlation effects, Phys. Rev. 140, A1133–A1138.
Lanczos, C. (1950), An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards 45, 255–282.
Lanczos, C. (1952), Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bur. Standards 49, 33–53.
Li, H., Xu, Z., Taylor, G., Studer, C. and Goldstein, T. (2018), Visualizing the loss landscape of neural nets, in Advances in Neural Information Processing Systems 31 (Bengio, S. et al., eds), Curran Associates, pp. 6391–6401.
Mešina, M. (1977), Convergence acceleration for the iterative solution of the equations X = AX + f, Comput. Methods Appl. Mech. Engrg 10, 165–173.
Meurant, G. and Tebbens, J. D. (2020), Krylov Methods for Nonsymmetric Linear Systems: From Theory to Computations, Vol. 57 of Springer Series in Computational Mathematics, Springer.
Meyer, L., Barrett, C. and Haasen, P. (1964), New crystalline phase in solid argon and its solid solutions, J. Chem. Phys. 40, 2744–2745.
Mohsen, A. (2014), A simple solution of the Bratu problem, Comput. Math. Appl. 67, 26–33.
Murphy, K. P. (2022), Probabilistic Machine Learning: An Introduction, MIT Press.
Nesterov, Y. (2014), Introductory Lectures on Convex Optimization: A Basic Course, first edition, Springer.
Neyshabur, B., Tomioka, R. and Srebro, N. (2015), In search of the real inductive bias: On the role of implicit regularization in deep learning, in 3rd International Conference on Learning Representations (ICLR), Workshop Track Proceedings (Bengio, Y. and LeCun, Y., eds).
Ortega, J. M. and Rheinboldt, W. C. (1970), Iterative Solution of Nonlinear Equations in Several Variables, Academic Press.
Paige, C. C. (1971), The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, Institute of Computer Science, University of London.
Paige, C. C. (1980), Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem, Linear Algebra Appl. 34, 235–258.
Pasini, M. L., Yin, J., Reshniak, V. and Stoyanov, M. (2021), Stable Anderson acceleration for deep learning. Available at https://dblp.org/rec/journals/corr/abs-2110-14813.bib.
Pasini, M. L., Yin, J., Reshniak, V. and Stoyanov, M. K. (2022), Anderson acceleration for distributed training of deep learning models, in SoutheastCon 2022, IEEE, pp. 289–295.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L. et al. (2019), PyTorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems 32 (Wallach, H. et al., eds), Curran Associates.
Petrova, S. S. and Solov'ev, A. D. (1997), The origin of the method of steepest descent, Hist. Math. 24, 361–375.
Pollock, S. and Rebholz, L. G. (2021), Anderson acceleration for contractive and noncontractive operators, IMA J. Numer. Anal. 41, 2841–2872.
Pugachëv, B. (1977), Acceleration of the convergence of iterative processes and a method of solving systems of non-linear equations, USSR Comput. Math. Math. Phys. 17, 199–207.
Pulay, P. (1980), Convergence acceleration of iterative sequences: The case of SCF iteration, Chem. Phys. Lett. 73, 393–398.
Pulay, P. (1982), Improved SCF convergence acceleration, J. Comput. Chem. 3, 556–560.
Ramière, I. and Helfer, T. (2015), Iterative residual-based vector methods to accelerate fixed point iterations, Comput. Math. Appl. 70, 2210–2226.
Reid, J. K. (1971), On the method of conjugate gradients for the solution of large sparse systems of linear equations, in Large Sparse Sets of Linear Equations (Reid, J. K., ed.), Academic Press, pp. 231–254.
Richardson, L. F. (1910), The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam, Philos. Trans. Roy. Soc. A 210, 307–357.
Robbins, H. and Monro, S. (1951), A stochastic approximation method, Ann. Math. Statist. 22, 400–407.
Rohwedder, T. (2010), An analysis for some methods and algorithms of quantum chemistry. PhD thesis, Technische Universität Berlin, Fakultät II – Mathematik und Naturwissenschaften.
Romberg, W. (1955), Vereinfachte numerische Integration, Norske Vid. Selsk. Forh. 28, 30–36.
Saad, Y. (2003), Iterative Methods for Sparse Linear Systems, second edition, SIAM.
Saad, Y. (2011), Numerical Methods for Large Eigenvalue Problems, Vol. 66 of Classics in Applied Mathematics, SIAM.
Saad, Y. (2022), The origin and development of Krylov subspace methods, Comput. Sci. Engrg 24, 28–39.
Saad, Y. and Schultz, M. H. (1986), GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput. 7, 856–869.
Saad, Y., Chelikowsky, J. and Shontz, S. (2009), Numerical methods for electronic structure calculations of materials, SIAM Rev. 52, 3–54.
Shanks, D. (1955), Non-linear transformations of divergent and slowly convergent sequences, J. Math. Phys. 34, 1–42.
Sheldon, J. W. (1955), On the numerical solution of elliptic difference equations, Mathematical Tables and Other Aids to Computation 9, 101–112.
Shi, W., Song, S., Wu, H., Hsu, Y.-C., Wu, C. and Huang, G. (2019), Regularized Anderson acceleration for off-policy deep reinforcement learning, in Advances in Neural Information Processing Systems 32 (Wallach, H. et al., eds), Curran Associates.
Shortley, G. H. (1953), Use of Tschebyscheff-polynomial operators in the solution of boundary value problems, J. Appl. Phys. 24, 392–396.
Sidi, A. (2012), Review of two vector extrapolation methods of polynomial type with applications to large-scale problems, J. Comput. Sci. 3, 92–101.
Sidi, A., Ford, W. F. and Smith, D. A. (1986), Acceleration of convergence of vector sequences, SIAM J. Numer. Anal. 23, 178–196.
Smith, D. A., Ford, W. F. and Sidi, A. (1987), Extrapolation methods for vector sequences, SIAM Rev. 29, 199–233.
Sun, K., Wang, Y., Liu, Y., Pan, B., Jui, S., Jiang, B., Kong, L. et al. (2021), Damped Anderson mixing for deep reinforcement learning: Acceleration, convergence, and stabilization, in Advances in Neural Information Processing Systems 34 (Ranzato, M. et al., eds), Curran Associates, pp. 3732–3743.
Tang, Z., Xu, T., He, H., Saad, Y. and Xi, Y. (2024), Anderson acceleration with truncated Gram–Schmidt, SIAM J. Matrix Anal. Appl. 45, 1850–1872.
Toth, A. and Kelley, C. T. (2015), Convergence analysis for Anderson acceleration, SIAM J. Numer. Anal. 53, 805–819.
Vanderbilt, D. and Louie, S. G. (1984), Total energies of diamond (111) surface reconstructions by a linear combination of atomic orbitals method, Phys. Rev. B 30, 6118–6130.
Vinsome, P. K. W. (1976), ORTHOMIN: An iterative method for solving sparse sets of simultaneous linear equations, in Proceedings of the Fourth Symposium on Reservoir Simulation, Society of Petroleum Engineers of AIME, pp. 149–159.
Walker, H. F. and Ni, P. (2011), Anderson acceleration for fixed-point iterations, SIAM J. Numer. Anal. 49, 1715–1735.
Wu, L., Zhu, Z. and E, W. (2017), Towards understanding generalization of deep learning: Perspective of loss landscapes. Available at https://dblp.org/rec/journals/corr/WuZE17.bib.
Wynn, P. (1956), On a device for computing the e_m(S_n) transformation, Mathematical Tables and Other Aids to Computation 10, 91–96.
Wynn, P. (1962), Acceleration techniques for iterated vector and matrix problems, Math. Comp. 16, 301–322.
Young, D. (1954), On Richardson's method for solving linear systems with positive definite matrices, J. Math. Phys. 32, 243–255.
Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O. (2021), Understanding deep learning (still) requires rethinking generalization, Commun. Assoc. Comput. Mach. 64, 107–115.
Zhang, J., O'Donoghue, B. and Boyd, S. (2020), Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations, SIAM J. Optim. 30, 3170–3197.
Zhou, P., Feng, J., Ma, C., Xiong, C., Hoi, S. C. H. and E, W. (2020), Towards theoretically understanding why SGD generalizes better than Adam in deep learning, in Advances in Neural Information Processing Systems 33 (Larochelle, H. et al., eds), Curran Associates, pp. 21285–21296.