
Large and moderate deviations for Gaussian neural networks

Published online by Cambridge University Press:  01 October 2025

Claudio Macci*
Affiliation: Università di Roma Tor Vergata

Barbara Pacchiarotti*
Affiliation: Università di Roma Tor Vergata

Giovanni Luca Torrisi**
Affiliation: Consiglio Nazionale delle Ricerche

*Postal address: Dipartimento di Matematica, Università di Roma Tor Vergata, Via della Ricerca Scientifica, I-00133 Rome, Italy.
**Postal address: Istituto per le Applicazioni del Calcolo (IAC), Consiglio Nazionale delle Ricerche, Via dei Taurini 19, I-00185 Rome, Italy. Email: giovanniluca.torrisi@cnr.it

Abstract

We prove large and moderate deviation principles for the output of Gaussian fully connected neural networks. The main results concern deep neural networks (i.e. models with more than one hidden layer) and hold for bounded and continuous pre-activation functions. However, for deep neural networks fed by a single input, we obtain results even when the pre-activation is the ReLU function. When the network is shallow (i.e. there is exactly one hidden layer), the large and moderate deviation principles hold for quite general pre-activation functions.
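The abstract does not spell out the model. For orientation, a standard formulation of a fully connected network with Gaussian initialization, as commonly used in this literature, is sketched below; the notation (input dimension $d$, depth $L$, layer widths $n_1,\dots,n_L$, pre-activation $\sigma$, variance parameters $C_b, C_W$) is ours and not necessarily that of the paper.

\[
z_i^{(1)}(x) = b_i^{(1)} + \sum_{j=1}^{d} W_{ij}^{(1)} x_j,
\qquad
z_i^{(\ell)}(x) = b_i^{(\ell)} + \sum_{j=1}^{n_{\ell-1}} W_{ij}^{(\ell)}\,\sigma\bigl(z_j^{(\ell-1)}(x)\bigr),
\quad \ell = 2,\dots,L,
\]

where the biases $b_i^{(\ell)} \sim \mathcal{N}(0, C_b)$ and the weights $W_{ij}^{(\ell)} \sim \mathcal{N}(0, C_W/n_{\ell-1})$ (with $n_0 = d$) are independent Gaussian random variables, and $\sigma$ is the pre-activation function. The network output is $z^{(L)}(x)$, and the asymptotic regime in this setting is the one in which the hidden-layer widths $n_1,\dots,n_{L-1}$ grow large; a shallow network corresponds to $L = 2$.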

Information

Type
Original Article
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Applied Probability Trust

