Hostname: page-component-65b85459fc-cljkw Total loading time: 0 Render date: 2025-10-18T17:24:39.577Z Has data issue: false hasContentIssue false

Machine-learning closure for Vlasov–Poisson dynamics in Fourier–Hermite space

Published online by Cambridge University Press:  16 October 2025

Nathaniel Barbour*
Affiliation:
Department of Physics, University of Maryland, College Park, MD 20742, USA Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD 20742, USA
William Dorland
Affiliation:
Department of Physics, University of Maryland, College Park, MD 20742, USA Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD 20742, USA
Ian G. Abel
Affiliation:
Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD 20742, USA
Matt Landreman
Affiliation:
Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD 20742, USA
*
Corresponding author: Nathaniel Barbour, barbour@umd.edu

Abstract

Accurate reduced models of turbulence are desirable to facilitate the optimisation of magnetic-confinement fusion reactor designs. As a first step towards higher-dimensional turbulence applications, we use reservoir computing, a machine-learning (ML) architecture, to develop a closure model for a limiting case of electrostatic gyrokinetics. We implement a pseudo-spectral Eulerian code to solve the one-dimensional Vlasov–Poisson system on a basis of Fourier modes in configuration space and Hermite polynomials in velocity space. When cast onto the Hermite basis, the Vlasov equation becomes an infinitely coupled hierarchy of fluid moments, presenting a closure problem. We exploit the locality of interactions in the Hermite representation to introduce an ML closure model of the small-scale dynamics in velocity space. In the linear limit, when the kinetic Fourier–Hermite solver is augmented with the reservoir closure, the closure permits a reduction of the velocity resolution, with a relative error within 2 % for the Hermite moment where the reservoir closes the hierarchy. In the strongly nonlinear regime, the ML closure model more accurately resolves the low-order Fourier and Hermite spectra when compared with a naive closure by truncation and reduces the required velocity resolution by a factor of 16.

Information

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

One of the most important ongoing efforts in the pursuit of magnetic fusion energy is the study of microturbulence (Yoshida et al. Reference Yoshida2025). While magnetic-confinement fusion plasmas are designed to be stable to macroscopic magnetohydrodynamic perturbations, the steep gradients in temperature and density that are present in all current designs also drive small-scale instabilities. These instabilities drive sustained turbulence, which then dominates the transport of particles, energy and momentum in the plasma.

As the fusion industry turns its focus toward designing first-of-a-kind pilot plants, optimising these designs to mitigate turbulent heat losses is an urgent objective. Significant progress in turbulence optimisation for tokamaks (Highcock et al. Reference Highcock, Mandell, Barnes and Dorland2018) and stellarators (Roberg-Clark et al. Reference Roberg-Clark, Plunk, Xanthopoulos, Nührenberg, Henneberg and Smith2023; Kim et al. Reference Kim2024) has been achieved. However, the space of candidate magnetic equilibria remains vast, and high-resolution turbulence simulations are computationally expensive. A unifying feature of fluid (Kraichnan Reference Kraichnan1959; Kolmogorov Reference Kolmogorov1991), magnetohydrodynamic (Goldreich & Sridhar Reference Goldreich and Sridhar1995) and gyrokinetic (Bañón Navarro et al. Reference Bañón Navarro, Morel, Albrecht-Marc, Carati, Merz, Görler and Jenko2011) turbulence theory is that free energy tends to cascade from large scales to small scales through local interactions in phase space, and these cascades are well described by power laws.Footnote 1 Therefore, models that can accurately calculate large-scale quantities, including spectra and heat fluxes, without needing to resolve small scales, are both desirable and plausible.

In the field of fluid turbulence modelling, two successful approaches to constructing reduced, lower-resolution models that capture large-scale behaviour are Reynolds-averaged Navier–Stokes (Reynolds Reference Reynolds1895) and large-eddy simulation (Deardorff Reference Deardorff1970). Large-eddy simulation also has been used to build reduced models of gyrokinetic turbulence (Bañón Navarro et al. Reference Bañón Navarro, Teaca, Jenko, Hammett and Happel2014). These formalisms accelerate turbulence simulations by averaging over time and filtering out the small-scale dynamics. Closure models provide another pathway to filter out the small scales while retaining accurate large-scale statistics. In kinetic plasma physics, Hammett & Perkins (Reference Hammett and Perkins1990) developed a landmark analytic linear closure for the low-frequency response to small Langmuir oscillations.

An alternative method for building reduced models is to use data-driven, statistical techniques like machine learning (ML) and Bayesian optimisation. In recent years, the proliferation of ML models has led to advancements in pattern identification and model construction for magnetic fusion. Some example applications include improved prediction and avoidance of tokamak disruptions using the DECAF framework (Piccione et al. Reference Piccione, Berkery, Sabbagh and Andreopoulos2020) and on the DIII-D tokamak (Rea et al. Reference Rea, Montes, Erickson, Granetz and Tinguely2019; Fu et al. Reference Fu, Eldon, Erickson, Kleiwegt, Lupin-Jimenez, Boyer, Eidietis, Barbour, Izacard and Kolemen2020), improved active-feedback plasma control systems with reinforcement learning (Degrave et al. Reference Degrave2022; Seo et al. Reference Seo, Kim, Jalalvand, Conlin, Rothstein, Abbate, Erickson, Wai, Shousha and Kolemen2024; Kerboua-Benlarbi et al. Reference Kerboua-Benlarbi, Nouailletas, Faugeras, Nardon and Moreau2024), optimised execution of gyrokinetic simulations with the PORTALS framework (Rodriguez-Fernandez, Howard & Candy Reference Rodriguez-Fernandez, Howard and Candy2022) and accelerated, low-resolution turbulence simulations (Greif, Jenko & Thuerey Reference Greif, Jenko and Thuerey2023; Castagna et al. Reference Castagna, Schiavello, Zanisi and Williams2024; Clavier et al. Reference Clavier, Zarzoso, del-Castillo-Negrete and Frénod2025). Data-driven, configuration space (Ma et al. Reference Ma, Zhu, Xu and Wang2020; Qin et al. Reference Qin, Ma, Jiang, Dong, Fu, Cheng and jin2023) and heat flux (Ingelsten et al. Reference Ingelsten, McGrae-Menge, Alves and Pusztai2024; Huang, Dong & Wang Reference Huang, Dong and Wang2025) closure models of Landau damping have also been developed.

In this paper, we choose to use the one-dimensional, collisionless Vlasov–Poisson system as a test problem for implementing an ML closure in velocity space. This system is the one-dimensional, electrostatic limit of the full five-dimensional gyrokinetic system (Parker & Dellar Reference Parker and Dellar2015). It models dynamics parallel to the magnetic field, and it contains a quadratic nonlinearity present in the full gyrokinetic system. These features make the Vlasov–Poisson system a suitable foundation to assess the future potential for a closure model to be developed for gyrokinetic turbulence.

In parallel velocity space, we use properties of the Hermite basis to cast the objective of reducing resolution requirements as a closure problem. The Hermite basis has been used extensively to solve problems in kinetic plasma dynamics (Armstrong Reference Armstrong1967; Grant & Feix Reference Grant and Feix1967; Joyce, Knorr & Meier Reference Joyce, Knorr and Meier1971; Gibelli & Shizgal Reference Gibelli and Shizgal2006; Zocco & Schekochihin Reference Zocco and Schekochihin2011; Loureiro, Schekochihin & Zocco Reference Loureiro, Schekochihin and Zocco2013; Kanekar et al. Reference Kanekar, Schekochihin, Dorland and Loureiro2015; Parker & Dellar Reference Parker and Dellar2015; Jorge, Ricci & Loureiro Reference Jorge, Ricci and Loureiro2017; Adkins & Schekochihin Reference Adkins and Schekochihin2018; Mandell, Dorland & Landreman Reference Mandell, Dorland and Landreman2018; Frei et al. Reference Frei, Hoffmann, Ricci, Brunner and Tecchioll2023). In this work, we opt to focus on velocity space because the Hermite basis naturally expresses the interaction between scales in velocity space as local interactions in a coupled hierarchy of moments. This property of the moment formalism directly presents a single-term target for a closure. Specifically, the interaction between resolved scales and unresolved scales in velocity space appears as a single unknown term in an equation in this representation. In contrast, the spectral representation of configuration space introduces some non-local interactions across scales, complicating the development of a closure model. Additionally, the Hermite basis explicitly expresses conservation laws for particle number, momentum and energy as evolution equations for the lowest-order moments. These properties allow us to develop a closure that preserves the low-order conservation laws. More details on the closure problem are presented in § 3.5.

We use reservoir computing, an ML architecture, to implement a spectral, proof-of-concept, velocity-space closure for the moment hierarchy that maintains accurate spectra and reduces resolution requirements. The reservoir computing model maintains an internal representation of the system state, which it evolves through a directed graph with fixed weights. It then predicts the system state at time $t+1$ from the state at time $t$ using a set of output weights trained by linear regression. Since its concurrent development by Jaeger (Reference Jaeger2001) and Maass, Natschläger & Makram (Reference Maass, Natschläger and Makram2002), reservoir computing has successfully been applied to the task of forecasting the dynamics of low-dimensional chaotic systems (Lu et al. Reference Lu, Pathak, Hunt, Girvan, Brockett and Ott2017; Pathak et al. Reference Pathak, Hunt, Girvan, Lu and Ott2018). This ML paradigm is well suited to the task of modelling time-series data of physical processes. In climate physics, Arcomano et al. (Reference Arcomano, Szunyogh, Pathak, Wikner, Hunt and Ott2020) built a surrogate global atmospheric forecast model with reservoir computing, and in plasma physics, Jalalvand et al. (Reference Jalalvand, Kaptanoglu, Garcia, Nelson, Abbate, Austin, Verdoolaege, Brunton, Heidbrink and Kolemen2022) used reservoir computing to analyse and classify Alfvén eigenmodes using DIII-D diagnostics data. In comparison with several other recurrent neural network architectures, reservoir computing has shown comparable ability to capture the statistics of chaotic systems, while requiring significantly less training time (Vlachas et al. Reference Vlachas, Pathak, Hunt, Sapsis, Girvan, Ott and Koumoutsakos2020). These results motivate our decision to use reservoir computing for the closure model.

Some ML methods, including reservoir computing, are more challenging to interpret than traditional physics models. Artificial neural network architectures are often ‘black boxes’ that generate layered, analytically intractable networks of nonlinear activation functions. Gaining insightful intuition from these networks can be daunting. We mitigate this problem by constraining the ML portion of our simulations to represent only the unresolved small scales, preserving analytic evolution equations for large scales. Our methods are analogous to a large-eddy simulation approach for phase space.

The paper has the following structure. In § 2, we present our test problem, the normalised one-dimensional Vlasov–Poisson system. We then derive the projection of the system onto the pseudo-spectral Fourier–Hermite basis in § 3. We present the closure problem in § 3.5 and introduce our ML closure model in § 4 as a solution to that problem. In § 5, we present our ML model’s predictions, and we conclude with a summary and discussion in § 6.

2. Vlasov–Poisson system

As our test problem, we study the dynamics of a one-dimensional, collisionless, electrostatic hydrogen plasma. We focus on the electron dynamics, treating the ions as a cold, neutralising background. The equations that describe this plasma are the Vlasov–Poisson system (Vlasov Reference Vlasov1968). The calculations in this paper use Gaussian units. The Vlasov equation for electrons in one dimension is

(2.1) \begin{equation} \frac {\partial f}{\partial t} + v \frac {\partial f}{\partial z} - \frac {eE}{m_e} \frac {\partial f}{\partial v} = 0, \end{equation}

where $f(z,v,t)$ is the electron distribution function, $e$ is the elementary charge and $m_e$ is the mass of an electron. Term $E(z,t)$ is the electric field, which we calculate using Poisson’s equation:

(2.2) \begin{equation} -\frac {\partial ^2\varPhi }{\partial z^2} = 4 \pi \rho , \quad E = -\frac {\partial \varPhi }{\partial z}, \end{equation}

where $\rho (z,t)$ is the total electric charge density, including both ions and electrons, and $\varPhi (z,t)$ is the electric potential. For this quasi-neutral plasma with cold ions, we can evaluate the charge density as

(2.3) \begin{equation} \rho (z,t) = e\left (n_0 - \int \text{d}v f\right ), \end{equation}

where $n_0$ is the mean number density of both species.

Plasmas with near-Maxwellian velocity distributions are of particular interest for magnetic fusion applications. Therefore, we separate the distribution function into a Maxwellian mean and time-dependent perturbations:

(2.4) \begin{equation} f(z,v,t) = F_0(v) + g(z,v,t), \end{equation}

where

(2.5) \begin{equation} F_0(v) = \left (\frac {n_0}{v_{te}}\right ) \frac {1}{\sqrt {2\pi }} \text{e}^{-v^2/2v_{te}^2} \end{equation}

is the one-dimensional Maxwellian velocity distribution for the electrons, and $g(z,v,t)$ is the fluctuating part of the distribution function. We have defined the electron thermal speed as $v_{te} = \sqrt {T_e / m_e}$ , where $T_e$ is the temperature of the electrons in units of energy. The Vlasov equation supports longitudinal Langmuir waves, whose dynamics occurs on time scales of the order of the plasma frequency:

(2.6) \begin{equation} \omega _{pe} \equiv \sqrt {\frac {4\pi n_0 e^2}{m_e}}. \end{equation}

The characteristic length scale for those oscillations is the Debye length:

(2.7) \begin{equation} \lambda _D \equiv \sqrt {\frac { T_e}{4 \pi n_0 e^2}}. \end{equation}

To support numerical integration of this system of equations, we normalise the variables into dimensionless forms, denoted with primes:

(2.8) \begin{equation} \begin{split} &t \equiv \frac {1}{\omega _{pe}}t', \quad v \equiv v_{te} v', \quad z \equiv \lambda _D z', \quad E \equiv \frac {T_e}{e\lambda _D} E', \quad f \equiv F_0 + g \equiv \frac {n_{0}}{v_t}(F'_M + g'), \\ &\rho \equiv n_0 e \rho ', \quad \varPhi \equiv \frac {T_e}{e}\varPhi ', \quad F_M' \equiv \frac {1}{\sqrt {2 \pi }} \text{e}^{-v'^2/2}, \quad \int \text{d} v' F_M'= 1. \end{split} \end{equation}

After dropping the primes for legibility, we can now write the model equations in dimensionless form:

(2.9) \begin{align} \frac {\partial f}{\partial t} + v \frac {\partial f}{\partial z} - E\frac {\partial f}{\partial v} = 0,\end{align}
(2.10) \begin{align} \frac {\partial E}{\partial z} = 1 - \int \text{d}v f.\end{align}

Finally, after splitting the distribution function into its mean and fluctuating components, we arrive at the normalised evolution equation for the fluctuating part of the electron distribution function:

(2.11) \begin{align} \frac {\partial g}{\partial t} + v\frac {\partial g}{\partial z} + vEF_M - E\frac {\partial g}{\partial v} = 0,\end{align}
(2.12) \begin{align} \frac {\partial E}{\partial z} = -\int \text{d} v \, g. \end{align}

3. Pseudo-spectral methods

We solve the Vlasov–Poisson system (2.11)–(2.12) using an orthonormal spectral basis by projecting the velocity-space dependence onto a Hermite polynomial basis and the configuration-space dependence onto Fourier harmonics. We choose this approach because pseudo-spectral methods are often more efficient than pure finite-difference algorithms (Boyd Reference Boyd2000), and as mentioned in the introduction, there is an extensive history of the application of the Fourier–Hermite basis to solve problems in kinetic plasma physics.

3.1. Hermite moment formalism

To discretise velocity space, we impose a Hermite basis representation, using notation from Mandell et al. (Reference Mandell, Dorland and Landreman2018). Including both the Maxwellian weight and the normalisation factors, the Hermite basis expansion ( $\varphi ^m(v)$ ) and projection ( $\varphi _m(v)$ ) functions are

(3.1) \begin{equation} \varphi ^{m}(v) = \frac {\text{He}_m(v)}{\sqrt {2\pi m!}}{\rm e}^{-v^2/2} , \quad \varphi _m(v) = \frac {\text{He}_m(v)}{\sqrt {m!}}, \end{equation}

where

(3.2) \begin{equation} \text{He}_m(v) = (-1)^m {\rm e}^{v^2/2}\frac {\text{d}^m}{\text{d}v^m} {\rm e}^{-v^2/2} \end{equation}

are the probabilists’ Hermite polynomials (Abramowitz & Stegun Reference Abramowitz and Stegun1972). Note that these functions are orthonormal,

(3.3) \begin{equation} \int _{-\infty }^{\infty } \text{d}v \varphi ^m(v)\varphi _{m'}(v) = \delta _{m,m'}, \end{equation}

and that the background Maxwellian is identical to the $m=0$ basis element:

(3.4) \begin{equation} \varphi ^0 = \frac {1}{\sqrt {2\pi }}{\rm e}^{-v^2/2} = F_M. \end{equation}

This property makes the Hermite moment representation a natural choice for near-Maxwellian velocity distributions. We project the fluctuating portion of the distribution function onto the Hermite basis using

(3.5) \begin{equation} g(z,v,t) = \sum _{m=0}^{\infty }\varphi ^m G_m(z,t), \quad G_m(z,t) \equiv \int _{-\infty }^{\infty } \text{d}v \varphi _m g(z,v,t). \end{equation}

3.2. Vlasov equation in the Hermite basis

We seek an evolution equation for the amplitudes, $G_m$ , of the Hermite expansion functions. To derive this moment hierarchy, we project (2.11) onto the Hermite basis term by term. The first term is the most straightforward:

(3.6) \begin{equation} \begin{split} \int _{-\infty }^{\infty } \text{d}v \varphi _m \frac {\partial g}{\partial t} = \frac {\partial G_m}{\partial t}. \end{split} \end{equation}

We then project the linear ballistic term, $v( \partial g / \partial z )$ , onto the basis:

(3.7) \begin{equation} v\frac {\partial g}{\partial z} = \frac {\partial }{\partial z}\sum _{m=0}^{\infty } v \varphi ^m G_m = \frac {\partial }{\partial z}\sum _{m=0}^{\infty } \left (\sqrt {m+1}\varphi ^{m+1} + \sqrt {m}\varphi ^{m-1} \right ) G_m, \end{equation}

where we used a recurrence relation for the Hermite basis functions to remove the explicit dependence on $v$ (Abramowitz & Stegun Reference Abramowitz and Stegun1972):

(3.8) \begin{equation} \sqrt {m+1}\varphi ^{m+1} = v\varphi ^m - \sqrt {m}\varphi ^{m-1}. \end{equation}

Applying the projection function to (3.7) and using (3.3) yields

(3.9) \begin{equation} \begin{split} \int _{-\infty }^{\infty } \text{d}v \varphi _{m} v\frac {\partial g}{\partial z} = \frac {\partial }{\partial z} \left (\sqrt {m}G_{m-1} + \sqrt {m+1}G_{m+1}\right ). \end{split} \end{equation}

The linear portion of the electric field drive term contains the $m=1$ expansion function, so applying the projection function to this term yields

(3.10) \begin{equation} \int _{-\infty }^{\infty } \text{d}v \varphi _{m} vEF_M = E\int _{-\infty }^{\infty } \text{d}v \varphi _{m} \varphi ^1 = E\delta _{m,1}. \end{equation}

The final term to consider is the nonlinear term, $-E(\partial g / \partial v)$ . Combining a second recurrence relation (Abramowitz & Stegun Reference Abramowitz and Stegun1972),

(3.11) \begin{equation} \frac {\text{d}}{\text{d}v}\text{He}_m(v) = v\text{He}_m(v) - \text{He}_{m+1}(v), \end{equation}

with the definition of the Hermite basis functions $\varphi ^m$ (3.1), presents a useful new identity:

(3.12) \begin{equation} \frac {\partial \varphi ^m}{\partial v} = -\sqrt {m+1}\varphi ^{m+1}. \end{equation}

The expansion of the nonlinear term in the Hermite basis is then

(3.13) \begin{equation} -E\frac {\partial g}{\partial v} = -E \sum _m \frac {\partial \varphi ^m}{\partial v} G_m = E \sum _m \sqrt {m+1}\varphi ^{m+1}G_m. \end{equation}

After projecting this term onto the basis,

(3.14) \begin{equation} \begin{split} \int _{-\infty }^{\infty } \text{d}v \varphi _{m} \left(E\sum _{m'} \sqrt {m'+1}\varphi ^{m'+1}G_{m'}\right) &= \sum _{m'} E\sqrt {m'+1}\delta _{m,m'+1}G_{m'} \\ &= E\sqrt {m}G_{m-1}, \end{split} \end{equation}

we can collect all of the terms to recover the general form of the equation for $G_m$ :

(3.15) \begin{equation} \frac {\partial G_m}{\partial t} + \frac {\partial }{\partial z}\left (\sqrt {m}G_{m-1} + \sqrt {m+1}G_{m+1}\right ) + E \sqrt {m}G_{m-1} + E \delta _{m,1} = 0. \end{equation}

3.3. Pseudo-spectral method in Fourier space

Next, we discretise space with a Fourier basis. The Fourier transform and inverse Fourier transform operators for a spatial grid of $N$ points are

(3.16) \begin{equation} \mathcal{F}[g] = \frac {1}{N} \sum _{z}g \text{e}^{-\text{i}kz}\quad \mathcal{F}^{-1}[G_k] = \sum _k G_k \text{e}^{\text{i}kz}, \end{equation}

where the Fourier wavenumbers are given by $k = 2\pi j / N$ , with $-N/2 \leqslant j \lt N/2$ and $j \in \mathbb{Z}$ . The Fourier convolution theorem,

(3.17) \begin{equation} \mathcal{F}[f g] = \sum _{k'}F_k G_{k-k'}, \end{equation}

transforms the nonlinear term into a coupled sum over all wavenumbers. This coupling introduces additional computational overhead. We mitigate that problem by using a pseudo-spectral method (Boyd Reference Boyd2000), evaluating the nonlinear term on the original configuration space and projecting the result back onto the space of Fourier wavenumbers. Applying the Fourier transform, (3.16), to (3.15) and then applying the Fourier convolution theorem yields

(3.18) \begin{align} \frac {\partial G_{m,k}}{\partial t} + \text{i}k\left (\sqrt {m}G_{m-1,k} + \sqrt {m+1}G_{m+1,k}\right ) + \sqrt {m}\mathcal{F}\left [\mathcal{F}^{-1}[E_k] \mathcal{F}^{-1}[G_{m-1,k}]\right ] \nonumber \\ = - E_k \delta _{m,1}. \end{align}

Simulating this collisionless system with finite resolution without using any regularisation is numerically infeasible, as the dynamics can generate structures in phase space at arbitrarily small scales (Parker & Dellar Reference Parker and Dellar2015; Loureiro et al. Reference Loureiro, Dorland, Fazendeiro, Kanekar, Mallet, Vilelas and Zocco2016; Mandell et al. Reference Mandell, Dorland, Abel, Gaur, Kim, Martin and Qian2024). To mitigate that effect, we introduce numerical regularisation through hyperdiffusion $(-\nu _k k^4)$ and hypercollision $(-\nu _m m^4)$ operators:

(3.19) \begin{equation} H_{mk} = \begin{cases} 0 & \text{if $m \leqslant 2$} \\ -\nu _k k^4 -\nu _m m^4 & \text{if $m\gt 2$}, \end{cases} \end{equation}

where $\nu _k$ and $\nu _m$ are parameters which control the strengths of the hyperdiffusion and hypercollisions, respectively. Note that we do not apply this regularisation to the $m=$ 0, 1 and 2 equations, as they encode conservation of particle number, momentum and energy, respectively.

We solve the system for a finite number of moments $M$ . Defining compact notation for the nonlinear operator,

(3.20) \begin{equation} \mathcal{N}[E_k,G_{m-1,k}] = \sqrt {m}\mathcal{F}[\mathcal{F}^{-1}[E_k]\boldsymbol{\cdot }\mathcal{F}^{-1}[G_{m-1,k}]], \end{equation}

and accounting for the special cases of the $m=0$ , $m=1$ and $m=M-1$ equations, we have derived the projection of the Vlasov equation onto the Fourier–Hermite basis:

(3.21) \begin{align} &\frac {\partial G_{0,k}}{\partial t} + \text{i}kG_{1,k} = 0, \end{align}
(3.22) \begin{align} &\frac {\partial G_{1,k}}{\partial t} + \text{i}k\left (G_{0,k} + \sqrt {2}G_{2,k}\right ) + \mathcal{N}[E_k,G_{0,k}] + E_k = 0, \end{align}
(3.23) \begin{align} &\frac {\partial G_{m,k}}{\partial t} + \text{i}k\left (\sqrt {m}G_{m-1,k} + \sqrt {m+1}G_{m+1,k}\right ) + \mathcal{N}[E_k,G_{m-1,k}] \nonumber \\ & \qquad\quad = H_{m,k} G_{m,k}, \quad 2 \leqslant m \lt M-1, \end{align}
(3.24) \begin{align} &\frac {\partial G_{M-1,k}}{\partial t} + \text{i}k\sqrt {M-1}G_{M-2,k} + \mathcal{N}[E_k,G_{M-2,k}] = H_{M-1,k} G_{M-1,k} . \end{align}

3.4. Poisson’s equation in the Fourier–Hermite basis

To complete the system, we project Poisson’s equation onto the Fourier–Hermite basis and solve for the electric field, $E_k$ . First, we solve for the charge density in the Hermite basis:

(3.25) \begin{equation} \rho = -\int _v \text{d}v g = -\int _v \text{d}v \varphi _0 \sum _m \varphi ^m G_m = -G_0. \end{equation}

Next, we insert this expression for the normalised charge density into Poisson’s equation (2.12) and apply the Fourier transform (3.16) to calculate the electric field as a function of the $m=0$ density moment in the Fourier–Hermite basis:

(3.26) \begin{equation} E_k = \frac {\text{i}G_{0,k}}{k}. \end{equation}

We require that the $k=0$ mode is zero and stationary, corresponding to zero mean electric field.

3.5. The closure problem

A continuous representation formally requires an infinite number of Hermite moments, but we must discretise the system with a finite number of moments, $M$ . Observe that in the Fourier–Hermite basis, the parallel streaming term in the convective derivative, $v (\partial g / \partial z)$ , becomes $\text{i}k(\sqrt {m}G_{m-1, k} + \sqrt {m+1}G_{m+1, k})$ for general $m$ . This term couples nearest neighbours in Hermite space and forms a hierarchy of equations that must be closed. This closure problem is analogous to the classic closure problem encountered when taking fluid moments of the kinetic equation (2.1). The resulting fluid equations for density, momentum, energy and higher-order moments exhibit coupling between moment $m$ and moments $m+1$ and $m-1$ . This same feature occurs in the Hermite representation of velocity space, and from (3.5) and (3.1), the Hermite moment basis is obtained through a similar process of multiplying the distribution function by increasing powers of $v$ via increasing orders of the Hermite polynomials and then integrating over velocity space.

The simplest model of closure is by truncation, which we use in the direct numerical solution (DNS) by omitting the $G_{m+1,k}$ term in the evolution equation for the final moment, $G_{M-1,k}$ . Our objective is to introduce a dynamical closure model to close the hierarchy at a moment number $m=m_c, \, m_c \lt M-1$ , while still achieving accurate predictions of the Fourier and Hermite spectra. We use a reservoir recurrent neural network as the closure model, which we describe in more detail in § 4.

4. Machine-learning closure: reservoir computing

Inspecting the evolution equation for $G_{m,k}$ (3.15) reveals that the coupling between moment $m$ and moment $m+1$ only explicitly appears as a linear term in the equation, acting on each wavenumber $k$ independently. As a result, we introduce a unique and independent Hermite closure model for each $k$ in the system, for a total of $N_k$ reservoirs. We use methods similar to those of Pathak et al. (Reference Pathak, Hunt, Girvan, Lu and Ott2018) to design the reservoirs.

Each reservoir is a network of $D_{r}$ nodes defined by a latent state vector $ \boldsymbol{r}_k(t) \, \in \mathbb{R}^{D_r}$ , where each element of $\boldsymbol{r}_k(t)$ represents the state of one node at time $t$ . The structure of the reservoir computing model is presented in figure 1. The connections between nodes are defined by a $D_{r} \times D_{r}$ sparse adjacency matrix, $\unicode{x1D63C}$ , where exactly $\kappa$ non-zero elements per row are initialised to random values drawn from the uniform distribution on [0, 1] and scaled by the appropriate factors to set the largest eigenvalue of $\unicode{x1D63C}$ equal to a specified value, $\rho _{sp}$ . The magnitude of the largest eigenvalue of a reservoir’s adjacency matrix has been demonstrated to significantly impact the accuracy of the reservoir’s predictions (Jiang & Lai Reference Jiang and Lai2019). The runtime is separated into two phases, training and prediction. In the training phase, the inputs are derived from a high-resolution DNS. In the subsequent prediction phase, the inputs are derived from a low-resolution simulation with the output of our reservoir used to compute the closure. The input to a given reservoir at each time step is a vector $\boldsymbol{u}_k(t)$ of dimension $2w$ , where the elements of $\boldsymbol{u}$ are the real and imaginary parts of the final $w$ Hermite moments before $m_c$ :

(4.1) \begin{equation} \boldsymbol{u}(t) = \left [G_{m_c-w,k}(t-w\Delta t), \, G_{m_c-(w-1),k}(t-(w-1)\Delta t), \ldots ,\,G_{m_c-1,k}(t-\Delta t)\right ]. \end{equation}

Figure 1. Diagram of the reservoir computing closure model, integrated with the kinetic moment solver.

In both phases, the inputs form a diagonal in $m$ and $t$ of Hermite moments, and this structure is inspired by the phenomenon of linear Landau damping in the linearised Vlasov–Poisson system, where energy cascades from low $m$ to high $m$ in time. The input layer is a $D_{r} \times 2w$ matrix $\unicode{x1D652}_{\rm in}$ , with its elements drawn from the uniform distribution on $[-\sigma ,\sigma ]$ , where $\sigma$ is a parameter. Critically, $\unicode{x1D63C}$ and $\unicode{x1D652}_{\rm in}$ remain fixed after initialisation. Only the output layer, $\unicode{x1D652}_{\rm out}$ , is trained. The reservoir state vector evolves with the equation

(4.2) \begin{equation} \boldsymbol{r}_k(t + \Delta t) = \tanh [\unicode{x1D63C} \boldsymbol{r}_k(t) + \unicode{x1D652}_{\rm in}\boldsymbol{u}(t)], \end{equation}

where the hyperbolic tangent activation function operates element-wise on the vector argument.

The elements of the reservoir-to-output coupling, $\unicode{x1D652}_{\text{out}}$ , are parameters that must be tuned to train the reservoir. We introduce a hyperparameter $b$ to exclude data from initial transients from the reservoir training process. This hyperparameter serves as a buffer of time steps between the initialisation time and the data assigned for training $\unicode{x1D652}_{\text{out}}$ . During the training phase, defined by the times $t$ where $b \Delta t \leqslant t \leqslant T$ , the reservoir output is trained to approximate the closure moment, $G_{m_c,k}$ .

The most straightforward choice for the closure evaluation would be the linear readout operation, $\unicode{x1D652}_{\rm out} \boldsymbol{r}_k$ . Empirically, that operation does not permit reservoirs to predict the dynamics of several other nonlinear systems, including the Lorenz system (Lu et al. Reference Lu, Pathak, Hunt, Girvan, Brockett and Ott2017; Chattopadhyay, Hassanzadeh & Subramanian Reference Chattopadhyay, Hassanzadeh and Subramanian2020; Pyle et al. Reference Pyle, Jovanovic, Subramanian, Palem and Patel2021), Kuramoto–Sivashinsky equation (Pathak et al. Reference Pathak, Hunt, Girvan, Lu and Ott2018) and the global atmospheric system (Arcomano et al. Reference Arcomano, Szunyogh, Pathak, Wikner, Hunt and Ott2020). As Lu et al. (Reference Lu, Pathak, Hunt, Girvan, Brockett and Ott2017) and Pathak et al. (Reference Pathak, Hunt, Girvan, Lu and Ott2018) describe, the hyperbolic tangent operation in the evolution equation for the reservoir state (4.2) introduces an odd-parity symmetry to the reservoir state vector $\boldsymbol{r}$ . As in the Kuramoto–Sivashinsky equation and the Lorenz systems studied by those authors, the quadratic nonlinear term in the Vlasov equation breaks odd-parity symmetry. In other words, the general evolution equation for the Hermite moments (3.18) is not invariant under the transformation $G_{m,k} \rightarrow -G_{m,k}$ , while the evolution equation for the reservoir state (4.2) is invariant under the transformations $\boldsymbol{u} \rightarrow -\boldsymbol{u}$ and $\boldsymbol{r} \rightarrow -\boldsymbol{r}$ . Following the methods of Pathak et al. (Reference Pathak, Hunt, Girvan, Lu and Ott2018) and explored in more detail by Chattopadhyay et al. (Reference Chattopadhyay, Hassanzadeh and Subramanian2020) and Pyle et al. (Reference Pyle, Jovanovic, Subramanian, Palem and Patel2021), we resolve this conflict by applying a symmetry-breaking transformation ( $\boldsymbol{r}_k \rightarrow \boldsymbol{r}_k^{\,*}$ ) to the reservoir latent state before readout:

(4.3) \begin{equation} r_{i,k}^*(t) = \begin{cases} r_{i,k}(t), \quad \text{even } i \\ r_{i,k}^2(t), \quad \text{odd } i. \end{cases} \end{equation}

Because the reservoir output, $\unicode{x1D652}_{\rm out}\boldsymbol{r}^*(t)$ , is a linear operation on $\boldsymbol{r}^*$ , we can train the weights of $\unicode{x1D652}_{\rm out}$ using linear regression and avoid the computational expense of backpropagation. To mitigate potential overfitting, we introduce a regularisation parameter, $\beta$ , in the method of Tikhonov-regularised linear regression. The training problem is then to find the values of $\unicode{x1D652}_{\rm out}$ which minimise

(4.4) \begin{equation} \sum _{j = b}^{T/\Delta t}||G_{m_c,k}(j\Delta t) - \unicode{x1D652}_{\text{out}} \boldsymbol{r}_k^{\,*}(j\Delta t)||^{2} + \beta ||\unicode{x1D652}_{\rm out}||^{2}. \end{equation}

Explicitly, the closed-form solution that solves this linear regression problem is

(4.5) \begin{equation} \unicode{x1D652}_{\rm out} = \unicode{x1D642}_{m_c,k} \unicode{x1D64D}_k^{\,*T}(\unicode{x1D64D}_k^{\,*}\unicode{x1D64D}_k^{\,*T} + \beta \unicode{x1D644}\,)^{-1}, \end{equation}

where $\unicode{x1D642}_{m_c,k}$ is a $2 \times (T/\Delta t - b)$ matrix consisting of the time series of the real and imaginary parts of the closure moment, $G_{m_c,k}$ , $\unicode{x1D64D}_k^{\,*}$ is a $D_r \times (T/\Delta t- b)$ matrix consisting of the time series of the modified state vector $\boldsymbol{r}_k^{\,*}(t)$ , $\unicode{x1D64D}_k^{\,*T}$ is the transpose of $\unicode{x1D64D}_k^{\,*}$ and $\unicode{x1D644}$ is the identity matrix.

4.1. Closure integration

Once the weights of $\unicode{x1D652}_{\rm out}$ are trained, the algorithm switches to the prediction phase, and the system does not evolve moments beyond $m = m_c$ . Instead, the output of the reservoir is fed back into the system of equations as a closure for the moment hierarchy. To make the discussion more concrete, we introduce a time integration operator $\mathcal{D}$ , such that $\mathcal{D}[\unicode{x1D642}(t)] = \unicode{x1D642}(t + \Delta t)$ , where $\unicode{x1D642}(t)$ is the $M \times N_k$ matrix of Fourier–Hermite amplitudes representing the state of the system at time $t$ . Operator $\mathcal{D}$ is a generic operator, and its implementation details are flexible. For the results presented in this paper, we implement $\mathcal{D}$ using the Python libraries NumPy (Harris et al. Reference Harris2020) and SciPy (Virtanen et al. Reference Virtanen2020) with the classic fourth-order explicit Runge–Kutta method (RK4) and the third-order strong stability-preserving Runge–Kutta method (SSPRK3) as detailed by Durran (Reference Durran2010).

When we introduce the closure models, the time integration becomes a two-step process. For each wavenumber $k$ , we use the time integration operator $\mathcal{D}$ to calculate the resolved moments $\unicode{x1D642}_{m\lt m_c,k}$ :

(4.6) \begin{equation} \unicode{x1D642}_{m\lt m_c,k}(t + \Delta t) = \mathcal{D}[\unicode{x1D642}_{m\leqslant m_c}(t)]_k. \end{equation}

We then use a reservoir to predict the closure moment $G_{m_c,k}$ :

(4.7) \begin{equation} G_{m_c, k} (t + \Delta t) = \unicode{x1D652}_{\rm out} \boldsymbol{r}_k^{\,*}(t + \Delta t). \end{equation}

4.2. Computational complexity

Though performance profiling metrics for numerical solutions of partial differential equations are highly dependent on computational hardware architectures and algorithm implementation choices, we can calculate the computational complexity of the DNS and the ML closure for the cases in this paper. Both methods have an identical cost to evolve the lowest-order moments. Thus, we only report the cost of evolving moments $m_c \leqslant m \lt M$ for the DNS and the closure moment $m_c$ for the ML method. While the time-integration operator $\mathcal{D}$ is generic, we will remove a layer of abstraction by calculating the result for the SSPRK3 algorithm. For simplicity, we will neglect the additional cost of operations on complex numbers relative to those same operations on real numbers, with the understanding that this will underestimate the cost of the DNS in comparison to the ML closure. The cost of transcendental functions including $\tanh$ depends on instruction sets. This estimate treats them as equivalent in cost to multiplication, with the understanding that this underestimates the cost of updating the reservoir state. We also neglect the cost of memory access and copy operations for both methods.

4.2.1. Complexity of DNS

First, let us introduce a new parameter $M' = M - m_c$ that represents the additional number of moments that the DNS must evolve (3.23) in comparison with the ML closure model. Our choice of time-integration operator, SSPRK3, is a three-stage algorithm. Each stage includes an evaluation of the electric field, nonlinear term, linear streaming term and hypercollisions. The electric field is calculated by (3.26). The vector $(\text{i} / k)$ can be stored in memory during initialisation, so the electric field calculation requires $N_k$ multiplication operations. The nonlinear term is evaluated with a pseudo-spectral method using (3.20). Using the $\mathcal{O}(N \log N)$ scaling of the fast Fourier transform algorithm (Cooley & Tukey Reference Cooley and Tukey1965), the total computational complexity of the nonlinear term is $3 M' \mathcal{O}(N_k \log N_k)$ . If the vector $(\text{i} k)$ is also stored during initialisation, then the linear streaming term requires $6M'N_k$ operations. The dissipation matrix in (3.19) can be stored, and the cost of the regularisation term is $2M'N_k$ . Finally, we include the operations needed to combine the results of each stage into the state of the system at the next time step. For SSPRK3, this requires $12M'N_k$ operations. The total cost of DNS for each time step is then

(4.8) \begin{equation} \text{Cost}_{\text{DNS}} = 3M'\mathcal{O}(N_k \log N_k) + 20M'N_k + N_k. \end{equation}

4.2.2. Complexity of ML closure

The ML closure evaluation has two stages: updating the latent state of each reservoir and evaluating the closure moment. There are a total of $N_k$ reservoirs. The state of each reservoir, $\boldsymbol{r}_k$ , updates at every time step using (4.2). This requires the sum of two matrix-vector products. The product $\unicode{x1D652}_{\rm in} \boldsymbol{u}$ has a cost of $4D_rw$ operations. The other product $\unicode{x1D63C} \boldsymbol{r}_k$ involves a sparse matrix. Though the cost of this product is implementation-specific, we will count operations on non-zero elements, resulting in a cost of $2 D_r \kappa$ operations, where $\kappa \lt D_r$ . Then, the $\tanh$ operation is applied to the sum of these products. The total cost of the reservoir update is then $2D_r[\kappa +$ $2w + 1]$ . The symmetry-breaking transformation (4.3) is applied to the state vector, which requires $D_r / 2$ multiplication operations. Finally, each reservoir predicts a Hermite closure using (4.7), which uses $4D_r$ operations. The ML closure is linear in $k$ . In total, the cost of calculating the ML closure for each time step is then

(4.9) \begin{equation} \text{Cost}_{\text{ML}} = N_k D_r [8w + 4\kappa + 13/2]. \end{equation}

After eliminating the common factor of $N_k$ from the cost of the DNS and the ML closure, we find that as $N_k$ increases, the dominant scaling for the DNS is $\mathcal{O}(M' \log N_k)$ . In contrast, the cost of the ML closure scales linearly with the size of each reservoir $D_r$ . For sufficiently large $(M' \log N_k) / (D_r w)$ , the reservoir closure will be faster than the DNS.

5. Closure results

5.1. Linearised Vlasov–Poisson

For our first test of the ML closure, we confirm that the model accurately solves the linear limit of the Vlasov–Poisson system. First, we linearise the Vlasov equation (2.11), dropping the nonlinear term:

(5.1) \begin{equation} \frac {\partial g}{\partial t} + v\frac {\partial g}{\partial z} +vEF_M = 0. \end{equation}

In Fourier space, this allows us to restrict our focus to a single wavenumber, as the coupling between wavenumbers occurs only in the nonlinear term. We simulate an approximately steady-state solution by driving the system, as explored by Kanekar et al. (Reference Kanekar, Schekochihin, Dorland and Loureiro2015). We inject energy into the $m=0$ moment, and it cascades down to finer scales through Landau damping. We incorporate these features into the projection of this equation onto the Fourier–Hermite basis using the results of § 3:

(5.2) \begin{equation} \frac {\partial G_{m,k}}{\partial t} + \text{i}k\left (\sqrt {m}G_{m-1,k} + \sqrt {m+1}G_{m+1,k}\right ) = \chi (t)\delta _{m,2} - E_k \delta _{m,1}, \end{equation}

where $\chi (t)$ is a forcing coefficient independently drawn at each time step from the uniform distribution on $[0,1)$ . The electric field $E_k$ is given by (3.26). The purpose of this mechanism for energy injection is to achieve a statistical steady-state solution. Our primary systems of interest exhibit nonlinear chaos and typically settle into time-dependent steady states. While constant forcing would also achieve a steady-state solution, a time-dependent forcing mechanism more closely resembles typical problems encountered in plasmas. Other injection mechanisms, including driving the $m=2$ moment or applying a stochastic external electric field, yield similar performance for the closure model. Figure 2 depicts a high-resolution ( $M=1025$ ) baseline simulation of (5.2) with $k=0.4$ .

Figure 2. High-velocity-resolution ( $M=1025$ ) baseline time series of Hermite amplitudes for the driven, linearised Vlasov–Poisson system. Energy is injected as a density perturbation in the $m=0$ Hermite moment and advects to higher moments through linear Landau damping. No hypercollisional regularisation is applied to this example, and hundreds of Hermite moments are required to prevent numerical reflection at the high- $m$ boundary from impacting the low- $m$ spectrum.

When no hypercollisional regularisation is applied, the cascade of free energy numerically reflects at the high- $m$ boundary. In the simulation in figure 2, this reflection occurs around $t=80$ . This phenomenon always occurs at a finite time for a given Hermite resolution ( $M$ ), meaning that extending the duration of a simulation where the low-order moments are valid requires increasing the Hermite resolution. Additionally, the rate at which energy advects from low $m$ to high $m$ increases with Fourier wavenumber $k$ (Parker & Dellar Reference Parker and Dellar2015). Therefore, though this linearised simulation with $k=0.4$ is feasible with high velocity resolution, more Hermite resolution would be required for a nonlinear simulation with moderate spatial resolution. Some form of artificial dissipation is necessary for baseline simulations of higher-dimensional systems to be computationally feasible. We therefore apply hypercollisional regularisation to the nonlinear form of the Vlasov–Poisson system in §§ 5.2 and 5.3.

We then train a reservoir to predict the Hermite closure moment, $G_{m_c,k}$ , where we evaluate the closure at $m_c = 3$ . The choice of $m_c=3$ is motivated by the tradition of heat flux closures in fluid theory and our desire for a closure that allows the conservation laws in the $m \in [0,1,2]$ equations to remain intact. One motivation for a closure is to reduce the resolution required to achieve accurate simulations, so we choose the lowest valid value for $m_c$ . We use the time series of the density, momentum and temperature moments from the time window $15 \leqslant t \lt 25$ as training data. We find that a requirement is that enough time has passed to allow energy to cascade into $m=m_c$ . We report results training on data from the time window $15 \lt t \leqslant 25$ , but the reservoir supports accurate spectra for other training windows as well. Figure 3 depicts the excellent agreement between the low-order spectra calculated by the high-resolution baseline and the ML closure model. The mean-squared error in the ML closure spectrum is $2.25 \times 10^{-5}$ . The ML closure shows favourable performance in comparison with a naive closure by truncation, where no hypercollisional regularisation is applied. It also outperforms a closure by truncation where the hypercollisional term, $-\nu _m m^4 G_{m,k}$ , is introduced to the right-hand side of (5.2) to mitigate numerical reflection at the high- $m$ boundary. As in (3.19), this term is only applied for $m\gt 2$ to preserve the low-order conservation laws. For an initial comparison, $\nu _m$ is set to $2 \times 10^{-2}$ to maintain finite dissipation at $m_c = 3$ , where $\nu _m m_c^4$ is of order one. For reference, we also compare with the theoretical scaling $G_m \sim m^{-1/2}$ (Zocco & Schekochihin Reference Zocco and Schekochihin2011; Kanekar et al. Reference Kanekar, Schekochihin, Dorland and Loureiro2015).

Figure 3. Time-averaged Hermite spectra for the ML closure model compared with the high-velocity-resolution ( $M=1025$ ) baseline, closure-by-truncation with three different coefficients ( $\nu _m$ ) for hypercollisional regularisation and the theoretical $m^{-1/2}$ scaling. The ML closure model shows strong agreement with the baseline simulation, while no value of $\nu _m$ produces an accurate spectrum.

The reservoir hyperparameters used to construct the model are spectral radius $\rho _{sp}=0.6$ , adjacency matrix degree of three, Tikhonov regularisation parameter $\beta = 10^{-9}$ , input scaling parameter $\sigma = 0.5$ and six nodes in the reservoir. We found these hyperparameters by comparing mean-squared errors in spectra. An unguided scan led to a reduction in error from $1.75 \times 10^{-7}$ to $2.26 \times 10^{-8}$ . No systematic optimisation was required, though some choices of parameters led to numerically unstable time integration of (5.2). We used the same procedure to select the hyperparameters in both the weakly and strongly nonlinear cases presented in §§ 5.2 and 5.3, respectively. The spectral radius $\rho _{sp}$ of the adjacency matrix $\unicode{x1D63C}$ must be set to $\rho _{sp} \lt 1$ for stability. The block-diagonal structure for $\unicode{x1D652}_{\rm in}$ that Vlachas et al. (Reference Vlachas, Pathak, Hunt, Sapsis, Girvan, Ott and Koumoutsakos2020) used in their reservoir computing implementation yielded unstable predictions. We speculate that this may be related to differences in requirements for complex-valued data in comparison with real-valued data, but more analysis would be required to confirm. For the strongly nonlinear case in § 5.3, using a reservoir smaller than 60 nodes or including the initial transient in the training set also led to unstable solutions. The computational complexity for evaluating the ML closure is the same as in (4.9), with $N_k=1$ . The DNS cost for this case is $18M' + 1$ , after removing the contributions of the nonlinear term and hypercollisions from (4.8).

5.2. Weakly nonlinear Vlasov–Poisson

Next, we reintroduce the nonlinear term, $-E (\partial g / \partial v)$ , which permits coupling between wavenumbers. This coupling in Fourier space allows energy to flow across spatial scales, in addition to the already-present energy cascade in velocity space. This effect is subdominant at low amplitudes, but it becomes more significant as amplitudes increase. We first examine the low-amplitude limit in this section before proceeding to a high-amplitude case in § 5.3. In the low-amplitude limit, Landau damping continues to be the dominant effect. Here, we solve an initial-value problem, where the initial condition is a cosine density perturbation with a Maxwellian velocity component:

(5.3) \begin{equation} g(z,v,t=0) = \epsilon \cos (k_0 z)F_M(v), \end{equation}

where $\epsilon$ is the initial amplitude of the perturbation normalised by the background density and Maxwellian weight and $k_0$ is the wavenumber of the initial perturbation. In the spectral domain, this initial condition has the convenient form

(5.4) \begin{equation} G_{m,k,t=0} = \frac {\epsilon }{2} \delta _{m,0} \left (\delta _{k,k_0} + \delta _{k,-k_0}\right ), \end{equation}

collapsing the initial system state into a single non-zero value after applying the reality condition. To explore the linear limit of the system, we choose $\epsilon = 0.001$ , a $0.1\, \%$ density perturbation. We examine the $k_0 = 0.4$ case that Brunetti, Califano & Pegoraro (Reference Brunetti, Califano and Pegoraro2000) explored in their analysis, accounting for a factor of 10 difference in normalisations. With this low initial amplitude, we choose $N_k=17$ Fourier wavenumbers, including the $k=0$ mode, anticipating that the dominant energy cascade will be linear Landau damping. For the high-resolution simulations, we continue to use $M=17$ Hermite moments, including the $m=0$ moment. The hypercollisional regularisation coefficient is $\nu _m = 5 \times 10^{-5}$ , and we set $\nu _k = 0$ . This value of $\nu _m$ is chosen to maintain finite dissipation at the finest velocity scale in the simulation, such that the magnitude of the coefficient of the hypercollisional term at that scale, $\nu _{m} (M-1)^4$ , is of order one. As the Fourier spectrum in figure 6 shows, the flux of energy to high $k$ in this regime is small, so $\nu _k$ is not necessary. We set the size of the spatial simulation domain to $L=5\pi$ so that $1/k_0$ is the largest wavelength resolved in the system. Figure 4 presents a comparison between the baseline, high-Hermite-resolution simulation, the ML closure model and the theoretical damping rate. The time trace for the ML closure method begins at the end of the training phase, $t=25$ . To calculate the damping rate, we use the complex root-finding algorithm of Carpentier & Dos Santos (Reference Carpentier and Dos Santos1982), implemented in the generalised dispersion relation solver developed by Ivanov & Adkins (Reference Ivanov and Adkins2023). These results demonstrate that with a low initial amplitude, our baseline simulation converges to the expected linear Landau damping rate. Additionally, figure 4 demonstrates that the ML closure model preserves the frequency of the wave. Time traces of the $m=1$ and $m=2$ moments show similar performance and are presented in Appendix B.

Figure 4. Comparison between baseline numerical solution, ML closure and theoretical damping rate of the Fourier–Hermite amplitude for an initial cosine density perturbation. The low-amplitude perturbation shows strong agreement with the theoretical damping rate. When augmented with the ML closure, the moment solver continues to capture the behaviour well at a lower Hermite resolution of $M=4$ , as opposed to the $M=17$ baseline.

Figure 5. Hermite spectra of the high-velocity-resolution ( $M=17$ ) and truncated ( $M=4$ ) simulations and ML closure for the initial-value problem in figure 4. The spectra are averaged over time and Fourier wavenumber. The ML closure model permits a low-resolution simulation to accurately resolve the Hermite spectrum.

Figure 6. Fourier spectra of the simulations in figure 5. In this weakly nonlinear regime, the majority of the energy remains in the boxscale mode. Each reduced Hermite resolution model properly captures the rapid energy decay across $k$ .

In figure 5, we compare the ML closure model, with $m_c = 3$ , to both the high-resolution ( $M=17$ ) baseline and two low-resolution simulations without the ML closure, i.e. closure by truncation. We also compare the Fourier spectra of these simulations in figure 6. For this problem, we place an independent reservoir at each wavenumber $k$ and train it to learn a Hermite moment closure for that wavenumber. The ML closure shows strong agreement with the high-resolution baseline, with mean-squared errors of $1.26 \times 10^{-42}$ for the Hermite spectrum and $8.01 \times 10^{-38}$ for the Fourier spectrum. It successfully captures the Hermite and Fourier spectra of the system and confirms that its predictive capability continues to hold when a small, but non-zero, flux of energy flows between wavenumbers. To create a low-resolution simulation without the ML closure, we truncate the moment hierarchy at $m_c=3$ and set $G_{m_c+1,k}=0$ in the evolution equation for $G_{m_c,k}$ . As in § 5.1, we also compare to a low-resolution simulation with increased hypercollisional regularisation, with $\nu _m$ set to $2 \times 10^{-2}$ to maintain finite dissipation at the finest resolved scales. An intuitive explanation for the poor performance of the truncated simulations in figure 5 can be described in relation to the general form of the evolution equation for this problem, (3.18). The truncation process removes the $G_{m+1,k}$ term from the final moment equation in the truncated system, eliminating an effective energy sink from that moment. Without access to the proper dissipation channel of smaller scales in velocity space, the truncated system experiences some energy reflection and pile-up in its smallest resolved scales in velocity. In contrast, the ML closure learns the pattern of dissipation through Landau damping and serves as a better boundary condition than truncation.

We find that the reservoir hyperparameters that result in the best closure performance for this initial condition are spectral radius $\rho _{sp} = 0.6$ , Tikhonov regularisation parameter $\beta = 10^{-7}$ and $T=25$ normalised time units for training. For each wavenumber, we set the reservoir input scaling parameter $\sigma$ to normalise the reservoir inputs by the time average of the Hermite amplitude at the moment before the closure:

(5.5) \begin{equation} \sigma _k = \frac {1}{\langle |G_{m_c-1,k,t}| \rangle _t}. \end{equation}

This preserves the dynamic range of the hyperbolic tangent nonlinear activation function by mitigating the possibility of saturation to $-1$ or 1. As in § 5.1, we set the degree of the adjacency matrix to three. Finally, we choose a reservoir size of two nodes per input value, totalling 12 nodes.

5.3. Strongly nonlinear Vlasov–Poisson

Finally, we create an ML closure model for the strongly nonlinear regime of the Vlasov–Poisson system. When the amplitude of the initial density perturbation becomes significant relative to the background, the nonlinear term dominates over the energy cascade in linear Landau damping. Figure 7 presents the numerical solution to the initial-value problem from § 5.2, with $\epsilon$ increased to 0.18. When compared with figure 4, the behaviour at high amplitude deviates from the linear theory. Additionally, a convergence study indicated that more Hermite moments ( $M=65$ ) are necessary to accurately resolve the low-order spectrum than in the weakly nonlinear case. (For more detail on this, see Appendix A.) We set $\nu _m$ to $5 \times 10^{-7}$ , following the procedure we used to determine the baseline hypercollisional regularisation parameter for the linearised and weakly nonlinear cases. From the start of the simulation until approximately $t=25$ , the mode damps at a faster rate than in the linear theory. After the initial transient, the mode saturates to a steady state. When the ML closure model is introduced, the lower-Hermite-resolution ( $M=4$ ) system maintains an accurate frequency and slightly overdamps the amplitude. The training phase for the ML closure model was set to $50 \leqslant t \lt 100$ to avoid the most severe initial transient behaviour, and the time trace for the ML closure begins at $t=100$ .

Figure 7. Simulated Fourier–Hermite amplitude for a high-initial-amplitude (18 % of background) cosine density perturbation. The dynamics exhibits strongly nonlinear behaviour, attenuating the damping of the mode. As in the low-amplitude case, the ML closure model captures well the frequency and amplitude of the wave.

As in the previous section, figure 8 displays a comparison between the Hermite spectra of the high-resolution baseline, the ML closure model and two truncated low-resolution simulations, where one has an increased amount of hypercollisional regularisation to account for the lower resolution. At this higher initial amplitude, the Hermite spectrum of the baseline does not decrease monotonically, creating a more challenging scenario for the reservoirs to model. At the same value of $\nu _m$ , both the truncated low-resolution simulation and the ML closure model accurately capture the Hermite spectrum of the system at low $m$ , but as figure 9 reveals, only the ML closure model also agrees with the Fourier spectrum. Mean-squared errors in the ML closure spectra are $3.66 \times 10^{-18}$ and $1.19 \times 10^{-14}$ for the Hermite and Fourier spectra, respectively. The ML closure successfully captures both spectra of the system, reducing the required velocity-space resolution by a factor of 16. We observe some oscillatory behaviour in the Fourier spectrum of the ML closure, but our tests do not identify a mechanism that causes this behaviour. We obtain the reservoir hyperparameters by the same method as in the previous section, adjusting them to $\beta = 10^{-6}$ , $T=50$ and 120 nodes in each reservoir. Other parameters remain the same.

Figure 8. Hermite spectra of the high-velocity-resolution ( $M=65$ ) and truncated ( $M=4$ ) simulations and ML closure for the initial-value problem in figure 7 averaged over $k$ and $t$ . While both the ML closure model and truncated simulation agree with the high-resolution DNS, the ML closure model shows closer agreement.

Figure 9. Fourier spectra of the simulations in figure 8. The ML closure resolves the wavenumber spectrum more accurately than the truncated simulations, improving simulation accuracy at low resolution in velocity space.

A potentially surprising result is that the Hermite spectrum of the naively truncated simulation is more accurate than its Fourier spectrum, despite the fact that the truncation occurs in Hermite space. We intuit that this result is due to a pile-up of excess free energy at the closure moment, which then nonlinearly advects in Fourier space. Adjusting the strength of the hypercollisions slightly improves the accuracy of the Fourier spectrum, at the cost of overdamping the density and momentum moments. An analysis of the flux of free energy in this system similar to the work of Meyrand et al. (Reference Meyrand, Kanekar, Dorland and Schekochihin2019) may be used to investigate this result further, but that is beyond the scope of this work.

6. Summary and conclusion

In this paper, we have used reservoir computing to present an ML, velocity-space closure model for the hierarchy of Hermite moments in the one-dimensional Vlasov–Poisson system. The closure model serves as a proof-of-principle that reservoir computing can be used to reduce the simulation domain requirements for studying plasma dynamics.

A logical objective to pursue is to extend the closure to reduce the required resolution in the Fourier representation of position space. In practice, gyrokinetic turbulence simulations commonly use significantly more resolution for configuration space than they do for velocity space, in part because of the near-Maxwellian nature of the velocity distributions. Therefore, a Fourier-space closure model would potentially be more desirable than a velocity-space closure. However, the structure of the nonlinear term poses a challenge to achieving that goal. In the Vlasov–Poisson system, the coupling between resolved and unresolved scales in velocity space occurs as a sum of nearest neighbours in Hermite space. This locality in the spectral domain defines a clear, single closure term. In contrast, the convolution in the nonlinear term establishes non-local interactions across scales in Fourier space. An ML closure for Fourier space would require a different structure from the one developed for this paper.

The eventual goal is to build closure models that can reduce the domain requirements for higher-dimensional turbulence simulations, including the full gyrokinetic equation. A next step towards that goal would be to test this velocity-space closure in that system. There has been recent interest in applying spectral formulations of velocity space to gyrokinetic simulations, particularly with a Hermite basis used for parallel velocity and a Laguerre polynomial basis for perpendicular velocity (Jorge et al. Reference Jorge, Ricci and Loureiro2017; Mandell et al. Reference Mandell, Dorland and Landreman2018; Hoffmann et al., Reference Hoffmann, Frei and Ricci2023a , Reference Hoffmann, Frei and Riccib ; Frei et al. Reference Frei, Hoffmann, Ricci, Brunner and Tecchioll2023, Reference Frei, Mencke and Ricci2024). A successful implementation of an ML closure for Fourier–Hermite–Laguerre codes, including Gyacomo (Hoffmann et al. Reference Hoffmann, Frei and Ricci2023b ) and GX (Mandell et al. Reference Mandell, Dorland, Abel, Gaur, Kim, Martin and Qian2024), may improve their capabilities to resolve turbulence statistics with lower resolution requirements. Empirically, both codes calculate turbulent heat fluxes with high accuracy using eight or fewer Laguerre modes, suggesting that future work on ML closures should prioritise reductions in spatial and parallel velocity resolution requirements.

An ideal closure model would have a compact, symbolic form from which physical intuition can be derived. Significant recent progress towards interpretable, symbolic closures has been achieved using sparse regression techniques (Cheng et al. Reference Cheng, Fu, Wang, Dong, Jin, Jiang, Ma, Qin and Liu2023; Donaghy & Germaschewski Reference Donaghy and Germaschewski2023; Ingelsten et al. Reference Ingelsten, McGrae-Menge, Alves and Pusztai2024). One challenge that these algorithms face is the difficulty of finding symbolic closures that are not local in space. The analytic closure derived by Hammett & Perkins (Reference Hammett and Perkins1990) for the linear limit of the system in this paper is local in Fourier space, but it includes a non-local Hilbert transform when expressed in configuration space. Closures for the nonlinear form of this system or other systems of interest may also require information that is non-local in configuration space, but local in a spectral representation. The ML closure developed in this paper and by Huang et al. (Reference Huang, Dong and Wang2025) apply a Fourier representation to the data, providing the neural networks direct access to locality in spatial scales, and the velocity-space closure in this paper is also local in Hermite space. Though interpreting moment closures that use neural networks is more challenging than closures derived by sparse regression techniques, future work may incorporate feature importance identification methods like those formalised by Lundberg & Lee (Reference Lundberg and Lee2017).

Acknowledgements

We express our gratitude to R. Gaur, N.R. Mandell and M. Nastac for helpful discussions about turbulence, M. Almanza, E.P. Alves, D. Carvalho, J. Chou, F. Fiuza, G. Guttormsen, B. Jang, N.F. Loureiro, M. McGrae-Menge, J. Pierce and A. Velberg for their perspectives on subgrid modelling and E. Ott, D. Patel and A. Wikner for sharing their insights on reservoir computing.

Editor Nuno Loureiro thanks the referees for their advice in evaluating this article.

Funding

This work was made possible through the support of the United States Department of Energy under contract numbers DE-FG02-93ER54197 and DE-SC0022039.

Declaration of interest

M.L. is a consultant for Type One Energy.

Appendix A. Hermite resolution convergence

In § 5.3, we demonstrated that reservoir computing can be used to construct an ML closure that reduces the number of moments required to capture the low-order spectrum of the strongly nonlinear regime of the Vlasov–Poisson system by a factor of 16. In figure 10, we present the convergence study that we used to determine the high-resolution ( $M=65$ ) baseline for that case. The low-order moments represent important physical quantities, including density, momentum and energy. Therefore, accurately resolving those moments is a higher priority than resolving the fine-scale structure in velocity space that the high-order moments capture. In figure 10, the lowest Hermite resolution that captures the $m=0$ and $m=1$ density and momentum moments within a factor of 2 is $M=65$ , so that resolution was selected as the baseline case. Figure 11 shows a root-mean-square error (RMSE) metric for the low-order Hermite spectra at different Hermite resolutions, where we take the $M=513$ spectrum as ground truth. Inaccuracy in the density and momentum moments leads to increased RMSE for the $M=17$ and $M=33$ cases, and subsequent errors converge exponentially, beginning with $M=65$ .

Figure 10. Convergence study of the Hermite spectra for the strongly nonlinear initial-value problem from § 5.3. We solve (3.21)–(3.24) without the ML closure at increasing levels of resolution in velocity space. We selected $M=65$ moments for the high-resolution baseline case in § 5.3, as that case is the first to show convergence in the density and momentum moments.

Figure 11. Root-mean-square error in the low-order ( $m \in [0,1,2,3]$ ) Hermite spectra at the resolutions plotted in figure 10, as compared with the $M=513$ case. The $M=17$ and $M=33$ cases have similar errors, and exponential convergence occurs afterward.

Figure 12. Time trace of the $m=1$ moment for the low-amplitude initial-value case in § 5.2.

Figure 13. Time trace of the $m=2$ moment for the low-amplitude initial-value case in § 5.2.

Figure 14. Time trace of the $m=1$ moment for the high-amplitude initial-value case in § 5.3.

Figure 15. Time trace of the $m=2$ moment for the high-amplitude initial-value case in § 5.3.

Appendix B. Time evolution of low-order moments

Sections 5.2 and 5.3 demonstrate that the ML closure can capture spectra of the Vlasov–Poisson system and that the time evolution of the density moment is accurate. Figures 12–15 depict time traces of the $m=1$ and $m=2$ moments for the cases in those sections. The ML closure resolves the frequencies of the oscillations well. It also successfully captures the amplitude peaks soon after training ends, but it fails to match many of the minima. At long times after training, the amplitudes resulting from the ML closure begin to deviate slightly from the DNS amplitudes. In the weakly nonlinear regime presented in figures 12 and 13, the ML closure leads to a slightly lower amplitude at the end of the simulation for both the $m=1$ and $m=2$ moments. In the strongly nonlinear regime in figure 14, the ML closure trends towards a lower amplitude for $m=1$ than in the DNS baseline. However, for $m=2$ in that regime, figure 15 shows that the ML closure trends towards a slightly higher amplitude than the DNS. To quantitatively analyse this behaviour, we report damping rates for the DNS and ML closure in table 1. Damping rates were calculated by first extracting the local maxima from each time trace and then solving a linear regression problem for the natural logarithm of those maxima. An interesting observation is that the $m=1$ and $m=2$ results are identical for the low-amplitude case. In contrast, in the high-amplitude case, the DNS has a faster damping rate for $m=2$ than for $m=1$ . In the strongly nonlinear regime, the ML closure yields similar damping rates for $m=1$ and $m=2$ , but they diverge from the DNS damping rates.

Table 1. Damping rates calculated from the initial-value simulations in figures 1215. Here, $\epsilon$ is the amplitude of the initial cosine density perturbation and $m$ is the Hermite moment number. Damping rates were calculated by solving a linear regression problem for the natural logarithm of the local maxima in each time series.

Footnotes

1 Power laws typically arise when a scale-local transfer of energy occurs, and a separation between energy injection and dissipation scales exists. The relevance of scale locality to energy cascades in gyrokinetics has been studied in detail by Barnes, Parra & Schekochihin (Reference Barnes, Parra and Schekochihin2011) and Teaca, Jenko & Told (Reference Teaca, Jenko and Told2017). In some systems of interest, inverse cascades occur, driving invariants from small scales to large scales. The theory of inverse cascades has been developed for several two-dimensional systems (Hasegawa & Mima Reference Hasegawa and Mima1978; Plunk et al. Reference Plunk, Cowley, Schekochihin and Tatsuno2010), yet an open question remains about their applicability to the full five-dimensional gyrokinetic system.

References

Abramowitz, M. & Stegun, I.A. 1972 Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 10th edn., pp. 773782. U.S. Dept. of Commerce, National Bureau of Standards, Dover Publications.Google Scholar
Adkins, T. & Schekochihin, A.A. 2018 A solvable model of Vlasov-kinetic plasma turbulence in Fourier–Hermite phase space. J. Plasma Phys. 84, 905840107.10.1017/S0022377818000089CrossRefGoogle Scholar
Arcomano, T., Szunyogh, I., Pathak, J., Wikner, A., Hunt, B.R. & Ott, E. 2020 A machine learning-based global atmospheric forecast model. Geophys. Res. Lett. 47, e2020GL087776.CrossRefGoogle Scholar
Armstrong, T.P. 1967 Numerical studies of the nonlinear Vlasov equation. Phys. Fluids 10, 12691280.10.1063/1.1762272CrossRefGoogle Scholar
Bañón Navarro, A., Teaca, B., Jenko, F., Hammett, G.W., Happel, T. & ASDEX Upgrade Team 2014 Applications of large eddy simulation methods to gyrokinetic turbulence. Phys. Plasmas 21, 032304.CrossRefGoogle Scholar
Bañón Navarro, A., Morel, P., Albrecht-Marc, M., Carati, D., Merz, F., Görler, T. & Jenko, F. 2011 Free energy cascade in gyrokinetic turbulence. Phys. Rev. Lett. 106, 055001.10.1103/PhysRevLett.106.055001CrossRefGoogle ScholarPubMed
Barnes, M., Parra, F.I. & Schekochihin, A.A. 2011 Critically balanced ion temperature gradient turbulence in fusion plasmas. Phys. Rev. Lett. 107, 115003.10.1103/PhysRevLett.107.115003CrossRefGoogle ScholarPubMed
Boyd, J.P. 2000 Chebychev and Fourier Spectral Methods. 2nd edn., pp. 117. Dover Publications.Google Scholar
Brunetti, M., Califano, F. & Pegoraro, F. 2000 Asymptotic evolution of nonlinear Landau damping. Phys. Rev. E 62, 4109.10.1103/PhysRevE.62.4109CrossRefGoogle ScholarPubMed
Carpentier, M.P. & Dos Santos, A.F. 1982 Solution of equations involving analytic functions. J. Comput. Phys. 45, 210.10.1016/0021-9991(82)90117-6CrossRefGoogle Scholar
Castagna, J., Schiavello, F., Zanisi, L. & Williams, J. 2024 StyleGAN as an AI deconvolution operator for large eddy simulations of turbulent plasma equations in BOUT++. Phys. Plasmas 31, 033902.10.1063/5.0189945CrossRefGoogle Scholar
Chattopadhyay, A., Hassanzadeh, P. & Subramanian, D. 2020 Data-driven predictions of a multiscale Lorenz 96 chaotic system using machine-learning methods: reservoir computing, artificial neural network, and long short-term memory network. Nonlinear Process. Geophys. 27, 373389.CrossRefGoogle Scholar
Cheng, W., Fu, H., Wang, L., Dong, C., Jin, Y., Jiang, M., Ma, J., Qin, Y. & Liu, K. 2023 Data-driven, multi-moment fluid modeling of Landau damping. Comput. Phys. Commun. 282, 108538.10.1016/j.cpc.2022.108538CrossRefGoogle Scholar
Clavier, B., Zarzoso, D., del-Castillo-Negrete, D. & Frénod, E. 2025 Generative-machine-learning surrogate of plasma turbulence. Phys. Rev. E 111, L013202.10.1103/PhysRevE.111.L013202CrossRefGoogle ScholarPubMed
Cooley, J.W. & Tukey, J.W. 1965 An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, 297301.10.1090/S0025-5718-1965-0178586-1CrossRefGoogle Scholar
Deardorff, J.W. 1970 A numerical study of three-dimensional turbulent channel flow at large Reynolds numbers. J. Fluid Mech. 41, 453480.10.1017/S0022112070000691CrossRefGoogle Scholar
Degrave, J., et al. 2022 Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414419.CrossRefGoogle ScholarPubMed
Donaghy, J. & Germaschewski, K. 2023 In search of a data-driven symbolic multi-fluid ten-moment model closure. J. Plasma Phys. 89, 895890105.10.1017/S0022377823000119CrossRefGoogle Scholar
Durran, D.R. 2010 Numerical Methods for Fluid Dynamics. Chapter 2, 2nd edn. Springer.10.1007/978-1-4419-6412-0CrossRefGoogle Scholar
Frei, B.J., Hoffmann, A.C.D., Ricci, P., Brunner, S. & Tecchioll, Z. 2023 Moment-based approach to the flux-tube linear gyrokinetic model. J. Plasma Phys. 89, 905890414.CrossRefGoogle Scholar
Frei, B.J., Mencke, J. & Ricci, P. 2024 Full-F turbulent simulation in a linear plasma device using a gyro-moment approach. Phys. Plasmas 31, 012301.10.1063/5.0167997CrossRefGoogle Scholar
Fu, Y., Eldon, D., Erickson, K., Kleiwegt, K., Lupin-Jimenez, L., Boyer, M.D., Eidietis, N., Barbour, N., Izacard, O. & Kolemen, E. 2020 Machine learning control for disruption and tearing mode avoidance. Phys. Plasmas 27, 022501.CrossRefGoogle Scholar
Gibelli, L. & Shizgal, B.D. 2006 Spectral convergence of the Hermite basis function solution of the Vlasov equation: the free-streaming term. J. Comput. Phys. 219, 477488.10.1016/j.jcp.2006.06.017CrossRefGoogle Scholar
Goldreich, P. & Sridhar, S. 1995 Toward a theory of interstellar turbulence. II. Strong Alfvénic turbulence. Astrophys. J. 438, 763775.10.1086/175121CrossRefGoogle Scholar
Grant, F.C. & Feix, M.R. 1967 Fourier–Hermite solutions of the vlasov equations in the linearized limit. Phys. Fluids 10, 696702.CrossRefGoogle Scholar
Greif, R., Jenko, F. & Thuerey, N. 2023 Physics-preserving AI-accelerated simulations of plasma turbulence. arXiv: 2309.16400.Google Scholar
Hammett, G.W. & Perkins, F.W. 1990 Fluid moment models for Landau damping with application to the ion-temperature-gradient instability. Phys. Rev. Lett. 64, 3019.CrossRefGoogle Scholar
Harris, C.R., et al. 2020 Array programming with NumPy. Nature 585, 357362.10.1038/s41586-020-2649-2CrossRefGoogle ScholarPubMed
Hasegawa, A. & Mima, K. 1978 Pseudo-three-dimensional turbulence in magnetized nonuniform plasma. Phys. Fluids 21, 8792.10.1063/1.862083CrossRefGoogle Scholar
Highcock, E.G., Mandell, N.R., Barnes, M. & Dorland, W. 2018 Optimisation of confinement in a fusion reactor using a nonlinear turbulence model. J. Plasma Phys. 84, 905840208.10.1017/S002237781800034XCrossRefGoogle Scholar
Hoffmann, A.C.D., Frei, B.J. & Ricci, P. 2023a Gyrokinetic moment-based simulations of the Dimits shift. J. Plasma Phys. 89, 905890611.10.1017/S0022377823001320CrossRefGoogle Scholar
Hoffmann, A.C.D., Frei, B.J. & Ricci, P. 2023b Gyrokinetic simulations of plasma turbulence in a Z-pinch using a moment-based approach and advanced collision operators. J. Plasma Phys. 89, 905890214.10.1017/S0022377823000284CrossRefGoogle Scholar
Huang, Z., Dong, C. & Wang, L. 2025 Machine-learning heat flux closure for multi-moment fluid modeling of nonlinear landau damping. Proc. Natl. Acad. Sci. USA 122, e2419073122.10.1073/pnas.2419073122CrossRefGoogle ScholarPubMed
Ingelsten, E.R., McGrae-Menge, M.C., Alves, E.P. & Pusztai, I. 2024 Data-driven discovery of a heat flux closure for electrostatic plasma phenomena. arXiv: 2411.18358.10.1017/S0022377825000285CrossRefGoogle Scholar
Ivanov, P.G. & Adkins, T. 2023 An analytical form of the dispersion function for local linear gyrokinetics in a curved magnetic field. J. Plasma Phys. 89, 905890213.10.1017/S0022377823000077CrossRefGoogle Scholar
Jaeger, H. 2001 The “echo state” approach to analysing and training recurrent neural networks. GMD Report 148. German National Research Center for Information Technology.Google Scholar
Jalalvand, A., Kaptanoglu, A.A., Garcia, A.V., Nelson, A.O., Abbate, J., Austin, M.E., Verdoolaege, G., Brunton, S.L., Heidbrink, W.W. & Kolemen, E. 2022 Alfvén eigenmode classification based on ECE diagnostics at DIII-D using deep recurrrent neural networks. Nucl. Fusion 62, 026007.10.1088/1741-4326/ac3be7CrossRefGoogle Scholar
Jiang, J. & Lai, Y.-C. 2019 Model-free prediction of spatiotemporal dynamical systems with recurrent neural networks: role of network spectral radius. Phys. Rev. Res. 1, 033056.10.1103/PhysRevResearch.1.033056CrossRefGoogle Scholar
Jorge, R., Ricci, P. & Loureiro, N.F. 2017 A drift-kinetic analytical model for scrape-off layer plasma dynamics at arbitrary collisionality. J. Plasma Phys. 86, 905830606.10.1017/S002237781700085XCrossRefGoogle Scholar
Joyce, G., Knorr, G. & Meier, H.K. 1971 Numerical integration methods of the Vlasov equation. J. Comput. Phys. 8, 5363.10.1016/0021-9991(71)90034-9CrossRefGoogle Scholar
Kanekar, A., Schekochihin, A.A., Dorland, W. & Loureiro, N.F. 2015 Fluctuation-dissipation relations for a plasma-kinetic Langevin equation. J. Plasma Phys. 81, 305810104.10.1017/S0022377814000622CrossRefGoogle Scholar
Kerboua-Benlarbi, S., Nouailletas, R., Faugeras, B., Nardon, E. & Moreau, P. 2024 Magnetic control of WEST plasmas through deep reinforcement learning. IEEE Trans. Plasma Sci. 52, 3698.10.1109/TPS.2024.3377811CrossRefGoogle Scholar
Kim, P., et al. 2024 Optimization of nonlinear turbulence in stellarators. J. Plasma Phys. 90, 905900402.10.1017/S0022377824000369CrossRefGoogle Scholar
Kolmogorov, A.N. 1991 The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers. Proc. R. Soc. Lond. Ser. A 434, 913.Google Scholar
Kraichnan, R.H. 1959 The structure of isotropic turbulence at very high Reynolds numbers. J. Fluid Mech. 5, 497543.10.1017/S0022112059000362CrossRefGoogle Scholar
Loureiro, N.F., Dorland, W., Fazendeiro, L., Kanekar, A., Mallet, A., Vilelas, M.S. & Zocco, A. 2016 Viriato: a Fourier–Hermite spectral code for strongly magnetized fluid-kinetic plasma dynamics. Comput. Phys. Commun. 206, 4563.CrossRefGoogle Scholar
Loureiro, N.F., Schekochihin, A.A. & Zocco, A. 2013 Fast collisionless reconnection and electron heating in strongly magnetized plasmas. Phys. Rev. Lett. 111, 025002.10.1103/PhysRevLett.111.025002CrossRefGoogle ScholarPubMed
Lu, Z., Pathak, J., Hunt, B., Girvan, M., Brockett, R. & Ott, E. 2017 Reservoir observers: model-free infrerence of unmeasured variables in chaotic systems. Chaos 27, 041102.10.1063/1.4979665CrossRefGoogle ScholarPubMed
Lundberg, S.M. & Lee, S.-I. 2017 A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (ed. Guyon, I., Von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S. & Garnett, R.), vol. 30, pp. 47654774. Curran Associates, Inc.Google Scholar
Ma, C., Zhu, B., Xu, X.-Q. & Wang, W. 2020 Machine learning surrogate models for Landau fluid closure. Phys. Plasmas 27, 042502.CrossRefGoogle Scholar
Maass, W., Natschläger, T. & Makram, H. 2002 Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 14, 25312560.10.1162/089976602760407955CrossRefGoogle Scholar
Mandell, N.R., Dorland, W., Abel, I., Gaur, R., Kim, P., Martin, M. & Qian, T. 2024 GX: a GPU-native gyrokinetic turbulence code for tokamak and stellarator design. J. Plasma Phys. 90, 905900402.10.1017/S0022377824000631CrossRefGoogle Scholar
Mandell, N.R., Dorland, W. & Landreman, M. 2018 Laguerre-Hermite pseudo-spectral velocity formulation of gyrokinetics. J. Plasma Phys. 84, 905840108.10.1017/S0022377818000041CrossRefGoogle Scholar
Meyrand, R., Kanekar, A., Dorland, W. & Schekochihin, A.A. 2019 Fluidization of collisionless plasma turbulence. Proc. Natl. Acad. Sci. USA 116, 11851194.10.1073/pnas.1813913116CrossRefGoogle ScholarPubMed
Parker, J.T. & Dellar, P.J. 2015 Fourier–Hermite spectral representation for the Vlasov–Poisson system in the weakly collisional limit. J. Plasma Phys. 81, 305810203.10.1017/S0022377814001287CrossRefGoogle Scholar
Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. 2018 Model-free prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach. Phys. Rev. Lett. 120, 024102.10.1103/PhysRevLett.120.024102CrossRefGoogle Scholar
Piccione, A., Berkery, J.W., Sabbagh, S.A. & Andreopoulos, Y. 2020 Physics-guided machine learning approaches to predict the ideal stability properties of fusion plasmas. Nucl. Fusion 60, 046033.10.1088/1741-4326/ab7597CrossRefGoogle Scholar
Plunk, G.G., Cowley, S.C., Schekochihin, A.A. & Tatsuno, T. 2010 Two-dimensional gyrokinetic turbulence. J. Fluid Mech. 664, 407435.10.1017/S002211201000371XCrossRefGoogle Scholar
Pyle, R., Jovanovic, N., Subramanian, D., Palem, K.V. & Patel, A.B. 2021 Domain-driven models yield better predictions at lower cost than reservoir computers in Lorenz systems. Philos. Trans. R. Soc. A 379, 20200246.10.1098/rsta.2020.0246CrossRefGoogle ScholarPubMed
Qin, Y., Ma, J., Jiang, M., Dong, C., Fu, H., Cheng, W. & jin, Y. 2023 Data-driven modeling of Landau damping by physics-informed neural networks. Phys. Rev. Res. 5, 033079.10.1103/PhysRevResearch.5.033079CrossRefGoogle Scholar
Rea, C., Montes, K.J., Erickson, K.G., Granetz, R.S. & Tinguely, R.A. 2019 A real-time machine learning-based disruption predictor on DIII-D. Nucl. Fusion 59, 096016.10.1088/1741-4326/ab28bfCrossRefGoogle Scholar
Reynolds, O. 1895 On the dynamical theory of incompressible viscous fluids and the determination of the criterion. Philos. Trans. R. Soc. Lond. (A) 186, 123164.Google Scholar
Roberg-Clark, G.T., Plunk, G.G., Xanthopoulos, P., Nührenberg, C., Henneberg, S.A. & Smith, H.M. 2023 Critical gradient turbulence optimization toward a compact stellarator reactor concept. Phys. Rev. Res. 5, L032030.10.1103/PhysRevResearch.5.L032030CrossRefGoogle Scholar
Rodriguez-Fernandez, P., Howard, N.T. & Candy, J. 2022 Nonlinear gyrokinetic predictions of SPARC burning plasma profiles enabled by surrogate modeling. Nucl. Fusion 62, 076036.10.1088/1741-4326/ac64b2CrossRefGoogle Scholar
Seo, J., Kim, S., Jalalvand, A., Conlin, R., Rothstein, A., Abbate, J., Erickson, K., Wai, J., Shousha, R. & Kolemen, E. 2024 Avoiding fusion plasma tearing instability with deep reinforcement learning. Nature 626, 746751.CrossRefGoogle ScholarPubMed
Teaca, B., Jenko, F. & Told, D. 2017 Gyrokinetic turbulence: between idealized estimates and a detailed analysis of nonlinear energy transfers. New J. Phys. 19, 045001.10.1088/1367-2630/aa6998CrossRefGoogle Scholar
Virtanen, P., et al. 2020 SciPy 1.0: fundamental algorithms for scientific computing in python. Nat. Methods 17, 261272.10.1038/s41592-019-0686-2CrossRefGoogle ScholarPubMed
Vlachas, P.R., Pathak, J., Hunt, B.R., Sapsis, T.P., Girvan, M., Ott, E. & Koumoutsakos, P. 2020 Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics. Neural Netw. 126, 191217.10.1016/j.neunet.2020.02.016CrossRefGoogle ScholarPubMed
Vlasov, A.A. 1968 The vibrational properties of an electron gas. Sov. Phys. Usp. 10, 721.10.1070/PU1968v010n06ABEH003709CrossRefGoogle Scholar
Yoshida, M., et al. 2025 Transport and confinement physics: Chapter 2 of the special issue: on the path to tokamak burning plasma operation. Nucl. Fusion 65, 033001.Google Scholar
Zocco, A. & Schekochihin, A.A. 2011 Reduced fluid-kinetic equations for low-frequency dynamics, magnetic reconnection, and electron heating in low-beta plasmas. Phys. Plasmas 18, 102309.10.1063/1.3628639CrossRefGoogle Scholar
Figure 0

Figure 1. Diagram of the reservoir computing closure model, integrated with the kinetic moment solver.

Figure 1

Figure 2. High-velocity-resolution ($M=1025$) baseline time series of Hermite amplitudes for the driven, linearised Vlasov–Poisson system. Energy is injected as a density perturbation in the $m=0$ Hermite moment and advects to higher moments through linear Landau damping. No hypercollisional regularisation is applied to this example, and hundreds of Hermite moments are required to prevent numerical reflection at the high-$m$ boundary from impacting the low-$m$ spectrum.

Figure 2

Figure 3. Time-averaged Hermite spectra for the ML closure model compared with the high-velocity-resolution ($M=1025$) baseline, closure-by-truncation with three different coefficients ($\nu _m$) for hypercollisional regularisation and the theoretical $m^{-1/2}$ scaling. The ML closure model shows strong agreement with the baseline simulation, while no value of $\nu _m$ produces an accurate spectrum.

Figure 3

Figure 4. Comparison between baseline numerical solution, ML closure and theoretical damping rate of the Fourier–Hermite amplitude for an initial cosine density perturbation. The low-amplitude perturbation shows strong agreement with the theoretical damping rate. When augmented with the ML closure, the moment solver continues to capture the behaviour well at a lower Hermite resolution of $M=4$, as opposed to the $M=17$ baseline.

Figure 4

Figure 5. Hermite spectra of the high-velocity-resolution ($M=17$) and truncated ($M=4$) simulations and ML closure for the initial-value problem in figure 4. The spectra are averaged over time and Fourier wavenumber. The ML closure model permits a low-resolution simulation to accurately resolve the Hermite spectrum.

Figure 5

Figure 6. Fourier spectra of the simulations in figure 5. In this weakly nonlinear regime, the majority of the energy remains in the boxscale mode. Each reduced Hermite resolution model properly captures the rapid energy decay across $k$.

Figure 6

Figure 7. Simulated Fourier–Hermite amplitude for a high-initial-amplitude (18 % of background) cosine density perturbation. The dynamics exhibits strongly nonlinear behaviour, attenuating the damping of the mode. As in the low-amplitude case, the ML closure model captures well the frequency and amplitude of the wave.

Figure 7

Figure 8. Hermite spectra of the high-velocity-resolution ($M=65$) and truncated ($M=4$) simulations and ML closure for the initial-value problem in figure 7 averaged over $k$ and $t$. While both the ML closure model and truncated simulation agree with the high-resolution DNS, the ML closure model shows closer agreement.

Figure 8

Figure 9. Fourier spectra of the simulations in figure 8. The ML closure resolves the wavenumber spectrum more accurately than the truncated simulations, improving simulation accuracy at low resolution in velocity space.

Figure 9

Figure 10. Convergence study of the Hermite spectra for the strongly nonlinear initial-value problem from § 5.3. We solve (3.21)–(3.24) without the ML closure at increasing levels of resolution in velocity space. We selected $M=65$ moments for the high-resolution baseline case in § 5.3, as that case is the first to show convergence in the density and momentum moments.

Figure 10

Figure 11. Root-mean-square error in the low-order ($m \in [0,1,2,3]$) Hermite spectra at the resolutions plotted in figure 10, as compared with the $M=513$ case. The $M=17$ and $M=33$ cases have similar errors, and exponential convergence occurs afterward.

Figure 11

Figure 12. Time trace of the $m=1$ moment for the low-amplitude initial-value case in § 5.2.

Figure 12

Figure 13. Time trace of the $m=2$ moment for the low-amplitude initial-value case in § 5.2.

Figure 13

Figure 14. Time trace of the $m=1$ moment for the high-amplitude initial-value case in § 5.3.

Figure 14

Figure 15. Time trace of the $m=2$ moment for the high-amplitude initial-value case in § 5.3.

Figure 15

Table 1. Damping rates calculated from the initial-value simulations in figures 12–15. Here, $\epsilon$ is the amplitude of the initial cosine density perturbation and $m$ is the Hermite moment number. Damping rates were calculated by solving a linear regression problem for the natural logarithm of the local maxima in each time series.