Hostname: page-component-745bb68f8f-grxwn Total loading time: 0 Render date: 2025-01-09T08:59:02.520Z Has data issue: false hasContentIssue false

A deep operator network for Bayesian parameter identification of self-oscillators

Published online by Cambridge University Press:  25 November 2024

Tobias Sugandi
Affiliation:
CAPS Laboratory, Department of Mechanical and Process Engineering, ETH Zürich, Zürich, Switzerland
Bayu Dharmaputra*
Affiliation:
CAPS Laboratory, Department of Mechanical and Process Engineering, ETH Zürich, Zürich, Switzerland
Nicolas Noiray*
Affiliation:
CAPS Laboratory, Department of Mechanical and Process Engineering, ETH Zürich, Zürich, Switzerland
*
Corresponding authors: Bayu Dharmaputra and Nicolas Noiray; Emails: bayud@ethz.ch; noirayn@ethz.ch
Corresponding authors: Bayu Dharmaputra and Nicolas Noiray; Emails: bayud@ethz.ch; noirayn@ethz.ch

Abstract

Many physical systems exhibit limit-cycle oscillations that can typically be modeled as stochastically driven self-oscillators. In this work, we focus on a self-oscillator model where the nonlinearity is on the damping term. In various applications, it is crucial to determine the nonlinear damping term and the noise intensity of the driving force. This article presents a novel approach that employs a deep operator network (DeepONet) for parameter identification of self-oscillators. We build our work upon a system identification methodology based on the adjoint Fokker–Planck formulation, which is robust to the finite sampling interval effects. We employ DeepONet as a surrogate model for the operator that maps the first Kramers–Moyal (KM) coefficient to the first and second finite-time KM coefficients. The proposed approach can directly predict the finite-time KM coefficients, eliminating the intermediate computation of the solution field of the adjoint Fokker–Planck equation. Additionally, the differentiability of the neural network readily facilitates the use of gradient-based optimizers, further accelerating the identification process. The numerical experiments demonstrate that the proposed methodology can recover desired parameters with a significant reduction in time while maintaining an accuracy comparable to that of the classical finite-difference approach. The low computational time of the forward path enables Bayesian inference of the parameters. Metropolis-adjusted Langevin algorithm is employed to obtain the posterior distribution of the parameters. The proposed method is validated against numerical simulations and experimental data obtained from a linearly unstable turbulent combustor.

Type
Research Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Impact Statement

We demonstrate the applicability of the deep operator network (DeepONet) to identify the parameters of a stochastically forced van der Pol oscillator. The DeepONet is used as a surrogate for the first and second finite-time Kramers–Moyal coefficients, offering computational efficiency two orders of magnitude higher than the traditional finite-difference method. The efficiency of DeepONet allows for a Bayesian inference routine at low computational cost.

1. Introduction

Dynamical systems that exhibit limit cycle occur frequently in nature and technological applications. Limit cycles are observed in biology (Nomura et al., Reference Nomura, Sato, Doi, Segundo and Stiber1994; Rompala et al., Reference Rompala, Rand and Howland2007), quantum mechanical system (Yulin et al., Reference Yulin, Sedov, Kavokin and Shelykh2024), flow physics (Boujo and Cadot, Reference Boujo and Cadot2019; Bourquard et al., Reference Bourquard, Faure-Beaulieu and Noiray2021), aerodynamic fluttering (Hoskoti et al., Reference Hoskoti, Misra and Sucheendran2020), thermoacoustic oscillation in a sequential combustors (Dharmaputra et al., Reference Dharmaputra, Shcherbanev, Schuermans and Noiray2023), and so forth. Often, one can model the dynamics with a nonlinear harmonic oscillator. A natural choice to model such behavior is the Van der Pol (VdP) oscillator, which incorporates nonlinearity in the damping term.

When the parameters of the oscillator are known, simulating the system can be done in a straightforward manner. However, identifying these parameters from the measurement data is a rather challenging task. Identifying the parameters is beneficial for understanding the system itself, as well as for control applications. In many situations, classical system identification methods that involve applying an external input forcing to the system are not feasible. However, when an inherent stochastic forcing is present in the system, for instance the effect of flow turbulence in flutter phenomena, it can be exploited for parameter identification. In this work, we focus on identifying the parameters of the VdP oscillator subject to intense additive stochastic forcing. Such forcing enables the observer to analyze the system dynamics as it is continuously and randomly perturbed from its deterministic attractor, providing valuable insights about its nonlinear characteristics.

Various strategies to identify the unknown parameters of a stochastically forced harmonic oscillator from the experimental data are presented in Noiray and Schuermans (Reference Noiray and Schuermans2013). In the case where the system is linearly stable on the fixed-point branch, the linear damping rate $ \nu $ can be directly obtained by fitting a Lorentzian curve on the resonance peak of the measured power spectrum. In this work, we focus on the more interesting case where the system is operating on a limit cycle and nonlinear effects come into play. The system is modeled as a stochastically forced VdP oscillator with a small saturation constant (see Figure 1 for a sample time series). When the noise intensity level is low, one can infer the parameters by fitting the power spectral density of the amplitude fluctuations. A more general method uses the transitional statistics of the stationary acoustic pressure envelope to identify the linear growth rate $ \nu $ , the noise intensity $ \Gamma /2{\omega}^2 $ , and the saturation constant $ \kappa $ . It employs a careful extrapolation to estimate the drift and diffusion coefficients of the Fokker–Planck (FP) equation from the transition moments of the signal envelope. The method along with the postulated VdP oscillator as the underlying model is validated against experimental measurements in Noiray and Denisov (Reference Noiray and Denisov2017) for a thermoacoustic instability in combustor.

Figure 1. A realization of the stochastic Van der Pol oscillator with parameters $ \left(\nu = 2.5,\kappa = 1.6,{D}^{(2)}= 2.5\right) $ that models the local acoustic pressure fluctuation $ \eta (t) $ in a combustion chamber (red), and the corresponding envelope $ A(t) $ (black). The probability distributions for both the oscillation ( $ \eta (t) $ taking value $ \eta $ ) and the envelope ( $ A(t) $ taking value $ a $ ) are displayed on the left and right sides of the time signal, respectively, illustrating the statistical properties of the entire time signal.

A subsequent improvement where no extrapolation procedure needs to be involved is proposed in Boujo and Noiray (Reference Boujo and Noiray2017), and the method has been applied in Boujo et al. (Reference Boujo, Bourquard, Xiong and Noiray2020) for the whistling of a beer bottle and in Lee, Kim et al. (Reference Lee, Kim, Lee, Kim and Yi2023) for Hall thruster discharge instability. The proposed method is more robust to the finite-time effects that occur due to finite sampling rate, non-delta-correlated noise, and band-pass filtering necessary to isolate the mode of interest. The method relies on the fact that finite-time Kramers–Moyal (KM) coefficients can be obtained directly and exactly (up to numerical accuracy) by solving the adjoint FP (AFP) equation with well-defined initial condition (Honisch and Friedrich, Reference Honisch and Friedrich2011; Lade, Reference Lade2009). The equation is discretized by means of the finite-difference (FD) method. An initial guess of the parameters is given and then the equation is solved iteratively until the combination of the parameters gives an acceptable error between the finite-time KM coefficients from the FD solution and from the post-processed time signal data. The authors (Boujo and Noiray, Reference Boujo and Noiray2017) use the Nelder–Mead simplex algorithm (direct search method) to solve the optimization problem. A recent work in Lee, Kim, and Park (Reference Lee, Kim and Park2023) proposed the use of a gradient-based optimizer using FD to approximate the required gradient.

We aim to extend this AFP-based parameter identification framework by using the deep operator network (DeepONet) proposed in Lu, Jin et al. (Reference Lu, Jin, Pang, Zhang and Karniadakis2021). It is a scientific machine learning framework capable of mapping a function to another function. Contrary to the widely known physics-informed neural networks (Raissi et al., Reference Raissi, Perdikaris and Karniadakis2019) that act as the approximator to the solution of a PDE, DeepONet does not require retraining every time a parameter of the PDE is changed. This is because in addition to taking the coordinates of the output function as input, DeepONet also takes an input function that may characterize the boundary condition or the coefficients that parameterize the PDE. As a result, DeepONet has demonstrated successful efficiency improvements in various scientific and engineering applications (Lin et al., Reference Lin, Maxey, Li and Karniadakis2021; Lu et al., Reference Lu, Pestourie, Johnson and Romano2022; Mao et al., Reference Mao, Lu, Marxen, Zaki and Karniadakis2021; Yin et al., Reference Yin, Zhang, Yu and Karniadakis2022). In particular, Lu et al. (Reference Lu, Pestourie, Johnson and Romano2022) combined DeepONet with either a genetic algorithm or gradient-based optimizer to solve an inverse problem in nanoscale heat transport very efficiently.

In this work, we employ DeepONet as a surrogate for the operator that maps the parameters of the AFP equation to the first and second finite-time KM coefficients. The use of DeepONet as the forward simulator replaces multiple calls to the FD solver and avoids intermediate calculation of the whole solution field of the AFP equation. Once trained, DeepONet can run a forward simulation in a fraction of a second. Furthermore, owing to the automatic differentiation framework, exact gradient information is available such that gradient-based optimizers can be used to further accelerate the parameter identification process. On top of that, the extremely fast forward pass using DeepONet allows for a Markov Chain Monte Carlo (MCMC) algorithm to be performed where thousands of samples are generated to perform Bayesian inference. In this way, the uncertainties of the inferred parameters could be obtained.

This article is organized as follows. Section 2 reviews the stochastically forced harmonic oscillator and the associated FP formulations. Section 3 describes the working principle of DeepONet followed by the corresponding parameter identification framework in Section 4. Section 5 describes the Bayesian inference procedure using the gradient-based adaptive MCMC algorithm. Several results from the numerical experiments are then presented in Section 6 where the proposed methodology is evaluated to identify the parameters of synthetic and experimental time series. Finally, we conclude the findings of this article in Section 7.

2. Background theory

2.1. Harmonic oscillator

Let us start with the general stochastically forced harmonic oscillator with $ \eta $ as the state variable,

(2.1) $$ \frac{d^2\eta }{{d t}^2}+f\left(\eta, \frac{d\eta}{d t}\right)+{\omega}^2\eta =\xi (t), $$

where $ \xi (t) $ is a white noise with intensity $ \Gamma $ . By applying Hilbert transform on $ \eta (t) $ and taking the real part, that is, $ \eta (t)=\mathcal{\Re}\left(A(t){e}^{i\omega t+\phi (t)}\right) $ , the following relations hold:

(2.2) $$ \eta =A\cos \left(\omega t+\phi \right) $$
(2.3) $$ \frac{d\eta}{d t}=- A\omega \sin \left(\omega t+\phi \right) $$

$ A(t) $ and $ \phi (t) $ are assumed to vary slower than the $ \cos \left(\omega t\right) $ term (see Figure 1). Consequently, $ \frac{d^2\eta }{dt^2} $ in the left-hand side of Eq. (2.1) can be expressed as

(2.4) $$ \frac{d^2\eta }{{d t}^2}=-\frac{d A}{d t}\omega \sin \left(\omega \hskip0.1em t+\phi \right)- A\omega \cos \left(\omega \hskip0.1em t+\phi \right)\left[\omega +\frac{d\phi}{d t}\right]. $$

Let $ \Phi =\omega t+\phi $ , by employing the trigonometric identity: $ {\sin}^2\left(\Phi \right)+{\cos}^2\left(\Phi \right)=1 $ , Eq. (2.1) can then be rewritten as

(2.5) $$ {\displaystyle \begin{array}{c}-\frac{d A}{d t}\omega \sin \left(\Phi \right)- A\omega \cos \left(\Phi \right)\frac{d\phi}{d t}=-f\left(A\cos \left(\Phi \right),- A\omega \sin \left(\Phi \right)\right)\left({\sin}^2\left(\Phi \right)+{\cos}^2\left(\Phi \right)\right)\\ {}\hskip1.8em +\xi (t)\left({\sin}^2\left(\Phi \right)+{\cos}^2\left(\Phi \right)\right).\end{array}} $$

Using the orthogonality between $ \sin \left(\Phi \right) $ and $ \cos \left(\Phi \right) $ , and grouping terms that depend on $ \sin \left(\Phi \right) $ and $ \cos \left(\Phi \right) $ separately, the first-order equations for $ A(t) $ and $ \phi (t) $ can be retrieved and expressed as

(2.6) $$ \frac{dA}{dt}=\underset{F}{\underbrace{\frac{\sin \left(\Phi \right)}{\omega }}}\left[\hskip0.27em f\left(A\cos \left(\Phi \right),- A\omega \sin \left(\Phi \right)\right)-\xi (t)\right], $$
(2.7) $$ \frac{d\phi}{d t}=\underset{G}{\underbrace{\frac{\cos \left(\Phi \right)}{A(t)\omega }}}\left[\hskip0.28em f\left(A\cos \left(\Phi \right),- A\omega \sin \left(\Phi \right)\right)-\xi (t)\right]. $$

We now employ the nonlinearities associated with the VdP oscillator,

(2.8) $$ \hskip-11em f\left(\eta, \frac{d\eta}{d t}\right)=\left({\kappa \eta}^2-2\nu \right)\frac{d\eta}{d t} $$
(2.9) $$ \hskip6em =\left[2\nu -\kappa {A}^2{\cos}^2\left(\Phi \right)\right] A\omega \sin \left(\Phi \right) $$

It is worth mentioning that the method, which is validated in this article with this type of nonlinearity, can be applied to more complex saturable nonlinear damping.

Focusing on the dynamics of $ A(t) $ , we apply deterministic averaging to the $ Ff $ term in Eq. (2.6), that is,

(2.10) $$ {\displaystyle \begin{array}{c}\left\langle Ff\right\rangle =\frac{1}{2\pi }{\int}_0^{2\pi } Ffd\Phi \\ {}=\frac{1}{2\pi }{\int}_0^{2\pi }2\nu A{\sin}^2\left(\Phi \right)-\kappa {A}^3{\cos}^2\left(\Phi \right){\sin}^2\left(\Phi \right)d\Phi \\ {}=\frac{1}{2\pi }{\int}_0^{2\pi}\nu A\left(1-\cos \left(2\Phi \right)\right)-\frac{\kappa {A}^3}{8}\left(1-\cos \left(4\Phi \right)\right)d\Phi \\ {}=A\left(\nu -\frac{\kappa }{8}{A}^2\right).\end{array}} $$

Because $ F $ and $ G $ both depend on the dynamical variables $ A $ and $ \Phi $ , an additional “drift,” μ, term appears in the averaged dynamics. This term can be obtained by performing stochastic averaging, as described in Roberts and Spanos (Reference Roberts and Spanos1986) and Stratonovich (Reference Stratonovich1963). The expression for $ \mu $ is as follows:

(2.11)

The noise term $ \xi (t) $ is also averaged, which then induces a new noise variance $ {\sigma}^2 $ ,

(2.12) $$ {\displaystyle \begin{array}{c}{\sigma}^2=\frac{1}{2\pi }{\int}_0^{2\pi }{\int}_{-\infty}^{\infty}\frac{\sin^2\left(\Phi \right)}{\omega^2}\unicode{x1D53C}\left[\xi (t)\xi \left(t+\tau \right)\right] d\tau d\Phi \\ {}=\frac{\Gamma}{2{\omega}^2}.\end{array}} $$

Finally, the Langevin equation describing the dynamics of the amplitude can be cast to

(2.13) $$ {\displaystyle \begin{array}{c}\frac{dA}{dt}=\left\langle Ff\right\rangle +\mu +\zeta (t)\\ {}=A\left(\nu -\frac{\kappa }{8}{A}^2\right)+\frac{\Gamma}{4{\omega}^2A}+\zeta (t).\end{array}} $$

where $ \zeta $ is a delta-correlated white noise satisfying $ \left\langle \zeta (t)\zeta \left(t+\tau \right)\right\rangle =\frac{\Gamma}{2{\omega}^2}\hskip0.1em \delta \left(\tau \right) $ such that the Markov property holds for the amplitude process $ A(t) $ .

2.2. The FP equation

The dynamics of the amplitude (Eq. 2.13) can be equivalently described by the evolution of the corresponding probability density function (PDF) governed by the FP equation. The connection is provided through the KM coefficients defined as

(2.14) $$ {D}^{(n)}(a)=\underset{\tau \to 0}{\lim}\frac{1}{n!\tau}\left\langle {\left[A\left(t+\tau \right)-A(t)\right]}^n\;|\hskip0.1em A(t)=a\right\rangle . $$

Inserting the diffusion process (Eq. 2.13) into the above equation results in

(2.15) $$ {D}^{(1)}(a)=\nu a-\frac{\kappa }{8}{a}^3+\frac{\Gamma}{4{\omega}^2a}, $$
(2.16) $$ {D}^{(2)}(a)=\frac{\Gamma}{4{\omega}^2}, $$

and $ {D}^{(4)}=0 $ . Consequently, according to the Pawula theorem (Pawula, Reference Pawula1967), all other coefficients $ {D}^{(n)} $ with $ n\ge 3 $ to vanish. The evolution of the PDF of the acoustic pressure envelope $ P\left(a,t\right) $ is then exactly described by the following FP equation (Risken, Reference Risken1996),

(2.17) $$ \frac{\partial }{\partial t}P\left(a,t\right)=\hat{L}(a)\hskip0.1em P\left(a,t\right), $$

where the FP operator is

(2.18) $$ \hat{L}(a)=-\frac{\partial }{\partial a}{D}^{(1)}(a)+\frac{\partial^2}{\partial {a}^2}{D}^{(2)}(a). $$

A stationary PDF can be obtained as the long time solution of the above equation, that is,

(2.19) $$ {P}_{\infty }(a)=\frac{1}{\mathcal{Z}}\exp \left(\frac{1}{D^{(2)}}{\int}_0^a{D}^{(1)}\left({a}^{\prime}\right)d{a}^{\prime}\right), $$

where $ \mathcal{Z} $ is a normalization coefficient.

2.3. The AFP equation

Parametric system identification can be performed by matching the statistics of the time signal with the transition PDF $ P\left({a}^{\prime },t+\tau\;|\hskip0.1em a,t\right) $ calculated by solving the FP equation (Eq. 2.17). However, this procedure may invoke numerical issues since it involves a Dirac delta initial condition (Honisch and Friedrich, Reference Honisch and Friedrich2011).

The work of Lade (Reference Lade2009) provides an elegant alternative procedure based on the AFP formalism. Starting with the finite-time KM coefficient,

(2.20) $$ {D}_{\tau}^{(n)}\left(a,\tau \right)=\frac{1}{n!\tau }{\int}_0^{\infty }{\left({a}^{\prime }-a\right)}^n\hskip0.1em P\left({a}^{\prime },t+\tau\;|\hskip0.1em a,t\right)\hskip0.1em d{a}^{\prime }, $$

the idea is to interpret the transition PDF as the solution of the Fokker Planck equation with initial condition $ \delta \left({a}^{\prime }-a\right) $ ,

(2.21) $$ {\displaystyle \begin{array}{c}{D}_{\tau}^{(n)}\left(a,\tau \right)=\frac{1}{n!\tau }{\int}_0^{\infty }{\left({a}^{\prime }-a\right)}^n\;{e}^{\hat{L}\left({a}^{\prime}\right)\tau}\hskip0.1em \delta \left({a}^{\prime }-a\right)\hskip0.1em d{a}^{\prime}\\ {}=\frac{1}{n!\tau }{\int}_0^{\infty }{e}^{{\hat{L}}^{\dagger}\left({a}^{\prime}\right)\tau}\hskip0.1em {\left({a}^{\prime }-a\right)}^n\;\delta \left({a}^{\prime }-a\right)\hskip0.1em d{a}^{\prime}\\ {}{\left.=\frac{1}{n!\tau}\;{e}^{{\hat{L}}^{\dagger}\left({a}^{\prime}\right)\tau}\hskip0.1em {\left({a}^{\prime }-a\right)}^n\right|}_{a^{\prime }=a},\end{array}} $$

where $ {\hat{L}}^{\dagger } $ is the AFP operator,

(2.22) $$ {\hat{L}}^{\dagger }(a)={D}^{(1)}(a)\frac{\partial }{\partial a}+{D}^{(2)}(a)\frac{\partial^2}{\partial {a}^2}. $$

Equation (2.21) implies that the finite-time KM coefficients $ {D}_{\tau}^{(n)} $ can be obtained by solving the AFP equation,

(2.23) $$ \frac{\partial {P}_{n,a}^{\dagger}\left({a}^{\prime },t\right)}{\partial t}={\hat{L}}^{\dagger}\left({a}^{\prime}\right){P}_{n,a}^{\dagger}\left({a}^{\prime },t\right), $$

with initial condition

(2.24) $$ {P}_{n,a}^{\dagger}\left({a}^{\prime },0\right)={\left({a}^{\prime }-a\right)}^n, $$

and evaluating at $ t=\tau $ and $ {a}^{\prime }=a $ ,

(2.25) $$ {D}_{\tau}^{(n)}\left(a,\tau \right)=\frac{1}{n!\tau }{P}_{n,a}^{\dagger}\left(a,\tau \right). $$

Therefore, given a pressure signal envelope, we can perform parameter identification by minimizing the difference between the finite-time coefficients, $ {D}_{\tau}^{(n)} $ , obtained from the solution of the AFP equation, and the finite-time coefficients obtained by estimating the conditional average of the time signal using Eq. (2.20). Compared to the extrapolation-based procedure for parameter identification (Noiray and Denisov, Reference Noiray and Denisov2017; Siegert et al., Reference Siegert, Friedrich and Peinke1998), the AFP-based approach is robust to error due to the finite-time effects since it does not involve taking the limit $ \tau \to 0 $ (see Boujo and Noiray (Reference Boujo and Noiray2017) and Honisch and Friedrich (Reference Honisch and Friedrich2011) for elaborations).

The solution of the AFP equation can be obtained by employing a numerical time-stepping scheme and using FD to approximate the spatial derivatives. We illustrate in Figure 2 the simulation results of the AFP equation and how the finite-time KM coefficients can be extracted from the solution. It is worth highlighting that for a particular value of the amplitude $ A=a $ , we have a distinct initial condition $ {\left({a}^{\prime }-a\right)}^n $ for the AFP simulation over the $ \left({a}^{\prime },t\right) $ domain to extract the nth finite-time KM coefficient. For fixed parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , the simulations must be performed $ 2{N}_a $ times, for both KM coefficients and for all amplitudes present in the signal. We then only extract the solution along the line $ {a}^{\prime }=a $ to calculate the finite-time KM coefficients. In the next section, we will present a DeepONet that can directly map the input parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ to the desired outputs $ {D}_{\tau}^{(1)}\left(a,\tau \right) $ and $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ without the need to calculate the entire solution field of the AFP equation multiple times.

Figure 2. Illustration of the calculation of the first (left) and second (right) finite-time KM coefficients from the solution of the AFP equation for $ a= 5.5,\nu = 6.5,\kappa = 1.6,{D}^{(2)}= 2.5 $ . The finite-time KM coefficients are obtained by extracting the solution $ {P}_{n,a}^{\dagger}\left({a}^{\prime },\tau \right) $ at $ {a}^{\prime }=a $ for several values of $ \tau $ .

3. DeepONet

While it is well known that neural networks with as few as a single hidden layer are universal function approximators, there is a similar theorem that proves that neural networks are universal operator approximators (Chen and Chen, Reference Chen and Chen1995). Inspired by the structure given in the universal approximation theorem for nonlinear operators, Lu, Jin et al., Reference Lu, Jin, Pang, Zhang and Karniadakis2021, propose a neural network architecture termed as the DeepONet that maps a function to another function. In this work, we propose a DeepONet architecture (Figure 3) suited for efficient parameter identification based on the AFP formulation presented in the previous section.

Figure 3. A DeepONet architecture designed for the parameter identification based on the AFP formalism. It approximates the operator that maps the first Kramers–Moyal (KM) coefficient $ {D}^{(1)}(a) $ to both the first $ {D}_{\tau}^{(1)}\left(a,\tau \right) $ and second $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ finite-time KM coefficients. Although only the first KM coefficient $ {D}^{(1)}(a) $ appears as the input to the branch network, the second KM coefficient $ {D}^{(2)} $ implicitly influences the forward calculation process through $ {D}^{(1)}(a) $ according to Eq. (2.15).

The proposed operator network is designed as a surrogate model for the operator $ \mathcal{G} $ that maps the first KM coefficient, $ {D}^{(1)}(a) $ , to the finite-time KM coefficients, $ {D}_{\tau}^{(n)}\left(a,\tau \right) $ , that is,

(3.1) $$ {D}_{\tau}^{(n)}\left(a,\tau \right)=\mathcal{G}\left({D}^{(1)}\right)\left(a,\tau \right),\hskip1em n=1,2. $$

The motivation behind this architecture originates from the formulations given in Eqs. (2.22)–(2.25), where finite-time KM coefficients can be extracted from the AFP solution given the drift and diffusion coefficients. Since we only consider additive noise, that is, a constant diffusion coefficient $ {D}^{(2)} $ , we can simply take the drift coefficient, $ {D}^{(1)}(a) $ , to represent the input. As can be seen in Eq. (2.15), the function $ {D}^{(1)}(a) $ is completely characterized by the parameters $ \nu, \kappa $ , and $ {D}^{(2)} $ . Therefore, once the network is trained, given a set of parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , we can directly predict $ {D}_{\tau}^{(1)} $ and $ {D}_{\tau}^{(2)} $ at specified values of $ \left(a,\tau \right) $ within a fraction of a second avoiding multiple simulations of the AFP equation.

The DeepONet architecture presented in Figure 3 consists of a branch network and two trunk networks. The trunk networks serve to encode the domain $ \left(a,\tau \right) $ of the output functions and generate basis functions. The first trunk network provides basis functions for $ {D}_{\tau}^{(1)} $ and the second trunk network provides basis functions for $ {D}_{\tau}^{(2)} $ . On the other hand, the branch network takes discrete representations of the input function and processes them to generate coefficients for the basis functions provided by the trunk networks. The output functions are then obtained by taking the inner product between the output of the branch and trunk networks as follows,

(3.2) $$ {D}_{\tau, \theta}^{(n)}\left(a,\tau \right)=\sum \limits_{k=1}^p{b}_k^{(n)}{t}_k^{(n)}+{b}_0^{(n)},\hskip1em n=1,2, $$

where $ {D}_{\tau, \theta}^{(n)} $ is the DeepONet prediction of the nth finite-time KM coefficient; $ {b}_k^{(n)} $ and $ {t}_k^{(n)} $ are the outputs of the branch and trunk networks, respectively; $ {b}_0^{(n)} $ is a bias term; and $ p $ is the total number of neurons in the output layer of a trunk network.

In this work, we employ fully connected neural network architecture for the branch and trunk networks. We use the hyperbolic tangent activation function and Glorot normal initialization. All branch and trunk networks have 5 hidden layers with 128 number of neurons per layer. As we have to work with discrete objects, we evaluate the input of the branch network, $ {D}^{(1)}(a) $ , at $ 21 $ fixed locations $ a=\left(0.5,0.6,\dots, \mathrm{2.5}\right) $ . Each trunk network takes 2 values (the coordinate $ \left(a,\tau \right) $ ) as input and outputs a layer containing 128 neurons. The branch network outputs a layer with 128 neurons, which is then connected via a fully connected linear connection to a second output layer, also consisting of 128 neurons. This configuration is intended to provide enhanced expressivity for predicting the second finite-time KM coefficients (see Figure 5). We base our implementation on the scientific machine learning library DeepXDE (Lu, Meng et al., Reference Lu, Meng, Mao and Karniadakis2021) with PyTorch backend.

3.1. Training

We train the network by providing pairs of input output functions $ \left[{D}^{(1)};{D}_{\tau}^{(1)},{D}_{\tau}^{(2)}\right] $ as training data. For a given set of parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , we can perform FD simulations of the AFP equation to obtain the finite-time KM coefficients $ {D}_{\tau}^{(1)}\left(a,\tau \right) $ and $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ . We record the solutions at several values of $ a $ that occur with probability larger than 0.1% according to the theoretical stationary PDF (Eq. 2.19) with the spacing $ da=0.05 $ and we select 50 values of $ \tau $ uniformly distributed in $ \left[\mathrm{0.01,0.5}\right] $ . The training data comprise input and output samples from 2900 different sets of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . The values of $ \nu, \kappa $ , and $ {D}^{(2)} $ range from $ -20 $ to $ 20 $ , $ 0.5 $ to $ 5.25 $ , and $ 1 $ to $ 20 $ , respectively.

Each AFP equation is solved over the following domain $ \left[{a}_{\mathrm{min}}^{\prime },{a}_{\mathrm{max}}^{\prime}\right]\times \left[{\tau}_{\mathrm{min}},{\tau}_{\mathrm{max}}\right] $ = $ \left[ da,\mathrm{1.5}{a}_{\mathrm{max}}\right]\times \left[\mathrm{0,0.5}\right] $ , where $ {a}_{\mathrm{max}} $ is the maximum amplitude selected for the corresponding $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . The grid spacing used is $ da=0.05 $ , and the time step is chosen according to $ dt=\frac{1}{2}\frac{(da)^2}{D^{(2)}} $ . We use the second-order accurate central difference schemes for the first and second spatial derivatives, while the time-stepping is performed according to the Crank–Nicolson method. For the left and right boundaries, we apply second-order one-sided FDs.

Before training, input–output examples are standardized to have zero mean and unit variance. This is a standard practice in deep learning to avoid the vanishing or exploding gradient problem and to help neural networks converge faster during training. We train the proposed operator network for $ {10}^4 $ iterations using Adam with a learning rate of $ 0.001 $ that decays to 0.98 of its value every $ {10}^3 $ iterations. The training continues for $ 9\times {10}^4 $ iterations with LBFGS. We use the batch size of $ {2}^{16} $ for each iteration. Computations are carried out using double-precision floating point number. The DeepONet training process was completed in 16.3 hours on a system equipped with an NVIDIA Quadro RTX 4000 graphics card.

For the first $ 1.5\times {10}^4 $ iterations, we use the standard mean squared error as the loss function,

(3.3) $$ \mathrm{\mathcal{L}}=\frac{1}{2}\left({\mathrm{\mathcal{L}}}_{MSE}^{(1)}+{\mathrm{\mathcal{L}}}_{MSE}^{(2)}\right), $$

where

(3.4) $$ {\mathrm{\mathcal{L}}}_{MSE}^{(n)}=\frac{1}{N_b}\sum \limits_{i=1}^{N_b}{\left({D}_{\tau}^{(n)}\left({a}_i,{\tau}_i\right)-{D}_{\tau, \theta}^{(n)}\Big({a}_i,{\tau}_i\Big)\right)}^2. $$

$ {D}_{\tau}^{(n)} $ and $ {D}_{\tau, \theta}^{(n)} $ correspond to the target and DeepONet-predicted values, respectively, and $ {N}_b $ denotes the batch size. For the rest of the training, we use the following loss function to balance the losses of the first and second finite-time KM coefficients,

(3.5) $$ \mathrm{\mathcal{L}}=\frac{1}{2}\left({\mathrm{\mathcal{L}}}_{MSE}^{(1)}+{\mathrm{\mathcal{L}}}_{MSE}^{(2)}\frac{{\mathrm{\mathcal{L}}}_{MSE}^{(2)}}{{\mathrm{\mathcal{L}}}_{MSE}^{(1)}}\right). $$

The solution field of the second finite-time KM coefficient $ {D}_{\tau}^{(2)}\left(a,t\right) $ generally has more curvature compared to that of the first finite-time KM coefficient (see Figure 5), and thus is harder to train. This motivates the use of separate trunk networks for the first and second finite-time KM coefficients and the modified loss function (Eq. 3.5) in the final stage of the training.

4. Parameter identification framework

In this section, we will outline the system identification procedure to recover from an acoustic time series three parameters of interest, $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , that describe the amplitude dynamics, Eq. (2.13). We first present how to calculate the finite-time KM coefficients from the statistics of the time series. We then proceed with the parameter identification algorithm.

4.1. Signal processing

As illustrated in Figure 4, finite-time KM coefficients can be estimated from the time series of the acoustic signals in the following manner:

Step 1. Apply band-pass filter to the acoustic signal $ \eta (t) $ around the frequency $ {f}_0 $ of interest, that is, $ \left[{f}_0-\Delta f/2,{f}_0+\Delta f/2\right] $ , where $ \Delta f $ is the bandwidth.

Step 2. Apply Hilbert transform to obtain the acoustic signal envelope $ A(t) $ and estimate the stationary PDF $ P(a) $ by histogram binning.

Step 3. Estimate the joint PDF $ P\left({a}^{\prime },t+\tau; a,t\right) $ of the signal envelope $ A(t) $ and the time-shifted envelope $ A\left(t+\tau \right) $ by histogram binning.

Step 4. Calculate the transitional PDF by normalizing the joint PDF with $ P(a) $

(4.1) $$ P\left({a}^{\prime },t+\tau\;|\;a,t\right)=\frac{P\left({a}^{\prime },t+\tau; a,t\right)}{P(a)}. $$

Step 5. The finite-time KM coefficients can then be readily calculated by numerical quadrature according to Eq. (2.20).

Figure 4. This set of diagrams visualizes the method for extracting finite-time KM coefficients by processing an acoustic pressure time series. (a) The envelope $ A(t) $ and the time-shifted envelope $ A\left(t+\tau \right) $ . (b) The stationary PDF of the envelope $ P\left(A=a\right) $ obtained by histogram binning. (c) The joint PDF of the envelope and the time-shifted envelope also obtained by histogram binning. (d) The conditional (transition) PDF $ P\left({a}^{\prime },t+\tau |a,t\right) $ obtained by normalizing the joint PDF with $ P(A) $ . (e) Transition moments calculated by numerically integrating $ {\left({a}^{\prime }-a\right)}^nP\left({a}^{\prime },t+\tau |a,t\right) $ . Finite-time KM coefficients can then be obtained by normalizing the moments with $ n!\tau $ .

The finite-time KM coefficients, $ {D}_{\tau}^{(n)}\left(a,\tau \right) $ , need to be calculated for a selection of amplitudes and time shifts. We select several values of $ a $ that occurs in the signal with probability larger than 0.1% with the spacing $ da=0.05 $ . We distribute 10 values of finite-time shifts uniformly between $ \max \left\{{F}_s^{-1},\Delta {f}^{-1},{\left({\omega}_0/2\pi \right)}^{-1}\right\} $ and $ 2{\tau}_A $ , where $ {\tau}_A $ is the time shift where the autocorrelation of $ A(t) $ reduces to 25% of its initial value. We then take the last nine values of $ \tau $ since typically the finite-time effects still significantly affect the estimated finite-time KM coefficients at the first $ \tau $ .

4.2. Parameter identification procedure

Having the estimated finite-time KM coefficients from the processed time signal, the latent parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ can be obtained by an iterative optimization procedure that minimizes the following loss function,

(4.2) $$ \mathrm{\mathcal{E}}=\frac{1}{2N}\sum \limits_{n=1}^2\sum \limits_{i=1}^N{\left({\hat{D}}_{\tau}^{(n)}\left({a}_i,{\tau}_i\right)-{D}_{\tau, \theta}^{(n)}\Big({a}_i,{\tau}_i;\nu, \kappa, {D}^{(2)}\Big)\right)}^2\times \tilde{P}\left({a}_i\right). $$

Here, we denote by $ {\hat{D}}_{\tau}^{(n)} $ the measured finite-time coefficients from time signal data and $ {D}_{\tau, \theta}^{(n)} $ the DeepONet prediction that corresponds to the guess value of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . Both $ {\hat{D}}_{\tau}^{(n)} $ and $ {D}_{\tau, \theta}^{(n)} $ have been standardized according to the normalizers used for the DeepONet training data. This helps to ensure that the first and second finite-time KM coefficients have similar magnitude. In the above expression, $ N={N}_a\times {N}_{\tau } $ , where $ {N}_a $ and $ {N}_{\tau } $ are the number of discrete envelopes and time-shifts extracted from the time signal, respectively. To take into account the statistical significance of different amplitudes in the time signal data, we include the weight $ \tilde{P}\left({a}_i\right)=p\left({a}_i\right){N}_a $ into the loss function, where $ p(a) $ is the probability mass function of the discretized amplitude.

Given a processed time signal data, the AFP-based system identification procedure is outlined as follows:

Step 1. Provide an initial guess to the parameters $ \nu, \kappa $ , and $ {D}^{(2)} $ .

Step 2. Predict $ {D}_{\tau, \theta}^{(1)}\left(a,\tau \right) $ and $ {D}_{\tau, \theta}^{(2)}\left(a,\tau \right) $ using DeepONet according to the parameters provided.

Step 3. Evaluate the loss function given in Eq. (4.2).

Step 4. Update $ \nu, \kappa, {D}^{(2)} $ iteratively using a selected optimizer.

Step 5. If after optimization, the loss function is still larger than $ {10}^{-2} $ , return to step 1 to start with another initial guess.

A large selection of optimizers and their implementations exist in scientific computing libraries. In this work, we select LBFGS (Liu and Nocedal, Reference Liu and Nocedal1989), a gradient-based optimizer whose implementation is readily available in PyTorch. The exact gradient between the loss function and the parameters is available since the computations in modern deep learning frameworks can be tracked through a computational graph. The automatic differentiation framework (also known as backpropagation) calculates the chain rule from the loss function to the output of the DeepONet, from the output of the DeepONet to the input of the branch network, and from the input of the branch network to the parameters $ \nu, \kappa $ and $ {D}^{(2)} $ .

A widely known gradient-free Nelder–Mead simplex algorithm (Nelder and Mead, Reference Nelder and Mead1965) can also be used as an alternative. It is used in Boujo and Noiray (Reference Boujo and Noiray2017) as the optimizer for the FD based parameter identification. In Section 6, we also provide the results by combining DeepONet with the Nelder–Mead optimizer (with implementation provided in SciPy) to facilitate comparison between FD and DeepONet approaches for solving the inverse problem.

5. Bayesian inference/MCMC

In this section, we will describe a Bayesian inference procedure to estimate the posterior distribution of the parameters $ \vartheta =\left\{\nu, \kappa, {D}^{(2)}\right\} $ given a specified prior and likelihood distribution. According to the Bayes rule, the posterior distribution is given by

(5.1) $$ P\left(\vartheta |\mathcal{D}\right)=\frac{P\left(\mathcal{D}|\vartheta \right)P\left(\vartheta \right)}{\int P\left(\mathcal{D}|\vartheta \right)P\left(\vartheta \right) d\vartheta}, $$

where $ P\left(\vartheta \right) $ refers to the prior distribution and $ P\left(\mathcal{D}|\vartheta \right) $ refers to the likelihood distribution. $ \mathcal{D} $ denotes a set of $ N $ observations, $ \mathcal{D}={\left\{a,\tau, {\hat{D}}_{\tau}^{(1)},{\hat{D}}_{\tau}^{(2)}\right\}}_{1:N} $ , that is available about the system of interest.

As the denominator in Eq. (5.1) is often intractable, we estimate the posterior distribution over the parameters by employing an MCMC algorithm that only needs pointwise evaluations of the unnormalized target distribution. The algorithm aims to construct a Markov chain whose stationary distribution converges to the target distribution (Eq. 5.1). In each step, a set of new parameters $ {\vartheta}^{\prime } $ is proposed according to a certain proposal distribution $ {q}_L\left({\vartheta}^{\prime }|\vartheta \right) $ . The proposed parameters are then evaluated according to the Metropolis–Hastings (M–H) acceptance criteria,

(5.2) $$ \alpha \left(\vartheta, {\vartheta}^{\prime}\right)=\min \left\{1,\frac{P\left({\vartheta}^{\prime }|\mathcal{D}\right)\hskip0.1em {q}_L\left(\vartheta |{\vartheta}^{\prime}\right)}{P\left(\vartheta |\mathcal{D}\right)\hskip0.1em {q}_L\left({\vartheta}^{\prime }|\vartheta \right)}\right\}, $$

where the proposal will be accepted with probability $ \alpha $ . We then set $ {\vartheta}_{t+1}={\vartheta}^{\prime } $ if accepted, and $ {\vartheta}_{t+1}={\vartheta}_t $ if rejected.

In this work, we employ the Metropolis-adjusted Langevin algorithm (MALA) with a gradient-based adjustment to its covariance matrix. The proposal distribution of the MALA algorithm takes the following form,

(5.3) $$ {q}_L\left({\vartheta}^{\prime }|\vartheta \right)=\mathcal{N}\left({\vartheta}^{\prime }|\;\vartheta +\frac{1}{2}{LL}^T\hskip0.1em {\nabla}_{\vartheta}\log P\left(\vartheta |\mathcal{D}\right),{LL}^T\right). $$

It proposes a new state according to the Gaussian distribution where the mean encourages a step toward an increase in posterior value and with a covariance matrix $ {LL}^T $ .

In a standard MALA, the covariance matrix is isotropic, that is, $ {LL}^T={\sigma}^2I $ with constant variance $ {\sigma}^2 $ . In the proposed approach, the covariance matrix $ {LL}^T $ is adaptively updated according to the fast variant of the gradient-based adaptive MALA algorithm, termed gadMALAf originally devised in Titsias and Dellaportas (Reference Titsias and Dellaportas2019). The key idea is to adjust the proposal distribution to maximize the generalized speed measure given by,

(5.4) $$ {s}_L\left(\vartheta \right)=\exp \left\{\beta {\mathrm{\mathscr{H}}}_{q_L\left({\vartheta}^{\prime }|\vartheta \right)}\right\}\times \int \alpha \left(\vartheta, {\vartheta}^{\prime };L\right){q}_L\left({\vartheta}^{\prime }|\vartheta \right)d{\vartheta}^{\prime }. $$

Here, the notion of speed refers to how fast the Markov chain converges to its stationary distribution, that is, the target posterior. In the above equation, $ \mathrm{\mathscr{H}}{q}_L $ is the entropy of the proposal distribution and $ \beta $ is a hyperparameter. Intuitively, this objective function encourages high variance (large step) and high acceptance rate simultaneously.

During the burn in period, we update the Cholesky factor of the proposal covariance matrix $ L $ according to

(5.5) $$ L\leftarrow L+{\rho}_t{\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t,{\unicode{x025B}}_t\right), $$

where $ {\mathrm{\mathcal{F}}}_L $ is the lower bound of $ {s}_L $ and $ {\rho}_t $ is the learning rate. The algorithm is particularly efficient because the adaptation still occurs even when the candidate states are rejected (Titsias and Dellaportas, Reference Titsias and Dellaportas2019). Further details on the derivation of the gadMALAf algorithm are provided in Appendix A. It will be demonstrated through the results presented in the next section that careful construction of the covariance matrix of the proposal distribution is necessary to mitigate the slow mixing of the MCMC samples due to highly correlated parameters.

6. Numerical investigations

6.1. Synthetic data

We generate synthetic time signals using Simulink by simulating the stochastic VdP oscillator (Eqs. 2.1 and 2.8) for 500 seconds. We use the sampling rate $ {F}_s=10 $ kHz, frequency $ {\omega}_0/2\pi =150 $ Hz, and bandwidth $ \Delta f=100 $ Hz. Hundred synthetic signals are generated, for which the oscillator parameters $ \nu, \kappa $ and $ {D}^{(2)} $ are uniformly distributed across the ranges $ \left[\mathrm{2.5,18.5}\right] $ , $ \left[\mathrm{1.6,4.6}\right] $ , and $ \left[\mathrm{2.5,18.5}\right] $ , respectively.

It is worth noting that the test data are different from the training data generated by simulating the AFP equation. First, although the values of $ \nu, \kappa $ and $ {D}^{(2)} $ in the test data are within the support of the training data, the values have not been seen before by the DeepONet. Furthermore, the finite-time KM coefficients calculated from the processed time signals are corrupted by errors due to band-pass filtering, the finite sampling rate, and the finite sample size of the signals.

Figure 5 visually demonstrates how the proposed DeepONet tries to predict the parameters that yield finite-time KM coefficients that match those obtained from the processed time-series data. The initial guess is represented by the transparent wireframe, while the final prediction is depicted by the darker wireframe. Notably, there is excellent agreement between the predicted wireframe and the measured values extracted from the signal.

Figure 5. Visualization on the parameter identification process by matching the first (left) and second (right) finite-time KM coefficients at several values of $ a $ and $ \tau $ , simultaneously. Estimated finite-time KM coefficients from the time signal are depicted as colored dots, with darker shades indicating higher statistical significance. The DeepONet initial guess is presented as the light wireframe, while the final prediction is presented as the dark wireframe. The thick black curves at $ \tau =0 $ depict the true KM coefficients. The true parameters are $ \nu = 6.5,\kappa = 1.6,{D}^{(2)}= 2.5 $ .

To quantitatively evaluate the accuracy of the proposed approach in tackling the inverse problem, we present in Figure 6a the relative error between the true and predicted parameters. The results are provided in the form of violin error plots. In the same figure, we also present the results using the FD-based system identification procedure proposed in Boujo and Noiray (Reference Boujo and Noiray2017), where the Nelder–Mead simplex method is chosen as the optimization algorithm. For the sake of comparison, we also include parameter identification results using DeepONet with the Nelder–Mead optimizer. It can be seen that all methods successfully identify the latent parameters with excellent accuracy, especially for $ \nu $ and $ \kappa $ with median error below 5%.

Figure 6. Violin plots representing the relative errors of the identified parameters $ \nu, \kappa $ , and $ {D}^{(2)} $ (a) across three different approaches (100-Hz bandwidth) and (b) across three different bandwidths (DeepONet—LBFGS). The inverse problem is solved for synthetic signals with 100 different combinations of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . The white dots indicate the median of the relative errors, while the encompassing thick black lines denote the interquartile range, extending from the first to the third quartile. The shaded regions of the violins represent the overall distribution of the error.

We perform tests to evaluate the effect of the filtering bandwidth on the parameter identification accuracy using DeepONet. To this end, the bandwidth $ \Delta f $ is varied among three distinct values: 30, 60, and 100 Hz. The results are presented in the right panel of Figure 6. The reduction in bandwidth mainly increases the error in the diffusion coefficient $ {D}^{(2)} $ , while the prediction accuracy of $ \nu $ and $ \kappa $ is only slightly affected. This outcome can be attributed to the fact that $ {D}^{(2)} $ characterizes the noise intensity, whereas $ \nu $ and $ \kappa $ primarily influence the limit-cycle amplitude at the peak frequency (in the purely deterministic case, the limit-cycle amplitude is $ {A}_{\mathrm{det}}=\sqrt{8\nu /\kappa } $ ). Since white noise has a flat spectral profile, narrowing the bandwidth around the peak frequency then tends to predominantly increase the prediction error in $ {D}^{(2)} $ .

To assess the efficiency of the proposed parameter identification approach, we measure the time required to identify $ \nu, \kappa $ , and $ {D}^{(2)} $ given the measured values of $ {\hat{D}}_{\tau}^{(1)} $ and $ {\hat{D}}_{\tau}^{(2)} $ from the time signals. The results are presented in the left panel of Figure 7 in the form of violin plots and quantitatively detailed in Table 1. It can be seen that DeepONet with LBFGS manages to solve the inverse problem two orders of magnitude faster than the FD approach with the Nelder–Mead simplex algorithm. For this study, we have selected the FD simulation parameters that minimize the time required to solve the inverse problem (see Appendix B) to facilitate meaningful comparison. The initial guesses for $ \nu, \kappa $ , and $ {D}^{(2)} $ are taken from $ \left\{\mathrm{1,10.5,20}\right\} $ , $ \left\{\mathrm{0.5,2.875,5.25}\right\} $ , and $ \left\{\mathrm{1,10.5,20}\right\} $ , respectively. For most cases, however, the first initial guess is enough for the optimization problem to converge. The reported time includes all the trials needed to solve the inverse problem. The DeepONet model is trained and evaluated on a GPU (NVIDIA Quadro RTX 4000), while the FD computations are performed on a CPU (AMD Ryzen Threadripper 3970X).

Figure 7. The time required to solve the inverse problem (left) and to perform the forward pass (right) for 100 synthetic test data with different values of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ .

Table 1. Comparison of the computational efficiency of three different approaches in solving forward and inverse problems

Note: The presented values represent the median and standard deviation of the time measurements. For the FD approach, the temporal and spatial resolutions as well as the number of processes have been varied such that the minimum time is reported.

The major speed-up gain can be attributed to the fact that DeepONet directly predicts the finite-time KM coefficients without the need to calculate solution over the whole $ \left({a}^{\prime },\tau \right) $ domain (see Figure 2), and without the need for time-stepping. This is demonstrated on the right panel of Figure 7 in the form of violin plots and in Table 1 in the quantitative form where we present the time comparison for the forward pass, that is, the time to predict the finite-time KM coefficients given the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ . Furthermore, owing to the differentiability of the neural network, we can leverage optimizers, such as LBFGS, which utilizes gradient information to further accelerate the system identification process. Figure 8 illustrates how the gradient-based optimization algorithm can identify the desired parameters in a significantly fewer number of iterations.

Figure 8. Evolution of the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ from the initial guess to the final prediction for three different parameter identification approaches. DeepONet can leverage automatic differentiation to use LBFGS to significantly reduce the number of iterations required to achieve convergence.

Finally, we perform parameter identification on synthetic signals with parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ where the values reside outside of the support of the training data. Specifically, we generate 64 signals with $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ taking values $ \left\{\mathrm{22.5,26.5,30.5,34.5}\right\} $ , $ \left\{\mathrm{5.6,6.6,7.6,8.6}\right\} $ , $ \left\{\mathrm{22.5,26.5,30.5,34.5}\right\} $ , respectively. We recall that the training data have maximum values of 20, 5.25, and 20, for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ , respectively. This means that the parameters for testing are significantly distanced from those used during training, thus putting the DeepONet extrapolation abilities to the test. The violin plots of the relative error between the prediction and the true values are provided in Figure 9. We report median errors of 6.75%, 7.33%, and 31.33% for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ , respectively, for DeepONet-LBFGS. The FD approach with the Nelder–Mead optimizer results in a median error of 3.13%, 3.04%, and 19.97% for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ , respectively. Indeed, this result demonstrates that the errors for DeepONet are larger in the out-of-distribution case compared to the in-distribution case. However, it is noteworthy that the error increase primarily affects the $ {D}^{(2)} $ prediction, while the errors in $ \nu $ and $ \kappa $ do not increase significantly. Hence, in the out of distribution setting, DeepONet still maintains the capability to recover the linear growth rate $ \nu $ and the cubic saturation coefficient $ \kappa $ , albeit with reduced accuracy.

Figure 9. The relative error for out-of-distribution test data, where the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ extend beyond the range covered by the training data. Specifically, $ \nu $ and $ {D}^{(2)} $ vary between 22.5 and 34.5, and $ \kappa $ ranges from 5.6 to 8.6. This contrasts with the training dataset, where the maximum values for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ are capped at 20, 5.25, and 20, respectively.

6.2. Experimental data

In this section, we attempt to identify the parameters $ \nu, \kappa $ and $ {D}^{(2)} $ characterizing acoustic pressure signals obtained from experiments. Specifically, we refer to the experiments performed in Noiray and Denisov (Reference Noiray and Denisov2017) of premixed methane flames on a lab-scale cylindrical combustion chamber. The authors observed three different operating conditions: linearly stable (c1), marginally stable (c2), and linearly unstable (c3) by increasing the equivalence ratio from 0.47 to 0.5.

We process the time signals under the operating conditions c2 and c3 with 40-Hz bandwidth around the thermoacoustic mode at 150 Hz. The signals are 60 s long with 10-kHz sampling rate. We then employ DeepONet to identify the parameters of interest. It should be noted that the same DeepONet, trained only once with synthetic data (as described in Section 3.1), is used here. The resulting identified parameters are presented in Table 2, together with the estimated parameters from Noiray and Denisov (Reference Noiray and Denisov2017) where the authors used the extrapolation method. Using the identified parameters, we reconstruct the theoretical stationary PDFs (Eq. 2.19) in Figure 10 presented together with the normalized histogram of the corresponding time signals. Good agreement can be observed between the reconstructed PDFs and the histogram data.

Table 2. The identified parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ corresponding to the experimental measurements in Noiray and Denisov (Reference Noiray and Denisov2017)

Figure 10. Probability density function of the acoustic pressure envelope of the experimental time signal data corresponding to operating conditions c2 (left) and c3 (right). Parameters $ \nu, \kappa, {D}^{(2)} $ are extracted using both DeepONet and finite-difference approaches and then used to plot the analytical stationary PDFs.

Furthermore, we can go one step further to provide uncertainty quantification for our prediction by performing Bayesian inference using the gradient-based adaptive MALA. We use a Gaussian likelihood and a Gamma Prior to construct the unnormalized posterior density. We assume i.i.d. Gaussian observations such that the likelihood takes the form,

(6.1) $$ P\left(\mathcal{D}|\vartheta \right)=\prod \limits_{n=1}^2\prod \limits_{i=1}^N\frac{1}{Z}\exp \left(-\frac{1}{2{\sigma}_n^2}{\left({\hat{D}}_{\tau}^{(n)}\left({a}_i,{\tau}_i\right)-{D}_{\tau, \theta}^{(n)}\Big({a}_i,{\tau}_i\Big)\right)}^2\tilde{P}\left({a}_i\right)\right). $$

We include the weight $ \tilde{P}\left({a}_i\right) $ to scale the variance (a constant $ {\sigma}_n^2 $ ) of each data point to express higher confidence for amplitudes that have higher occurrence in the time signal. We can see the resemblance of the above likelihood with the loss function given in Eq. (4.2). We estimate $ {\sigma}_n^2 $ for $ n=1,2 $ by calculating the weighted mean squared error between $ {\hat{D}}_{\tau}^{(n)} $ and $ {D}_{\tau, \theta}^{(n)} $ .

As for the prior distribution, we use gamma prior $ \Gamma \left(\alpha, \beta \right) $ with parameters $ \alpha $ and $ \beta $ , that is,

(6.2) $$ P\left(\vartheta \right)=\prod \limits_{i=1}^3\frac{1}{\Gamma \left({\alpha}_i\right)}{\beta}_i^{\alpha_i}{\vartheta}_i^{\alpha_i-1}{e}^{-{\beta}_i{\theta}_i}, $$

where $ \alpha $ and $ \beta $ are selected such that the parameters have the estimated values from Noiray and Denisov (Reference Noiray and Denisov2017) as the prior mean and 0.1 of the mean values as the prior standard deviation.

We conduct MCMC simulations to generate 30,000 samples, setting aside the first 10,000 samples as a burn-in period to adapt the proposal covariance matrix and to converge to the stationary distribution. This was performed successfully in 2.6 minutes, demonstrating the high efficiency of DeepONet as a forward simulator. The posterior distribution for each of the parameters $ \nu, \kappa $ and $ {D}^{(2)} $ is provided in Figures 11 and 13 for the operating conditions c2, and c3, respectively. The joint distributions between each two of the three parameters are given in Figures 12 and 14 for the operating conditions c2, and c3, respectively.

Figure 11. Posterior distribution of each parameter $ \nu, \kappa, {D}^{(2)} $ corresponding to the operating condition c2 (marginally stable).

Figure 12. Joint distributions over each two of the three parameters $ \nu, \kappa, {D}^{(2)} $ for the operating condition c2 (marginally stable).

Figure 13. Posterior distribution of each parameter $ \nu, \kappa, {D}^{(2)} $ corresponding to the operating condition c3 (linearly unstable).

Figure 14. Joint distributions over each two of the three parameters $ \nu, \kappa, {D}^{(2)} $ for the operating condition c3 (linearly unstable).

The joint distribution shows how $ \nu $ and $ \kappa $ can be very strongly correlated. A transition proposal with isotropic covariance matrix (e.g., random walk Metropolis or standard MALA) will either result in too many rejections or force the use of a very small learning rate. A chain that moves too slowly indicates high autocorrelation, so the effective sample size becomes much smaller than the actual generated samples. This motivates the use of the adaptive algorithm gadMALAf (Titsias and Dellaportas, Reference Titsias and Dellaportas2019), where a suitable proposal distribution is automatically constructed during the burn-in period by maximizing the generalized speed measure. This is reflected in Figures 15 and 16 where we can observe a good mixing of the MCMC samples with rapidly decaying autocorrelation.

Figure 15. Trace and autocorrelation plots of the MCMC algorithm for the operating condition c2 demonstrating good mixing.

Figure 16. Trace and autocorrelation plots of the MCMC algorithm for the operating condition c3 demonstrating good mixing.

7. Concluding remarks

In this work, a DeepONet has been utilized to recover the model parameters from time series data. The numerical results presented in this article demonstrated a major advantage in terms of speed, while maintaining a level of accuracy on par with the classical FD method. The speed advantage can be attributed to two key factors. First, using DeepONet as a surrogate, we managed to bypass the calculations required to solve the AFP equation across the entire simulation domain $ \left({a}^{\prime },\tau \right) $ for each value of the amplitude $ a $ present in the time signal. Second, the differentiability of the neural network allows the computation of exact gradient information between the predicted finite-time KM coefficients and the latent parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . This facilitates the utilization of gradient-based optimizers, further accelerating the parameter identification process. Furthermore, these advantages allow the application of Bayesian inference through the gradient-based adaptive MALA algorithm to provide uncertainty quantification of the parameters prediction.

In addition to these advantages, there are several opportunities for further studies. First, our proposed operator network, trained solely on input–output examples, can only make reliable predictions within the range of the training data. To address this limitation, one potential solution is to incorporate PDE information into the neural network, as demonstrated in the Physics Informed DeepONet (Wang, Wang, and Perdikaris, Reference Wang, Wang and Perdikaris2021), or to explore other reliable extrapolation approaches, as suggested in Zhu et al. (Reference Zhu, Zhang, Jiao, Karniadakis and Lu2023). Additionally, it might be worthwhile to explore alternative neural network architectures beyond fully connected networks. Attention-based architectures, as seen in Vaswani et al. (Reference Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser and Polosukhin2017) and Wang, Teng, and Perdikaris (Reference Wang, Teng and Perdikaris2021), have shown success in various applications and could be a promising avenue for improvement. The methodology presented in this article can be adapted to handle different nonlinear forcing functions by adjusting the network’s input function. Extending the methodology to stochastic processes with multiplicative noise (nonconstant diffusion coefficient) might necessitate architecture modification (e.g., using the multiple input–output networks proposed in Jin et al. (Reference Jin, Meng and Lu2022)) to accommodate the drift and diffusion coefficients as two separate input functions. Finally, it is interesting to investigate the possibility to extend our methodology to systems involving multimode oscillations or coupled oscillators. Provided that the corresponding AFP equations can be formulated and solved using the FD method to generate the necessary training data, the approach outlined in this article could be effectively applied.

Acknowledgments

The authors are grateful for the technical assistance of Dr. Boujo for the initial implementation of the FD solver.

Data availability statement

Data could be made available upon request.

Author contribution

B.D. conceived the study with the support of N.N. T.S. developed the algorithms with the support of B.D. T.S. carried out the numerical simulations. All authors analyzed the results. T.S. and B.D. wrote the original manuscript. All authors reviewed the manuscript.

Funding statement

This study was supported by the European Research Council under the ERC Consolidator Grant (No. 820091) TORCH (2019–2024).

Competing interest

The authors declare none.

Ethical standard

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Appendix A. Derivation of the gradient-based adaptive MALA algorithm

In this section, we will describe the derivation of the gradient-based adaptive MALA (gadMALAf) algorithm proposed in Titsias and Dellaportas (Reference Titsias and Dellaportas2019). The aim is to adapt the Cholesky factor $ L $ of the proposal distribution covariance matrix to maximize the generalized speed measure $ {s}_L $ . This is equivalent to maximizing $ \log {s}_L $ ,

(A.1) $$ \log {s}_L\left({\vartheta}_t\right)=\log {\unicode{x1D53C}}_{\vartheta^{\prime }:\hskip0.1em {q}_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}\left[\alpha \left({\vartheta}_t,{\vartheta}^{\prime };L\right)\right]+\beta {H}_{q_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}. $$

Using Jensen’s inequality, we can obtain the lower bound $ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right)\le \log {s}_L\left({\vartheta}_t\right) $ as follows,

(A.2) $$ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right)={\unicode{x1D53C}}_{\vartheta^{\prime }:\hskip0.1em {q}_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}\left[\log \left(\min \left\{1,\frac{\pi \left({\vartheta}^{\prime}\right){q}_L\left({\vartheta}_t|{\vartheta}^{\prime}\right)}{\pi \left({\vartheta}_t\right){q}_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}\right\}\right)\right]+\beta {H}_{q_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}, $$

where we have introduced $ \pi $ to denote a general target distribution (in the context of Bayesian inference $ \pi $ denotes the posterior distribution $ \pi \left(\vartheta \right)=P\left(\vartheta |\mathcal{D}\right) $ ).

We aim to take a gradient ascent in the direction of maximum $ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right) $ . However, it is not clear how to take the gradient with respect to $ L $ of the expectation over a distribution that itself depends on $ L $ . We employ the reparameterization trick, a widely used technique in variational inference, to be able to take a gradient toward maximizing $ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right) $ . We parameterize $ {\vartheta}^{\prime } $ with $ \unicode{x025B} :\hskip0.1em p\left(\unicode{x025B} \right) $ according to a deterministic transformation $ {\vartheta}^{\prime }={g}_L\left(\unicode{x025B}; {\vartheta}_t\right) $ such that

(A.3) $$ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right)={\unicode{x1D53C}}_{\unicode{x025B} :\hskip0.1em p\left(\unicode{x025B} \right)}\left[\min \left\{0,\log \frac{\pi \left({g}_L\left(\unicode{x025B}; {\vartheta}_t\right)\right)}{\pi \left({\vartheta}_t\right)}+\log \frac{q_L\left({\vartheta}_t|{g}_L\left(\unicode{x025B}; {\vartheta}_t\right)\right)}{q_L\left({g}_L\left(\unicode{x025B}; {\vartheta}_t\right)|{\vartheta}_t\right)}\right\}\right]+\beta {H}_{q_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}. $$

The gradient of $ \mathrm{\mathcal{F}} $ can then be obtained according to

(A.4) $$ {\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right)={\unicode{x1D53C}}_{\unicode{x025B} :\hskip0.1em p\left(\unicode{x025B} \right)}\left[{\nabla}_L\min \left\{0,\log \frac{\pi \left({g}_L\left(\unicode{x025B}; {\vartheta}_t\right)\right)}{\pi \left({\vartheta}_t\right)}+\log \frac{q_L\left({\vartheta}_t|{g}_L\left(\unicode{x025B}; {\vartheta}_t\right)\right)}{q_L\left({g}_L\left(\unicode{x025B}; {\vartheta}_t\right)|{\vartheta}_t\right)}\right\}\right]+\beta {\nabla}_L{H}_{q_L\left({\vartheta}^{\prime }|{\vartheta}_t\right)}. $$

We now focus on the MALA algorithm, where the reparameterization of the proposal $ {\vartheta}^{\prime }={g}_L\left(\unicode{x025B}; {\vartheta}_t\right) $ takes the form of

(A.5) $$ {\vartheta}^{\prime }=\vartheta +\frac{1}{2}{LL}^{\mathrm{T}}\nabla \log \pi \left(\vartheta \right)+L\unicode{x025B}, \unicode{x025B} :\mathcal{N}\left(\unicode{x025B} |\;0,I\right). $$

The corresponding Monte Carlo estimator of the gradient $ {\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right) $ can be obtained as

(A.6) $$ {\displaystyle \begin{array}{c}{\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t,{\unicode{x025B}}_t\right)={\nabla}_L\min \left\{0,\log \pi \left({\vartheta}_t+\left(1/2\right){LL}^{\mathrm{T}}\nabla \log \pi \left({\vartheta}_t\right)+L{\unicode{x025B}}_t\right)-\log \pi \left({\vartheta}_t\right)\right.\\ {}\left.-\frac{1}{2}\left({\left\Vert \frac{1}{2}{L}^T\left[\nabla \log \pi \left({\vartheta}_t\right)+\nabla \log \pi \left({\vartheta}_t^{\prime}\right)\right]+{\unicode{x025B}}_t\right\Vert}^2-{\left\Vert {\unicode{x025B}}_t\right\Vert}^2\right)\right\}+\beta {\nabla}_L\sum \limits_{i=1}^n\log {L}_{ii}.\end{array}} $$

The fast variant of the gadMALAf algorithm assumes that $ \nabla \log \pi \left({\vartheta}_t^{\prime}\right) $ is constant with respect to $ L $ . Furthermore, in the perspective of implementation, the gradient of the M–H log ratio in the above equation can be calculated from the lower triangular part of the following outer product,

(A.7) $$ -\frac{1}{2}\left(\nabla \log \pi \left({\vartheta}_t\right)-\nabla \log \pi \left({\vartheta}_t^{\prime}\right)\right){\left(\frac{1}{2}{L}^T\left[\nabla \log \pi \left({\vartheta}_t\right)-\nabla \log \pi \left({\vartheta}_t^{\prime}\right)\right]+{\unicode{x025B}}_t\right)}^T. $$

The gradient terms appearing in this algorithm can be obtained conveniently using the automatic differentiation readily available in modern deep learning frameworks such as PyTorch.

We now have complete information to update $ L $ according to the gadMALAf algorithm. During the burn-in period, we iteratively adapt $ L $ according to the following procedure,

Step 1. Propose $ {\vartheta}_t^{\prime }={g}_L\left({\unicode{x025B}}_t;{\vartheta}_t\right) $ with $ \unicode{x025B} :\hskip0.1em \mathcal{N}\left(\unicode{x025B} |\;0,I\right) $ .

Step 2. Update $ L\leftarrow L+{\rho}_t{\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t,{\unicode{x025B}}_t\right) $ .

Step 3. Accept or reject $ {\vartheta}_t^{\prime } $ according to the M–H acceptance ratio to obtain $ {\vartheta}_{t+1} $ .

Step 4. Set $ {\hat{\alpha}}_t=1 $ if $ {\vartheta}_t^{\prime } $ is accepted and $ {\hat{\alpha}}_t=0 $ otherwise. Update the hyperparameter $ \beta \leftarrow \beta \left(1+{\rho}_{\beta}\left({\hat{\alpha}}_t-{\alpha}_{\ast}\right)\right) $

We set the hyperparameters according to the recommendations in the original paper (Titsias and Dellaportas, Reference Titsias and Dellaportas2019). The learning rate for $ {\rho}_t $ is adjusted using the root mean square propagation algorithm, where $ {\rho}_t=\eta /\left(1+\sqrt{G_t}\right) $ and $ {G}_t=0.9{G}_t+0.1{\left[{\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t,{\unicode{x025B}}_t\right)\right]}^2 $ . Furthermore, we set $ {\rho}_{\beta }=0.02 $ , and $ {\alpha}_{\ast }=0.55 $ as the target acceptance ratio for the MALA algorithm.

Figure A1. Median time to solve the inverse problem using the finite-difference method across 100 test datasets. (Left) The spatial resolution is set to remain constant at $ da= 0.05 $ , with variations in the time step achieved by modifying the multiplier $ k $ , defined as $ dt=k\times \frac{1}{2}\frac{D^{(2)}}{da^2} $ . Note that the y-axis is broken into parts to enhance the visibility of the optimal time-step parameter. (Right) The time step multiplier is set at $ k= 300 $ , while the spatial resolution $ da $ is varied.

Appendix B . Parameter study of the finite-difference solver

We conducted a study to select simulation parameters for the finite-difference simulation that minimizes the time required to solve the inverse problem. We vary the spatial resolution $ da $ taking values $ \left\{\mathrm{0.05,0.1,0.2,0.5,0.75}\right\} $ . The temporal resolutions are varied by parameterizing the time step with a constant $ k $ , that is, by setting $ dt=k\times \frac{1}{2}\frac{D^{(2)}}{da^2} $ . We also run the forward simulation in parallel since the AFP equations can be solved independently for different amplitudes $ a $ . We vary the number of processes taking values $ \left\{\mathrm{1,2,4,8,12}\right\} $ .

Across 100 synthetic test data, the median time to identify the parameters of the processed time signal is reported in Figure A1. Although the Crank–Nicolson method allows stable simulations in all parameters tested, we can see that the required time does not decrease further when $ da $ increases above $ 0.5 $ and $ k $ increases above $ 300 $ . In fact, the minimum median time is obtained for $ da=0.5 $ with $ k=300 $ and using two processes. This is the parameter setting that is used for the comparison with DeepONet in Section 6.

References

Boujo, E, Bourquard, C, Xiong, Y and Noiray, N (2020) Processing time-series of randomly forced self-oscillators: The example of beer bottle whistling. Journal of Sound and Vibration 464, 114981.CrossRefGoogle Scholar
Boujo, E and Cadot, O (2019) Stochastic modeling of a freely rotating disk facing a uniform flow. Journal of Fluids and Structures 86, 3443.CrossRefGoogle Scholar
Boujo, E and Noiray, N (2017) Robust identification of harmonic oscillator parameters using the adjoint Fokker-Planck equation. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473(2200), 20160894.CrossRefGoogle ScholarPubMed
Bourquard, C, Faure-Beaulieu, A and Noiray, N (2021) Whistling of deep cavities subject to turbulent grazing flow: Intermittently unstable aeroacoustic feedback. Journal of Fluid Mechanics 909, A19.CrossRefGoogle Scholar
Chen, T and Chen, H (1995) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks 6(4), 911917.CrossRefGoogle ScholarPubMed
Dharmaputra, B, Shcherbanev, S, Schuermans, B and Noiray, N (2023) Thermoacoustic stabilization of a sequential combustor with ultra-low-power nanosecond repetitively pulsed discharges. Combustion and Flame 258, 113101.CrossRefGoogle Scholar
Honisch, C and Friedrich, R (2011) Estimation of Kramers-Moyal coefficients at low sampling rates. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 83(6), 066701.CrossRefGoogle ScholarPubMed
Hoskoti, LADA, Misra, A and Sucheendran, MM (2020) Frequency lock-in during nonlinear vibration of an airfoil coupled with van der pol oscillator. Journal of Fluids and Structures 92, 102776.CrossRefGoogle Scholar
Jin, P, Meng, S and Lu, L (2022) Mionet: Learning multiple-input operators via tensor product. SIAM Journal on Scientific Computing 44(6), A3490A3514.CrossRefGoogle Scholar
Lade, SJ (2009) Finite sampling interval effects in Kramers–Moyal analysis. Physics Letters A 373(41), 37053709.CrossRefGoogle Scholar
Lee, M, Kim, D, Lee, J, Kim, Y and Yi, M (2023) A data-driven approach for analyzing hall thruster discharge instability leading to plasma blowoff. Acta Astronautica 206, 18.CrossRefGoogle Scholar
Lee, M, Kim, KT and Park, J (2023) A numerically efficient output-only system-identification framework for stochastically forced self-sustained oscillators. Probabilistic Engineering Mechanics 74, 103516.CrossRefGoogle Scholar
Lin, C, Maxey, M, Li, Z and Karniadakis, GE (2021) A seamless multiscale operator neural network for inferring bubble dynamics. Journal of Fluid Mechanics 929, A18.CrossRefGoogle Scholar
Liu, DC and Nocedal, J (1989) On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(1–3), 503528.CrossRefGoogle Scholar
Lu, L, Jin, P, Pang, G, Zhang, Z and Karniadakis, GE (2021) Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence 3(3), 218229.CrossRefGoogle Scholar
Lu, L, Meng, X, Mao, Z and Karniadakis, GE (2021) DeepXDE: A deep learning library for solving differential equations. SIAM Review 63(1), 208228.CrossRefGoogle Scholar
Lu, L, Pestourie, R, Johnson, SG and Romano, G (2022) Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. Physical Review Research 4(2), 23210.CrossRefGoogle Scholar
Mao, Z, Lu, L, Marxen, O, Zaki, TA and Karniadakis, GE (2021) Deepm&mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators. Journal of Computational Physics 447, 110698.CrossRefGoogle Scholar
Nelder, JA and Mead, R (1965) A simplex method for function minimization. The Computer Journal 7(4), 308313.CrossRefGoogle Scholar
Noiray, N and Denisov, A (2017) A method to identify thermoacoustic growth rates in combustion chambers from dynamic pressure time series. Proceedings of the Combustion Institute 36(3), 38433850.CrossRefGoogle Scholar
Noiray, N and Schuermans, B (2013) Deterministic quantities characterizing noise driven Hopf bifurcations in gas turbine combustors. International Journal of Non-Linear Mechanics 50, 152163.CrossRefGoogle Scholar
Nomura, T, Sato, S, Doi, S, Segundo, JP and Stiber, MD (1994) Global bifurcation structure of a bonhoeffer-van der pol oscillator driven by periodic pulse trains. Biological Cybernetics 72, 5567.CrossRefGoogle Scholar
Pawula, RF (1967) Approximation of the linear boltzmann equation by the fokker-planck equation. Physics Review 162(1), 186188.CrossRefGoogle Scholar
Raissi, M, Perdikaris, P and Karniadakis, GE (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686707.CrossRefGoogle Scholar
Risken, H (1996) Fokker-Planck Equation. Heidelberg: Springer.CrossRefGoogle Scholar
Roberts, J and Spanos, P (1986) Stochastic averaging: An approximate method of solving random vibration problems. International Journal of Non-Linear Mechanics 21(2), 111134.CrossRefGoogle Scholar
Rompala, K, Rand, R and Howland, H (2007) Dynamics of three coupled van der pol oscillators with application to circadian rhythms. Communications in Nonlinear Science and Numerical Simulation 12(5), 794803.CrossRefGoogle Scholar
Siegert, S, Friedrich, R and Peinke, J (1998) Analysis of data sets of stochastic systems. Physics Letters A 243(5–6), 275280.CrossRefGoogle Scholar
Stratonovich, RL (1963) Topics in the Theory of Random Noise, Vol. 2. London: Gordon & Breach.Google Scholar
Titsias, MK and Dellaportas, P (2019) Gradient-based adaptive Markov chain Monte Carlo. In Proceedings of the 33rd International Conference on Neural Information Processing Systems.Google Scholar
Vaswani, A, Shazeer, N, Parmar, N, Uszkoreit, J, Jones, L, Gomez, AN, Kaiser, Ł and Polosukhin, I (2017) Attention is all you need. Advances in Neural Information Processing Systems 30.Google Scholar
Wang, S, Teng, Y and Perdikaris, P (2021) Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 43(5), 30553081.CrossRefGoogle Scholar
Wang, S, Wang, H and Perdikaris, P (2021) Learning the solution operator of parametric partial differential equations with physics-informed DeepONets. Science Advances 7(40), eabi8605.CrossRefGoogle ScholarPubMed
Yin, M, Zhang, E, Yu, Y and Karniadakis, GE (2022) Interfacing finite elements with deep neural operators for fast multiscale modeling of mechanics problems. Computer Methods in Applied Mechanics and Engineering 402, 115027.CrossRefGoogle ScholarPubMed
Yulin, AV, Sedov, ES, Kavokin, AV and Shelykh, IA (2024) Persistent polarization oscillations in ring-shape polariton condensates. Physical Review Research 6, 013261.CrossRefGoogle Scholar
Zhu, M, Zhang, H, Jiao, A, Karniadakis, GE and Lu, L (2023) Reliable extrapolation of deep neural operators informed by physics or sparse observations. Computer Methods in Applied Mechanics and Engineering 412, 116064.CrossRefGoogle Scholar
Figure 0

Figure 1. A realization of the stochastic Van der Pol oscillator with parameters $ \left(\nu = 2.5,\kappa = 1.6,{D}^{(2)}= 2.5\right) $ that models the local acoustic pressure fluctuation $ \eta (t) $ in a combustion chamber (red), and the corresponding envelope $ A(t) $ (black). The probability distributions for both the oscillation ($ \eta (t) $ taking value $ \eta $) and the envelope ($ A(t) $ taking value $ a $) are displayed on the left and right sides of the time signal, respectively, illustrating the statistical properties of the entire time signal.

Figure 1

Figure 2. Illustration of the calculation of the first (left) and second (right) finite-time KM coefficients from the solution of the AFP equation for $ a= 5.5,\nu = 6.5,\kappa = 1.6,{D}^{(2)}= 2.5 $. The finite-time KM coefficients are obtained by extracting the solution $ {P}_{n,a}^{\dagger}\left({a}^{\prime },\tau \right) $ at $ {a}^{\prime }=a $ for several values of $ \tau $.

Figure 2

Figure 3. A DeepONet architecture designed for the parameter identification based on the AFP formalism. It approximates the operator that maps the first Kramers–Moyal (KM) coefficient $ {D}^{(1)}(a) $ to both the first $ {D}_{\tau}^{(1)}\left(a,\tau \right) $ and second $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ finite-time KM coefficients. Although only the first KM coefficient $ {D}^{(1)}(a) $ appears as the input to the branch network, the second KM coefficient $ {D}^{(2)} $ implicitly influences the forward calculation process through $ {D}^{(1)}(a) $ according to Eq. (2.15).

Figure 3

Figure 4. This set of diagrams visualizes the method for extracting finite-time KM coefficients by processing an acoustic pressure time series. (a) The envelope $ A(t) $ and the time-shifted envelope $ A\left(t+\tau \right) $. (b) The stationary PDF of the envelope $ P\left(A=a\right) $ obtained by histogram binning. (c) The joint PDF of the envelope and the time-shifted envelope also obtained by histogram binning. (d) The conditional (transition) PDF $ P\left({a}^{\prime },t+\tau |a,t\right) $ obtained by normalizing the joint PDF with $ P(A) $. (e) Transition moments calculated by numerically integrating $ {\left({a}^{\prime }-a\right)}^nP\left({a}^{\prime },t+\tau |a,t\right) $. Finite-time KM coefficients can then be obtained by normalizing the moments with $ n!\tau $.

Figure 4

Figure 5. Visualization on the parameter identification process by matching the first (left) and second (right) finite-time KM coefficients at several values of $ a $ and $ \tau $, simultaneously. Estimated finite-time KM coefficients from the time signal are depicted as colored dots, with darker shades indicating higher statistical significance. The DeepONet initial guess is presented as the light wireframe, while the final prediction is presented as the dark wireframe. The thick black curves at $ \tau =0 $ depict the true KM coefficients. The true parameters are $ \nu = 6.5,\kappa = 1.6,{D}^{(2)}= 2.5 $.

Figure 5

Figure 6. Violin plots representing the relative errors of the identified parameters $ \nu, \kappa $, and $ {D}^{(2)} $(a) across three different approaches (100-Hz bandwidth) and (b) across three different bandwidths (DeepONet—LBFGS). The inverse problem is solved for synthetic signals with 100 different combinations of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $. The white dots indicate the median of the relative errors, while the encompassing thick black lines denote the interquartile range, extending from the first to the third quartile. The shaded regions of the violins represent the overall distribution of the error.

Figure 6

Figure 7. The time required to solve the inverse problem (left) and to perform the forward pass (right) for 100 synthetic test data with different values of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $.

Figure 7

Table 1. Comparison of the computational efficiency of three different approaches in solving forward and inverse problems

Figure 8

Figure 8. Evolution of the parameters $ \nu $, $ \kappa $, and $ {D}^{(2)} $ from the initial guess to the final prediction for three different parameter identification approaches. DeepONet can leverage automatic differentiation to use LBFGS to significantly reduce the number of iterations required to achieve convergence.

Figure 9

Figure 9. The relative error for out-of-distribution test data, where the parameters $ \nu $, $ \kappa $, and $ {D}^{(2)} $ extend beyond the range covered by the training data. Specifically, $ \nu $ and $ {D}^{(2)} $ vary between 22.5 and 34.5, and $ \kappa $ ranges from 5.6 to 8.6. This contrasts with the training dataset, where the maximum values for $ \nu $, $ \kappa $, and $ {D}^{(2)} $ are capped at 20, 5.25, and 20, respectively.

Figure 10

Table 2. The identified parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ corresponding to the experimental measurements in Noiray and Denisov (2017)

Figure 11

Figure 10. Probability density function of the acoustic pressure envelope of the experimental time signal data corresponding to operating conditions c2 (left) and c3 (right). Parameters $ \nu, \kappa, {D}^{(2)} $ are extracted using both DeepONet and finite-difference approaches and then used to plot the analytical stationary PDFs.

Figure 12

Figure 11. Posterior distribution of each parameter $ \nu, \kappa, {D}^{(2)} $ corresponding to the operating condition c2 (marginally stable).

Figure 13

Figure 12. Joint distributions over each two of the three parameters $ \nu, \kappa, {D}^{(2)} $ for the operating condition c2 (marginally stable).

Figure 14

Figure 13. Posterior distribution of each parameter $ \nu, \kappa, {D}^{(2)} $ corresponding to the operating condition c3 (linearly unstable).

Figure 15

Figure 14. Joint distributions over each two of the three parameters $ \nu, \kappa, {D}^{(2)} $ for the operating condition c3 (linearly unstable).

Figure 16

Figure 15. Trace and autocorrelation plots of the MCMC algorithm for the operating condition c2 demonstrating good mixing.

Figure 17

Figure 16. Trace and autocorrelation plots of the MCMC algorithm for the operating condition c3 demonstrating good mixing.

Figure 18

Figure A1. Median time to solve the inverse problem using the finite-difference method across 100 test datasets. (Left) The spatial resolution is set to remain constant at $ da= 0.05 $, with variations in the time step achieved by modifying the multiplier $ k $, defined as $ dt=k\times \frac{1}{2}\frac{D^{(2)}}{da^2} $. Note that the y-axis is broken into parts to enhance the visibility of the optimal time-step parameter. (Right) The time step multiplier is set at $ k= 300 $, while the spatial resolution $ da $ is varied.

Submit a response

Comments

No Comments have been published for this article.