
Enhancing dynamical system modeling through interpretable machine-learning augmentations: a case study in cathodic electrophoretic deposition

Published online by Cambridge University Press:  08 January 2025

Christian Jacobsen*
Affiliation:
Department of Aerospace Engineering, Ann Arbor, MI, USA
Jiayuan Dong
Affiliation:
Department of Mechanical Engineering, Ann Arbor, MI, USA
Mehdi Khalloufi
Affiliation:
Department of Mechanical Engineering, Ann Arbor, MI, USA
Xun Huan
Affiliation:
Department of Mechanical Engineering, Ann Arbor, MI, USA
Karthik Duraisamy
Affiliation:
Department of Aerospace Engineering, Ann Arbor, MI, USA
Maryam Akram
Affiliation:
Ford Research and Innovation Center, Dearborn, MI, USA
Wanjiao Liu
Affiliation:
Ford Research and Innovation Center, Dearborn, MI, USA
*
Corresponding author: Christian Jacobsen; Email: csjacobs@umich.edu

Abstract

We introduce a comprehensive data-driven framework aimed at enhancing the modeling of physical systems, employing inference techniques and machine-learning enhancements. As a demonstrative application, we pursue the modeling of cathodic electrophoretic deposition, commonly known as e-coating. Our approach illustrates a systematic procedure for enhancing physical models by identifying their limitations through inference on experimental data and introducing adaptable model enhancements to address these shortcomings. We begin by tackling the issue of model parameter identifiability, which reveals aspects of the model that require improvement. To address generalizability, we introduce modifications, which also enhance identifiability. However, these modifications do not fully capture essential experimental behaviors. To overcome this limitation, we incorporate interpretable yet flexible augmentations into the baseline model. These augmentations are parameterized by simple fully-connected neural networks, and we leverage machine-learning tools, particularly neural ordinary differential equations, to learn these augmentations. Our simulations demonstrate that the machine-learning-augmented model more accurately captures observed behaviors and improves predictive accuracy. Nevertheless, we contend that while the model updates offer superior performance and capture the relevant physics, we can reduce off-line computational costs by eliminating certain dynamics without compromising accuracy or interpretability in downstream predictions of quantities of interest, particularly film thickness predictions. The entire process outlined here provides a structured approach to leverage data-driven methods by helping us comprehend the root causes of model inaccuracies and by offering a principled method for enhancing model performance.

Type
Research Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Impact Statement

We present an approach to enhancing the modeling of physical systems through a data-driven framework that integrates inference techniques and machine-learning enhancements along with baseline physical models. By focusing on cathodic electrophoretic deposition (e-coating) as a case study, this work systematically identifies limitations in existing physical models using experimental data. The approach presented addresses these limitations by incorporating adaptable model enhancements, significantly improving model parameter identifiability and generalizability. A key contribution of this work is the introduction of interpretable and flexible augmentations to baseline models, parameterized by simple, fully-connected neural networks. Leveraging machine-learning tools, particularly neural ordinary differential equations, the framework effectively learns these augmentations, resulting in simulations that more accurately capture observed behaviors and enhance predictive accuracy. Despite these improvements, the study highlights the potential to reduce off-line computational costs by eliminating certain dynamics without compromising accuracy or interpretability in downstream predictions, particularly for film thickness. This structured approach not only elucidates the root causes of model inaccuracies but also offers a principled method for enhancing model performance.

1. Introduction

Cathodic electrophoretic deposition (EPD), commonly known as e-coating, is a pivotal technique used in industries, such as automotive and manufacturing, to apply protective coatings to various surfaces, preventing corrosion and ensuring durability. Achieving optimal coating properties and process efficiency requires accurate modeling of EPD dynamics, which presents significant challenges due to the complex electrochemical interactions involved.

The primary challenges in modeling the e-coat process are the uncertainty surrounding the physical properties of the coating film during deposition and a limited understanding of the underlying physics. Various models proposed in the literature have attempted to address these issues (Boyd and Zwack, 1996; Pierce, 1981; Rastegar et al., 2008; Mišković-Stanković, 2002; Ellwood et al., 2009), but they often require empirical calibration based on measurements of film initiation and growth, or fail to capture experimental behaviors. In addition, the deposition onset of the coating film, characterized by threshold parameters, remains an open question, leading to discontinuities in model outputs and rendering parameter inference difficult. Extensive studies of e-coat process modeling and the chemistry involved, for both constant current and constant voltage, can be found in Beck (1976) and Pierce (1981).

E-coating in the automotive industry typically employs Sand's equation (Sand, 1899), given in Eq. (25), to compute the induction time under constant current (CC) conditions. However, applying a voltage that increases linearly in time is also common in automotive practice, to ensure good throw power and to prevent defects associated with high voltage (Marlar et al., 2020). To address non-CC (voltage-controlled) scenarios, an equivalent of Sand's equation is derived using the Laplace transform, assuming a constant diffusion coefficient of the bath solution.
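For orientation, a commonly quoted form of Sand's result for the induction (transition) time under constant current density is sketched below; the notation here is generic and the paper's Eq. (25) may differ in detail:

$$ {t}_s=\pi D{\left(\frac{zF{C}_0}{2j}\right)}^2, $$

where $ D $ is the diffusion coefficient of the bath solution, $ z $ the charge number, $ F $ Faraday's constant, $ {C}_0 $ the bulk concentration, and $ j $ the applied current density.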

To enhance the modeling of EPD dynamics, specifically in the context of e-coating, this study aims to modify a baseline model to improve model performance with respect to experimentally obtained data. There exist two major types of uncertainty in the baseline model: parametric and model form uncertainties. Parametric uncertainties refer to the uncertainty or lack of precise knowledge about the values of certain parameters in a computational model while model form uncertainty refers to the uncertainty associated with the choice of computational model itself. Parametric uncertainty is often addressed by leveraging experimental data to perform parameter estimation (maximum likelihood estimation, Bayesian inference, etc.), and model form is typically addressed through physical insight and intuition along with data-driven modeling techniques, the latter of which is highlighted in this work. Experimental uncertainties can also significantly affect both the reliability and interpretability of machine-learning methods and models. With increasing levels of noise or error, the learned model parameters may become less accurate, leading to unreliable predictions. However, in this work, we assume that experimental uncertainties are not the primary cause of inaccuracy in our models but rather the model form itself.

Variational inference methods are employed to estimate the parameters of this model in an effort to address its parametric uncertainty. However, inherent unidentifiability issues in the model hinder accurate parameter estimation. To overcome these challenges, modifications are introduced to the baseline model to ensure that all parameters are identifiable and more generalizable across experimental conditions, addressing model form uncertainty. Nonetheless, the model's inability to capture observed behaviors in experimental data prompts the use of machine-learning (ML) tools, particularly neural ordinary differential equations (neural ODEs) (Chen et al., 2018), to introduce and learn physically relevant and interpretable modifications.

By leveraging neural ODEs and carefully crafting learnable augmentations using physical insights, the model's ability to match experimental data is improved while preserving physical interpretability. We use data to enhance an existing full-order dynamical system, effectively creating a surrogate model in the form of model augmentations rather than discovering the physical dynamics outright (Brunton et al., 2016; Rudy et al., 2017). Our aim is to improve predictive accuracy while maintaining an interpretable model. As the baseline model is computationally efficient, reduced-order modeling techniques (Ahmed et al., 2021; Pawar et al., 2019; San and Maulik, 2018; Hasegawa et al., 2020) are not addressed in this work. Simulations and comparisons with experimental results demonstrate the effectiveness of the improved model in capturing observed behaviors and enhancing the predictive accuracy of the baseline e-coat dynamics model, notably for film thickness prediction, while the augmentation forms remain physically interpretable and flexible.

Recent advancements in hybrid machine-learning and data assimilation (DA) techniques have demonstrated the potential to model complex dynamical systems, especially when integrating simulation and observational data. These methods, while showing promise in representing high-dimensional and noisy systems, often prioritize predictive power over interpretability. Our approach aims to maintain a balance between interpretability and accuracy. By leveraging variational inference and neural ODEs to design model augmentations, we preserve physical insights while introducing flexibility into the model modifications. Hybrid approaches such as those discussed by Cheng et al. (2023) offer complementary strengths, particularly in terms of uncertainty quantification and real-time adaptability. However, our focus on augmenting the physical model with learnable components allows for better adherence to known physical principles, making it suitable for applications where both interpretability and predictive accuracy are critical. This approach aligns with our goal of improving the e-coating model without compromising the transparency of its inner workings. Accurately predicting the polymer film thickness over time is of utmost interest for time and cost savings in industrial applications, and this study argues that not all behaviors present in experimental data are necessary for accurate film thickness predictions. Removing unnecessarily complex physical behavior from the model leads to a more efficient formulation of the e-coat process, striking a balance between interpretability, predictive accuracy, and efficiency.

A contribution of this work is to illustrate a principled process of clearly understanding and evaluating the root cause of model shortcomings followed by improving model performance using these insights along with experimental data. Together, our methods can be useful for addressing both parametric and model form uncertainties in computational models where experimental data are available.

The prominent aspects of our approach include the use of variational inference techniques to estimate model parameters from data, an in-depth identifiability analysis of the parameters of the baseline e-coat model, identifiability and generalizability improvements to the baseline model based on experiment type, and the introduction of learnable ML augmentations to improve model accuracy while retaining interpretability. Interpretability in this work refers to the ability of a model to provide insight into its decision-making process in a way that is understandable to humans, specifically by aligning its operations and outputs with known physical laws and principles. This concept goes beyond mere prediction accuracy, aiming to make the internal workings of the model transparent so that its predictions can be directly related to physical phenomena and understood in terms of physical concepts.

We first illustrate issues with the baseline e-coat dynamics model during inference and prediction which inform the introduction of physically relevant augmentations to improve predictive accuracy. In particular, through variational Bayesian inference, we attempt to infer the parameters of the baseline model and show that identifiability issues hinder the process. Slightly altering the model results in an identifiable model which is more generalizable, but important physically relevant behavior is found to be entirely absent from the model. Finally, we introduce interpretable ML augmentations to improve the accuracy of the baseline model on experimental data. The final model is assessed for both CC and voltage ramp (VR) cases concerning experimental data while trained only on VR cases.

The organization of this manuscript is as follows. Section 2 introduces the baseline model for electro-deposition. Section 3 presents the Bayesian inference methods for estimating the parameters of the baseline model with quantified uncertainty and illustrates the issue of parameter unidentifiability. Section 4 introduces a new model based on a variation of Sand's equation, valid for any CC/VR experiment, but we show that this model fails to capture all relevant physical behavior observed in experimental data. Later in Section 4, we introduce interpretable ML-based augmentations to the model to better capture the dynamics in the experimental data, and show that removing some of the most complex physical behavior can greatly improve model efficiency with little adverse impact on film thickness growth predictions. Finally, the study concludes in Section 5, highlighting the contributions to e-coating modeling and offering insights into potential applications in diverse fields.

2. Mathematical formulation

In this section, we present the mathematical formulation of the baseline deposition model (Ellwood et al., 2009), a computational framework for solving the baseline model, and the corresponding measurement data acquired from experiments.

2.1. Baseline model

The baseline model is one-dimensional in space, consisting of a cathode and an anode placed a length $ L $ apart, with a solution of suspended colloidal particles in between (see Figure 1). After a voltage is applied and prior to film deposition, an electrochemical reaction takes place at the cathode: $ 2{\mathrm{H}}^{+}+2{e}^{-}\to {\mathrm{H}}_2 $ or $ 2{\mathrm{H}}_2\mathrm{O}+2{e}^{-}\to 2{\mathrm{OH}}^{-}+{\mathrm{H}}_2 $ (Ellwood et al., 2009). As the reaction proceeds, the bath solution basicity increases until a critical pH value is reached and film deposition starts. Film deposition consists of suspended colloidal particles depositing on the cathode, increasing the circuit resistance.

Figure 1. Initial setup for the 1D case.

The film deposition process is separated into three components. First, the electric field within the bath is computed using the conservation of current density given by

(1) $$ \nabla \cdot \mathbf{j}=0 $$
(2) $$ \mathbf{j}=-{\sigma}_{\mathrm{bath}}\nabla \phi $$
(3) $$ {\left.\phi \right|}_{\Gamma}={R}_{\mathrm{film}}\ {j}_n\hskip1em \mathrm{at}\ \mathrm{the}\ \mathrm{interface}\ \mathrm{film}\hbox{-} \mathrm{bath}, $$

where $ \mathbf{j} $ is the current density, $ {\sigma}_{\mathrm{bath}} $ is the bath conductivity, $ {j}_n=\mathbf{j}\cdot \mathbf{n} $ is the normal component of the current density, $ \phi $ is the electrical potential, $ {R}_{\mathrm{film}} $ is the film resistance, and $ \varGamma $ represents the interface between the film and the bath. Second, once film deposition begins, the deposition rate is defined as

(4) $$ \frac{\mathrm{d}h}{\mathrm{d}t}={C}_v{j}_n, $$

where $ h $ is the film thickness and $ {C}_v $ is the Coulombic efficiency. Third and finally, the film thickness and film resistance are related through

(5) $$ \frac{\mathrm{d}{R}_{\mathrm{film}}}{\mathrm{d}t}=\rho \left({j}_n\right)\frac{\mathrm{d}h}{\mathrm{d}t}, $$

where $ \rho \left({j}_n\right) $ is the film resistivity. In particular, $ \rho \left({j}_n\right) $ is a decreasing function of the current density, and we adopt an empirical estimate from Liu and Ellwood (2017) of

(6) $$ \rho =\max \left(8\times {10}^5\exp \left(-0.1j\right),2\times {10}^6\right). $$

Before deposition begins, the right-hand sides of Eqs. (4) and (5) are both equal to 0. The model defines the deposition onset event according to two criteria: the minimum current density and minimum charge conditions. The onset criteria are critical for accurately predicting the film thickness growth in time. If both of the conditions are met, film deposition begins in the baseline model.

The first condition which determines deposition onset is a minimum current density condition. Once the current density at the cathode reaches a threshold value $ {j}_{\mathrm{min}} $ , the film thickness begins to increase. The onset condition parameter $ {j}_{\mathrm{min}} $ is unknown and estimated or inferred from experimental data. The second onset condition is a minimum charge condition. The minimum charge criterion assumes that the deposition does not start until the accumulated charge on the cathode reaches a threshold value $ {Q}_{\mathrm{min}} $ , with the electric charge $ Q $ defined by

(7) $$ Q(t)={\int}_0^t{j}_n\hskip0.1em \mathrm{d}t. $$

The deposition onset threshold $ {Q}_{\mathrm{min}} $ is also unknown and estimated or inferred from experimental data. Once both the minimum charge and minimum current conditions are met, the film thickness increases as

(8) $$ \frac{\mathrm{d}h}{\mathrm{d}t}={C}_v\hskip0.1em {j}_n\ \mathrm{for}\ Q>{Q}_{\mathrm{min}}\ \mathrm{and}\ j>{j}_{\mathrm{min}}. $$

Additional model parameters have already been estimated from previous experimental data. We thus fix those parameters to values presented in Table 1 and focus only on estimating the key unknown parameters $ {C}_v $ , $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ in the baseline model.
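To make the onset logic and Eqs. (4), (5), and (7) concrete, the following is a minimal sketch (not the authors' implementation) of stepping the baseline model under constant current, where the cathode current density equals the applied $ {j}_0 $. The explicit-Euler discretization, function name, and default resistivity (mirroring Eq. (6)) are illustrative assumptions.

```python
import numpy as np

def simulate_baseline_cc(j0, Cv, j_min, Q_min, t_end, dt=0.1,
                         rho=lambda j: np.maximum(8e5 * np.exp(-0.1 * j), 2e6)):
    """Explicit-Euler sketch of the baseline model under constant current.

    Under CC the current density at the cathode is the applied j0, so the
    onset conditions and Eqs. (4)-(5) can be stepped directly in time.
    """
    n = int(t_end / dt)
    t = np.linspace(0.0, t_end, n + 1)
    h = np.zeros(n + 1)       # film thickness
    R = np.zeros(n + 1)       # film resistance R_film
    Q = 0.0                   # accumulated charge, Eq. (7)
    for k in range(n):
        jn = j0
        Q += jn * dt
        if jn > j_min and Q > Q_min:          # both onset criteria met, Eq. (8)
            dh = Cv * jn * dt                 # Eq. (4)
            h[k + 1] = h[k] + dh
            R[k + 1] = R[k] + rho(jn) * dh    # Eq. (5)
        else:                                 # before onset, both rates are zero
            h[k + 1], R[k + 1] = h[k], R[k]
    return t, h, R
```

Because $ {j}_n $ is constant here, the charge in Eq. (7) grows linearly, and deposition begins only once both thresholds are crossed.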

Table 1. Summary of baseline model parameters

2.2. Computational formulation

We consider two main types of experiments in our computational setup: VR and CC. In VR, the voltage is increased linearly with time at a rate $ {V}_R $ such that $ \phi \left(t,x=L\right)=V\left(t,x=L\right)={V}_Rt $ . The electric potential is denoted $ \phi $ or $ V $ interchangeably in this work. In CC, the current density at the anode is held constant. Experimentally, the current can only be held constant until some maximum voltage $ {V}_{\mathrm{max}} $ , determined by the available equipment, is reached. Numerically, once the maximum voltage is reached in CC, $ {V}_{\mathrm{max}} $ is enforced instead of the constant current.

Eqs. (1) through (3) constitute a Poisson equation with a Robin boundary condition at the film/bath interface. This model holds for both experiment types, but the boundary condition at the anode differs between them. Both experiment types are modeled by

(9) $$ {\sigma}_{\mathrm{bath}}\frac{\partial^2\phi }{\partial {x}^2}=0\hskip1em \mathrm{in}\ \mathrm{the}\ \mathrm{bath} $$
(10) $$ \phi -{R}_{\mathrm{film}}{\sigma}_{\mathrm{bath}}\frac{\partial \phi }{\partial x}=0\hskip1em \mathrm{at}\hskip0.35em \mathrm{the}\ \mathrm{interface}\ \mathrm{film}\hbox{-} \mathrm{bath}, $$

additionally with VR having an anode boundary condition

(11) $$ {\phi}_{\mathrm{anode}}(t)={\phi}_{t=0}+{\phi}_{\mathrm{ramp}}(t), $$

and CC having an anode boundary condition

(12) $$ {\sigma}_{\mathrm{bath}}\frac{\partial \phi }{\partial x}={j}_0. $$

These equations offer an analytic relationship between film resistance and current in 1D for VR:

(13) $$ j(t)=\frac{\sigma V\left(t,x=L\right)}{\sigma {R}_{\mathrm{film}}(t)+L}. $$

As $ V\left(t,x=L\right) $ is set as the boundary condition, we can solve only the resistance dynamics in Eq. (5) and compute $ j(t) $ from $ V $ and $ {R}_{\mathrm{film}} $ . For CC, the current is set by the experimental conditions as $ {j}_0 $ , and the voltage at the anode can be computed by

(14) $$ \frac{\partial V}{\partial t}=\frac{\partial {R}_{\mathrm{film}}}{\partial t}{j}_0\ \mathrm{for}\ V<{V}_{\mathrm{max}}, $$

where $ {V}_{\mathrm{max}} $ is the maximum experimental voltage. If $ V\ge {V}_{\mathrm{max}} $ in CC, the relationship between voltage, current, and resistance is given by Eq. (13).

To simulate the baseline dynamics, any black-box ODE solver can be used to solve Eq. (5) forward in time, computing the time evolution of the film resistance. The current, voltage, and charge are given as boundary conditions, computed analytically from Eq. (13), or solved along with the resistance dynamics using Eq. (14).
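As an illustration of this solution procedure for a VR experiment, the sketch below integrates the resistance dynamics of Eq. (5) with `scipy.integrate.solve_ivp`, recovering $ j(t) $ from Eq. (13) at each step. The deposition-onset thresholds are omitted for brevity, and all names and parameter values are illustrative assumptions rather than the authors' code.

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_vr(V_R, Cv, sigma, L, rho, t_end):
    """Integrate the film-resistance dynamics (Eq. (5)) for a voltage ramp.

    j(t) is recovered analytically from Eq. (13); onset thresholds
    (j_min, Q_min) are neglected in this sketch.
    """
    def dR_dt(t, y):
        V = V_R * t                          # linearly increasing voltage
        j = sigma * V / (sigma * y[0] + L)   # Eq. (13)
        return [rho(j) * Cv * j]             # Eq. (5) with dh/dt = Cv * j (Eq. (4))

    sol = solve_ivp(dR_dt, (0.0, t_end), [0.0], dense_output=True, rtol=1e-8)
    t = np.linspace(0.0, t_end, 201)
    R = sol.sol(t)[0]
    j = sigma * V_R * t / (sigma * R + L)    # recover current from V and R_film
    return t, R, j
```

Since only the scalar resistance is integrated, the forward model remains cheap to evaluate, which matters for the repeated simulations required by inference in Section 3.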

2.3. Experimental setup and data

Experimental data are acquired from a laboratory setup that approximates the computational setting. In the experiment, a 16.0 cm$ {}^2 $ square anode and cathode are placed at the ends of a long e-coat bath and connected to a power source. A voltage is applied across the anode and the cathode according to either the VR or CC setup. During the course of each experiment, the voltage, current, and film resistance are all measured at a frequency of 10 Hz. For some experiments, thickness measurements are obtained by setting the voltage to zero at some time and measuring the thickness of the film at that time. Note that measuring the thickness terminates the experiment. A depiction of the experimental setup is given in Figure 2.

Figure 2. Experimental setup.

Experiments are performed for both VR and CC at different experimental conditions, for a total of six configurations (see Table 2 for an overview). Multiple trials are performed for each configuration, where each trial is run to a different final time before obtaining its thickness measurement. The experimental data are collected in $ \mathcal{D}={\left\{{\left\{j\right\}}_i,{\left\{R\right\}}_i\right\}}_{i=1}^6 $ , where $ {\left\{j\right\}}_i,{\left\{R\right\}}_i $ respectively represent the sets of current and resistance measurements for each of the six configurations. We do not use voltage data, as they can be computed analytically from resistance and current. The data used during inference and learning are truncated for each experiment type $ i $ to a time $ {t}_i $ such that data have been gathered from at least 3 trials at all times $ t\le {t}_i $ . This provides more accurate estimates of the variance during likelihood computation. An example of the current measurements $ {\left\{j\right\}}_1 $ from the 13 trials under configuration VR with $ {V}_R=1\hskip0.35em \mathrm{V} $ is shown in Figure 3.

Table 2. Descriptions of the experimental data for each of the six experimental conditions

Figure 3. Visualization of the $ {\left\{j\right\}}_1 $ experimental data (all 13 trials) for configuration VR = 1.0. Each trial ends at a different time, and data are sampled at a rate of 10 Hz.

3. Parameter inference using experimental data

Under the standard Bayesian inference framework, the initial knowledge of the parameters of interest $ \Theta :\Omega \to {\mathrm{\mathbb{R}}}^m $ is described by the prior $ p\left(\theta \right) $ . The observed data $ Y:\Omega \to {\mathrm{\mathbb{R}}}^n $ , with realizations $ y $ , are assumed to be related to the parameters through a likelihood $ p\left(y|\theta \right) $ . The posterior $ p\left(\theta |y\right) $ gives the updated knowledge about the parameters $ \theta $ after observing data $ y $ , and is given by Bayes' rule:

(15) $$ p\left(\theta |y\right)=\frac{p\left(\theta \right)p\left(y|\theta \right)}{p(y)}, $$

where $ p(y) $ is known as the evidence and is typically intractable to compute. However, the evidence is constant for a given dataset, as it does not depend on the parameters $ \theta $ . In our applications, we are concerned with optimization techniques that require only the log-likelihood and log-prior. Thus, the evidence is neglected as a constant and not computed.

In the baseline e-coat model, our parameters of interest are $ \theta =\left\{{j}_{\mathrm{min}},{C}_v,{Q}_{\mathrm{min}}\right\} $ . Our goal is to infer these parameters given the experimental data described in Section 2.3. Truncated normal distributions are used for the priors to ensure positive support while still enabling specification of the mode and spread of each parameter. The data random variable $ Y $ consists of both the current $ j $ and the film resistance $ {R}_{\mathrm{film}} $ . We assume a measurement model of

(16) $$ y\left(\theta, t;\eta \right)=G\left(\theta, t;\eta \right)+\unicode{x025B} \left(t,\eta \right) $$

where $ G\left(\theta, t;\eta \right)=\left\{{R}_{\mathrm{film}}\left(\theta, t,\eta \right),j\left(\theta, t,\eta \right)\right\} $ is computed by simulating the baseline model and is a deterministic function of $ \theta $ , $ t $ , and the experimental configuration parameters $ \eta $ . The measurement noise $ \unicode{x025B} \left(t,\eta \right)\sim \mathcal{N}\left(0,\Sigma \left(t,\eta \right)\right) $ is a function of time and the experimental configuration parameters, and we estimate the covariance matrix $ \Sigma \left(t,\eta \right) $ from experimental data.

Considering our experimental data as $ \mathcal{D} $ , the posterior we seek to approximate is given by

(17) $$ p\left(\theta |\mathcal{D}\right)=\frac{p\left(\mathcal{D}|\theta \right)p\left(\theta \right)}{p\left(\mathcal{D}\right)}. $$

The integration involved in directly computing $ p\left(\mathcal{D}\right)={\int}_{\theta }p\left(\mathcal{D},\theta \right) d\theta $ (the “brute force” approach) is expensive to evaluate, particularly in high dimensions. Therefore, numerical techniques have been widely used to approximate the posterior. In addition, computing the posterior directly does not on its own necessarily allow samples to be easily drawn from the posterior. We discuss various inference techniques investigated to perform more efficient inference on the parameters of interest and to easily sample from the inferred distributions. The brute-force method of directly computing the posterior is referred to as the gridding approach hereafter, as it computes the posterior on a discretized grid in the parameter space. The other methods investigated are all forms of variational inference, in which the inference problem is transformed into an optimization problem (Duraisamy, 2021; Tran et al., 2021; Guo et al., 2017; Rezende and Mohamed, 2015).

3.1. Likelihood

Each of the inference methods described requires computation of the (log-)likelihood. Given the assumed measurement model in Eq. (16), the likelihood $ p\left(\mathcal{D}|\theta \right) $ is a Gaussian distribution with mean $ G\left(\theta, t;\eta \right) $ and covariance matrix $ \Sigma \left(t,\eta \right) $ . We assume that all experiments, trials, and samples within each trial are independent. While the latter assumption may not be strictly true, it significantly decreases the cost of computing the log-likelihood because the covariance matrix becomes diagonal. The log-likelihood is therefore given by

(18) $$ \log\ p\left(\mathcal{D}|\theta \right)=-\frac{1}{2}\sum \limits_{i=1}^6\sum \limits_{l=1}^{n_i}\sum \limits_{r=1}^{10{t}_{l,i}}{\left(G\left(\theta, \mathrm{0.1}r;{\eta}_i\right)-{\mathcal{D}}_{i,l,r}\right)}^T{\Sigma}^{-1}\left(0.1r,{\eta}_i\right)\left(G\left(\theta, \mathrm{0.1}r;{\eta}_i\right)-{\mathcal{D}}_{i,l,r}\right)+Z, $$

where the time is given by $ 0.1r $ due to the constant sampling rate of 10 Hz, and $ {\mathcal{D}}_{i,l,r} $ denotes the 2D vector $ {\left[{\left\{j\right\}}_i^{(l)}(0.1r),{\left\{R\right\}}_i^{(l)}(0.1r)\right]}^T $ , the resistance and current for experiment $ i $ in trial $ l $ at time $ 0.1r $ . In addition, the matrix $ \Sigma \left(0.1r,{\eta}_i\right) $ is diagonal and is computed for experiment $ i $ at time $ 0.1r $ by computing the variance of current and resistance data over all trials which contain data at time $ 0.1r $ . The time $ {t}_{l,i} $ denotes the final experiment time for trial $ l $ of experimental configuration $ i $ . Finally, $ Z $ is the negative log of the likelihood normalization constant which does not depend on $ \theta $ .
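A sketch of how this log-likelihood might be evaluated for a single experimental configuration is given below. The helper name and data layout are hypothetical; the diagonal covariance is estimated, as described above, from the empirical variance across trials at each sample time, so the quadratic form reduces to dividing squared residuals by per-time variances, and the constant $ Z $ is omitted.

```python
import numpy as np

def log_likelihood(G, trials):
    """Gaussian log-likelihood sketch for one experimental configuration.

    G      : (T, 2) model output [j, R_film] at the 10 Hz sample times.
    trials : list of (T_l, 2) measured [j, R_film] arrays, one per trial;
             each trial may end at a different time T_l <= T.
    """
    T = G.shape[0]
    # Diagonal covariance: empirical variance over trials containing time r.
    var = np.empty((T, 2))
    for r in range(T):
        rows = np.stack([d[r] for d in trials if d.shape[0] > r])
        var[r] = rows.var(axis=0) + 1e-12      # guard against zero variance
    logp = 0.0
    for d in trials:
        res = G[: d.shape[0]] - d              # residuals for this trial
        logp += -0.5 * np.sum(res**2 / var[: d.shape[0]])
    return logp                                # additive constant Z omitted
```

Summing this quantity over the six configurations reproduces the structure of the triple sum in Eq. (18).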

3.2. Gridding approach

Computing the Bayesian posterior is typically prohibitively expensive in practice. For our application, simulating the baseline model is the dominant bottleneck in terms of time complexity. Evaluating the forward model as little as possible will provide the greatest benefit for the efficiency of the inference process. More efficient methods of approximating the posterior are thus investigated to perform parameter inference. Each of these results is compared to the Bayesian posterior using a gridding approach, which is considered the “true” posterior.

We employ a simple gridding approach to compute the Bayesian posterior in which the parameter space is discretized into a d-dimensional grid of uniform spacing. This grid is then sequentially refined with higher resolution near the maximum a posteriori (MAP) and any modes of the Bayesian posterior.

It is often computationally more stable to compute the log-distributions first, rather than the distribution directly. Considering a single grid point $ {\theta}_i $ , we first compute the quantity

(19) $$ \log p\left({\theta}_i|y\right)+\log p(y)=\log p\left({\theta}_i\right)+\log p\left(y|{\theta}_i\right) $$

at all grid points in the parameter space. Note that we ignore the evidence term $ p(y) $ in the gridding approach until Eq. (19) is computed at each point. It is assumed that the grid bounds in parameter space are sufficiently large to capture the most significant aspects of the distribution.

The final distribution is computed at each point in the grid by normalizing the quantity computed in Eq. (19) using

$$ p\left({\theta}_i|y\right)=\exp \left(\log p\left({\theta}_i\right)+\log p\left(y|{\theta}_i\right)\right)/Z, $$

where the normalization constant is given by $ Z={\int}_{\Theta}\exp \left(\log p\left(\theta \right)+\log p\left(y|\theta \right)\right) d\theta $ and approximated using some numerical integration scheme based on the selected grid.
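The steps above can be sketched as follows for a one-dimensional, uniformly spaced grid (the paper's parameter space is three-dimensional, and the function name and the simple Riemann-sum quadrature are illustrative assumptions):

```python
import numpy as np

def grid_posterior(log_prior, log_like, grid):
    """Evaluate the normalized posterior on a uniformly spaced 1D grid.

    Works in log space first for stability (Eq. (19)), exponentiates after
    subtracting the maximum, then normalizes with a Riemann-sum estimate
    of the evidence Z.
    """
    log_post = np.array([log_prior(th) + log_like(th) for th in grid])
    log_post -= log_post.max()          # shift so exp() cannot overflow
    unnorm = np.exp(log_post)
    dx = grid[1] - grid[0]              # uniform spacing assumed
    Z = unnorm.sum() * dx               # numerical evidence estimate
    return unnorm / Z
```

With a flat prior and a standard-normal log-likelihood, the returned density integrates to one and peaks at the likelihood mode, as expected.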

We note that the gridding approach can provide a reasonable approximation to the posterior, but it does not provide a straightforward method of sampling from this posterior.

3.3. Variational inference

Variational inference methods (Rezende and Mohamed, Reference Rezende, Mohamed, Bach and Blei2015; Beck, Reference Beck1976; Guo et al., Reference Guo, Wang, Fan, Broderick and Dunson2017; Hoffman et al., Reference Hoffman, Blei, Wang and Paisley2013; Blei et al., Reference Blei, Kucukelbir and McAuliffe2017) are a powerful and versatile class of techniques used in probabilistic and Bayesian modeling. They have the advantage of being very efficient, particularly for high-dimensional problems, compared to Markov chain Monte Carlo-based methods (Metropolis et al., Reference Metropolis, Rosenbluth, Rosenbluth, Teller and Teller1953; Robert and Casella, Reference Robert and Casella2005; Robert and Casella, Reference Robert and Casella2011). By formulating posterior inference as an optimization problem, variational inference methods aim to approximate the Bayesian posterior with a tractable parametric distribution. Only the parameters of this distribution are learned, rather than the typically intractable Bayesian posterior itself. The distribution parameters are optimized by minimizing the Kullback–Leibler (KL) divergence (Cover and Thomas, Reference Cover and Thomas2006) between the variational approximation and the true posterior, which is equivalent to maximizing the evidence lower bound (ELBO).

Variational inference assumes a parametric distribution $ {q}_{\phi}\left(\theta |y\right) $ which is used to approximate the true posterior distribution $ p\left(\theta |y\right) $ . The goal of variational inference is to minimize the discrepancy between the true and approximate distributions by optimizing the distribution parameters. This discrepancy is measured by the KL divergence, given by

(20) $$ \mathrm{KL}\left[{q}_{\phi}\left(\theta |y\right)\parallel p\Big(\theta |y\Big)\right]={\unicode{x1D53C}}_{\theta \sim {q}_{\phi}\left(\theta |y\right)}\left[\log \left(\frac{q_{\phi}\left(\theta |y\right)}{p\left(\theta |y\right)}\right)\right]. $$

Variational inference seeks to minimize this by solving the optimization problem

(21) $$ {\phi}^{\ast }=\underset{\phi }{\arg \hskip0.1em \min}\hskip0.22em \mathrm{KL}\left[{q}_{\phi}\left(\theta |y\right)\parallel p\Big(\theta |y\Big)\right]. $$

Using Bayes’ rule and defining the evidence lower bound (ELBO) $ \mathcal{L}\left(\theta, \phi \right) $ as

(22) $$ \mathcal{L}\left(\theta, \phi \right)={\unicode{x1D53C}}_Q\left[\log \left(\frac{p\left(\theta, y\right)}{q_{\phi}\left(\theta |y\right)}\right)\right]={\unicode{x1D53C}}_Q\left[\log p\left(\theta, y\right)\right]-{\unicode{x1D53C}}_Q\left[\log {q}_{\phi}\left(\theta |y\right)\right], $$

the KL divergence can be written as a function of the ELBO by

(23) $$ \mathrm{KL}\left[{q}_{\phi}\left(\theta |y\right)\parallel p\Big(\theta |y\Big)\right]=-\mathcal{L}\left(\theta, \phi \right)+\log \left(p(y)\right). $$

The optimization problem of Eq. (21) is therefore equivalent to minimizing the negative ELBO:

$$ {\phi}^{\ast }=\underset{\phi }{\arg \hskip0.1em \min }-\mathcal{L}\left(\theta, \phi \right). $$
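The identity in Eq. (23) relating the KL divergence, the ELBO, and the evidence can be verified numerically on a small discrete example, where all expectations under $ q $ reduce to finite sums. The joint probabilities below are hypothetical values chosen only for illustration.

```python
import math

# Discrete toy check of Eq. (23): KL[q || p(.|y)] = -ELBO + log p(y).
# Hypothetical joint probabilities p(theta_i, y) at a fixed observed y.
p_joint = [0.10, 0.25, 0.15]
p_y = sum(p_joint)                      # evidence p(y)
p_post = [p / p_y for p in p_joint]     # posterior p(theta_i | y)
q = [0.3, 0.4, 0.3]                     # an arbitrary variational distribution

# Expectations under q reduce to finite sums in the discrete case
kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p_post))
elbo = sum(qi * math.log(pj / qi) for qi, pj in zip(q, p_joint))
```

Because $ \log p(y) $ does not depend on $ \phi $ , minimizing the KL divergence and minimizing the negative ELBO have the same minimizer, which is why the (intractable) evidence never needs to be computed during optimization.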

3.3.1. Gaussian variational inference

Fixed-form variational inference uses simple parametric distributions as the variational posterior. In our experiments, we choose an independent Gaussian distribution $ {q}_{\phi}\left(\theta |y\right)=\mathcal{N}\left({\mu}_{\phi },{\Sigma}_{\phi}\right) $ for the fixed form, such that the parameters $ \phi =\left[{\mu}_{\phi },{\Sigma}_{\phi}\right] $ to be optimized consist of the mean $ {\mu}_{\phi}\in {\mathrm{\mathbb{R}}}^m $ and covariance matrix $ {\Sigma}_{\phi}\in {\mathrm{\mathbb{R}}}^{m\times m} $ . Note that the assumption of independence restricts the covariance matrix to $ m $ trainable (diagonal) parameters, giving $ 2m $ trainable parameters in total.

Gradient-free optimization techniques such as the Nelder–Mead algorithm (Nelder and Mead, Reference Nelder and Mead1965) scale poorly to high-dimensional problems and are often far less efficient than gradient-based optimization algorithms. Variational inference therefore requires gradients to be effective, and gradients can be computed for the baseline model following the discussion in Section 3.4. We note that gradient-based algorithms can become trapped in local minima due to the often non-convex nature of general objective functions. However, we believe that the efficiency gains of gradient-based algorithms far outweigh the risk of converging to suboptimal local minima.
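As a minimal sketch of fixed-form Gaussian variational inference, consider a conjugate model $ {y}_i\sim \mathcal{N}\left(\theta, {\eta}^2\right) $ with prior $ \theta \sim \mathcal{N}\left(0,{\tau}^2\right) $ , for which the ELBO is available in closed form and plain gradient ascent suffices, with no ODE solver or automatic differentiation needed. In this conjugate case, the optimal Gaussian variational posterior coincides with the exact posterior, which provides a useful sanity check; all numerical values are hypothetical.

```python
import math

def fit_gaussian_vi(data, eta, tau, lr=0.05, steps=2000):
    """Fixed-form Gaussian VI for y_i ~ N(theta, eta^2), theta ~ N(0, tau^2).
    For q = N(mu, sigma^2) the ELBO is available in closed form,
      ELBO = -sum_i[(y_i - mu)^2 + sigma^2]/(2 eta^2)
             - (mu^2 + sigma^2)/(2 tau^2) + log(sigma) + const,
    so we ascend its exact gradient in (mu, log sigma)."""
    n = len(data)
    mu, log_sig = 0.0, 0.0
    for _ in range(steps):
        sig2 = math.exp(2.0 * log_sig)
        g_mu = sum(y - mu for y in data) / eta**2 - mu / tau**2
        # chain rule: d(sigma^2)/d(log sigma) = 2 sigma^2
        g_ls = 1.0 - sig2 * (n / eta**2 + 1.0 / tau**2)
        mu += lr * g_mu
        log_sig += lr * g_ls
    return mu, math.exp(log_sig)

ETA, TAU = 0.5, 2.0               # hypothetical noise and prior scales
data = [1.1, 0.9, 1.3, 0.7]
mu, sig = fit_gaussian_vi(data, ETA, TAU)

# Conjugate model: the exact posterior is Gaussian, and Gaussian VI recovers it
prec = len(data) / ETA**2 + 1.0 / TAU**2
m_exact = (sum(data) / ETA**2) / prec
s_exact = prec ** -0.5
```

For the non-conjugate models considered in this work, the expectation in the ELBO is not available analytically, which is why Monte Carlo gradient estimates through the differentiable ODE solve (Section 3.4) are required instead.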

3.4. Gradients through ODE solve

Our subsequent analysis will employ inference and ML methods that require gradient information. Therefore, ensuring that the ODE solve is differentiable and that gradients can be computed efficiently is critical. To obtain gradients of the ODE forward solve with respect to model parameters, we employ an adjoint-based method known as neural ODE (Böttcher and Asikis, Reference Böttcher and Asikis2022; Chen et al., Reference Chen, Rubanova, Bettencourt, Duvenaud, Bengio, Wallach, Larochelle, Grauman, Cesa-Bianchi and Garnett2018; Linot et al., Reference Linot, Burby, Tang, Balaprakash, Graham and Maulik2023), which computes gradients through the forward solve of any black-box ODE solver efficiently. It is also available as part of existing ML libraries such as PyTorch (Paszke et al., Reference Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, Lin, Gimelshein, Antiga, Desmaison, Kopf, Yang, DeVito, Raison, Tejani, Chilamkurthy, Steiner, Fang, Bai and Chintala2019), which allows its integration with other ML tools.

Propagating gradients through the deposition onset criteria requires extra care. Before film deposition begins, the resistance and film thickness do not increase. Therefore, the following dynamical system is solved:

(24) $$ \frac{\mathrm{d}h}{\mathrm{d}t}=0,\hskip1em \frac{\mathrm{d}{R}_{\mathrm{film}}}{\mathrm{d}t}=0, $$

with $ V $ , $ j $ , and $ Q $ computed analytically according to the experiment type. When both conditions $ j>{j}_{\mathrm{min}} $ and $ Q>{Q}_{\mathrm{min}} $ are satisfied, deposition begins and the model dynamics instantaneously switch to

(25) $$ \frac{\mathrm{d}h}{\mathrm{d}t}={C}_vj,\hskip1em \frac{\mathrm{d}{R}_{\mathrm{film}}}{\mathrm{d}t}=\rho (j)\frac{\mathrm{d}h}{\mathrm{d}t}. $$

As predicting the time of deposition onset is critical for predicting film growth, gradients of the output with respect to the onset condition parameters $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ must be computed. However, the instantaneous change in the model constitutes an in-place operation (Paszke et al., Reference Paszke, Gross, Massa, Lerer, Bradbury, Chanan, Killeen, Lin, Gimelshein, Antiga, Desmaison, Kopf, Yang, DeVito, Raison, Tejani, Chilamkurthy, Steiner, Fang, Bai and Chintala2019) and must be treated separately. We use ideas from an extension to neural ODE, which adds the ability to compute gradients through instantaneous event handling (Chen et al., Reference Chen, Amos and Nickel2021). Introducing an additional switch state $ \xi $ to our model, we computationally solve the following equations instead of Eqs. (24) and (25):

(26) $$ \frac{\mathrm{d}h}{\mathrm{d}t}=\xi {C}_vj,\hskip1em \frac{\mathrm{d}{R}_{\mathrm{film}}}{\mathrm{d}t}=\xi \rho (j)\frac{\mathrm{d}h}{\mathrm{d}t}. $$

The initial switch state $ \xi \left(t=0\right)=0 $ is switched to $ \xi \left(t={t}_e\right)=1 $ at the event time $ {t}_e $ , defined as the first time at which both deposition onset criteria are met. This implementation detail allows computing the gradient of the forward solve with respect to the event time, and in turn with respect to the onset criteria parameters $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ , using the framework from Chen et al. (2021).
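The switched dynamics of Eq. (26) can be illustrated with a simple forward-Euler loop for a VR experiment: the switch state $ \xi $ stays at zero until both onset criteria are met, then flips to one and the film dynamics turn on. This sketch uses hypothetical constants, a constant $ \rho (j) $ , and a fixed step size, and omits the differentiable event handling of Chen et al. (2021) on which the actual implementation relies.

```python
import math

# Hypothetical constants for illustration only (not calibrated values)
SIGMA, L_BATH, RHO, CV = 0.14, 1.0, 50.0, 1e-3
V_RAMP, J_MIN, Q_MIN = 0.5, 0.2, 0.05

def simulate_vr(T=10.0, dt=1e-3):
    """Forward-Euler integration of the switched system in Eq. (26): the
    switch state xi stays 0 until both onset criteria hold, then flips to 1
    and the film dynamics turn on (here rho(j) is taken constant)."""
    h, R_film, Q, xi = 0.0, 0.0, 0.0, 0.0
    t, t_event = 0.0, None
    while t < T:
        V = V_RAMP * t                           # voltage ramp experiment
        j = SIGMA * V / (SIGMA * R_film + L_BATH)
        if xi == 0.0 and j > J_MIN and Q > Q_MIN:
            xi, t_event = 1.0, t                 # deposition onset event
        h += dt * xi * CV * j                    # dh/dt = xi * Cv * j
        R_film += dt * xi * RHO * CV * j         # dR/dt = xi * rho(j) * dh/dt
        Q += dt * j
        t += dt
    return h, R_film, Q, t_event

h, R_film, Q, t_event = simulate_vr()
```

In this configuration the charge criterion is met first, so the current criterion controls the onset time; gradients with respect to $ {j}_{\mathrm{min}} $ would therefore be nonzero while gradients with respect to $ {Q}_{\mathrm{min}} $ vanish, foreshadowing the identifiability discussion in Section 3.5.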

3.5. Parameter identifiability of baseline model

The process of parameter inference on the baseline model provides insight into the shortcomings of the model and motivates updates to the baseline model that address them.

One such shortcoming is the unidentifiability of the deposition onset criteria parameters, $ {Q}_{\mathrm{min}} $ and $ {j}_{\mathrm{min}} $ . This double criterion for the onset of deposition creates situations in which one parameter or the other may not be identifiable. For example, consider the case in which the time $ {t}_j $ at which $ j>{j}_{\mathrm{min}} $ is smaller than the time $ {t}_Q $ at which $ Q>{Q}_{\mathrm{min}} $ , such that $ {t}_j<{t}_Q $ . Changing $ {j}_{\mathrm{min}} $ over some range will thus not have any effect on the baseline model output because both conditions $ j>{j}_{\mathrm{min}} $ and $ Q>{Q}_{\mathrm{min}} $ must be satisfied for deposition to begin. In this case, only $ {Q}_{\mathrm{min}} $ controls the deposition onset time and data may be uninformative about $ {j}_{\mathrm{min}} $ . The opposite can also occur: there exist cases in which data may be uninformative about $ {Q}_{\mathrm{min}} $ .

To illustrate this identifiability issue more clearly, we generate artificial data from the baseline forward model by taking the “true” parameters to be $ \left[-\log {C}_v,{Q}_{\mathrm{min}},{j}_{\mathrm{min}}\right]=\left[\mathrm{7.0,150.0,1.0}\right] $ . Other experimental parameters are given in Table 1. A total of 10 samples are obtained by simulating a VR experiment with a small amount of noise added. Figure 4 illustrates the negative log-likelihood of these data for each parameter to be inferred, with the other two parameters fixed at their “true” values. For the first two parameters, a minimum exists at the “true” values, indicating that these parameters can be accurately inferred from the simulated data. However, the negative log-likelihood of the minimum current threshold parameter $ {j}_{\mathrm{min}} $ is flat over some region, indicating that the model output is not affected by changes in $ {j}_{\mathrm{min}} $ over this region. If the “true” parameter value lies anywhere in a region in which the derivative of the log-likelihood is zero, that parameter is unidentifiable on that region. We use the term “unidentifiable” to mean that the first derivative of the likelihood is zero over some region in parameter space.

Figure 4. Negative log-likelihoods computed from simulated data on a VR experiment using the baseline model with experimental conditions $ {V}_R=0.125 $ , $ \sigma =0.14 $ , $ -\log {C}_v=8.5 $ , $ {Q}_{\mathrm{min}}=100.0 $ , and $ {j}_{\mathrm{min}}=1.5 $ .

For VR experiments, there exists a set of boundaries in parameter space for which either $ {j}_{\mathrm{min}} $ or $ {Q}_{\mathrm{min}} $ will be unidentifiable, or both will be identifiable, in the baseline model. These boundaries change with the conditions of the VR experiment; gathering data from additional experiments could provide information on a parameter that is not informed by data from a different experiment.

Consider a case in which data exist such that $ {t}_Q<{t}_j $ . This means that $ {t}_j $ will control the deposition onset time. As $ j(t)>0 $ for all $ t>0 $ , we have $ Q\left(t>{t}_j\right)>Q\left({t}_j\right)>{Q}_{\mathrm{min}} $ . Thus changing $ {Q}_{\mathrm{min}} $ on the range $ 0\le {Q}_{\mathrm{min}}<Q\left({t}_j\right) $ will have no effect on the output of the baseline model, and the data will be uninformative about $ {Q}_{\mathrm{min}} $ on this range. Next consider a case in which data exist such that $ {t}_j<{t}_Q $ , so that $ {t}_Q $ controls the deposition onset time. However, $ j\left(t>{t}_Q\right)>j\left({t}_Q\right) $ is not guaranteed, which can be seen by taking the time derivative of Eq. (13). If the voltage ramp rate satisfies $ {V}_R<\rho (j){C}_vj $ , then $ \mathrm{d}j/\mathrm{d}t<0 $ . If $ j\left(t>{t}_Q\right)<j\left({t}_Q\right) $ , then it is possible for $ j\left(t>{t}_Q\right)<{j}_{\mathrm{min}} $ and deposition stops, followed by an increase in current, restarting deposition, and the cycle repeats. Ultimately, this indicates that the parameter $ {j}_{\mathrm{min}} $ will have an influence on the baseline model output, and data will be informative about $ {j}_{\mathrm{min}} $ . Thus, if $ {t}_j<{t}_Q $ , data may or may not be informative about the parameter $ {j}_{\mathrm{min}} $ , depending on the experimental conditions.

Two identifiability regions corresponding to different experiments are computed empirically and illustrated in Figure 5. These regions are computed by simulating the baseline model according to the experimental conditions for a set of discretized points in the $ \left({j}_{\mathrm{min}},{Q}_{\mathrm{min}}\right) $ space. For each simulation, identifiability is determined by comparing $ {t}_Q $ and $ {t}_j $ . Purple regions indicate that data from the experiment are not informative about $ {j}_{\mathrm{min}} $ if the true values of $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ lie in the region; in other words, $ \partial \mathcal{L}/\partial {j}_{\mathrm{min}}=0 $ on the purple region. Cyan regions indicate that data from the experiment are not informative about $ {Q}_{\mathrm{min}} $ if the true values of $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ lie in the region, or $ \partial \mathcal{L}/\partial {Q}_{\mathrm{min}}=0 $ . Yellow regions indicate that the experiment can inform both $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ . The boundary of the cyan region can be computed analytically, but the other is nontrivial and computed empirically. The times at which the deposition onset criteria are met in the VR experiment for the baseline model are given by

(27) $$ {t}_j=\frac{{j}_{\mathrm{min}}}{\sigma {V}_R}\left(\sigma {R}_0+L\right),\hskip1em {t}_Q={\left[\frac{2{Q}_{\mathrm{min}}}{\sigma {V}_R}\left(\sigma {R}_0+L\right)\right]}^{1/2}. $$

Figure 5. Identifiability regions of the baseline model for two different experimental conditions. The log-likelihood on the simulated experimental data will be constant in the purple and cyan regions, indicating that little information is gained about $ {j}_{\mathrm{min}} $ if the true value lies in the purple region or $ {Q}_{\mathrm{min}} $ if the true value lies in the cyan region. Note: the “stepping” behavior observed in the identifiability boundaries here is an artifact of discretizing the $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ domains; the boundaries are in fact smooth.

Setting the two expressions in Eq. (27) equal, we obtain a closed-form solution for the $ {Q}_{\mathrm{min}} $ identifiability boundary as

(28) $$ {j}_{\mathrm{min}}={\left[\frac{2{Q}_{\mathrm{min}}\sigma {V}_R}{\sigma {R}_0+L}\right]}^{1/2}, $$

which has been validated against the empirically computed boundaries shown in Figure 5. We note that the log-likelihoods in Figures 4b and 7c correspond to the identifiability plane in Figure 5a. The log-likelihoods of $ {Q}_{\mathrm{min}} $ and $ {j}_{\mathrm{min}} $ have zero first derivative over some region, as predicted by the identifiability boundaries. The true values of $ {j}_{\mathrm{min}} $ and $ {Q}_{\mathrm{min}} $ are such that $ {j}_{\mathrm{min}} $ is not informed by the simulated experimental data.
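The identifiability boundary can also be checked numerically from the pre-deposition VR dynamics, where $ j(t)=\sigma {V}_Rt/\left(\sigma {R}_0+L\right) $ and hence $ Q(t)=\sigma {V}_R{t}^2/\left(2\left(\sigma {R}_0+L\right)\right) $ . The sketch below solves each onset criterion for its crossing time and confirms that the two times coincide on the closed-form boundary of Eq. (28); the values of $ \sigma $ , $ {V}_R $ , $ {R}_0 $ , and $ L $ are hypothetical.

```python
import math

# Hypothetical experimental constants, for illustration only
SIGMA, V_RAMP, R0, L_BATH = 0.14, 0.125, 1.0, 1.0

def onset_times(j_min, q_min):
    """Pre-deposition VR dynamics: j(t) = sigma*VR*t/(sigma*R0 + L), so
    Q(t) = sigma*VR*t^2/(2*(sigma*R0 + L)); solve each criterion for t."""
    denom = SIGMA * R0 + L_BATH
    t_j = j_min * denom / (SIGMA * V_RAMP)
    t_q = math.sqrt(2.0 * q_min * denom / (SIGMA * V_RAMP))
    return t_j, t_q

def boundary_jmin(q_min):
    # Closed-form boundary of Eq. (28), where t_j = t_Q
    return math.sqrt(2.0 * q_min * SIGMA * V_RAMP / (SIGMA * R0 + L_BATH))

q_min = 100.0
jb = boundary_jmin(q_min)
t_j, t_q = onset_times(jb, q_min)
```

Sweeping $ {j}_{\mathrm{min}} $ below or above this boundary at fixed $ {Q}_{\mathrm{min}} $ flips which criterion is met last, reproducing the region structure of Figure 5 empirically.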

If the experimental data does not inform one of the parameters, then that parameter does not influence the output of our baseline model, and it cannot be accurately inferred. However, as shown by Figure 5, the identifiability boundaries change with the type of experiment. This behavior could cause very poor prediction results. Suppose none of the experimental data is informative about $ {j}_{\mathrm{min}} $ , and inference is performed on the model parameters. Using the model in prediction could result in poor performance, especially if predictions are made for an experimental condition in which $ {j}_{\mathrm{min}} $ does influence the output of the model.

It is also necessary to infer robust posteriors in this case. Suppose Gaussian variational inference is selected as the inference method. A Gaussian variational posterior will be computed for each of the parameters to be inferred, providing a unimodal distribution for each. This unimodal approximation can misrepresent the Bayesian posterior distribution of the parameter and result in poor uncertainty quantification during prediction.

We explore two approaches toward the aim of improving model predictions by improving the form of the baseline model. First, we update the model based on insight from the shortcomings observed during inference, as described in Section 4.1. We then pursue a different approach in which model augmentations are learned using machine-learning tools to augment the baseline model with previously unmodeled dynamics, as discussed in Section 4.2.

3.6. Inference on baseline model

In this section, we demonstrate inference results using Gaussian variational inference on the baseline model. The purpose of this experiment is to illustrate the shortcomings of the model form itself, not the selected method of inference. Inference is performed using simulated data from the baseline model, and we demonstrate that the posterior predictive results in good prediction performance for some experimental conditions, but poor performance on others.

Data are generated by simulating the baseline model 10 times up to a final time $ T=250s $ for a VR experiment with $ {V}_R=1.0 $ , $ \log {C}_v=-8 $ , $ {j}_{\mathrm{min}}=1.5 $ , $ \sigma =0.14 $ , and $ {Q}_{\mathrm{min}}=100 $ , and add Gaussian random noise $ \unicode{x025B} \sim \mathcal{N}\left(0,{\eta}^2\right) $ , where $ \eta =0.01 $ , at each time step to simulate measurement noise. Only data from current measurements are considered in the inference process. We then perform Gaussian VI for parameters $ \log {C}_v $ , $ {j}_{\mathrm{min}} $ , and $ {Q}_{\mathrm{min}} $ on the baseline model.

After learning the variational posterior $ {q}_{\phi } $ , samples are drawn and used to simulate the baseline model to obtain samples from the posterior predictive. Ten of these samples along with the mean are shown in Figure 6a. For the experimental configuration on which the data are generated, accurate and low variance prediction is observed. However, we then use the variational posterior distribution to predict on a VR experiment in which all parameters are the same except the voltage ramp $ {V}_R=0.125 $ . Figure 6b shows that prediction performance is very poor for this experiment.

Figure 6. Posterior predictive results after performing Gaussian VI on data from a simulated VR experiment. The posterior predictive results in accurate simulations on the data (a), but poor predictions for other experiments (b). This is caused by unidentifiable $ {j}_{\mathrm{min}} $ in the data.

The reason for this stems from the identifiability issues previously discussed in Section 3.5. In our experiment, we assume that the ‘true’ value of $ {j}_{\mathrm{min}} $ is 1.5. However, in the experiment from which we gather data, $ {j}_{\mathrm{min}} $ is unidentifiable in the baseline model. Thus, the inference results in a variational posterior distribution for $ {j}_{min} $ of $ q\left({j}_{\mathrm{min}}|\mathcal{D}\right)=\mathcal{N}\left(\mathrm{2.66,0.01}\right) $ , which is a low-variance but poor estimate of the true parameter value. This posterior distribution is then used for prediction in an experiment in which $ {j}_{\mathrm{min}} $ does have an effect on model output, and because the posterior is not accurate, prediction is also inaccurate.

On top of identifiability issues potentially resulting in poor prediction performance, the parameter $ {Q}_{\mathrm{min}} $ is inconsistent across different experiment types. To illustrate this issue, Gaussian VI is performed on the parameters $ {C}_v $ , $ {j}_{\mathrm{min}} $ , and $ {Q}_{\mathrm{min}} $ twice—first using real experimental data from only the VR experiments and again using only data from the CC experiments. The MAPs of the two variational posteriors are very different—in particular for $ {Q}_{\mathrm{min}} $ . Using only VR experimental data, the MAP of the variational posterior is $ {Q}_{\mathrm{min}}\approx 261 $ ; however, using only CC experimental data, the MAP is located at $ {Q}_{\mathrm{min}}\approx 101 $ . This indicates that a constant $ {Q}_{\mathrm{min}} $ prevents the baseline model from accurately predicting experimental data with different boundary conditions. Our natural conclusion is that the model form is incorrect: $ {Q}_{\mathrm{min}} $ may be a function of the type of experiment being performed rather than a constant to be inferred. In Section 4.1, physically intuitive model updates are introduced with the aim of alleviating the identifiability concerns of the baseline model and improving the modeling of the minimum charge criterion.

4. Model updates

The baseline model was shown to exhibit identifiability as well as generalization deficiencies in Section 3. In this section, we aim to improve the prediction accuracy and generalizability of the baseline model. First, modifications are made based on the inference results of the baseline model to aid in improving parameter identifiability and minimum charge criterion modeling.

An alternative approach to improving the baseline model is also investigated in which machine-learning augmentations are introduced to model system dynamics which are absent from the baseline model. These augmentations are introduced with an emphasis on interpretability of the augmented model while allowing for greater flexibility in model expressiveness.

4.1. Inference-informed modifications

Based on the inference experiments of Section 3, model updates are proposed to alleviate the inadequacies observed during prediction. The issue of identifiability arises in the baseline model due to the double conditional statement that film deposition begins only if $ j>{j}_{\mathrm{min}} $ and $ Q>{Q}_{\mathrm{min}} $ . This conditional statement, in particular $ j>{j}_{\mathrm{min}} $ , creates non-physical, discontinuous behavior of the model. The film growth rate given by Eq. (24) is exactly zero until the conditional statement is true. Assuming that $ {j}_{\mathrm{min}}>0 $ , the film growth rate then increases instantaneously, and a sharp discontinuity in the film thickness growth occurs. The model also fails to capture cases in which both onset criteria are met but the minimum current condition is no longer satisfied at a later time. This behavior is one of the reasons that prediction is observed to be inaccurate in Figure 6b. We therefore propose a model update that makes the dynamics continuous and also allows for film dissolution, replacing the film thickness dynamics in Eq. (25) with

(29) $$ \frac{\mathrm{d}h}{\mathrm{d}t}={C}_v\left({j}_n-{j}_{\mathrm{min}}\right)\ \mathrm{for}\ Q>{Q}_{\mathrm{min}},\mathrm{s}.\mathrm{t}.h\ \geqslant\ 0. $$

This also gives the benefit of $ {j}_{\mathrm{min}} $ being identifiable for all experimental configurations.
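The effect of the continuous growth law can be illustrated with a toy current trace that rises above and later falls below $ {j}_{\mathrm{min}} $ . Writing $ j $ for the current in Eq. (29), the sketch below integrates the updated dynamics with forward Euler: the growth rate is exactly zero at onset (no discontinuity), turns negative when the current drops below $ {j}_{\mathrm{min}} $ (film dissolution), and the thickness is clipped so that $ h\geqslant 0 $ . All constants and the current trace are hypothetical.

```python
import math

# Hypothetical constants and a toy current trace, for illustration only.
CV, J_MIN, Q_MIN, DT, T = 1.0, 1.0, 0.5, 1e-3, 6.0

def simulate_updated_model():
    """Integrate the continuous film law of Eq. (29): once Q > Q_MIN the
    growth rate is CV*(j - J_MIN), which is zero exactly at onset and
    negative (dissolution) once the current drops below J_MIN, with the
    thickness clipped so that h >= 0."""
    h, Q, t = 0.0, 0.0, 0.0
    h_hist = []
    while t < T:
        j = 2.0 * math.sin(0.5 * t)  # toy current: rises above, then falls below J_MIN
        if Q > Q_MIN:
            h = max(0.0, h + DT * CV * (j - J_MIN))
        Q += DT * j
        t += DT
        h_hist.append(h)
    return h_hist

h_hist = simulate_updated_model()
```

Because the growth rate passes through zero continuously as $ j $ crosses $ {j}_{\mathrm{min}} $ , the likelihood is smooth in $ {j}_{\mathrm{min}} $ , which is precisely what restores identifiability of this parameter.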

With $ {j}_{\mathrm{min}} $ now identifiable, the parameterization of the minimum charge criterion is investigated to improve generalizability. Assuming that the minimum charge criterion is related to the concentration of $ {\mathrm{OH}}^{-} $ present in the bath, Appendix C shows that $ {Q}_{\mathrm{min}} $ is not constant across experiment types but is instead a function of a different constant $ K $ , and that this function differs between the VR and CC experiments. For VR (Eq. (30)) and CC (Eq. (31)) experiments, this function is given by

(30) $$ {Q}_{\mathrm{min}}={\left(\frac{81}{128\beta}\right)}^{1/3}{K}^{4/3} $$
(31) $$ {Q}_{\mathrm{min}}=\frac{K^2}{j_0}, $$

where $ \beta =\sigma {V}_R/\left(\sigma {R}_0+L\right) $ .

The updated film thickness dynamics of Eq. (29) along with introducing the parameter $ K $ used to compute the minimum charge criterion constitute an updated model which we dub the “inference-informed” model.

Repeating the exercise of Figure 4, we visualize the negative log-likelihood of the inference-informed model on artificial data to illustrate that all parameters are now identifiable. In this case, we simulate the data from the same experimental configuration as Figure 4, which is a VR experiment with $ {V}_R=0.125 $ , $ \sigma =0.14 $ , $ -\log {C}_v=8.5 $ , $ {Q}_{\mathrm{min}}=100.0 $ , and $ {j}_{\mathrm{min}}=1.0 $ . However, the parameter $ K $ is used instead of $ {Q}_{\mathrm{min}} $ ; the value of $ K $ is found using the relationship in Eq. (30), obtaining $ K=23.2 $ . The negative log-likelihoods of $ -\log {C}_v $ , $ K $ , and $ {j}_{\mathrm{min}} $ are illustrated in Figure 7, and all parameters exhibit a global minimum at the true parameter values, indicating that all are now identifiable.

Figure 7. Negative log-likelihoods computed from simulated data on a VR experiment using the inference-informed model with experimental conditions $ {V}_R=0.125 $ , $ \sigma =0.14 $ , $ -\log {C}_v=8.5 $ , $ {Q}_{\mathrm{min}}=100.0 $ ( $ K=23.2 $ ), and $ {j}_{\mathrm{min}}=1.5 $ .

4.1.1. Model comparisons

The baseline model and the inference-informed model are directly compared by performing Bayesian inference using the gridding approach discussed in Section 3.2. We then predict for each model based on the maximum a posteriori and compute the negative log-likelihood of the data. Lower values of the negative log-likelihood at the MAP correspond to a model which better fits the experimental data.

The MAP of the approximated posterior using the gridding approach under the baseline model is located at $ {\theta}_{\mathrm{MAP}}=\left[-\log {C}_v,{Q}_{\mathrm{min}},{j}_{\mathrm{min}}\right]=\left[\mathrm{7.58,173.5,0.0}\right] $ with a negative log-likelihood at the MAP of $ 3.35\times {10}^7 $ . Under the updated model, the MAP is located at $ {\theta}_{\mathrm{MAP}}=\left[-\log {C}_v,K,{j}_{\mathrm{min}}\right]=\left[\mathrm{7.32,44.2,0.63}\right] $ with a negative log-likelihood of $ 2.65\times {10}^7 $ . This lower value suggests that the inference-informed model can represent the experimental data more accurately than the baseline model.

Each model prediction at the MAP is compared to the experimental data in Figure 8. The current predictions are shown for two experiments: VR with $ {V}_R=1.0\;\mathrm{V}/\mathrm{s} $ and CC with $ {j}_0=7.5\;\mathrm{mA} $ . Predictions for all experiments on current, resistance, and thickness data are included in Appendix D. Prediction of the inference-informed model for CC experiments exhibits significant improvement over the baseline model, likely due to the improved parameterization of the minimum charge criterion $ {Q}_{\mathrm{min}} $ . However, there is clearly some physical behavior present in the VR experiments which is absent from our model. For instance, there are two current “peaks” in experimental data for VR experiments, but only one is possible in model predictions. In addition, although the thickness prediction of the inference-informed model is closer to the experimental data than that of the baseline model, Figure 9 illustrates that neither is particularly accurate. Ultimately, thickness prediction is the most important quantity of interest. We thus turn toward improving the model by augmenting the model form and learning the augmentations via a machine-learning-based framework.

Figure 8. Comparisons between current prediction on the baseline model and inference-informed model at the MAP for each on (a) VR experiment with $ {V}_R=1.0\;\mathrm{V}/\mathrm{s} $ and (b) CC experiment with $ {j}_0=7.5\;\mathrm{mA} $ .

Figure 9. Comparisons between film thickness prediction on the baseline model and inference-informed model at the MAP for each.

4.2. Machine-learning augmentations

Modifying the model using insights from the inference process aids in understanding some shortcomings due to model form but results in limited improvement to predictive performance. Additional physical behavior is missing from the model itself, which results in poor performance even when the optimal model parameters are identified. There are two main features absent from the baseline model: the presence of two peaks in the current dynamics for the VR experiments and the smooth behavior of those peaks. Unstable behavior observed in the CC experiment data just before the drop in current is due to the current controller in the experiments, which we do not account for in our model.

To incorporate the two missing physical features in our model, we turn to machine learning with an emphasis on interpretability. In particular, we augment the right-hand side of the dynamics model with parameterized neural networks and train the augmentations using NeuralODE (Chen et al., Reference Chen, Rubanova, Bettencourt, Duvenaud, Bengio, Wallach, Larochelle, Grauman, Cesa-Bianchi and Garnett2018). This framework enables adjoint-based gradient computation alongside standard machine-learning pipelines, allowing flexible augmentations in the right-hand side of a dynamical system to be learned.

We first attempt to characterize the underlying physics of each of the two peaks present in the experimental current data for VR experiments. The second peak corresponds to the onset of deposition, which is modeled in the baseline and inference-informed models. As film deposition begins, resistance increases and current decreases. However, the baseline model assumes an instantaneous onset of film deposition, which results in a discontinuous and physically impossible film growth rate. Experimental current data illustrate smooth transitions to film growth, further supporting the need for additional modeling. This smooth transition is likely the result of portions of the surface being coated in a non-uniform manner, such that only a fraction of the material surface is coated for a small time. We propose an augmentation to the baseline model, which has the additional benefit of modeling the deposition onset time without the need for threshold parameters: we multiply the right-hand side of Eq. (25) by a learnable term $ {g}_{\phi}\left(V,Q\right) $ which varies smoothly between zero and one. This term is dubbed the “coverage fraction model,” and it is a function of voltage and charge only, representing the coverage fraction of the film on the material. It is found empirically that using both voltage and charge as inputs to the model results in the best predictive performance. The term is a function of charge due to the dependence of the baseline model’s deposition onset time on charge, and a function of voltage due to the change in the time derivative of voltage across experiment types.

The first peak present in the current data for VR experiments is caused by a phenomenon ignored by the baseline model altogether. This peak may be caused by an oxygen-reduction reaction (Nagai et al., Reference Nagai, Onishi and Amaya2012), which occurs at the anode as the voltage increases. The relationship between electric potential and current in redox reactions is described by the Butler–Volmer equation (Dickinson and Wain, Reference Dickinson and Wain2020, Eq. (11)) of the form

(32) $$ j={j}_0\left[\exp (aV)-\exp (bV)\right], $$

where the parameters $ a $ and $ b $ depend on many physically relevant parameters. However, the particular form is not relevant to the discussion here as we aim to learn this component of the model while keeping the general form to ensure that the model is physically interpretable. We assume that the combined effect of this redox reaction results in a resistance “source” such that $ j={c}_1\exp \left({c}_2V\right) $ , based on the form of Eq. (32).

After some critical point is reached (which we do not explicitly model, but which likely corresponds to some minimum charge criterion), we assume that the $ {\mathrm{OH}}^{-} $ begins to diffuse away at some rate (Nagai et al., 2012), which decreases the $ {\mathrm{OH}}^{-} $ concentration. As the redox reaction continues, we assume that the combined effect of these two reactions results in a different exponent such that $ j={c}_3\exp \left({c}_4V\right) $ , where $ {c}_4<0 $ .

To augment the VR experiment model with this behavior of switching between two different exponents resulting from a combination of reactions, we use an exponential function whose argument includes a hyperbolic tangent, which can vary smoothly between two different values. We thus assume an augmented model of the form

(33) $$ j=\frac{\sigma V}{\sigma {R}_{\mathrm{film}}+L}+{c}_1\exp \left({c}_2{Vf}_{\theta}\left(V,Q\right)\right),\hskip1em {f}_{\theta}\left(V,Q\right)=\tanh \left(-{\hat{f}}_{\theta}\left(V,Q\right)\right). $$

The augmented model is finally expressed by

(34) $$ {f}_{\theta}\left(V,Q\right)=\tanh \left(-{\hat{f}}_{\theta}\left(V,Q\right)\right),\hskip1em {g}_{\phi}\left(V,Q\right)=\frac{1}{1+{c}_3\exp \left(-{c}_4{\hat{g}}_{\phi}\left(V,Q\right)\right)} $$
(35) $$ \frac{\mathrm{d}{R}_{\mathrm{film}}}{\mathrm{d}t}={g}_{\phi}\left(V,Q\right)\rho (j){jC}_v,\hskip1em j={c}_1\exp \left({c}_2{Vf}_{\theta}\left(V,Q\right)\right)+\frac{\sigma V}{\sigma {R}_{\mathrm{film}}+L}, $$

for VR experiments and

(36) $$ {g}_{\phi}\left(V,Q\right)=\frac{1}{1+{c}_3\exp \left(-{c}_4{\hat{g}}_{\phi}\left(V,Q\right)\right)},\hskip1em \frac{\mathrm{d}{R}_{\mathrm{film}}}{\mathrm{d}t}={g}_{\phi}\left(V,Q\right)\rho (j){jC}_v, $$

for CC experiments. Note that the relationship between voltage, current, and resistance is unchanged from the baseline model in the CC experiments, even with the machine-learning augmentations introduced to the model. The model augmentation functions $ {\hat{g}}_{\phi } $ and $ {\hat{f}}_{\theta } $ are parameterized by FNNs with four layers, eight nodes per layer, and ReLU activation functions. These are learned along with the constants $ {c}_1,{c}_2,{c}_3 $ , and $ {c}_4 $ within the neural ODE framework, using only current data from the three VR experiments. Thus, applying the model to the CC experiments is pure prediction, as none of their data are seen during training. We learn the parameters $ \theta, \phi, {C}_v $ , $ \sigma, {c}_1,{c}_2,{c}_3 $ , and $ {c}_4 $ in our experiments and show prediction results. Figure 10 illustrates the results of the final trained model, which better captures the “double-peaked” nature of the current in experimental data for VR experiments. Here, we show current predictions for only two experimental configurations; current and resistance predictions for all other experimental configurations are provided in Appendix E.
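The structure of Eqs. (34)–(35) can be sketched compactly. The snippet below is a structural illustration only: the two augmentation networks are randomly initialized rather than trained (in our experiments they are fitted jointly with the constants by differentiating through the ODE solve in the neural ODE framework), and all constants, including the placeholder $\rho(j)$, are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_fnn(sizes):
    """Fully connected net as a list of (W, b) pairs (random init, untrained)."""
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def fnn(params, x):
    """Forward pass with ReLU on the hidden layers, linear output."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Two scalar-output augmentation nets taking (V, Q), eight nodes per hidden layer.
f_hat = make_fnn([2, 8, 8, 8, 1])  # feeds the first-peak exponent gate f_theta
g_hat = make_fnn([2, 8, 8, 8, 1])  # feeds the coverage fraction g_phi

# Illustrative constants; the paper learns these jointly with the networks.
sigma, L_gap, Cv, c1, c2, c3, c4 = 0.14, 1.0, 1e-3, 0.1, 0.5, 1.0, 1.0
rho = lambda j: 1.0  # placeholder for the deposition-rate factor rho(j)

def rhs(V, Q, R_film):
    """Augmented VR-experiment model of Eqs. (34)-(35)."""
    x = np.array([[V, Q]])
    f = np.tanh(-fnn(f_hat, x)[0, 0])                         # in (-1, 1)
    g = 1.0 / (1.0 + c3 * np.exp(-c4 * fnn(g_hat, x)[0, 0]))  # in (0, 1)
    j = c1 * np.exp(c2 * V * f) + sigma * V / (sigma * R_film + L_gap)
    return j, g * rho(j) * j * Cv  # current density and dR_film/dt

# Forward-Euler rollout of a voltage-ramp experiment, V(t) = V_R * t.
V_R, dt, Q, R_film = 1.0, 1e-2, 0.0, 0.0
for step in range(1000):
    j, dR = rhs(V_R * step * dt, Q, R_film)
    Q += j * dt
    R_film += dR * dt
```

The rollout only shows how the augmentations enter the dynamics; in practice the solve is differentiated so that $\theta$, $\phi$, and the constants are fitted to the measured current traces.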

Figure 10. Current prediction on the machine-learning augmented model trained with the first peak model.

Furthermore, thickness predictions for all experimental configurations are illustrated in Figure 11, along with predictions from the baseline and inference-informed models for comparison. Film thickness predictions show an improvement over the baseline model in all cases except for a single experimental configuration: the VR experiment with $ {V}_R=0.125\;\mathrm{V}/\mathrm{s} $ . We expect that this is due to some unmodeled behavior in the low-voltage regime which is difficult to capture. However, in CC experiments and at larger voltage ramp rates in VR experiments, our augmentations provide significant improvement to model predictions.

Figure 11. Comparisons between film thickness prediction on the baseline model, inference-informed model, and ML-augmented model with first peak.

Learning the first peak model function $ {f}_{\theta } $ in the neural ODE framework is significantly more expensive computationally than training only the coverage fraction model $ {g}_{\phi } $ , because the steep current gradient requires a finely discretized time step to resolve adequately. In addition, the first peak model performs notably poorly for the VR experiment in the low-voltage-ramp regime. We thus remove the “first peak” model and retrain only the augmentation function $ {g}_{\phi}\left(V,Q\right) $ by masking out the experimental data corresponding to the first peak in current for VR experiments. This results in more efficient training and thickness prediction while providing predictions quite similar to those obtained when the first peak model is included. These results show that modeling the dynamics of the first peak may unnecessarily decrease computational efficiency, since accurate thickness prediction, which is ultimately the goal, is retained without it. Current predictions for two experiments (one VR and one CC) are shown in Figure 12, and thickness predictions are shown and compared to all other models in Figure 13. These results again illustrate a significant improvement in thickness prediction without the need to include the more expensive first peak model, indicating that modeling such behavior may be largely unnecessary for predicting thickness. However, prediction in the low-voltage-ramp regime is actually worse than with the first peak model included, further supporting our hypothesis that there exist additional unmodeled dynamics in such cases. Again, all other predictions for current and resistance using our model augmentations without the first peak model are provided in Appendix F.
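Masking the first peak out of the training loss amounts to evaluating the data misfit only outside the first-peak time window. A minimal sketch of this idea (the synthetic current traces and the window edges are illustrative placeholders, not our experimental data):

```python
import numpy as np

# Synthetic stand-ins: a "measured" double-peaked current trace and a model
# trace that deliberately omits the first peak.
t = np.linspace(0.0, 60.0, 601)
j_exp = (np.exp(-0.5 * ((t - 8.0) / 2.0) ** 2)    # first peak (redox reaction)
         + np.exp(-0.5 * ((t - 30.0) / 6.0) ** 2))  # deposition peak
j_model = np.exp(-0.5 * ((t - 30.0) / 6.0) ** 2)   # model without first-peak term

# Mask drops samples in the first-peak window from the loss.
mask = ~((t > 3.0) & (t < 14.0))

def mse(pred, target, m):
    return float(np.mean((pred[m] - target[m]) ** 2))

full_loss = mse(j_model, j_exp, np.ones_like(t, dtype=bool))
masked_loss = mse(j_model, j_exp, mask)
```

Masking removes the penalty for the deliberately unmodeled first peak, so the remaining loss reflects only the deposition dynamics that drive thickness prediction.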

Figure 12. Current prediction on the machine-learning augmented model trained without the first peak model, shown for (a) VR experiment with $ {V}_R=1.0\;\mathrm{V}/\mathrm{s} $ and (b) CC experiment with $ {j}_0=7.5\;\mathrm{mA} $ .

Figure 13. Comparisons between film thickness prediction on the baseline model, inference-informed model, and ML-augmented model without first peak.

5. Conclusions

We presented a comprehensive framework to introduce adaptable yet interpretable machine-learning-based model augmentations. Our initial objective of parameter inference unveiled significant limitations in the existing baseline model, necessitating the integration of diverse model refinements aimed at rectifying these deficiencies.

Even with such enhancements, however, certain physical behaviors inherent in experimental data were not fully replicated by the models. Notably, instances such as the emergence of two peaks in current data during VR experiments and the smooth evolution of current data remained elusive. To address these disparities, we devised physically meaningful augmentations, seamlessly integrated them into the model, and harnessed a neural ODE framework for implementing learnability. This approach to augmentation construction maintains interpretability while harnessing the expressive power of neural networks to allow enhanced adaptability.

The integration of these augmentations yields models that demonstrate improved predictive accuracy when compared against empirical data. Nonetheless, limitations persist. One key challenge lies in the computational cost associated with modeling the “first peak.” We found that accurate film thickness predictions can still be achieved without modeling this first peak, and its omission substantially reduces the computational burden. Despite the reduced offline computational cost, especially when the first peak model is omitted, there remain areas for improvement in balancing accuracy and efficiency, particularly for real-time applications or online model improvement.

Future work could explore alternative augmentation methods or hybrid modeling techniques to better capture unmodeled phenomena without increasing computational demands. Although the ML-enhanced models required some offline computational cost, the online prediction cost remains nearly identical to the baseline model due to the simple and efficient FNN-based model augmentations. Offline optimization converged within a few hours on a single GPU with the first peak model and within an hour without it. Future efforts might focus on reducing this offline burden further while preserving the same degree of accuracy.

Integral to our methodology is an in-depth procedure of evaluating the model’s behavior prior to augmentation. This diagnostic analysis is pivotal, as it grants insights into the limitations of the model. By first grasping these limitations, we gather information on how to design augmentations that retain physical interpretability, while simultaneously fostering flexibility and removing the necessity for manual crafting and fine tuning of model structures. This practice holds immense significance for machine-learning-driven dynamical system understanding and for broader applications in physics-based domains. It facilitates augmentation strategies that preserve interpretability, thereby opening avenues for model improvements.

Future research should focus on extending this methodology to more complex or transient/chaotic dynamic systems where interpretability could become a trade-off with predictive power. In addition, future studies might investigate how well the proposed framework generalizes to additional unseen conditions and environments, ensuring its robustness across varied applications.

We acknowledge that other problem settings may introduce different intricacies; nevertheless, the systematic process outlined in this study can guide future applications in different problems. This study offers a variety of insights, equipping researchers with tools to assess and elevate the forms of model representations for diverse dynamical systems. Ultimately, our work fosters a bridge between machine learning and physics, improving the modeling process while revealing the underlying system dynamics, and setting the stage for future exploration into even more efficient and adaptable approaches.

Data availability statement

All data in this study are available only through approval by Ford Motor Company. The code for our experiments is available at https://github.com/christian-jacobsen/NormalizingFlowsInference.

Author contribution

Conceptualization: K.D.; W.L.; M.A.; C.J. Methodology: C.J.; M.K. Data curation: W.L.; M.K. Data visualisation: C.J. Writing original draft: C.J. All authors approved the final submitted draft.

Competing interest

None

Funding statement

This work was supported at the University of Michigan by Ford Motor Company under the grant “Hybrid Physics-Machine Learning Models for Electro-deposition.”

Ethical standard

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

A. Voltage ramp

For the electric field, we solve a Poisson equation with a Robin boundary condition at the bath–film interface and a Dirichlet condition at the anode.

(37) $$ {\sigma}_{\mathrm{bath}}\frac{\partial^2\phi }{\partial {x}^2}=0\hskip2em \mathrm{in}\ \mathrm{the}\ \mathrm{bath} $$
(38) $$ \phi -{R}_{\mathrm{film}}{\sigma}_{\mathrm{bath}}\frac{\partial \phi }{\partial x}=0\hskip2em \mathrm{at}\ \mathrm{the}\ \mathrm{film}\hbox{-}\mathrm{bath}\ \mathrm{interface} $$
(39) $$ {\phi}_{\mathrm{anode}}(t)={\phi}_{t=0}+{\phi}_{\mathrm{ramp}}(t)\hskip2em \mathrm{at}\ \mathrm{the}\ \mathrm{anode} $$

where $ {\sigma}_{\mathrm{bath}} $ is the bath conductivity, $ j $ is the normal component of the current density, $ \phi $ is the electrical potential, $ {R}_{\mathrm{film}} $ is the film resistance, and $ h $ is the film thickness.
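Because the potential satisfying Eq. (37) is linear in $x$, this boundary-value problem can be solved in closed form, and it reproduces the Ohmic current term $\sigma V/(\sigma R_{\mathrm{film}}+L)$ appearing in the main text. The quick check below (arbitrary test values; $V$ denotes the instantaneous anode potential) writes $\phi(x)=ax+b$, imposes the two boundary conditions as a $2\times 2$ linear system, and compares the resulting current density to the closed form:

```python
import numpy as np

# Arbitrary test values for the 1D bath of length L.
sigma, R_film, L, V = 0.14, 3.0, 1.0, 5.0

# For phi(x) = a*x + b:
#   Robin BC at the film-bath interface (x=0): phi(0) - R_film*sigma*phi'(0) = 0
#   Dirichlet BC at the anode (x=L):           phi(L) = V
A = np.array([[-R_film * sigma, 1.0],   # -R_film*sigma*a + b = 0
              [L,               1.0]])  #  L*a + b = V
a, b = np.linalg.solve(A, np.array([0.0, V]))

j = sigma * a  # current density j = sigma * dphi/dx (constant in x)
j_closed_form = sigma * V / (sigma * R_film + L)
```

Both boundary conditions are satisfied by construction, and the computed $j$ matches the closed-form Ohmic expression.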

B. Constant current

For the electric field, we solve a Poisson equation with a Robin boundary condition at the bath–film interface and a Neumann condition at the anode.

(40) $$ {\sigma}_{\mathrm{bath}}\frac{\partial^2\phi }{\partial {x}^2}=0\hskip2em \mathrm{in}\ \mathrm{the}\ \mathrm{bath} $$
(41) $$ \phi -{R}_{\mathrm{film}}{\sigma}_{\mathrm{bath}}\frac{\partial \phi }{\partial x}=0\hskip2em \mathrm{at}\ \mathrm{the}\ \mathrm{film}\hbox{-}\mathrm{bath}\ \mathrm{interface} $$
(42) $$ {\sigma}_{\mathrm{bath}}\frac{\partial \phi }{\partial x}={j}_0\hskip2em \mathrm{at}\ \mathrm{the}\ \mathrm{anode} $$

C. Evolution of the concentration of OH $ {}^{-} $

C.1. Solution of the diffusion equation

We consider the diffusion equation on the domain $ \left[0,L\right] $,

(43) $$ \frac{\partial u}{\partial t}=D\frac{\partial^2u}{\partial {x}^2} $$

with the following boundary conditions:

(44) $$ {\left.\frac{\partial u}{\partial x}\right|}_{x=0}=-\frac{j(t)}{DF}=g(t)\hskip1em \mathrm{and}\hskip1em u\left(x=L,t\right)=h(t) $$

where $ u $ is the OH $ {}^{-} $ concentration, $ F $ is the Faraday constant, $ D $ is the diffusion coefficient, and $ h $ and $ g $ are time-dependent functions. The initial condition is defined by $ u\left(x,0\right)={u}_0 $ , $ {u}_0 $ being the initial concentration.

Applying the Laplace transform yields

(45) $$ \mathcal{L}\left\{\frac{\partial u}{\partial t}-D\frac{\partial^2u}{\partial {x}^2}\right\}=\mathcal{L}\left\{\frac{\partial u}{\partial t}\right\}-D\,\mathcal{L}\left\{\frac{\partial^2u}{\partial {x}^2}\right\}=s\hskip0.1em \overline{u}-{u}_0-D\hskip0.1em {\overline{u}}_{xx}=0. $$

The solution of the homogeneous equation $ s\hskip0.1em \overline{u}-D\hskip0.1em {\overline{u}}_{xx}=0 $ is given by

(46) $$ \hskip0.1em {\overline{u}}_h\left(x,s\right)={C}_1\exp \left({r}_1x\right)+{C}_2\exp \left({r}_2x\right) $$

where $ {r}_1=\sqrt{s/D} $ and $ {r}_2=-\sqrt{s/D} $ .

The particular solution, assuming a constant $ \hskip0.1em {\overline{u}}_{sp.}=A $ , is given by $ \hskip0.1em {\overline{u}}_{sp.}={u}_0/s $ . Therefore, the general solution has the following form

(47) $$ \hskip0.1em \overline{u}\hskip0.1em \left(x,s\right)={C}_1\exp \left({r}_1x\right)+{C}_2\exp \left({r}_2x\right)+\frac{u_0}{s} $$

where $ {C}_1 $ and $ {C}_2 $ are functions derived using the boundary and initial conditions.

Using the Neumann boundary condition at $ x=0 $ and the definition $ \mathcal{L}\left\{g(t)\right\}=G(s) $ , we obtain

(48) $$ \hskip0.1em {\overline{u}}_x\left(x=0,s\right)=G(s)={C}_1{r}_1+{C}_2{r}_2. $$

Therefore, $ {C}_1=\frac{G(s)-{C}_2{r}_2}{r_1}=\frac{G(s)}{r_1}+{C}_2 $ .

Hence,

(49) $$ \hskip0.1em \overline{u}\hskip0.1em \left(x,s\right)=\left(\frac{G(s)}{r_1}+{C}_2\right)\exp \left({r}_1x\right)+{C}_2\exp \left({r}_2x\right)+\frac{u_0}{s} $$

Using the Dirichlet boundary condition at x = L:

(50) $$ \hskip0.1em \overline{u}\hskip0.1em \left(x=L,s\right)=H(s)=\left(\frac{G(s)}{r_1}+{C}_2\right)\exp \left({r}_1L\right)+{C}_2\exp \left({r}_2L\right)+\frac{u_0}{s}, $$

the solution is given by

(51) $$ \hskip0.1em \overline{u}\hskip0.1em \left(x,s\right)=\left[\frac{G(s)}{r_1}+\frac{H(s)-\frac{u_0}{s}-\frac{G(s)}{r_1}\exp \left({r}_1L\right)}{\exp \left({r}_1L\right)+\exp \left({r}_2L\right)}\right]\exp \left({r}_1x\right)+\left[\frac{H(s)-\frac{u_0}{s}-\frac{G(s)}{r_1}\exp \left({r}_1L\right)}{\exp \left({r}_1L\right)+\exp \left({r}_2L\right)}\right]\exp \left({r}_2x\right)+\frac{u_0}{s} $$

If $ L\to \infty $ and $ h(t)={u}_0 $ (i.e. $ H(s)={u}_0/s $ ), the solution simplifies to

(52) $$ \hskip0.1em \overline{u}\hskip0.1em \left(x,s\right)=\left[\frac{G(s)}{r_1}-\frac{G(s)}{r_1}\right]\exp \left({r}_1x\right)+\left[-\frac{G(s)}{r_1}\right]\exp \left({r}_2x\right)+\frac{u_0}{s} $$
(53) $$ =-\frac{G(s)}{r_1}\exp \left({r}_2x\right)+\frac{u_0}{s} $$

Using the inverse Laplace transform yields

(54) $$ {\mathcal{L}}^{-1}\left\{\hskip0.1em \overline{u}\hskip0.1em \left(x,s\right)\right\}={\mathcal{L}}^{-1}\left\{-G(s)\sqrt{\frac{D}{s}}\ \exp \left(-\sqrt{\frac{s}{D}}\ x\right)+\frac{u_0}{s}\right\} $$
(55) $$ =-\sqrt{D}{\mathcal{L}}^{-1}\left\{G(s)\frac{1}{\sqrt{s}}\exp \left(-\frac{x}{\sqrt{D}}\sqrt{s}\right)\right\}+{u}_0. $$

Knowing that

(56) $$ {\mathcal{L}}^{-1}\left\{\frac{1}{\sqrt{s}}\exp \left(-\frac{x}{\sqrt{D}}\sqrt{s}\right)\right\}=\frac{1}{\sqrt{\pi t}}\exp \left(-\frac{x^2}{4 Dt}\right) $$

and

(57) $$ {\mathcal{L}}^{-1}\left\{G(s)K(s)\right\}={\int}_0^tg\left(\tau \right)k\left(t-\tau \right) d\tau, $$

the concentration as a function of space and time is given by

(58) $$ u\left(x,t\right)={u}_0+\frac{1}{F\sqrt{\pi D}}{\int}_0^tj\left(\tau \right)\frac{1}{\sqrt{t-\tau }}\exp \left(-\frac{x^2}{4D\left(t-\tau \right)}\right) d\tau . $$

The concentration at the cathode (x = 0) is given by

(59) $$ u\left(0,t\right)={u}_0+\frac{1}{F\sqrt{\pi D}}{\int}_0^tj\left(\tau \right)\frac{1}{\sqrt{t-\tau }} d\tau . $$

Let $ {u}_{\mathrm{min}} $ be the minimum concentration required to start deposition, and $ \xi $ be the time when the minimum concentration is reached. Therefore,

(60) $$ {u}_{\mathrm{min}}={u}_0+\frac{1}{F\sqrt{\pi D}}{\int}_0^{\xi }j\left(\tau \right)\frac{1}{\sqrt{\xi -\tau }} d\tau . $$

Let $ K=\frac{1}{2}\left({u}_{\mathrm{min}}-{u}_0\right)F\sqrt{\pi D} $ be a constant characterizing the deposition onset.

C.1.1. Constant current density

For a constant current density $ j $ , the concentration is defined as

(61) $$ u\left(0,t\right)={u}_0+\frac{2{j}_0}{F\sqrt{\pi D}}\sqrt{t} $$

and the electric charge is defined as

(62) $$ Q(t)={\int}_0^t jdt={j}_0t. $$

At the deposition onset, this yields the well-known Sand equation (Sand, 1899):

(63) $$ \frac{1}{2}\left({u}_{\mathrm{min}}-{u}_0\right)F\sqrt{\pi D}={j}_0\sqrt{\xi }. $$

The minimum charge can be derived accordingly:

(64) $$ {Q}_{\mathrm{min}}={j}_0\xi =\frac{K^2}{j_0} $$
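The constant-current results above can be checked directly: with arbitrary test constants, the induction time implied by Sand's equation returns the cathode concentration of Eq. (61) to exactly $u_{\mathrm{min}}$, and the accumulated charge matches Eq. (64). A minimal numerical sanity check:

```python
import numpy as np

# Arbitrary test values for the physical constants.
u0, u_min, F, D, j0 = 1.0, 3.0, 2.0, 0.5, 4.0
K = 0.5 * (u_min - u0) * F * np.sqrt(np.pi * D)  # onset constant

# Sand's equation, Eq. (63), rearranged for the induction time.
xi = (K / j0) ** 2

# Cathode concentration from Eq. (61), evaluated at the induction time.
u_at_xi = u0 + 2.0 * j0 * np.sqrt(xi) / (F * np.sqrt(np.pi * D))

# Eq. (64): minimum charge delivered by the constant current up to xi.
Q_min = j0 * xi
```

At $t=\xi$ the concentration equals $u_{\mathrm{min}}$ and $Q_{\mathrm{min}}=K^2/j_0$, as derived.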

C.1.2. Linear voltage ramp

If the voltage ramp is a linear function of time, then before deposition, by Ohm’s law, the current density is a linear function of time, $ j=\beta t+{j}_0 $ , and the concentration is expressed as

(65) $$ u\left(0,t\right)={u}_0+\frac{1}{F\sqrt{\pi D}}{\int}_0^tj\left(\tau \right)\frac{1}{\sqrt{t-\tau }} d\tau $$
(66) $$ ={u}_0+\frac{1}{F\sqrt{\pi D}}\left[2{j}_0\sqrt{t}+2{\int}_0^t\frac{\partial j}{\partial \tau}\sqrt{t-\tau } d\tau \right] $$
(67) $$ ={u}_0+\frac{1}{F\sqrt{\pi D}}\left[2{j}_0\sqrt{t}+\frac{4}{3}\beta {t}^{3/2}\right]. $$

At the deposition onset, the induction time $ \xi $ is the solution of the following equation:

(68) $$ \beta {\xi}^{3/2}+\frac{3}{2}{j}_0{\xi}^{1/2}=\frac{3}{2}K. $$

Assuming that the voltage at the anode is initially zero, so that $ {j}_0=0 $ , we can derive the following expression for the induction time

(69) $$ \xi ={\beta}^{-2/3}{\left[\frac{3}{2}K\right]}^{2/3} $$

Finally, the minimum electric charge for the linear voltage ramp is

(70) $$ {Q}_{\mathrm{min}}={\left(\frac{81}{128\beta}\right)}^{1/3}{K}^{4/3}. $$
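The induction-time equation (68) and the closed forms (69)–(70) can likewise be verified numerically, here with a simple bisection on Eq. (68) (arbitrary test constants):

```python
import numpy as np

# Arbitrary test values for the physical constants and ramp slope beta.
u0, u_min, F, D, beta = 1.0, 3.0, 2.0, 0.5, 0.8
K = 0.5 * (u_min - u0) * F * np.sqrt(np.pi * D)

def onset_residual(xi, j0):
    """Residual of Eq. (68): beta*xi^(3/2) + 1.5*j0*xi^(1/2) - 1.5*K."""
    return beta * xi ** 1.5 + 1.5 * j0 * np.sqrt(xi) - 1.5 * K

def solve_onset(j0, lo=0.0, hi=1e3, iters=200):
    """Bisection for the induction time xi; the residual is increasing in xi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if onset_residual(mid, j0) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# With zero initial current (j0 = 0) the closed form Eq. (69) applies.
xi_numeric = solve_onset(j0=0.0)
xi_closed = beta ** (-2.0 / 3.0) * (1.5 * K) ** (2.0 / 3.0)

# Eq. (70): Q_min = integral of beta*t from 0 to xi = beta*xi**2/2.
Q_min_integral = 0.5 * beta * xi_closed ** 2
Q_min_closed = (81.0 / (128.0 * beta)) ** (1.0 / 3.0) * K ** (4.0 / 3.0)
```

The bisection root agrees with Eq. (69), and integrating the ramp current to the onset time recovers the $ (81/128\beta)^{1/3}K^{4/3} $ expression of Eq. (70).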

D. Baseline/Inference-informed model comparisons

We include in Figures 14 and 15 all comparisons between the baseline and inference-informed models. This includes current and resistance comparisons for all six experiments. The inference-informed model performs better than the baseline model in all experiments, with notably greater improvement on predictions for CC experiments. However, thickness prediction is still inaccurate in many cases.

Figure 14. Comparisons between current prediction on the baseline model and inference-informed model at the MAP for each.

Figure 15. Comparisons between resistance prediction on the baseline model and inference-informed model at the MAP for each.

E. ML-augmented model with first peak

We present all prediction results for the ML-augmented model with the first peak. Notably, predictions are poor for VR experiments in the low-$ {V}_R $ regime but accurate for all other experimental configurations. In particular, thickness prediction is much more accurate compared to the baseline and inference-informed models. Predictions for current and resistance are included in Figures 16 and 17; note that the ML model is trained with current data from the three voltage ramp experiments only.

Figure 16. ML-augmented model with first peak current predictions compared to experimental data.

Figure 17. ML-augmented model with first peak resistance predictions compared to experimental data.

F. ML-augmented model without first peak

We present all prediction results for the ML-augmented model without the first peak model included. Removing the first peak model greatly improves model efficiency at prediction time (and training time). Predictions are still poor for VR experiments in the low-$ {V}_R $ regime but improved over the baseline and inference-informed models. Again, thickness predictions are more accurate compared to previous models. Predictions for current and resistance are included for all six experimental configurations in Figures 18 and 19; note that the ML model is trained with current data from the three voltage ramp experiments only.

Figure 18. ML-augmented model without first peak current predictions compared to experimental data.

Figure 19. ML-augmented model without first peak resistance predictions compared to experimental data.

Footnotes

This research article was awarded Open Materials badge for transparent practices. See the Data Availability Statement for details.

References

Ahmed, SE, Pawar, S, San, O, Rasheed, A, Iliescu, T and Noack, BR (2021) On closures for reduced order models—A spectrum of first-principle to machine-learned avenues. Physics of Fluids 33 (9), 091301.
Beck, F (1976) Fundamental aspects of electrodeposition of paint. Progress in Organic Coatings 4 (1), 1–60.
Blei, DM, Kucukelbir, A and McAuliffe, JD (2017) Variational inference: A review for statisticians. Journal of the American Statistical Association 112 (518), 859–877.
Böttcher, L and Asikis, T (2022) Near-optimal control of dynamical systems with neural ordinary differential equations. Machine Learning: Science and Technology 3 (4), 045004.
Boyd, DW and Zwack, RR (1996) Predicting electrocoat efficiency: A simple data-based model for predicting electrocoat usage. Progress in Organic Coatings 27 (1), 25–32. Proceedings of the 20th International Conference in Organic Coatings Science and Technology.
Brunton, SL, Proctor, JL and Kutz, JN (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113 (15), 3932–3937.
Chen, RTQ, Amos, B and Nickel, M (2021) Learning neural event functions for ordinary differential equations. In International Conference on Learning Representations.
Chen, RTQ, Rubanova, Y, Bettencourt, J and Duvenaud, DK (2018) Neural ordinary differential equations. In Bengio, S, Wallach, H, Larochelle, H, Grauman, K, Cesa-Bianchi, N and Garnett, R (eds), Advances in Neural Information Processing Systems, Vol. 31. Curran Associates, Inc.
Cheng, S, Quilodrán-Casas, C, Ouala, S, Farchi, A, Liu, C, Tandeo, P, Fablet, R, Lucor, D, Iooss, B, Brajard, J, Xiao, D, Janjic, T, Ding, W, Guo, Y, Carrassi, A, Bocquet, M and Arcucci, R (2023) Machine learning with data assimilation and uncertainty quantification for dynamical systems: A review. IEEE/CAA Journal of Automatica Sinica 10 (6), 1361–1387.
Cover, TM and Thomas, JA (2006) Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). USA: Wiley-Interscience, 18–30.
Dickinson, EJ and Wain, AJ (2020) The Butler–Volmer equation in electrochemical theory: Origins, value, and practical application. Journal of Electroanalytical Chemistry 872, 114145. Dr. Richard Compton 65th Birthday Special Issue.
Duraisamy, K (2021) Variational encoders and autoencoders: Information-theoretic inference and closed-form solutions.
Ellwood, K, Tardiff, JL, Gray, LJ, Gaffney, P, Braslaw, J, Moldekleiv, K and Halvorsen, A (2009) Development of a full vehicle electrocoat paint simulation tool. SAE International Journal of Materials and Manufacturing 2 (1), 234–240.
Guo, F, Wang, X, Fan, K, Broderick, T and Dunson, DB (2017) Boosting variational inference.
Hasegawa, K, Fukami, K, Murata, T and Fukagata, K (2020) Machine-learning-based reduced-order modeling for unsteady flows around bluff bodies of various shapes. Theoretical and Computational Fluid Dynamics 34 (4), 367–383.
Hoffman, MD, Blei, DM, Wang, C and Paisley, J (2013) Stochastic variational inference. Journal of Machine Learning Research 14 (40), 1303–1347.
Linot, AJ, Burby, JW, Tang, Q, Balaprakash, P, Graham, MD and Maulik, R (2023) Stabilized neural ordinary differential equations for long-time forecasting of dynamical systems. Journal of Computational Physics 474, 111838.
Liu, W and Ellwood, K (2017) Simulation of the Full Vehicle Electrodeposition for Paint Virtual Manufacturing. SAE World Congress.
Marlar, T, Trainor, A, Liu, W, Ellwood, K, Okerberg, B, Padash, F and Harb, JN (2020) Effect of convection on initial deposition during electrocoating of galvanized steel. Journal of the Electrochemical Society 167 (10), 103502.
Metropolis, N, Rosenbluth, AW, Rosenbluth, MN, Teller, AH and Teller, E (1953) Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21 (6), 1087–1092.
Mišković-Stanković, VB (2002) The mechanism of cathodic electrodeposition of epoxy coatings and the corrosion behaviour of the electrodeposited coatings. Journal of the Serbian Chemical Society 67 (5), 305–324.
Nagai, H, Onishi, Y and Amaya, K (2012) Exact paint resistance and deposition models for accurate numerical electrodeposition simulation. Transactions of the Japan Society of Mechanical Engineers Series A 78 (794), 1446–1461.
Nelder, JA and Mead, R (1965) A simplex method for function minimization. The Computer Journal 7 (4), 308–313.
Paszke, A, Gross, S, Massa, F, Lerer, A, Bradbury, J, Chanan, G, Killeen, T, Lin, Z, Gimelshein, N, Antiga, L, Desmaison, A, Kopf, A, Yang, E, DeVito, Z, Raison, M, Tejani, A, Chilamkurthy, S, Steiner, B, Fang, L, Bai, J and Chintala, S (2019) PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, Vol. 32, pp. 8024–8035. Curran Associates, Inc.
Pawar, S, Rahman, SM, Vaddireddy, H, San, O, Rasheed, A and Vedula, P (2019) A deep learning enabler for nonintrusive reduced order modeling of fluid flows. Physics of Fluids 31 (8), 085101.
Pierce, PE (1981) The physical chemistry of the cathodic electrodeposition process. Journal of Coatings Technology 53 (672), 52–67.
Rastegar, S, Ranjbar, Z and Fakhrpour, G (2008) A generalized model for the film growth during the constant voltage electro-deposition of resin dispersions. Colloids and Surfaces A: Physicochemical and Engineering Aspects 317 (1), 17–22.
Rezende, D and Mohamed, S (2015) Variational inference with normalizing flows. In Bach, F and Blei, D (eds), Proceedings of the 32nd International Conference on Machine Learning, Volume 37 of Proceedings of Machine Learning Research. Lille, France: PMLR, pp. 1530–1538.
Robert, CP and Casella, G (2005) Monte Carlo Statistical Methods (Springer Texts in Statistics). Berlin, Heidelberg: Springer-Verlag, 79–318.
Robert, C and Casella, G (2011) A short history of Markov chain Monte Carlo: Subjective recollections from incomplete data. Statistical Science 26 (1).
Rudy, SH, Brunton, SL, Proctor, JL and Kutz, JN (2017) Data-driven discovery of partial differential equations. Science Advances 3 (4), e1602614.
San, O and Maulik, R (2018) Neural network closures for nonlinear model order reduction. Advances in Computational Mathematics 44 (6), 1717–1750.
Sand, HJS (1899) III. On the concentration at the electrodes in a solution, with special reference to the liberation of hydrogen by electrolysis of a mixture of copper sulphate and sulphuric acid. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 1 (1), 45–79.
Tran, M-N, Nguyen, T-N and Dao, V-H (2021) A practical tutorial on variational Bayes.
