Impact Statement
This article presents a modern solution to a critical problem in wildlife ecology using scientific machine learning (ML). We developed an ML-guided partial differential equation (PDE) method to learn pathogen dynamics. Theoretical guarantees are derived from PDEs and ML approximation theory. This method is applied to white-nose syndrome, an infectious fungal disease caused by Pseudogymnoascus destructans, which is responsible for significant declines in several bat species in North America. The presented results are relevant for disease surveillance and conservation efforts.
1. Introduction
Different anthropogenic activities are linked to an increased frequency of the emergence of wildlife infectious diseases (Daszak et al., Reference Daszak, Cunningham and Hyatt2000). These diseases can spread rapidly and usually affect hosts that lack adequate response mechanisms, contributing to the decline and extinction of important species (Woodroffe, Reference Woodroffe1999; Daszak et al., Reference Daszak, Cunningham and Hyatt2000; McCallum, Reference McCallum2012; García-March et al., Reference García-March, Tena, Henandis, Vázquez-Luis, López, Téllez, Prado, Navas, Bernal, Catanese and Grau2020; Plewnia et al., Reference Plewnia, Böning and Lötters2023). Moreover, emerging pathogens exhibit a variety of spatial spread and growth patterns due to landscape heterogeneity and geographic host distribution (Meentemeyer et al., Reference Meentemeyer, Cunniffe, Cook, Filipe, Hunter, Rizzo and Gilligan2011; Lilley et al., Reference Lilley, Anttila and Ruokolainen2018). There is a continued need to forecast prevalence and unravel complex spatial disease dynamics (Hefley et al., Reference Hefley, Hooten, Russell, Walsh and Powell2017b). Mathematical ecological models can mechanistically describe disease spread and growth mediated by landscape characteristics (Cosner, Reference Cosner2008). Machine learning (ML) models provide a flexible approach to approximating functions learning from data (Hastie et al., Reference Hastie, Tibshirani, Friedman and Friedman2009). The integration of scientific concepts from mathematical ecology with the flexibility of ML has the potential to accurately forecast disease prevalence while learning the complex spatial dynamics of disease.
The utilization of scientific knowledge for enhancing the generalization properties of ML has gained relevance in the last years. This paradigm has been presented with different names, such as theory-guided data science (Karpatne et al., Reference Karpatne, Atluri, Faghmous, Steinbach, Banerjee, Ganguly, Shekhar, Samatova and Kumar2017) or scientific ML (Dandekar et al., Reference Dandekar, Rackauckas and Barbastathis2020). Hereafter, we adopt the term scientific machine learning (SciML). Recent research in SciML has shown the benefits of integrating non-linear ML algorithms with mechanistic models framed as differential equations (Baker et al., Reference Baker, Alexander, Bremer, Hagberg, Kevrekidis, Najm, Parashar, Patra, Sethian, Wild and Willcox2019; Dandekar et al., Reference Dandekar, Rackauckas and Barbastathis2020; Rackauckas et al., Reference Rackauckas, Ma, Martensen, Warner, Zubov, Supekar, Skinner, Ramadhan and Edelman2020). The incorporation of prior knowledge of the underlying problem’s structure, through differential equations, yields an interpretable model with robust prediction and forecasting (Dandekar et al., Reference Dandekar, Rackauckas and Barbastathis2020; Rackauckas et al., Reference Rackauckas, Ma, Martensen, Warner, Zubov, Supekar, Skinner, Ramadhan and Edelman2020). However, when using external covariates to inform predictions, the functional forms of covariate impact are usually unknown. The objective of this article is to present a novel method for the flexible incorporation of external covariates into differential equations using ML algorithms. Our approach has the potential to support ongoing research in wildlife disease modeling. For illustration, we apply our method to model white-nose syndrome (WNS) in bats.
Bats have a vital role in ecosystems as pollinators, pest controllers, and nutrient recyclers (Ramírez-Fráncel et al., Reference Ramírez-Fráncel, García-Herrera, Losada-Prado, Reinoso-Flórez, Sánchez-Hernández, Estrada-Villegas, Lim and Guevara2022). Since its emergence in 2006, WNS, an infectious fungal disease caused by Pseudogymnoascus destructans (Pd), has been responsible for a significant reduction of populations of several bat species across North America (Blehert et al., Reference Blehert, Hicks, Behr, Meteyer, Berlowski-Zier, Buckles, Coleman, Darling, Gargas, Niver and Okoniewski2009; Cheng et al., Reference Cheng, Reichard, Coleman, Weller, Thogmartin, Reichert, Bennett, Broders, Campbell, Etchison and Feller2021). Recent WNS modeling approaches use mechanistic models, based on reaction–diffusion partial differential equations (PDEs), known as ecological diffusion equations (EDEs), to incorporate prior knowledge in statistical models, which helps to understand the role of habitat variation in WNS spread (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023). EDEs describe pathogen spread mediated by heterogeneous landscapes, allowing spatial covariates to affect PDE coefficients (Garlick et al., Reference Garlick, Powell, Hooten and McFarlane2011, Reference Garlick, Powell, Hooten and MacFarlane2014; Hefley et al., Reference Hefley, Hooten, Hanks, Russell and Walsh2017a,b). These EDEs mechanistically characterize spatio-temporal processes, which have benefits in forecasting compared to more classical statistical approaches for modeling spatio-temporal processes (Hefley et al., Reference Hefley, Hooten, Hanks, Russell and Walsh2017a,b). Nevertheless, all attempts to include spatial variation in EDEs have been limited to additive linear effects, which may not capture the real dynamics of these ecological processes.
In this work, we propose a novel method that leverages prior physical knowledge of ecological systems through PDEs, with flexible representations of underlying dynamics using ML algorithms. By incorporating ML algorithms, we augment EDEs. To demonstrate our approach, we use neural networks (NNs) as universal function approximators, to learn underlying unknown dynamic patterns of spreading wildlife diseases, specifically applying it to Pd in the contiguous U.S. We remark that, while in this article we use NNs, our developed approach easily allows the use of other ML algorithms, such as k-nearest neighbors, regression trees, or random forests. Finally, we derive theoretical results on the approximation error of our method that apply to any supervised ML model embedded in our EDE system.
The rest of this article is organized as follows. In Section 2, we provide some background on EDEs and define notation essential to the presentation of our ML-guided diffusion model. Section 3 describes the case study that motivated this article and presents the modeling details of the proposed method. In Section 4 we summarize and interpret the results of the WNS application. Finally, in Section 5 we discuss the results and present a conclusion. The theoretical results derived in this work can be found in Appendix B. Appendix C provides pseudocode for the derived equations in the background section. Appendix D presents pseudocode for the main method of this article.
2. Background: ecological diffusion equations
This section will serve as background material for understanding the EDEs, which have been used in multiple previous studies for wildlife diseases (Garlick et al., Reference Garlick, Powell, Hooten and McFarlane2011; Hefley et al., Reference Hefley, Hooten, Russell, Walsh and Powell2017b; Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023, Reference Oh, McGahan, Walsh and Zhu2024). We use such an equation to characterize the spread and growth of wildlife diseases over space and time, influenced by local conditions in varying landscapes. EDEs can merge two important epidemiological processes. First, we have a term that describes the spatial spread (diffusion) of the modeled pathogen density, which contains a spatially varying coefficient that is inversely related to aggregation (high diffusion means little pathogen aggregation, while low diffusion means high aggregation). Second, we have a growth component that characterizes population dynamics with a modulation coefficient (growth rate) that depends on local spatial information. Understanding the growth and spread of pathogens, particularly potential drivers of these processes, is critical for developing effective control strategies.
 Let 
 $ \mathcal{S}=\left\{\left(x,y\right)\in {\mathrm{\mathbb{R}}}^2|0\le x\le {L}_1,0\le y\le {L}_2\right\} $
 be a spatial domain of interest, and
$ \mathcal{S}=\left\{\left(x,y\right)\in {\mathrm{\mathbb{R}}}^2|0\le x\le {L}_1,0\le y\le {L}_2\right\} $
 be a spatial domain of interest, and 
 $ \mathcal{T}=\left(0,T\right]\subset {\mathrm{\mathbb{R}}}^{+} $
 be a temporal domain. We are interested in the reaction–diffusion PDE system:
$ \mathcal{T}=\left(0,T\right]\subset {\mathrm{\mathbb{R}}}^{+} $
 be a temporal domain. We are interested in the reaction–diffusion PDE system:
 $$ \frac{\partial }{\partial t}u\left(x,y,t\right)={\nabla}^2\left[\mu \left(x,y\right)u\left(x,y,t\right)\right]+\lambda \left(x,y\right)u\left(x,y,t\right), $$
$$ \frac{\partial }{\partial t}u\left(x,y,t\right)={\nabla}^2\left[\mu \left(x,y\right)u\left(x,y,t\right)\right]+\lambda \left(x,y\right)u\left(x,y,t\right), $$
 $$ \hskip-11em u\left(x,y,0\right)=\left\{\begin{array}{cc}\eta, & \mathrm{if}\left(x,y\right)=\omega \\ {}0,& \mathrm{if}\left(x,y\right)\ne \omega, \end{array}\right. $$
$$ \hskip-11em u\left(x,y,0\right)=\left\{\begin{array}{cc}\eta, & \mathrm{if}\left(x,y\right)=\omega \\ {}0,& \mathrm{if}\left(x,y\right)\ne \omega, \end{array}\right. $$
 $$ \hskip-21.4em u\left(\left(x,y\right)\in \partial \mathcal{S},t\right),=0\hskip1em \forall t>0, $$
$$ \hskip-21.4em u\left(\left(x,y\right)\in \partial \mathcal{S},t\right),=0\hskip1em \forall t>0, $$
where 
 $ u\left(x,y,t\right) $
 represents the density of a dispersing pathogen at location
$ u\left(x,y,t\right) $
 represents the density of a dispersing pathogen at location 
 $ \left(x,y\right)\in \mathcal{S} $
 and time
$ \left(x,y\right)\in \mathcal{S} $
 and time 
 $ t\in \mathcal{T} $
,
$ t\in \mathcal{T} $
, 
 $ \partial \mathcal{S} $
 represents the boundary of the spatial domain, and
$ \partial \mathcal{S} $
 represents the boundary of the spatial domain, and 
 $ \omega $
 represents the location of the initial introduction of the pathogen with initial density
$ \omega $
 represents the location of the initial introduction of the pathogen with initial density 
 $ \eta . $
 Spatial spread is captured by the Laplacian operator
$ \eta . $
 Spatial spread is captured by the Laplacian operator 
 $ {\nabla}^2=\frac{\partial^2}{\partial {x}^2}+\frac{\partial^2}{\partial {y}^2} $
. The function
$ {\nabla}^2=\frac{\partial^2}{\partial {x}^2}+\frac{\partial^2}{\partial {y}^2} $
. The function 
 $ \mu \left(x,y\right) $
 is the spatially dependent diffusion coefficient such that
$ \mu \left(x,y\right) $
 is the spatially dependent diffusion coefficient such that 
 $ 0<\mu <\infty $
, and the function
$ 0<\mu <\infty $
, and the function 
 $ \lambda \left(x,y\right) $
 is a spatially varying growth rate. In many cases, the spatial variation of both
$ \lambda \left(x,y\right) $
 is a spatially varying growth rate. In many cases, the spatial variation of both 
 $ \mu $
 and
$ \mu $
 and 
 $ \lambda $
 is at a much finer scale than the scale of variation for the population,
$ \lambda $
 is at a much finer scale than the scale of variation for the population, 
 $ u $
. Particularly, in the case of wildlife diseases, the population scale is hundreds of kilometers, and the scale of habitat variation,
$ u $
. Particularly, in the case of wildlife diseases, the population scale is hundreds of kilometers, and the scale of habitat variation, 
 $ \mu $
 and
$ \mu $
 and 
 $ \lambda $
, is meters. This mismatch creates a computational difficulty resolved by homogenization.
$ \lambda $
, is meters. This mismatch creates a computational difficulty resolved by homogenization.
2.1. Homogenization and analytic approximation
In this subsection, we describe the process of homogenization and its application to the EDEs in equations (2.1)–(2.3). Then, we provide an approximate analytical solution that offers an advantage in computation time and memory requirements compared to numerical solvers (Oh et al., Reference Oh, McGahan, Walsh and Zhu2024).
Homogenization is an asymptotic method (Holmes, Reference Holmes2012) that leverages the two different scales of interest to simplify the EDE system, (2.1)–(2.3). This multi-scale method frames the PDE on the large scale by locally averaging the fine-scale variation, producing locally constant coefficients in a PDE system with known solutions. After solving this homogenized PDE system, the solution is adjusted by the fine-scale variation.
Following known homogenization procedures (Garlick et al., Reference Garlick, Powell, Hooten and McFarlane2011, Reference Garlick, Powell, Hooten and MacFarlane2014), we derive a homogenized EDE system:
 $$ \frac{\partial }{\partial t}c\left(x,y,t\right)=\overline{\mu}\left(x,y\right){\nabla}^2c\left(x,y,t\right)+\overline{\lambda}\left(x,y\right)c\left(x,y,t\right), $$
$$ \frac{\partial }{\partial t}c\left(x,y,t\right)=\overline{\mu}\left(x,y\right){\nabla}^2c\left(x,y,t\right)+\overline{\lambda}\left(x,y\right)c\left(x,y,t\right), $$
 $$ c\left(x,y,0\right)=\left\{\begin{array}{cc}\eta \bar{\mu}\left(x,y\right),& \mathrm{if}\left(x,y\right)=\omega, \\ {}\hskip-3.5em 0,& \mathrm{if}\left(x,y\right)\ne \omega, \end{array}\right. $$
$$ c\left(x,y,0\right)=\left\{\begin{array}{cc}\eta \bar{\mu}\left(x,y\right),& \mathrm{if}\left(x,y\right)=\omega, \\ {}\hskip-3.5em 0,& \mathrm{if}\left(x,y\right)\ne \omega, \end{array}\right. $$
 $$ \hskip-10em c\left(\left(x,y\right)\in \partial \mathcal{S},t\right)=0,\hskip1em \forall t>0, $$
$$ \hskip-10em c\left(\left(x,y\right)\in \partial \mathcal{S},t\right)=0,\hskip1em \forall t>0, $$
where 
 $ c\left(x,y,t\right) $
 is the homogenized approximated solution, related to the original PDE solution,
$ c\left(x,y,t\right) $
 is the homogenized approximated solution, related to the original PDE solution, 
 $ u\left(x,y,t\right) $
, by
$ u\left(x,y,t\right) $
, by
 $$ u\left(x,y,t\right)\approx \frac{c\left(x,y,t\right)}{\mu \left(x,y\right)}. $$
$$ u\left(x,y,t\right)\approx \frac{c\left(x,y,t\right)}{\mu \left(x,y\right)}. $$
The new homogenized coefficients are locally constant averages taken over the large scale. The homogenized diffusion coefficient, 
 $ \overline{\mu}\left(x,y\right) $
, is defined as
$ \overline{\mu}\left(x,y\right) $
, is defined as
 $$ \overline{\mu}\left(x,y\right)=\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\mu \left(x,y\right)} dxdy}, $$
$$ \overline{\mu}\left(x,y\right)=\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\mu \left(x,y\right)} dxdy}, $$
and the homogenized growth rate, 
 $ \overline{\lambda}\left(x,y\right) $
, is
$ \overline{\lambda}\left(x,y\right) $
, is
 $$ \overline{\lambda}\left(x,y\right)=\frac{\overline{\mu}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{\lambda \left(x,y\right)}{\mu \left(x,y\right)} dxdy. $$
$$ \overline{\lambda}\left(x,y\right)=\frac{\overline{\mu}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{\lambda \left(x,y\right)}{\mu \left(x,y\right)} dxdy. $$
The homogenization region 
 $ \mathcal{A}\subset {\mathrm{\mathbb{R}}}^2 $
, with area
$ \mathcal{A}\subset {\mathrm{\mathbb{R}}}^2 $
, with area 
 $ \mid \mathcal{A}\mid $
, over which we calculate the homogenized parameters, is defined at an intermediate scale between the large and the small scale. Using separation of variables and the Fourier series expansion, Oh et al. (Reference Oh, McGahan, Walsh and Zhu2024) derive the following approximate solution to the system (2.4)–(2.6). Then, for sufficiently large positive integers
$ \mid \mathcal{A}\mid $
, over which we calculate the homogenized parameters, is defined at an intermediate scale between the large and the small scale. Using separation of variables and the Fourier series expansion, Oh et al. (Reference Oh, McGahan, Walsh and Zhu2024) derive the following approximate solution to the system (2.4)–(2.6). Then, for sufficiently large positive integers 
 $ M $
 and
$ M $
 and 
 $ N $
, a truncated series solution to the homogenized system is
$ N $
, a truncated series solution to the homogenized system is
 $$ {\tilde{c}}_{M,N}\left(x,y,t\right)=\sum \limits_{m=1}^M\sum \limits_{n=1}^N{c}_{m,n}\left(x,y,t\right), $$
$$ {\tilde{c}}_{M,N}\left(x,y,t\right)=\sum \limits_{m=1}^M\sum \limits_{n=1}^N{c}_{m,n}\left(x,y,t\right), $$
where
 $$ {c}_{m,n}\left(x,y,t\right)=\frac{4\eta \overline{\mu}\left(\omega \right)}{L_1{L}_2}\sin \left(\tilde{m}{\omega}_1\right)\sin \left(\tilde{n}{\omega}_2\right)\sin \left(\tilde{m}x\right)\sin \left(\tilde{n}y\right)\exp \left(\overline{\lambda}\left(x,y\right)t-\overline{\mu}\left(x,y\right)t\left({\tilde{m}}^2+{\tilde{n}}^2\right)\right), $$
$$ {c}_{m,n}\left(x,y,t\right)=\frac{4\eta \overline{\mu}\left(\omega \right)}{L_1{L}_2}\sin \left(\tilde{m}{\omega}_1\right)\sin \left(\tilde{n}{\omega}_2\right)\sin \left(\tilde{m}x\right)\sin \left(\tilde{n}y\right)\exp \left(\overline{\lambda}\left(x,y\right)t-\overline{\mu}\left(x,y\right)t\left({\tilde{m}}^2+{\tilde{n}}^2\right)\right), $$
with 
 $ \tilde{m}= m\pi /{L}_1 $
,
$ \tilde{m}= m\pi /{L}_1 $
, 
 $ \tilde{n}= n\pi /{L}_2 $
, and
$ \tilde{n}= n\pi /{L}_2 $
, and 
 $ \omega =\left({\omega}_1,{\omega}_2\right) $
. Finally, the analytical approximated solution to
$ \omega =\left({\omega}_1,{\omega}_2\right) $
. Finally, the analytical approximated solution to 
 $ u\left(x,y,t\right) $
, the solution of the EDE system (2.1)–(2.3), is
$ u\left(x,y,t\right) $
, the solution of the EDE system (2.1)–(2.3), is
 $$ {\tilde{u}}_{M,N}\left(x,y,t\right)=\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\mu \left(x,y\right)}. $$
$$ {\tilde{u}}_{M,N}\left(x,y,t\right)=\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\mu \left(x,y\right)}. $$
Refer to Oh et al. (Reference Oh, McGahan, Walsh and Zhu2024) for a comparison of the analytical approximate solution with the commonly used Forward Time Centered Space solver method (finite differences), showing the computational benefits provided by the analytical approximate solution. Appendix C presents pseudocode for this section.
3. Methodology
In this section of this article, we will present the WNS surveillance data. Then, we introduce the ML-guided PDE method for ecological diffusion, which is the main contribution of the presented work. Finally, we specify the model and describe its fitting process.
3.1. White-nose syndrome data
The dataset used in this article has geo-referenced samples collected across the contiguous U.S. by state and federal agencies and tested for the presence of Pd at the U.S. Geological Survey National Wildlife Health Center (Madison, WI) between April 2007 and April 2020 (Ballmann et al., Reference Ballmann, Grider and Russell2021). Samples were taken from bat carcasses, bat wing punch biopsies, forearm or muzzle skin, wing swabs, guano, roost substrate swabs, cave (wall/ceiling) swabs, and soil, all of them tested for Pd infection using a quantitative PCR (Muller et al., Reference Muller, Lorch, Lindner, O’Connor, Gargas and Blehert2013). We use bat samples and environmental samples, as Pd is known to persist in the environment (Lorch et al., Reference Lorch, Lindner, Gargas, Muller, Minnis and Blehert2013), to model the presence or absence of Pd rather than the disease (WNS) that this pathogen causes.
 Two outbreaks of the disease have been reported: one in New York in 2006 and another in Washington in 2016. However, we only model the data from the New York outbreak by excluding all positive samples in Washington state. We have the spatial coordinates (latitude and longitude) of 18,515 negative and 1,557 positive observations (Figure 1). Additionally, we consider five spatial covariates that are believed to influence Pd dynamics (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023): percentage of tree canopy cover (canopy), linear hydrography including streams/rivers, braided streams, canals, ditches, artificial paths, and aqueducts (waterways), topographic ruggedness index which is related to the magnitude of elevation (TRI), number of coal mines per 
 $ 10 $
 km
$ 10 $
 km 
 $ \times 10 $
 km grid cell (mines), and percentage of karst geomorphology (karst). See Additional Figure A1 in Appendix A. We transformed these covariates to lie in the interval
$ \times 10 $
 km grid cell (mines), and percentage of karst geomorphology (karst). See Additional Figure A1 in Appendix A. We transformed these covariates to lie in the interval 
 $ \left[0,1\right] $
. The covariates were selected following recommendations from bat biologists on the Strategic Pd Surveillance Advisory Team and have been used in previous modeling efforts (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023).
$ \left[0,1\right] $
. The covariates were selected following recommendations from bat biologists on the Strategic Pd Surveillance Advisory Team and have been used in previous modeling efforts (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023).

Figure 1. White-nose syndrome (WNS) is an infectious fungal disease in bats caused by Pseudogymnoascus destructans. Presented here are the geographic locations where WNS samples were collected by the USGS WNS surveillance team between 2006 and 2016. There were 1,557 positive tests, represented by the colored dots, with lighter colors corresponding to more recently taken samples and darker colors corresponding to earlier samples. There were 18,515 negative samples, represented with small black dots.
3.2. Machine learning-guided partial differential equations
 Using the notation from Section 2, the analytical approximated solution 
 $ {\tilde{u}}_{M,N}\left(x,y,t\right) $
 in (2.11) to the PDE system (2.1)–(2.3) depends on the two functions
$ {\tilde{u}}_{M,N}\left(x,y,t\right) $
 in (2.11) to the PDE system (2.1)–(2.3) depends on the two functions 
 $ \mu \left(x,y\right) $
 and
$ \mu \left(x,y\right) $
 and 
 $ \lambda \left(x,y\right) $
, which are the unknown spatial coefficients. Recent research has shown that NNs can be used as function approximators to learn components of differential equation systems (Dandekar et al., Reference Dandekar, Rackauckas and Barbastathis2020; Rackauckas et al., Reference Rackauckas, Ma, Martensen, Warner, Zubov, Supekar, Skinner, Ramadhan and Edelman2020). Following this principle, we propose the use of ML algorithms to learn unknown dynamic patterns related to the diffusion (
$ \lambda \left(x,y\right) $
, which are the unknown spatial coefficients. Recent research has shown that NNs can be used as function approximators to learn components of differential equation systems (Dandekar et al., Reference Dandekar, Rackauckas and Barbastathis2020; Rackauckas et al., Reference Rackauckas, Ma, Martensen, Warner, Zubov, Supekar, Skinner, Ramadhan and Edelman2020). Following this principle, we propose the use of ML algorithms to learn unknown dynamic patterns related to the diffusion (
 $ \mu $
) and growth (
$ \mu $
) and growth (
 $ \lambda $
) in our system. Let
$ \lambda $
) in our system. Let 
 $ {NN}_{\theta_1}\left(z\left(x,y\right)\right) $
 and
$ {NN}_{\theta_1}\left(z\left(x,y\right)\right) $
 and 
 $ {NN}_{\theta_2}\left(z\left(x,y\right)\right) $
 be two supervised ML algorithms with trainable parameters
$ {NN}_{\theta_2}\left(z\left(x,y\right)\right) $
 be two supervised ML algorithms with trainable parameters 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
 respectively, taking an input vector
$ {\theta}_2 $
 respectively, taking an input vector 
 $ z\left(x,y\right)\in {\mathrm{\mathbb{R}}}^p $
 of spatial covariates. We consider
$ z\left(x,y\right)\in {\mathrm{\mathbb{R}}}^p $
 of spatial covariates. We consider 
 $ {NN}_{\theta_1} $
 to approximate
$ {NN}_{\theta_1} $
 to approximate 
 $ \log \mu $
 (instead of
$ \log \mu $
 (instead of 
 $ \mu $
 directly to ensure positivity), and
$ \mu $
 directly to ensure positivity), and 
 $ {NN}_{\theta_2} $
 to approximate
$ {NN}_{\theta_2} $
 to approximate 
 $ \lambda $
. Additionally, considering that we have a set of observed values
$ \lambda $
. Additionally, considering that we have a set of observed values 
 $ {u}_{\mathrm{data}}={\left\{{x}_i,{y}_i,{t}_i,{u}_i\right\}}_i^n $
 that will be modeled using (2.1)–(2.3), let
$ {u}_{\mathrm{data}}={\left\{{x}_i,{y}_i,{t}_i,{u}_i\right\}}_i^n $
 that will be modeled using (2.1)–(2.3), let 
 $ \mathrm{\ell}\left({\tilde{u}}_{M,N},{u}_{\mathrm{data}}|\theta \right) $
 be a loss function depending on
$ \mathrm{\ell}\left({\tilde{u}}_{M,N},{u}_{\mathrm{data}}|\theta \right) $
 be a loss function depending on 
 $ \theta =\left({\theta}_1,{\theta}_2\right) $
 through
$ \theta =\left({\theta}_1,{\theta}_2\right) $
 through 
 $ {\tilde{u}}_{M,N} $
, that measures how well the approximated solution matches WNS surveillance data. We are interested in finding the set of parameters that minimize our loss function, i.e.,
$ {\tilde{u}}_{M,N} $
, that measures how well the approximated solution matches WNS surveillance data. We are interested in finding the set of parameters that minimize our loss function, i.e.,
 $$ \hat{\theta}=\arg \hskip0.3em \underset{\theta }{\min}\mathrm{\ell}\left({\tilde{u}}_{M,N},{u}_{\mathrm{data}}|\theta \right). $$
$$ \hat{\theta}=\arg \hskip0.3em \underset{\theta }{\min}\mathrm{\ell}\left({\tilde{u}}_{M,N},{u}_{\mathrm{data}}|\theta \right). $$
Hereafter, we assume that 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 are two feedforward artificial NNs with trainable weights and biases
$ {NN}_{\theta_2} $
 are two feedforward artificial NNs with trainable weights and biases 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
, respectively. However, we remark that any two ML algorithms can be used, and they do not need to be the same. See Appendix B for the theoretical guarantees for the approximation error for
$ {\theta}_2 $
, respectively. However, we remark that any two ML algorithms can be used, and they do not need to be the same. See Appendix B for the theoretical guarantees for the approximation error for 
 $ u\left(x,y,t\right) $
 under the proposed procedure. We showed that the approximation error will be small as long as: (1) homogenization assumptions are satisfied; (2) the truncation bounds
$ u\left(x,y,t\right) $
 under the proposed procedure. We showed that the approximation error will be small as long as: (1) homogenization assumptions are satisfied; (2) the truncation bounds 
 $ M $
 and
$ M $
 and 
 $ N $
 on the approximate solution are sufficiently large; and (3) the ML algorithms are properly trained. Finally, note that the function
$ N $
 on the approximate solution are sufficiently large; and (3) the ML algorithms are properly trained. Finally, note that the function 
 $ \mathrm{\ell} $
 can be the mean squared error or the negative log-likelihood associated with an assumed distribution in our data (e.g., binomial distribution for binary observations), in which case we would need to perform a post-processing transformation on
$ \mathrm{\ell} $
 can be the mean squared error or the negative log-likelihood associated with an assumed distribution in our data (e.g., binomial distribution for binary observations), in which case we would need to perform a post-processing transformation on 
 $ {\tilde{u}}_{M,N} $
 using a link function.
$ {\tilde{u}}_{M,N} $
 using a link function.
3.3. Model specification and fitting
 Herein, we describe our statistical model for wildlife diseases, which will be applied to the WNS data. Note that, previously, we assumed data were direct observations of 
 $ u\left(x,y,t\right) $
 however, in the WNS data, we observe a function of
$ u\left(x,y,t\right) $
 however, in the WNS data, we observe a function of 
 $ u\left(x,y,t\right) $
, namely, the binary observations of tests for the presence of a pathogen. Thus, we connect the observed binary response to the solution of our PDE by the logit link function defined as
$ u\left(x,y,t\right) $
, namely, the binary observations of tests for the presence of a pathogen. Thus, we connect the observed binary response to the solution of our PDE by the logit link function defined as 
 $ {g}^{-1}(a)=\exp (a)/\left(1+\exp (a)\right) $
. Let
$ {g}^{-1}(a)=\exp (a)/\left(1+\exp (a)\right) $
. Let 
 $ {v}_i $
 represent the binary response variable (1 for infected, 0 otherwise) sampled at location
$ {v}_i $
 represent the binary response variable (1 for infected, 0 otherwise) sampled at location 
 $ \left({x}_i,{y}_i\right) $
 and time
$ \left({x}_i,{y}_i\right) $
 and time 
 $ {t}_i $
 for
$ {t}_i $
 for 
 $ i=1,\dots, n $
. We denote
$ i=1,\dots, n $
. We denote 
 $ {u}_{\mathrm{data}}={\left\{{x}_i,{y}_i,{t}_i,{v}_i\right\}}_i^n $
 and consider the model:
$ {u}_{\mathrm{data}}={\left\{{x}_i,{y}_i,{t}_i,{v}_i\right\}}_i^n $
 and consider the model:
 $$ \hskip-8.43em {v}_i\sim \mathrm{Bernoulli}\left({p}_i\right), $$
$$ \hskip-8.43em {v}_i\sim \mathrm{Bernoulli}\left({p}_i\right), $$
 $$ \hskip-7.5em {p}_i={g}^{-1}\left(u\left({x}_i,{y}_i,{t}_i\right)\right), $$
$$ \hskip-7.5em {p}_i={g}^{-1}\left(u\left({x}_i,{y}_i,{t}_i\right)\right), $$
 $$ \hskip0.43em \frac{\partial }{\partial t}u\left(x,y,t\right)={\nabla}^2\left[\mu \left(x,y\right)u\left(x,y,t\right)\right]+\lambda \left(x,y\right)u\left(x,y,t\right), $$
$$ \hskip0.43em \frac{\partial }{\partial t}u\left(x,y,t\right)={\nabla}^2\left[\mu \left(x,y\right)u\left(x,y,t\right)\right]+\lambda \left(x,y\right)u\left(x,y,t\right), $$
 $$ \hskip-13.2em u\left(x,y,0\right)=\left\{\begin{array}{cc}\eta, & \mathrm{if}\left(x,y\right)=\omega, \\ {}0,& \mathrm{if}\left(x,y\right)\ne \omega, \end{array}\right. $$
$$ \hskip-13.2em u\left(x,y,0\right)=\left\{\begin{array}{cc}\eta, & \mathrm{if}\left(x,y\right)=\omega, \\ {}0,& \mathrm{if}\left(x,y\right)\ne \omega, \end{array}\right. $$
 $$ \hskip-21.43em u\left(\left(x,y\right)\in \partial \mathcal{S},t\right),=0\hskip1em \forall t>0, $$
$$ \hskip-21.43em u\left(\left(x,y\right)\in \partial \mathcal{S},t\right),=0\hskip1em \forall t>0, $$
 $$ \hskip-14em \mathrm{\log}\mu \left(x,y\right)=N{N}_{\theta_1}\left(z\left(x,y\right)\right), $$
$$ \hskip-14em \mathrm{\log}\mu \left(x,y\right)=N{N}_{\theta_1}\left(z\left(x,y\right)\right), $$
 $$ \hskip-11.3em \lambda \left(x,y\right)={NN}_{\theta_2}\left(z\left(x,y\right)\right), $$
$$ \hskip-11.3em \lambda \left(x,y\right)={NN}_{\theta_2}\left(z\left(x,y\right)\right), $$
where 
 $ {p}_i=P\left({v}_i=1\right) $
 is the probability of observing a positive outcome,
$ {p}_i=P\left({v}_i=1\right) $
 is the probability of observing a positive outcome, 
 $ g $
 is the logit link function, and
$ g $
 is the logit link function, and 
 $ u $
 is the solution to the PDE system (3.3)–(3.5) with
$ u $
 is the solution to the PDE system (3.3)–(3.5) with 
 $ \mu \left(x,y\right) $
 representing the spatially varying diffusion coefficient, and
$ \mu \left(x,y\right) $
 representing the spatially varying diffusion coefficient, and 
 $ \lambda \left(x,y\right) $
 the spatially varying growth rate.
$ \lambda \left(x,y\right) $
 the spatially varying growth rate. 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 represent two ML algorithms with trainable parameters
$ {NN}_{\theta_2} $
 represent two ML algorithms with trainable parameters 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
, respectively, and
$ {\theta}_2 $
, respectively, and 
 $ z\left(x,y\right) $
 is the 5-dimensional input vector of spatial covariates described at the beginning of this section. The location of the introduction point,
$ z\left(x,y\right) $
 is the 5-dimensional input vector of spatial covariates described at the beginning of this section. The location of the introduction point, 
 $ \omega $
, and the initial density,
$ \omega $
, and the initial density, 
 $ \eta $
, in equations (2.1)–(2.3) were assumed to be known, using estimates from a recent study (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023). Assuming
$ \eta $
, in equations (2.1)–(2.3) were assumed to be known, using estimates from a recent study (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023). Assuming 
 $ {\tilde{u}}_{M,N} $
 approximates
$ {\tilde{u}}_{M,N} $
 approximates 
 $ u $
 and denoting
$ u $
 and denoting 
 $ {\hat{p}}_i={g}^{-1}\left({\tilde{u}}_{M,N}\left({x}_i,{y}_i,{t}_i\right)\right) $
, we minimize the negative log-likelihood
$ {\hat{p}}_i={g}^{-1}\left({\tilde{u}}_{M,N}\left({x}_i,{y}_i,{t}_i\right)\right) $
, we minimize the negative log-likelihood
 $$ \mathrm{\ell}\left({\tilde{u}}_{M,N},{u}_{\mathrm{data}}|\theta \right)=-\frac{1}{n}\sum \limits_{i=1}^n{v}_i\log \left({\hat{p}}_i\right)+\left(1-{v}_i\right)\log \left(1-{\hat{p}}_i\right). $$
$$ \mathrm{\ell}\left({\tilde{u}}_{M,N},{u}_{\mathrm{data}}|\theta \right)=-\frac{1}{n}\sum \limits_{i=1}^n{v}_i\log \left({\hat{p}}_i\right)+\left(1-{v}_i\right)\log \left(1-{\hat{p}}_i\right). $$
In Figure 2, we illustrate how NN predictions inform our PDE model during model fitting. The two NNs 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 have the same architecture with
$ {NN}_{\theta_2} $
 have the same architecture with 
 $ 2 $
 hidden layers,
$ 2 $
 hidden layers, 
 $ 512 $
 neurons in each hidden layer, and ReLU as the activation functions. The NNs were defined using the Tensorflow (TF) environment in Python. Regarding optimization, we use Adam optimizer with a scheduled learning rate starting at
$ 512 $
 neurons in each hidden layer, and ReLU as the activation functions. The NNs were defined using the Tensorflow (TF) environment in Python. Regarding optimization, we use Adam optimizer with a scheduled learning rate starting at 
 $ 0.0005 $
, with a decay rate of
$ 0.0005 $
, with a decay rate of 
 $ 0.96 $
 per epoch. We observed a challenging optimization problem when the weights and biases are randomly initialized, specifically for the NN
$ 0.96 $
 per epoch. We observed a challenging optimization problem when the weights and biases are randomly initialized, specifically for the NN 
 $ {NN}_{\theta_2} $
 dedicated to predicting
$ {NN}_{\theta_2} $
 dedicated to predicting 
 $ \log \left(\mu \right) $
. Therefore, we suggest a careful initialization of the bias of the output layer following the estimation of the wavefront speed heuristic presented in Shigesada and Kawasaki (Reference Shigesada and Kawasaki1997). To do this, we let
$ \log \left(\mu \right) $
. Therefore, we suggest a careful initialization of the bias of the output layer following the estimation of the wavefront speed heuristic presented in Shigesada and Kawasaki (Reference Shigesada and Kawasaki1997). To do this, we let 
 $ \mathcal{P} $
 represent the collection of the
$ \mathcal{P} $
 represent the collection of the 
 $ P $
 indexes corresponding to the positive samples in the training dataset, then the bias initialization in the output layer of
$ P $
 indexes corresponding to the positive samples in the training dataset, then the bias initialization in the output layer of 
 $ {NN}_{\theta_2} $
 is
$ {NN}_{\theta_2} $
 is
 $$ \log \left(\frac{1}{P}\sum \limits_{i\in \mathcal{P}}\frac{d\left(\left({x}_i,{y}_i\right),\left({\omega}_1,{\omega}_2\right)\right)}{\pi {t}_i}\right)\approx 22.2, $$
$$ \log \left(\frac{1}{P}\sum \limits_{i\in \mathcal{P}}\frac{d\left(\left({x}_i,{y}_i\right),\left({\omega}_1,{\omega}_2\right)\right)}{\pi {t}_i}\right)\approx 22.2, $$
where 
 $ \left({\omega}_1,{\omega}_2\right) $
 is the coordinate pair for the introduction point,
$ \left({\omega}_1,{\omega}_2\right) $
 is the coordinate pair for the introduction point, 
 $ \left({x}_i,{y}_i\right) $
 is the coordinate pair for the
$ \left({x}_i,{y}_i\right) $
 is the coordinate pair for the 
 $ i- $
th positive sample,
$ i- $
th positive sample, 
 $ {t}_i $
 is the time the
$ {t}_i $
 is the time the 
 $ i- $
th positive sample was taken, and
$ i- $
th positive sample was taken, and 
 $ d\left(\cdot, \cdot \right) $
 represents the Euclidean distance. See Appendix D for implementation details of model fitting.
$ d\left(\cdot, \cdot \right) $
 represents the Euclidean distance. See Appendix D for implementation details of model fitting.

Figure 2. Diagram of neural networks (NNs) informing the partial differential equation (PDE) to model pathogen spread and growth. The covariates, tree canopy cover (canopy), linear hydrography (waterways), topographic ruggedness index (TRI), number of coal mines (mines), and karst geomorphology (karst) are used as input. The NNs (red and blue boxes) take these covariates and predict the unknown spatially varying log-diffusion 
 $ \left(\log \mu \right) $
 and growth
$ \left(\log \mu \right) $
 and growth 
 $ \left(\lambda \right) $
 coefficients which characterize the PDE. The solution of the PDE, given the predicted coefficients, is post-processed and contrasted with the observed data using a loss function. The loss guides the training process to refine the NN’s predictions.
$ \left(\lambda \right) $
 coefficients which characterize the PDE. The solution of the PDE, given the predicted coefficients, is post-processed and contrasted with the observed data using a loss function. The loss guides the training process to refine the NN’s predictions.
3.4. Validation and comparison with PDE with spatial additive linear effects
 For model evaluation in a forecasting task, we left the last 12 months of samples as test data. Thus, we evaluated our model’s ability to detect the Pd pathogen by measuring the area under the Receiver Operating Characteristics curve (ROC-AUC) and the area under the Precision/Recall curve (PR-AUC) metrics in both training and testing datasets. We chose these metrics according to Saito and Rehmsmeier (Reference Saito and Rehmsmeier2015) recommendations for imbalanced classification problems. The baseline for ROC-AUC is 
 $ 0.5 $
, meaning that a random classifier would obtain
$ 0.5 $
, meaning that a random classifier would obtain 
 $ 0.5 $
 in this metric. For PR-AUC, the baseline depends on the positive/negative ratio: baseline = (number of positives)/(number of positives + number of negatives). Therefore, for the training dataset, the baseline was
$ 0.5 $
 in this metric. For PR-AUC, the baseline depends on the positive/negative ratio: baseline = (number of positives)/(number of positives + number of negatives). Therefore, for the training dataset, the baseline was 
 $ 0.084 $
, while for the testing dataset, the baseline was
$ 0.084 $
, while for the testing dataset, the baseline was 
 $ 0.044 $
.
$ 0.044 $
.
To contrast our proposed method with the existing efforts on EDE for WNS modeling, we considered an alternative version of the model that uses the PDE with spatial additive linear effects. We assumed linear functional forms for the diffusion coefficient
 $$ \log \mu \left(x,y\right)={\alpha}_0+{\alpha}^{\top }z\left(x,y\right), $$
$$ \log \mu \left(x,y\right)={\alpha}_0+{\alpha}^{\top }z\left(x,y\right), $$
and growth rate
 $$ \lambda \left(x,y\right)={\gamma}_0+{\gamma}^{\top }z\left(x,y\right). $$
$$ \lambda \left(x,y\right)={\gamma}_0+{\gamma}^{\top }z\left(x,y\right). $$
where 
 $ {\alpha}_0 $
 and
$ {\alpha}_0 $
 and 
 $ {\gamma}_0 $
 are intercepts, and
$ {\gamma}_0 $
 are intercepts, and 
 $ \alpha $
 and
$ \alpha $
 and 
 $ \gamma $
 are vector coefficients are associated with the spatial covariate vector
$ \gamma $
 are vector coefficients are associated with the spatial covariate vector 
 $ z\left(x,y\right) $
. This approach is similar to the ones presented in Oh et al. (Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023) and Oh et al. (Reference Oh, McGahan, Walsh and Zhu2024). Hereafter, this model will be called the EDE with spatial additive linear effects model.
$ z\left(x,y\right) $
. This approach is similar to the ones presented in Oh et al. (Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023) and Oh et al. (Reference Oh, McGahan, Walsh and Zhu2024). Hereafter, this model will be called the EDE with spatial additive linear effects model.
4. Results
 After fitting the model, we obtained a characterization of the prevalence of Pd, i.e., we can obtain 
 $ {\hat{p}}_i={g}^{-1}\left({\tilde{u}}_{M,N}\left(x,y,t\right)\right) $
 at any time
$ {\hat{p}}_i={g}^{-1}\left({\tilde{u}}_{M,N}\left(x,y,t\right)\right) $
 at any time 
 $ t $
 and space point
$ t $
 and space point 
 $ \left(x,y\right) $
. Additionally, we learned two functions
$ \left(x,y\right) $
. Additionally, we learned two functions 
 $ \log \hat{\mu}\left(x,y\right)={NN}_{{\hat{\theta}}_1}\left(z\left(x,y\right)\right) $
 and
$ \log \hat{\mu}\left(x,y\right)={NN}_{{\hat{\theta}}_1}\left(z\left(x,y\right)\right) $
 and 
 $ \hat{\lambda}\left(x,y\right)={NN}_{{\hat{\theta}}_2}\left(z\left(x,y\right)\right) $
 that can help agencies target management actions by allowing them to understand where the disease can spread faster (
$ \hat{\lambda}\left(x,y\right)={NN}_{{\hat{\theta}}_2}\left(z\left(x,y\right)\right) $
 that can help agencies target management actions by allowing them to understand where the disease can spread faster (
 $ \mu $
) and where it is expected to grow more readily (
$ \mu $
) and where it is expected to grow more readily (
 $ \lambda $
). Figure 3 shows the probability of infection maps for April 2010 and April 2020. We observe a heterogeneous diffusive behavior, which is important for differentiating the impact of Pd reaching different zones. Additionally, the upper maps of Figure 3 depict a process with a defined boundary between high and low probabilities of infection, which may enable management agencies to allocate testing resources based on the likelihood of detecting the pathogen (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023).
$ \lambda $
). Figure 3 shows the probability of infection maps for April 2010 and April 2020. We observe a heterogeneous diffusive behavior, which is important for differentiating the impact of Pd reaching different zones. Additionally, the upper maps of Figure 3 depict a process with a defined boundary between high and low probabilities of infection, which may enable management agencies to allocate testing resources based on the likelihood of detecting the pathogen (Oh et al., Reference Oh, Aravamuthan, Ma, Mandujano Reyes, Ballmann, Hefley, McGahan, Russell, Walsh and Zhu2023).

Figure 3. Machine learning-guided partial differential equation approximations of the probability of presence of Pseudogymnoascus destructans (Pd) (upper) vs. observed data (lower) for April 2010 (left) and April 2020 (right). The observed data color represents time (earlier times have a darker color). We observe a heterogeneous diffusive behavior, which is important for differentiating the impact of Pd reaching different zones.
 The training dataset (samples up to April 2019, Pd imbalanced ratio 
 $ =0.084 $
) obtained a PR-AUC
$ =0.084 $
) obtained a PR-AUC 
 $ =0.359\pm 0.088 $
 and ROC-AUC
$ =0.359\pm 0.088 $
 and ROC-AUC 
 $ =0.849\pm 0.070 $
, for testing data (samples after April 2019, Pd imbalanced ratio
$ =0.849\pm 0.070 $
, for testing data (samples after April 2019, Pd imbalanced ratio 
 $ =0.044 $
), we obtained PR-AUC
$ =0.044 $
), we obtained PR-AUC 
 $ =0.085\pm 0.010 $
 and ROC-AUC
$ =0.085\pm 0.010 $
 and ROC-AUC 
 $ =0.729\pm 0.024 $
. The ROC-AUC value represents the probability of correctly predicting a random positive and negative example. Thus, the expected ROC-AUC of a random classifer is 0.5. The higher the ROC-AUC, the better the model’s ability to distinguish between positive and negative samples. The Pd imbalanced ratio is the PR-AUC baseline, which can be interpreted as the expected performance of a random classifier for each dataset. Therefore, considering the average PR-AUC, we perform about
$ =0.729\pm 0.024 $
. The ROC-AUC value represents the probability of correctly predicting a random positive and negative example. Thus, the expected ROC-AUC of a random classifer is 0.5. The higher the ROC-AUC, the better the model’s ability to distinguish between positive and negative samples. The Pd imbalanced ratio is the PR-AUC baseline, which can be interpreted as the expected performance of a random classifier for each dataset. Therefore, considering the average PR-AUC, we perform about 
 $ 4 $
 times better than a random classifier for the training dataset (PR-AUC
$ 4 $
 times better than a random classifier for the training dataset (PR-AUC 
 $ =4\times \mathrm{Pd}\;\mathrm{imbalanced}\ \mathrm{ratio} $
 in training data), and
$ =4\times \mathrm{Pd}\;\mathrm{imbalanced}\ \mathrm{ratio} $
 in training data), and 
 $ 2 $
 times better than a random classifier for the testing dataset (PR-AUC
$ 2 $
 times better than a random classifier for the testing dataset (PR-AUC 
 $ =2\times \mathrm{Pd}\;\mathrm{imbalanced}\ \mathrm{ratio} $
 in testing data). Table 1 summarizes the validation metric results of 30 runs of our model with median, mean, and standard deviation compared with the linear model. In all metrics, our ML-guided PDE method outperformed the PDE with a spatial additive linear effects model.
$ =2\times \mathrm{Pd}\;\mathrm{imbalanced}\ \mathrm{ratio} $
 in testing data). Table 1 summarizes the validation metric results of 30 runs of our model with median, mean, and standard deviation compared with the linear model. In all metrics, our ML-guided PDE method outperformed the PDE with a spatial additive linear effects model.
Table 1. Validation metrics on Pseudogymnoascus destructans (Pd) detection for our machine learning-guided partial differential equation (PDE) method (machine leaning [ML]-guided) and the PDE with spatial additive linear effects model (linear) for the train and test datasets

a Area under the receiver operating characteristics curve.
b Area under the precision/recall curve.
 Although improving computation time was not an objective of this development, we report the computation time of our experiments. The PDE model with spatial additive linear effects was completed in an average 
 $ 19.76\pm 6.63 $
 minutes, whereas the ML-guided PDE method completed
$ 19.76\pm 6.63 $
 minutes, whereas the ML-guided PDE method completed 
 $ 14.16\pm 4.33 $
 minutes. Both approaches were trained using the same computational characteristics (Google Colab, RAM 12.7 GB, Disk 78.2 GB, and 2 CPU cores). Additionally, we note that the average loss function value (negative log-likelihood) in the test dataset for our ML-guided PDE method was
$ 14.16\pm 4.33 $
 minutes. Both approaches were trained using the same computational characteristics (Google Colab, RAM 12.7 GB, Disk 78.2 GB, and 2 CPU cores). Additionally, we note that the average loss function value (negative log-likelihood) in the test dataset for our ML-guided PDE method was 
 $ 0.705\pm 0.015 $
, and for PDE with spatial additive linear effects model was
$ 0.705\pm 0.015 $
, and for PDE with spatial additive linear effects model was 
 $ 0.720\pm 0.034 $
 (Figure 4).
$ 0.720\pm 0.034 $
 (Figure 4).

Figure 4. Left: loss function value for test dataset in our machine learning-guided partial differential equation (PDE) method (machine learning [ML]-guided) vs. the PDE with spatial additive linear effects model (linear). Right: training computation time (in seconds) for ML-guided vs. linear model.
 Furthermore, we did a literature review to externally validate our outcomes with the reported spread and growth of WNS in previous studies. Our NN predictions (upper left, Figure 5) presented a high diffusion and low growth in the karst landscapes which aligns with reported findings about the spread of Pd in the karst regions of Appalachian Mountains (Maher et al., Reference Maher, Kramer, Pulliam, Zokan, Bowden, Barton, Magori and Drake2012; Hoyt et al., Reference Hoyt, Kilpatrick and Langwig2021) (arrow 1 in zoomed states in Figure A2 in Appendix A). Similar behavior can be observed with the linear prediction (lower left, Figure 5), but with higher values. In the NN predictions (upper panels in Figure 5), we observed a color pattern characterizing an area with relatively high diffusion and low growth in northern Florida, across Georgia and South Carolina states (arrow 2 in zoomed states in Figure A2 in Appendix A). Such a pattern may indicate a predicted natural barrier because the combination of high diffusion (low pathogen residence time) and low growth rates would result in no aggregation of pathogens in this area. This finding supports the hypothesis of this area being the southern limit of Pd distribution (Hoyt et al., Reference Hoyt, Kilpatrick and Langwig2021). The linear predictor (lower panels in Figure 5) does not match with this hypothesis, as it shows a low growth area with a relatively low diffusion. Conversely, Pd is known to grow best at low temperatures (12.5–15.8
 $ {}^{\circ } $
C) (Verant et al., Reference Verant, Boyles, Waldrep, Wibbelt and and Blehert2012), and our approximated growth coefficient (in both models) presents high values in the north-central region, which matches zones of low monthly average temperatures.
$ {}^{\circ } $
C) (Verant et al., Reference Verant, Boyles, Waldrep, Wibbelt and and Blehert2012), and our approximated growth coefficient (in both models) presents high values in the north-central region, which matches zones of low monthly average temperatures.

Figure 5. Maps with approximated log-diffusion coefficient 
 $ \log \mu $
 and growth coefficient
$ \log \mu $
 and growth coefficient 
 $ \lambda $
 from neural networks (upper) and linear function (lower). Color scales are different to allow easier visualization of details in each figure. Approximated diffusion and growth values are only interpretable within the boundaries of the continental USA. Predicted values are not interpretable for large water bodies (e.g., the ocean and the Great Lakes).
$ \lambda $
 from neural networks (upper) and linear function (lower). Color scales are different to allow easier visualization of details in each figure. Approximated diffusion and growth values are only interpretable within the boundaries of the continental USA. Predicted values are not interpretable for large water bodies (e.g., the ocean and the Great Lakes).
Figure 6 was created to explore the non-linear relationships present in the diffusion and growth coefficients learned by the NNs and compare them with the linear models. We plotted log-diffusion and growth as a function of each covariate, keeping the remaining covariates fixed. We let one covariate vary at a time and fixed the other covariates at the 0.5 level. In general, we observed that the functional forms for all covariates in diffusion and growth were close to a second-degree polynomial, and in a few cases, a third-degree polynomial for the NN. The growth coefficient showed a more complex behavior compared with the diffusion one. Nevertheless, we remark that these relationships should not be over-interpreted.

Figure 6. Functional relationship between each variable versus the log-diffusion coefficient 
 $ \log \mu $
 (upper) and growth coefficient
$ \log \mu $
 (upper) and growth coefficient 
 $ \lambda $
 (lower) from neural networks (NNs) (right) and a linear model (left). Each covariate is varied while the remaining covariates are fixed at 0.5. Note the different values on the y-axis between linear and NN models.
$ \lambda $
 (lower) from neural networks (NNs) (right) and a linear model (left). Each covariate is varied while the remaining covariates are fixed at 0.5. Note the different values on the y-axis between linear and NN models.
 Finally, to provide interpretation from our results, we leveraged the SHAP (SHapley Additive exPlanations) values technique (Lundberg and Lee, Reference Lundberg, Lee, Guyon, Luxburg, Bengio, Wallach, Fergus, Vishwanathan and Garnett2017). SHAP values are used to explain the covariate contributions to the predictions of each observation. In particular, we used the Deep SHAP approach, which gives a high-speed approximation for SHAP values in deep learning models. Figure 7 displays bee swarm plots of SHAP values for 
 $ \mathrm{5,000} $
 randomly sampled locations for log-diffusion and growth NN models. In these figures, covariates are ranked from top to bottom by their mean absolute SHAP value. For each covariate, each of the randomly sampled locations has a point distributed along the horizontal axis by its SHAP value. SHAP values with high density are represented by stacking the points vertically. For each point, the color represents its raw value. Therefore, in Figure 7, we observe that canopy cover is the most influential covariate for log-diffusion and growth. Higher values of canopy are associated with lower values of diffusion, while lower values of canopy are associated with higher values of diffusion. Similarly, large values of karst have a negative association with diffusion, and lower values of karst are associated with higher diffusion rates. For Pd growth, higher values of karst have a positive association, while lower values have a negative association. Lastly, large values of canopy are associated with low growth rates, and lower canopy values have a positive association with growth.
$ \mathrm{5,000} $
 randomly sampled locations for log-diffusion and growth NN models. In these figures, covariates are ranked from top to bottom by their mean absolute SHAP value. For each covariate, each of the randomly sampled locations has a point distributed along the horizontal axis by its SHAP value. SHAP values with high density are represented by stacking the points vertically. For each point, the color represents its raw value. Therefore, in Figure 7, we observe that canopy cover is the most influential covariate for log-diffusion and growth. Higher values of canopy are associated with lower values of diffusion, while lower values of canopy are associated with higher values of diffusion. Similarly, large values of karst have a negative association with diffusion, and lower values of karst are associated with higher diffusion rates. For Pd growth, higher values of karst have a positive association, while lower values have a negative association. Lastly, large values of canopy are associated with low growth rates, and lower canopy values have a positive association with growth.

Figure 7. Bee swarm plots of SHapley Additive exPlanations (SHAP) values from 
 $ \mathrm{5,000} $
 randomly sampled locations for log-diffusion (upper) and growth (lower) neural network models. SHAP values explain the covariate contributions to the predictions of each observation. Covariates are ranked from top to bottom by their mean absolute SHAP value (shown in parentheses beside the name). For each covariate, each location has a point distributed along the horizontal axis by its SHAP value. SHAP values with high density are represented by stacking the points vertically. Color represents covariate raw values.
$ \mathrm{5,000} $
 randomly sampled locations for log-diffusion (upper) and growth (lower) neural network models. SHAP values explain the covariate contributions to the predictions of each observation. Covariates are ranked from top to bottom by their mean absolute SHAP value (shown in parentheses beside the name). For each covariate, each location has a point distributed along the horizontal axis by its SHAP value. SHAP values with high density are represented by stacking the points vertically. Color represents covariate raw values.
5. Conclusions and discussion
We developed and implemented a modern solution to a critical problem in wildlife ecology, using ML-guided PDEs. Our approach exploits prior physical knowledge of ecological systems, encoded through PDEs, using ML algorithms for a flexible representation of interpretable coefficients. Moreover, we provided the theoretical guarantees derived from the analytical approximation for our PDE and the universal approximation theorem B.4. Additionally, we demonstrated the benefits of our method by comparing its forecasting power with the commonly used PDE with a spatial additive linear effects model. However, while our validation metrics outperformed existing methods, there is room for improvement regarding the pathogen detection power of the presented model.
Some of our practical findings are compatible and support hypotheses from existing WNS literature, which gives us confidence that our methods have the potential to be broadly applied to other wildlife diseases or other ecological processes that exhibit growth and spread. Some of the areas that can benefit from our methods are plant and animal population growth and dispersal, modeling the expansion of invasive species, and studying wildlife migration. Furthermore, the proposed method is a general approach that can be adapted to different PDE system structures, including other initial and boundary conditions, with different diffusion and growth mechanisms.
An important upside of the presented method is the modeling flexibility for the diffusion and growth coefficients associated with the EDEs. Unlike using additive linear effects or other hard parametric assumed relationships, our approach does not require users to know or specify a functional form of the impact of covariates on the processes of interest. Thus, the incorporation of the effect of several different covariates is much easier to do with the embedded ML algorithms. Our method showcased the value of SciML in wildlife epidemiology, providing a powerful framework for modeling complex ecological processes. This article may inspire forthcoming research focused on modeling wildlife diseases considering a wide range of covariates related to important environmental processes, such as climate change, and anthropogenic activities, such as pollutant emissions. Finally, we believe that exploring and comparing different ML algorithms, including more complex NN architectures, is a promising future path of investigation with the potential of capturing intricate non-linear relationships that simpler models might miss.
Acknowledgements
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the U.S. Fish and Wildlife Service. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Author contribution
Conceptualization: J.F.M.R., I.M.; Data curation: J.F.M.R., G.O.; Data visualization: J.F.M.R.; Funding acquisition: D.P.W., J.Z.; Methodology: J.F.M.R., I.M.; Project administration: D.P.W., J.Z.; Writing—original draft: J.F.M.R.; Writing—review and editing: J.F.M.R., I.M., T.F.M., R.R., D.P.W.; J.Z. All authors approved the final submitted draft.
Competing interests
The authors declare that they have no competing interests.
Data availability statement
The dataset analyzed during the current study is available in the Pseudogymnoascus destructans detections by U.S. county 2013–2020: U.S. Geological Survey data release repository (https://doi.org/10.5066/P9MONOPJ).
Funding statement
This work is, in part, supported by the U.S. Geological Survey under a grant/cooperative agreement and NSF DMS-2245906. This work was funded by the U.S. Department of Agriculture, National Institute of Food and Agriculture (2022-05138) through the NSF-NIH-USDA Ecology and Evolution of Infectious Diseases program.
Ethical standard
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
A. Additional figures

Figure A1. Covariates used as explanatory variables for the coefficients in the ecological diffusion equation modeling the probability of the presence of Pseudogymnoascus destructans (Pd). We use the percentage of tree canopy cover (canopy), linear hydrography including streams/rivers, braided streams, canals, ditches, artificial paths, and aqueducts (waterways), topographic ruggedness index, which is related to the magnitude of elevation (TRI), number of coal mines per 
 $ 10 $
 km
$ 10 $
 km 
 $ \times 10 $
 km grid cell (mines), and percentage of karst geomorphology (karst). Covariates are transformed to lie in the interval
$ \times 10 $
 km grid cell (mines), and percentage of karst geomorphology (karst). Covariates are transformed to lie in the interval 
 $ \left[0,1\right] $
 and were selected following recommendations from bat biologists on the Strategic Pd Surveillance Advisory Team.
$ \left[0,1\right] $
 and were selected following recommendations from bat biologists on the Strategic Pd Surveillance Advisory Team.

Figure A2. Approximated log-diffusion coefficient 
 $ \log \mu $
 and growth coefficient
$ \log \mu $
 and growth coefficient 
 $ \lambda $
 from neural networks for the Southeastern United States. Black arrows: (1) color pattern for high diffusion and low growth in the karst landscapes of the Appalachian Mountains; and (2) color pattern characterizing relatively high diffusion and low growth in northern Florida, across Georgia, and South Carolina. The region circled (arrow 2) along the Florida/Georgia border exhibits a slightly higher density of red pixels for
$ \lambda $
 from neural networks for the Southeastern United States. Black arrows: (1) color pattern for high diffusion and low growth in the karst landscapes of the Appalachian Mountains; and (2) color pattern characterizing relatively high diffusion and low growth in northern Florida, across Georgia, and South Carolina. The region circled (arrow 2) along the Florida/Georgia border exhibits a slightly higher density of red pixels for 
 $ \mu $
 than in Florida, indicating a higher average diffusion rate in that region. Additionally, it exhibits a lower
$ \mu $
 than in Florida, indicating a higher average diffusion rate in that region. Additionally, it exhibits a lower 
 $ \lambda $
, indicating a lower average growth rate than in Florida. We interpret this to mean that the region acts as a natural barrier to establishment.
$ \lambda $
, indicating a lower average growth rate than in Florida. We interpret this to mean that the region acts as a natural barrier to establishment.
B. Theory
In this section, we present the theoretical results that support our method. In particular, following the notation in the main body of this article, we provide a derivation of a bound for the approximation error.
 
Assumption B.1. Let 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 be two ML algorithms, with learnable vector parameters
$ {NN}_{\theta_2} $
 be two ML algorithms, with learnable vector parameters 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
, approximating
$ {\theta}_2 $
, approximating 
 $ \log \left(\mu \right) $
 and
$ \log \left(\mu \right) $
 and 
 $ \lambda $
 respectively. We assume that for arbitrarily small
$ \lambda $
 respectively. We assume that for arbitrarily small 
 $ {\varepsilon}_{0,1},{\varepsilon}_{0,2}>0 $
, there exist vector parameters
$ {\varepsilon}_{0,1},{\varepsilon}_{0,2}>0 $
, there exist vector parameters 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
, such that
$ {\theta}_2 $
, such that 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
$ {NN}_{\theta_2} $
 $$ \mid \log \left(\mu \right)-{NN}_{\theta_1}\mid <{\varepsilon}_{0,1}, $$
$$ \mid \log \left(\mu \right)-{NN}_{\theta_1}\mid <{\varepsilon}_{0,1}, $$
and
 $$ \mid \lambda -{NN}_{\theta_2}\mid <{\varepsilon}_{0,2}. $$
$$ \mid \lambda -{NN}_{\theta_2}\mid <{\varepsilon}_{0,2}. $$
Assumption B.1 ensures that we can properly train our ML algorithms. Note that when 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 are two NNs, and
$ {NN}_{\theta_2} $
 are two NNs, and 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
 are the collection of their weights and biases in a vector format, this assumption is a consequence of the universal approximation theorem (Hornik et al., Reference Hornik, Stinchcombe and White1989; Hornik, Reference Hornik1991).
$ {\theta}_2 $
 are the collection of their weights and biases in a vector format, this assumption is a consequence of the universal approximation theorem (Hornik et al., Reference Hornik, Stinchcombe and White1989; Hornik, Reference Hornik1991).
 
Lemma B.2. Let 
 $ u\left(x,y,t\right) $
 be the solution of the PDE system (2.1)–(2.3) and
$ u\left(x,y,t\right) $
 be the solution of the PDE system (2.1)–(2.3) and 
 $ c\left(x,y,t\right) $
 be the solution of its homogenized version (2.4)–(2.6). We denote
$ c\left(x,y,t\right) $
 be the solution of its homogenized version (2.4)–(2.6). We denote 
 $ {u}_h\left(x,y,t\right)=c\left(x,y,t\right)/\mu \left(x,y\right) $
 the homogenized solution of the original system (2.1)–(2.3). Then for a given
$ {u}_h\left(x,y,t\right)=c\left(x,y,t\right)/\mu \left(x,y\right) $
 the homogenized solution of the original system (2.1)–(2.3). Then for a given 
 $ {\varepsilon}_4>0 $
,
$ {\varepsilon}_4>0 $
, 
 $$ \mid u\left(x,y,t\right)-{u}_h\left(x,y,t\right)\mid <{\varepsilon}_4. $$
$$ \mid u\left(x,y,t\right)-{u}_h\left(x,y,t\right)\mid <{\varepsilon}_4. $$
A proof of Lemma B.2 can be found in Garlick et al. (Reference Garlick, Powell, Hooten and McFarlane2011).
 
Lemma B.3. For 
 $ \left(x,y\right)\in \mathcal{S} $
 and
$ \left(x,y\right)\in \mathcal{S} $
 and 
 $ t\in \mathcal{T} $
, let
$ t\in \mathcal{T} $
, let 
 $ c\left(x,y,t\right)={\sum}_{m=1}^{\infty }{\sum}_{n=1}^{\infty }{c}_{m,n}\left(x,y,t\right) $
, be the infinite series solution of the homogenized PDE system (2.4)–(2.6), and let
$ c\left(x,y,t\right)={\sum}_{m=1}^{\infty }{\sum}_{n=1}^{\infty }{c}_{m,n}\left(x,y,t\right) $
, be the infinite series solution of the homogenized PDE system (2.4)–(2.6), and let 
 $ {\tilde{c}}_{M,N}\left(x,y,t\right)={\sum}_{m=1}^M{\sum}_{n=1}^N{c}_{m,n}\left(x,y,t\right) $
 be the truncated series solution of (2.4)–(2.6), where the
$ {\tilde{c}}_{M,N}\left(x,y,t\right)={\sum}_{m=1}^M{\sum}_{n=1}^N{c}_{m,n}\left(x,y,t\right) $
 be the truncated series solution of (2.4)–(2.6), where the 
 $ {c}_{m,n}\left(x,y,t\right) $
 terms are defined as in (2.10. Then, for any
$ {c}_{m,n}\left(x,y,t\right) $
 terms are defined as in (2.10. Then, for any 
 $ \varepsilon >0 $
, there are large enough
$ \varepsilon >0 $
, there are large enough 
 $ N $
 and
$ N $
 and 
 $ M $
 such that
$ M $
 such that
 $$ \left|c\left(x,y,t\right)-{\tilde{c}}_{M,N}\left(x,y,t\right)\right|\hskip0.33em <\varepsilon . $$
$$ \left|c\left(x,y,t\right)-{\tilde{c}}_{M,N}\left(x,y,t\right)\right|\hskip0.33em <\varepsilon . $$
Lemma B.3 represents the truncation error of the truncated series solution of the homogenized PDE system (2.4)–(2.6) and was proved by Oh et al. (Reference Oh, McGahan, Walsh and Zhu2024).
 
Theorem B.4. For 
 $ \left(x,y\right)\in \mathcal{S} $
 and
$ \left(x,y\right)\in \mathcal{S} $
 and 
 $ t\in \mathcal{T} $
, let
$ t\in \mathcal{T} $
, let 
 $ u\left(x,y,t\right) $
 be the solution of the PDE system (2.1)–(2.3), and let
$ u\left(x,y,t\right) $
 be the solution of the PDE system (2.1)–(2.3), and let 
 $ {\tilde{c}}_{M,N}\left(x,y,t\right) $
 be the truncated series solution to the homogenized PDE system (2.4)–(2.6), defined in (2.9). Let
$ {\tilde{c}}_{M,N}\left(x,y,t\right) $
 be the truncated series solution to the homogenized PDE system (2.4)–(2.6), defined in (2.9). Let 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 be two ML algorithms, with learnable parameters
$ {NN}_{\theta_2} $
 be two ML algorithms, with learnable parameters 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
, approximating the unknown function
$ {\theta}_2 $
, approximating the unknown function 
 $ \log \left(\mu \right) $
 and
$ \log \left(\mu \right) $
 and 
 $ \lambda $
 respectively, such that Assumption B.1 holds. Let
$ \lambda $
 respectively, such that Assumption B.1 holds. Let 
 $ {\tilde{c}}_{M,N}^{NN}\left(x,y,t\right) $
 denote
$ {\tilde{c}}_{M,N}^{NN}\left(x,y,t\right) $
 denote 
 $ {\tilde{c}}_{M,N}\left(x,y,t\right) $
 evaluated when replacing
$ {\tilde{c}}_{M,N}\left(x,y,t\right) $
 evaluated when replacing 
 $ \mu $
 and
$ \mu $
 and 
 $ \lambda $
 by
$ \lambda $
 by 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
, respectively. Then, for any
$ {NN}_{\theta_2} $
, respectively. Then, for any 
 $ \varepsilon >0 $
,
$ \varepsilon >0 $
, 
 $ \left(x,y\right)\in \mathcal{S} $
 and
$ \left(x,y\right)\in \mathcal{S} $
 and 
 $ t\in \mathcal{T} $
, there exist
$ t\in \mathcal{T} $
, there exist 
 $ {\theta}_1 $
,
$ {\theta}_1 $
, 
 $ {\theta}_2 $
 and large enough
$ {\theta}_2 $
 and large enough 
 $ N $
 and
$ N $
 and 
 $ M $
 such that
$ M $
 such that
 $$ \left|u\left(x,y,t\right)-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)}\right|<\varepsilon . $$
$$ \left|u\left(x,y,t\right)-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)}\right|<\varepsilon . $$
 
Proof. Let 
 $ {\overline{NN}}_{\theta_1} $
 and
$ {\overline{NN}}_{\theta_1} $
 and 
 $ {\overline{NN}}_{\theta_2} $
 denote the approximated homogenized coefficients
$ {\overline{NN}}_{\theta_2} $
 denote the approximated homogenized coefficients 
 $ \overline{\mu} $
 and
$ \overline{\mu} $
 and 
 $ \overline{\lambda} $
, defined by replacing
$ \overline{\lambda} $
, defined by replacing 
 $ \mu $
 and
$ \mu $
 and 
 $ \lambda $
 by
$ \lambda $
 by 
 $ {NN}_{\theta_1} $
 and
$ {NN}_{\theta_1} $
 and 
 $ {NN}_{\theta_2} $
 in their definitions (2.7) and (2.8) respectively. Then, we have
$ {NN}_{\theta_2} $
 in their definitions (2.7) and (2.8) respectively. Then, we have
 $$ {\overline{NN}}_1\left(x,y\right)=\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy}, $$
$$ {\overline{NN}}_1\left(x,y\right)=\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy}, $$
and
 $$ {\overline{NN}}_2\left(x,y\right)=\frac{{\overline{NN}}_{\theta_1}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{NN_{\theta_2}\left(x,y\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy, $$
$$ {\overline{NN}}_2\left(x,y\right)=\frac{{\overline{NN}}_{\theta_1}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{NN_{\theta_2}\left(x,y\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy, $$
where 
 $ {NN}_{\theta_1} $
 approximates
$ {NN}_{\theta_1} $
 approximates 
 $ \log \left(\mu \right) $
 and
$ \log \left(\mu \right) $
 and 
 $ {NN}_{\theta_2} $
 approximates
$ {NN}_{\theta_2} $
 approximates 
 $ \lambda $
. By Assumption B.1, we have that for arbitrarily small
$ \lambda $
. By Assumption B.1, we have that for arbitrarily small 
 $ {\varepsilon}_{0,1},{\varepsilon}_{0,2}>0 $
, there exist parameters
$ {\varepsilon}_{0,1},{\varepsilon}_{0,2}>0 $
, there exist parameters 
 $ {\theta}_1 $
 and
$ {\theta}_1 $
 and 
 $ {\theta}_2 $
 such that
$ {\theta}_2 $
 such that
 $$ \mid \log \left(\mu \right)-{NN}_{\theta_1}\mid <{\varepsilon}_{0,1}, $$
$$ \mid \log \left(\mu \right)-{NN}_{\theta_1}\mid <{\varepsilon}_{0,1}, $$
and
 $$ \mid \lambda -{NN}_{\theta_2}\mid <{\varepsilon}_{0,2}. $$
$$ \mid \lambda -{NN}_{\theta_2}\mid <{\varepsilon}_{0,2}. $$
Then, for small 
 $ {\varepsilon}_{1,1},{\varepsilon}_{1,2}>0 $
, we notice that by Lebesgue’s dominated convergence theorem and continuity (Rudin, Reference Rudin1976, Theorem 11.32), we have
$ {\varepsilon}_{1,1},{\varepsilon}_{1,2}>0 $
, we notice that by Lebesgue’s dominated convergence theorem and continuity (Rudin, Reference Rudin1976, Theorem 11.32), we have
 $$ \left|\overline{\mu}\left(x,y\right)-{\overline{NN}}_{\theta_1}\Big(x,y\Big)\right|=\left|\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\mu \left(x,y\right)} dxdy}-\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy}\right|<{\varepsilon}_{1,1}, $$
$$ \left|\overline{\mu}\left(x,y\right)-{\overline{NN}}_{\theta_1}\Big(x,y\Big)\right|=\left|\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\mu \left(x,y\right)} dxdy}-\frac{\int_{\mathcal{A}}1 dxdy}{\int_{\mathcal{A}}\frac{1}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy}\right|<{\varepsilon}_{1,1}, $$
and
 $$ \left|\overline{\lambda}\left(x,y\right)-{\overline{NN}}_{\theta_2}\Big(x,y\Big)\right|=\left|\frac{\overline{\mu}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{\lambda \left(x,y\right)}{\mu \left(x,y\right)} dxdy-\frac{{\overline{NN}}_{\theta_1}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{NN_{\theta_2}\left(x,y\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy\right|<{\varepsilon}_{1,2}. $$
$$ \left|\overline{\lambda}\left(x,y\right)-{\overline{NN}}_{\theta_2}\Big(x,y\Big)\right|=\left|\frac{\overline{\mu}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{\lambda \left(x,y\right)}{\mu \left(x,y\right)} dxdy-\frac{{\overline{NN}}_{\theta_1}\left(x,y\right)}{\mid \mathcal{A}\mid }{\int}_{\mathcal{A}}\frac{NN_{\theta_2}\left(x,y\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)} dxdy\right|<{\varepsilon}_{1,2}. $$
On the other hand, if we consider
 $$ h\left(\overline{\mu},\overline{\lambda}\right)={c}_{m,n}\left(x,y,t\right)=\frac{4\eta \overline{\mu}\left(\omega \right)}{L_1{L}_2}\sin \left(\tilde{m}{\omega}_1\right)\sin \left(\tilde{n}{\omega}_2\right)\sin \left(\tilde{m}x\right)\sin \left(\tilde{n}y\right)\exp \left(\overline{\lambda}\left(x,y\right)t-\overline{\mu}\Big(x,y\Big)t\left({\tilde{m}}^2+{\tilde{n}}^2\right)\right) $$
$$ h\left(\overline{\mu},\overline{\lambda}\right)={c}_{m,n}\left(x,y,t\right)=\frac{4\eta \overline{\mu}\left(\omega \right)}{L_1{L}_2}\sin \left(\tilde{m}{\omega}_1\right)\sin \left(\tilde{n}{\omega}_2\right)\sin \left(\tilde{m}x\right)\sin \left(\tilde{n}y\right)\exp \left(\overline{\lambda}\left(x,y\right)t-\overline{\mu}\Big(x,y\Big)t\left({\tilde{m}}^2+{\tilde{n}}^2\right)\right) $$
as a function of 
 $ \left(\overline{\mu},\overline{\lambda}\right) $
, then by continuity, for a small
$ \left(\overline{\mu},\overline{\lambda}\right) $
, then by continuity, for a small 
 $ {\varepsilon}_2>0 $
, we have that
$ {\varepsilon}_2>0 $
, we have that
 $$ \left|{\tilde{c}}_{M,N}\left(x,y,t\right)-{\tilde{c}}_{M,N}^{NN}\Big(x,y,t\Big)\right|<{\varepsilon}_2. $$
$$ \left|{\tilde{c}}_{M,N}\left(x,y,t\right)-{\tilde{c}}_{M,N}^{NN}\Big(x,y,t\Big)\right|<{\varepsilon}_2. $$
Letting 
 $ c\left(x,y,t\right)={\sum}_{m=1}^{\infty }{\sum}_{n=1}^{\infty }{c}_{m,n}\left(x,y,t\right) $
 be the infinite series solution of the homogenized PDE system (2.4)–(2.6), Lemma B.3 ensures that given a small
$ c\left(x,y,t\right)={\sum}_{m=1}^{\infty }{\sum}_{n=1}^{\infty }{c}_{m,n}\left(x,y,t\right) $
 be the infinite series solution of the homogenized PDE system (2.4)–(2.6), Lemma B.3 ensures that given a small 
 $ {\varepsilon}_3>0 $
, there are large enough
$ {\varepsilon}_3>0 $
, there are large enough 
 $ N $
 and
$ N $
 and 
 $ M $
 such that
$ M $
 such that
 $$ \left|c\left(x,y,t\right)-{\tilde{c}}_{M,N}\Big(x,y,t\Big)\right|<{\varepsilon}_3, $$
$$ \left|c\left(x,y,t\right)-{\tilde{c}}_{M,N}\Big(x,y,t\Big)\right|<{\varepsilon}_3, $$
which represents the truncation error of the truncated series solution. Denoting 
 $ {u}_h\left(x,y,t\right)=c\left(x,y,t\right)/\mu \left(x,y\right) $
, the homogenized solution of the original PDE system (2.1)–(2.3), according to Lemma B.2, we have that for a given
$ {u}_h\left(x,y,t\right)=c\left(x,y,t\right)/\mu \left(x,y\right) $
, the homogenized solution of the original PDE system (2.1)–(2.3), according to Lemma B.2, we have that for a given 
 $ {\varepsilon}_4>0 $
,
$ {\varepsilon}_4>0 $
,
 $$ \left|u\left(x,y,t\right)-{u}_h\Big(x,y,t\Big)\right|<{\varepsilon}_4. $$
$$ \left|u\left(x,y,t\right)-{u}_h\Big(x,y,t\Big)\right|<{\varepsilon}_4. $$
Then we have the final result following the triangle inequality:
 $$ {\displaystyle \begin{array}{c}\hskip-11.66em \left|u\left(x,y,t\right)-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\le \left|u\left(x,y,t\right)-\frac{c\left(x,y,t\right)}{\mu \left(x,y\right)}\right|+\left|\frac{c\left(x,y,t\right)}{\mu \left(x,y\right)}-\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\mu \left(x,y\right)}\right|\\ {}\hskip0.5em +\left|\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\mu \left(x,y\right)}-\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\\ {}\hskip4em +\left|\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\\ {}\hskip12.9em \le \left|u\left(x,y,t\right)-{u}_h\Big(x,y,t\Big)\right|+\left|\frac{1}{\mu \left(x,y\right)}\right|\left|c\left(x,y,t\right)-{\tilde{c}}_{M,N}\Big(x,y,t\Big)\right|\\ {}\hskip5.5em +\left|{\tilde{c}}_{M,N}\left(x,y,t\right)\right|\left|\frac{1}{\mu \left(x,y\right)}-\frac{1}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\\ {}\hskip8.66em +\left|\frac{1}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\left|{\tilde{c}}_{M,N}\left(x,y,t\right)-{\tilde{c}}_{M,N}^{NN}\Big(x,y,t\Big)\right|\\ {}\hskip13.5em \le {\varepsilon}_4+\left|\frac{1}{\mu \left(x,y\right)}\right|{\varepsilon}_3+\left|{\tilde{c}}_{M,N}\left(x,y,t\right)\right|{\varepsilon}_5+\left|\frac{1}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|{\varepsilon}_2,\end{array}} $$
$$ {\displaystyle \begin{array}{c}\hskip-11.66em \left|u\left(x,y,t\right)-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\le \left|u\left(x,y,t\right)-\frac{c\left(x,y,t\right)}{\mu \left(x,y\right)}\right|+\left|\frac{c\left(x,y,t\right)}{\mu \left(x,y\right)}-\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\mu \left(x,y\right)}\right|\\ {}\hskip0.5em +\left|\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\mu \left(x,y\right)}-\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\\ {}\hskip4em +\left|\frac{{\tilde{c}}_{M,N}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\\ {}\hskip12.9em \le \left|u\left(x,y,t\right)-{u}_h\Big(x,y,t\Big)\right|+\left|\frac{1}{\mu \left(x,y\right)}\right|\left|c\left(x,y,t\right)-{\tilde{c}}_{M,N}\Big(x,y,t\Big)\right|\\ {}\hskip5.5em +\left|{\tilde{c}}_{M,N}\left(x,y,t\right)\right|\left|\frac{1}{\mu \left(x,y\right)}-\frac{1}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\\ {}\hskip8.66em +\left|\frac{1}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|\left|{\tilde{c}}_{M,N}\left(x,y,t\right)-{\tilde{c}}_{M,N}^{NN}\Big(x,y,t\Big)\right|\\ {}\hskip13.5em \le {\varepsilon}_4+\left|\frac{1}{\mu \left(x,y\right)}\right|{\varepsilon}_3+\left|{\tilde{c}}_{M,N}\left(x,y,t\right)\right|{\varepsilon}_5+\left|\frac{1}{\exp \left(N{N}_{\theta_1}\left(x,y\right)\right)}\right|{\varepsilon}_2,\end{array}} $$
where the first term is bounded by the homogenization approximation (B.4), the second one is the truncation error of the series expansion (B.3), the third term, 
 $ \left|\frac{1}{\mu \left(x,y\right)}-\frac{1}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)}\right|, $
 is bounded by
$ \left|\frac{1}{\mu \left(x,y\right)}-\frac{1}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)}\right|, $
 is bounded by 
 $ {\varepsilon}_5 $
 due to continuity and Assumption B.1, and the last term is bounded again by a consequence of Assumption B.1 and the continuity of
$ {\varepsilon}_5 $
 due to continuity and Assumption B.1, and the last term is bounded again by a consequence of Assumption B.1 and the continuity of 
 $ h\left(\overline{\mu},\overline{\lambda}\right) $
 (B.2). Finally, the right-hand side of the previous inequality is a sum of epsilons and the product of epsilons with finite value quantities. Thus, for any
$ h\left(\overline{\mu},\overline{\lambda}\right) $
 (B.2). Finally, the right-hand side of the previous inequality is a sum of epsilons and the product of epsilons with finite value quantities. Thus, for any 
 $ \varepsilon $
, there exist
$ \varepsilon $
, there exist 
 $ {\varepsilon}_2,{\varepsilon}_3,{\varepsilon}_4,{\varepsilon}_5 $
 with corresponding
$ {\varepsilon}_2,{\varepsilon}_3,{\varepsilon}_4,{\varepsilon}_5 $
 with corresponding 
 $ {NN}_{\theta_1},{NN}_{\theta_2} $
, such that
$ {NN}_{\theta_1},{NN}_{\theta_2} $
, such that
 $$ \left|u\left(x,y,t\right)-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)}\right|<\varepsilon . $$
$$ \left|u\left(x,y,t\right)-\frac{{\tilde{c}}_{M,N}^{NN}\left(x,y,t\right)}{\exp \left({NN}_{\theta_1}\left(x,y\right)\right)}\right|<\varepsilon . $$
Therefore, we conclude that the approximation error will be small as long as: (1) the homogenization assumptions are satisfied (Garlick et al., Reference Garlick, Powell, Hooten and McFarlane2011); (2) we choose large enough 
 $ N $
 and
$ N $
 and 
 $ M $
 integers (Oh et al., Reference Oh, McGahan, Walsh and Zhu2024); and (3) we have proper optimization methods and enough data to train our ML algorithms.
$ M $
 integers (Oh et al., Reference Oh, McGahan, Walsh and Zhu2024); and (3) we have proper optimization methods and enough data to train our ML algorithms.
C. Ecological diffusion equation: analytic approximation by homogenization
Algorithm C.1. Aggregation functions for homogenization.
 1: function 
 $ \overline{\mu}x $
, factor
$ \overline{\mu}x $
, factor 
 $ \vartriangleright $
 This function is defined according to (2.7) in the main body of the paper. In this function
$ \vartriangleright $
 This function is defined according to (2.7) in the main body of the paper. In this function 
 $ x $
 is a two-dimensional array.
$ x $
 is a two-dimensional array.
 2: 
 $ patch\_ size\leftarrow ( factor,\hskip0.35em factor) $
.
$ patch\_ size\leftarrow ( factor,\hskip0.35em factor) $
.
 3: 
 $ {n}_x\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[0\right]// patch\_ size\left[0\right] $
.
$ {n}_x\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[0\right]// patch\_ size\left[0\right] $
.
 4: 
 $ {n}_y\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[1\right]// patch\_ size\left[1\right] $
.
$ {n}_y\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[1\right]// patch\_ size\left[1\right] $
.
 5: 
 $ x\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\Big(x\left[:{n}_x\times patch\_ size\left[0\right],:{n}_y\times patch\_ size\left[1\right]\right], $
.
$ x\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\Big(x\left[:{n}_x\times patch\_ size\left[0\right],:{n}_y\times patch\_ size\left[1\right]\right], $
.
 6: 
 $ \left({n}_x, patch\_ size\left[0\right],{n}_y, patch\_ size\left[1\right]\right)\Big) $
.
$ \left({n}_x, patch\_ size\left[0\right],{n}_y, patch\_ size\left[1\right]\right)\Big) $
.
 7: 
 $ x\_ reciprocal\leftarrow \frac{1}{x\_ reshaped} $
.
$ x\_ reciprocal\leftarrow \frac{1}{x\_ reshaped} $
.
 8: 
 $ x\_ mean\leftarrow \frac{1}{\mathrm{tf}.\mathrm{reduce}\_\mathrm{mean}\left(x\_ reciprocal, axis=\left(1,3\right)\right)} $
.
$ x\_ mean\leftarrow \frac{1}{\mathrm{tf}.\mathrm{reduce}\_\mathrm{mean}\left(x\_ reciprocal, axis=\left(1,3\right)\right)} $
.
 9: return 
 $ x\_ mean $
.
$ x\_ mean $
.
10: end function.
 11: function 
 $ \overline{\lambda}x $
,
$ \overline{\lambda}x $
, 
 $ factor $
$ factor $
 
 $ \vartriangleright $
 This function is defined according to (2.8) in the main body of the paper. In this function
$ \vartriangleright $
 This function is defined according to (2.8) in the main body of the paper. In this function 
 $ x $
 is a two-dimensional array.
$ x $
 is a two-dimensional array.
 12: 
 $ patch\_ size\leftarrow \left( factor, factor\right) $
.
$ patch\_ size\leftarrow \left( factor, factor\right) $
.
 13: 
 $ {n}_x\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[0\right]// patch\_ size\left[0\right] $
.
$ {n}_x\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[0\right]// patch\_ size\left[0\right] $
.
 14: 
 $ {n}_y\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[1\right]// patch\_ size\left[1\right] $
.
$ {n}_y\leftarrow \mathrm{tf}.\mathrm{shape}(x)\left[1\right]// patch\_ size\left[1\right] $
.
 15: 
 $ x\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left(x\right[:{n}_x\times patch\_ size\left[0\right], $
.
$ x\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left(x\right[:{n}_x\times patch\_ size\left[0\right], $
.
 16:  
 $ :{n}_y\times patch\_ size\left[1\right]\left],\left({n}_x, patch\_ size\left[0\right],{n}_y, patch\_ size\left[1\right]\right)\right) $
.
$ :{n}_y\times patch\_ size\left[1\right]\left],\left({n}_x, patch\_ size\left[0\right],{n}_y, patch\_ size\left[1\right]\right)\right) $
.
 17: 
 $ x\_ mean\leftarrow \mathrm{tf}.\mathrm{reduce}\_\mathrm{mean}\left(x\_ reshaped, axis=\left(1,3\right)\right) $
.
$ x\_ mean\leftarrow \mathrm{tf}.\mathrm{reduce}\_\mathrm{mean}\left(x\_ reshaped, axis=\left(1,3\right)\right) $
.
 18: return 
 $ x\_ mean $
.
$ x\_ mean $
.
19: end function.
Algorithm C.2. Truncated series solution to the homogenized ecological diffusion equation system.
 1: function 
 $ {\tilde{c}}_{MN}M,N,x,y,t,{\omega}_1,{\omega}_2,\overline{\lambda},\overline{\mu},{L}_1,{L}_2,\theta $
$ {\tilde{c}}_{MN}M,N,x,y,t,{\omega}_1,{\omega}_2,\overline{\lambda},\overline{\mu},{L}_1,{L}_2,\theta $
 
 $ \vartriangleright $
 This function is defined according to (2.9) and (2.10) and represents the truncated series solution to the homogenized ecological diffusion equation system.
$ \vartriangleright $
 This function is defined according to (2.9) and (2.10) and represents the truncated series solution to the homogenized ecological diffusion equation system.
 2: 
 $ m\_ values\leftarrow \mathrm{tf}.\mathrm{cast}\left(\mathrm{tf}.\mathrm{linspace}\left(1,M,M\right),\mathrm{dtype}=\mathrm{tf}.\mathrm{float}32\right) $
.
$ m\_ values\leftarrow \mathrm{tf}.\mathrm{cast}\left(\mathrm{tf}.\mathrm{linspace}\left(1,M,M\right),\mathrm{dtype}=\mathrm{tf}.\mathrm{float}32\right) $
.
 3: 
 $ n\_ values\leftarrow \mathrm{tf}.\mathrm{cast}\left(\mathrm{tf}.\mathrm{linspace}\left(1,N,N\right),\mathrm{dtype}=\mathrm{tf}.\mathrm{float}32\right) $
.
$ n\_ values\leftarrow \mathrm{tf}.\mathrm{cast}\left(\mathrm{tf}.\mathrm{linspace}\left(1,N,N\right),\mathrm{dtype}=\mathrm{tf}.\mathrm{float}32\right) $
.
 4: 
 $ m\_ grid,n\_ grid\leftarrow \mathrm{tf}.\mathrm{meshgrid}\left(m\_ values,n\_ values\right) $
.
$ m\_ grid,n\_ grid\leftarrow \mathrm{tf}.\mathrm{meshgrid}\left(m\_ values,n\_ values\right) $
.
 5: 
 $ {A}_{mn}\_ values\leftarrow \theta \times \mathrm{tf}.\sin \left(m\_ grid\times \pi \times {\omega}_1/{L}_1\right)\times \mathrm{tf}.\sin \left(n\_ grid\times \pi \times {\omega}_2/{L}_2\right)\times 4/\left({L}_1\times {L}_2\right) $
.
$ {A}_{mn}\_ values\leftarrow \theta \times \mathrm{tf}.\sin \left(m\_ grid\times \pi \times {\omega}_1/{L}_1\right)\times \mathrm{tf}.\sin \left(n\_ grid\times \pi \times {\omega}_2/{L}_2\right)\times 4/\left({L}_1\times {L}_2\right) $
.
 6: 
 $ \mathit{\exp}\_ terms\leftarrow \mathrm{tf}.\exp \left(-\overline{\mu}\times t\times \left({\left(m\_ grid\times \pi /{L}_1\right)}^2+{\left(n\_ grid\times \pi /{L}_2\right)}^2\right)\right) $
.
$ \mathit{\exp}\_ terms\leftarrow \mathrm{tf}.\exp \left(-\overline{\mu}\times t\times \left({\left(m\_ grid\times \pi /{L}_1\right)}^2+{\left(n\_ grid\times \pi /{L}_2\right)}^2\right)\right) $
.
 7: 
 $ \mathit{\sin}\_ terms\_x\leftarrow \mathrm{tf}.\sin \left(m\_ grid\times \pi \times x/{L}_1\right) $
.
$ \mathit{\sin}\_ terms\_x\leftarrow \mathrm{tf}.\sin \left(m\_ grid\times \pi \times x/{L}_1\right) $
.
 8: 
 $ \mathit{\sin}\_ terms\_y\leftarrow \mathrm{tf}.\sin \left(n\_ grid\times \pi \times y/{L}_2\right) $
.
$ \mathit{\sin}\_ terms\_y\leftarrow \mathrm{tf}.\sin \left(n\_ grid\times \pi \times y/{L}_2\right) $
.
 9: 
 $ {c}_{mn}\_ values\leftarrow {A}_{mn}\_ values\times \mathrm{tf}.\exp \left(\overline{\lambda}\times t\right)\times \exp \_\mathrm{terms}\times \sin \_\mathrm{terms}\_\mathrm{x}\times \sin \_\mathrm{terms}\_\mathrm{y} $
.
$ {c}_{mn}\_ values\leftarrow {A}_{mn}\_ values\times \mathrm{tf}.\exp \left(\overline{\lambda}\times t\right)\times \exp \_\mathrm{terms}\times \sin \_\mathrm{terms}\_\mathrm{x}\times \sin \_\mathrm{terms}\_\mathrm{y} $
.
 10: 
 $ sol\leftarrow \mathrm{tf}.\mathrm{reduce}\_\mathrm{sum}\left({c}_{mn}\_ values\right) $
.
$ sol\leftarrow \mathrm{tf}.\mathrm{reduce}\_\mathrm{sum}\left({c}_{mn}\_ values\right) $
.
 11: return 
 $ sol $
.
$ sol $
.
12: end function.
Algorithm C.3. Approximated solution to the ecological diffusion equation by homogenization.
 1: function Solver 1(
 $ X,Y,T,M,N,{\omega}_1,{\omega}_2,\overline{\lambda},\overline{\mu},{L}_1,{L}_2,\theta $
)
$ X,Y,T,M,N,{\omega}_1,{\omega}_2,\overline{\lambda},\overline{\mu},{L}_1,{L}_2,\theta $
) 
 $ \vartriangleright $
 This function is defined according to (2.9) and collects the truncated series solution to the homogenized system for all the observed locations and times.
$ \vartriangleright $
 This function is defined according to (2.9) and collects the truncated series solution to the homogenized system for all the observed locations and times.
 2: 
 $ n\leftarrow \mathrm{length}(X) $
.
$ n\leftarrow \mathrm{length}(X) $
.
 3: 
 $ output\leftarrow \mathrm{zeros}(n) $
.
$ output\leftarrow \mathrm{zeros}(n) $
.
 4: for 
 $ i\leftarrow 0 $
 to
$ i\leftarrow 0 $
 to 
 $ n-1 $
 do.
$ n-1 $
 do.
 5:  
 $ output\left[i\right]\leftarrow {c}_{MN}\left(M,N,X\left[i\right],Y\left[i\right],T\left[i\right],{\omega}_1,{\omega}_2,\overline{\lambda}\left[i\right],\overline{\mu}\left[i\right],{L}_1,{L}_2,\theta \right) $
.
$ output\left[i\right]\leftarrow {c}_{MN}\left(M,N,X\left[i\right],Y\left[i\right],T\left[i\right],{\omega}_1,{\omega}_2,\overline{\lambda}\left[i\right],\overline{\mu}\left[i\right],{L}_1,{L}_2,\theta \right) $
.
6: end for.
 7: return 
 $ \mathrm{tf}.\mathrm{cast}\left( output,\mathrm{tf}.\mathrm{float}32\right) $
.
$ \mathrm{tf}.\mathrm{cast}\left( output,\mathrm{tf}.\mathrm{float}32\right) $
.
8: end function.
 9: function Solution EDE
 $ X,Y,T, nn\_ output,{\theta}_0,{t}_0,\omega, domain, factor,{L}_1,{L}_2,M,N $
$ X,Y,T, nn\_ output,{\theta}_0,{t}_0,\omega, domain, factor,{L}_1,{L}_2,M,N $
 
 $ \vartriangleright $
 In this function, we assume that df contains the data associated with our problem. The dataset df should be a pandas data frame with columns ‘X_loc’ and ‘Y_loc’ with columns and rows, respectively, corresponding to the cells in the domain.
$ \vartriangleright $
 In this function, we assume that df contains the data associated with our problem. The dataset df should be a pandas data frame with columns ‘X_loc’ and ‘Y_loc’ with columns and rows, respectively, corresponding to the cells in the domain.
 10: 
 $ \theta \leftarrow {\theta}_0 $
.
$ \theta \leftarrow {\theta}_0 $
.
 11: 
 $ \mu \leftarrow \mathrm{tf}.\exp \left( nn\_ output\left[0\right]\right) $
.
$ \mu \leftarrow \mathrm{tf}.\exp \left( nn\_ output\left[0\right]\right) $
.
 12: 
 $ \lambda \leftarrow nn\_ output\left[1\right] $
.
$ \lambda \leftarrow nn\_ output\left[1\right] $
.
 13: 
 $ fs\_ cells\_ mu\leftarrow \mu $
.
$ fs\_ cells\_ mu\leftarrow \mu $
.
 14: 
 $ fs\_ cells\_ lambda\leftarrow \lambda /\mu $
.
$ fs\_ cells\_ lambda\leftarrow \lambda /\mu $
.
 15: 
 $ mu\_ bar\leftarrow \mathrm{custom}\_\mathrm{aggregation}\hskip0.35em ( fs\_ cells\_ mu,\hskip0.35em factor) $
.
$ mu\_ bar\leftarrow \mathrm{custom}\_\mathrm{aggregation}\hskip0.35em ( fs\_ cells\_ mu,\hskip0.35em factor) $
.
 16: 
 $ \overline{\lambda}\leftarrow \mathrm{aggregation}\hskip0.35em ( fs\_ cells\_ lambda,\hskip0.35em factor) $
.
$ \overline{\lambda}\leftarrow \mathrm{aggregation}\hskip0.35em ( fs\_ cells\_ lambda,\hskip0.35em factor) $
.
 17: 
 $ \overline{\lambda}\leftarrow \overline{\mu}\times \overline{\lambda} $
.
$ \overline{\lambda}\leftarrow \overline{\mu}\times \overline{\lambda} $
.
 18: 
 $ num\_ squares\_x\leftarrow \mathrm{domain}.\mathrm{shape}\left[0\right]// factor $
.
$ num\_ squares\_x\leftarrow \mathrm{domain}.\mathrm{shape}\left[0\right]// factor $
.
 19: 
 $ num\_ squares\_y\leftarrow \mathrm{domain}.\mathrm{shape}\left[1\right]// factor $
.
$ num\_ squares\_y\leftarrow \mathrm{domain}.\mathrm{shape}\left[1\right]// factor $
.
 20: 
 $ mu\_ bar\_ tiled\leftarrow \mathrm{tf}.\mathrm{tile}\left( mu\_ bar\left[:,\mathrm{tf}.\mathrm{newaxis},:,\mathrm{tf}.\mathrm{newaxis}\right]\right),\hskip0.35em [1,\hskip0.35em factor,\hskip0.35em 1,\hskip0.35em factor]) $
.
$ mu\_ bar\_ tiled\leftarrow \mathrm{tf}.\mathrm{tile}\left( mu\_ bar\left[:,\mathrm{tf}.\mathrm{newaxis},:,\mathrm{tf}.\mathrm{newaxis}\right]\right),\hskip0.35em [1,\hskip0.35em factor,\hskip0.35em 1,\hskip0.35em factor]) $
.
 21: 
 $ \overline{\lambda}\_ tiled\leftarrow \mathrm{tf}.\mathrm{tile}\left(\overline{\lambda}\left[:,\mathrm{tf}.\mathrm{newaxis},:,\mathrm{tf}.\mathrm{newaxis}\right]\right),[1,\hskip0.35em factor,\hskip0.35em 1, factor] $
.
$ \overline{\lambda}\_ tiled\leftarrow \mathrm{tf}.\mathrm{tile}\left(\overline{\lambda}\left[:,\mathrm{tf}.\mathrm{newaxis},:,\mathrm{tf}.\mathrm{newaxis}\right]\right),[1,\hskip0.35em factor,\hskip0.35em 1, factor] $
.
 22: 
 $ mu\_ bar\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left( mu\_ bar\_ tiled,\right[ $
.
$ mu\_ bar\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left( mu\_ bar\_ tiled,\right[ $
.
 23:  
 $ \left[ num\_ squares\_x\times factor, num\_ squares\_y\times factor\right]\Big) $
.
$ \left[ num\_ squares\_x\times factor, num\_ squares\_y\times factor\right]\Big) $
.
 24: 
 $ \overline{\lambda}\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left(\overline{\lambda}\_ tiled,\right[ $
.
$ \overline{\lambda}\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left(\overline{\lambda}\_ tiled,\right[ $
.
 25:  
 $ \left[ num\_ squares\_x\times factor, num\_ squares\_y\times factor\right]\Big) $
.
$ \left[ num\_ squares\_x\times factor, num\_ squares\_y\times factor\right]\Big) $
.
 26: 
 $ X\_ lo c\leftarrow df\left[{}^{\prime }X\_ lo{c}^{\prime}\right] $
.
$ X\_ lo c\leftarrow df\left[{}^{\prime }X\_ lo{c}^{\prime}\right] $
.
 27: 
 $ Y\_ lo c\leftarrow df\left[{}^{\prime }Y\_ lo{c}^{\prime}\right] $
.
$ Y\_ lo c\leftarrow df\left[{}^{\prime }Y\_ lo{c}^{\prime}\right] $
.
 28: 
 $ positions\leftarrow \mathrm{list}\left(\mathrm{map}\left( lambda\;x,y:\left[y,x\right],Y\_ loc,X\_ loc\right)\right) $
.
$ positions\leftarrow \mathrm{list}\left(\mathrm{map}\left( lambda\;x,y:\left[y,x\right],Y\_ loc,X\_ loc\right)\right) $
.
 29: 
 $ mu\_ list\leftarrow \left[\overline{\mu}\_ reshaped\left[ pos\right]\mathrm{for} pos\mathrm{in} pos itions\right] $
.
$ mu\_ list\leftarrow \left[\overline{\mu}\_ reshaped\left[ pos\right]\mathrm{for} pos\mathrm{in} pos itions\right] $
.
 30: 
 $ \lambda \_ list\leftarrow \left[\overline{\lambda}\_ reshaped\left[ pos\right]\mathrm{for} pos\mathrm{in} pos itions\right] $
.
$ \lambda \_ list\leftarrow \left[\overline{\lambda}\_ reshaped\left[ pos\right]\mathrm{for} pos\mathrm{in} pos itions\right] $
.
 31: 
 $ mu\_ omega\_ scaled\leftarrow \overline{\mu}\_ reshaped\left[ coords\_ omega\left[0\right], coords\_ omega\left[1\right]\right] $
.
$ mu\_ omega\_ scaled\leftarrow \overline{\mu}\_ reshaped\left[ coords\_ omega\left[0\right], coords\_ omega\left[1\right]\right] $
.
 32: 
 $ scaled\_ theta\leftarrow \theta \times \mu \_ omega\_ scaled $
.
$ scaled\_ theta\leftarrow \theta \times \mu \_ omega\_ scaled $
.
 33: 
 $ sol\leftarrow \mathrm{Solver}1\Big(X=X,Y=Y,T=\left(T+{t}_0-1\right), $
.
$ sol\leftarrow \mathrm{Solver}1\Big(X=X,Y=Y,T=\left(T+{t}_0-1\right), $
.
 34:  
 $ M=M,N=N,{\omega}_1=\omega \left[0\right],{\omega}_2=\omega \left[1\right], $
.
$ M=M,N=N,{\omega}_1=\omega \left[0\right],{\omega}_2=\omega \left[1\right], $
.
 35:  
 $ \overline{\lambda}=\lambda \_ list,\overline{\mu}=\mu \_ list, $
.
$ \overline{\lambda}=\lambda \_ list,\overline{\mu}=\mu \_ list, $
.
 36:  
 $ {L}_1={L}_1,{L}_2={L}_2,\theta = scaled\_ theta\Big) $
.
$ {L}_1={L}_1,{L}_2={L}_2,\theta = scaled\_ theta\Big) $
.
 37: 
 $ sol\leftarrow sol/\left[\mu \left[ pos\right]\mathrm{for} pos\mathrm{in} pos itions\right] $
.
$ sol\leftarrow sol/\left[\mu \left[ pos\right]\mathrm{for} pos\mathrm{in} pos itions\right] $
.
 38: 
 $ sol\leftarrow \mathrm{tf}.\mathrm{where}\left( sol<1e-10,1e-10, sol\right) $
$ sol\leftarrow \mathrm{tf}.\mathrm{where}\left( sol<1e-10,1e-10, sol\right) $
 
 $ \vartriangleright $
 To avoid taking
$ \vartriangleright $
 To avoid taking 
 $ \log $
 of zero
$ \log $
 of zero
 39: return 
 $ sol $
.
$ sol $
.
40: end function.
D. Pseudocode for machine learning-guided partial differential equations
Algorithm D.1. Utils functions.
 
function Post-processing(
 $ y\_ pred $
).
$ y\_ pred $
).
 
 $ \vartriangleright $
In this function, we post-process the outcome of the neural network to use it as
$ \vartriangleright $
In this function, we post-process the outcome of the neural network to use it as 
 $ \log \mu $
 and
$ \log \mu $
 and 
 $ \lambda $
 within the approximate solution of the ecological diffusion equation. Specific values are indicated for our problem, but can be changed by the user to accommodate specific problems.
$ \lambda $
 within the approximate solution of the ecological diffusion equation. Specific values are indicated for our problem, but can be changed by the user to accommodate specific problems.
 
 $ y\_ pred\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left(\left[y\_ pred\left[:,0\right],y\_ pred\left[:,1\right]\right],\left(\mathrm{2,300,480}\right)\right) $
$ y\_ pred\_ reshaped\leftarrow \mathrm{tf}.\mathrm{reshape}\left(\left[y\_ pred\left[:,0\right],y\_ pred\left[:,1\right]\right],\left(\mathrm{2,300,480}\right)\right) $
 
 $ postprocessed\leftarrow \mathrm{SolutionEDE}\Big( $
$ postprocessed\leftarrow \mathrm{SolutionEDE}\Big( $
 
 $ X,Y,T,y\_ pred\_ reshaped, $
$ X,Y,T,y\_ pred\_ reshaped, $
 
 $ {\theta}_0=1.5\times {10}^{12}, $
$ {\theta}_0=1.5\times {10}^{12}, $
                  
 $ \vartriangleright $
 initial population from the literature
$ \vartriangleright $
 initial population from the literature
 
 $ {t}_0=42, $
$ {t}_0=42, $
                     
 $ \vartriangleright $
 initial time from the literature
$ \vartriangleright $
 initial time from the literature
 
 $ \omega =\left[4235000,2175000\right], $
$ \omega =\left[4235000,2175000\right], $
                 
 $ \vartriangleright $
 initial location from the literature
$ \vartriangleright $
 initial location from the literature
 
 $ domain= domain, $
$ domain= domain, $
              
 $ \vartriangleright $
 the domain is a data frame that contains the covariates
$ \vartriangleright $
 the domain is a data frame that contains the covariates
 
 $ factor=10, $
$ factor=10, $
                     
 $ \vartriangleright $
 factor for homogenization
$ \vartriangleright $
 factor for homogenization
 
 $ {L}_1=4.8\times {10}^6, $
$ {L}_1=4.8\times {10}^6, $
                       
 $ \vartriangleright $
 limits of the domain
$ \vartriangleright $
 limits of the domain
 
 $ {L}_2=3\times {10}^6, $
$ {L}_2=3\times {10}^6, $
 
 $ M=20, $
$ M=20, $
             
 $ \vartriangleright $
 M for the truncated series solution to the homogenized system
$ \vartriangleright $
 M for the truncated series solution to the homogenized system
 
 $ N=20\Big) $
$ N=20\Big) $
              
 $ \vartriangleright $
 N for the truncated series solution to the homogenized system
$ \vartriangleright $
 N for the truncated series solution to the homogenized system
 return 
 $ postprocessed $
.
$ postprocessed $
.
end function.
Algorithm D.2. Neural Network functions.
 
function Create Model(
 $ input\_ shape $
).
$ input\_ shape $
).
 
 $ \vartriangleright $
In this function, we follow the same architecture used in the paper. However, it can be changed by the user for other problems.
$ \vartriangleright $
In this function, we follow the same architecture used in the paper. However, it can be changed by the user for other problems.
 
 $ input\_ layer\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Input}\left(\mathrm{shape}=\mathrm{input}\_\mathrm{shape}\right) $
$ input\_ layer\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Input}\left(\mathrm{shape}=\mathrm{input}\_\mathrm{shape}\right) $
 
 $ common\_ base\_1\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( input\_ layer\right) $
$ common\_ base\_1\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( input\_ layer\right) $
 
 $ common\_ base\_2\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( input\_ layer\right) $
$ common\_ base\_2\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( input\_ layer\right) $
 
 $ no\_ common\_ base\_1\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( common\_ base\_1\right) $
$ no\_ common\_ base\_1\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( common\_ base\_1\right) $
 
 $ no\_ common\_ base\_2\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( common\_ base\_2\right) $
$ no\_ common\_ base\_2\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\left(\mathrm{units}=512,\mathrm{activation}='\mathrm{relu}'\right)\left( common\_ base\_2\right) $
 
 $ output\_ mu\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\Big( $
$ output\_ mu\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\Big( $
 
 $ units=1, $
$ units=1, $
 
 $ bias\_ initializer=\mathrm{tf}.\mathrm{keras}.\mathrm{initializers}.\mathrm{Constant}(22.2), $
$ bias\_ initializer=\mathrm{tf}.\mathrm{keras}.\mathrm{initializers}.\mathrm{Constant}(22.2), $
 
 $ name='\log \_\mathrm{mu}'\left)\right( $
$ name='\log \_\mathrm{mu}'\left)\right( $
 
 $ no\_ common\_ base\_1\Big) $
$ no\_ common\_ base\_1\Big) $
 
 $ output\_ lambda\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\Big( $
$ output\_ lambda\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{Dense}\Big( $
 
 $ units=1, $
$ units=1, $
 
 $ name=^{\prime}\mathrm{lambd}{\mathrm{a}}^{\prime}\Big)\left( no\_ common\_ base\_2\right) $
$ name=^{\prime}\mathrm{lambd}{\mathrm{a}}^{\prime}\Big)\left( no\_ common\_ base\_2\right) $
 
 $ model\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{models}.\mathrm{Model}\Big( $
$ model\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{models}.\mathrm{Model}\Big( $
 
 $ \mathrm{inputs}=\mathrm{input}\_\mathrm{layer}, $
$ \mathrm{inputs}=\mathrm{input}\_\mathrm{layer}, $
 
 $ \mathrm{outputs}=\mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{concatenate}\left(\left[\mathrm{output}\_\mathrm{mu},\mathrm{output}\_\mathrm{lambda}\right]\right)\Big) $
$ \mathrm{outputs}=\mathrm{tf}.\mathrm{keras}.\mathrm{layers}.\mathrm{concatenate}\left(\left[\mathrm{output}\_\mathrm{mu},\mathrm{output}\_\mathrm{lambda}\right]\right)\Big) $
 return 
 $ model $
.
$ model $
.
end function.
 
function Custom Loss(
 $ y\_ true $
,
$ y\_ true $
, 
 $ y\_ pred $
, bce = tf.keras.losses.BinaryCrossentropy(from_logits = True)).
$ y\_ pred $
, bce = tf.keras.losses.BinaryCrossentropy(from_logits = True)).
 
 $ \vartriangleright $
 In this function, we assume that df contains the data associated with our problem. The dataset df should be a pandas data frame with columns ‘y’ for the binary outcome, ‘nums’ for the number of observations in a given location.
$ \vartriangleright $
 In this function, we assume that df contains the data associated with our problem. The dataset df should be a pandas data frame with columns ‘y’ for the binary outcome, ‘nums’ for the number of observations in a given location.
 
 $ p\leftarrow \mathrm{Post}-\mathrm{processing}\left(y\_ pred\right) $
$ p\leftarrow \mathrm{Post}-\mathrm{processing}\left(y\_ pred\right) $
 
 $ y\_ obs\leftarrow \mathrm{tf}.\mathrm{cast}\left( df\left[{}^{\prime }{y}^{\prime}\right]. to\_ numpy\left(\right),\mathrm{tf}.\mathrm{float}32\right) $
$ y\_ obs\leftarrow \mathrm{tf}.\mathrm{cast}\left( df\left[{}^{\prime }{y}^{\prime}\right]. to\_ numpy\left(\right),\mathrm{tf}.\mathrm{float}32\right) $
 
 $ counts\leftarrow \mathrm{tf}.\mathrm{cast}\left( df\left[{}^{\prime } num{s}^{\prime}\right]. to\_ numpy\left(\right),\mathrm{tf}.\mathrm{float}32\right) $
$ counts\leftarrow \mathrm{tf}.\mathrm{cast}\left( df\left[{}^{\prime } num{s}^{\prime}\right]. to\_ numpy\left(\right),\mathrm{tf}.\mathrm{float}32\right) $
 
 $ y\_ real\leftarrow \mathrm{tf}.\mathrm{repeat}\left(y\_ obs,\mathrm{tf}.\mathrm{cast}\left( counts,\mathrm{tf}.\operatorname{int}32\right)\right) $
$ y\_ real\leftarrow \mathrm{tf}.\mathrm{repeat}\left(y\_ obs,\mathrm{tf}.\mathrm{cast}\left( counts,\mathrm{tf}.\operatorname{int}32\right)\right) $
 
 $ y\_ predicted\leftarrow \mathrm{tf}.\mathrm{repeat}\left(p,\mathrm{tf}.\mathrm{cast}\left( counts,\mathrm{tf}.\operatorname{int}32\right)\right) $
$ y\_ predicted\leftarrow \mathrm{tf}.\mathrm{repeat}\left(p,\mathrm{tf}.\mathrm{cast}\left( counts,\mathrm{tf}.\operatorname{int}32\right)\right) $
 
 $ loss\leftarrow \mathrm{bce}\left(y\_ real,y\_ predicted\right) $
$ loss\leftarrow \mathrm{bce}\left(y\_ real,y\_ predicted\right) $
 return 
 $ loss $
.
$ loss $
.
end function.
Algorithm D.3. Training Pipeline.
Step 1: Prepare the domain that contains the input data for the neural network. The sample size is the number of cells in the domain.
 
 $ sample\_ size\leftarrow 300\times 480 $
$ sample\_ size\leftarrow 300\times 480 $
 
 $ domain\_ transposed\leftarrow \mathrm{np}.\mathrm{transpose}\left( domain,\left(\mathrm{1,2,0}\right)\right) $
$ domain\_ transposed\leftarrow \mathrm{np}.\mathrm{transpose}\left( domain,\left(\mathrm{1,2,0}\right)\right) $
 
 $ domain\_ reshaped\leftarrow \mathrm{np}.\mathrm{reshape}\left( domain\_ transposed,\left( sample\_ size,5\right)\right) $
$ domain\_ reshaped\leftarrow \mathrm{np}.\mathrm{reshape}\left( domain\_ transposed,\left( sample\_ size,5\right)\right) $
 
 $ x\_ train\leftarrow domain\_ reshaped $
$ x\_ train\leftarrow domain\_ reshaped $
 
 $ y\_ train\leftarrow \mathrm{np}.\mathrm{array}\left(\left[0\right]\times sample\_ size\right) $
$ y\_ train\leftarrow \mathrm{np}.\mathrm{array}\left(\left[0\right]\times sample\_ size\right) $
 
 $ \vartriangleright $
 We do not use y_train in the fitting process, but it is required by Tensorflow. This is just a placeholder.
$ \vartriangleright $
 We do not use y_train in the fitting process, but it is required by Tensorflow. This is just a placeholder.
 
 $ \vartriangleright $
 Step 2: Create the model specifying the input size (number of covariates).
$ \vartriangleright $
 Step 2: Create the model specifying the input size (number of covariates).
 
 $ input\_ shape\leftarrow \left(5,\right) $
$ input\_ shape\leftarrow \left(5,\right) $
 
 $ model\leftarrow \mathrm{CreateModel}\left( input\_ shape\right) $
$ model\leftarrow \mathrm{CreateModel}\left( input\_ shape\right) $
Step 3: Prepare learning schedule and optimizer, and define callbacks and history to compile the model.
 
 $ \mathrm{tf}.\mathrm{config}.\mathrm{run}\_\mathrm{functions}\_\mathrm{eagerly}(True) $
$ \mathrm{tf}.\mathrm{config}.\mathrm{run}\_\mathrm{functions}\_\mathrm{eagerly}(True) $
 
 $ \vartriangleright $
 We can run in eager mode for debugging.
$ \vartriangleright $
 We can run in eager mode for debugging.
 
 $ lr\_ schedule\leftarrow \mathrm{keras}.\mathrm{optimizers}.\mathrm{schedules}.\mathrm{ExponentialDecay}\Big( $
$ lr\_ schedule\leftarrow \mathrm{keras}.\mathrm{optimizers}.\mathrm{schedules}.\mathrm{ExponentialDecay}\Big( $
 
 $ \mathrm{initial}\_\mathrm{learning}\_\mathrm{rate}=0.0005, $
$ \mathrm{initial}\_\mathrm{learning}\_\mathrm{rate}=0.0005, $
 
 $ \mathrm{decay}\_\mathrm{steps}=1, $
$ \mathrm{decay}\_\mathrm{steps}=1, $
 
 $ \mathrm{decay}\_\mathrm{rate}=0.96\Big) $
$ \mathrm{decay}\_\mathrm{rate}=0.96\Big) $
 
 $ opt\leftarrow \mathrm{keras}.\mathrm{optimizers}.\mathrm{Adam}\left( learning\_ rate= lr\_ schedule\right) $
$ opt\leftarrow \mathrm{keras}.\mathrm{optimizers}.\mathrm{Adam}\left( learning\_ rate= lr\_ schedule\right) $
 
 $ callback\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{callbacks}.\mathrm{EarlyStopping}\left(\mathrm{monitor}{=}^{\prime}\mathrm{los}{\mathrm{s}}^{\prime },\mathrm{patience}=10,\mathrm{restore}\_\mathrm{best}\_\mathrm{weights}=\mathrm{True}\right) $
$ callback\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{callbacks}.\mathrm{EarlyStopping}\left(\mathrm{monitor}{=}^{\prime}\mathrm{los}{\mathrm{s}}^{\prime },\mathrm{patience}=10,\mathrm{restore}\_\mathrm{best}\_\mathrm{weights}=\mathrm{True}\right) $
 
 $ callback2\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{callbacks}.\mathrm{TerminateOnNaN}\left(\right) $
$ callback2\leftarrow \mathrm{tf}.\mathrm{keras}.\mathrm{callbacks}.\mathrm{TerminateOnNaN}\left(\right) $
 
 $ history\leftarrow \mathrm{History}\left(\right) $
$ history\leftarrow \mathrm{History}\left(\right) $
 
 $ model. compile\left( optimizer= opt, loss=\mathrm{CustomLoss}\right) $
$ model. compile\left( optimizer= opt, loss=\mathrm{CustomLoss}\right) $
Step 4: Train the model, collect history, and measure computation time.
 
 $ \mathrm{st}\leftarrow \mathrm{time}.\mathrm{time}\left(\right) $
$ \mathrm{st}\leftarrow \mathrm{time}.\mathrm{time}\left(\right) $
 
 $ model. fit\Big(x\_ train,y\_ train, $
$ model. fit\Big(x\_ train,y\_ train, $
 
 $ epochs=200, $
$ epochs=200, $
 
 $ batch\_ size=2\times 300\times 480, $
$ batch\_ size=2\times 300\times 480, $
 
 $ verbose=1, $
$ verbose=1, $
 
 $ callbacks=\left[ callback, callback2, history\right]\Big) $
$ callbacks=\left[ callback, callback2, history\right]\Big) $
 
 $ \mathrm{et}\leftarrow \mathrm{time}.\mathrm{time}\left(\right) $
$ \mathrm{et}\leftarrow \mathrm{time}.\mathrm{time}\left(\right) $
 
 $ elapsed\_ time\leftarrow \mathrm{et}-\mathrm{st} $
$ elapsed\_ time\leftarrow \mathrm{et}-\mathrm{st} $
 
 























