
Homogenisation of Wasserstein gradient flows

Published online by Cambridge University Press:  29 July 2025

Yuan Gao
Affiliation:
Department of Mathematics, Purdue University, West Lafayette, USA
Nung Kwan Yip*
Affiliation:
Department of Mathematics, Purdue University, West Lafayette, USA
Corresponding author: Nung Kwan Yip; Email: yipn@purdue.edu

Abstract

We prove the convergence of a Wasserstein gradient flow of a free energy in inhomogeneous media. Both the energy and media can depend on the spatial variable in a fast oscillatory manner. In particular, we show that the gradient-flow structure is preserved in the limit, which is expressed in terms of an effective energy and Wasserstein metric. The gradient flow and its limiting behavior are analysed through an energy dissipation inequality. The result is consistent with asymptotic analysis in the realm of homogenisation. However, we note that the effective metric is in general different from that obtained from the Gromov–Hausdorff convergence of metric spaces. We apply our framework to a linear Fokker–Planck equation, but we believe the approach is robust enough to be applicable in a broader context.

Information

Type
Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Optimal transport has appeared in many practical and theoretical applications, cf. [Reference Peyré and Cuturi41, Reference Rachev and Rüschendorf43, Reference Rachev and Rüschendorf44, Reference Villani52, Reference Villani53]. Precisely, given a cost function $c(\cdot ,\cdot )\,:\, \mathbb{R}^n\times \mathbb{R}^n\longrightarrow \mathbb{R}$ and two probability measures $\mu ,\nu$ on $\mathbb{R}^n$ , the problem of optimal transport is to find the minimum cost of transporting $\mu$ to $\nu$ . It has two classical formulations: the first, by Monge [Reference Monge39], in terms of an optimal transport map, and the second, by Kantorovich [Reference Kantorovich33] using duality, in terms of an optimal coupling measure:

\begin{equation*} \text{Monge:}\,\quad \quad \quad \inf \left \{ \int c(x,\Phi (x))\,\mathrm{d}\mu (x)\,:\,\,\,\Phi \,:\, \mathbb{R}^n\longrightarrow \mathbb{R}^n,\,\,\,\Phi _\sharp \mu = \nu \right \}, \end{equation*}

and

\begin{equation*} \text{Kantorovich:}\,\,\quad \quad \quad \inf \left \{ \iint c(x,y) \,\mathrm{d} \gamma (x,y); \quad \int \gamma (x,\,\mathrm{d} y) = \mu (x), \,\, \int \gamma (\,\mathrm{d} x,y) = \nu (y) \right \}. \end{equation*}

In the above, $\gamma$ is a probability measure on the product space $\mathbb{R}^n\times \mathbb{R}^n$ . The equivalence of the two formulations, under appropriate general assumptions, has been established in ref. [Reference Pratelli42]. A typical example of a cost function is the squared Euclidean distance, $c(x,y)=|x-y|^2$ , which is convex and spatially homogeneous in the sense that $c(x,y)=c(x-y)$ . In this case, the infimum value of the above two formulations is the square of the Wasserstein-2 distance between $\mu$ and $\nu$ , denoted as $W_2^2(\mu ,\nu )$ . We refer to [Reference Ambrosio, Gigli and Savaré2, Reference Santambrogio46, Reference Villani52, Reference Villani53] for examples of monographs on the theory of optimal transport.
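For discrete measures, the Kantorovich problem above is a finite-dimensional linear programme. The following sketch (our illustration, not from the paper; it assumes SciPy is available and all variable names are our own) computes $W_2^2$ between two two-point measures on the line with cost $c(x,y)=|x-y|^2$ :

```python
import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0])          # support of mu
y = np.array([0.0, 2.0])          # support of nu
mu = np.array([0.5, 0.5])         # weights of mu
nu = np.array([0.5, 0.5])         # weights of nu

C = (x[:, None] - y[None, :]) ** 2   # cost matrix c(x_i, y_j) = |x_i - y_j|^2

m, n = C.shape
# Marginal constraints on the coupling gamma (flattened row-major):
# row sums equal mu, column sums equal nu.
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0     # sum_j gamma_ij = mu_i
for j in range(n):
    A_eq[m + j, j::n] = 1.0              # sum_i gamma_ij = nu_j
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.fun)   # W_2^2(mu, nu)
```

One of the equality constraints is redundant (both marginals sum to one), which the default HiGHS solver handles without difficulty.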

The main purpose of the current paper is to incorporate spatial inhomogeneity into the above problem, or more precisely, into the cost function $c$ . We then consider gradient flows with respect to the Wasserstein metric induced by $c$ and analyse their limiting behaviour or description when the inhomogeneity converges in an appropriate sense. We believe these types of questions appear naturally in many applications such as urban transportation [Reference Bernot, Caselles and Morel8, Reference Buttazzo, Pratelli, Solimini and Stepanov11], network science [Reference Kivelä, Arenas, Barthelemy, Gleeson, Moreno and Porter32], the spread of epidemics [Reference Balcan, Colizza, Gonçalves, Hu, Ramasco and Vespignani7], optics [Reference Rubinstein and Wolansky45], and many others. Such a consideration indeed has a long history in the realm of homogenisation [Reference Bensoussan, Lions and Papanicolaou10, Reference Sánchez-Palencia48]. On a technical level, we aim to explore how the ideas of homogenisation can be introduced into optimal transport problems. Even though in the current paper we work in a spatially continuous setting, the problem formulation can be posed in a discrete, graph or network setting, as seen from the above-mentioned applications. See also the end of this section for some mathematical work on these attempts.

To be specific, we consider cost functions $c_\varepsilon (\cdot , \cdot )$ that depend on the spatial variables in some oscillatory manner. We find that the formulation of Benamou–Brenier [Reference Benamou and Brenier5] is well-suited for this purpose. Not only does it connect optimal transport to some underlying “dynamical process,” but it also allows us to incorporate spatial inhomogeneity “more or less at will”. More precisely, we focus on the case in which $c_\varepsilon (x,y)$ is defined through a least action principle,

(1.1) \begin{equation} c_\varepsilon (x,y) = \min \left \{ \int _0^1 L_\varepsilon (\dot {z}_t, z_t) \,\mathrm{d} t, \quad z\,:\, [0,1]\longrightarrow \mathbb{R}^n,\,\,z_0=x, \,\, z_1=y \right \}, \end{equation}

where we envision that $L_\varepsilon$ is convex in the first variable $ v=\dot {z}_t$ and oscillatory or periodic in the second variable $z_t$ . Note that this cost function also defines a metric in an inhomogeneous medium with periodic structure. If one further assumes that $L_\varepsilon$ is a quadratic form in $v$ , given by a positive definite matrix $B_\varepsilon (x)$ ,

(1.2) \begin{equation} L_\varepsilon (v,z) = \langle B_\varepsilon (z)v, \, v\rangle , \end{equation}

then $c_\varepsilon (x,y)$ defines a Riemannian distance on $\mathbb{R}^n$ :

(1.3) \begin{equation} c_\varepsilon ^2(x,y) = \min \left \{ \int _0^1 \langle B_\varepsilon (z_t) \dot {z}_t, \dot {z}_t \rangle \,\mathrm{d} t, \quad z\,:\, [0,1]\longrightarrow \mathbb{R}^n,\,\, z_0 = x, \,\, z_1=y \right \}. \end{equation}
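In one spatial dimension, with scalar coefficient $B_\varepsilon (z)=b(z/\varepsilon )\gt 0$ , the Cauchy–Schwarz inequality shows that the minimum in (1.3) equals $\big (\int _x^y \sqrt {b(z/\varepsilon )}\,\mathrm{d} z\big )^2$ , so the induced distance is the integral of $\sqrt {b(\cdot /\varepsilon )}$ along the segment. The following numerical sketch (our illustration; the coefficient $b$ and all parameters are arbitrary choices, not from the paper) shows this distance approaching the homogenised value $|y-x|\int _0^1\sqrt {b(s)}\,\mathrm{d} s$ as $\varepsilon \to 0$ :

```python
import numpy as np

def b(z):                                  # periodic coefficient (our choice)
    return 2.0 + np.cos(2.0 * np.pi * z)

def dist(x, y, eps, num=200000):
    # 1-D Riemannian distance: integral of sqrt(b(z/eps)) over [x, y],
    # evaluated with a simple midpoint rule.
    dz = (y - x) / num
    z = x + (np.arange(num) + 0.5) * dz
    return np.sum(np.sqrt(b(z / eps))) * dz

# effective (homogenised) distance per unit length: the mean of sqrt(b)
s = (np.arange(200000) + 0.5) / 200000
homog = np.mean(np.sqrt(b(s)))

for eps in [1.0, 0.3, 0.07, 0.013]:
    print(eps, dist(0.0, 0.73, eps))       # tends to 0.73 * homog
print("homogenised:", 0.73 * homog)
```

Note that the limit involves the arithmetic mean of $\sqrt {b}$ ; as discussed later, the effective Wasserstein metric arising from the gradient-flow limit is in general a different object.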

The above leads to the following (squared) $\varepsilon$ -Wasserstein distance between $\mu ,\nu \in \mathcal{P}(\mathbb{R}^n)$ ,

(1.4) \begin{equation} W_\varepsilon ^2(\mu ,\nu )\,:\!=\, \inf \left \{ \iint c_\varepsilon (x,y) \,\mathrm{d} \gamma (x,y); \quad \int \gamma (x,\,\mathrm{d} y) = \mu (x), \,\, \int \gamma (\,\mathrm{d} x,y) = \nu (y) \right \}. \end{equation}

The description and formulation in this and the next section are applicable to general spatially inhomogeneous $B_\varepsilon$ , but the focus of this paper is on the case where $B_\varepsilon$ takes the form $\displaystyle B_\varepsilon (x)=B\Big(\frac {x}{\varepsilon }\Big)$ – see Section 2.4 for precise statements and assumptions.

In order to keep the technicalities in this paper manageable, we will only consider probability measures having densities with respect to the Lebesgue measure. Henceforth, for simplicity, we will use $\mathcal{P}_2(\mathbb{R}^n)$ to denote these measures or their densities. The subscript $2$ means these measures have finite second moments. More precise assumptions will be stated in Section 2.4. Now let $(\mathcal{P}_2(\mathbb{R}^n), W_\varepsilon )$ be the Polish space endowed with the $\varepsilon$ -Wasserstein metric. The main questions we want to understand are: whether gradient-flow structures in $(\mathcal{P}_2(\mathbb{R}^n), W_\varepsilon )$ are preserved as $\varepsilon \to 0$ and, if so, what the limiting Wasserstein distance $\overline {W}$ and gradient flow are. We give positive results for the case of linear Fokker–Planck equations in periodic media.

With (1.3), the $\varepsilon$ -Wasserstein distance $W_\varepsilon$ can be expressed using the following spatially inhomogeneous Benamou–Brenier formulation,

(1.5) \begin{equation} W_\varepsilon ^2(\rho _0, \rho _1) \,:\!=\, \inf \left \{ \int _0^1\int \rho _t(x)\langle B_\varepsilon (x) v_t(x), v_t(x)\rangle \,\mathrm{d} x\,\mathrm{d} t,\quad (\rho _t, v_t)\in V(\rho _0, \rho _1) \right \} \end{equation}

where

(1.6) \begin{equation} V(\rho _0, \rho _1) \,:\!=\, \Big \{(\rho _t, v_t)\,:\,\, \frac {\partial \rho _t}{\partial t} + \nabla \cdot (\rho _t v_t) = 0, \quad \rho (\cdot , 0) = \rho _0,\quad \rho (\cdot , 1) = \rho _1 \Big \}. \end{equation}

The work [Reference Bernard and Buffoni6] – see its Theorems A and B – in fact shows that the $\inf$ of (1.5) (and (1.4)) is achieved by a unique interpolation between $\rho _0$ and $\rho _1$ , given by a flow map $\frac {\,\mathrm{d}}{\,\mathrm{d} t}\Phi ^\varepsilon _t=v_t(\Phi ^\varepsilon _t)$ ,

(1.7) \begin{equation} \rho _t = (\Phi ^\varepsilon _t)_{\sharp }\rho _0, \quad 0\leq t \leq 1. \end{equation}

Note that for the case $\varepsilon =1, B_\varepsilon = I$ , (1.5) is the celebrated Benamou–Brenier formula [Reference Benamou and Brenier5] for the standard (squared) Wasserstein distance, whose static Kantorovich formulation reads

(1.8) \begin{equation} W_2^2(\rho _0,\rho _1)=\inf \left \{ \iint |x-y|^2 \,\mathrm{d} \gamma (x,y); \quad \int \gamma (x,\,\mathrm{d} y) = \rho _0(x)\,\mathrm{d} x, \,\, \int \gamma (\,\mathrm{d} x,y) = \rho _1(y)\,\mathrm{d} y \right \}. \end{equation}

The functional in (1.5) defines an action functional on $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ , which allows one to directly use least action principles on $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ to compute the $W_2$ -distance. In the seminal paper [Reference Otto40], Otto went further and regarded $W_2$ as a formal Riemannian distance on $\mathcal{P}_2(\mathbb{R}^n)$ , with the Riemannian metric being the same as the one given by the Benamou–Brenier formula. More precisely, for any $s_1, s_2$ on the tangent plane $T_{\mathcal{P}}$ at $\rho \in \mathcal{P}$ , the metric tensor on $T_{\mathcal{P}}\times T_{\mathcal{P}}$ is given by

(1.9) \begin{equation} \big \langle s_1, s_2\big \rangle _{T_{\mathcal{P}}, T_{\mathcal{P}}}\,:\!=\, \int \rho (x)\langle \nabla \varphi _1(x), \nabla \varphi _2(x)\rangle \,dx, \quad \text{ where } s_i = -\nabla \cdot (\rho \nabla \varphi _i), \,\,\,i=1,2. \end{equation}

(See Section 2.2 for an explanation of going from $v_t$ in (1.5) to $\nabla \varphi$ above.) With the above set-up for the Wasserstein distance, we proceed to consider gradient flows in $(\mathcal{P}_2(\mathbb{R}^n), W_\varepsilon )$ of a given energy functional $E_\varepsilon \,:\, \mathcal{P}_2(\mathbb{R}^n) \longrightarrow \mathbb{R}$ ,

(1.10) \begin{equation} \partial _t \rho ^\varepsilon _t = - \nabla ^{W_\varepsilon } E_\varepsilon (\rho ^\varepsilon _t). \end{equation}

The precise dynamics is uniquely determined by a dissipation functional on the tangent plane characterising the rate of change of the energy, from which the Wasserstein gradient $\nabla ^{W_\varepsilon }$ is derived. In this paper, we consider energy dissipation expressed by the metric $W_\varepsilon$ (induced by (1.5)). It turns out that $W_\varepsilon$ can be formally interpreted as a Riemannian metric (see (2.11)), which in particular is given by a bilinear form. Based on the expression of $\nabla ^{W_\varepsilon }$ (see (2.14)), the $\varepsilon$ -Wasserstein gradient flow (1.10) can be explicitly written as

(1.11) \begin{equation} \partial _t \rho ^\varepsilon _t =\nabla \cdot \left (\rho ^\varepsilon _t B_\varepsilon ^{-1} \nabla \frac {\delta E_\varepsilon }{\delta \rho }(\rho ^\varepsilon _t)\right ). \end{equation}

Note that our formulation allows oscillations in both the energy $E_\varepsilon$ and the medium $B_\varepsilon$ .

If the total energy is taken as the relative entropy or the Kullback–Leibler divergence between $\rho$ and another probability distribution $\pi _\varepsilon \in \mathcal{P}_2(\mathbb{R}^n)$ ,

(1.12) \begin{equation} E_\varepsilon (\rho ) = KL(\rho ||\pi _\varepsilon ) \,:\!=\, \int _{\mathbb{R}^n} \rho (x) \log \frac {\rho (x)}{\pi _\varepsilon (x)} \,\mathrm{d} x, \end{equation}

then the above $\varepsilon$ -Wasserstein gradient flow (1.11) is the same as a linear Fokker–Planck equation with oscillatory coefficients. The above energy is often called the free energy of the system, and $\pi _\varepsilon$ in (1.12) is a stationary distribution corresponding to an underlying stochastic process.
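To make (1.11)–(1.12) concrete, the following one-dimensional finite-volume sketch (ours, not part of the paper's analysis; the potential, the coefficient $B_\varepsilon ^{-1}$ , the grid and the time step are all arbitrary choices) evolves the oscillatory Fokker–Planck equation on the torus and checks that mass is conserved and the free energy $KL(\rho ||\pi _\varepsilon )$ decreases:

```python
import numpy as np

n, eps = 128, 0.1
h = 1.0 / n
x = (np.arange(n) + 0.5) * h                        # cell centres on [0, 1)

U = 0.5 * np.cos(2 * np.pi * x)                     # smooth periodic potential
binv = 1.0 / (2.0 + np.sin(2 * np.pi * x / eps))    # oscillatory B_eps^{-1} > 0
pi = np.exp(-U); pi /= pi.sum() * h                 # stationary density, normalised

rho = np.exp(-4 * (x - 0.5) ** 2); rho /= rho.sum() * h   # initial density

def energy(rho):
    return h * np.sum(rho * np.log(rho / pi))       # KL(rho || pi)

def step(rho, dt):
    # flux between cells i and i+1: F = -binv * (d rho/dx + rho * dU/dx)
    rp, Up = np.roll(rho, -1), np.roll(U, -1)
    bmid = 0.5 * (binv + np.roll(binv, -1))
    F = -bmid * ((rp - rho) / h + 0.5 * (rho + rp) * (Up - U) / h)
    return rho - dt * (F - np.roll(F, 1)) / h       # conservative update

E0, dt = energy(rho), 1e-5
for _ in range(2000):
    rho = step(rho, dt)

print(abs(rho.sum() * h - 1.0))   # mass error (roundoff only)
print(energy(rho) < E0)           # free energy has decreased
```

The flux form of the update mirrors the divergence structure of (1.11), which is what gives exact discrete mass conservation.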

Our main result is the evolutionary convergence of the $\varepsilon$ -Wasserstein gradient flow (1.11) as $\varepsilon \to 0$ to a limit that is also characterised as a gradient flow of an effective total energy $\overline {E}$ with respect to an effective Wasserstein distance $\overline {W}$ . The distance $\overline {W}$ induced by the evolutionary convergence is still a Riemannian metric on $\mathcal{P}_2(\mathbb{R}^n)$ . However, we find that it is in general different from the direct Gromov–Hausdorff limit of $W_\varepsilon$ . Even though our main result is proven for continuous state spaces, the approach we use for proving the convergence of multi-scale gradient flows can also be applied to discrete state spaces, in particular, graphs with inhomogeneous structure.

The main approach we use is to first recast the $\varepsilon$ -Wasserstein gradient flow (1.11) as a generalised gradient flow in the form of the following energy dissipation inequality (EDI):

(1.13) \begin{equation} E_\varepsilon (\rho _t^\varepsilon ) + \int _0^t \left [ \psi _\varepsilon (\rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau ) + \psi ^*_\varepsilon \left (\rho _\tau ^\varepsilon , -\frac {\delta E_\varepsilon }{\delta \rho }(\rho _\tau ^\varepsilon )\right ) \right ] \,\mathrm{d} \tau \leq E_\varepsilon (\rho _0^\varepsilon ). \end{equation}

This formulation involves dissipation functionals $\psi _\varepsilon$ and $\psi ^*_\varepsilon$ on the tangent and the co-tangent plane of $\mathcal{P}_2(\mathbb{R}^n)$ , respectively. Inequality (1.13) is in fact equivalent to the strong form of the gradient flow (1.10), since the functionals $\psi _\varepsilon$ and $\psi ^*_\varepsilon$ are convex conjugates of each other; for details, see Section 2.3. Indeed, along any smooth curve, the chain rule and the Fenchel–Young inequality give
\begin{equation*} \frac {\,\mathrm{d}}{\,\mathrm{d} \tau } E_\varepsilon (\rho ^\varepsilon _\tau ) = \Big \langle \frac {\delta E_\varepsilon }{\delta \rho }(\rho ^\varepsilon _\tau ), \partial _\tau \rho ^\varepsilon _\tau \Big \rangle \geq -\psi _\varepsilon (\rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau ) - \psi ^*_\varepsilon \left (\rho _\tau ^\varepsilon , -\frac {\delta E_\varepsilon }{\delta \rho }(\rho _\tau ^\varepsilon )\right ), \end{equation*}
so integrating in time shows the reverse of (1.13) always holds; (1.13) thus forces equality almost everywhere, and equality in the Fenchel–Young inequality holds precisely when $\partial _\tau \rho ^\varepsilon _\tau$ is the gradient-flow velocity in (1.10). Then the limiting behaviour of the dynamics is obtained by considering the limit of the functionals in (1.13).

The framework of using the EDI formulation of gradient flows to obtain the evolutionary $\Gamma$ -convergence of gradient flows was first established by Sandier and Serfaty [Reference Sandier and Serfaty49, Reference Serfaty47]. In this setting, the key estimates are the lower bounds of the free energy and the energy dissipations in terms of the metric velocity and the metric slope. Many generalisations of evolutionary convergence for generalised gradient flow systems have been developed by Mielke, Peletier and collaborators; see the concept of energy-dissipation-principle (EDP) convergence of gradient flows in [Reference Arnrich, Mielke, Peletier, Savaré and Veneroni4, Reference Liero, Mielke, Peletier and Renger34], the concept of generalised tilt/contact EDP convergence developed in [Reference Dondl, Frenzel and Mielke16, Reference Mielke, Montefusco and Peletier38], and also the review [Reference Mielke36].

Following the above general framework for evolutionary $\Gamma$ -convergence of gradient flows, we pass to the limit in the $\varepsilon$ -EDI (1.13) by proving lower bounds for all three functionals on the left-hand side of (1.13): the energy functional $E_\varepsilon$ and the time integrals of the dissipation functionals $\psi _\varepsilon$ and $\psi ^*_\varepsilon$ . The lower bounds of the latter two, denoted as $\psi$ and $\psi ^*$ , are still functionals in bilinear form and are convex conjugates of each other, and thus determine the limiting Wasserstein gradient flow with an effective Wasserstein distance $\overline {W}$ ; see the precise definition of these lower bounds in Theorem 4.1. The lower bound for $\psi ^*_\varepsilon$ is obtained by using a Fisher information reformulation in terms of $\displaystyle \sqrt {\frac {\rho ^\varepsilon }{\pi _\varepsilon }}$ [Reference Ambrosio, Gigli and Savaré2, Reference Arnrich, Mielke, Peletier, Savaré and Veneroni4] and a by-now classical $\Gamma$ -convergence technique for an associated Dirichlet energy. On the other hand, the lower bound for $\psi _\varepsilon$ is obtained by a relaxation via the Legendre transformation and an upper bound estimate for $\psi ^*_\varepsilon$ . This requires one to overcome some regularity issues brought about by the oscillations in the energy functional $E_\varepsilon$ and the solution curve $\rho ^\varepsilon$ . This is achieved via a symmetric reformulation of the Fokker–Planck equation in terms of the variable $\displaystyle f^\varepsilon \,:\!=\,\frac {\rho ^\varepsilon }{\pi _\varepsilon }.$

We briefly mention some related references on Wasserstein gradient flows with multi-scale behaviours. The modelling of the Fokker–Planck equation as a gradient flow in Wasserstein space was first noted by Jordan–Kinderlehrer–Otto [Reference Jordan, Kinderlehrer and Otto31], who also showed the convergence of a variational backward Euler scheme. There are many other evolutionary problems that can be formulated using multi-scale Wasserstein gradient flows; see for instance the porous medium equation [Reference Otto40] and more general aggregation-diffusion equations reviewed in ref. [Reference Carrillo, Craig and Yao14]. In [Reference Arnrich, Mielke, Peletier, Savaré and Veneroni4], the authors use the evolutionary convergence of Wasserstein gradient flows to analyse the mean field equation in a zero-noise limit for a reversible drift-diffusion process. There are also extensions of the zero-noise limit from diffusion processes to chemical reactions described by time-changed Poisson processes on countable states; see [Reference Maas and Mielke37] for the reversible case using a discrete Wasserstein gradient-flow approach and [Reference Gao and Liu24] for the irreversible case using a nonlinear semigroup approach for Hamilton–Jacobi equations. Homogenisation of action functionals on the space of probability measures has also been studied in [Reference Gangbo and Tudorascu27]. In addition, convergence of Wasserstein gradient flows has been applied to related questions that explore the mean-field limit and large deviation principle of weakly interacting particles; cf. [Reference Dupuis and Spiliopoulos19, Reference Budhiraja, Dupuis and Fischer9] and some recent developments in refs. [Reference Carrillo, Delgadino and Pavliotis15, Reference Delgadino, Gvalani and Pavliotis17]. Furthermore, a similar convergence approach has also been used for generalised gradient flows and optimal transport on graphs and their diffusive limits.
In various discrete settings, we refer to [Reference Gigli and Maas26] for the Gromov–Hausdorff convergence of discrete Wasserstein metrics, [Reference Forkert, Maas and Portinale20] for the evolutionary $\Gamma$ -convergence of a finite volume scheme for the linear Fokker–Planck equation, [Reference Gladbach, Kopfer, Maas and Portinale22, Reference Gladbach, Kopfer, Maas and Portinale23] for the homogenisation of Wasserstein distances on periodic graphs, and the recent works [Reference Schlichting and Seis50, Reference Hraivoronska and Tse30, Reference Hraivoronska, Schlichting and Tse28] for diffusive limits of some generalised gradient flows on graphs.

The remainder of this paper is outlined as follows. In Section 2, we introduce the inhomogeneous Fokker–Planck equation and the $\varepsilon$ -Wasserstein gradient flow in EDI form, and describe our assumptions and main results. In Section 3, we obtain some uniform regularity estimates and convergence results for the $\varepsilon$ -Wasserstein gradient flow. In Section 4, we pass to the limit in the EDI form of the $\varepsilon$ -Wasserstein gradient flow by proving lower bounds for the free energy and the two dissipation functionals; see Theorem 4.1. In Section 5, we study the limiting gradient flow with respect to the induced limiting Wasserstein metric and compare it with the usual Gromov–Hausdorff convergence of $W_\varepsilon .$

2. $\varepsilon$ -system: inhomogeneous Fokker–Planck and generalised gradient flow

In this section, we introduce a spatially inhomogeneous Fokker–Planck equation which, for fixed $\varepsilon \gt 0$ , can be recast as a generalised gradient flow in the $\varepsilon$ -Wasserstein space in terms of a total energy given by a relative entropy. This Fokker–Planck equation is motivated by a drift-diffusion process with inhomogeneous noise and drift that satisfy the fluctuation–dissipation relation. In Section 2.3, we choose a pair of quadratic dissipation functionals $(\psi _\varepsilon , \psi ^*_\varepsilon )$ , convex conjugates of each other, to recast the $\varepsilon$ -Fokker–Planck equation as a generalised gradient flow in EDI form. Then in Section 2.4, we state and explain our main results on the convergence of the gradient-flow structure as $\varepsilon \to 0$ and the resulting homogenised gradient flow of an effective free energy $\overline {E}$ with respect to an effective Wasserstein metric $\overline {W}$ .

From now on, to avoid boundary effects, we work on a periodic domain, denoted as $\Omega \,:\!=\,\mathbb{T}^n.$ Given any smooth potential function $U_\varepsilon \,:\, \Omega \longrightarrow \mathbb{R}$ , consider the following (free) energy functional on $\mathcal{P}(\Omega )$

(2.1) \begin{equation} E_\varepsilon (\rho ) = \int _\Omega U_\varepsilon (x) \rho (x) \,\mathrm{d} x + \int _\Omega \rho (x) \log \rho (x) \,\mathrm{d} x. \end{equation}

Let

(2.2) \begin{equation} \pi _\varepsilon (x) = e^{-U_\varepsilon (x)}. \end{equation}

Then (2.1) can be written in the form (1.12). The first variation $\displaystyle \frac {\delta E_\varepsilon }{\delta \rho }$ of $E_\varepsilon$ is then given by

(2.3) \begin{equation} \frac {\delta E_\varepsilon }{\delta \rho }(\rho ) = \log \rho + 1 + U_\varepsilon = \log \frac {\rho }{\pi _\varepsilon } +1. \end{equation}

With a positive definite matrix $B_\varepsilon$ , we consider the following inhomogeneous Fokker–Planck equation

(2.4) \begin{equation} \partial _t \rho ^\varepsilon _t = \nabla \cdot \left ( \rho ^\varepsilon _t B_\varepsilon ^{-1} \nabla \frac {\delta E_\varepsilon }{\delta \rho }(\rho ^\varepsilon _t) \right ) = \nabla \cdot \left ({B_\varepsilon ^{-1}} \nabla \rho _t^\varepsilon + \rho _t^\varepsilon {B_\varepsilon ^{-1}} \nabla U_\varepsilon \right ). \end{equation}
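The second equality in (2.4) follows from (2.3), since $\rho \nabla \log (\rho /\pi _\varepsilon ) = \nabla \rho + \rho \nabla U_\varepsilon$ . This can be confirmed by a quick symbolic computation in one dimension (our sketch; SymPy is assumed available, and $g$ stands for $B_\varepsilon ^{-1}$ ):

```python
import sympy as sp

x = sp.symbols('x')
rho = sp.Function('rho', positive=True)(x)   # density
U = sp.Function('U')(x)                      # potential U_eps
g = sp.Function('g', positive=True)(x)       # stands for B_eps^{-1}
pi = sp.exp(-U)                              # Gibbs density (2.2)

# left- and right-hand sides of the second equality in (2.4), in 1-D
lhs = sp.diff(rho * g * sp.diff(sp.log(rho / pi), x), x)
rhs = sp.diff(g * sp.diff(rho, x), x) + sp.diff(rho * g * sp.diff(U, x), x)
print(sp.simplify(lhs - rhs))   # 0
```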

The above equation can be interpreted in two ways: one is to regard it as the Kolmogorov forward equation of a drift-diffusion process with multiplicative noise, and the other as a gradient flow in the Wasserstein space $(\mathcal{P}(\Omega ), W_\varepsilon )$ with the cost function defined in (1.3). We describe both of these in the following.

2.1. $\varepsilon$ -Fokker–Planck equation (2.4) as a Kolmogorov equation

Consider a drift-diffusion process $(X_t)_{t\geq 0}$ , described by the following stochastic differential equation

(2.5) \begin{equation} \,\mathrm{d} X_t = b(X_t) \,\mathrm{d} t + \sigma (X_t) * \,\mathrm{d} B_t, \end{equation}

where $B_t$ is an $n$ -dimensional Brownian motion, and

(2.6) \begin{equation} b(x) = - {B_\varepsilon ^{-1}}(x) \nabla U_\varepsilon (x), \quad \text{and}\quad \sigma (x) = \sqrt {2 {B_\varepsilon ^{-1}}(x)}. \end{equation}

Here the multiplicative noise $\sigma (X_t) * \,\mathrm{d} B_t$ is understood in the backward Ito sense, which is equivalent to the forward Ito differential with an additional drift term:

\begin{equation*} \sigma (X_t)* \,\mathrm{d} B_t = \frac 12\nabla \cdot (\sigma \sigma ^T)(X_t) \,\mathrm{d} t + \sigma (X_t) \,\mathrm{d} B_t. \end{equation*}

By Ito’s formula, the generator of the process $(X_t)_{t\geq 0}$ is derived as follows. For any test function $\varphi \in C^2_b(\mathbb{R}^n)$ and initial condition $X_0=x$ , we compute

(2.7) \begin{equation} \begin{aligned} \lim _{t\to 0^+} \frac {{\mathbb{E}}^x[\varphi (X_t)]-\varphi (x)}{t} = &\lim _{t\to 0^+} {\mathbb{E}}^x \frac {1}{t}\int _0^t \big [ \nabla \varphi (X_s) \cdot b(X_s) \\ &+ \frac 12 \nabla ^2 \varphi (X_s) \,:\, (\sigma \sigma ^T)(X_s) + \frac 12 (\nabla \cdot (\sigma \sigma ^T)(X_s)) \cdot \nabla \varphi (X_s) \big ] \,\mathrm{d} s\\ =& \nabla \varphi (x) \cdot b(x) + \frac 12 \nabla \cdot (\sigma \sigma ^T \nabla \varphi (x))\,=\!:\, \mathcal{L}\varphi . \end{aligned} \end{equation}

Thus the Fokker–Planck equation corresponding to (2.5) is given by

(2.8) \begin{eqnarray} \partial _t \rho _t^\varepsilon &=& \mathcal{L}^*\rho ^\varepsilon _t \nonumber \\ &\,:\!=\,& \frac 12 \nabla \cdot \left (\sigma \sigma ^T \nabla \rho _t^\varepsilon \right ) -\nabla \cdot \left (\rho _t^\varepsilon b\right )\nonumber \\ & = & \nabla \cdot \left ({B_\varepsilon ^{-1}}(x) \nabla \rho _t^\varepsilon (x)\right ) + \nabla \cdot (\rho _t^\varepsilon (x) {B_\varepsilon ^{-1}}(x) \nabla U_\varepsilon (x) ), \end{eqnarray}

which is exactly (2.4). Note that $\pi _\varepsilon$ defined in (2.2), which is in the form of a Gibbs measure, is in fact the unique stationary distribution of (2.8), i.e., $\mathcal{L}^*\pi _\varepsilon = 0$ .
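That $\mathcal{L}^*\pi _\varepsilon = 0$ can be seen directly: for the Gibbs form (2.2), the flux $B_\varepsilon ^{-1}(\nabla \pi _\varepsilon + \pi _\varepsilon \nabla U_\varepsilon )$ in (2.8) vanishes identically. A one-dimensional symbolic sketch (ours, with SymPy; $g$ again stands for $B_\varepsilon ^{-1}$ ):

```python
import sympy as sp

x = sp.symbols('x')
U = sp.Function('U')(x)                  # potential U_eps
g = sp.Function('g', positive=True)(x)   # stands for B_eps^{-1}
pi = sp.exp(-U)                          # Gibbs form (2.2)

# the flux in (2.8) evaluated at the Gibbs density
flux = g * (sp.diff(pi, x) + pi * sp.diff(U, x))
print(sp.simplify(flux))   # 0, hence L* pi = d(flux)/dx = 0
```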

We remark that in the above drift-diffusion process, we used the backward Ito differential to ensure that our process $(X_t)_{t\geq 0}$ with multiplicative noise is reversible, so that the corresponding Fokker–Planck equation has a gradient flow structure. More precisely, the diffusion process $(X_t)_{t\geq 0}$ in (2.5) starting from $X_0 \sim \pi _\varepsilon$ is reversible in the sense that the time-reversed process has the same distribution, i.e.

(2.9) \begin{equation} {\mathbb{E}}(\varphi _1(X_t)\varphi _2(X_0)|X_0\sim \pi _\varepsilon ) = {\mathbb{E}}(\varphi _1(X_0)\varphi _2(X_t)|X_0\sim \pi _\varepsilon ), \quad \forall \varphi _1,\varphi _2\in C_0^\infty (\mathbb{R}^n),\, \forall t\gt 0. \end{equation}

This condition is equivalent to the symmetry of the generator $\mathcal{L}$ in $L^2(\pi _\varepsilon )$ ; cf. [Reference Gao, Liu and Liu25].

2.2. $\varepsilon$ -Fokker–Planck equation (2.4) as a gradient flow in $(\mathcal{P}(\Omega ), W_\varepsilon )$

Following Otto’s formal Riemannian calculus on Wasserstein space [Reference Otto40], we now interpret the Fokker–Planck equation as a (negative) gradient flow in $(\mathcal{P}(\Omega ), W_\varepsilon )$ . For this purpose, we need to compute the Wasserstein gradient $\nabla ^{W_\varepsilon } E_\varepsilon$ of $E_\varepsilon$ in $(\mathcal{P}(\Omega ), W_\varepsilon )$ .

Consider any absolutely continuous curve $\tilde {\rho }_t$ in $(\mathcal{P}(\Omega ), W_\varepsilon )$ given by $\tilde {\rho }_t \,:\!=\, (\chi _t)_\# \rho$ with $\tilde {\rho }_{t=0}=\rho$ , where $\chi _t$ is the flow map induced by a smooth velocity field $v_t$ . Then $\tilde {\rho }_t$ satisfies the continuity equation

\begin{equation*}\partial _t \tilde {\rho }_t + \nabla \cdot \left (\tilde {\rho }_t v_t \right )=0.\end{equation*}

With this, we compute the first variation of $E_\varepsilon$

(2.10) \begin{equation} \frac {\,\mathrm{d}}{\,\mathrm{d} t}\Big |_{t=0} E_\varepsilon (\tilde {\rho }_t) = \int _\Omega \frac {\delta E_\varepsilon }{\delta \rho } \partial _t \tilde {\rho }_t\big |_{t=0} \,\mathrm{d} x = \int _\Omega \frac {\delta E_\varepsilon }{\delta \rho } \left ( -\nabla \cdot (\tilde {\rho }_tv_t)\big |_{t=0} \right )\,\mathrm{d} x = \int _\Omega \left \langle \nabla \frac {\delta E_\varepsilon }{\delta \rho }, v_0 \right \rangle \rho \,\mathrm{d} x. \end{equation}

We will use the above to identify the gradient $\nabla ^{W_\varepsilon } E_\varepsilon$ of $E_\varepsilon$ with respect to a Riemannian metric $\langle \cdot , \cdot \rangle _{T_{\mathcal{P}}, T_{\mathcal{P}}}$ on the tangent plane $T_{\mathcal{P}}$ of $(\mathcal{P}(\Omega ), W_\varepsilon )$ .

Based on (1.5), we have that for any $\rho \in \mathcal{P}(\Omega )$ and $s_1, s_2\in T_{\mathcal{P}}$ at $\rho$ , the metric is given by

(2.11) \begin{equation} \big \langle s_1, s_2\big \rangle _{T_{\mathcal{P}}, T_{\mathcal{P}}}\,:\!=\, \int \rho (x)\left \langle B^{-1}_\varepsilon (x) \nabla \varphi _1(x), \nabla \varphi _2(x)\right \rangle \,dx, \quad \text{ where } s_i = -\nabla \cdot (\rho B^{-1}_\varepsilon \nabla \varphi _i),\,\,\,i=1,2. \end{equation}

A word is in order here to explain the passage from $v_t$ in (1.5) to $\nabla \varphi$ above. At a fixed $t$ and $\rho _t$ , upon minimising $\displaystyle \int _\Omega \rho _t\langle B_\varepsilon (x) v_t, v_t\rangle \,\,\mathrm{d} x$ over $v_t$ subject to $\displaystyle - \nabla \cdot (\rho _t v_t) = s \left (\,:\!=\,\frac {\partial \rho _t}{\partial t}\right )$ , we have that

\begin{equation*} \int _\Omega \rho _t\langle B_\varepsilon (x) v_t, \xi \rangle \,\,\mathrm{d} x = 0 \,\,\,\text{for all smooth vector field $\xi $ satisfying $- \nabla \cdot (\rho _t \xi ) = 0$}. \end{equation*}

Hence, $B_\varepsilon v_t$ is orthogonal to all divergence-free vector fields of the form $\rho _t\xi$ . We then conclude that $B_\varepsilon v_t$ must be the gradient of some (potential) function $\varphi$ . Thus, $v_t$ can be represented as $v_t = B^{-1}_\varepsilon \nabla \varphi$ .

With the above, we express the first variation of $E_\varepsilon$ using $\nabla ^{W_\varepsilon }E_\varepsilon$ as follows:

(2.12) \begin{equation} \frac {\,\mathrm{d}}{\,\mathrm{d} t}\Big |_{t=0} E_\varepsilon (\tilde {\rho }_t) = \Big \langle \nabla ^{W_\varepsilon } E_\varepsilon , \partial _t \tilde {\rho }_t\big |_{t=0} \Big \rangle _{T_{\mathcal{P}}, T_{\mathcal{P}}} = \int _\Omega \rho \langle B^{-1}_\varepsilon \nabla \tilde {\varphi } , \nabla \varphi _0 \rangle \,\mathrm{d} x, \end{equation}

where

(2.13) \begin{equation} \partial _t\tilde {\rho }_t\big |_{t=0} = -\nabla \cdot \left ( \rho B^{-1}_\varepsilon \nabla \varphi _0 \right ) \quad \text{and}\quad \nabla ^{W_\varepsilon } E_\varepsilon (\rho )= -\nabla \cdot \left (\rho B^{-1}_\varepsilon \nabla \tilde {\varphi } \right ). \end{equation}

Comparing (2.10) with (2.12), we have

\begin{equation*} \int _\Omega \left \langle \nabla \frac {\delta E_\varepsilon }{\delta \rho }, v_0 \right \rangle \rho \,\mathrm{d} x = \int _\Omega \rho \big \langle B^{-1}_\varepsilon \nabla \tilde {\varphi } , \nabla \varphi _0 \big \rangle \,\mathrm{d} x \end{equation*}

which is required to hold for any $v_0=B^{-1}_\varepsilon \nabla \varphi _0$ . Hence, $\displaystyle \nabla \tilde {\varphi } = \nabla \frac {\delta E_\varepsilon }{\delta \rho }$ . Thus, the second part of (2.13) leads to the following identification of $\nabla ^{W_\varepsilon } E_\varepsilon (\rho )$ ,

(2.14) \begin{equation} \nabla ^{W_\varepsilon } E_\varepsilon (\rho ) \,:\!=\, -\nabla \cdot \left (\rho B_\varepsilon ^{-1} \nabla \frac {\delta E_\varepsilon }{\delta \rho }\right )=-\nabla \cdot \left (\rho B_\varepsilon ^{-1} \nabla \log \frac {\rho }{\pi _\varepsilon }\right ). \end{equation}

Hence, the inhomogeneous Fokker–Planck equation (2.4) can indeed be written as a gradient flow of $E_\varepsilon$ with respect to the $\varepsilon$ -Wasserstein metric $W_\varepsilon$ , i.e.,

(2.15) \begin{equation} \partial _t \rho ^\varepsilon _t = -\nabla ^{W_\varepsilon } E_\varepsilon (\rho ^\varepsilon _t) = \nabla \cdot \left (\rho ^\varepsilon _t B_\varepsilon ^{-1} \nabla \log \frac {\rho ^\varepsilon _t}{\pi _\varepsilon }\right ). \end{equation}

We remark that in general an equation may admit many different gradient flow structures with respect to the same free energy $E_\varepsilon$ , cf. [Reference Mielke, Montefusco and Peletier38]. However, in this paper, we restrict ourselves to the framework of Wasserstein gradient flows, as it fits naturally with evolution in the space of probability measures.

2.3. $\varepsilon$ -generalised gradient flow in energy-dissipation inequality (EDI) form

As mentioned previously, in order to study the limiting gradient flow structure as the small parameter $\varepsilon \to 0$ in our $\varepsilon$ -gradient flow (2.15), we will recast it in an energy-dissipation inequality (EDI) form (1.13) that is shown to be equivalent to the original $\varepsilon$ -gradient flow system.

Denote the $\varepsilon$ -dissipation on the tangent plane $T_{\mathcal{P}}$ as a functional $\psi _\varepsilon \,:\,\mathcal{P} \times T_{\mathcal{P}} \to \mathbb{R}$ defined by

(2.16) \begin{equation} \psi _\varepsilon (\rho , s)\,:\!=\, \frac 12 \int _\Omega \langle \nabla u, B_\varepsilon ^{-1} \nabla u \rangle \rho \,\mathrm{d} x, \quad \text{ with }\, s=-\nabla \cdot \left (\rho B_\varepsilon ^{-1} \nabla u\right ), \end{equation}

and the $\varepsilon$ -dissipation on the cotangent plane $T^*_{\mathcal{P}}$ as a functional $\psi ^*_\varepsilon \,:\, \mathcal{P} \times T^*_{\mathcal{P}} \to \mathbb{R}$ defined by

(2.17) \begin{equation} \psi ^*_\varepsilon (\rho , \xi )\,:\!=\, \frac 12 \int _\Omega \langle \nabla \xi , B_\varepsilon ^{-1} \nabla \xi \rangle \rho \,\mathrm{d} x. \end{equation}

It is easy to check that

(2.18) \begin{equation} \begin{aligned} \psi _\varepsilon (\rho ,s) =& \sup _{\xi \in T^*_\rho } \Big \{ \langle \xi , s \rangle _{T^*_\rho , T_\rho } - \psi ^*_\varepsilon (\rho , \xi ) \Big \}\\ =& \langle \xi ^*, s \rangle _{T^*_\rho , T_\rho } - \psi ^*_\varepsilon (\rho , \xi ^*) \quad \text{ with } s= -\nabla \cdot \left (\rho B_\varepsilon ^{-1}\nabla \xi ^* \right )\\ =& \frac 12 \int _\Omega \langle \nabla \xi ^*, B_\varepsilon ^{-1} \nabla \xi ^* \rangle \rho \,\mathrm{d} x. \end{aligned} \end{equation}

Applying the Fenchel–Young inequality to the convex functionals $\psi _\varepsilon$ and $\psi ^*_\varepsilon$ , we have

(2.19) \begin{equation} \langle \xi , s\rangle \leq \psi _\varepsilon ^*(\rho , \xi ) + \psi _\varepsilon (\rho , s), \quad \text{for all}\quad \xi \in T_\rho ^*,\,\,\,\text{and}\,\,\,s\in T_\rho , \end{equation}

with equality if and only if $\xi \in \partial _s\psi _\varepsilon (\rho ,s)$ and $s\in \partial _\xi \psi ^*_\varepsilon (\rho ,\xi )$ . Here $\partial _s\psi _\varepsilon (\rho , s)$ and $\partial _\xi \psi _\varepsilon ^*(\rho , \xi )$ refer to the subdifferentials of $\psi _\varepsilon$ and $\psi _\varepsilon ^*$ on $T_\rho$ and $T_\rho ^*$ , respectively, at a fixed $\rho$ . We also note the following.

  1. (1) For all $\eta \in T^*_{\mathcal{P}}$ , we have

    \begin{eqnarray*} &&\Big \langle \partial _\xi \psi ^*_\varepsilon (\rho , \xi ),\eta \Big \rangle = \left . \frac {d}{d\tau }\psi ^*_\varepsilon (\rho , \xi +\tau \eta )\right |_{\tau =0}\\ &=& \int \langle \nabla \xi , B_\varepsilon ^{-1}\nabla \eta \rangle \rho \,\,\mathrm{d} x =\int -\eta \nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \xi )\,\,\mathrm{d} x \end{eqnarray*}
    so that $\partial _\xi \psi ^*_\varepsilon (\rho ,\xi )=-\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \xi )$ . Hence, $s\in \partial _\xi \psi ^*_\varepsilon (\rho , \xi )$ means $s=-\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \xi )$ .
  2. (2) For all $\sigma \in T_{\mathcal{P}}$ , we have

    \begin{eqnarray*} &&\Big \langle \partial _s\psi _\varepsilon (\rho , s),\sigma \Big \rangle = \left . \frac {d}{d\tau }\psi _\varepsilon (\rho , s+\tau \sigma )\right |_{\tau =0}\\ &=& \int \langle \nabla u, B_\varepsilon ^{-1}\nabla \omega \rangle \rho \,\,\mathrm{d} x = \int -u\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \omega )\,\,\mathrm{d} x\\ && \Big (\text{where}\,\,\, s=-\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla u),\,\,\, \sigma =-\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \omega ) \Big ) \\ &=& \int u\sigma \,\,\mathrm{d} x \end{eqnarray*}
    so that $\partial _s\psi _\varepsilon (\rho ,s)=u$ . Hence, $\xi \in \partial _s\psi _\varepsilon (\rho , s)$ means $\xi$ satisfies $s=-\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \xi )$ .
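The pair $(\psi _\varepsilon , \psi ^*_\varepsilon )$ in (2.18)–(2.19) is simply a Legendre-dual pair of quadratic forms. The following finite-dimensional sketch, with a symmetric positive definite matrix $G$ standing in for the weighted elliptic operator $\xi \mapsto -\nabla \cdot (\rho B_\varepsilon ^{-1}\nabla \xi )$ (an illustrative analogue, not the operator itself), checks the Fenchel–Young inequality and its equality case $s = G\xi$ numerically.

```python
import numpy as np

# Finite-dimensional analogue of the dual pair (2.16)-(2.17): an SPD matrix G
# plays the role of the operator  xi |-> -div(rho B_eps^{-1} grad xi).
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
G = M @ M.T + n * np.eye(n)               # symmetric positive definite
Ginv = np.linalg.inv(G)

psi_star = lambda xi: 0.5 * xi @ G @ xi   # dissipation on the cotangent side
psi = lambda s: 0.5 * s @ Ginv @ s        # its Legendre dual on the tangent side

# Fenchel-Young, as in (2.19):  <xi, s> <= psi(s) + psi_star(xi),
# with equality iff s = G xi (the analogue of s = -div(rho B^{-1} grad xi)).
gaps = []
for _ in range(100):
    xi, s = rng.standard_normal(n), rng.standard_normal(n)
    gaps.append(psi(s) + psi_star(xi) - xi @ s)

xi = rng.standard_normal(n)
eq_gap = psi(G @ xi) + psi_star(xi) - xi @ (G @ xi)   # equality case s = G xi
```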

With the above, we now reformulate (2.15) in the form of an EDI. To this end, we compute,

(2.20) \begin{equation} \frac {d}{dt}E_\varepsilon (\rho _t^\varepsilon ) = \left \langle \frac {\delta E_\varepsilon }{\delta \rho },\partial _t\rho _t^\varepsilon \right \rangle , \,\,\,\text{or}\,\,\, \frac {d}{dt}E_\varepsilon (\rho _t^\varepsilon ) + \left \langle -\frac {\delta E_\varepsilon }{\delta \rho },\partial _t\rho _t^\varepsilon \right \rangle = 0. \end{equation}

By (2.19), $\partial _t\rho _t^\varepsilon = -\nabla ^{W_\varepsilon }E_\varepsilon (\rho _t^\varepsilon )=\nabla \cdot \left (\rho B_\varepsilon ^{-1} \nabla \frac {\delta E_\varepsilon }{\delta \rho }\right )$ if and only if

\begin{equation*} \psi _\varepsilon (\rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau ) + \psi ^*_\varepsilon \left (\rho _\tau ^\varepsilon , -\frac {\delta E_\varepsilon }{\delta \rho }(\rho _\tau ^\varepsilon )\right ) \leq \left \langle -\frac {\delta E_\varepsilon }{\delta \rho }(\rho _\tau ^\varepsilon ),\partial _\tau \rho _\tau ^\varepsilon \right \rangle . \end{equation*}

Hence, upon integrating (2.20), our gradient flow (2.15) is equivalent to the following:

(2.21) \begin{equation} E_\varepsilon (\rho _t^\varepsilon ) + \int _0^t \left [ \psi _\varepsilon (\rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau ) + \psi ^*_\varepsilon \left (\rho _\tau ^\varepsilon , -\frac {\delta E_\varepsilon }{\delta \rho }(\rho _\tau ^\varepsilon )\right ) \right ] \,\mathrm{d} \tau \leq E_\varepsilon (\rho _0^\varepsilon ). \end{equation}

We note that the very first step, (2.20) is a crucial chain rule of differentiation. This is justified in our paper due to the regularity property of our energy functional and the solution. Precise statements will be given in Section 3. In general (for example, discrete or general metric space) settings, the absolute continuity of $E_\varepsilon (\rho _t^\varepsilon )$ (in time) and the validity of the chain rule (2.20) need to be proved; cf., [Reference Hoeksema and Tse29, Reference Hraivoronska, Schlichting and Tse28].

Before leaving this section, for convenience, we write down the following explicit expressions.

(2.22) \begin{eqnarray} \psi ^*_\varepsilon \left (\rho _\tau ^\varepsilon , -\frac {\delta E_\varepsilon }{\delta \rho }(\rho _\tau ^\varepsilon )\right ) &=& \frac 12 \int _\Omega \left \langle \nabla \left (\frac {\delta E_\varepsilon }{\delta \rho }\right ), B_\varepsilon ^{-1} \nabla \left (\frac {\delta E_\varepsilon }{\delta \rho }\right ) \right \rangle \rho _\tau ^\varepsilon \,\,\mathrm{d} x\nonumber \\ &=& \frac 12 \int _\Omega \left \langle \nabla \log \frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon }, {B_\varepsilon ^{-1}} \nabla \log \frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon } \right \rangle \rho ^\varepsilon _\tau \,\mathrm{d} x, \end{eqnarray}

and

(2.23) \begin{equation} \psi _\varepsilon \left (\rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau \right ) = \frac 12 \int _\Omega \langle \nabla u, B_\varepsilon ^{-1} \nabla u \rangle \rho _\tau ^\varepsilon \,\mathrm{d} x, \quad \text{ with }\, -\nabla \cdot \left (\rho ^\varepsilon _\tau B_\varepsilon ^{-1} \nabla u\right ) = \partial _\tau \rho _\tau ^\varepsilon . \end{equation}

2.4. Main results

Briefly stated, our main result is that the gradient-flow structure is preserved in the limit, i.e., (2.15) converges to a limiting gradient flow. More precisely, the solution $\rho ^\varepsilon _t$ of (2.15) converges (weakly) to $\rho _t$ that solves a gradient flow with respect to a limiting Wasserstein distance $\overline {W}$ ,

(2.24) \begin{equation} \partial _t \rho _t = -\nabla ^{\overline {W}}\overline {E}(\rho _t) = \nabla \cdot \left (\rho _t \overline {B}^{-1} \nabla \log \frac {\rho _t}{\overline {\pi }}\right ). \end{equation}

In the above, the limiting energy is given as

(2.25) \begin{equation} \overline {E}(\rho ) = \text{KL}(\rho ||\overline {\pi }) = \int _\Omega \rho \log \frac {\rho }{\overline {\pi }}\,\,\mathrm{d} x, \end{equation}

where $\overline {\pi }$ is simply the spatial average of $\pi _\varepsilon$ with respect to the fast variable – see (2.33) below. The matrix $\overline {B}$ is obtained by taking an appropriate average of $B_\varepsilon$ over the fast variable weighted by the solution of a cell problem (A.9) or equivalently, by considering the $\Gamma$ -limit of a variational functional (Theorem 4.2). The Wasserstein distance $\overline {W}$ is related to $\overline {B}$ in the same way that $W_\varepsilon$ is related to $B_\varepsilon$ – see Section 5.1.
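For orientation, in one space dimension the cell problem is explicitly solvable, and the effective coefficient of a divergence-form operator $\partial _x \big ( a(x/\varepsilon ) \partial _x \cdot \big )$ is the classical harmonic mean $\overline {a} = \big ( \int _0^1 a(y)^{-1}\,\mathrm{d} y \big )^{-1}$ , which is in general strictly smaller than the arithmetic mean. The script below is an illustrative classical computation (not the cell problem (A.9) itself), for a coefficient whose harmonic mean has the closed form $\sqrt {3}$ .

```python
import numpy as np

a = lambda y: 2.0 + np.sin(2 * np.pi * y)   # illustrative 1-periodic coefficient

N = 100000
y = (np.arange(N) + 0.5) / N                # midpoint quadrature nodes on [0, 1]
harm = 1.0 / np.mean(1.0 / a(y))            # harmonic mean; exact value is sqrt(3)
arith = np.mean(a(y))                       # arithmetic mean; exact value is 2

# Cross-check via the boundary value problem -(a(x/eps) u')' = 0, u(0)=0, u(1)=1:
# the constant flux a(x/eps) u'(x) equals (int_0^1 a(x/eps)^{-1} dx)^{-1},
# which converges to the harmonic mean as eps -> 0.
eps = 1.0 / 50.0                            # 1/eps periods fit exactly in [0, 1]
flux = 1.0 / np.mean(1.0 / a(y / eps))
```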

Similar to (2.21), (2.16), and (2.17), equation (2.24) is formulated as an EDI, i.e.,

(2.26) \begin{equation} \overline {E}(\rho _t) + \int _0^t \left [ \psi (\rho _\tau , \partial _\tau \rho _\tau ) + \psi ^*\left (\rho _\tau , -\frac {\delta \overline {E}}{\delta \rho }(\rho _\tau )\right ) \right ] \,\mathrm{d} \tau \leq \overline {E}(\rho _0), \end{equation}

where $\psi ^{*}\,:\, \mathcal{P} \times T^*_{\mathcal{P}} \to \mathbb{R}$ is the limiting dissipation functional on the cotangent plane $T^*_{\mathcal{P}}$ given by

(2.27) \begin{equation} \psi ^*(\rho , \xi )\,:\!=\, \frac 12 \int _\Omega \langle \nabla \xi , \bar {B}^{-1} \nabla \xi \rangle \rho \,\mathrm{d} x, \end{equation}

and $\psi \,:\, \mathcal{P} \times T_{\mathcal{P}} \to \mathbb{R}$ is the limiting dissipation functional on the tangent plane $T_{\mathcal{P}}$ given by

(2.28) \begin{equation} \psi (\rho , s)\,:\!=\, \frac 12 \int _\Omega \langle \nabla u, \bar {B}^{-1} \nabla u \rangle \rho \,\mathrm{d} x, \quad \text{ with }\, s=-\nabla \cdot \left (\rho \bar {B}^{-1} \nabla u\right ). \end{equation}

The precise statement of the convergence of (2.21) to (2.26) will be given in Section 4, Theorem 4.1.

Curiously, under the current setting, $\overline {W}$ is not the Gromov–Hausdorff limit $W_{\text{GH}}$ of $W_\varepsilon$ , which is the common mode of convergence for metric spaces, cf. [Reference Villani53, Reference Gigli and Maas26, Reference Gladbach, Kopfer and Maas21]. In Section 5.2, we construct examples in which $\overline {W}$ is strictly larger than $W_{\text{GH}}$ . We believe that this statement is true for general heterogeneous media.

Before proceeding further, we introduce the following notations and conventions. As we will often consider functions that oscillate on a small length scale, $0 \lt \varepsilon \ll 1$ , it is convenient to introduce the following fast variable

(2.29) \begin{equation} y=\frac {x}{\varepsilon }. \end{equation}

The domain for $y$ is taken to be the $n$ -dimensional torus $\mathbb T^n$ when the oscillatory functions are $1$ -periodic in $y$ . The notation $\overline {A}$ means that it is derived from some averaging of $A$ over the fast variable $y$ . For time-dependent problems, we often deal with functions defined on both space and time variables $x,t$ . For ease of notation, given a function $f=f(x,t)$ , we often use $f_t$ to denote $f_t(\cdot )$ , i.e., the slice of $f$ at a fixed time $t$ . We will use $\rightharpoonup$ and $\longrightarrow$ to denote weak and strong convergence in some function spaces. Two common spaces used are the space of probability measures $\mathcal{P}(\Omega )$ and the $L^p(\Omega )$ spaces. The value of $p$ will depend on the context. For the convergence of a sequence of functions $f_\varepsilon$ as $\varepsilon \to 0$ , we will use the same notation even if the convergence only holds upon extraction of a subsequence. (The convergence can be established for the whole sequence if the limiting equation has a unique solution, which is the case for our linear Fokker–Planck equation (2.24).)

Next we state the main assumptions for our results. Some of these are made only for simplicity. They can be relaxed if we choose to use more technical tools.

  1. (i) Recall that the domain $\Omega$ is taken to be an $n$ -dimensional torus ${\mathbb T}^n$ . This is not to be confused with the ${\mathbb T}^n$ for the fast variable $y$ . We note that the boundedness of the domain can be removed, allowing one to work in $P_2(\mathbb{R}^n)$ if a confinement potential $U$ is incorporated in the dynamics. Other boundary conditions, such as Dirichlet or no-flux conditions, may also be considered.

  2. (ii) For $B_\varepsilon$ , we consider

    (2.30) \begin{equation} B_\varepsilon (x) = B\left (\frac {x}{\varepsilon }\right ),\,\,\,\text{or}\,\,\, B_\varepsilon (x) = B(y), \end{equation}
    where $B(\cdot )$ is $1$ -periodic. Furthermore, $B(\cdot )$ is bounded and uniformly positive definite, i.e., there are $C_1, C_2 \gt 0$ such that for all $y\in \mathbb T^n$ , it holds that
    (2.31) \begin{equation} C_1 I \leq B(y) \leq C_2 I. \end{equation}
    This form of $B_\varepsilon$ can certainly be generalised to allow for dependence on the slow variable: $B_\varepsilon (x)=B(x,\frac {x}\varepsilon )$ . For simplicity, we assume further that $B$ is smooth in $y$ .
  3. (iii) For $\pi _\varepsilon$ , we consider the following form of separation of length scales:

    (2.32) \begin{equation} \pi _\varepsilon (x) = \pi \left (x,\frac {x}{\varepsilon }\right ). \end{equation}
    In the above, $\pi$ is $1$ -periodic in the fast variable $\displaystyle y=\frac {x}{\varepsilon }$ . We further assume that $\pi$ is smooth in both $x$ and $y$ and is bounded away from zero and from above uniformly in $\varepsilon \gt 0$ . The following notation referring to an averaged version of $\pi$ will be used in this paper:
    (2.33) \begin{equation} \overline {\pi }(x) = \int \pi (x,y)\,\mathrm{d} y. \end{equation}
    As concrete examples, $\pi _\varepsilon$ can be taken as
    (2.34) \begin{equation} \pi _\varepsilon ^{\text{I}}(x) = \pi _0(x) + \pi _1\left (x,\frac {x}{\varepsilon }\right ), \quad \text{or}\quad \pi _\varepsilon ^{\text{II}}(x) = \pi _0(x) + \varepsilon \pi _1\left (x,\frac {x}{\varepsilon }\right ). \end{equation}
    Then $\pi _\varepsilon ^{\text{I}}$ and $\pi _\varepsilon ^{\text{II}}$ converge as follows:
    (2.35) \begin{equation} \pi _\varepsilon ^{\text{I}}(x) \rightharpoonup \overline {\pi }^{\text{I}}(x) \,:\!=\, \pi _0(x) + \int _{\mathbb T^n}\pi _1(x,y)\,\,\mathrm{d} y ,\,\,\,\text{and}\,\,\, \pi _\varepsilon ^{\text{II}}(x) \longrightarrow \overline {\pi }^{\text{II}}(x) \,:\!=\, \pi _0(x). \end{equation}
    We thus call $\pi _\varepsilon ^{\text{I}}$ the oscillatory case and $\pi _\varepsilon ^{\text{II}}$ the uniform case. (We refer to [Reference Dupuis and Spiliopoulos19] for large deviations of multiscale diffusions with $\pi _\varepsilon ^{\text{II}}$ .)
  4. (iv) The initial data $\rho ^\varepsilon _0$ is bounded away from zero and from above uniformly in $\varepsilon \gt 0$ . It is assumed to be well-prepared in the following sense,

    (2.36) \begin{equation} \text{there is a $\rho _0$ such that $\rho ^\varepsilon _0 \rightharpoonup \rho _0$ and $E_\varepsilon (\rho ^\varepsilon _0) \to \overline {E}(\rho _0)$ as $\varepsilon \to 0$,} \end{equation}
    where $E_\varepsilon$ and $\overline {E}$ are given by (2.1) and (2.25). More precise smoothness requirements on $\rho _0$ will be listed in Lemmas 3.1, 3.3, and Corollaries 3.2 and 3.4.
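The distinction between the two limits in (2.35) can also be seen numerically: the $O(1)$ oscillation $\pi _1$ does not vanish pointwise, yet its pairing with any fixed test function converges to the pairing of its average. The functions below are illustrative choices (with $\pi _1$ independent of the slow variable for simplicity).

```python
import numpy as np

pi0 = lambda x: 1.0 + 0.5 * np.cos(2 * np.pi * x)   # slow part
pi1 = lambda y: np.sin(2 * np.pi * y)               # mean-zero fast oscillation
phi = lambda x: np.exp(-x) * x * (1.0 - x)          # fixed smooth test function

N = 1000000
x = (np.arange(N) + 0.5) / N                        # midpoint quadrature on [0, 1]
h = 1.0 / N

def pairing(eps):
    # <pi_eps^I, phi>  with  pi_eps^I(x) = pi0(x) + pi1(x / eps)
    return h * np.sum((pi0(x) + pi1(x / eps)) * phi(x))

limit = h * np.sum(pi0(x) * phi(x))                 # weak limit: pi1 averages to zero
errs = [abs(pairing(10.0 ** (-k)) - limit) for k in (1, 2, 3)]
sup_dev = np.max(np.abs(pi1(x / 1e-3)))             # but no decay in the sup norm
```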

We have the following remarks about our results.

Remark 2.1.

  1. (1) As $\pi _\varepsilon ^{\text{II}}$ can be treated as a special case of $\pi _\varepsilon ^{\text{I}}$ , or more generally, of $\pi _\varepsilon$ , we will concentrate on the proof for $\pi _\varepsilon$ . Our result is also consistent with the statement obtained by using the asymptotic expansion described in Appendix A . At the end of that section, we also make some remarks about the revised statement for $\pi _\varepsilon ^{\text{II}}$ .

  2. (2) The approach we take resembles the work of Forkert–Maas–Portinale [Reference Forkert, Maas and Portinale20] on the convergence of a finite volume scheme for a Fokker–Planck equation. By and large, the framework of their (numerical) approximation enjoys stronger regularity, while our current problem concentrates on the oscillation of the underlying medium.

3. Some a-priori estimates

In order to study the asymptotic behaviour as $\varepsilon \to 0$ , we first establish some a-priori estimates for our $\varepsilon$ -gradient flow system (2.4) (or (2.15)). These then give us space-time compactness and convergence. Such variational estimates for linear parabolic equations are standard, but we give brief proofs for completeness.

First, we recast (2.4) as

(3.1) \begin{equation} \partial _t \rho _t^\varepsilon = \nabla \cdot \left (\pi _\varepsilon B_\varepsilon ^{-1} \nabla \frac {\rho _t^\varepsilon }{\pi _\varepsilon }\right ). \end{equation}

Denote $\displaystyle f^\varepsilon _t \,:\!=\, \frac {\rho ^\varepsilon _t}{\pi _\varepsilon }$ . Then $f^\varepsilon _t$ satisfies the following backward equation

(3.2) \begin{equation} \partial _t f^\varepsilon _t = \frac {1}{\pi _\varepsilon } \nabla \cdot \left ( \pi _\varepsilon B_\varepsilon ^{-1} \nabla f^{\varepsilon }_t \right ) \,=\!:\, L_\varepsilon (f^\varepsilon _t). \end{equation}

It is easy to verify that $L_\varepsilon$ is self-adjoint in $L^2(\pi _\varepsilon )$ , i.e.,

(3.3) \begin{equation} \langle L_\varepsilon u, v \rangle _{\pi _\varepsilon } = \langle u, L_\varepsilon v \rangle _{\pi _\varepsilon }, \quad \forall u, v \in L^2(\pi _\varepsilon ), \end{equation}

where $\langle \cdot , \cdot \rangle _{\pi _\varepsilon }$ denotes the $\pi _\varepsilon$ -weighted $L^2$ inner product, $\displaystyle \langle u, v\rangle _{\pi _\varepsilon } \,:\!=\, \int _\Omega u(x)v(x)\pi _\varepsilon (x)\,\mathrm{d} x$ .
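The self-adjointness (3.3) survives any spatial discretisation that preserves the divergence form. The following finite-difference sketch on the one-dimensional torus (with illustrative coefficients, not taken from the paper) checks that the matrix of the $\pi _\varepsilon$ -weighted form is symmetric, that the discrete $L_\varepsilon$ annihilates constants, and that it is negative semidefinite.

```python
import numpy as np

n, eps = 64, 0.125
h = 1.0 / n
x = (np.arange(n) + 0.5) * h                 # cell centres on the torus
xf = np.arange(n) * h                        # faces

pi = np.exp(0.4 * np.cos(2 * np.pi * x / eps))   # illustrative pi_eps > 0
c = 2.0 + np.sin(2 * np.pi * xf / eps)           # illustrative pi_eps * B_eps^{-1} > 0 at faces

# Periodic backward difference G: (G u)_i = (u_i - u_{i-1}) / h, one row per face.
G = (np.eye(n) - np.roll(np.eye(n), 1, axis=0)) / h
# L_eps u = (1/pi) d/dx ( c du/dx )  discretises to  L = -diag(1/pi) G^T diag(c) G.
L = -np.diag(1.0 / pi) @ G.T @ np.diag(c) @ G

S = np.diag(pi) @ L                          # matrix of the weighted form <L u, v>_pi
sym_err = np.max(np.abs(S - S.T))            # symmetry <=> self-adjointness (3.3)
const_err = np.max(np.abs(L @ np.ones(n)))   # L annihilates constants (mass conservation)
eigs = np.linalg.eigvalsh(0.5 * (S + S.T))   # spectrum of the weighted form
```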

We recall here the standing assumptions of uniform positive definiteness of $B_\varepsilon$ and uniform positivity and boundedness of $\pi _\varepsilon$ as stated in (2.30) and (2.32) in Section 2.4. We then have the following uniform estimates for $f^\varepsilon _t$ .

Lemma 3.1. Let $f^\varepsilon _0$ be the initial data for (3.2) . We define,

(3.4) \begin{eqnarray} A_0 & \,:\!=\, & \sup _{\varepsilon \gt 0}\int _\Omega (f_0^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x, \end{eqnarray}
(3.5) \begin{eqnarray} B_0 & \,:\!=\, & \sup _{\varepsilon \gt 0}\int _\Omega \langle \nabla f_0^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_0^\varepsilon \rangle \,\mathrm{d} x.\\[6pt]\nonumber \end{eqnarray}

Let $0 \lt T \lt \infty$ be given. We have the following statements.

  1. (1) If $0 \lt m_0 \lt f_0^\varepsilon \lt M_0 \lt \infty$ on $\Omega$ for some finite positive constants $m_0$ and $M_0$ , then $m_0 \lt f_t^\varepsilon \lt M_0$ on $\Omega$ for all $t \gt 0$ .

  2. (2) If $A_0 \lt \infty$ , then $f^\varepsilon \in L^\infty ((0,T);L^2(\Omega ))\bigcap L^2((0,T);H^1(\Omega ))$ with the following uniform-in- $\varepsilon$ bound: for all $0 \lt t \lt T$ ,

    (3.6) \begin{equation} \frac 12||f_t^\varepsilon ||_{\pi _\varepsilon }^2 + \int _0^t \int _\Omega \langle \nabla f_s^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_s^\varepsilon \rangle \,\mathrm{d} x\,\mathrm{d} s =\frac 12||f_0^\varepsilon ||_{\pi _\varepsilon }^2 \leq A_0. \end{equation}
  3. (3) If $B_0 \lt \infty$ (which by the Poincaré inequality implies $A_0 \lt \infty$ ), then

    \begin{equation*} f^\varepsilon \in L^\infty ((0,T);\,H^1(\Omega ))\bigcap H^1((0,T);\,L^2(\Omega )) \end{equation*}
    with the following uniform-in- $\varepsilon$ bound: for all $0 \lt t \lt T$ ,
    (3.7) \begin{equation} \frac 12\int _\Omega \langle \nabla f_t^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_t^\varepsilon \rangle \,\mathrm{d} x + \int _0^t \int _\Omega (\partial _s f_s^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x\,\mathrm{d} s =\frac 12\int _\Omega \langle \nabla f_0^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_0^\varepsilon \rangle \,\mathrm{d} x \leq \frac {B_0}{2}. \end{equation}
    From ( 3.2 ) and $\displaystyle \int _0^t \int _\Omega (\partial _s f_s^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x\,\mathrm{d} s\leq \frac {B_0}{2}$ , we also have
    (3.8) \begin{equation} \sup _{\varepsilon \gt 0}\int _0^T\int _\Omega \Big (\nabla \cdot \big (B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_s^\varepsilon \big )\Big )^2 \,\mathrm{d} x \,\mathrm{d} s \lt \infty . \end{equation}

Proof. Note that

\begin{equation*} \partial _t f_t^\varepsilon = B_\varepsilon ^{-1}\,:\,D^2f_t^\varepsilon + \frac {1}{\pi _\varepsilon }\big \langle \nabla (B_\varepsilon ^{-1}\pi _\varepsilon ),\nabla f_t^\varepsilon \big \rangle . \end{equation*}

By the positive definiteness of $B_\varepsilon$ , statement (1) then follows directly from the maximum principle.

Next, both (3.6) and (3.7) follow from simple energy identities. For the former, we compute

\begin{eqnarray*} \frac {d}{dt}\frac 12||f_t^\varepsilon ||^2_{\pi _\varepsilon } = \int _\Omega f_t^\varepsilon \partial _t f_t^\varepsilon \pi _\varepsilon \,\mathrm{d} x = -\int _\Omega \langle \nabla f_t^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_t^\varepsilon \rangle \,\mathrm{d} x. \end{eqnarray*}

Integration in time from $0$ to $t$ gives (3.6).

For (3.7), we compute

\begin{eqnarray*} &&\frac {d}{dt}\frac 12\int _\Omega \langle \nabla f_t^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_t^\varepsilon \rangle \,\mathrm{d} x = \int _\Omega \langle \nabla \partial _tf_t^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_t^\varepsilon \rangle \,\mathrm{d} x\\ &=& -\int _\Omega \partial _tf_t^\varepsilon \nabla \cdot \big (B_\varepsilon ^{-1}\pi _\varepsilon \nabla f_t^\varepsilon \big ) \,\mathrm{d} x = -\int _\Omega (\partial _tf_t^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x. \end{eqnarray*}

Integration in time from $0$ to $t$ again gives the result. Estimate (3.8) then follows from equation (3.2) together with the uniform boundedness of $\pi _\varepsilon$ .

The above estimates and Fubini’s Theorem immediately lead to the following compactness results.

Corollary 3.2. If $B_0 \lt \infty$ , then there is a subsequence $f^\varepsilon$ and an $f\in L^2(0,T;\, L^2(\Omega ))$ such that $f^\varepsilon \longrightarrow f$ in $L^2(0,T;\, L^2(\Omega ))$ , i.e.,

(3.9) \begin{equation} \int _0^T\int _\Omega |f^\varepsilon _t-f_t|^2\,\mathrm{d} x\,\mathrm{d} t\rightarrow 0. \end{equation}

Furthermore, we have

(3.10) \begin{equation} \int _\Omega |f^\varepsilon _t-f_t|^2\,\mathrm{d} x\rightarrow 0 \quad \text{for a.e. $t\in [0,T]$.} \end{equation}

For our application, we will also need some regularity estimates for the time derivative of $f^\varepsilon$ . Define $h_t^\varepsilon \,:\!=\, \partial _t f^\varepsilon _t$ . Since the coefficients in (3.2) are independent of time, $h^\varepsilon _t$ satisfies the same equation, i.e.,

(3.11) \begin{equation} \partial _t h^\varepsilon _t = \frac {1}{\pi _\varepsilon } \nabla \cdot \left ( \pi _\varepsilon B_\varepsilon ^{-1} \nabla h^{\varepsilon }_t \right ) \,=\!:\, L_\varepsilon (h^\varepsilon _t). \end{equation}

As a direct application of Lemma 3.1 and Corollary 3.2, we have the following lemma and corollary.

Lemma 3.3. Let $h^\varepsilon _0 = \partial _t f_t^\varepsilon |_{t=0}$ be the initial data for (3.11) . We define,

(3.12) \begin{eqnarray} C_0 & \,:\!=\, & \sup _{\varepsilon \gt 0}\int _\Omega (h_0^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x \,\,\left (= \sup _{\varepsilon \gt 0}\int _\Omega (\partial _t f_0^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x \right ), \end{eqnarray}
(3.13) \begin{eqnarray} D_0 & \,:\!=\, & \sup _{\varepsilon \gt 0}\int _\Omega \langle \nabla h_0^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla h_0^\varepsilon \rangle \,\mathrm{d} x \,\,\left (= \sup _{\varepsilon \gt 0}\int _\Omega \langle \nabla (\partial _t f_0^\varepsilon ), B_\varepsilon ^{-1}\pi _\varepsilon \nabla (\partial _t f_0^\varepsilon )\rangle \,\mathrm{d} x\right ).\\[6pt]\nonumber \end{eqnarray}

Let $0 \lt T \lt \infty$ be given. We have the following statements.

  1. (1) If $C_0 \lt \infty$ , then $h^\varepsilon \in L^\infty ((0,T);\,L^2(\Omega ))\bigcap L^2((0,T);\,H^1(\Omega ))$ . In particular, for all $0 \lt t \lt T$ , we have the following identity,

    (3.14) \begin{equation} \frac 12||h_t^\varepsilon ||_{\pi _\varepsilon }^2 + \int _0^t\int _\Omega \langle \nabla h_s^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla h_s^\varepsilon \rangle \,\mathrm{d} x\,\mathrm{d} s =\frac 12||h_0^\varepsilon ||_{\pi _\varepsilon }^2. \end{equation}
  2. (2) If $D_0 \lt \infty$ , then $h^\varepsilon \in L^\infty ((0,T);\,H^1(\Omega ))\bigcap H^1((0,T);\,L^2(\Omega ))$ . In particular, for all $0 \lt t \lt T$ , we have the following identity,

    (3.15) \begin{equation} \frac 12\int _\Omega \langle \nabla h_t^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla h_t^\varepsilon \rangle \,\mathrm{d} x + \int _0^t \int _\Omega (\partial _s h_s^\varepsilon )^2\pi _\varepsilon \,\mathrm{d} x\,\mathrm{d} s =\frac 12\int _\Omega \langle \nabla h_0^\varepsilon , B_\varepsilon ^{-1}\pi _\varepsilon \nabla h_0^\varepsilon \rangle \,\mathrm{d} x. \end{equation}

Corollary 3.4. If $D_0 \lt \infty$ , then there is a subsequence $h^\varepsilon$ and an $h\in L^2(0,T;\,L^2(\Omega ))$ such that $h^\varepsilon \longrightarrow h$ in $L^2(0,T;\,L^2(\Omega ))$ , i.e.,

(3.16) \begin{equation} \int _0^T\int _\Omega |h^\varepsilon _t-h_t|^2\,\mathrm{d} x\,\mathrm{d} t\rightarrow 0. \end{equation}

Furthermore, we have

(3.17) \begin{equation} \int _\Omega |h^\varepsilon _t-h_t|^2\,\mathrm{d} x\rightarrow 0, \,\,\,\text{for a.e. $t\in [0,T]$.} \end{equation}

Recall Assumption (iii) in Section 2.4 for the invariant measure $\pi _\varepsilon$ . For the convenience of our upcoming proof, we collect the necessary convergence results in the following lemma.

Lemma 3.5. Suppose $A_0, B_0, C_0$ and $D_0 \lt \infty$ . Then (from Lemmas 3.1 and 3.3 ) we have

(3.18) \begin{equation} f^\varepsilon \in L^\infty ((0,T);\,H^1(\Omega ))\bigcap H^1((0,T);\,L^2(\Omega )), \quad \text{and}\quad \partial _t f^\varepsilon \in L^\infty ((0,T);\,H^1(\Omega ))\bigcap H^1((0,T);\,L^2(\Omega )). \end{equation}

Furthermore (from Corollaries 3.2 and 3.4 ), up to $\varepsilon$ -subsequence, we have

(3.19) \begin{equation} f^\varepsilon \longrightarrow f ,\quad \text{and}\quad \partial _tf^\varepsilon \longrightarrow \partial _tf \quad \text{in $L^2((0,T);\,L^2(\Omega ))$}. \end{equation}

Upon defining $\rho _t = f_t\overline {\pi }$ , we have

(3.20) \begin{eqnarray} \frac {\rho ^\varepsilon }{\pi _\varepsilon }\,\,\,(=f^\varepsilon ) &\longrightarrow & \frac {\rho }{\overline {\pi }}\,\,\,(=f) \quad \text{in $L^2((0,T);\,L^2(\Omega ))$,} \end{eqnarray}
(3.21) \begin{eqnarray} \rho ^\varepsilon &\rightharpoonup & \rho \quad \text{in $L^2((0,T);\,L^2(\Omega ))$,} \end{eqnarray}

and

(3.22) \begin{eqnarray} \frac {\partial _t\rho ^\varepsilon }{\pi _\varepsilon }\,\,\,(=\partial _tf^\varepsilon ) &\longrightarrow & \frac {\partial _t\rho }{\overline {\pi }}\,\,\,(=\partial _tf) \quad \text{in $L^2((0,T);\,L^2(\Omega ))$,} \end{eqnarray}
(3.23) \begin{eqnarray} \partial _t\rho ^\varepsilon &\rightharpoonup & \partial _t\rho \quad \text{in $L^2((0,T);\,L^2(\Omega ))$.}\\[6pt]\nonumber \end{eqnarray}

Instead of strong and weak convergence in $L^2(0,T;\,L^2(\Omega ))$ , by (3.10) and (3.17) , statements (3.19) – (3.23) also hold with the same respective strong and weak topologies in $L^2(\Omega )$ for a.e. $t\in [0,T]$ .

Remark 3.6. Note that currently our approach does require a high degree of regularity for the initial data. Its existence and construction would require the characterisation of precise oscillations of the solution which in principle can be done by considering second and higher order cell problems. However, we believe this requirement can be much relaxed by means of parabolic regularity. For example, if $A_0 \lt \infty$ , then $f_t^\varepsilon \in H^1(\Omega )$ for some $t\gt 0$ and if $B_0 \lt \infty$ , then $\partial _t f^\varepsilon _t\in L^2(\Omega )$ for some $t\gt 0$ . This can be iterated due to the variational structure of equation (3.2) . Alternatively, we can opt to utilise some technical results similar to [Reference Jordan, Kinderlehrer and Otto31, p.14, steps (a – c)] and [Reference Forkert, Maas and Portinale20, Proposition 4.4] in which the initial data even belongs to $L^1(\Omega )$ . For simplicity, in this paper, we do not pursue this route, as we consider it beyond the scope of homogenisation which is our key motivation.

The final statement in this section gives the time continuity of $\rho _t^\varepsilon$ in the standard Wasserstein space $(\mathcal{P}(\Omega ), W_2)$ (1.8).

Lemma 3.7. Assume $E_\varepsilon (\rho _0^\varepsilon )\lt +\infty$ . For any $T\gt 0$ , let $\rho _t^\varepsilon , t\in [0,T]$ be a solution to the $\varepsilon$ -gradient flow system (2.21) . Then there is $0 \lt C \lt \infty$ such that

(3.24) \begin{equation} W_2^2(\rho ^\varepsilon _t, \rho ^\varepsilon _s) \leq C|t-s|, \quad \forall \, 0\leq s\leq t\leq T, \end{equation}

where $W_2(\cdot , \cdot )$ is the standard $W_2$ -distance. Consequently, there exist a subsequence $\rho ^\varepsilon$ and $\rho \in C([0,T];\, \mathcal{P}(\Omega ))$ such that

(3.25) \begin{equation} W_2^2(\rho _t^\varepsilon , \rho _t) \to 0, \quad \text{ uniformly in } t\in [0,T]. \end{equation}

Proof. First, since $\rho _t^\varepsilon , t\in [0,T]$ satisfies (2.21) and $E_\varepsilon (\rho _0^\varepsilon )\lt +\infty$ , we have for any $0\leq s\leq t\leq T$ ,

(3.26) \begin{equation} \int _s^t \psi _\varepsilon ( \rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau ) \,\mathrm{d} \tau \lt +\infty . \end{equation}

This means for the curve $\rho _t^\varepsilon , t\in [0,T]$ with $\partial _t \rho ^\varepsilon _t = -\nabla \cdot \left (\rho _t^\varepsilon {B_\varepsilon ^{-1}} \nabla u^\varepsilon _t\right )$ , we have

(3.27) \begin{equation} \int _s^t \int _\Omega \frac 12 \langle \nabla u^\varepsilon _\tau , {B_\varepsilon ^{-1}} \nabla u^\varepsilon _\tau \rangle \rho ^\varepsilon _\tau \,\mathrm{d} x \,\mathrm{d} \tau \lt +\infty . \end{equation}

For this curve, the velocity in the continuity equation is given by $v^\varepsilon _t = {B_\varepsilon ^{-1}} \nabla u^\varepsilon _t$ . From [Reference Ambrosio, Brué and Semola1, Theorem 17.2], we have

(3.28) \begin{equation} \begin{aligned} W_2^2(\rho ^\varepsilon _t, \rho ^\varepsilon _s) \leq |t-s| \int _s^t \int _\Omega |v^\varepsilon _\tau |^2 \rho ^\varepsilon _\tau \,\mathrm{d} x \,\mathrm{d} \tau =& |t-s| \int _s^t \int _\Omega |{B_\varepsilon ^{-1}} \nabla u^\varepsilon _\tau |^2 \rho ^\varepsilon _\tau \,\mathrm{d} x \,\mathrm{d} \tau \\ \lesssim & |t-s|\int _s^t \int _\Omega \langle \nabla u^\varepsilon _\tau , {B_\varepsilon ^{-1}} \nabla u^\varepsilon _\tau \rangle \rho ^\varepsilon _\tau \,\mathrm{d} x \,\mathrm{d} \tau . \end{aligned} \end{equation}

This gives the equi-continuity of $\rho _t^\varepsilon$ in $(\mathcal{P}(\Omega ), W_2)$ .

Second, for any fixed $t$ , as $\int _\Omega \rho _t^\varepsilon \,\mathrm{d} x =1$ and $\Omega$ is compact, by [Reference Ambrosio, Brué and Semola1, Theorem 8.8], the weak* convergence of $\rho _t^\varepsilon \in \mathcal{P}$ to some $\rho _t\in \mathcal{P}$ implies that

(3.29) \begin{equation} W_2(\rho ^\varepsilon _t, \rho _t) \to 0. \end{equation}

We then complete the proof by applying the Arzelà–Ascoli Theorem in $(\mathcal{P}(\Omega ), W_2)$ .
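For completeness, we recall that in one dimension the standard $W_2$ -distance appearing in Lemma 3.7 can be computed from quantile functions, $W_2^2(\mu ,\nu )=\int _0^1 |F_\mu ^{-1}(u)-F_\nu ^{-1}(u)|^2\,\mathrm{d} u$ . The script below (illustrative, on the real line rather than the torus) checks this formula against the Gaussian closed form $W_2^2 = (m_1-m_2)^2 + (\sigma _1-\sigma _2)^2$ .

```python
from statistics import NormalDist

# 1D W_2 via quantile functions:  W_2^2(mu, nu) = int_0^1 |F_mu^{-1} - F_nu^{-1}|^2 du,
# checked against the Gaussian closed form  W_2^2 = (m1 - m2)^2 + (s1 - s2)^2.
m1, s1, m2, s2 = 0.0, 1.0, 0.7, 1.5
mu, nu = NormalDist(m1, s1), NormalDist(m2, s2)

N = 50000                                   # midpoint quadrature on (0, 1)
w2_sq = sum(
    (mu.inv_cdf((k + 0.5) / N) - nu.inv_cdf((k + 0.5) / N)) ** 2 for k in range(N)
) / N
closed = (m1 - m2) ** 2 + (s1 - s2) ** 2    # exact Gaussian value
```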

4. Passing limit in EDI formulation of $\varepsilon$ -gradient flow

In this section, we prove that the EDI formulation (2.21) of the $\varepsilon$ -gradient flow (2.15) converges to the limiting EDI (2.26). To this end, we need to prove three lower bounds for the functionals (2.1), (2.16), and (2.17) on the left-hand side of (2.21). Recall the definitions of $\bar {E}, \psi , \psi ^*$ in Section 2.4. The lower bound estimates are stated in the following.

Theorem 4.1. Assume the initial data $\rho _0^\varepsilon$ satisfies the assumptions of Lemma 3.5 . Let further $\rho _0$ be the limit of $\rho _0^\varepsilon$ in $(\mathcal{P}(\Omega ), W_2)$ and $\rho _0^\varepsilon$ be well-prepared in the sense of (2.36) . Then

  1. (i) there exists a subsequence $\rho ^\varepsilon$ and $\rho \in C([0,T];\, L^2(\Omega ))$ such that (3.25) holds;

  2. (ii) for a.e. $t\in [0,T]$ , the lower bound for free energy holds

    (4.1) \begin{equation} \liminf _{\varepsilon \to 0} E_\varepsilon (\rho _t^\varepsilon ) \geq \overline {E}(\rho _t); \end{equation}
  3. (iii) for any $t\in [0,T]$ , the lower bound for the dissipation on the cotangent plane holds

    (4.2) \begin{equation} \liminf _{\varepsilon \to 0} \int _0^t \psi ^*_\varepsilon \left (\rho ^\varepsilon _\tau , - \frac {\delta E_\varepsilon }{\delta \rho }(\rho ^\varepsilon _\tau )\right ) \,\mathrm{d} \tau \geq \int _0^t \psi ^*\left (\rho _\tau , - \frac {\delta \overline {E}}{\delta \rho }(\rho _\tau )\right ) \,\mathrm{d} \tau ; \end{equation}
  4. (iv) for any $t\in [0,T]$ , the lower bound for the dissipation on the tangent plane holds

    (4.3) \begin{equation} \liminf _{\varepsilon \to 0} \int _0^t \psi _\varepsilon (\rho ^\varepsilon _\tau , \partial _\tau \rho ^\varepsilon _\tau ) \,\mathrm{d} \tau \geq \int _0^t \psi (\rho _\tau , \partial _\tau \rho _\tau ) \,\mathrm{d} \tau . \end{equation}

As mentioned before, our approach relies on the idea of convergence of functionals in a variational setting. In particular, we make use of the following result, which is a special case of the by-now classical results on $\Gamma$ -convergence. See, for example, [Reference Marcellini35, Theorems 4.1, 4.4], and also [Reference Braides13, Reference Braides12, Reference Maso18] for more detailed explanations.

Theorem 4.2 ( $\Gamma$ -conv). Let $\Omega$ be an open bounded domain of $\mathbb R^n$ and $A_\varepsilon (\cdot ) = A(\cdot ,\frac {\cdot }{\varepsilon })$ be a symmetric positive definite matrix. Consider the functional

(4.4) \begin{equation} {\mathcal F}_\varepsilon (v) =\int _\Omega \left \langle A\left (x,\frac {x}{\varepsilon }\right )\nabla v, \nabla v\right \rangle \,\mathrm{d} x, \quad v\in H^1_0(\Omega ) + w \end{equation}

where $w\in H^1(\Omega )$ is given. Then ${\mathcal F}_\varepsilon$ $\Gamma$ -converges in $L^2(\Omega )$ to the following functional

(4.5) \begin{equation} {\mathcal F}(v) =\int _\Omega \left \langle \overline {A}(x)\nabla v, \nabla v\right \rangle \,\mathrm{d} x, \quad v\in H^1_0(\Omega ) + w. \end{equation}

In detail,

  1. (1) for any $v_\varepsilon \in H^1_0(\Omega ) + w$ that converges to $v\in H^1_0(\Omega ) + w$ in $L^2(\Omega )$ , it holds that

    (4.6) \begin{equation} \liminf _{\varepsilon \to 0}{\mathcal F}_\varepsilon (v_\varepsilon ) \geq {\mathcal F}(v); \end{equation}
  2. (2) for any $v\in H^1_0(\Omega ) + w$ , there exists $v_\varepsilon \in H^1_0(\Omega ) + w$ that converges to $v$ in $L^2(\Omega )$ , such that

    (4.7) \begin{equation} \lim _{\varepsilon \to 0}{\mathcal F}_\varepsilon (v_\varepsilon ) = {\mathcal F}(v). \end{equation}

Furthermore, the effective matrix $\overline {A}$ can be found by the following variational formula: for any $p\in \mathbb R^n$ ,

(4.8) \begin{equation} \big \langle \overline {A}(x)p, p\big \rangle = \inf \left \{\int _{\mathbb T^n} \left \langle A\left (x,y\right )(\nabla v+p),\, (\nabla v + p)\right \rangle \,\mathrm{d} y, \quad v\in H^1(\mathbb T^n)\right \}. \end{equation}

As an application, we will apply the above result to the case $\Omega = \mathbb T^n$ and

\begin{equation*} A(x,y) = D(x,y)\,\,(= \pi (x,y)B^{-1}(y)) \quad \text{(see (A.1)).} \end{equation*}

The resulting formula for $\overline {A}(x)$ is given by $\overline {D}+\overline {G}$ ; see the expressions of $\overline {D}$ and $\overline {G}$ in (A.8). In Appendix A, we derive the same formula using asymptotic analysis.
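To build intuition for the variational cell formula (4.8), consider the one-dimensional case, where the minimiser satisfies $A(y)(v'(y)+p)=\text{const}$ and $\overline {A}$ reduces to the harmonic mean $\big (\int _0^1 A(y)^{-1}\,\mathrm{d} y\big )^{-1}$ (this is consistent with the explicit 1D computation of $\overline {D}+\overline {G}$ in Section 5). The following is a minimal numerical sketch with an assumed sample coefficient $A(y)=2+\sin (2\pi y)$ , for which the harmonic mean equals $\sqrt {3}$ :

```python
import numpy as np

# Illustrative 1D check of the cell formula (4.8): in one dimension the
# infimum reduces to the harmonic mean  Abar = ( int_0^1 A(y)^{-1} dy )^{-1}.
# A(y) = 2 + sin(2*pi*y) is an assumed sample; for it,
# int_0^1 dy / (2 + sin(2*pi*y)) = 1/sqrt(3), so Abar = sqrt(3).
n = 400
h = 1.0 / n
y = np.arange(n) * h
A = 2.0 + np.sin(2 * np.pi * y)              # periodic coefficient samples
p = 1.0                                      # test direction

# Periodic forward difference: (Dv)_i = (v_{i+1} - v_i) / h
D = (np.roll(np.eye(n), -1, axis=0) - np.eye(n)) / h
# Minimise the discrete cell energy  h * sum_i A_i ((Dv)_i + p)^2  over v:
# normal equations K v = b (K is singular on constants; lstsq handles it).
K = D.T @ (A[:, None] * D)
b = -D.T @ (A * p)
v = np.linalg.lstsq(K, b, rcond=None)[0]

energy = h * np.sum(A * (D @ v + p) ** 2)    # value of the minimised energy
harmonic_mean = 1.0 / (h * np.sum(1.0 / A))  # analytic effective coefficient

print(energy, harmonic_mean * p ** 2)        # the two agree
```

The same script run with the coefficient $D=\pi B^{-1}$ of (A.1) recovers the 1D value of $\overline {D}+\overline {G}$ computed explicitly in Section 5.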

Proof of (4.1). This statement follows directly from [Reference Ambrosio, Gigli and Savaré2, Lemma 9.4.3] which says that the entropy functional is jointly lower-semicontinuous with respect to the weak convergence of $\rho _t^\varepsilon$ and $\pi _\varepsilon$ . In our case, it also follows simply from the strong convergence of $f_t^\varepsilon$ (together with the fact that $f_t^\varepsilon$ is uniformly bounded from above and away from zero):

\begin{eqnarray*} \lim _{\varepsilon \to 0} \int _\Omega \rho _t^\varepsilon \log \frac {\rho _t^\varepsilon }{\pi _\varepsilon }\,\,\mathrm{d} x = \lim _{\varepsilon \to 0} \int _\Omega f_t^\varepsilon (\log f_t^\varepsilon )\pi _\varepsilon \,\,\mathrm{d} x = \int _\Omega f_t(\log f_t)\overline {\pi }\,\,\mathrm{d} x = \int _\Omega \rho _t\log \frac {\rho _t}{\overline {\pi }}\,\,\mathrm{d} x.\hfill\end{eqnarray*}

Proof of (4.2) (time-independent case). Let $\tau \in [0,T]$ be fixed. We will prove that

(4.9) \begin{equation} \liminf _{\varepsilon \to 0}\psi ^*_\varepsilon (\rho ^\varepsilon _\tau , -\log \frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon }) \geq \psi ^*\left (\rho _\tau , -\log \frac {\rho _\tau }{\overline {\pi }}\right ). \end{equation}

We re-write the functional $\psi ^*$ in the following way,

\begin{align*} \psi ^*_\varepsilon (\rho ^\varepsilon _\tau , -\log \frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon }) =& \frac 12 \int _\Omega \left \langle \nabla \log \frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon }, {B_\varepsilon ^{-1}} \nabla \log \frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon } \right \rangle \rho ^\varepsilon _\tau \,\mathrm{d} x\\ =& 2 \int _\Omega \left \langle \, \nabla \sqrt {\frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon }}, {B_\varepsilon ^{-1}}\pi _\varepsilon \nabla \sqrt {\frac {\rho ^\varepsilon _\tau }{\pi _\varepsilon }} \, \right \rangle \,\mathrm{d} x\\ =& 2 \int _\Omega \big \langle \, \nabla w^\varepsilon _\tau , D_\varepsilon \nabla w^\varepsilon _\tau \, \big \rangle \,\mathrm{d} x, \end{align*}

where

\begin{equation*} w^\varepsilon _\tau \,:\!=\, \sqrt {f^\varepsilon _\tau },\,\,\,\text{and}\,\,\, D_\varepsilon = B^{-1}_\varepsilon \pi _\varepsilon . \end{equation*}

As $f_\tau ^\varepsilon \to f_\tau =\frac {\rho _\tau }{\overline {\pi }}$ strongly in $L^p(\Omega )$ for any $p\geq 1$ , we have $w^\varepsilon _\tau \to w_\tau \,:\!=\, \sqrt {f_\tau } =\sqrt {\frac {\rho _\tau }{\overline {\pi }}}$ in $L^2(\Omega )$ . Now we can invoke Theorem 4.2 to deduce that

\begin{eqnarray*} &&\liminf _{\varepsilon \to 0}2\int _\Omega \big \langle \, \nabla w_\tau ^\varepsilon , D_\varepsilon \nabla w_\tau ^\varepsilon \, \big \rangle \,\mathrm{d} x\\ &\geq & 2\int _\Omega \langle \nabla w_\tau , (\overline {D}+\overline {G})\nabla w_\tau \rangle \,\mathrm{d} x =2\int _\Omega \left \langle \nabla \sqrt {f_\tau }, (\overline {D}+\overline {G})\nabla \sqrt {f_\tau } \right \rangle \,\mathrm{d} x \\ &=& 2\int _\Omega \left \langle \nabla \sqrt {\frac {\rho _\tau }{\overline {\pi }}}, (\overline {D}+\overline {G}) \nabla \sqrt {\frac {\rho _\tau }{\overline {\pi }}}\right \rangle \,\mathrm{d} x =\frac 12\int _\Omega \left \langle \nabla \log {\frac {\rho _\tau }{\overline {\pi }}}, \left (\frac {\overline {D}+\overline {G}}{\overline {\pi }}\right ) \nabla \log {\frac {\rho _\tau }{\overline {\pi }}}\right \rangle \rho _\tau \,\mathrm{d} x \\ &=&\frac 12\int _\Omega \left \langle \nabla \log {\frac {\rho _\tau }{\overline {\pi }}}, \overline {B}^{-1}\nabla \log {\frac {\rho _\tau }{\overline {\pi }}}\right \rangle \rho _\tau \,\mathrm{d} x = \psi ^*\left (\rho _\tau , -\log \frac {\rho _\tau }{\overline {\pi }}\right ), \end{eqnarray*}

concluding the result (4.9), with the identification $\overline {B}=\left (\frac {\overline {D}+\overline {G}}{\overline {\pi }}\right )^{-1}$ , from (A.9).

Proof of (4.3) (time-independent case). Here we establish

(4.10) \begin{equation} \liminf _{\varepsilon \to 0}\psi _\varepsilon (\rho ^\varepsilon , s^\varepsilon ) \geq \psi (\rho , s) \end{equation}

for any $\rho ^\varepsilon \rightharpoonup \rho$ in $L^1(\Omega )$ and $s^\varepsilon \rightharpoonup s$ in $L^2(\Omega )$ with the property that

\begin{equation*} f^\varepsilon = \frac {\rho ^\varepsilon }{\pi _\varepsilon } \longrightarrow f = \frac {\rho }{\overline {\pi }}\,\,\,\text{in}\,\,\,L^2(\Omega ). \end{equation*}

Using the definition of $\psi _\varepsilon$ , we have

(4.11) \begin{equation} \begin{aligned} \psi _\varepsilon (\rho ^\varepsilon ,s^\varepsilon ) =&\sup _{\xi \in L^2(\Omega )}\left \{ \int _\Omega \xi s^\varepsilon \,\mathrm{d} x - \frac 12\int _\Omega \langle \nabla \xi ,B_\varepsilon ^{-1}\nabla \xi \rangle \rho ^\varepsilon \,\mathrm{d} x \right \} \end{aligned} \end{equation}

and likewise,

(4.12) \begin{equation} \begin{aligned} \psi (\rho ,s) =&\sup _{\xi \in L^2(\Omega )}\left \{ \int _\Omega \xi s\,\mathrm{d} x - \frac 12\int _\Omega \langle \nabla \xi ,\bar {B}^{-1}\nabla \xi \rangle \rho \,\mathrm{d} x \right \}. \end{aligned} \end{equation}

Note that the supremum in both definitions can be attained. In particular, there is a $\tilde \xi$ such that

(4.13) \begin{equation} \begin{aligned} \psi (\rho ,s) =& \int _\Omega \tilde \xi s\,\mathrm{d} x - \frac 12\int _\Omega \langle \nabla \tilde \xi ,\bar {B}^{-1}\nabla \tilde \xi \rangle \rho \,\mathrm{d} x \,\,\,\text{where}\,\,\, s = -\nabla \cdot \left (\rho \bar {B}^{-1}\nabla \tilde \xi \right ). \end{aligned} \end{equation}

Next we make use of an approximating sequence $\tilde \xi ^\varepsilon \rightharpoonup \tilde \xi$ in $H^1(\Omega )$ (and hence $\tilde \xi ^\varepsilon \to \tilde \xi$ in $L^2(\Omega )$ ) such that

(4.14) \begin{equation} \lim _{\varepsilon \to 0} \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon ,B_\varepsilon ^{-1}\nabla \tilde \xi ^\varepsilon \rangle \rho _\varepsilon \,\mathrm{d} x = \frac 12\int _\Omega \langle \nabla \tilde \xi ,\overline {B}^{-1}\nabla \tilde \xi \rangle \rho \,\mathrm{d} x. \end{equation}

The above is equivalent to

(4.15) \begin{equation} \lim _{\varepsilon \to 0} \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle f^\varepsilon \,\mathrm{d} x =\frac 12\int _\Omega \langle \nabla \tilde \xi ,(\overline {D}+\overline {G})\nabla \tilde \xi \rangle f\,\mathrm{d} x . \end{equation}

The construction of $\tilde {\xi }^\varepsilon$ can essentially be given by Theorem 4.2 if we set $A_\varepsilon = D_\varepsilon f^\varepsilon$ . However, in order to separate the dependence between $D_\varepsilon$ and $f^\varepsilon$ , a different argument is needed. We provide the details in Appendix B.

Now by the fact that $\tilde \xi ^\varepsilon \longrightarrow \tilde \xi$ in $L^2(\Omega )$ , together with the assumption $s^\varepsilon \rightharpoonup s$ in $L^2(\Omega )$ , we have

\begin{equation*} \int _\Omega \tilde \xi ^\varepsilon s^\varepsilon \,\mathrm{d} x \longrightarrow \int _\Omega \tilde \xi s\,\mathrm{d} x. \end{equation*}

Then (4.15) implies that

(4.16) \begin{equation} \begin{aligned} \psi (\rho ,s) =& \int _\Omega \tilde \xi s\,\mathrm{d} x - \frac 12\int _\Omega \langle \nabla \tilde \xi ,\overline {B}^{-1}\nabla \tilde \xi \rangle \rho \,\mathrm{d} x \\ =& \lim _{\varepsilon \to 0} \left \{\int _\Omega \tilde \xi ^\varepsilon s^\varepsilon \,\mathrm{d} x - \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon ,{B}_\varepsilon ^{-1}\nabla \tilde \xi ^\varepsilon \rangle \rho ^\varepsilon \,\mathrm{d} x \right \}\\ \leq & \liminf _{\varepsilon \to 0}\left [ \sup _{\xi }\left \{\int _\Omega \xi s^\varepsilon \,\mathrm{d} x - \frac 12\int _\Omega \langle \nabla \xi ,{B}_\varepsilon ^{-1}\nabla \xi \rangle \rho ^\varepsilon \,\mathrm{d} x \right \}\right ] \\ =&\liminf _{\varepsilon \to 0} \psi _\varepsilon (\rho ^\varepsilon , s^\varepsilon ), \end{aligned} \end{equation}

which completes the proof of (4.10).

Proof of (4.2) and (4.3): time-dependent case. To extend the time-independent case to the time-dependent case and finish the proofs of the lower bounds (4.2) and (4.3), we will make use of a general $\Gamma$ - $\liminf$ result as stated in [Reference Stefanelli51, Cor. 4.4]. Specifically, let $H$ be a separable and reflexive Banach space, and $g_n$ , $g_\infty \,:\, (0,T)\times H\longrightarrow (-\infty , \infty ]$ be such that $g_n(t,\cdot )$ and $g_\infty (t,\cdot )\,:\,H\longrightarrow (-\infty , \infty ]$ are convex and for all $u\in H$ and a.e. $t\in (0,T)$ , the following holds:

(4.17) \begin{equation} g_\infty (t,u) \leq \inf \left \{ \liminf _{n} g_n(t,u_n)\,:\,u_n\rightharpoonup u\,\,\,\text{in $H$} \right \}. \end{equation}

Then for $p\in [1, \infty ]$ , $u_n\rightharpoonup u$ in $L^p(0,T;\,H)$ (weak- $*$ if $p=\infty$ ) and $t\mapsto \max \{0, -g_n(t,u_n(t))\}$ uniformly integrable, we have,

(4.18) \begin{equation} \int _0^T g_\infty (t,u(t))\,\mathrm{d} t \leq \liminf _n \int _0^T g_n(t,u_n(t))\,\mathrm{d} t. \end{equation}

Note that the uniform integrability condition is automatically satisfied if $g_n$ are non-negative, or bounded from below. See also the remark after Cor. 4.4 in [Reference Stefanelli51].

For (4.2), we set $H=H^1(\Omega )$ ,

\begin{equation*} g^*_\varepsilon (t,w) \,:\!=\, 2\int _\Omega \langle \nabla w, D_\varepsilon \nabla w\rangle \,\mathrm{d} x, \,\,\,\text{and}\,\,\, g^*_\infty (t,w) \,:\!=\, 2\int _\Omega \langle \nabla w, (\overline {D}+\overline {G})\nabla w\rangle \,\mathrm{d} x. \end{equation*}

Then $g^*_\varepsilon (t,\cdot )$ and $g^*_\infty (t,\cdot )$ are convex and (4.17) holds true by the time independent version of (4.2). Hence we have

\begin{equation*} \int _0^T 2\int _\Omega \langle \nabla w, (\overline {D}+\overline {G})\nabla w\rangle \,\mathrm{d} x\,\mathrm{d} t \leq \liminf _\varepsilon \int _0^T 2\int _\Omega \langle \nabla w^\varepsilon (t), D_\varepsilon \nabla w^\varepsilon (t)\rangle \,\mathrm{d} x\,\mathrm{d} t \end{equation*}

provided $w^\varepsilon \rightharpoonup w$ in $L^2((0,T);H)$ . This last condition is satisfied by the identification $w^\varepsilon (t) = \sqrt {\frac {\rho ^\varepsilon (t)}{\pi _\varepsilon }}$ , $w(t) = \sqrt {\frac {\rho (t)}{\overline {\pi }}}$ , and (3.20). This concludes the lower bound (4.2).

For (4.3), we set $H = L^2(\Omega )$ ,

\begin{equation*} g_\varepsilon (t,s) \,:\!=\, \psi _\varepsilon (\rho ^\varepsilon (t),s) = \frac 12\int _\Omega \langle \nabla u^\varepsilon , B_\varepsilon ^{-1}\nabla u^\varepsilon \rangle \rho ^\varepsilon (t)\,\mathrm{d} x \,\,\,\text{with}\,\,\, -\nabla \cdot (\rho ^\varepsilon B_\varepsilon ^{-1}\nabla u^\varepsilon ) = s, \end{equation*}

and

\begin{equation*} g_\infty (t,s) \,:\!=\, \psi (\rho (t),s) = \frac 12\int _\Omega \langle \nabla u, \overline {B}^{-1}\nabla u\rangle \rho (t)\,\mathrm{d} x \,\,\,\text{with}\,\,\, -\nabla \cdot (\rho \overline {B}^{-1}\nabla u) = s. \end{equation*}

Again, $g_\varepsilon (t,\cdot )$ and $g_\infty (t,\cdot )$ are convex because the map $s\to u^\varepsilon$ or $u$ is uniquely defined and linear. By (4.10), (4.17) is satisfied. Hence, we have

\begin{equation*} \int _0^T\psi \big (\rho (t),s(t)\big )\,\mathrm{d} t \leq \liminf \int _0^T\psi _\varepsilon \big (\rho ^\varepsilon (t), s^\varepsilon (t)\big )\,\mathrm{d} t \end{equation*}

upon the identification $s^\varepsilon (t) = \partial _t\rho ^\varepsilon _t$ and $s(t) = \partial _t\rho _t$ . The fact that $s^\varepsilon \rightharpoonup s$ in $L^2((0,T);\,H)$ follows from (3.23). Lower bound (4.3) is thus proved.

This concludes the proof of Theorem 4.1.

5. Comparison between limiting Wasserstein distances

In this section, we use the just-established convergence result for gradient flows in EDI form to further analyse the induced limiting Wasserstein distance $\overline {W}$ . In particular, we will show that the limiting Wasserstein metric $\overline {W}$ is in general different from, and in fact strictly larger than, $W_{\text{GH}}$ obtained from the Gromov–Hausdorff limit of $W_\varepsilon$ , which is a commonly considered mode of convergence of metric spaces. The Gromov–Hausdorff distance measures how far two metric spaces are from being isometric. The particular property needed in this paper is that the Gromov–Hausdorff convergence of compact metric spaces $\Omega _k$ implies the Gromov–Hausdorff convergence of the Wasserstein spaces $(\mathcal{P}(\Omega _k), W_k)$ [Reference Villani53, Theorem 28.6]. Briefly stated, let $({\mathcal X}, d_{\mathcal X})$ and $({\mathcal Y}, d_{\mathcal Y})$ be two metric spaces. Their Gromov–Hausdorff distance is defined as [Reference Villani53, (27.2)]

(5.1) \begin{equation} D_{GH}({\mathcal X}, {\mathcal Y}) = \frac 12\inf _{\mathcal R} \sup _{(x,y),(x',y')\in {\mathcal R}}\Big | d_{\mathcal X}(x,x')- d_{\mathcal Y}(y,y') \Big |, \end{equation}

where ${\mathcal R}\subset {\mathcal X}\times {\mathcal Y}$ is a correspondence or relation between $\mathcal X$ and $\mathcal Y$ . We refer to [Reference Villani53, Chapters 27, 28] for more detailed information about the concept of Gromov–Hausdorff distances and convergence. For our application, we will take $({\mathcal X}, d_{\mathcal X}) \,:\!=\, (\Omega , d_\varepsilon )$ or $(\mathcal{P}(\Omega ), W_\varepsilon )$ .

We remark that several of the following statements require the existence of densities (with respect to Lebesgue measure) for the underlying probability measures and the space to be geodesically complete. These are automatically satisfied by our standing assumptions (see Section 2.4).

5.1. Effective Wasserstein distance $\overline {W}$ induced by convergence of gradient flows

For convenience, we recall here the Kantorovich and Benamou–Brenier formulations (1.4) and (1.5) for our $\varepsilon$ -Wasserstein metric $W_\varepsilon$ :

(5.2) \begin{equation} W_\varepsilon ^2(\rho _0, \rho _1)\,:\!=\,\inf \left \{ \iint d^2_\varepsilon (x,y) \,\mathrm{d} \gamma (x,y); \quad \int _{\Omega } \gamma ( x,\,\mathrm{d} y) = \rho _0(x)\,\mathrm{d} x, \,\, \int _{\Omega } \gamma (\,\mathrm{d} x, y) = \rho _1(y)\,\mathrm{d} y \right \} \end{equation}

and

(5.3) \begin{equation} W_\varepsilon ^2(\rho _0, \rho _1) \,:\!=\, \inf \left \{ \int _0^1\int \rho _t(x)\langle B_\varepsilon (x) v_t(x), v_t(x)\rangle \,\mathrm{d} x\,\mathrm{d} t,\quad (\rho _t, v_t)\in V(\rho _0, \rho _1) \right \}, \end{equation}

where $V$ is defined in (1.6). The $\varepsilon$ -metric $d_\varepsilon$ on $\Omega \subset \mathbb{R}^n$ is given via the least action

(5.4) \begin{equation} d_\varepsilon ^2(x,y) \,:\!=\, \inf \left \{ \int _0^1 \langle B_\varepsilon (z_t) \dot {z}_t, \dot {z}_t \rangle \,\mathrm{d} t, \quad z_0 = x, \quad z_1=y \right \}. \end{equation}

A curve $z(\cdot )\in AC([0,1]; \mathbb{R}^n)$ that achieves the infimum in (5.4) is a geodesic in the metric space $(\mathbb{R}^n, d_\varepsilon ).$ By [Reference Bernard and Buffoni6, Theorems A and B], (5.2) and (5.3) are equivalent.

The same formulations hold for our induced limit Wasserstein distance $\overline {W}$ . More precisely, we have

(5.5) \begin{equation} \overline {W}^2(\rho _0, \rho _1)\,:\!=\, \inf \left \{ \int \int \overline {d}^2(x,y) \,\mathrm{d} \gamma (x,y); \quad \int _{\Omega } \gamma ( x,\,\mathrm{d} y) = \rho _0(x)\,\mathrm{d} x, \,\, \int _{\Omega } \gamma (\,\mathrm{d} x, y) = \rho _1(y)\,\mathrm{d} y \right \}, \end{equation}

and the equivalent formulation

(5.6) \begin{equation} \overline {W}^2(\rho _0, \rho _1)\,:\!=\, \inf \left \{ \int _0^1\int \rho _t(x)\langle \overline {B} v_t(x), v_t(x)\rangle \,\mathrm{d} x\,\mathrm{d} t,\quad (\rho _t, v_t)\in V(\rho _0, \rho _1) \right \}. \end{equation}

Here the constant matrix $\overline {B}$ is defined in (A.9) and the induced-metric $\overline {d}$ on $\Omega \subset \mathbb{R}^n$ is again given via the least action

(5.7) \begin{equation} \overline {d}^2(x,y) \,:\!=\, \inf \left \{ \int _0^1 \langle \overline {B} \dot {z}_t, \dot {z}_t \rangle \,\mathrm{d} t, \quad z_0 = x, \quad z_1=y \right \}. \end{equation}

From the Euler–Lagrange equation for the minimiser of (5.7), the optimal curve $\tilde {z}(\cdot )$ that achieves the least action satisfies $\overline {B}\ddot {\tilde {z}}_t=0$ , and hence it has constant velocity, $\dot {\tilde {z}}_t = y-x$ . Thus, we have explicitly

(5.8) \begin{equation} \overline {d}^2(x,y) = \langle \overline {B} (y-x), y-x \rangle = \langle \overline {B} \hat {n}, \hat {n} \rangle |y-x|^2, \quad \text{where}\quad \hat {n}=\frac {y-x}{|y-x|}. \end{equation}
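The explicit formula (5.8) can be sanity-checked numerically: for a constant symmetric positive definite $\overline {B}$ (the matrix and endpoints below are assumed sample values), the straight line minimises the discretised action in (5.7), and the minimum value is $\langle \overline {B}(y-x), y-x\rangle$ .

```python
import numpy as np

# Sanity check of (5.8): for a constant SPD matrix Bbar (assumed sample values),
# the straight line z_t = x + t(y - x) minimises the discretised action
# int_0^1 <Bbar zdot, zdot> dt, and the minimum value is <Bbar (y-x), y-x>.
Bbar = np.array([[2.0, 0.5],
                 [0.5, 1.0]])
x = np.array([0.0, 0.0])
y = np.array([1.0, 2.0])

def action(z, dt):
    zdot = np.diff(z, axis=0) / dt                 # piecewise-constant velocities
    return dt * np.einsum('ti,ij,tj->', zdot, Bbar, zdot)

m = 200
dt = 1.0 / m
t = np.linspace(0.0, 1.0, m + 1)[:, None]
straight = x + t * (y - x)

rng = np.random.default_rng(1)
for _ in range(20):                                # perturb interior nodes only
    bump = np.zeros((m + 1, 2))
    bump[1:-1] = 0.1 * rng.standard_normal((m - 1, 2))
    assert action(straight + bump, dt) >= action(straight, dt)

print(action(straight, dt), (y - x) @ Bbar @ (y - x))   # both approximately 8.0
```

The discrete action is a convex quadratic in the interior nodes whose minimiser is the linear interpolant, mirroring the constant-velocity geodesic above.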

Note that both $W_\varepsilon$ and $\overline {W}$ induce a Riemannian metric on $\mathcal{P}(\Omega )$ . More precisely, for any $\rho \in \mathcal{P}(\Omega )$ and any $s_1, s_2\in T_{\mathcal{P}}$ , the tangent plane at $\rho$ , the respective first fundamental forms are defined as

(5.9) \begin{eqnarray} \big \langle s_1, s_2\big \rangle _{T_{\mathcal{P}}, T_{\mathcal{P}}, \varepsilon }&\,:\!=\,& \int \rho (x)\langle B^{-1}_\varepsilon (x) \nabla u_1(x), \nabla u_2(x)\rangle \,\mathrm{d} x, \end{eqnarray}

where $s_i = -\nabla \cdot (\rho B^{-1}_\varepsilon \nabla u_i)$ , $i=1,2$ for $W_\varepsilon$ , and

(5.10) \begin{eqnarray} \big \langle s_1, s_2\big \rangle _{T_{\mathcal{P}}, T_{\mathcal{P}}}&\,:\!=\,& \int \rho (x)\langle \overline {B}^{-1}(x) \nabla u_1(x), \nabla u_2(x)\rangle \,\mathrm{d} x, \end{eqnarray}

where $s_i = -\nabla \cdot (\rho \overline {B}^{-1}\nabla u_i)$ , $i=1,2$ for $\overline {W}$ . This is also manifested by the fact that both the corresponding dissipation functionals are bilinear forms in $s$ :

\begin{equation*} \psi _\varepsilon (\rho , s)= \frac 12 \int _\Omega \langle \nabla u, B_\varepsilon ^{-1} \nabla u \rangle \rho \,\mathrm{d} x \quad \text{ with }\, s=-\nabla \cdot \left (\rho B_\varepsilon ^{-1} \nabla u\right ), \end{equation*}

and

\begin{equation*} \psi (\rho , s)= \frac 12 \int _\Omega \langle \nabla u, \overline {B}^{-1} \nabla u \rangle \rho \,\mathrm{d} x \quad \text{ with }\, s=-\nabla \cdot \left (\rho \overline {B}^{-1} \nabla u\right ). \end{equation*}

5.2. The Gromov–Hausdorff limit $W_{\text{GH}}$ of $W_\varepsilon$

Now we consider the convergence in the Gromov–Hausdorff sense of $W_\varepsilon$ to a limiting Wasserstein metric, denoted as $W_{\text{GH}}$ .

We first show that, even in one dimension, in general $W_{\text{GH}}\lt \overline {W}$ unless $\pi _\varepsilon$ and $B_\varepsilon$ are related to each other in a specific way. Recall the metric $d_\varepsilon$ in (5.4). From the Euler–Lagrange equation for the minimiser $z_t = \tilde {z}^\varepsilon _t$ , we have

\begin{equation*} \frac {\,\mathrm{d}}{\,\mathrm{d} t} (2 B_\varepsilon (z_t)\dot {z}_t) = B_\varepsilon '(z_t)(\dot {z}_t)^2, \end{equation*}

leading to $B_\varepsilon '(z_t)\dot {z}_t^2+ 2 B_\varepsilon (z_t)\ddot {z}_t=0$ and thus

\begin{equation*} B_\varepsilon (z) \dot {z}^2=C_\varepsilon (x,y), \quad \text{for some constant $C_\varepsilon (x,y)$.} \end{equation*}

Upon solving this ODE for $z_t$ with the two boundary conditions $z(0)=x$ , $z(1)=y$ , we have

\begin{equation*} \sqrt {C_\varepsilon (x,y)}=\int _x^y \sqrt {B_\varepsilon (z)} \,\mathrm{d} z. \end{equation*}

Hence the infimum in (5.4) is given by

(5.11) \begin{equation} d^2_\varepsilon (x,y) = C_\varepsilon (x,y) = \left (\int _x^y \sqrt {B_\varepsilon (z)} \,\mathrm{d} z\right )^2. \end{equation}

As $B_\varepsilon (x)=B(\frac {x}{\varepsilon })$ , it is easy to verify that for any $x,y\in \Omega$ , there exist an integer $N_\varepsilon$ and some $\delta \in (-1,1)$ such that $y-x=N_\varepsilon \varepsilon + \delta \varepsilon$ and $N_\varepsilon \varepsilon \to |x-y|$ . Notice also that $B(\cdot )$ is $1$ -periodic. Hence,

\begin{eqnarray*} d^2_\varepsilon (x,y) = \left ( \varepsilon \int _{\frac {x}{\varepsilon }}^{\frac {y}{\varepsilon }} \sqrt {B(s)} \,\mathrm{d} s\right )^2 &=& \left (\varepsilon N_\varepsilon \int _0^1 \sqrt {B(s)} \,\mathrm{d} s + \varepsilon \int _0^\delta \sqrt {B(s)} \,\mathrm{d} s\right )^2\\ &\stackrel {\varepsilon \to 0}{\longrightarrow } & |x-y|^2\left (\int _0^1 \sqrt {B(s)} \,\mathrm{d} s\right )^2 \,=\!:\, d^2_{\text{GH}}(x,y). \end{eqnarray*}
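This convergence $d_\varepsilon \to d_{\text{GH}}$ can also be observed numerically from the closed form (5.11). The sketch below uses an assumed sample coefficient $B(s)=2+\cos (2\pi s)$ , arbitrarily chosen endpoints $x, y$ , and plain trapezoidal quadrature:

```python
import numpy as np

# Numerical illustration of d_eps -> d_GH in one dimension, using (5.11):
# d_eps(x,y) = eps * int_{x/eps}^{y/eps} sqrt(B(s)) ds  and
# d_GH(x,y)  = |y-x| * int_0^1 sqrt(B(s)) ds.
# B(s) = 2 + cos(2*pi*s) is an assumed sample 1-periodic coefficient.
def sqrtB(s):
    return np.sqrt(2.0 + np.cos(2 * np.pi * s))

def trap(f, a, b, n):
    # simple trapezoidal rule on [a, b] with n nodes
    s = np.linspace(a, b, n)
    v = f(s)
    return (s[1] - s[0]) * (v.sum() - 0.5 * (v[0] + v[-1]))

def d_eps(x, y, eps):
    n = max(int((y - x) / eps), 1) * 1000 + 1      # fine grid on [x/eps, y/eps]
    return eps * trap(sqrtB, x / eps, y / eps, n)

x, y = 0.0, 1.0 / np.sqrt(2.0)                     # generic (incommensurate) gap
d_gh = (y - x) * trap(sqrtB, 0.0, 1.0, 4001)

for eps in [0.1, 0.01, 0.001]:
    print(eps, abs(d_eps(x, y, eps) - d_gh))       # discrepancy shrinks with eps
```

The discrepancy is of order $\varepsilon$ , coming from the fractional period $\delta \varepsilon$ left over in the computation above.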

Notice that if one chooses $\mathcal R$ to be the identity map as the correspondence between the metric spaces $\mathcal{X}\,:\!=\,(\Omega , d_\varepsilon )$ and $\mathcal{Y}\,:\!=\, (\Omega , d_{\text{GH}})$ , then from (5.1), we have

\begin{equation*}D_{\text{GH}}(\mathcal{X},\mathcal{Y})\leq \frac {1}{2} \sup _{x, y\in \Omega }|d_\varepsilon (x,y)-d_{\text{GH}}(x,y)| \to 0.\end{equation*}

Hence the one dimensional metric space $(\Omega , d_\varepsilon )$ Gromov–Hausdorff converges to $(\Omega , d_{\text{GH}})$ . By [Reference Villani53, Theorem 28.6], the Wasserstein distance $W_\varepsilon$ defined in (5.2) also converges to the following limiting Wasserstein distance $W_{\text{GH}}$ in the Gromov–Hausdorff sense,

(5.12) \begin{equation} W_{\text{GH}}^2(\rho _0, \rho _1)\,:\!=\, \inf \left \{ \int \int d^2_{\text{GH}}(x,y) \,\mathrm{d} \gamma (x,y); \quad \int _{\Omega } \gamma ( x,\,\mathrm{d} y) = \rho _0(x) \,\mathrm{d} x, \,\, \int _{\Omega } \gamma (\,\mathrm{d} x, y) = \rho _1(y) \,\mathrm{d} y \right \}. \end{equation}

Again by [Reference Bernard and Buffoni6, Theorems A and B], $W_{\text{GH}}$ can be equivalently written in the Benamou–Brenier formulation

(5.13) \begin{equation} W_{\text{GH}}^2(\rho _0, \rho _1)\,:\!=\, \inf \left \{ \int _0^1\int \rho _t(x)\langle \overline {C} v_t(x), v_t(x)\rangle \,\mathrm{d} x\,\mathrm{d} t,\quad (\rho _t, v_t)\in V(\rho _0, \rho _1) \right \} \end{equation}

with $\displaystyle \overline {C}=\left (\int _0^1 \sqrt {B(s)} \,\mathrm{d} s\right )^2.$

On the other hand, in one dimension, we can solve the cell problem (A.6) explicitly:

\begin{eqnarray*} \partial _y\big (D(x,y)\partial _y w(x,y)\big ) &=& -\partial _y\left (D(x,y)\right ), \quad \text{where}\,\,\,D(x,y) = \pi (x,y)B(y)^{-1},\\ \partial _y w(x,y) &=& -1 + \frac {C(x)}{D(x,y)} \quad \text{with}\,\,\,C(x) = \left (\int \frac {1}{D(x,y)}\,\mathrm{d} y\right )^{-1}. \end{eqnarray*}

Then (A.7) and (A.9) are given as

\begin{eqnarray*} \overline {D}(x) &=& \int D(x,y)\,\mathrm{d} y,\\ \overline {G}(x) &=& \int D(x,y) \left (-1 + \frac {C(x)}{D(x,y)}\right )\,\mathrm{d} y = -\int D(x,y)\,\mathrm{d} y + \left (\int \frac {1}{D(x,y)}\,\mathrm{d} y\right )^{-1},\\ \overline {B} &=& \left (\frac {\overline {D} + \overline {G}}{\overline {\pi }}\right )^{-1} = \overline {\pi }\int \frac {1}{D(x,y)}\,\mathrm{d} y = \overline {\pi }\int \frac {B(y)}{\pi (x,y)}\,\mathrm{d} y. \end{eqnarray*}

By the Cauchy–Schwarz inequality, we always have

\begin{align*} \overline {C} =\left (\int _0^1 \sqrt {B(s)} \,\mathrm{d} s\right )^2 =&\left (\int _0^1 \sqrt {\pi (x,y)}\sqrt {\frac {B(y)}{\pi (x,y)}} \,\mathrm{d} y\right )^2 \\ \leq & \left (\int \pi (x,y)\,\mathrm{d} y\right ) \left (\int \frac {B(y)}{\pi (x,y)}\,\mathrm{d} y\right ) = \overline {B}(x), \end{align*}

and the equality holds if and only if there exists some constant $c\gt 0$ such that

(5.14) \begin{eqnarray} \sqrt {\pi (x,y)} = c\sqrt {\frac {B(y)}{\pi (x,y)}}, \quad \text{i.e.}\quad \pi (x,y)=\pi (y) = c\sqrt {B(y)}. \end{eqnarray}

Hence, unless $\pi (y)=c\sqrt {B(y)}$ , we always have

\begin{equation*} d_{\text{GH}}(x,y) \lt \overline {d}(x,y)\quad \text{for all $x,y\in \Omega $} \end{equation*}

i.e. $W_{\text{GH}} \lt \overline {W}.$ In hindsight, it is not surprising that some condition, such as (5.14), is needed for $\overline {W}$ to coincide with $W_{\text{GH}}$ . We will elaborate upon this at the end of this section.
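The comparison $\overline {C}\leq \overline {B}$ and the equality criterion (5.14) are straightforward to check numerically. In the sketch below, $B$ and the generic density $\pi$ are assumed sample choices; the second call uses $\pi \propto \sqrt {B}$ , for which equality holds up to quadrature error.

```python
import numpy as np

# 1D check of Cbar = (int sqrt(B))^2 <= Bbar = (int pi)(int B/pi), with
# equality iff pi is proportional to sqrt(B) (condition (5.14)).
# B and the test densities are assumed samples.
y = np.linspace(0.0, 1.0, 100001)[:-1]        # periodic grid (left endpoints)
dy = y[1] - y[0]
B = 2.0 + np.cos(2 * np.pi * y)

def effective(pi):
    pi = pi / (pi.sum() * dy)                 # normalise to a probability density
    Cbar = (np.sqrt(B).sum() * dy) ** 2
    Bbar = (pi.sum() * dy) * ((B / pi).sum() * dy)
    return Cbar, Bbar

Cbar, Bbar = effective(1.0 + 0.5 * np.sin(2 * np.pi * y))   # generic pi
print(Cbar, Bbar)                             # strict inequality Cbar < Bbar

Cbar_eq, Bbar_eq = effective(np.sqrt(B))      # pi = c * sqrt(B), as in (5.14)
print(Cbar_eq, Bbar_eq)                       # equal up to quadrature error
```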

Next, we illustrate the $n$ -dimensional case by means of an example. In [Reference Braides12, Section 3.3], it is shown that the functional

(5.15) \begin{equation} {\mathcal F}_\varepsilon (z) = \int _0^1 \langle B_\varepsilon (z_t) \dot {z}_t, \dot {z}_t \rangle \,\mathrm{d} t, \quad \text{for} \quad z(\cdot )\in (H^1([0,1]))^n \quad \text{with}\quad z_0 = x, \quad z_1=y, \end{equation}

$\Gamma$ -converges with respect to the strong $L^2(0,1)$ -topology to

(5.16) \begin{equation} {\mathcal F}(z) = \int _0^1 \varphi (\dot {z}(t)) \,\mathrm{d} t \quad \text{for}\quad z(\cdot )\in (H^1([0,1]))^n, \quad \text{with}\quad z_0 = x, \quad z_1=y, \end{equation}

where the limiting integrand $\varphi$ is given by

(5.17) \begin{equation} \varphi (v) \,:\!=\, \lim _{T\to +\infty } \inf _{u\in (H^1_0([0,T]))^n} \left \{\frac {1}{T} \int _0^T \langle B(u(t)+vt) (\dot {u}(t)+v), \, \dot {u}(t)+v \rangle \,\mathrm{d} t\right \}. \end{equation}

Now following [Reference Braides12, Example 3.3], we consider $B_\varepsilon (z)=b(\frac {z}{\varepsilon })$ where $b$ is the following $1$ -periodic function on $[0,1]^n$ ,

\begin{equation*} b(y) = \left \{ \begin{array}{ll} \beta & \text{ if } y\in (0,1)^n;\\ \alpha & \text{ if for some }i,\, y_i\in \mathbb{Z}. \end{array} \right . \end{equation*}

If $n \alpha \lt \beta$ , one obtains that the limiting energy integrand $\varphi$ is given by

(5.18) \begin{equation} \varphi (v)=\alpha \left (\sum _{i=1}^n |v_i|\right )^2. \end{equation}

Using the property of $\Gamma$ -convergence [Reference Braides12, Theorem 1.21], we deduce also the convergence of the minimum value $d^2_\varepsilon$ of $\mathcal{F}_\varepsilon$ to the minimum value $d^2_{\text{GH}}$ of $\mathcal{F}$ , where

(5.19) \begin{equation} d_{\text{GH}}(x,y)= \sqrt {\alpha }\left (\sum _{i=1}^n |\hat {n}_i|\right )|y-x| = \sqrt {\alpha }\|y-x\|_{\ell ^1} \,\,\, \text{with $\hat {n}=\frac {y-x}{|y-x|}.$} \end{equation}

On the other hand, note that the value $\alpha$ is attained only on the $(n-1)$ -dimensional set $\bigcup _{i=1}^n\{y_i \in \mathbb Z\}$ . This set is invisible to $\overline {B}$ , which is obtained by solving the elliptic cell problem (A.6). Hence the induced limiting Wasserstein distance $\overline {W}$ (5.5), with $\overline {d}$ defined in (5.7), satisfies $\overline {d}(x,y)=\sqrt {\beta }\,|x-y|$ for all $x,y\in \Omega$ . Thus, for this example, we have

\begin{equation*} d_{\text{GH}}(x,y)= \sqrt {\alpha } \|y-x\|_{\ell ^1} \leq \sqrt {\alpha n}\|y-x\|_{\ell ^2} \lt \sqrt {\beta } |y-x| = \overline {d}(x,y). \end{equation*}

Hence we have again $W_{\text{GH}} \lt \overline {W}$ .
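The chain of inequalities above is elementary but worth a sanity check; in the snippet below, $n$ , $\alpha$ , $\beta$ are assumed sample values satisfying $n\alpha \lt \beta$ .

```python
import numpy as np

# Check d_GH(x,y) = sqrt(alpha)*||y-x||_1 < sqrt(beta)*|y-x| = dbar(x,y)
# whenever n*alpha < beta, via the norm comparison ||v||_1 <= sqrt(n)*||v||_2.
n, alpha, beta = 3, 0.2, 1.0
assert n * alpha < beta

rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.standard_normal(n)                     # v plays the role of y - x
    d_gh = np.sqrt(alpha) * np.abs(v).sum()
    d_bar = np.sqrt(beta) * np.linalg.norm(v)
    assert d_gh <= np.sqrt(alpha * n) * np.linalg.norm(v) + 1e-12
    assert d_gh < d_bar

print("d_GH < dbar holds on all samples")
```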

We would like to point out that for the above example, the integrand $\varphi$ in (5.17) is always homogeneous of degree $2$ in $v$ . (In fact, for any $\lambda \neq 0$ , by applying the change of variables $\tilde {t}=\lambda t, \tilde {u}(\tilde {t})=u(t)$ , it is easy to verify that $\varphi (\lambda v) = \lambda ^2 \varphi (v)$ .) However, the $\varphi$ in (5.18) is not a quadratic form in $v$ , in contrast to the $\varphi$ induced by (5.7):

\begin{equation*} \varphi (v) = \langle \overline {B} v, v \rangle . \end{equation*}

Below we give further remarks about the discrepancy between $\overline {W}$ and $W_{\text{GH}}$ .

  1. (1) We first explain the condition (5.14). This is nothing but the fact that one can choose the Riemannian metric $(\mathbb{R}, g_\varepsilon )$ with $(g_\varepsilon )_{ij}(x)= B_\varepsilon (x),$ so that the Wasserstein distance on $(\mathbb{R}, g_\varepsilon )$ coincides with $W_\varepsilon$ . More precisely, the condition (5.14) implies the volume form on $(\mathbb{R}, g_\varepsilon )$ is

    (5.20) \begin{equation} \,\mathrm{d}\text{Vol} = \sqrt {|g_\varepsilon |}\,\mathrm{d} x = \sqrt {B_\varepsilon }\,\mathrm{d} x = c\pi _\varepsilon (x)\,\mathrm{d} x=c\pi (\frac {x}\varepsilon )\,\mathrm{d} x. \end{equation}
    Therefore, the heat flow on $(\mathbb{R}, g_\varepsilon )$ , in terms of the density function with respect to the volume element $\,\mathrm{d}\text{Vol}$ is given by
    (5.21) \begin{equation} \partial _t p_\varepsilon = \frac {1}{\sqrt {|g_\varepsilon |}}\nabla \cdot (\sqrt {|g_\varepsilon |}g^{ij}_\varepsilon \nabla p_\varepsilon )= \frac {1}{\pi _\varepsilon } \nabla \cdot (\pi _\varepsilon {B_\varepsilon ^{-1}} \nabla p_\varepsilon ). \end{equation}
    This equation, in terms of the density function $\rho _\varepsilon (x,t) = p_\varepsilon (x,t) \sqrt {|g_\varepsilon |} = p_\varepsilon (x,t) \pi _\varepsilon (x)$ , is exactly the $W_\varepsilon$ -gradient flow with respect to the relative entropy $E_\varepsilon$ in (2.1):
    (5.22) \begin{eqnarray} \partial _t \rho _\varepsilon = \nabla \cdot (\pi _\varepsilon {B_\varepsilon ^{-1}} \nabla \frac {\rho _\varepsilon }{\pi _\varepsilon })=\nabla \cdot \left ( \rho ^\varepsilon _t B_\varepsilon ^{-1} \nabla \frac {\delta E_\varepsilon }{\delta \rho }(\rho ^\varepsilon _t) \right ). \end{eqnarray}
    Therefore, condition (5.14) means that the discrepancy between $\overline {W}$ and $W_{\text{GH}}$ does not occur in one dimension when one considers homogenisation of the heat flow on $(\mathbb{R}, g_\varepsilon )$ . In other words, the homogenised heat flow in one dimension naturally induces the same limiting distance as that obtained by finding the limiting minimal path on $(\mathbb{R}, g_\varepsilon )$ . On the other hand, even in one dimension, the convergence of the discrete transport distance to the continuous transport distance $W_2$ requires an isotropic mesh condition [Reference Gladbach, Kopfer, Maas and Portinale22, eq. (1.3)]. Without this condition, the discrete-to-continuous limiting distance in the Gromov–Hausdorff sense can differ from the continuous transport distance $W_2$ [Reference Gladbach, Kopfer, Maas and Portinale22, Theorem 1.1, Remarks 1.2 and 1.3].
  2. (2) We believe that the above conclusion of $W_{\text{GH}} \lt \overline {W}$ is true in general, particularly in higher dimensions, even if we consider heat flow. This is because the Gromov–Hausdorff limit $d_{\text{GH}}$ of $d_\varepsilon$ involves finding the minimum or geodesic distance between two points as indicated in (5.4). This amounts to searching for the minimum path in the underlying spatial inhomogeneity. On the other hand, the $\overline {B}$ in the limiting induced distance $\overline {d}$ is found by solving an elliptic cell-problem (A.6) which requires taking some average of the spatial inhomogeneity. (Note that in contrast, in one dimension, any path will explore the whole inhomogeneous landscape.) Hence, in general $d_{\text{GH}}$ and $W_{\text{GH}}$ should be smaller than $\overline {d}$ and $\overline {W}$ . See also the discussion in [Reference Forkert, Maas and Portinale20, p. 4298] and the work [Reference Gladbach, Kopfer, Maas and Portinale22].
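Remark (2) can be supported by a toy one-dimensional computation (our own sketch with an illustrative coefficient, not one of the paper's examples): for the metric $g_\varepsilon (x)=B(x/\varepsilon )$ on $[0,1]$ , with a reference measure that is constant in the fast variable, the geodesic length homogenises to the average of $\sqrt {B}$ , whereas the distance induced by the averaged coefficient $\overline {B}=\int _0^1 B(y)\,\mathrm{d} y$ (the one-dimensional reduction of (A.11)) equals $\sqrt {\overline {B}}$ . Jensen's inequality then gives a strict gap whenever $B$ is non-constant:

```python
import numpy as np

# Toy 1D illustration (our own; B is an illustrative choice, not from the
# text) of why geodesic-based limits are generically smaller than the
# distance induced by an averaged coefficient.
#   * geodesic length of [0,1] for g_eps = B(x/eps):  -> mean of sqrt(B),
#   * distance from the averaged coefficient bar{B} = mean(B): sqrt(mean(B)).
y = (np.arange(100000) + 0.5) / 100000      # midpoint rule on [0, 1]
B = 2.0 + np.sin(2 * np.pi * y)             # 1-periodic, non-constant

d_geodesic = np.mean(np.sqrt(B))            # limit of int sqrt(B(x/eps)) dx
d_averaged = np.sqrt(np.mean(B))            # sqrt(bar{B}), bar{B} = 2 here

print(d_geodesic, d_averaged)               # the first is strictly smaller
```

For constant $B$ the two quantities coincide, so the gap is genuinely an effect of the oscillation.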

6. Conclusion

This paper provides a variational framework, based on the energy dissipation inequality, for proving the convergence of gradient flows in Wasserstein spaces. Our key contribution is the incorporation of fast oscillations in the underlying energy and medium. In particular, the gradient-flow structure is preserved in the limit but is described with respect to an effective energy and metric. Our result is consistent with asymptotic analysis from the realm of homogenisation. Even though we apply the result to a linear Fokker–Planck equation in a continuous setting, we believe the approach is applicable to a broader class of problems, including nonlinear equations and evolutions on graphs and networks.

Acknowledgements

We would like to thank the anonymous referees for their constructive comments.

Funding

Yuan Gao was partially supported by NSF DMS-2204288 and NSF CAREER DMS-2440651.

Competing interests

The authors declare none.

Appendix A. Asymptotic analysis for the $\varepsilon$ -gradient flow

In this section, we use the method of asymptotic expansion to analyse the convergence of the $\varepsilon$ -Fokker-Planck equation (2.4) (or (2.15)) to the limiting homogenised one (2.24).

Recall the assumptions (2.30) and (2.32) for $B_\varepsilon$ and $\pi _\varepsilon$ in Section 2.4 and the definition of fast variable $\displaystyle y\,:\!=\, \frac {x}{\varepsilon }$ . Introducing

(A.1) \begin{equation} D(x,y)=\pi (x,y)B^{-1}(y), \end{equation}

then (3.2) reads

(A.2) \begin{equation} \partial _t f^\varepsilon =\frac {1}{\pi _\varepsilon } \nabla \cdot \left ( D(x,\frac {x}{\varepsilon }) \nabla f^\varepsilon \right ). \end{equation}

Consider the ansatz

(A.3) \begin{equation} f^\varepsilon \big (x,t\big ) = f_0\big (x,\frac {x}{\varepsilon },t\big ) + \varepsilon f_1\big (x,\frac {x}{\varepsilon },t\big ) + O(\varepsilon ^2) \quad \text{with $f_0$ and $f_1$ $1$-periodic in $y$.} \end{equation}

Substituting it into (A.2), we have

(A.4) \begin{align} \partial _t \big (f_0+\varepsilon f_1 + O(\varepsilon ^2)\big ) = \frac {1}{\pi (x,y)}\left (\nabla _x + \frac {1}{\varepsilon } \nabla _y\right )\cdot \left (D(x,y)\left (\nabla _x+\frac {1}{\varepsilon }\nabla _y\right ) \big (f_0+\varepsilon f_1 + O(\varepsilon ^2)\big )\right ). \end{align}

Terms of different orders are analysed as follows.

  1. (I) $\frac {1}{\varepsilon ^2}$ -terms. They satisfy,

    \begin{align*} \nabla _y \cdot \left (D(x,y) \nabla _y f_0(x,y,t)\right ) =0. \end{align*}
    Multiplying the above by $f_0(x,y,t)$ and integrating over $y$ gives $\displaystyle \int \langle D(x,y)\nabla _y f_0,\, \nabla _y f_0\rangle \,\mathrm{d} y = 0$ , which, by the uniform ellipticity of $D$ , implies $f_0(x,y,t)=f_0(x,t).$
  2. (II) $\frac {1}{\varepsilon }$ -terms. They satisfy,

    (A.5) \begin{equation} \nabla _y \cdot \left (D(x,y) (\nabla _x f_0 + \nabla _y f_1)\right )=0. \end{equation}
    For $i=1,2,\ldots, n$ , let $w_i(x,y)$ be the solution to the cell problem
    (A.6) \begin{equation} \nabla _y \cdot \left (D(x,y)\nabla _y w_i(x,y)\right ) + \nabla _y \cdot \left (D(x,y) \vec {e}_i\right )=0, \end{equation}
    where $\vec {e}_i$ is the unit vector in the $i$ -th coordinate direction. The above equation is solvable for each $i$ due to the compatibility condition $\displaystyle \int \nabla _y \cdot \left (D(x,y) \vec {e}_i\right ) \,\mathrm{d} y = 0$ . Then we can write $f_1$ as
    \begin{equation*} f_1(x,y,t)=\sum _i \partial _{x_i} f_0(x,t) w_i(x,y). \end{equation*}
  3. (III) $O(1)$ -terms. Collecting the $O(1)$ -terms in (A.4) and integrating with respect to $y$ leads to

    \begin{align*} \partial _t f_0(x,t)\, \bar {\pi }(x) = \nabla _x \cdot (\overline {D}(x) \nabla _x f_0(x,t)) + \nabla \cdot \left (\sum _i \partial _{x_i} f_0(x,t) \overline {G}_i(x) \right ), \end{align*}
    where
    (A.7) \begin{equation} \overline {D}(x)\,:\!=\,\int \pi (x,y) B^{-1}(y) \,\mathrm{d} y, \quad \overline {G}_i(x)\,:\!=\, \int \pi (x,y) B^{-1}(y) \nabla _y w_i(x,y) \,\mathrm{d} y, \end{equation}
    and $\displaystyle \overline {\pi } = \int \pi (x,y)\,\mathrm{d} y$ ; see (2.33).

Then the leading dynamics in terms of $f_0$ is given by

(A.8) \begin{equation} \partial _t f_0 = \frac {1}{\overline {\pi }} \nabla \cdot \left ((\overline {D}+\overline {G}) \nabla f_0\right ), \quad \text{where}\,\,\,\overline {G} = (\overline {G}_1, \overline {G}_2,\ldots, \overline {G}_n). \end{equation}

Upon defining

(A.9) \begin{equation} \overline {B}(x) = \left (\frac {\overline {D}+\overline {G}}{\overline {\pi }}\right )^{-1}, \end{equation}

in terms of $\rho =f_0 \overline {\pi }$ , (A.8) can be written as

(A.10) \begin{equation} \partial _t \rho = \nabla \cdot \left (\rho \, \overline {B}^{-1} \nabla \log \frac {\rho }{\overline {\pi }}\right ). \end{equation}
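As a side remark not carried out in the text, in one spatial dimension the cell problem (A.6) can be solved in closed form, which makes the structure of (A.9) explicit:

```latex
% In 1D, (A.6) says the flux D(1 + \partial_y w_1) does not depend on y,
% and periodicity of w_1 determines the constant:
\begin{equation*}
  D(x,y)\big(1 + \partial_y w_1(x,y)\big) = c(x),
  \qquad
  c(x) = \left(\int_0^1 D(x,y)^{-1}\,\mathrm{d} y\right)^{-1},
\end{equation*}
% i.e. c(x) is the harmonic mean in y of D(x,.).  By (A.7),
\begin{equation*}
  \overline{D}(x) + \overline{G}(x)
  = \int_0^1 D(x,y)\big(1 + \partial_y w_1(x,y)\big)\,\mathrm{d} y = c(x),
\end{equation*}
% so that (A.9), with D = \pi B^{-1}, becomes
\begin{equation*}
  \overline{B}(x) = \frac{\overline{\pi}(x)}{c(x)}
  = \left(\int_0^1 \pi(x,y)\,\mathrm{d} y\right)
    \left(\int_0^1 \frac{B(y)}{\pi(x,y)}\,\mathrm{d} y\right).
\end{equation*}
```

In particular, when $\pi$ does not depend on $y$ , this reduces to $\overline {B}(x)=\int _0^1 B(y)\,\mathrm{d} y$ , in agreement with (A.11), and the product structure displays the interaction between $B_\varepsilon$ and $\pi _\varepsilon$ in the general case.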

The above procedure certainly works for the simpler case of uniform convergence, $\pi _\varepsilon =\pi _\varepsilon ^{\text{II}}$ in (2.34), which converges uniformly to $\pi _0$ . We find it illustrative to write down the homogenised limit equation. In this case, the definition (A.1) of $D$ , the cell problem (A.6) and the effective coefficients (A.7) become

\begin{equation*} D(x,y) = \pi _0(x)B^{-1}(y),\quad \nabla _y \cdot \left (B^{-1}(y)\nabla _y w_i(y)\right ) + \nabla _y \cdot \left (B^{-1}(y) \vec {e}_i\right )=0, \end{equation*}

and

\begin{equation*} \overline {D}(x)\,:\!=\,\pi _0(x)\int B^{-1}(y) \,\mathrm{d} y, \quad \overline {G}(x)\,:\!=\, \pi _0(x) \int B^{-1}(y) \nabla _y w(y) \,\mathrm{d} y, \quad (\text{where}\,\,\,w=(w_1,w_2,\ldots, w_n)), \end{equation*}

so that

(A.11) \begin{equation} \overline {B}(x) = \left (\frac {\overline {D}(x) + \overline {G}(x)}{\pi _0(x)}\right )^{-1} = \left (\int B^{-1}(y) \,\mathrm{d} y + \int B^{-1}(y) \nabla _y w(y) \,\mathrm{d} y\right )^{-1}. \end{equation}

Then the effective Fokker-Planck equation is given by

(A.12) \begin{equation} \begin{aligned} \partial _t \rho = \nabla \cdot \left ( \rho \, \overline {B}^{-1} \nabla \log \frac {\rho }{\pi _0}\right ). \end{aligned} \end{equation}

Comparing (A.9) and (A.11), it is clear that there is interaction between $B_\varepsilon$ and $\pi _\varepsilon$ in the former case but not in the latter.
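For the uniform case, the effective coefficient (A.11) can be checked numerically. In one dimension, the flux $B^{-1}(y)(1+\partial _y w)$ is constant in $y$ , and periodicity of $w$ forces $\overline {B}=\int _0^1 B(y)\,\mathrm{d} y$ . The following finite-difference sketch (our own; the coefficient $B(y)=2+\sin (2\pi y)$ is an illustrative choice, not from the text) solves the periodic cell problem and recovers this value:

```python
import numpy as np

# 1D cell problem underlying (A.11) with pi independent of y:
#   d/dy [ a(y) (1 + w'(y)) ] = 0,   a = B^{-1},   w 1-periodic,
# with  bar{B}^{-1} = int_0^1 a(y)(1 + w'(y)) dy = int_0^1 B(y) dy)^{-1}.
N = 400
h = 1.0 / N
y_half = (np.arange(N) + 0.5) * h              # cell interfaces
a = 1.0 / (2.0 + np.sin(2 * np.pi * y_half))   # a = B^{-1} at interfaces

# Periodic finite-difference system for (a w')' = -(a)' on the torus.
A = np.zeros((N, N))
rhs = np.zeros(N)
for i in range(N):
    ip, im = (i + 1) % N, (i - 1) % N
    A[i, ip] += a[i] / h**2
    A[i, i] -= (a[i] + a[im]) / h**2
    A[i, im] += a[im] / h**2
    rhs[i] = -(a[i] - a[im]) / h

# The operator annihilates constants; least squares selects a solution.
w = np.linalg.lstsq(A, rhs, rcond=None)[0]

dw = (np.roll(w, -1) - w) / h                  # w' at the interfaces
B_bar = 1.0 / np.mean(a * (1.0 + dw))          # homogenised coefficient

print(B_bar)   # close to int_0^1 (2 + sin(2 pi y)) dy = 2
```

The same discretisation extends to higher dimensions, where no closed form is available and the cell problem (A.6) must be solved numerically.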

Appendix B. Construction of $\tilde {\xi }^\varepsilon$ for (4.15)

Here we construct an approximating sequence $\tilde \xi ^\varepsilon \rightharpoonup \tilde \xi$ in $H^1(\Omega )$ such that (4.15) holds. As mentioned, due to the spatially varying weight function $f^\varepsilon$ , an extra step is needed to decouple the dependence between $D_\varepsilon$ and $f^\varepsilon$ before we can invoke the classical $\Gamma$ -convergence result, Theorem 4.2. Without loss of generality, we assume that $\tilde {\xi }$ is smooth so that the pointwise evaluation $\tilde {\xi }(x)$ is well-defined; this can be achieved by first convolving $\tilde {\xi }$ with a smooth kernel. We also recall from statement (1) of Lemma 3.1 that $f$ is a bounded and uniformly positive function.

For this purpose, we write for any $\tilde \xi ^\varepsilon$ that

\begin{eqnarray*} &&\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle f^\varepsilon \,\mathrm{d} x\\ & = & \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle f_c\,\mathrm{d} x + \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f-f_c)\,\mathrm{d} x +\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f^\varepsilon -f)\,\mathrm{d} x, \end{eqnarray*}

where $f_c$ is some continuous function approximating $f$ . Next, we partition $\Omega$ into finitely many cubes $C_j$ and define the following piecewise constant function

\begin{equation*} \bar {f}_c(x) = \bar {f}_{c_j} \,:\!=\, \frac {1}{|C_j|}\int _{C_j}f_c \,\mathrm{d} x \quad \text{for $x\in C_j$}. \end{equation*}

Hence

\begin{equation*} \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle f_c\,\mathrm{d} x = \sum _j\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle \bar {f_c}_j\,\mathrm{d} x + \sum _j\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f_c-\bar {f_c}_j)\,\mathrm{d} x. \end{equation*}

With the above, we have

\begin{eqnarray*} &&\lim _{\varepsilon \to 0}\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle f^\varepsilon \,\mathrm{d} x\\ & = & \lim _{\varepsilon \to 0} \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle \bar {f}_c\,\mathrm{d} x +\lim _{\varepsilon \to 0} \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f_c-\bar {f}_c)\,\mathrm{d} x\\ && + \lim _{\varepsilon \to 0}\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f-f_c)\,\mathrm{d} x +\lim _{\varepsilon \to 0}\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f^\varepsilon -f)\,\mathrm{d} x. \end{eqnarray*}

Now on each $C_j$ , we can invoke Theorem 4.2 to obtain a recovery sequence $\tilde \xi ^\varepsilon _j\rightharpoonup \tilde \xi$ in $H^1_0(C_j) + {g_c}_j$ , where ${g_c}_j = \tilde \xi \Big |_{\partial C_j}$ , such that

(B.1) \begin{equation} \lim _{\varepsilon \to 0} \frac 12\int _{C_j}\langle \nabla \tilde \xi ^\varepsilon _j, D_\varepsilon \nabla \tilde \xi ^\varepsilon _j \rangle \bar {f_c}_j\,\mathrm{d} x =\frac 12\int _{C_j}\langle \nabla \tilde \xi ,(\overline {D}+\overline {G})\nabla \tilde \xi \rangle \bar {f_c}_j\,\mathrm{d} x. \end{equation}

Next let $\tilde \xi ^\varepsilon = \tilde \xi ^\varepsilon _j$ on $C_j$ . Note that $\tilde \xi ^\varepsilon$ thus defined is a global $H^1$ -function on $\Omega$ . As there are only finitely many cubes $C_j$ , we can conclude that

(B.2) \begin{equation} \lim _{\varepsilon \to 0} \frac 12\int _{\Omega }\langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle \bar {f_c}\,\mathrm{d} x =\frac 12\int _{\Omega }\langle \nabla \tilde \xi ,(\overline {D}+\overline {G})\nabla \tilde \xi \rangle \bar {f_c}\,\mathrm{d} x. \end{equation}

Hence we have

(B.3) \begin{eqnarray} &&\lim _{\varepsilon \to 0}\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle f^\varepsilon \,\mathrm{d} x\nonumber \\ & = & \frac 12\int _{\Omega }\langle \nabla \tilde \xi ,(\overline {D}+\overline {G})\nabla \tilde \xi \rangle f\,\mathrm{d} x \nonumber \\ && +\frac 12\int _{\Omega }\langle \nabla \tilde \xi ,(\overline {D}+\overline {G})\nabla \tilde \xi \rangle (\bar {f_c}-f)\,\mathrm{d} x+\lim _{\varepsilon \to 0} \frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f_c-\bar {f}_c)\,\mathrm{d} x \end{eqnarray}
(B.4) \begin{eqnarray} &&+ \lim _{\varepsilon \to 0}\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f-f_c)\,\mathrm{d} x +\lim _{\varepsilon \to 0}\frac 12\int _\Omega \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle (f^\varepsilon -f)\,\mathrm{d} x.\\[6pt]\nonumber \end{eqnarray}

A final ingredient we need is that the sequence of functions $\langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle$ is equi-integrable: for all $\sigma \gt 0$ , there exists a $\delta \gt 0$ such that, for any $S\subset \Omega$ with $|S| \leq \delta$ ,

(B.5) \begin{equation} \int _S \langle \nabla \tilde \xi ^\varepsilon , D_\varepsilon \nabla \tilde \xi ^\varepsilon \rangle \leq \sigma \,\,\,\text{holds for all }\varepsilon \gt 0. \end{equation}

Once this is shown, we can make use of the Lusin and Egorov theorems to conclude that all the remainder terms in (B.3) and (B.4) converge to zero as $\varepsilon \to 0$ : up to sets of arbitrarily small measure, $f$ equals a continuous function $f_c$ , and the convergence of $f^\varepsilon$ to $f$ is uniform. We recall again that $f^\varepsilon$ and $f$ are uniformly bounded functions.

We now show that the sequence of functions $\tilde {\xi }^\varepsilon$ can be constructed so that it satisfies (B.5). Without loss of generality, we replace $\tilde {\xi }$ by a continuous and piecewise affine function; this can be achieved by a Galerkin or finite element approximation (given that $\tilde {\xi }$ is smooth). Then we have a partition of $\Omega$ into a collection of polyhedra. For simplicity, we can further assume that these polyhedra coincide with the cubes $C_j$ , on each of which $\bar {f}_c$ is constant. Now we construct $\tilde \xi ^\varepsilon$ according to the following procedure.

First, we define $A(x,y)=D(x,y)=\pi (x,y)B^{-1}(y)$ . By the smoothness assumptions on $\pi$ and $B$ , the function $A$ is smooth in $y\in \mathbb T^n$ and $x\in C_j$ .

Now, for $x\in C_j$ , as $\nabla \tilde {\xi }$ is a constant vector $p_j\in \mathbb{R}^n$ , the homogenised matrix $\overline {A}(x)$ in Theorem 4.2 is given by (4.8), which we repeat here for convenience:

\begin{equation*} \big \langle \overline {A}(x) p_j, p_j\big \rangle = \inf \left \{\int _{\mathbb T^n} \left \langle A\left (x,y\right )(p_j + \nabla v),\, (p_j + \nabla v)\right \rangle \,\mathrm{d} y, \quad v\in H^1(\mathbb T^n)\right \}. \end{equation*}

The $\inf$ above is achieved by $v_j(y)=|p_j|\hat {w}_j(x,y)$ where $\hat {w}_j$ solves the following cell-problem:

\begin{equation*} \text{div}_y\left (A(x,y)\nabla \hat {w}_j\right )=-\text{div}_y\left (A(x,y)\frac {p_j}{|p_j|}\right ),\,\,\,\hat {w}_j(x,\cdot )\in H^1(\mathbb T^n),\,\,\,\int _{\mathbb T^n}\hat {w}_j(x,y)\,\,\mathrm{d} y = 0. \end{equation*}

The smoothness assumption on $A$ implies that

\begin{equation*} \|\hat {w}_j(x,\cdot )\|_{L^\infty (\mathbb T^n)} + \|\nabla _y \hat {w}_j(x,\cdot )\|_{L^\infty (\mathbb T^n)} + \|\nabla _x \hat {w}_j(x,\cdot )\|_{L^\infty (\mathbb T^n)} \leq C \end{equation*}

for some constant $C$ that does not depend on $x$ and $\varepsilon$ .

Next, let $0 \lt d_1 \lt d_2$ be two positive numbers. For each $C_j$ , there exists a smooth subdomain $C_j'$ of $C_j$ such that $d_1\varepsilon \leq \text{dist}(\partial C_j', \partial C_j) \leq d_2\varepsilon$ . Then we define a cut-off function $\eta ^\varepsilon _j$ on $C_j$ satisfying: (i) $0\leq \eta ^\varepsilon _j \leq 1$ on $C_j$ ; (ii) $\eta ^\varepsilon _j = 1$ on $C_j'$ ; (iii) $\eta ^\varepsilon _j(x)\longrightarrow 0$ smoothly as $x\longrightarrow \partial C_j$ , so that $\eta ^\varepsilon _j \in C^\infty _0(C_j)$ ; and (iv) $\|\varepsilon \nabla \eta ^\varepsilon _j\|_{L^\infty (C_j)} \leq C$ for an $\varepsilon$ -independent constant $C$ .

With the above, suppose $\tilde {\xi }(x) = \sum _j\big [\alpha _j + \langle p_j,x\rangle \big ] \chi _{C_j}(x)$ , where $\chi _{C_j}$ is the characteristic function of $C_j$ . We then define

\begin{equation*} \tilde {\xi }^\varepsilon (x) = \sum _j\left [\alpha _j + \langle p_j,x\rangle + \varepsilon \eta ^\varepsilon _j(x)|p_j|\hat {w}_j(x,\frac {x}{\varepsilon }) \right ]\chi _{C_j}(x). \end{equation*}

Then we have,

\begin{equation*} \nabla \tilde {\xi }^\varepsilon (x) = \sum _j \left [p_j + \eta ^\varepsilon _j(x)|p_j|\nabla _y\hat {w}_j(x,\frac {x}{\varepsilon }) + \varepsilon \eta ^\varepsilon _j(x)|p_j|\nabla _x\hat {w}_j(x,\frac {x}{\varepsilon }) + \varepsilon \nabla \eta ^\varepsilon _j(x)|p_j|\hat {w}_j(x,\frac {x}{\varepsilon })\right ]\chi _{C_j}(x). \end{equation*}

By the aforementioned estimates for $\hat {w}_j$ and $\eta ^\varepsilon _j$ , we can conclude that $|\nabla \tilde {\xi }^\varepsilon (x)| \leq C|p_j|$ for $x\in C_j$ and hence

\begin{equation*} |\nabla \tilde {\xi }^\varepsilon (x)| \leq C|\nabla \tilde {\xi }(x)| \,\,\,\text{for all $x\in \Omega $.} \end{equation*}

(Here we make use of the $L^\infty (\mathbb T^n)$ estimates for $\hat {w}_j$ but we could also resort to the weaker $L^2(\mathbb T^n)$ estimates.) Note that the above statement holds uniformly for all $\varepsilon \ll 1$ . We can then conclude (B.5) as $\displaystyle \int _{\Omega }|\nabla \tilde {\xi }|^2 \,\mathrm{d} x$ is finite.

That $\left \{\tilde {\xi }^\varepsilon \right \}_{\varepsilon \gt 0}$ is a recovery sequence for $\tilde {\xi }$ follows from two properties: $\tilde {\xi }^\varepsilon \longrightarrow \tilde {\xi }$ in $L^2(\Omega )$ , and $\nabla \tilde {\xi }^\varepsilon$ differs from the “optimal” oscillatory functions $\left \{p_j + |p_j|\nabla _y\hat {w}_j(x,\frac {x}{\varepsilon })\right \}_j$ only on $\bigcup _j C_j\backslash C_j'$ , whose measure vanishes as $\varepsilon \longrightarrow 0$ . More precisely, we have

\begin{eqnarray*} &&\lim _{\varepsilon \to 0}\int \big \langle A(x,\frac {x}{\varepsilon })\nabla \tilde {\xi }^\varepsilon ,\nabla \tilde {\xi }^\varepsilon \big \rangle \bar {f}_c\,\mathrm{d} x = \lim _{\varepsilon \to 0}\sum _j\int _{C_j}\big \langle A(x,\frac {x}{\varepsilon })\nabla \tilde {\xi }^\varepsilon ,\nabla \tilde {\xi }^\varepsilon \big \rangle \bar {f}_{c_j}\,\mathrm{d} x\\ &=&\sum _j\int _{C_j}\int _{\mathbb T^n}\Big \langle A(x,y)\big (p_j + |p_j|\nabla _y\hat {w}_j(x,y)\big ), \big (p_j + |p_j|\nabla _y\hat {w}_j(x,y)\big )\Big \rangle \,\mathrm{d} y \,\bar {f}_{c_j}\,\mathrm{d} x\\ &=& \sum _j\int _{C_j}\langle \bar {A}(x)p_j,p_j\rangle \bar {f}_{c_j}\,\mathrm{d} x = \int \big \langle \bar {A}(x)\nabla \tilde {\xi },\nabla \tilde {\xi }\big \rangle \bar {f}_c\,\mathrm{d} x. \end{eqnarray*}

The above computation is classical in the theory of two-scale convergence – see [Reference Allaire3, Prop. 1.14(i), and equations (2.10), (2.11)]. Note also that (B.1) and (B.2) hold as $\bar {f}_c$ is constant on the $C_j$ ’s.

We can now conclude (4.15).
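The effect of the corrector in the construction above can be seen concretely in a one-dimensional sketch (our own; we take the weight constant and use the illustrative coefficient $a(y)=(2+\sin (2\pi y))^{-1}$ , neither of which is from the text). For the affine target $\tilde {\xi }(x)=x$ on $(0,1)$ , the uncorrected choice $\tilde {\xi }^\varepsilon =\tilde {\xi }$ only produces the arithmetic mean of $a$ in the limit, while adding the oscillatory corrector $\varepsilon \hat {w}(x/\varepsilon )$ , computed explicitly here, attains the strictly smaller homogenised energy, the harmonic mean of $a$ :

```python
import numpy as np

eps = 0.02                                  # 1/eps periods fit in (0, 1)
x = (np.arange(200000) + 0.5) / 200000      # midpoint rule on (0, 1)
a = 1.0 / (2.0 + np.sin(2 * np.pi * x / eps))

# Uncorrected sequence: xi_eps = x, so (xi_eps)' = 1.
E_naive = np.mean(a)                        # -> arithmetic mean of a = 1/sqrt(3)

# Corrected sequence: xi_eps = x + eps * w(x/eps), with the explicit 1D
# corrector w(y) = (1 - cos(2 pi y)) / (4 pi), i.e. w'(y) = sin(2 pi y)/2.
dxi = 1.0 + 0.5 * np.sin(2 * np.pi * x / eps)
E_corr = np.mean(a * dxi**2)                # -> harmonic mean of a = 1/2

print(E_naive, E_corr)                      # approx 0.5774 and 0.5000
```

The uncorrected sequence overshoots the homogenised energy, so it cannot serve as a recovery sequence; this is precisely why the construction above must carry the cell-problem oscillations.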

References

1. Ambrosio, L., Brué, E. & Semola, D. (2021) Lectures on Optimal Transport, Springer.
2. Ambrosio, L., Gigli, N. & Savaré, G. (2008) Gradient Flows in Metric Spaces and in the Space of Probability Measures, Springer Science & Business Media.
3. Allaire, G. (1992) Homogenization and two-scale convergence. SIAM J. Math. Anal. 23 (6), 1482–1518.
4. Arnrich, S., Mielke, A., Peletier, M. A., Savaré, G. & Veneroni, M. (2012) Passing to the limit in a Wasserstein gradient flow: From diffusion to reaction. Calc. Var. Partial Differ. Equ. 44 (3–4), 419–454.
5. Benamou, J.-D. & Brenier, Y. (2000) A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84 (3), 375–393.
6. Bernard, P. & Buffoni, B. (2007) Optimal mass transportation and Mather theory. J. Eur. Math. Soc. 9 (1), 85–121.
7. Balcan, D., Colizza, V., Gonçalves, B., Hu, H., Ramasco, J. J. & Vespignani, A. (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl. Acad. Sci. 106 (51), 21484–21489.
8. Bernot, M., Caselles, V. & Morel, J.-M. (2008) Optimal Transportation Networks: Models and Theory, Springer.
9. Budhiraja, A., Dupuis, P. & Fischer, M. (2012) Large deviation properties of weakly interacting processes via weak convergence methods. Ann. Probab. 40 (1), 74–102.
10. Bensoussan, A., Lions, J.-L. & Papanicolaou, G. (2011) Asymptotic Analysis for Periodic Structures, Vol. 374, American Mathematical Society.
11. Buttazzo, G., Pratelli, A., Solimini, S. & Stepanov, E. (2008) Optimal Urban Networks Via Mass Transportation, Springer Science & Business Media.
12. Braides, A. (2002) Γ-Convergence for Beginners, Vol. 22, Clarendon Press.
13. Braides, A. (2006) A handbook of Γ-convergence. In Handbook of Differential Equations: Stationary Partial Differential Equations, Vol. 3, Elsevier, pp. 101–213.
14. Carrillo, J. A., Craig, K. & Yao, Y. (2019) Aggregation-diffusion equations: dynamics, asymptotics, and singular limits. In Active Particles, Volume 2: Advances in Theory, Models, and Applications, pp. 65–108.
15. Carrillo, J. A., Delgadino, M. G. & Pavliotis, G. A. (2020) A $\lambda$ -convexity based proof for the propagation of chaos for weakly interacting stochastic particles. J. Funct. Anal. 279 (10), 108734.
16. Dondl, P., Frenzel, T. & Mielke, A. (2019) A gradient system with a wiggly energy and relaxed EDP-convergence. ESAIM: Control Optim. Calc. Var. 25, 68.
17. Delgadino, M. G., Gvalani, R. S. & Pavliotis, G. A. (2021) On the diffusive-mean field limit for weakly interacting diffusions exhibiting phase transitions. Arch. Ration. Mech. Anal. 241 (1), 91–148.
18. Dal Maso, G. (2012) An Introduction to Γ-Convergence, Vol. 8, Springer Science & Business Media.
19. Dupuis, P. & Spiliopoulos, K. (2012) Large deviations for multiscale diffusion via weak convergence methods. Stoch. Proc. Appl. 122 (4), 1947–1987.
20. Forkert, D., Maas, J. & Portinale, L. (2022) Evolutionary Γ-convergence of entropic gradient flow structures for Fokker–Planck equations in multiple dimensions. SIAM J. Math. Anal. 54 (4), 4297–4333.
21. Gladbach, P., Kopfer, E. & Maas, J. (2020) Scaling limits of discrete optimal transport. SIAM J. Math. Anal. 52 (3), 2759–2802.
22. Gladbach, P., Kopfer, E., Maas, J. & Portinale, L. (2020) Homogenisation of one-dimensional discrete optimal transport. J. Math. Pures Appl. 139, 204–234.
23. Gladbach, P., Kopfer, E., Maas, J. & Portinale, L. (2023) Homogenisation of dynamical optimal transport on periodic graphs. Calc. Var. Partial Differ. Equ. 62 (5), 175.
24. Gao, Y. & Liu, J.-G. (2023) Large deviation principle and thermodynamic limit of chemical master equation via nonlinear semigroup. Multiscale Model. Simul. 21 (4), 1534–1569.
25. Gao, Y., Liu, J.-G. & Liu, Z. (2024) Some properties on the reversibility and the linear response theory of Langevin dynamics. Acta Appl. Math. 194 (1), 12.
26. Gigli, N. & Maas, J. (2013) Gromov–Hausdorff convergence of discrete transportation metrics. SIAM J. Math. Anal. 45 (2), 879–899.
27. Gangbo, W. & Tudorascu, A. (2012) Homogenization for a class of integral functionals in spaces of probability measures. Adv. Math. 230 (3), 1124–1173.
28. Hraivoronska, A., Schlichting, A. & Tse, O. (2024) Variational convergence of the Scharfetter–Gummel scheme to the aggregation-diffusion equation and vanishing diffusion limit. Numer. Math. 156 (6), 2221–2292.
29. Hoeksema, J. & Tse, O. (2023) Generalized gradient structures for measure-valued population dynamics and their large-population limit. Calc. Var. Partial Differ. Equ. 62 (5), 158.
30. Hraivoronska, A. & Tse, O. (2023) Diffusive limit of random walks on tessellations via generalized gradient flows. SIAM J. Math. Anal. 55 (4), 2948–2995.
31. Jordan, R., Kinderlehrer, D. & Otto, F. (1998) The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29 (1), 1–17.
32. Kivelä, M., Arenas, A., Barthelemy, M., Gleeson, J. P., Moreno, Y. & Porter, M. A. (2014) Multilayer networks. J. Complex Netw. 2 (3), 203–271.
33. Kantorovich, L. V. (1942) On the translocation of masses. Dokl. Akad. Nauk. USSR 37, 199–201. English translation in J. Math. Sci. 133 (4) (2006), 1381–1382.
34. Liero, M., Mielke, A., Peletier, M. A. & Renger, D. R. M. (2017) On microscopic origins of generalized gradient structures. Discrete Contin. Dyn. Syst. Ser. S 10 (1), 1–35.
35. Marcellini, P. (1978) Periodic solutions and homogenization of non linear variational problems. Ann. Mat. Pura Appl. 117 (1), 139–152.
36. Mielke, A. (2016) On evolutionary Γ-convergence for gradient systems. In Macroscopic and Large Scale Phenomena: Coarse Graining, Mean Field Limits and Ergodicity, Springer, pp. 187–249.
37. Maas, J. & Mielke, A. (2020) Modeling of chemical reaction systems with detailed balance using gradient structures. J. Stat. Phys. 181 (6), 2257–2303.
38. Mielke, A., Montefusco, A. & Peletier, M. A. (2021) Exploring families of energy-dissipation landscapes via tilting: Three types of EDP convergence. Contin. Mech. Thermodyn. 33 (3), 611–637.
39. Monge, G. (1781) Mémoire sur la théorie des déblais et des remblais. In Histoire de l'Académie Royale des Sciences de Paris, pp. 666–704.
40. Otto, F. (2001) The geometry of dissipative evolution equations: The porous medium equation. Commun. Partial Differ. Equ. 26 (1–2), 101–174.
41. Peyré, G. & Cuturi, M. (2019) Computational optimal transport: With applications to data science. Found. Trends Mach. Learn. 11 (5–6), 355–607.
42. Pratelli, A. (2007) On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation. Ann. Inst. Henri Poincaré Probab. Stat. 43 (1), 1–13.
43. Rachev, S. T. & Rüschendorf, L. (1998) Mass Transportation Problems: Volume I: Theory, Springer Science & Business Media.
44. Rachev, S. T. & Rüschendorf, L. (1998) Mass Transportation Problems: Volume II: Applications, Springer Science & Business Media.
45. Rubinstein, J. & Wolansky, G. (2017) Geometrical optics and optimal transport. J. Opt. Soc. Am. A 34 (10), 1817–1823.
46. Santambrogio, F. (2015) Optimal Transport for Applied Mathematicians, Birkhäuser, New York.
47. Serfaty, S. (2011) Gamma-convergence of gradient flows on Hilbert and metric spaces and applications. Discrete Contin. Dyn. Syst. A 31 (4), 1427–1451.
48. Sánchez-Palencia, E. (1980) Non-Homogeneous Media and Vibration Theory. Lecture Notes in Physics, Vol. 320, Springer-Verlag, pp. 57–65.
49. Sandier, E. & Serfaty, S. (2004) Gamma-convergence of gradient flows with applications to Ginzburg–Landau. Comm. Pure Appl. Math. 57 (12), 1627–1672.
50. Schlichting, A. & Seis, C. (2022) The Scharfetter–Gummel scheme for aggregation-diffusion equations. IMA J. Numer. Anal. 42 (3), 2361–2402.
51. Stefanelli, U. (2008) The Brezis–Ekeland principle for doubly nonlinear equations. SIAM J. Control Optim. 47 (3), 1615–1642.
52. Villani, C. (2003) Topics in Optimal Transportation, Vol. 58, American Mathematical Society.
53. Villani, C. (2009) Optimal Transport: Old and New, Vol. 338, Springer.