1. Introduction
In recent years, neural networks have achieved remarkable success in a wide range of classification and learning tasks. However, it is now well-known that these networks do not learn in the same ways as humans and will fail in specific settings. In particular, a wide range of recent work has shown that they fail to be robust to specially designed adversarial attacks [Reference Engstrom, Tran, Tsipras, Schmidt and Madry11, Reference Eykholt, Evtimov and Fernandes12, Reference Goodfellow, Shlens and Szegedy18, Reference Qin, Carlini, Goodfellow, Cottrell and Raffel23, Reference Szegedy, Zaremba and Sutskever27].
One general approach for mitigating this problem is to include an adversary in the training process. A simple mathematical formulation of this method for 0–1 loss [Reference Madry, Makelov, Schmidt, Tsipras and Vladu19], in the large-data or population limit, is to consider the optimization problem
where the
$y$
variables represent observed classification labels and
$x$
variables represent features (we give a more precise description of our setting in the next section). This can be seen as a robust optimization problem, where an adversary is allowed to modify the inputs to our classifier up to some distance
$\varepsilon$
. When
$\varepsilon =0$
, this corresponds to the standard Bayes risk.
Recent work has significantly expanded our mathematical understanding of this problem. Our work directly builds upon [Reference Bungert, Trillos and Murray5], which rewrites the previous functional as
where
$\mathrm{Per}_{\varepsilon }$
is a special data-adapted perimeter, whose definition is given in (4). This is related to a growing body of recent work, for example, showing that
$\mathrm{Per}_{\varepsilon }$
converges to the (weighted) classical perimeter [Reference Bungert and Stinson7], and demonstrating links between the adversarially robust training problem and mean curvature flow [Reference Bungert, Laux and Stinson6, Reference Trillos and Murray16]. This literature seeks to provide a more complete description of the effect of
$\varepsilon$
on adversarially robust classifiers in a geometric sense. This relates to the study of nonlocal perimeter minimization and flows [Reference Cesaroni, Dipierro, Novaga and Valdinoci8, Reference Cesaroni and Novaga9, Reference Chambolle, Morini and Ponsiglione10], where the unweighted
$\varepsilon$
-perimeter is considered. As training these robust classifiers is generally a challenging task, one overarching goal of this type of work is to provide a more precise understanding of the effect of
$\varepsilon$
, practical means for approximating that effect, and the impact on classifier complexity: each of these has the potential to lead to more efficient solvers for these problems.
Various modifications of the robust classification energy
$J_\varepsilon$
have been proposed. For example, some authors relax either the criteria for an adversarial attack or the loss function to interpolate between the accurate yet brittle Bayes classifier and the robust yet costly minimizers of the adversarial training problem [Reference Bungert, Trillos, Jacobs, McKenzie, Nikolić and Wang4, Reference Heredia, Pydi, Meunier, Negrevergne and Chevaleyre17, Reference Raman, Subedi and Tewari24, Reference Robey, Chamon, Pappas, Hassani, Chaudhuri, Jegelka, Song, Szepesvari, Niu and Sabato25]. Still others employ optimal transport techniques to study distributionally robust optimization, where instead of perturbing data points, the adversary perturbs the underlying data distribution [Reference Frank and Niles-Weed13, Reference Trillos, Jacobs and Kim14, Reference Trillos, Kim and Jacobs15, Reference Pydi and Jog21, Reference Pydi and Jog22].
The main goal of this paper is to study the convergence of solutions to the adversarially robust classification problem towards the original Bayes classification task for data-perturbing models. We build a framework that allows us to consider a wide range of adversarial settings at the same time. In doing so, we obtain Hausdorff convergence results, which are generally much stronger than the
$L^1$
-type results previously obtained [Reference Bungert, Trillos, Jacobs, McKenzie, Nikolić and Wang4]. These results parallel many of the basic results in the study of variational problems involving perimeters, wherein one first proves stability in
$L^\infty$
spaces, and then subsequently proves stronger regularity results for minimizers. In a similar way, we see our results as a building block towards stronger regularity results for the adversarially robust classification problem, which have received significant attention in the literature. We begin by concretely describing the setup of our problem and then giving an informal statement of our results along with some discussion.
1.1. Setup
Let the Euclidean space
$\mathbb{R}^d$
equipped with the metric
$\mathrm{d}(\cdot ,\cdot )$
represent the space of features for a data point, and let
$\mathcal{B}(\mathbb{R}^d)$
be the set of all Borel measurable subsets of
$\mathbb{R}^d$
. We will let
$\mathcal L^d$
be the
$d$
-dimensional Lebesgue measure. We are considering a supervised binary classification setting, in which training pairs
$(x,y)$
are distributed according to a probability measure
$\mu$
over
$\mathbb{R}^d\times \{0,1\}$
. Here
$y$
represents the class associated with a given data point, and the fact that
$y \in \{0,1\}$
corresponds to the binary classification setting. Let
$\rho$
denote the
$\mathbb{R}^d$
marginal of
$\mu$
, namely
$\rho (A) = \mu (A\times \{0,1\})$
. We decompose
$\rho \in \mathcal{P}(\mathbb{R}^d)$
into
$\rho = w_0\rho _0 + w_1\rho _1$
where
$w_i = \mu (\mathbb{R}^d \times \{i\})$
, and the conditional probability measure
$\rho _i \in \mathcal{P}(\mathbb{R}^d)$
for a set
$A\in \mathcal{B}(\mathbb{R}^d)$
is
for
$i = 0,1$
. All of these measures are assumed to be Radon measures.
In binary classification, we associate a set
$A\in \mathcal{B}(\mathbb{R}^d)$
with a classifier, meaning that
$x\in \mathbb{R}^d$
is assigned label 1 when
$x\in A$
and
$x$
is assigned the label 0 when
$x\in A^{\mathsf{c}}$
. Unless otherwise stated, we will assume all classifiers
$A\in \mathcal{B}(\mathbb{R}^d).$
The Bayes classification problem for the 0–1 loss function is given by
In this work, we will only consider the 0–1 loss function, which allows us to restrict our attention to indicator functions for minimizers of (1). We refer to minimizers of the Bayes risk as Bayes classifiers.
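For readers who want a concrete instance of these objects, the following short numpy sketch (illustrative; the Gaussian densities and the weights are our own choices, not taken from the paper) computes the pointwise Bayes classifier $\{w_1\rho_1 \gt w_0\rho_0\}$ and approximates the Bayes risk by quadrature on a grid.

```python
import numpy as np

# Toy 1-D instance of the setup: class-conditional densities rho_0, rho_1 and
# weights w_0, w_1. The Gaussian densities and weights are illustrative choices.
w0, w1 = 0.6, 0.4
rho0 = lambda z: np.exp(-0.5 * (z + 1.0) ** 2) / np.sqrt(2 * np.pi)   # N(-1, 1)
rho1 = lambda z: np.exp(-0.5 * (z - 1.0) ** 2) / np.sqrt(2 * np.pi)   # N(+1, 1)

# Quadrature grid for a truncation of R (the tails beyond [-8, 8] are negligible).
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

# Pointwise Bayes classifier: assign label 1 exactly where w1*rho1 > w0*rho0.
A0 = w1 * rho1(x) > w0 * rho0(x)

def bayes_risk(A):
    # 0-1 risk of a classifier A (boolean mask): points labelled 1 contribute
    # w0*rho0 (misclassified label-0 mass), points labelled 0 contribute w1*rho1.
    return np.sum(np.where(A, w0 * rho0(x), w1 * rho1(x))) * dx

print("Bayes risk of A0:         ", bayes_risk(A0))
print("risk of the set {x > 0.5}:", bayes_risk(x > 0.5))   # never smaller than the Bayes risk
```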
Remark 1.1 (Uniqueness of Bayes Classifiers). If we assume that
$\rho$
has a density everywhere on
$\mathbb{R}^d$
and identify the measures
$\rho _i$
with the density at
$x\in \mathbb{R}^d$
given by
$\rho _i(x)$
, then we can describe the uniqueness, or lack thereof, of Bayes classifiers in terms of those densities. Specifically, Bayes classifiers are unique up to the set
$\{w_1\rho _1 = w_0\rho _0\}$
, which may be a set of positive measure depending on
$\mu$
. We define maximal and minimal Bayes classifiers (in the sense of set inclusion) by
When
$w_0\rho _0-w_1\rho _1 \in C^1$
and
$|w_0\nabla \rho _0 - w_1\nabla \rho _1| \gt \alpha \gt 0$
on the set
$\{w_0\rho _0 = w_1\rho _1\}$
, the Bayes classifier is unique up to sets of
$\rho$
measure zero. In the case where
$\rho$
is supported everywhere, Bayes classifiers are unique up to sets of
$\mathcal L^d$
measure zero. Whenever we refer to the Bayes classifier as unique, we mean unique in this measure-theoretic sense. Later on in Assumption 3.11, we will refer to such uniqueness as the ‘non-degeneracy’ of the Bayes classifier and represent the unique classifier by
$A_0$
.
Throughout this paper, we will consider and seek to unify two optimization problems from the literature that aim to train robust classifiers. First, we consider the adversarial training problem, which trains classifiers to mitigate the effect of worst-case perturbations [Reference Madry, Makelov, Schmidt, Tsipras and Vladu19]. The adversarial training problem is
\begin{equation*} \inf _{A\in \mathcal{B}(\mathbb{R}^d)}\mathbb{E}_{(x,y)\sim \mu }\Bigg [\sup _{\tilde {x}\in B_{\mathrm{d}}(x,\varepsilon )}|{\unicode{x1D7D9}}_A(\tilde {x}) - y|\Bigg ], \end{equation*}
where
$B_{\mathrm{d}}(x,\varepsilon )$
is the open metric ball of radius
$\varepsilon \gt 0$
. The existence of solutions in this setting was previously established [Reference Bungert, Trillos and Murray5]. The parameter
$\varepsilon$
is called the adversarial budget, and it represents the strength of the adversary. By using the open ball, we are following the conventions set in the previous work on convergence of optimal adversarial classifiers [Reference Bungert and Stinson7]. Other works have utilized the closed ball due to consistency with the standard classification problem when
$\varepsilon = 0$
, but that comes at the price of added measurability concerns: see Remark 1.2 for more details.
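As a hedged illustration of the sup-based formulation displayed above (not code from the paper), consider a one-dimensional threshold classifier $A = \{x \gt t\}$. For such a set the inner supremum has a closed form: a point with label 1 is successfully attacked exactly when $\mathrm{d}(x,A^{\mathsf{c}}) \lt \varepsilon$, i.e. $x \lt t+\varepsilon$, and a point with label 0 exactly when $\mathrm{d}(x,A) \lt \varepsilon$, i.e. $x \gt t-\varepsilon$. The Monte Carlo estimate below uses this; the weights, budget, threshold and sampling model are our own choices.

```python
import numpy as np

# Monte Carlo estimate of the adversarial 0-1 risk of a threshold classifier
# A = {x > t} in one dimension, using the closed form of the inner supremum:
# a label-1 point is lost iff x < t + eps, a label-0 point iff x > t - eps.
rng = np.random.default_rng(0)
w0, w1, eps, t = 0.6, 0.4, 0.3, 0.2

n = 200_000
y = (rng.random(n) < w1).astype(int)               # labels, P(y = 1) = w1
x = rng.normal(loc=2 * y - 1, scale=1.0, size=n)   # x | y  ~  N(2y - 1, 1)

adv_loss   = np.where(y == 1, x < t + eps, x > t - eps)
clean_loss = np.where(y == 1, x <= t,      x > t)

print("empirical adversarial risk:", adv_loss.mean())
print("empirical clean risk:      ", clean_loss.mean())   # always <= adversarial risk
```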
An equivalent form of this variational problem (see [Reference Bungert, Trillos and Murray5]), wherein the problem is rewritten using a nonlocal perimeter, is
with the
$\varepsilon$
-perimeter defined by
This normalization with
$\varepsilon$
in the denominator is chosen so that we recover the (weighted) classical perimeter as
$\varepsilon \to 0^+$
. In this sense, we consider the nonlocal
$\varepsilon$
-perimeter a data-adapted approximation of the classical perimeter. From the variational problem given by (3), we define the adversarial classification risk for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
as
When considering the
$\varepsilon$
-perimeter, the region affected by adversarial perturbations must be within distance
$\varepsilon$
of the decision boundary of the classifier. As such, it will be helpful to be able to discuss sets that either include or exclude the
$\varepsilon$
-perimeter region. From mathematical morphology [Reference Serra26], for a set
$A\subseteq \mathbb{R}^d$ and $\varepsilon \gt 0$, we define the

- $\varepsilon$-dilation of $A$ as $A^\varepsilon \,:\!=\, \{x\in \mathbb{R}^d\,:\, \mathrm{d}(x,A) \lt \varepsilon \}$,
- $\varepsilon$-erosion of $A$ as $A^{-\varepsilon }\,:\!=\, \{x\in \mathbb{R}^d\,:\, \mathrm{d}(x,A^{\mathsf{c}}) \ge \varepsilon \}$.
Using this notation, one can equivalently express the
$\varepsilon$
-perimeter as
Inspired by the notation in geometric measure theory, we also define the relative
$\varepsilon$
-perimeter for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
with respect to a set
$E\in \mathcal{B}(\mathbb{R}^d)$
by
Remark 1.2 (Previous work for the adversarial training problem (3)). The worst-case adversarial training model was initially proposed for general loss functions by [Reference Madry, Makelov, Schmidt, Tsipras and Vladu19]. When the loss function is specified to be the 0–1 loss function, previous work has established the existence and considered the equivalence of minimizers to (3) for the open and closed ball models [Reference Awasthi, Frank and Mohri2, Reference Bungert, Trillos and Murray5, Reference Trillos, Jacobs and Kim14, Reference Pydi and Jog22]. Although the open and closed ball models are similar, there are some subtle differences that must be considered. While measurability of
$\sup _{\tilde {x} \in B_{\mathrm{d}}(x,\varepsilon )} {\unicode{x1D7D9}}_A(\tilde {x})$
for a Borel set
$A$
in the open ball model is trivial, the same cannot be said for the closed ball model; to address these measurability concerns in the closed ball model, one must employ the universal
$\sigma$
-algebra instead of the Borel
$\sigma$
-algebra. We emphasize that we choose to study the open ball model as this simplifies the analysis and measurability concerns associated with the closed ball model, and the open ball model was used for prior convergence results [Reference Bungert and Stinson7].
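To fix ideas, the following numpy sketch (illustrative; the densities, weights, interval classifier and reference set are our own choices) discretizes a one-dimensional example and evaluates the $\varepsilon$-dilation, the $\varepsilon$-erosion and the resulting $\varepsilon$-perimeter and relative $\varepsilon$-perimeter. It assumes the representation $\varepsilon \mathrm{Per}_{\varepsilon }(A) = w_0\rho _0(A^\varepsilon \setminus A) + w_1\rho _1(A\setminus A^{-\varepsilon })$, and its restriction to $E$ for the relative version, consistent with the dilation/erosion description above.

```python
import numpy as np

# 1-D grid sketch of the epsilon-dilation/erosion and the (relative) epsilon-perimeter.
# Assumed forms (see the lead-in text):
#     eps * Per_eps(A)    = w0*rho0(A^eps \ A) + w1*rho1(A \ A^{-eps})
#     eps * Per_eps(A; E) = w0*rho0((A^eps \ A) & E) + w1*rho1((A \ A^{-eps}) & E)
w0, w1, eps = 0.6, 0.4, 0.25
x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]
rho0 = np.exp(-0.5 * (x + 1.0) ** 2) / np.sqrt(2 * np.pi)   # N(-1, 1) density
rho1 = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)   # N(+1, 1) density

A = (x > 0.2) & (x < 3.0)            # classifier: an interval, as a boolean mask
E = np.abs(x) < 2.0                  # reference set for the relative perimeter

def dist_to(S):
    """Distance from every grid point to the set S (boolean mask), brute force."""
    return np.abs(x[:, None] - x[S][None, :]).min(axis=1)

A_dil = dist_to(A) < eps             # A^eps    (epsilon-dilation)
A_ero = dist_to(~A) >= eps           # A^{-eps} (epsilon-erosion)

def mass(S, wdensity):               # integral of a weighted density over the mask S
    return np.sum(wdensity[S]) * dx

per     = (mass(A_dil & ~A, w0 * rho0) + mass(A & ~A_ero, w1 * rho1)) / eps
per_rel = (mass(A_dil & ~A & E, w0 * rho0) + mass(A & ~A_ero & E, w1 * rho1)) / eps
print("Per_eps(A) ≈", per, "   Per_eps(A; E) ≈", per_rel)
```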
Some papers consider a surrogate adversarial risk which is more computationally tractable [Reference Awasthi, Frank, Mao, Mohri and Zhong1, Reference Bao, Scott and Sugiyama3, Reference Frank and Niles-Weed13, Reference Meunier, Ettedgui, Pinot, Chevaleyre and Atif20]; others explore necessary conditions and geometric properties of minimizers [Reference Bungert, Laux and Stinson6, Reference Bungert and Stinson7, Reference Trillos and Murray16]. Of particular note to the present work is the study of the limit of minimizers of
$J_\varepsilon$
. The following restates Theorem 2.5 of [Reference Bungert and Stinson7].
Theorem (Conditional convergence of adversarial training). Under the conditions of Theorems 2.1 and 2.3 from [Reference Bungert and Stinson7] and assuming the source condition, any sequence of solutions to
possesses a subsequence converging to a minimizer of
The convergence is proven in the
$L^1(\Omega )$
topology for some open, bounded Lipschitz domain
$\Omega \subset \mathbb{R}^d$
. Here,
$\mathrm{Per}(\cdot ;\,\rho )$
is a weighted version of the classical perimeter. The source condition mentioned provides minor regularity assumptions on the Bayes classifier. Note that in the referenced theorem, there are additional assumptions on the underlying data distribution
$\rho$
. In our work, we strengthen this convergence result by proving Hausdorff convergence of minimizers of (3) to the Bayes classifier with similar assumptions on
$\rho$
.
The second optimization problem, which serves as an important model case, interpolates between the accuracy on clean data of the Bayes classifier and the robustness of the adversarial training problem minimizers. The probabilistic adversarial training problem for
$p\in [0,1)$
and probability measures
$\mathfrak p_x\in \mathcal P(\mathbb{R}^d)$
for each
$x\in \mathbb{R}^d$
is
with the probabilistic perimeter defined by
and the set functions
$\Lambda _p^i$
for
$i = 0,1$
defined by
Here,
$\mathbb{P}(x'\in A\,:\, x'\sim \mathfrak p_x)$
is the probability that a point
$x'$
sampled from the probability distribution
$\mathfrak p_x$
belongs to the set
$A$
. We notice that (6) takes the same form as (4) where we replace the metric boundary fattening by a probabilistic fattening. We define the probabilistic adversarial classification risk for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
as
The relative probabilistic perimeter for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
with respect to a set
$E\in \mathcal{B}(\mathbb{R}^d)$
is given by
To make the connection with the
$\varepsilon$
-perimeter more concrete, we will restrict our attention to certain families of probability measures that scale appropriately with
$\varepsilon$
for the remainder of this work.
Assumption 1.3. Let
$\xi \,:\, \mathbb{R}^d \to [0,\infty )$
such that
$\xi \in L^1(\mathbb{R}^d)$
,
$\int _{\mathbb{R}^d}\xi (z) \, dz = 1$
,
$\xi (z) = 0$
if
$|z| \gt 1$
, and
$\xi (z) \gt c$
for some constant
$c\gt 0$
and for
$|z| \leq 1$
. For
$x,x' \in \mathbb{R}^d$
, we assume that
We will now write ProbPer
$_{\varepsilon ,p}$
and refer to it as the probabilistic
$\varepsilon$
-perimeter to emphasize the dependence on the adversarial budget. Unlike with the Per
$_\varepsilon$
, we do not normalize ProbPer
$_{\varepsilon ,p}$
with respect to
$\varepsilon$
. We also write
$J_{\varepsilon ,p}$
instead of
$J_p$
and
$\Lambda _{\varepsilon ,p}^i$
instead of
$\Lambda _{p}^i$
for
$i=0,1$
. Under Assumption 1.3,
$\Lambda _{\varepsilon ,p}^0(A)$
and
$\Lambda _{\varepsilon ,p}^1(A)$
are subsets of the
$\varepsilon$
-perimeter regions
$A^\varepsilon \setminus A$
and
$A\setminus A^{-\varepsilon }$
, respectively. Specifically, this means that
$J_{\varepsilon ,p}(A)\le J_{\varepsilon }(A)$
for all
$A\in \mathcal{B}(\mathbb{R}^d)$
when the underlying data distribution
$\mu$
is the same. We note that probabilistic
$\varepsilon$
-perimeter that most closely coincides with the
$\varepsilon$
-perimeter when
$p = 0$
and
$\mathfrak p_{x,\varepsilon } = \text{Unif}(B_{\mathrm{d}}(x,\varepsilon ))$
for each
$x\in \mathbb{R}^d$
.
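The following Monte Carlo sketch (illustrative; the half-line classifier, budget and sample size are our own choices) compares the two attack criteria at individual points: the $\varepsilon$-attack fires when a point lies within $\varepsilon$ of the decision boundary, while the probabilistic attack with $p = 0$ and $\mathfrak p_{x,\varepsilon } = \text{Unif}(B_{\mathrm{d}}(x,\varepsilon ))$ fires when a positive fraction of the sampled perturbations lands on the other side of the boundary, so the two agree up to sampling error.

```python
import numpy as np

# Monte Carlo comparison of the epsilon-attack and the probabilistic attack with
# p = 0 and p_{x,eps} = Unif(B(x, eps)), for the half-line classifier A = {z > 0}.
rng = np.random.default_rng(1)
eps, p, n_mc = 0.3, 0.0, 20_000

A = lambda z: z > 0.0

def phi_eps(x):
    # epsilon-attack: x can be pushed across the boundary iff d(x, {0}) < eps.
    return abs(x) < eps

def phi_eps_p(x):
    # probabilistic attack: succeeds iff P(1_A(x') != 1_A(x)) > p for x' ~ Unif((x-eps, x+eps)).
    xp = rng.uniform(x - eps, x + eps, size=n_mc)
    return np.mean(A(xp) != A(x)) > p

for x in [-0.5, -0.2, 0.05, 0.29, 0.31, 1.0]:
    print(f"x = {x:+.2f}   phi_eps = {phi_eps(x)}   phi_eps_p (MC) = {phi_eps_p(x)}")
```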
Remark 1.4 (Previous work for the probabilistic adversarial training problem (5)). This form of the problem was proposed by [Reference Bungert, Trillos, Jacobs, McKenzie, Nikolić and Wang4] as a revision of probabilistically robust learning [Reference Robey, Chamon, Pappas, Hassani, Chaudhuri, Jegelka, Song, Szepesvari, Niu and Sabato25]. Although ProbPer
$_p$
is not a perimeter in the sense that it has not been shown to be submodular and it does not admit a coarea formula, we follow the convention from [Reference Bungert, Trillos, Jacobs, McKenzie, Nikolić and Wang4] and refer to ProbPer
$_p$
as the probabilistic perimeter. Importantly, existence of minimizers has not been proved for either the original or modified probabilistic adversarial training problem. There have also been no results pertaining to the convergence of minimizers, provided they exist, to the Bayes classifier for either version.
However, [Reference Bungert, Trillos, Jacobs, McKenzie, Nikolić and Wang4] proposes and proves the existence of minimizers for a related probabilistically robust
$\Psi$
risk
for suitable functions
$\Psi \,:\,[0,1]\to [0,1]$
where the
$\Psi$
-perimeter takes the form
However, the convergence results proved in this paper do not currently extend to the
$\Psi$
-perimeter case. The details will be further discussed in Remark 4.17.
If we juxtapose the variational problem for the adversarial training problem (3) and the probabilistic adversarial training problem (5), both risks are of the form
where the data-adapted perimeters can be expressed as
We seek to develop a unifying framework for various adversarial models, including, but not limited to, (3) and (5). These types of attacks are designed to flexibly capture a range of adversarial behaviours, not just the idealized ones given in the original adversarial training problem. Under the proper assumptions, which will be discussed in Sections 2 and 4, we can extend the convergence result to a broad class of adversarial attacks. We begin by giving some concrete definitions.
Definition 1.5. For a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
, we define the Lebesgue measurable function
$\phi \,:\,\mathbb{R}^d \to \{0,1\}$
by
\begin{equation*}\phi (x;\,A) \,:\!=\, \begin{cases} 1, & \text{if the adversary can perturb a data point } x \text{ from } A \text{ to } A^{\mathsf{c}} \text{ or vice versa},\\[3pt] 0, & \text{otherwise}. \end{cases}\end{equation*}
We refer to
$\phi$
as the deterministic attack function with respect to the classifier
$A$
.
The term ‘deterministic’ refers to the fact that the classification risk is completely determined at any point
$x\in \mathbb{R}^d$
by the choice of classifier and the associated attack function. We emphasize that this attack function does not consider the true label
$y$
associated with
$x$
.
In order to generalize the classification risk, it will be essential to isolate the sets where classification loss occurs. We can define the following set operators based on the values of
$\phi$
.
Definition 1.6. Let
$A\in \mathcal{B}(\mathbb{R}^d)$
. For a deterministic attack function
$\phi$
, we define the set operators
$\Lambda _{\phi }^i\,:\,\mathcal{B}(\mathbb{R}^d)\to \mathcal{B}(\mathbb{R}^d)$
and
$ \tilde \Lambda ^i_\phi \,:\,\mathcal{B}(\mathbb{R}^d)\to \mathcal{B}(\mathbb{R}^d)$
for
$i = 0,1$
by
We refer to these four sets collectively as
$\Lambda$
-sets. For convenience, we also define
$\Lambda _\phi (A) = \Lambda _\phi ^0(A)\cup \Lambda _\phi ^1(A)$
and
$\tilde {\Lambda }_\phi (A) = \tilde \Lambda _\phi ^0(A)\cup \tilde \Lambda _\phi ^1(A)$
. Note the
$0$
and
$1$
superscripts indicate the label assigned by the classifier
$A$
and not the value of the deterministic attack function (i.e.
$0$
corresponds to points in
$A^{\mathsf{c}}$
and
$1$
corresponds to points in
$A$
).
The set
$\Lambda _\phi (A)$
contains points that meet the attack criteria for the deterministic attack function
$\phi$
, whereas the set
$\tilde {\Lambda }_\phi (A)$
contains points that do not meet the attack criteria. The
$\Lambda$
-sets are mutually disjoint with
$A = \Lambda ^1_\phi (A) \cup \tilde \Lambda ^1_\phi (A)$
and
$A^{\mathsf{c}} = \Lambda ^0_{\phi }(A) \cup \tilde \Lambda ^0_\phi (A)$
.
We can express the classification risk for a set
$A\in \mathcal{B}(\mathbb{R}^d)$
by the loss on the attacked sets, given by
$\Lambda _\phi (A)$
, and by the loss inherent to the choice of classifier. More formally, we define the generalized classification risk as follows.
Definition 1.7. The generalized classification risk for a deterministic attack function
$\phi$
and classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
is given by
As in [Reference Zhang, Yu, Jiao, Xing, Ghaoui and Jordan28], we seek to separate the total classification risk
$J_\phi$
into the standard Bayes risk (natural error) and the risk attributed to the adversary’s attack.
Definition 1.8. The adversarial deficit for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
and a deterministic attack function
$\phi$
is defined to be
where
$\mathbb{E}_{(x,y)\sim \mu }[|{\unicode{x1D7D9}}_A(x) - y|]$
is the standard Bayes risk.
As one can express the standard Bayes risk as
we can derive a more useful equation for the adversarial deficit that mirrors the formulas for the data-adapted perimeters (4) and (6), namely,
Unlike the data-adapted perimeters we described above, at this stage
$\Lambda _\phi (A)$
is not necessarily in some neighbourhood of the decision boundary. We define the relative adversarial deficit for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
with respect to a set
$E\in \mathcal{B}(\mathbb{R}^d)$
to be
With the appropriate definitions in place, we now present the generalized adversarial training problem for the deterministic attack function
$\phi$
.
Definition 1.9. For a deterministic attack function
$\phi$
, the generalized adversarial training problem is given by
In the previous equation, the adversarial deficit,
$D_\phi$
, takes the place of the data-adapted perimeter terms from (3) and (5).
Remark 1.10. By construction, the adversarial training problem (3) and the probabilistic adversarial training problem (5) are two examples that fall under this generalized attack function framework. For (3), the
$\varepsilon$
-deterministic attack function with respect to a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
for
$\varepsilon \gt 0$
is
\begin{equation*}\phi _\varepsilon (x;\,A)\,:\!=\, \begin{cases} 1, & \text{if } \mathrm{d}(x,\partial A) \lt \varepsilon ,\\[5pt] 0, & \text{otherwise}. \end{cases}\end{equation*}
For
$\phi _\varepsilon$
, we will let
$\Lambda _\varepsilon ^0(A) \,:\!=\, A^\varepsilon \setminus A$
,
$\Lambda _\varepsilon ^1(A) \,:\!=\, A\setminus A^{-\varepsilon }$
,
$\tilde \Lambda ^0_\varepsilon (A) \,:\!=\, A^{\mathsf{c}} \setminus A^{\varepsilon }$
, and
$\tilde \Lambda ^1_\varepsilon (A) \,:\!=\, A^{-\varepsilon }$
denote the
$\Lambda$
-sets for convenience.
On the other hand for (5), the
$(\varepsilon ,p)$
-deterministic attack function with respect to a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
for
$\varepsilon \gt 0$
and
$p\in [0,1)$
is
\begin{equation*} \phi _{\varepsilon , p}(x;\,A) = \begin{cases} 1, & \text{if }\mathbb{P}({\unicode{x1D7D9}}_A(x') \neq {\unicode{x1D7D9}}_A(x)\,:\, x'\sim \mathfrak p_{x,\varepsilon }) \gt p,\\[5pt] 0, & \text{otherwise}. \end{cases}\end{equation*}
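As a concrete sanity check (illustrative code; the grid, the interval classifier and the budget are our own choices), the $\Lambda$-sets of Definition 1.6 for $\phi _\varepsilon$ can be computed on a one-dimensional grid and verified to partition the line and to coincide with the dilation/erosion regions listed above.

```python
import numpy as np

# Grid sketch tying Definition 1.6 to Remark 1.10: for the epsilon-attack phi_eps
# and an interval classifier A, the Lambda-sets coincide with the dilation/erosion
# regions, Lambda^0 = A^eps \ A and Lambda^1 = A \ A^{-eps}.
eps = 0.25
x = np.linspace(-4.0, 4.0, 2001)
A = (x > -1.0) & (x < 1.0)

def dist_to(S):
    pts = x[S]
    return np.abs(x[:, None] - pts[None, :]).min(axis=1)

phi = np.where(A, dist_to(~A), dist_to(A)) < eps   # phi_eps(x; A): d(x, boundary of A) < eps

Lam1, Lam0   = A & phi,  ~A & phi     # attackable points, split by the label A assigns
tLam1, tLam0 = A & ~phi, ~A & ~phi    # unattackable points

A_dil = dist_to(A) < eps              # A^eps
A_ero = dist_to(~A) >= eps            # A^{-eps}

# The Lambda-sets partition R^d and match the epsilon-perimeter regions.
assert np.array_equal(A, Lam1 | tLam1) and np.array_equal(~A, Lam0 | tLam0)
assert np.array_equal(Lam0, A_dil & ~A) and np.array_equal(Lam1, A & ~A_ero)
print("points in Lam1, tLam1, Lam0, tLam0:", Lam1.sum(), tLam1.sum(), Lam0.sum(), tLam0.sum())
```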
1.2. Informal main results and discussion
We will focus the main results and discussion on the generalized adversarial training problem (8) and comment on the application to the adversarial training problem (3) and the probabilistic adversarial training problem (5) when appropriate. By Remark 1.10, all statements pertaining to (8) automatically apply to (3) and (5). However, because (3) is sensitive to measure zero changes, results for (3) are stronger than what can be stated in the generalized or probabilistic cases. On the other hand, the results for (5) are identical to those for (8) up to notation.
The first crucial result for (8) provides an estimate on the relative adversarial deficit.
Proposition ((Informal) Energy Exchange Inequality for (8)). Under mild assumptions on
$\phi$
(see Assumption 2.1), for a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
and a set
$E\in \mathcal{B}(\mathbb{R}^d)$
such that
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$E$
, if
$J_\phi (A\setminus E) - J_\phi (A) \ge 0$
, then
where
$\widehat U_{1} \subset \tilde \Lambda _\phi ^1(A) \cap \tilde \Lambda _\phi ^0(E)$
and
$\widehat U_{11} \subset {\Lambda _\phi ^0}(A) \cap {\Lambda _\phi ^1}(E)$
.

Figure 1. This diagram illustrates the sets present in the energy exchange inequality for the adversarial training problem (3) when
$E = B_{\mathrm{d}}(R)$
. The sets comprising
$\varepsilon \mathrm{Per}_{\varepsilon }(A;\,B_{\mathrm{d}}(R))$
are shaded blue and purple, whereas the sets comprising
$\varepsilon \mathrm{Per}_{\varepsilon }(B_{\mathrm{d}}(R)^{\mathsf{c}};\,A)$
are shaded pink and purple.
The energy exchange inequality asserts that if it is favourable according to the densities to be labelled 0 on
$E$
but adversarial training labels it 1, then the ‘perimeter’ (more generally, the adversarial deficit) of the original set
$A$
must be quantifiably better in the sense of (1.2). In spirit, the energy exchange inequality is connected to relative isoperimetric comparisons as it seeks to relate the relative adversarial deficits (or for (3) the relative
$\varepsilon$
-perimeters) of two sets to the volume of their intersection. However, the energy exchange inequality has additional error terms that must be accounted for. In the case of the stronger
$\varepsilon$
-perimeter,
$\widehat U_{1} = \emptyset$
so the energy exchange inequality simplifies and can be expressed as follows.
Proposition ((Informal) Energy Exchange Inequality for (3)). For a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
and a set
$E\in \mathcal{B}(\mathbb{R}^d)$
such that
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$E$
, if
$J_\varepsilon (A\setminus E) - J_\varepsilon (A) \ge 0$
, then
where
$\widehat U_{11} \subset (A^\varepsilon \setminus A) \cap (E\setminus E^{-\varepsilon })$
(see Figure 1).
As for the relative probabilistic perimeter
$\mathrm{ProbPer}_{\varepsilon ,p}$
, the energy exchange inequality is the same as that for (8) up to notation.
Proposition ((Informal) Energy Exchange Inequality for (5)). For a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
and a set
$E\in \mathcal{B}(\mathbb{R}^d)$
such that
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$E$
, if
$J_{\varepsilon ,p}(A\setminus E) - J_{\varepsilon ,p}(A) \ge 0$
, then
where
$\widehat U_{1} \subset (A\setminus A^{-\varepsilon })\cap (E^\varepsilon \setminus E)$
and
$\widehat U_{11} \subset (A^\varepsilon \setminus A) \cap (E\setminus E^{-\varepsilon })$
.
The energy exchange inequality allows us to argue that classifiers which are minimizers of the generalized adversarial training problem (8), if they exist, can be made disjoint from sets where it is energetically preferable to be labelled 0 when the adversarial budget
$\varepsilon$
is small enough. As we will see, in the generalized setting we can only guarantee the uniqueness of minimizers of (5) and (8) up to sets of measure zero; however, we can show that the intersection of minimizers with such sets, on which the label 0 is energetically preferable, must have
$\mathcal L^d$
measure zero. For the adversarial training problem (3), we can improve the result to show that any minimizer must be disjoint from these sets when
$\varepsilon$
is small enough. This result builds towards proving uniform convergence of minimizers of (8) to the Bayes classifier, which is the next main result. In order to prove the convergence rate, we must include a non-degeneracy assumption to ensure
$\mathrm{d}_H(A_0^{\max },A_0^{\min }) = 0$
and the Bayes classifier is unique in the sense of Remark 1.1.
Theorem (Informal). With mild assumptions on
$\phi$
, let
$K$
be compact and let
$\{A_{\varepsilon ,\phi }\}_{\varepsilon \gt 0}$
be any sequence of minimizers to the generalized adversarial training problem (8). Assuming that
$w_0 \rho _0-w_1\rho _1$
is non-degenerate, then
as
$\varepsilon \to 0^+$
, where
$N_1,N_2$
are sets of
$\mathcal L^d$
measure zero,
$\mathrm{d}_H$
is the Hausdorff distance and
$A_0$
is the Bayes classifier.
However, the theorem actually proved is more general and does not require a unique Bayes classifier. Under these relaxed assumptions, we prove a corralling result for the sequence
$\{A_{\varepsilon ,\phi }\}_{\varepsilon \gt 0}$
with respect to the Hausdorff distance from the maximal Bayes classifier,
$A_0^{\max }$
, and the minimal Bayes classifier,
$A_0^{\min }$
. In essence, the corralling result states that the boundary of
$\lim _{\varepsilon \to 0^+} A_{\varepsilon ,\phi }\cup N_1\setminus N_2$
must lie between the boundaries of
$A_0^{\max }$
and
$A^{\min }_0$
. When we specify this result to the adversarial training problem (3), we no longer have to remove a
$\mathcal L^d$
measure zero set and instead prove the following.
Theorem (Informal). Let
$K$
be compact and let
$\{A_{\varepsilon }\}_{\varepsilon \gt 0}$
be any sequence of minimizers to the adversarial training problem (3). Assuming that
$w_0 \rho _0-w_1\rho _1$
is non-degenerate, then
as
$\varepsilon \to 0^+$
, where
$\mathrm{d}_H$
is the Hausdorff distance and
$A_0$
is the Bayes classifier.
For the probabilistic adversarial training problem, the uniform convergence result reads as follows.
Theorem (Informal). Let
$K$
be compact and let
$\{A_{\varepsilon ,p}\}_{\varepsilon \gt 0}$
be any sequence of minimizers to the probabilistic adversarial training problem (5) for some fixed
$p\in [0,1)$
. Assuming that
$w_0 \rho _0-w_1\rho _1$
is non-degenerate, then
as
$\varepsilon \to 0^+$
, where
$N_1,N_2$
are sets of
$\mathcal L^d$
measure zero,
$\mathrm{d}_H$
is the Hausdorff distance and
$A_0$
is the Bayes classifier.
As with (8), if we relax the assumption that the Bayes classifier is unique, we can instead prove an analogous corralling result with respect to
$A_0^{\max }$
and
$A_0^{\min }$
for (3) and (5).
With the non-degeneracy condition in place, we can also consider the rate of convergence and show that it is at most
$O(\varepsilon ^{\frac {1}{d+2}})$
for all three adversarial training problems. However, we do not expect this rate to be optimal and would instead expect the convergence rate to be
$O(\varepsilon )$
, which we discuss further in Remark 3.13.
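A toy numerical illustration of these convergence statements (entirely heuristic: it restricts the minimization in (3) to half-line classifiers $\{x\gt t\}$ in one dimension rather than all Borel sets, with Gaussian densities of our own choosing) computes a minimizer by brute force over thresholds and tracks its distance to the Bayes threshold as $\varepsilon \to 0^+$.

```python
import numpy as np

# Heuristic 1-D experiment: minimize the adversarial risk over half-line classifiers
# A_t = {x > t} by brute force on a grid of thresholds and watch the optimal threshold
# approach the Bayes threshold as eps -> 0+. Restricting to half-lines is an
# unjustified simplification made only to keep the sketch short.
w0, w1 = 0.6, 0.4
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
rho0 = np.exp(-0.5 * (x + 1.0) ** 2) / np.sqrt(2 * np.pi)
rho1 = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2 * np.pi)

def J(t, eps):
    # Adversarial risk of A_t with open-ball attacks of radius eps:
    # label-0 points are lost on {x > t - eps}, label-1 points on {x < t + eps}.
    return np.sum(w0 * rho0[x > t - eps]) * dx + np.sum(w1 * rho1[x < t + eps]) * dx

thresholds = np.linspace(-3.0, 3.0, 1201)
t_bayes = thresholds[np.argmin([J(t, 0.0) for t in thresholds])]
for eps in [0.8, 0.4, 0.2, 0.1, 0.05]:
    t_eps = thresholds[np.argmin([J(t, eps) for t in thresholds])]
    # |t_eps - t_bayes| equals the Hausdorff distance between the two half-lines
    # intersected with any compact interval containing both thresholds.
    print(f"eps = {eps:4.2f}   |t_eps - t_bayes| = {abs(t_eps - t_bayes):.3f}")
```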
2. Energy exchange inequality
In this section, we will prove a quantitative result for the adversarial deficit, which can then be applied to the
$\varepsilon$
-perimeter and the probabilistic
$\varepsilon$
-perimeter. In order to do so, we will require the deterministic attack function
$\phi$
and the corresponding
$\Lambda$
-sets to have the following structural properties.
Assumption 2.1. Recall Definition 1.6. Let
$A,E \in \mathcal B(\mathbb{R}^d)$
. We will make the following two assumptions to ensure consistency with respect to complements and set difference:
1. Complement Property (CP): $\phi (x;\,A) = \phi (x;\,A^{\mathsf{c}})$, or in terms of $\Lambda$-sets, $\Lambda _\phi ^0(A) = \Lambda _\phi ^1(A^{\mathsf{c}})$ and $\tilde \Lambda _\phi ^0(A) = \tilde \Lambda _\phi ^1(A^{\mathsf{c}})$.
2. $\Lambda$-Monotonicity ($\Lambda$M):
   (i) If $x\in \tilde \Lambda _\phi ^0(A)$, then $x\in \tilde \Lambda _\phi ^0(A\setminus E)$.
   (ii) If $x\in \tilde \Lambda _\phi ^1(E)$, then $x\in \tilde \Lambda _\phi ^0(A\setminus E)$.
   (iii) If $x\in \Lambda _\phi ^0(E)\cap A$, then $x\in \Lambda _\phi ^1(A\setminus E)$.
   (iv) If $x\in \Lambda _\phi ^1(A)\cap E^{\mathsf{c}}$, then $x\in \Lambda _\phi ^1(A\setminus E)$.
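Both properties can also be spot-checked numerically for a concrete attack; the sketch below (illustrative) does so for the $\varepsilon$-attack $\phi _\varepsilon$ of Remark 1.10 on random unions of intervals, discretized on a grid. Remark 2.3 below gives the actual verification.

```python
import numpy as np

# Numerical spot-check of (CP) and Lambda-monotonicity for the epsilon-attack
# phi_eps, on random unions of intervals discretized on a grid. Purely
# illustrative; it does not replace the verification in Remark 2.3.
rng = np.random.default_rng(2)
eps = 0.2
x = np.linspace(-4.0, 4.0, 2001)

def dist_to(S):
    # Distance from every grid point to the (possibly empty) point set S.
    pts = x[S]
    if pts.size == 0:
        return np.full_like(x, np.inf)
    return np.abs(x[:, None] - pts[None, :]).min(axis=1)

def lam_sets(S):
    # phi_eps(x; S) = 1 iff x is within eps of the decision boundary of S.
    phi = np.where(S, dist_to(~S), dist_to(S)) < eps
    return {"L1": S & phi, "L0": ~S & phi, "tL1": S & ~phi, "tL0": ~S & ~phi}

def random_union_of_intervals():
    ends = np.sort(rng.uniform(-3.5, 3.5, size=6))
    S = np.zeros_like(x, dtype=bool)
    for a, b in zip(ends[0::2], ends[1::2]):
        S |= (x > a) & (x < b)
    return S

for _ in range(10):
    A, E = random_union_of_intervals(), random_union_of_intervals()
    LA, LE, LAE, LAc = lam_sets(A), lam_sets(E), lam_sets(A & ~E), lam_sets(~A)
    # (CP): the Lambda-sets of A and of its complement are exchanged.
    assert np.array_equal(LA["L0"], LAc["L1"]) and np.array_equal(LA["tL0"], LAc["tL1"])
    # (Lambda M)(i)-(iv): each listed inclusion holds on the grid.
    assert not np.any(LA["tL0"] & ~LAE["tL0"])
    assert not np.any(LE["tL1"] & ~LAE["tL0"])
    assert not np.any(LE["L0"] & A & ~LAE["L1"])
    assert not np.any(LA["L1"] & ~E & ~LAE["L1"])
print("(CP) and (Lambda M) hold on all sampled configurations")
```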
In the following series of remarks, we seek to better understand these two properties generally and as they apply to the adversarial and probabilistic adversarial settings.
Remark 2.2 (On Monotonicity). We note that the deterministic attack functions
$\phi$
that satisfy Assumption 2.1 are not monotonic with respect to set inclusion unless
$\phi$
is the trivial attack function (i.e.
$\phi \equiv 0$
or
$\phi \equiv 1$
). To illustrate this, suppose
$\phi$
is monotonic. By the monotonicity of
$\phi$
with respect to set inclusion coupled with the complement property,
This implies
$\phi (x;\,A) \equiv \phi (x;\,A\setminus E)$
and, if we let
$E = A$
, that
$\phi (x;\,A)\equiv \phi (x;\,\emptyset )$
. Hence, the attack is independent of
$A$
, which can only be satisfied by a trivial attack function.
Although
$\phi$
itself is not monotonic, if you have a function
$\psi$
which is monotonic in terms of set inclusion, then setting
$\phi$
via its level set yields an attack function which satisfies
$\Lambda$
-monotonicity. In particular, both the distance function and the probability function are monotonic.
Remark 2.3. We will verify that the adversarial training problem (3) and the probabilistic adversarial training problem (5) satisfy Assumption 2.1. Recall from Remark 1.10, the attack for (3) is denoted
$\phi _\varepsilon$
and the attack for (5) is denoted
$\phi _{\varepsilon ,p}$
for some
$\varepsilon \gt 0$
and
$p\in [0,1)$
.
We will first show that
$\phi _\varepsilon$
satisfies Assumption 2.1. For the complement property, recognize that since
$\partial A = \partial (A^{\mathsf{c}})$
,
$\Lambda _\varepsilon ^0(A) = \Lambda _\varepsilon ^1(A^{\mathsf{c}})$
and
$\tilde \Lambda ^0_\varepsilon (A) = \tilde \Lambda ^1_\varepsilon (A^{\mathsf{c}})$
by definition. As for
$\Lambda$
-monotonicity, we can verify these four statements directly.
(i) If $x\in \tilde \Lambda ^0_\varepsilon (A)$, then $d(x,A\setminus E)\ge d(x,A)\ge \varepsilon$ so $x\in \tilde \Lambda ^0_\varepsilon (A\setminus E)$.
(ii) If $x\in \tilde \Lambda ^1_\varepsilon (E)$, then $d(x,A\setminus E)\ge d(x,E^{\mathsf{c}})\ge \varepsilon$ so $x\in \tilde \Lambda ^0_\varepsilon (A\setminus E)$.
(iii) If $x\in \Lambda ^0_\varepsilon (E) \cap A$, then $d(x,(A\setminus E)^{\mathsf{c}})\le d(x,E)\lt \varepsilon$ so $x\in \Lambda ^1_\varepsilon (A\setminus E)$.
(iv) If $x\in \Lambda ^1_\varepsilon (A)\cap E^{\mathsf{c}}$, then $d(x,(A\setminus E)^{\mathsf{c}})\le d(x,A^{\mathsf{c}}) \lt \varepsilon$ so $x\in \Lambda ^1_\varepsilon (A\setminus E)$.
Now, we consider
$\phi _{\varepsilon ,p}$
. By definition,
Similarly, one can show
$\tilde \Lambda ^0_{\varepsilon ,p}(A) = \tilde \Lambda ^1_{\varepsilon ,p}(A^{\mathsf{c}})$
. Hence, the complement property holds for
$\phi _{\varepsilon ,p}$
. Now we consider
$\Lambda$
-monotonicity. To simplify notation, we let
$\mathbb{P}(x;\,A) \,:\!=\, \mathbb{P}(x'\in A \,:\, x'\sim \mathfrak p_{x,\varepsilon })$
. Examining each of the
$\Lambda$
-monotonicity properties, we find that they follow from the monotonicity of the probability function with respect to set inclusion:
(i) If $x\in \tilde \Lambda ^0_{\varepsilon ,p}(A)$, then $\mathbb{P}(x;\, A\setminus E) \le \mathbb{P}(x;\,A)\le p$ so $x\in \tilde \Lambda ^0_{\varepsilon ,p}(A\setminus E)$.
(ii) If $x\in \tilde \Lambda ^1_{\varepsilon ,p}(E)$, then $\mathbb{P}(x;\,A\setminus E) \le \mathbb{P}(x;\, E^{\mathsf{c}}) \le p$ so $x\in \tilde \Lambda ^0_{\varepsilon ,p}(A\setminus E)$.
(iii) If $x\in \Lambda _{\varepsilon ,p}^0(E)\cap A$, then $\mathbb{P}(x;\,(A\setminus E)^{\mathsf{c}}) \ge \mathbb{P}(x;\, E) \gt p$ so $x\in \Lambda ^1_{\varepsilon ,p}(A\setminus E)$.
(iv) If $x\in \Lambda _{\varepsilon ,p}^1(A)\cap E^{\mathsf{c}}$, then $\mathbb{P}(x;\, (A\setminus E)^{\mathsf{c}}) \ge \mathbb{P}(x;\,A^{\mathsf{c}}) \gt p$ so $x\in \Lambda ^1_{\varepsilon ,p}(A\setminus E)$.
Thus,
$\phi _{\varepsilon ,p}$
satisfies
$\Lambda$
-monotonicity and Assumption 2.1.
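The monotonicity used above can also be observed empirically. In the Monte Carlo sketch below (illustrative; the sets, point, budget and threshold are our own choices), the probabilities $\mathbb{P}(x;\,\cdot )$ are estimated with a single shared sample of perturbations, so the empirical probabilities are monotone with respect to set inclusion by construction and the estimated inequalities cannot be violated.

```python
import numpy as np

# Monte Carlo illustration of the monotonicity used for phi_{eps,p}: with a *shared*
# sample x'_1, ..., x'_n ~ Unif(B(x, eps)), the empirical probabilities
# Phat(x; S) = (1/n) * #{i : x'_i in S} are monotone in S by construction.
rng = np.random.default_rng(3)
eps, n = 0.5, 100_000
A = lambda z: (z > -1.0) & (z < 1.0)        # illustrative sets on the real line
E = lambda z: (z > 0.4) & (z < 2.0)

x0 = 0.7                                    # the point being attacked
xp = rng.uniform(x0 - eps, x0 + eps, n)     # shared perturbation sample

P_A          = np.mean(A(xp))
P_A_minus_E  = np.mean(A(xp) & ~E(xp))
P_Ec         = np.mean(~E(xp))
P_AmE_compl  = np.mean(~(A(xp) & ~E(xp)))

# Monotonicity of the empirical measure with respect to set inclusion:
assert P_A_minus_E <= P_A and P_A_minus_E <= P_Ec
assert P_AmE_compl >= np.mean(E(xp)) and P_AmE_compl >= np.mean(~A(xp))
print("Phat(x;A) =", round(P_A, 3),
      " Phat(x;A\\E) =", round(P_A_minus_E, 3),
      " Phat(x;(A\\E)^c) =", round(P_AmE_compl, 3))
```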
Remark 2.4 (
$\Lambda$
-set Decompositions). Under Assumption 2.1, we may decompose
$\mathbb{R}^d$
in terms of the
$\Lambda$
-sets for
$A,E\in \mathcal{B}(\mathbb{R}^d)$
according to
$\Lambda$
-monotonicity. In doing so, we define the sets
$U_1,\dots , U_{13}$
, which partition
$\mathbb{R}^d$
(see Table 1 and Figure 2).
For the sets
$U_i$
where no conclusion can be made about
$\phi (x;\,A\setminus E)$
, we will further decompose them into two subsets based on the
$\phi$
values, i.e.
for
$i = 1,3,6,9,10,$
and
$11$
.
The auxiliary symbols are meant to help the reader group the terms. Notice that the
$\widetilde U_i$
sets contain points that cannot be perturbed by the adversary into the other class for the classifier
$A\setminus E$
, in keeping with the fact that all $\tilde \Lambda$-sets likewise contain only points that the adversary cannot attack. On the other hand, the
$\widehat U_i$
sets contain only points that can be perturbed into the opposite class.
With this decomposition, we can express the
$\Lambda$
-sets for
$A\setminus E$
using the
$U$
sets as follows:
\begin{align*} \Lambda _{\phi }^0(A\setminus E) &= \widehat U_3\cup \widehat U_6\cup \widehat U_9\cup \widehat U_{10} \cup \widehat U_{11}, \\[5pt] \Lambda _{\phi }^1(A\setminus E) &= \widehat U_{1}\cup U_2\cup U_7\cup U_8,\\[5pt] \tilde \Lambda ^0_\phi (A\setminus E) &= \widetilde U_{3}\cup U_4\cup U_5\cup \widetilde U_{6}\cup \widetilde U_{9}\cup \widetilde U_{10}\cup \widetilde U_{11}\cup U_{12}\cup U_{13}, \\[5pt] \tilde \Lambda ^1_\phi (A\setminus E) &= \widetilde U_{1}. \end{align*}
Depending on extra structure imposed by the choice of
$\phi$
, sometimes we can conclude certain sets are empty. For example, when
$\phi = \phi _\varepsilon$
(see Remark 1.10), we have
$\widehat U_{1} = \emptyset , \widetilde U_{3} = \emptyset ,$
and
$\widetilde U_{10} =\emptyset$
. In the case where such sets are unambiguous in terms of the values of
$\phi (x;\,A\setminus E)$
, we drop the hat or tilde notation. However,
$U_6,U_9$
and
$U_{11}$
still require a finer decomposition. Note that generally
$\widetilde U_6, \widetilde U_9 =\emptyset$
, but when boundaries of
$A$
and
$E$
intersect in more than a discrete set of points, these sets can be non-empty. When
$\widetilde U_6, \widetilde U_9 = \emptyset$
(such as in Figure 2), we also drop the tilde notation and let
$U_6 = \widehat U_6$
and
$U_9 = \widehat U_9$
. The claims made here are verified in Appendix A.1.
Table 1. This table defines the 13
$U_i$
sets and exhibits all possible conclusions about the
$\Lambda$
-sets for
$A\setminus E$
based on the
$\Lambda$
-sets for
$A$
and
$E$
from
$\Lambda$
-monotonicity. This set decomposition, along with the further refinement in (9), will be key in proving the energy exchange inequality.


Figure 2. This diagram depicts the
$U_i$
regions for the attack function
$\phi _\varepsilon$
associated with adversarial training problem (3). The
$\varepsilon$
-perimeter regions of
$A$
are shaded blue and purple, whereas
$\varepsilon$
-perimeter regions of
$A\setminus B_{\mathrm{d}}(R)$
are shaded pink and purple. Note that some sets, such as
$\widehat U_{1}$
, are null sets for the
$\varepsilon$
-perimeter, and so do not appear in this figure.
Having stated our assumptions on
$\phi$
, we now turn to proving the first main result. In the following proposition, we examine the difference in energy between classifiers
$A$
and
$A\setminus E$
for
$A,E\in \mathcal{B}(\mathbb{R}^d)$
when
$E$
belongs to a region where the label
$0$
is energetically preferable according to the Bayes risk. We refer to the resulting inequality as the energy exchange inequality because it quantifies the effect of removing the set
$E$
from a classifier
$A$
by examining the difference in risks.
Proposition 2.5 (Energy Exchange Inequality). Let
$\phi$
be a deterministic attack function that satisfies Assumption 2.1, let
$A,E\in \mathcal{B}(\mathbb{R}^d)$
, and assume that
$w_0\rho _0-w_1\rho _1 \gt \delta \gt 0$
on
$E$
. If
$J_\phi (A\setminus E) - J_\phi (A) \ge 0$
, then
where
$\widehat U_{1}$
and
$\widehat U_{11}$
are defined in Table 1, namely
$\widehat U_{1} = \{x\in \tilde \Lambda ^1_\phi (A) \cap \tilde \Lambda ^0_\phi (E)\,:\, \phi (x;\,A\setminus E) = 1\}$
and
$\widehat U_{11}= \{x\in {\Lambda _\phi ^0}(A)\cap {\Lambda _\phi ^1}(E)\,:\, \phi (x;\,A\setminus E) = 1\}$
.
Proof. By (7), we have
\begin{align*} J_\phi (A) &= w_0\rho _0\left(A\cup \Lambda _\phi ^0(A)\right) + w_1\rho _1\left(A^{\mathsf{c}}\cup \Lambda _\phi ^1(A)\right),\\[5pt] J_\phi (A\setminus E) &= w_0\rho _0\left((A\setminus E)\cup \Lambda _\phi ^0(A\setminus E)\right) + w_1\rho _1\left((A\setminus E)^{\mathsf{c}}\cup \Lambda _\phi ^1(A\setminus E)\right). \end{align*}
Based on Remark 2.4 with further details shown in Appendix A.2, we can express
$A\cap E$
and the sets comprising
$J_\phi (A\setminus E)$
as
\begin{align} A\cap E &= U_3\cup U_4\cup U_5 \cup U_6, \\[5pt] \nonumber \Lambda _{\phi }^0(A\setminus E) &= \widehat U_3\cup \widehat U_6\cup \widehat U_9\cup \widehat U_{10} \cup \widehat U_{11}, \\[5pt] \nonumber \Lambda _{\phi }^1(A\setminus E) &= \widehat U_{1}\cup U_2\cup U_7\cup U_8,\\[5pt] \nonumber A\setminus E &= U_1\cup U_2\cup U_7\cup U_8, \\[5pt] \nonumber (A\setminus E)^{\mathsf{c}} &= U_3\cup U_4\cup U_5\cup U_6\cup U_9\cup U_{10}\cup U_{11}\cup U_{12}\cup U_{13}. \end{align}
We can write the adversarial deficit terms as
Then we estimate,

In the last line, the inequality results from neglecting all remaining terms with a negative sign. As
$J_\phi (A\setminus E) - J_\phi (A) \ge 0$
and
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$E$
, we estimate
\begin{align*} D_{\phi }(A;\,E) &\le D_{\phi }(E^{\mathsf{c}};\,A) - (w_0\rho _0 - w_1\rho _1){(A\cap E)} + w_0\rho _0(\widehat U_{11}) + w_1\rho _1(\widehat U_{1})\\[5pt] &\lt D_{\phi }(E^{\mathsf{c}};\,A) - \delta \mathcal L^d(A\cap E) + w_0\rho _0(\widehat U_{11}) + w_1\rho _1(\widehat U_{1}). \end{align*}
Observe that if
$A\in \mathcal{B}(\mathbb{R}^d)$
is a minimizer of
$J_\phi$
for some deterministic attack function
$\phi$
, then
$J_\phi (A\setminus E) - J_\phi (A) \ge 0$
for any
$E\in \mathcal{B}(\mathbb{R}^d)$
and Proposition 2.5 applies. This will be the setting for our results, although we state the result in its most general form here.
In later energy arguments, it will be helpful to express the difference in classification risks exactly instead of combining terms to form
$D_\phi (A;\,E), D_\phi (E^{\mathsf{c}};\,A)$
, and
$\mathcal L^d(A\cap E)$
. In Corollary 2.6, we consider the same computation for
$J_\phi (A\setminus E)-J_\phi (A)$
but now aim to simplify the difference as much as possible.
Corollary 2.6. Let
$A\in \mathcal{B}(\mathbb{R}^d)$
be a classifier for the generalized adversarial training problem and let
$E \in \mathcal{B}(\mathbb{R}^d)$
. Then, using the same notation as in Proposition 2.5 and under the same assumptions,
\begin{align*} J_\phi (A\setminus E) - J_\phi (A) &= w_1\rho _1(\widehat U_{1}\cup U_2\cup \widehat U_3) - (w_0\rho _0-w_1\rho _1)(\widetilde U_{3}\cup U_4) \\[5pt] &- w_0\rho _0(U_5 \cup \widetilde U_{6} \cup \widetilde U_{9}\cup \widetilde U_{10} \cup \widetilde U_{11}\cup U_{12}). \end{align*}
Proof. Let all sets
$U_i, \widehat U_i, \widetilde U_i$
be as defined in Table 1 and (9). We compute the exact difference in energies as follows:

In the following pair of corollaries, we will apply Proposition 2.5 to the adversarial training problem (3) and the probabilistic adversarial training problem (5).
Corollary 2.7. Let
$\varepsilon \gt 0$
and
$\phi = \phi _\varepsilon$
. Let
$A,E\in \mathcal{B}(\mathbb{R}^d)$
such that
$w_0\rho _0-w_1\rho _1 \gt \delta \gt 0$
on
$E$
and
$J_{\varepsilon }(A\setminus E) - J_{\varepsilon }(A) \ge 0$
. Then
where
$\widehat U_{11}= \{x\in A^{\mathsf{c}}\cap E\,:\, \mathrm{d}(x, A\setminus E)\lt \varepsilon \}$
.
Proof. To prove the corollary, we only need to check that
$\phi _\varepsilon$
satisfies Assumption 2.1 (which is done in Remark 2.3) and to verify that
$\widehat U_{1}$
is empty. To that end, if
$x \in \widehat U_{1}$
, then
which in turn implies that
$d(x,A^{\mathsf{c}} \cup E) \ge \varepsilon$
. Hence for such
$x$
,
$\phi _\varepsilon (x;\,A\setminus E) = 0$
and accordingly
$\widehat U_{1} = \emptyset$
.
Corollary 2.8. Let
$\varepsilon \gt 0$
,
$p\in [0,1)$
,
$\{\mathfrak p_{x,\varepsilon }\}_{x\in \mathbb{R}^d}$
be a family of probability measures, and
$\phi = \phi _{\varepsilon ,p}$
. Let
$A,E\in \mathcal{B}(\mathbb{R}^d)$
such that
$w_0\rho _0-w_1\rho _1 \gt \delta \gt 0$
on
$E$
and
$J_{\varepsilon ,p}(A\setminus E) - J_{\varepsilon ,p}(A) \ge 0$
. Then,
where
$\widehat U_{11}= \{x\in A^{\mathsf{c}}\cap E\,:\, \mathbb{P}(x'\in A\setminus E\,:\,x'\sim \mathfrak p_{x,\varepsilon }) \gt p\}$
and
$\widehat U_{1} = \{x\in \tilde \Lambda ^1_{\varepsilon ,p}(A) \cap \tilde \Lambda ^0_{\varepsilon ,p}(E)\,:\, \mathbb{P}(x'\in (A\setminus E)^{\mathsf{c}}\,:\,x'\sim \mathfrak p_{x,\varepsilon })\gt p\}$
.
3. Uniform convergence for the adversarial training problem
Before tackling convergence for the generalized adversarial problem (8), we first consider the convergence for the adversarial training problem (3) to understand the results in a more concrete setting. The results for (3) are also stronger than those for (8) and allow for more straightforward proofs that provide the basis for our approach in the subsequent section. We will return to (8) in Section 4 equipped with better intuition and understanding.
In this section, we establish uniform convergence in the Hausdorff metric of minimizers of the adversarial training problem (3) to Bayes classifiers on compact sets as the parameter
$\varepsilon \to 0^+$
. As previously stated in Remark 1.2, current convergence results are in the (weaker)
$L^1$
topology. We begin by stating a modest assumption we make about the underlying metric space.
Assumption 3.1. For the remainder of the paper, we assume that the metric
$\mathrm{d}$
is induced by a norm. Then,
$\mathcal L^d(B_{\mathrm{d}}(r)) \,:\!=\, \omega _{\mathrm{d}} r^d$
for the constant
$\omega _{\mathrm{d}} = \mathcal L^d(B_{\mathrm{d}}(1))$
. Naturally,
$\omega _{\mathrm{d}}$
will also depend on the dimension
$d$
, but we suppress this in the notation. Additionally, we will identify the conditional measures in (4) with their densities, meaning that we can express
$d\rho _i = \rho _i(x) \, dx$
.
For these norm balls, it will be useful to estimate their
$\varepsilon$
-perimeter. When
$\varepsilon \le R$
and
$\rho _0,\rho _1$
are bounded from above, this amounts to estimating the volume between two norm balls that are distance
$2\varepsilon$
apart.
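For concreteness, one way to carry out this comparison is the following elementary computation (a sketch assuming the representation $\varepsilon \mathrm{Per}_{\varepsilon }(A) = w_0\rho _0(A^\varepsilon \setminus A) + w_1\rho _1(A\setminus A^{-\varepsilon })$ together with $w_0,w_1\le 1$ and $\rho _0,\rho _1\le M$; the constant produced by the argument in Lemma 3.2 below may differ):
\begin{align*} \varepsilon \mathrm{Per}_{\varepsilon }\left (B_{\mathrm{d}}(x,R)\right ) &\le M\mathcal L^d\left (B_{\mathrm{d}}(x,R+\varepsilon )\setminus B_{\mathrm{d}}(x,R-\varepsilon )\right ) = M\omega _{\mathrm{d}}\left [(R+\varepsilon )^d - (R-\varepsilon )^d\right ]\\[5pt] &\le 2dM\omega _{\mathrm{d}}(R+\varepsilon )^{d-1}\varepsilon \le 2^d dM\omega _{\mathrm{d}} R^{d-1}\varepsilon \qquad \text{for } \varepsilon \le R, \end{align*}
which yields a bound of the form $\varepsilon \mathrm{Per}_{\varepsilon }(B_{\mathrm{d}}(x,R))\le \alpha R^{d-1}\varepsilon$ with, for instance, $\alpha = 2^d dM\omega _{\mathrm{d}}$.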
Lemma 3.2. Let
$0 \lt \varepsilon \le R$
for some fixed
$R\gt 0$
. Suppose
$\rho _0,\rho _1 \le M$
on
$\mathbb{R}^d$
. Then, there exists a constant
$\alpha \gt 0$
independent of
$R, \varepsilon$
, and
$x$
such that
Proof. Recall that (4) for
$A = B_{\mathrm{d}}(x,R)$
gives
As
$\rho _0,\rho _1$
are bounded from above by
$M$
,
By the scaling properties of the norm ball,
$\mathcal L^d(B_{\mathrm{d}}(x,r)) = \omega _{\mathrm{d}} r^d$
for all
$r\ge 0$
. By convexity, we estimate
As
$\varepsilon \le R$
, we conclude
Throughout the paper, we will require an upper bound on the
$\varepsilon$
-perimeter of the complement of
$B_{\mathrm{d}}(x,R)$
. By the complement property from Assumption 2.1 (verified to hold for the
$\varepsilon$
-perimeter in Remark 2.3), the bound given by Lemma 3.2 still holds for
$\varepsilon \mathrm{Per}_{\varepsilon }(B_{\mathrm{d}}(x,R)^{\mathsf{c}})$
since the same upper bound is true for
$\rho _0$
and
$\rho _1$
, namely,
With our normed setting clear, we begin the process of proving uniform Hausdorff convergence for minimizers of the adversarial training problem (3). The first step involves proving a technical lemma about the interaction between minimizers and sets
$B_{\mathrm{d}}(x,R) \subset \{ w_0\rho _0 - w_1 \rho _1 \gt \delta \gt 0\}$
. Importantly, this means
$B_{\mathrm{d}}(x,R) \cap A_0=\emptyset$
for a Bayes classifier
$A_0$
, which can help us relate minimizers of the adversarial training problem to Bayes classifiers. By applying a slicing argument, we will show that minimizers are disjoint from
$B_{\mathrm{d}}(x,R/2^{d+1})$
.
Lemma 3.3. Let
$A\in \mathcal{B}(\mathbb{R}^d)$
be a minimizer of the adversarial training problem (3) for
$\varepsilon \gt 0$
. Suppose there exists
$x\in \mathbb{R}^d$
and
$R\gt 0$
such that
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$B_{\mathrm{d}}(x,2R)$
with
$\rho _0,\rho _1 \le M$
on
$\mathbb{R}^d$
. Then, there exists a
$C \gt 0$
independent of
$R,\delta ,\varepsilon$
, and
$x$
such that if
$\varepsilon \le \min \left \{R/2^{d+2}, CR\delta ^{d+1}\right \}$
, then
$A \cap B_{\mathrm{d}}(x,R/2^{d+1}) = \emptyset$
.
Proof. Fix
$\varepsilon \gt 0$
. Choose a coordinate system such that
$x =0$
and write
$B_{\mathrm{d}}(0,R) = B_{\mathrm{d}}(R)$
. For the sake of contradiction, suppose there exists
$z\in A\cap B_{\mathrm{d}}(R/2^{d+1})$
. Then,
Corollary 2.7 shows for
$r \le R$
,
with
$\widehat U_{11} \subset \Lambda _\varepsilon (B_{\mathrm{d}}(r)) \cap A^\varepsilon$
and
$w_0\rho _0(\widehat U_{11}) \le \varepsilon \mathrm{Per}_\varepsilon ( B_{\mathrm{d}}(r)^{\mathsf{c}};\,A^\varepsilon )$
.
In particular, using the fact that
$w_0\rho _0 \gt \delta$
in
$B_{\mathrm{d}}(R)$
, we obtain
\begin{align*} \mathcal L^d&((A^\varepsilon \setminus A) \cap B_{\mathrm{d}}(R)) \le \frac {w_0}{\delta }\rho _0((A^\varepsilon \setminus A) \cap B_{\mathrm{d}}(R))\\[5pt] &\le \frac {\varepsilon }{\delta }\mathrm{Per}_\varepsilon (A;\,B_{\mathrm{d}}(R))\le \frac {\varepsilon }{\delta } \mathrm{Per}_\varepsilon (B_{\mathrm{d}}(R)^{\mathsf{c}};\,A)-\mathcal L^d(A\cap B_{\mathrm{d}}(R))+ \frac {w_0}{\delta }\rho _0(\widehat U_{11}). \end{align*}
Rearranging and applying the bound
$w_0\rho _0(\widehat U_{11}) \le \varepsilon \mathrm{Per}_\varepsilon (B_{\mathrm{d}}(R)^{\mathsf{c}};\,A^\varepsilon )$
, we estimate
\begin{align} \mathcal L^d(A^\varepsilon \cap B_{\mathrm{d}}(R))&\le \frac {\varepsilon }{\delta }\mathrm{Per}_\varepsilon (B_{\mathrm{d}}(R)^{\mathsf{c}};\,A)+ \frac {w_0}{\delta }\rho _0(\widehat U_{11})\nonumber \\[5pt] &\le \frac {2\varepsilon }{\delta }\mathrm{Per}_\varepsilon (B_{\mathrm{d}}(R)^{\mathsf{c}};\,A^\varepsilon ) \nonumber \\[5pt] &\le 2\alpha \frac {R^{d-1}}{\delta }\varepsilon \end{align}
with the last inequality due to (13). Note
$\mathcal L^d(A^\varepsilon\cap B_{\mathrm{d}}(r))\le 2\alpha \frac {r^{d-1}}{\delta }\varepsilon$
for
$0\lt r\le R$
.
Using that
$\rho_{0},\rho_{1}$
are bounded from above by
$M$
, we estimate
\begin{align*} \sum _{k=0}^{\lfloor \frac {R}{4\varepsilon }\rfloor -1} \mathcal L^d(A^\varepsilon \cap B_{\mathrm{d}}(R/2 + 2k\varepsilon )) &\le \sum _{k=0}^{\lfloor \frac {R}{4\varepsilon }\rfloor -1} \frac {2\varepsilon }{\delta }\mathrm{Per}_\varepsilon (B_{\mathrm{d}}(R/2 + 2k\varepsilon )^{\mathsf{c}};\, A^\varepsilon )\\[5pt] &\le \frac {2M}{\delta } \sum _{k=0}^{\lfloor \frac {R}{4\varepsilon }\rfloor -1} \mathcal L^d(A^\varepsilon \cap (B_{\mathrm{d}}(R/2+(2k+1)\varepsilon )\setminus B_{\mathrm{d}}(R/2+(2k-1)\varepsilon )))\\[5pt] &\le \frac {2M}{\delta } \mathcal L^d(A^\varepsilon \cap B_{\mathrm{d}}(R)) \le 4\alpha M\frac {R^{d-1}}{\delta ^2}\varepsilon \end{align*}
thanks to (15).
In particular,
If
$\varepsilon \le R/8$
so that
$\lfloor \frac {R}{4\varepsilon }\rfloor \ge \frac {R}{4\varepsilon }-1\ge \frac {R}{8\varepsilon }$
, then by letting
$s_1 =R/2+2k\varepsilon$
achieve
$\min _k \mathcal L^d(A^\varepsilon \cap B_{\mathrm{d}}(R/2+2k\varepsilon ))$
, we then obtain
Then, repeating the same construction at the scale
$R/2^i$
,
$i \ge 2$
, we find
as long as
$\varepsilon \le \frac {R}{2^{i+2}}$
(that is,
$i \le \log _2 \left (\frac {R}{4\varepsilon }\right )$
).
For
$i = d$
, it follows
\begin{align*} \mathcal L^d(A^\varepsilon \cap B_{\mathrm{d}}(s_d)) &\le 2^{\sum _{i=2}^d i}\left (\frac {8M\varepsilon }{R\delta }\right )^{d-1}\mathcal L^d(A^\varepsilon \cap B_{\mathrm{d}}(s_1))\\[5pt] &\le 2^{\frac {d(d+1)}{2}+3d-4}\left (\frac {M\varepsilon }{R\delta }\right )^{d-1}32 \alpha M\frac {R^{d-2}}{\delta ^2}\varepsilon ^2. \end{align*}
Hence,
Letting
$C_{d+1}\,:\!=\, 2^{\frac {d(d+1)}{2}+3d+1}\alpha M^d$
, we conclude if
$\varepsilon \lt \min \{R/2^{d+2}, \omega _dC^{-1}_{d+1}R\delta ^{d+1}\},$
then
which implies that
$A\cap B_{\mathrm{d}}(R/2^{d+1}) = \emptyset$
by (14).
Remark 3.4. In Lemma 3.3, we can slightly relax the assumption that
$A$
is a minimizer as follows: Recall that we assume
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$B_{\mathrm{d}}(x,2R)$
. If we have that
$J_\varepsilon (A\setminus B_{\mathrm{d}}(x,r)) - J_\varepsilon (A) \ge 0$
for all
$r$
such that
$R/2^{d+2} \le r \le R$
, then the energy exchange inequality (12) still holds and the same proof for Lemma 3.3 shows that
$A\cap B_{\mathrm{d}}(x,R/2^{d+1}) = \emptyset$
.
We now aim to directly relate minimizers of the adversarial training problem (3) to Bayes classifiers. Recall that the maximal and minimal Bayes classifiers (2) are given by
We will not be assuming that
$A_0^{\max }$
and
$A_0^{\min }$
coincide up to a set of
$\rho$
measure zero unless explicitly stated.
We will now show that on a compact set, we can ‘corral’ the minimizer of the adversarial training problem (3) by any distance
$\eta \gt 0$
, in the sense that it must lie between the
$\eta$
-dilation of
$A_0^{\max }$
and the
$\eta$
-erosion of
$A_0^{\min }$
when
$\varepsilon$
is small enough.
Lemma 3.5. Let
$A_0^{\max }$
be the maximal Bayes classifier. Suppose that
$\rho _0,\rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
, and let
$\eta \gt 0$
. Then for any compact set
$K\subset \mathbb{R}^d$
, there exists an
$\varepsilon _0\gt 0$
such that for all
$0\lt \varepsilon \lt \varepsilon _0$
,
where
$A_\varepsilon \subset \mathbb{R}^d$
is an arbitrary minimizer of the adversarial training problem.
Proof. For convenience, we abuse notation and let
$A_0 = A_0^{\max }$
. Assume that
$\left (A_0^\eta \right )^{\mathsf{c}} \cap K \neq \emptyset$
as otherwise the result is trivial. The conclusion also holds trivially if
$w_0\rho _0-w_1\rho _1$
never changes sign. This is because, for all
$\varepsilon \gt 0$
, either
$A_0 = A_\varepsilon = \emptyset$
if
$w_0\rho _0 - w_1\rho _1 \gt 0$
on
$\mathbb{R}^d$
, or
$A_0 = A_\varepsilon = \mathbb{R}^d$
otherwise.
Fix
$\eta \gt 0$
. Let
$R = \frac {\eta }{3}$
. Observe that
$\overline {A_0^R}\cap \overline {K^{2R}}$
is compact and
$A_0\subset \overline {A_0^R}$
. Then, by the continuity of
$w_0\rho _0-w_1\rho _1$
on
$\overline {A_0^R}\cap \overline {K^{2R}}$
, there exists a
$\delta \gt 0$
such that
where
$E_{\delta } = \{x\in \mathbb{R}^d\,:\, w_0\rho _0(x) - w_1\rho _1(x) \le \delta \}$
. This implies
$\left [E_{\delta }^{\mathsf{c}} \cap \overline {K^{2R}}\right ] \supset \left [\left (\overline {A_0^R}\right )^{\mathsf{c}} \cap \overline {K^{2R}}\right ]$
, so
$w_0\rho _0 - w_1\rho _1 \gt \delta \gt 0$
on
$\left (\overline {A_0^R}\right )^{\mathsf{c}} \cap \overline {K^{2R}}$
. In particular, as
$(A_0^\eta )^{\mathsf{c}} \cap K \subset \left [\left (\overline {A_0^R}\right )^{\mathsf{c}} \cap \overline {K^{2R}}\right ]$
, the difference in densities
$w_0\rho _0 - w_1\rho _1 \gt \delta$
on
$(A_0^\eta )^{\mathsf{c}} \cap K$
.
Take
$x\in (A_0^{\eta })^{\mathsf{c}} \cap K$
. Observe that
$B_{\mathrm{d}}(x,2R)$
satisfies the conditions of Lemma 3.3 for
$\delta$
as determined previously. Take
$\varepsilon _0 = \min \left \{R/2^{d+2}, CR\delta ^{d+1}\right \}$
where
$C$
is independent of
$R,\delta , \varepsilon ,$
and
$x$
. Let
$\varepsilon \le \varepsilon _0$
and let
$A_\varepsilon$
be a minimizer of the adversarial training problem (3). Then,
$A_\varepsilon \cap B_{\mathrm{d}}(x,R/2^{d+1}) = \emptyset$
for all
$x\in (A_0^{\eta })^{\mathsf{c}} \cap K$
, which implies that
$A_\varepsilon \cap (A_0^{\eta })^{\mathsf{c}} \cap K = \emptyset$
. Thus, we conclude
Remark 3.6. The only place where we use the compactness assumption in Lemma 3.5 is to determine
$\delta$
from
$\eta$
by the continuity of
$w_0\rho _0-w_1\rho _1$
on a compact set.
The proof established that minimizers
$A_\varepsilon$
of the adversarial training problem (3) can be corralled by the maximal Bayes classifier. We can also corral
$A_\varepsilon$
by the minimal Bayes classifier as follows: Consider interchanging the densities so data points
$x$
are distributed according to
$\widetilde \rho _0 = \rho _1$
and
$\widetilde \rho _1 = \rho _0$
. We can apply Lemma 3.5 to the minimizer
$\widetilde A_\varepsilon = (A_\varepsilon )^{\mathsf{c}}$
of the interchanged problem. We can conclude that for all compact sets
$K\subset \mathbb{R}^d$
and
$\eta \gt 0$
, there exists an
$\varepsilon _0 \gt 0$
, such that
for all
$\varepsilon \le \varepsilon _0$
This means that we have a two-sided, or ‘corralling’, bound on our minimizer for
$\varepsilon$
small enough, namely
The corralling argument will allow us to examine the Hausdorff distance between Bayes classifiers and minimizers of the adversarial training problem (3) as the adversarial budget decreases to zero. To begin, we recall the definition of the Hausdorff distance.
Definition 3.7. The Hausdorff distance between two sets
$A,E\subset \mathbb{R}^d$
is given by
for a metric
$\mathrm{d}$
on
$\mathbb{R}^d$
. Furthermore,
$\mathrm{d}_H$
is a pseudometric on
$\mathcal{B}(\mathbb{R}^d)$
.
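For finite discretizations the Hausdorff distance is straightforward to evaluate; the brute-force sketch below (illustrative; the point sets and grid are our own choices) computes it for two sets represented as point clouds, which is how a quantity such as $\mathrm{d}_H(A_\varepsilon \cap K, A_0\cap K)$ can be monitored numerically.

```python
import numpy as np

def hausdorff(P, Q):
    """Hausdorff distance between two nonempty finite point sets P, Q in R^d,
    max( sup_{p in P} d(p, Q), sup_{q in Q} d(q, P) ), here with the Euclidean
    metric. Purely illustrative brute-force version."""
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)   # pairwise distances
    return max(D.min(axis=1).max(), D.min(axis=0).max())

# Example: two half-lines restricted to a compact set K = [-2, 2], discretized.
x = np.linspace(-2.0, 2.0, 401)[:, None]
A_eps, A_0 = x[x[:, 0] > 0.25], x[x[:, 0] > 0.2]
print("d_H ≈", hausdorff(A_eps, A_0))    # ≈ 0.05, the gap between the thresholds
```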
Remark 3.8. If
$\mathrm{d}_H(A_0^{\max }, A_0^{\min }) = 0$
, then for any
$\eta \gt 0$
and compact set
$K\subset \mathbb{R}^d$
, there exists an
$\varepsilon _0 \gt 0$
such that
for all
$\varepsilon \le \varepsilon _0$
and for
$A_0$
the unique Bayes classifier.
We now have the tools to show the uniform convergence of minimizers
$A_\varepsilon$
of the adversarial training problem (3) to the Bayes classifier
$A_0$
. To begin, we prove the more general version of the result when the Bayes classifier is not unique up to a set of
$\rho$
measure zero. In this case, we can only show that
$\lim _{\varepsilon \to 0^+} A_\varepsilon$
must be corralled by the maximal and minimal Bayes classifiers.
Theorem 3.9. Suppose
$\rho _0, \rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
. Let
$\{A_\varepsilon \}_{\varepsilon \gt 0}$
be a sequence of minimizers of the adversarial training problem (3) for
$\varepsilon \to 0^+$
. Then, for any compact set
$K\subset \mathbb{R}^d$
,
Proof. Let
$K$
be a compact set. Observe that
$A_0^{\max }\subset A_\varepsilon \cup A_0^{\max }$
, so
For the sake of contradiction, suppose this quantity does not go to zero as
$\varepsilon \to 0^+$
. Then, there exists an
$\eta \gt 0$
such that for all
$\varepsilon _0\gt 0$
, there exists an
$0\lt \varepsilon \leq \varepsilon _0$
such that
However, this contradicts Lemma 3.5. Thus, we conclude
As
$A_{\varepsilon }\cap A_0^{\min } \subset A_0^{\min },$
an analogous argument proves that
Corollary 3.10. Suppose that
$\mathrm{d}_H(A_0^{\max },A_0^{\min }) = 0$
. Then under the same assumptions as Theorem 3.9,
for
$A_0$
the unique Bayes classifier.
Proof. This follows from Theorem 3.9 as the result of Lemma 3.5 simplifies when
$\mathrm{d}_H(A_0^{\max }, A_0^{\min }) =0$
as described in Remark 3.8.
In the case where
$\mathrm{d}_H(A_0^{\max },A_0^{\min }) = 0$
, it is natural to consider rates of convergence. In order to obtain such rates, we introduce the following assumption:
Assumption 3.11. The level set
$\{w_0\rho _0 = w_1\rho _1\}$
is non-degenerate, meaning that
$w_0\rho _0-w_1\rho _1 \in C^1(\mathbb{R}^d)$
and
$|w_0\nabla \rho _0 - w_1\nabla \rho _1| \gt \alpha \gt 0$
on
$\{w_0\rho _0 = w_1\rho _1\}$
for some constant
$\alpha$
. In this case, Bayes classifiers are unique up to a set of
$\mathcal L^d$
measure zero and
$\mathrm{d}_H(A_0^{\max },A_0^{\min }) = 0$
.
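As a simple illustration of Assumption 3.11 (not needed for the sequel), consider $d=1$, equal weights $w_0=w_1=\tfrac{1}{2}$ and unit-variance Gaussian densities $\rho _j(x) = \tfrac{1}{\sqrt{2\pi}}e^{-(x-\mu _j)^2/2}$ with $\mu _0\neq \mu _1$. Then
\begin{equation*} w_0\rho _0(x)-w_1\rho _1(x) = \tfrac{1}{2\sqrt{2\pi}}\left (e^{-(x-\mu _0)^2/2}-e^{-(x-\mu _1)^2/2}\right ), \end{equation*}
which vanishes only at $x^\ast = \tfrac{\mu _0+\mu _1}{2}$, where
\begin{equation*} \left |\tfrac{d}{dx}\left (w_0\rho _0-w_1\rho _1\right )(x^\ast )\right | = \tfrac{|\mu _1-\mu _0|}{2\sqrt{2\pi}}\,e^{-(\mu _1-\mu _0)^2/8} \gt 0. \end{equation*}
Hence, the level set is non-degenerate, and the Bayes classifier is (up to its boundary point) a half-line with endpoint $x^\ast$.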
Now, we establish the convergence rate for minimizers of the adversarial training problem (3) to Bayes classifiers under this non-degeneracy assumption.
Corollary 3.12. Suppose Assumption 3.11 holds and that
$\rho _0,\rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
. For any compact set
$K\subset \mathbb{R}^d$
, there exists a constant
$C\gt 0$
such that
where
$A_0$
is the Bayes classifier.
Proof. Consider a sequence
$\{\eta _i\}_{i\in \mathbb{N}}$
where
$\eta _i \gt 0$
and
$\eta _i \to 0^+$
. Define
$\varepsilon _i = \min \{C\eta _i, C\eta _i \delta _i^{d+1}\}$
based on the requirements on
$\varepsilon$
from Lemma 3.3 with
$R = \eta _i$
and the continuity bound
$\delta =\delta _i$
from Lemma 3.5. In this proof,
$C$
denotes a constant, independent of
$\eta _i, \varepsilon _i,$
and
$\delta _i$
, whose value may change from line to line.
As
$w_0\rho _0-w_1\rho _1 \in C^1(\mathbb{R}^d)$
and its gradient is bounded away from 0, the boundary
$\partial A_0 = \{w_0\rho _0 = w_1\rho _1\}$
is a
$C^1$
surface by the implicit function theorem, and hence the Hausdorff distance between the minimal and maximal sets is zero. Furthermore, for
$\eta _i \ll 1$
,
$\delta _i$
is of the same order as
$\eta _i$
, which implies
$\varepsilon _i= C\eta _i^{d+2}$
.
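For concreteness (and with $C$ again denoting a changing constant), inverting this relation makes the origin of the rate in the statement explicit:
\begin{equation*} \varepsilon _i = C\eta _i^{d+2} \quad \Longrightarrow \quad \eta _i = C\varepsilon _i^{\frac{1}{d+2}}, \end{equation*}
so a corralling tolerance of size $\eta _i$ costs an adversarial budget of order $\eta _i^{d+2}$, which yields the $O(\varepsilon ^{\frac{1}{d+2}})$ bound in the statement.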
For each
$\varepsilon _i$
, let
$A_{\varepsilon _i}$
be the associated minimizer of the adversarial training problem (3). By Theorem 3.9 along with Corollary 3.10, for any compact set
$K\subset \mathbb{R}^d$
we have that
Thus, we conclude that
Remark 3.13. Although we have shown the convergence rate to be at most
$O(\varepsilon ^{\frac {1}{d+2}})$
, we expect that the convergence rate is actually
$O(\varepsilon )$
(see the formal asymptotics near
$\varepsilon = 0$
derived by [Reference Trillos and Murray16]). The reason we get the convergence rate
$\varepsilon ^{\frac {1}{d+2}}$
is from the
$\delta ^{d+1}$
that appears in our bounds for
$\varepsilon$
. In Lemma 3.3, this term comes from the iterative argument, which relies on crude volume bounds at each step. More precise estimates would be required to improve the convergence rate.
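To give a rough sense of the gap, consider a small arithmetic illustration: in dimension $d=2$ with budget $\varepsilon = 10^{-3}$,
\begin{equation*} \varepsilon ^{\frac{1}{d+2}} = 10^{-3/4} \approx 0.18, \qquad \text{whereas} \qquad \varepsilon = 10^{-3}, \end{equation*}
so the proven rate is far weaker than the formally expected linear rate already for moderate dimensions and budgets.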
4. Uniform convergence for other deterministic attacks
Now, we turn our focus to the generalized adversarial training problem (8). At the end of the section, we present the probabilistic adversarial training problem (5) as an application of our results for (8). Unlike the case of the adversarial training problem (3), the existence of minimizers of (8) is an open question; our convergence results can therefore be understood in the spirit of ‘a priori’ estimates in partial differential equations. First, we make precise which deterministic attack functions we consider.
Definition 4.1. A deterministic attack function
$\phi$
is metric if an adversary’s attack on
$x$
only depends upon points within distance
$\varepsilon$
of
$x$
for some adversarial budget
$\varepsilon \gt 0$
. More precisely, for two classifiers
$A,\widetilde A\in \mathcal{B}(\mathbb{R}^d)$
,
To avoid a trivial situation where
$x$
is always attacked independently of the choice of
$A$
, we assume that, for a metric attack function, the adversary has no power, meaning
$\phi (x;\,A) \equiv 0$
, whenever
$A = \emptyset$
or
$A =\mathbb{R}^d$
.
In the following pair of lemmas, we show two important properties of metric attack functions. The first relates
$D_\phi$
with
$\varepsilon \mathrm{Per}_\varepsilon$
and provides an upper bound on
$D_\phi$
by
$\varepsilon \mathrm{Per}_\varepsilon$
. This will allow us to employ many of the estimates of
$\varepsilon \mathrm{Per}_\varepsilon$
from Lemma 3.3 in Lemma 4.6.
Lemma 4.2. Let
$\phi$
be a metric deterministic attack function. For any sets
$A, E\in \mathcal{B}(\mathbb{R}^d)$
, we have that
Proof. It will be sufficient to show that
$\Lambda _\phi ^i(A)\subset \Lambda _\varepsilon ^i(A)$
for $i=0,1$. Take
$x\in \Lambda _\phi ^0(A)$
. If we consider
$\widetilde A = \emptyset$
, the metric property states
For
$x\in A^{\mathsf{c}},$
$A\cap B_{\mathrm{d}}(x,\varepsilon ) \neq \emptyset$
implies
$x\in A^\varepsilon \setminus A = \Lambda _\varepsilon ^0(A)$
. Thus, we conclude
$\Lambda _\phi ^0(A)\subset \Lambda _\varepsilon ^0(A)$
. A similar argument with
$\widetilde A = \mathbb{R}^d$
shows that
$\Lambda _\phi ^1(A)\subset \Lambda _\varepsilon ^1(A)$
.
We now prove a second property of metric attack functions, which isolates where the values of
$\phi (x;\,A)$
and
$\phi (x;\,A\setminus E)$
may differ.
Lemma 4.3. Let
$\phi$
be a metric deterministic attack function. For sets
$A,E\in \mathcal{B}(\mathbb{R}^d)$
, if
$x\in (E^\varepsilon )^{\mathsf{c}}$
, then
$\phi (x;\,A) =\phi (x;\,A\setminus E)$
.
Proof. Suppose
$x \in (E^\varepsilon )^{\mathsf{c}}$
. Then,
$B(x,\varepsilon ) \subset E^{\mathsf{c}}$
and so
$A \cap B(x,\varepsilon ) = (A \setminus E) \cap B(x,\varepsilon )$
. Hence, the metric property implies that
$\phi (x;\,A) = \phi (x;\,A\setminus E)$
.
We require one additional assumption on a metric attack function
$\phi$
in order to prove the generalized version of Lemma 3.3. Namely, if the size of the intersection of
$B_{\mathrm{d}}(x,\varepsilon )$
with the opposite class of
$x$
satisfies a lower bound, then
$x\in \Lambda _\phi (A)$
.
Assumption 4.4. Let
$\phi$
be a metric deterministic attack function with budget
$\varepsilon \gt 0$
. For a classifier
$A\in \mathcal{B}(\mathbb{R}^d)$
, we assume:
for some constant
$0\lt \beta \lt \omega _{\mathrm{d}}$
independent of
$x, \varepsilon ,$
and
$A$
.
As a consequence of this assumption, if
$x\in \tilde \Lambda _\phi ^0(A)$
, then
$\mathcal L^d(A\cap B_{\mathrm{d}}(x,\varepsilon )) \le \beta \varepsilon ^d$
. Likewise, if
$x\in \tilde \Lambda _\phi ^1(A)$
, then
$\mathcal L^d(A^{\mathsf{c}} \cap B_{\mathrm{d}}(x,\varepsilon )) \le \beta \varepsilon ^d$
. Furthermore, if Assumption 2.1 also holds for
$\phi$
, then only one of the two lower bounds needs to be assumed, as the other follows by the complement property.
Remark 4.5. This assumption states that a point
$x\in \mathbb{R}^d$
is attacked if the portion of its
$\varepsilon$
-neighbours with the opposite label is on the order of
$\varepsilon ^d$
. In this way, the deterministic attack function depends on the adversarial budget
$\varepsilon$
and the metric.
Observe that the adversarial training problem (3) satisfies Assumption 4.4. In fact, it satisfies the statements
\begin{align*} x\in \Lambda _\varepsilon ^0(A) &\iff x\in A^{\mathsf{c}} \text{ and } A\cap B_{\mathrm{d}}(x,\varepsilon ) \neq \emptyset , \\[5pt] x\in \Lambda _\varepsilon ^1(A) &\iff x\in A \text{ and } A^{\mathsf{c}}\cap B_{\mathrm{d}}(x,\varepsilon ) \neq \emptyset . \end{align*}
In Proposition 4.15, we will verify that the probabilistic adversarial training problem (5) also satisfies Assumption 4.4.
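As an illustrative aside, the displayed characterization is easy to evaluate on finite samples. The following Python sketch (the sample points, labels, and budget are hypothetical and stand in for a discretization of $\mathbb{R}^d$ and of a classifier $A$) marks a point as attacked exactly when some point carrying the opposite label lies within distance $\varepsilon$, mirroring the sets $\Lambda _\varepsilon ^0(A)$ and $\Lambda _\varepsilon ^1(A)$ above; it is not part of the argument.
import numpy as np

def attacked_by_phi_eps(X: np.ndarray, in_A: np.ndarray, eps: float) -> np.ndarray:
    # X: (n, d) sample points standing in for R^d; in_A: boolean mask for membership in A.
    # Returns a boolean mask that is True exactly on the sampled analogues of
    # Lambda_eps^0(A) and Lambda_eps^1(A): some point with the opposite label lies
    # within distance eps (the open/closed ball distinction is ignored here).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    opposite = in_A[:, None] != in_A[None, :]                   # (i, j): labels differ
    return ((D <= eps) & opposite).any(axis=1)

# Example: A is the upper half-plane {x_2 > 0} restricted to random samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
in_A = X[:, 1] > 0.0
mask = attacked_by_phi_eps(X, in_A, eps=0.1)
print(mask.sum(), "of", len(X), "points attacked")  # only points near {x_2 = 0}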
In order to show uniform convergence for the generalized adversarial training problem (8), we first prove the analogue of Lemma 3.3 by a similar slicing argument. We leverage the relationship between the adversarial deficit and the
$\varepsilon$
-perimeter established in Lemma 4.2. However, there are a few key differences in both the results and the proof. Whereas in Lemma 3.3 we show that minimizers of the adversarial training problem (3) are disjoint from certain misclassified norm balls, here we show that the intersection of minimizers of (8) with a misclassified norm ball must have
$\mathcal L^d$
measure zero. In this sense, we establish a necessary condition for minimizers of (8). As for the proof of the statement, the final step differs significantly between Lemmas 3.3 and 4.6. In the final step of Lemma 3.3, we are able to use the fact that a single point causes misclassification on the order of
$\varepsilon ^d$
. For the general case, the lower bound on the
$\mathcal L^d$
measure condition for misclassification from Assumption 4.4 requires a more delicate energy argument that examines the exact difference in energies.
Lemma 4.6. Let
$\phi$
be a metric deterministic attack function for
$\varepsilon \gt 0$
, satisfying Assumptions 2.1 and 4.4. Suppose
$\rho _0,\rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
. Furthermore, suppose
$A\in \mathcal{B}(\mathbb{R}^d)$
is a minimizer of the generalized training problem (8) and there exists
$x\in \mathbb{R}^d$
and
$R\gt 0$
such that
$w_0\rho _0-w_1\rho _1 \gt \delta \gt 0$
on
$B_{\mathrm{d}}(x,2R)$
. Then, there exists a constant
$C \gt 0$
independent of
$R, \delta ,\varepsilon$
, and
$x$
such that if
$\varepsilon \le \min \left \{R/2^{d+2}, CR\delta ^{d+1}\right \}$
, then
$\mathcal L^d(A \cap B(x,R/2^{d+2})) = 0$
.
Proof. With $x$ as in the statement above, choose a coordinate system such that
$x = 0$
and write
$B_{\mathrm{d}}(0,R) = B_{\mathrm{d}}(R)$
.
We will first find an initial estimate for
$\mathcal L^d(A\cap B_{\mathrm{d}}(R))$
. As
$A$
is a minimizer and
$w_0\rho _0-w_1\rho _1\gt \delta \gt 0$
on
$B_{\mathrm{d}}(R)$
, we can apply Proposition 2.5 to find that
where
\begin{align*} \widehat U_{1} &= \{x\in \tilde \Lambda ^1_\phi (A)\cap \tilde \Lambda ^0_\phi (B_{\mathrm{d}}(R))\,:\, \phi (x;\,A\setminus B_{\mathrm{d}}(R)) = 1\},\\[5pt] \widehat U_{11} &= \{x\in \Lambda ^0_{\phi }(A)\cap \Lambda ^1_{\phi }(B_{\mathrm{d}}(R)) \,:\, \phi (x;\,A\setminus B_{\mathrm{d}}(R)) = 1\}. \end{align*}
By (11), we have
$w_0\rho _0(\widehat U_{11})\le D_\phi (A;\,B_{\mathrm{d}}(R))$
. Combining the upper bound on
$w_0\rho _0(\widehat U_{11})$
with (16) and simplifying, we find
Recall that by definition,
$\widehat U_1 \subset A\cap B_{\mathrm{d}}(R)^{\mathsf{c}}$
. Additionally by Lemma 4.3,
$\widehat U_1 \subset B_{\mathrm{d}}(R+\varepsilon )$
as
$\phi (x;\,A\setminus B_{\mathrm{d}}(R)) = 1$
, while
$\phi (x;\,A) = 0$
. Thus,
$\widehat U_1 \subset A\cap (B_{\mathrm{d}}(R+\varepsilon )\setminus B_{\mathrm{d}}(R))$
. In particular,
Additionally, by Lemma 4.2, we have
$D_\phi (B_{\mathrm{d}}(R)^{\mathsf{c}};\,A) \le \varepsilon \mathrm{Per}_{\varepsilon }(B_{\mathrm{d}}(R)^{\mathsf{c}};\,A)$
. Applying (15) from Lemma 3.2,
for
$\alpha$
independent of
$R,\delta ,\varepsilon$
, and
$x$
as in Lemma 3.2.
Next, we want to find a radius
$s_1\in (R/2,R)$
that will give an order
$\varepsilon ^2$
estimate for
$\mathcal L^d(A\cap B_{\mathrm{d}}(s_1))$
. For
$r \le R$
, one has
We can argue by a discrete slicing argument, as in Lemma 3.3, to show that there exists an
$s_1\in (R/2,R)$
such that
Iterating the argument as in Lemma 3.3 yields an order
$\varepsilon ^{i+1}$
estimate of
$\mathcal L^d(A\cap B_{\mathrm{d}}(s_i))$
for
$s_i\in (R/2^i,R/2^{i-1})$
and
$2\le i \le \log _2(\frac {R}{4\varepsilon })$
(i.e.
$\varepsilon \le \frac {R}{2^{i+2}}$
). After
$d$
iterations, we find
where
$C_{d+1}\,:\!=\, 2^{\frac {d(d+1)}{2}+3d+1}\alpha M^d$
.
Finally, we must show that
$\mathcal L^d(A\cap B(R/2^{d+2})) = 0$
. Let
$z\in A\cap B_{\mathrm{d}}(\frac {R}{2^{d+1}}+\varepsilon )$
. We must consider a region slightly outside of
$B_{\mathrm{d}}(R/2^{d+1})$
as the following argument needs to apply to all points in the
$\varepsilon$
-dilation of
$B_{\mathrm{d}}(R/2^{d+1})$
. We want to show that
$z\in \Lambda _\phi ^1(A)$
. To do so, by Assumption 4.4, it will suffice to show that if
$z \in A$
, then
$\mathcal L^d(A^{\mathsf{c}} \cap B(z,\varepsilon )) \gt \beta \varepsilon ^d$
.
Recall the estimate from the previous steps,
$\mathcal L^d(A\cap B_{\mathrm{d}}(R/2^d)) \le \left (\frac {C_{d+1}}{R\delta ^{d+1}}\right )\varepsilon ^{d+1}$
. As long as
$ \left (\frac {C_{d+1}}{R\delta ^{d+1}}\right )\varepsilon \lt (\omega _{\mathrm{d}} - \beta )$
, or in other words
$\varepsilon \lt (\omega _{\mathrm{d}} - \beta ) \left (\frac {R\delta ^{d+1}}{C_{d+1}}\right ) \,:\!=\, C$
, then we have
where
$\beta$
is the constant from Assumption 4.4. Then, as
$B_{\mathrm{d}}(z,\varepsilon ) \subset B_{\mathrm{d}}(R/2^d)$
, we estimate
Hence,
$z\in \Lambda _\phi ^1(A)$
. In particular, this means that
$\tilde \Lambda _\phi ^1(A)\cap B_{\mathrm{d}}(\frac {R}{2^{d+1}}+\varepsilon ) = \emptyset .$
We will now examine the difference in energies after removing
$B_{\mathrm{d}}(R/2^{d+1})$
in order to show that
$\mathcal L^d(A \cap B_{\mathrm{d}}(R/2^{d+2})) = 0$
. By Corollary 2.6, the difference in energy after removing the set
$E=B_{\mathrm{d}}(R/2^{d+1})$
from
$A$
is
\begin{align*} J_{\phi }(A\setminus B_{\mathrm{d}}(R/2^{d+1})) - J_{\phi }(A) &= w_1\rho _1(\widehat U_{1}\cup U_2\cup \widehat U_3) - (w_0\rho _0-w_1\rho _1)(\widetilde U_{3}\cup U_4) \\[5pt] & - w_0\rho _0(U_5 \cup \widetilde U_{6} \cup \widetilde U_{9}\cup \widetilde U_{10} \cup \widetilde U_{11}\cup U_{12}), \end{align*}
where all sets are as defined in Table 1 and (9). By construction,
However, we have just shown that
$\tilde \Lambda ^1_\phi (A)\cap B_{\mathrm{d}}(\frac {R}{2^{d+1}}+\varepsilon ) = \emptyset$
. Thus, we conclude that
$\widehat U_{1} = U_2 = \widehat U_3 = \emptyset$
.
As
$w_0\rho _0 -w_1\rho _1\gt \delta \gt 0$
on
$B_{\mathrm{d}}(R/2^{d+1})$
, the difference in energies becomes
\begin{align*} J_{\phi }(A\setminus B_{\mathrm{d}}(R/2^{d+1})) - J_{\phi }(A) &= - (w_0\rho _0-w_1\rho _1)(\widetilde U_{3}\cup U_4) - w_0\rho _0(U_5 \cup \widetilde U_{6} \cup \widetilde U_{9}\cup \widetilde U_{10} \cup \widetilde U_{11}\cup U_{12})\\[5pt] &\le -\delta \mathcal L^d(\widetilde U_{3}\cup U_4) - \delta \mathcal L^d(U_5 \cup \widetilde U_{6} \cup \widetilde U_{9}\cup \widetilde U_{10} \cup \widetilde U_{11}\cup U_{12})\\[5pt] &\le 0. \end{align*}
By our assumption,
$A$
is a minimizer, so
$J_{\phi }(A\setminus B_{\mathrm{d}}(R/2^{d+1})) - J_{\phi }(A) = 0$
. This means all remaining sets must have measure zero, i.e.
Recall from (10) that
$A\cap B_{\mathrm{d}}(R/2^{d+1}) = U_3 \cup U_4 \cup U_5 \cup U_6$
. We have already shown that
$U_3 = \widetilde U_{3}\cup \widehat U_3$
,
$U_4$
and
$U_5$
all have measure zero. However, we notice that
$\widehat U_6 \subset B(\frac {R}{2^{d+1}} + \varepsilon ) \setminus B(\frac {R}{2^{d+1}}-\varepsilon )$
, and so we can conclude that
$\mathcal L^d(A \cap B(\frac {R}{2^{d+1}} - \varepsilon )) = 0$
.
Then combining with the facts about
$U_1, U_2,$
and
$ U_3$
, we then get that for any
$s \lt \frac {R}{2^{d+1}} - \varepsilon$
we have that
$A\setminus B_{\mathrm{d}}(s)$
is a minimizer of (8) and that
$A \cap B_{\mathrm{d}}(s)$
has measure zero.
Remark 4.7. As stated at the end of the proof,
$A\setminus B_{\mathrm{d}}(x,R/2^{d+2})$
is also a minimizer of (8). In addition to providing a necessary condition for minimizers, Lemma 4.6 also gives a construction for a minimizer that is disjoint from
$B_{\mathrm{d}}(x,R/2^{d+2})$
.
Regarding the assumptions, we cannot relax the requirement that
$A$
is a minimizer to
$J_{\phi }(A\setminus B_{\mathrm{d}}(x,r)) - J_{\phi }(A) \ge 0$
as we could for Lemma 3.3 (see Remark 3.4). Although the energy exchange inequality will still hold, we require that
$A$
is a minimizer of (8) to show
$\mathcal L^d(A\cap B_{\mathrm{d}}(R/2^{d+2}))=0$
.
Assuming a minimizer to (8) exists, Lemma 4.6 allows us to show that minimizers are (a.e.) disjoint from certain sets where it is energetically advantageous to be assigned label 0 by the classifier. In Lemma 4.8, we will use this result to show that for a prescribed distance
$\eta \gt 0$
, there exists a minimizer of (8) that can be corralled to be within distance
$\eta$
of any Bayes classifier for all
$\varepsilon$
smaller than some threshold. This is the generalized version of Lemma 3.5. As the energy in (8) is insensitive to modification of the classifier by a
$\mathcal L^d$
measure zero set, we do not expect arbitrary minimizers to have this property. However, from an arbitrary minimizer, Lemma 4.8 provides a method to construct a
$\mathcal{L}^d$
-a.e. equivalent minimizer that does satisfy this distance condition.
Lemma 4.8. Let
$A_0^{\max }$
be the maximal Bayes classifier, i.e.
$A_0^{\max } = \{x\in \mathbb{R}^d\,:\, w_0\rho _0(x) \leq w_1\rho _1(x) \}$
. Suppose
$\rho _0,\rho _1\gt 0$
are continuous and bounded from above on
$\mathbb{R}^d$
. Let
$K$
be a compact set and fix
$\eta \gt 0$
. Then, there exists an
$\varepsilon _0 \gt 0$
such that for any
$0\lt \varepsilon \le \varepsilon _0$
and any deterministic attack function
$\phi$
satisfying Assumptions 2.1 and 4.4 for adversarial budget
$\varepsilon$
, and any minimizer
$A_{\varepsilon ,\phi }$
of the generalized adversarial training problem (8), there exists a
$\mathcal L^d$
measure zero set
$N^{\max }\in \mathcal{B}(\mathbb{R}^d)$
such that
Furthermore,
$A_{\varepsilon ,\phi }\setminus N^{\max }$
is also a minimizer of (8).
Proof. We will follow the proof of Lemma 3.5. We again abuse notation and let
$A_0 = A_0^{\max }$
. Assume that
$(A_0^\eta )^{\mathsf{c}} \cap K \neq \emptyset$
as otherwise the result is trivial. The conditions are also trivially satisfied if
$w_0\rho _0 - w_1\rho _1$
never changes sign.
Let
$R = \frac {\eta }{3}$
. By the same argument as in Lemma 3.5, the continuity of
$w_0\rho _0 - w_1\rho _1$
on the compact set
$\overline {A_0^R}\cap \overline {K^{2R}}$
allows us to conclude that there exists a
$\delta \gt 0$
such that
$w_0\rho _0 - w_1\rho _1 \gt \delta$
on
$\left (\overline {A_0^R}\right )^{\mathsf{c}} \cap \overline {K^{2R}}$
and
$(A_0^\eta )^{\mathsf{c}} \cap K$
.
As
$(A_0^\eta )^{\mathsf{c}} \cap K$
is compact, there exists a finite covering of
$(A_0^\eta )^{\mathsf{c}} \cap K$
by
$\{B_{\mathrm{d}}(x_i,R/2^{d+2})\}_{1\le i\le n}$
for some
$n\in \mathbb{N}$
such that
where
$E_{\delta } = \{x \in \mathbb{R}^d \,:\, w_0\rho _0(x) - w_1\rho _1(x) \le \delta \}$
. Hence, each
$B_{\mathrm{d}}(x_i,2R)$
satisfies the conditions of Lemma 4.6 for
$\delta$
from the continuity bound. As the constant
$C$
from Lemma 4.6 is independent of
$x$
, we can let
$\varepsilon _0 = \min \left \{R/2^{d+2}, CR\delta ^{d+1}\right \}$
.
Suppose
$A_{\varepsilon ,\phi }$
is a minimizer of the generalized adversarial training problem (8) for some
$0\lt \varepsilon \le \varepsilon _0$
and let
$N^{\max } = \bigcup _{i=1}^n [A_{\varepsilon ,\phi }\cap B_{\mathrm{d}}(x_i, R/2^{d+2})]$
. Then,
\begin{equation*} \mathcal L^d\left ( \bigcup _{i=1}^n \left [A_{\varepsilon ,\phi } \cap B_{\mathrm{d}}(x_i, R/2^{d+2})\right ]\right ) \le \sum _{i=1}^n \mathcal L^d(A_{\varepsilon ,\phi }\cap B_{\mathrm{d}}(x_i, R/2^{d+2})) =0,\end{equation*}
so
$N^{\max }$
is a
$\mathcal L^d$
measure zero set. By Remark 4.7, an iterative application of Lemma 4.6, removing one norm ball at a time, ensures that
$A_{\varepsilon ,\phi }\setminus N^{\max }$
is a minimizer of (8). Furthermore,
$[A_{\varepsilon ,\phi }\setminus N^{\max }] \cap [(A_0^\eta )^{\mathsf{c}} \cap K] = \emptyset$
by construction, which implies
Remark 4.9. In Lemma 4.8, we require compactness both for the continuity argument and for the finite covering argument to ensure that we are removing a set of
$\mathcal L^d$
measure zero. Compare this with Lemma 3.5 and Remark 3.6.
We can analogously show that (up to a set of
$\mathcal L^d$
measure zero) we can corral
$A_{\varepsilon ,\phi }$
by an
$\eta$
-erosion of the minimal Bayes classifier
$A_0^{\min }$
by considering the flipped density problem. We can apply the result from Lemma 4.8 to conclude that on a compact set
$K\subset \mathbb{R}^d$
for a fixed
$\eta \gt 0$
, there exists an
$\varepsilon _0 \gt 0$
such that for
$0\lt \varepsilon \le \varepsilon _0$
and a deterministic attack function
$\phi$
with adversarial budget
$\varepsilon$
satisfying the appropriate assumptions, for any minimizer
$A_{\varepsilon ,\phi }$
of (8), there exists a
$\mathcal L^d$
measure zero set
$N^{\min }$
such that
Observe that by construction
$N^{\max }\subset A_0^{\mathsf{c}}$
and
$N^{\min } \subset A_0$
so the two sets are disjoint. As in the previous case, this establishes a two-sided, ‘corralling’ bound on any minimizer for
$\varepsilon$
small enough, namely,
Remark 4.10. If the Bayes classifier is unique in the sense of Remark 1.1, then for any
$\eta \gt 0$
and compact set
$K\subset \mathbb{R}^d$
, there exists an
$\varepsilon _0 \gt 0$
such that for all
$0\lt \varepsilon \le \varepsilon _0$
and
$\phi$
satisfying Assumptions 2.1 and 4.4 for adversarial budget
$\varepsilon$
,
provided that
$A_{\varepsilon ,\phi }$
exists.
Following the sequence of proofs in Section 3, we will now use the corralling result from Lemma 4.8 to examine the distance between minimizers of the generalized adversarial training problem (8) and Bayes classifiers. The next theorem is the generalization of Theorem 3.9 and establishes uniform convergence in the Hausdorff distance. As previously stated, there is currently no proof of existence for minimizers of (8), so this result should be seen as a type of a priori uniform convergence estimate.
Theorem 4.11. Let
$\phi$
be a deterministic attack function satisfying Assumptions 2.1 and 4.4. Suppose
$\rho _0, \rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
. Additionally, suppose
$\{A_{\varepsilon _i,\phi }\}_{i\in \mathbb{N}}$
is a sequence of minimizers of the generalized adversarial training problem (8) with
$\varepsilon _i \to 0^+$
as
$i\to \infty$
. For any compact set
$K\subset \mathbb{R}^d$
, there exist sequences
$\{N^{\min }_i\}_{i\in \mathbb{N}}$
and
$\{N^{\max }_i\}_{i\in \mathbb{N}}$
of
$\mathcal L^d$
measure zero sets such that
and
Proof. The proof is identical to that of Theorem 3.9, where
$N^{\max }_i$
and
$N_i^{\min }$
are as defined in the proof of Lemma 4.8.
Corollary 4.12. If the Bayes classifier
$A_0$
is unique in the sense of Remark 1.1, then under the same assumptions as Theorem 4.11,
Proof. This follows directly from Theorem 4.11.
Recall that Assumption 3.11 is a non-degeneracy assumption on the Bayes classifier
$A_0$
that ensures that
$\mathrm{d}_H(A_0^{\max },A_0^{\min }) = 0$
and that
$A_0$
is unique up to a set of
$\mathcal L^d$
measure zero. If we assume that the Bayes classifier is non-degenerate, then it becomes natural to examine the rates of convergence.
Corollary 4.13. Let
$\phi$
be a deterministic attack function satisfying Assumptions 2.1 and 4.4. Suppose Assumption 3.11 holds and that for every
$\varepsilon \gt 0$
, there exists a minimizer
$A_{\varepsilon ,\phi }$
to the generalized adversarial training problem (8). Additionally, suppose
$\rho _0,\rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
. For any compact set
$K\subset \mathbb{R}^d$
, there exist sequences
$\{N^{\min }_i\}_{i\in \mathbb{N}}$
and
$\{N^{\max }_i\}_{i\in \mathbb{N}}$
of
$\mathcal L^d$
measure zero sets and a constant
$C\gt 0$
such that
where
$A_0$
is the Bayes classifier.
Proof. This proof is identical to that of Corollary 3.12.
Remark 4.14. As in Remark 3.13, we expect that the convergence rate for minimizers of the generalized adversarial training problem (8) should be improved to
$O(\varepsilon )$
, but this would require more refined estimates than those available in Lemma 4.6.
4.1. Application to the probabilistic adversarial training problem
We now turn our attention to the probabilistic adversarial training problem (5), which we will view as an instance of the generalized adversarial training problem (8). In order to apply the results for (8) to (5), we must verify that Assumptions 2.1 and 4.4 hold. In Remark 2.3, we established that (5) satisfies Assumption 2.1; thus, it only remains to show in the following proposition that (5) satisfies Assumption 4.4.
Proposition 4.15. Let
$\varepsilon \gt 0$
,
$p \in [0,1)$
and
$\{\mathfrak p_{x,\varepsilon }\}_{x\in \mathbb{R}^d}$
be a family of probability measures satisfying Assumption 1.3. The deterministic attack function
$\phi _{\varepsilon ,p}$
associated with the probabilistic adversarial training problem (5) satisfies Assumption 4.4.
Proof. Suppose
$x\in A^{\mathsf{c}}$
is such that
$\mathcal L^d(A\cap B_{\mathrm{d}}(x,\varepsilon )) \gt \beta \varepsilon ^d$
for some
$\beta \gt 0$
to be determined. It will be sufficient to show that
$\mathbb{P}(x'\in A\,:\,x'\sim \mathfrak p_{x,\varepsilon }) \gt p$
. Recall that we can express
\begin{align*} \mathbb{P}(x'\in A \,:\, x' \sim \mathfrak p_{x,\varepsilon }) &= \int _{\mathbb{R}^d} \varepsilon ^{-d} {\unicode{x1D7D9}}_{A}(x') \xi \left (\frac {x'-x}{\varepsilon }\right ) \, dx' \\[5pt] &= \int _{A \cap B_{\mathrm{d}}(x,\varepsilon )} \varepsilon ^{-d} \xi \left (\frac {x'-x}{\varepsilon }\right ) \, dx'\\[5pt] &\gt c\varepsilon ^{-d}\mathcal L^d(A\cap B_{\mathrm{d}}(x,\varepsilon )) \end{align*}
where
$c\gt 0$
is the lower bound on
$\xi$
from Assumption 1.3. If
$\beta = \frac {p}{c}$
, then
$\mathbb{P}(x'\in A \,:\, x' \sim \mathfrak p_{x,\varepsilon }) \gt p$
as desired. As the probabilistic adversarial training problem (5) satisfies the complement property, this is sufficient to conclude that Assumption 4.4 holds for
$\beta = \frac {p}{c}$
.
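For a concrete instance (assuming, purely for illustration, that the uniform kernel is admissible under Assumption 1.3), take $\xi = \omega _{\mathrm{d}}^{-1}\,{\unicode{x1D7D9}}_{B_{\mathrm{d}}(0,1)}$, writing $\omega _{\mathrm{d}}$ for $\mathcal L^d(B_{\mathrm{d}}(0,1))$, so that $c = \omega _{\mathrm{d}}^{-1}$. Then
\begin{equation*} \mathbb{P}(x'\in A \,:\, x' \sim \mathfrak p_{x,\varepsilon }) = \frac{\mathcal L^d(A\cap B_{\mathrm{d}}(x,\varepsilon ))}{\omega _{\mathrm{d}}\varepsilon ^d} \gt p \quad \Longleftrightarrow \quad \mathcal L^d(A\cap B_{\mathrm{d}}(x,\varepsilon )) \gt p\,\omega _{\mathrm{d}}\,\varepsilon ^d, \end{equation*}
so Assumption 4.4 holds with $\beta = p\,\omega _{\mathrm{d}}$, and the requirement $0\lt \beta \lt \omega _{\mathrm{d}}$ corresponds exactly to $p\in (0,1)$.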
Since the probabilistic adversarial training problem (5) satisfies the requisite assumptions, we can state the following convergence result.
Theorem 4.16. Suppose
$\rho _0, \rho _1$
are continuous and bounded from above on
$\mathbb{R}^d$
and fix
$p \in [0,1)$
. Additionally, suppose
$\{A_{\varepsilon _i,p}\}_{i\in \mathbb{N}}$
is a sequence of minimizers of the probabilistic adversarial training problem (5) with
$\varepsilon _i \to 0^+$
as
$i\to \infty$
. For any compact set
$K\subset \mathbb{R}^d$
, there exist sequences
$\{N^{\min }_i\}_{i\in \mathbb{N}}$
and
$\{N^{\max }_i\}_{i\in \mathbb{N}}$
of measure zero sets such that
and
When Assumption 3.11 holds, Theorem 4.16 asserts that
where
$A_0$
is the unique Bayes classifier. Applying Corollary 4.13 in this case, we find that the minimizers for the probabilistic training problem (5) converge to the Bayes classifier at the rate
$O(\varepsilon ^{\frac {1}{d+2}})$
.
We conclude the discussion of the probabilistic adversarial training problem (5) by commenting on why this result fails to extend to the
$\Psi$
-perimeter problem mentioned in Remark 1.4.
Remark 4.17. Recall that [4] considers the
$\Psi$
adversarial training problem
where the
$\Psi$
-perimeter is given by
where
$\Psi \,:\,[0,1]\to [0,1]$
is concave and non-decreasing. As opposed to the probabilistic adversarial training problem (5), the existence of minimizers to (17) has been established.
However, notice that the
$\Psi$
-perimeter cannot be expressed as
$w_0\rho _0(\Lambda _\Psi ^0(A)) + w_1\rho _1(\Lambda _\Psi ^1(A))$
if
$\Psi$
is concave and non-decreasing as indicator functions are not concave. The
$\Psi$
-perimeter is an example of a data-adapted perimeter from the literature that cannot be represented via the deterministic attack framework. At present, whether the energy exchange inequality holds for the
$\Psi$
-perimeter remains an open question and proving this inequality for the
$\Psi$
-perimeter would be a promising first step towards showing uniform convergence of minimizers of (17).
5. Conclusion
In this paper, we developed a unifying framework for the adversarial and probabilistic adversarial training problems to define more generalized adversarial attacks. Under natural set-algebraic assumptions, we derived the energy exchange inequality to quantify the effect of removing a set where a given label was energetically preferable from a minimizer. Utilizing the energy exchange inequality to show that there exist minimizers disjoint from sets where the label 0 is strongly preferred energetically, we then proved uniform convergence in the Hausdorff distance for various adversarial attacks. This significantly strengthens the type of convergence established via
$\Gamma$
-convergence techniques [Reference Bungert and Stinson7], as well as generalizing it to a broader class of adversarial attacks. Finally, we derived the rate of convergence based on our proof techniques.
There are various future directions of research suggested by our results in this paper. First, the uniform convergence results increase the information that we have about minimizers and sequences of approximate minimizers. That information may be useful in establishing regularity results about minimizers, for example, in the case of the adversarial training problem (3), or may provide helpful information for proving existence in the generalized case. A different avenue of research to pursue would be to sharpen the convergence rates found in this paper by improving estimates from Lemmas 3.3 and 4.6 to determine whether the formally derived rate of
$O(\varepsilon )$
can be achieved. Finally, one could consider how to expand the theoretical deterministic attack function framework to encapsulate other types of adversarial training problems, such as the
$\Psi$
adversarial training problem (17).
Acknowledgements
The authors would like to thank the reviewers for their insightful comments, in particular one reviewer who offered a simplified proof of Lemma 3.3.
Funding statement
The authors gratefully acknowledge the support of the NSF DMS 2307971 and the Simons Foundation TSM.
Competing interest
The authors declare none.
Appendix A
A.1. The
$U$
sets for
$\phi _\varepsilon$
In Remark 2.4, we claim that further conclusions about the
$U$
sets may be drawn when
$\phi = \phi _\varepsilon$
. We will now verify these claims. We consider only the cases where whether the entire set
$U_i$
is attacked cannot be unambiguously determined by
$\Lambda$
-monotonicity (see Table 1). In all of the following cases, we assume that the interaction of
$A,E$
is nontrivial in the sense that
$A\cap E$
and
$A^{\mathsf{c}} \cup E$
are both nonempty. Otherwise, the following sets will either be empty themselves or we trivially find
$\mathrm{d}(x,\emptyset )=\infty$
.

Figure A1. A degenerate example where
$U_6$
and
$U_9$
are neither solely attacked nor unattacked sets. The example arises because the boundaries of
$A$
and
$B_{\mathrm{d}}(R)$
coincide. The pink and purple sets represent the
$\varepsilon$
-perimeter regions of
$A$
, whereas the blue and purple regions represent the
$\varepsilon$
-perimeter regions for
$A\setminus \overline {B_{\mathrm{d}}(R)}$
.
Proposition A.1. Let
$\phi = \phi _\varepsilon$
and
$A,E\in \mathcal B(\mathbb{R}^d)$
. Then,
$\widehat U_1 = \emptyset$
.
Proof. Suppose
$x\in U_1\subset A\cap E^{\mathsf{c}}$
. By construction, we have
$\mathrm{d}(x,A^{\mathsf{c}}) \ge \varepsilon$
and
$\mathrm{d}(x,E)\ge \varepsilon$
. This implies that
$\mathrm{d}(x,A^{\mathsf{c}} \cup E) = \mathrm{d}(x,(A\setminus E)^{\mathsf{c}})\ge \varepsilon$
as well. Thus,
$\widehat U_1 = \emptyset$
.
Proposition A.2. Let
$\phi = \phi _\varepsilon$
and
$A,E\in \mathcal B(\mathbb{R}^d)$
. Then,
$\widetilde U_3 = \emptyset$
.
Proof. Suppose
$x\in U_3\subset A\cap E$
. By construction, we have
$\mathrm{d}(x,A^{\mathsf{c}}) \ge \varepsilon$
and
$\mathrm{d}(x,E^{\mathsf{c}})\lt \varepsilon$
. As
$\mathrm{d}(x,A^{\mathsf{c}}) \ge \varepsilon$
,
$B(x,\varepsilon )\subset A$
. Furthermore, as
$\mathrm{d}(x,E^{\mathsf{c}}) \lt \varepsilon$
,
$B(x,\varepsilon ) \cap E^{\mathsf{c}} \neq \emptyset$
. Thus, there exists some
$y \in B(x,\varepsilon )\cap E^{\mathsf{c}}\subset A\cap E^{\mathsf{c}}$
. Hence,
$\mathrm{d}(x,A\setminus E) \lt \varepsilon$
, so
$\widetilde U_3 = \emptyset$
.
Proposition A.3. Let
$\phi = \phi _\varepsilon$
and
$A,E\in \mathcal B(\mathbb{R}^d)$
. Then,
$\widetilde U_{10} = \emptyset$
.
Proof. Suppose
$x\in U_{10}$
. By construction, we have
$\mathrm{d}(x,A)\lt \varepsilon$
and
$\mathrm{d}(x,E)\ge \varepsilon$
. As
$\mathrm{d}(x,E)\ge \varepsilon$
,
$B(x,\varepsilon ) \subset E^{\mathsf{c}}$
. Furthermore, as
$\mathrm{d}(x,A) \lt \varepsilon$
,
$B(x,\varepsilon ) \cap A \neq \emptyset$
. Thus, there exists some
$y \in B(x,\varepsilon )\cap A \subset A\cap E^{\mathsf{c}}$
. Hence,
$\mathrm{d}(x,A\setminus E) \lt \varepsilon$
, so
$\widetilde U_{10} = \emptyset$
.
As for
$U_6, U_9$
and
$U_{11}$
, we can make no determinations about whether all points in these sets must be attacked or not. Figure 2 shows an example where
$U_{11}$
must be split into attacked and unattacked subsets. In special cases where the boundaries of the sets
$A$
and
$E$
coincide,
$U_6$
and
$U_9$
may also need to be split into attacked and unattacked subsets (see Figure A1).
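As a sanity check of the purely metric implications underlying Propositions A.1–A.3 (offered only as an illustration; the ground set, the sets $A$ and $E$, and the budget below are hypothetical), the following Python sketch verifies on random finite configurations that: $\mathrm{d}(x,A^{\mathsf{c}})\ge \varepsilon$ and $\mathrm{d}(x,E)\ge \varepsilon$ imply $\mathrm{d}(x,(A\setminus E)^{\mathsf{c}})\ge \varepsilon$; $\mathrm{d}(x,A^{\mathsf{c}})\ge \varepsilon$ and $\mathrm{d}(x,E^{\mathsf{c}})\lt \varepsilon$ imply $\mathrm{d}(x,A\setminus E)\lt \varepsilon$; and $\mathrm{d}(x,A)\lt \varepsilon$ and $\mathrm{d}(x,E)\ge \varepsilon$ imply $\mathrm{d}(x,A\setminus E)\lt \varepsilon$. Complements are taken within the finite ground set.
import numpy as np

def dist_to(x, S):
    # Euclidean distance from the point x to the finite set S (infinite if S is empty).
    return np.inf if len(S) == 0 else np.linalg.norm(S - x, axis=1).min()

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(400, 2))   # finite ground set standing in for R^d
in_A = rng.random(400) < 0.5                # a random classifier A
in_E = np.linalg.norm(X, axis=1) < 0.5      # E: a ball about the origin
eps = 0.15

A, Ac = X[in_A], X[~in_A]
E, Ec = X[in_E], X[~in_E]
AmE, AmEc = X[in_A & ~in_E], X[~(in_A & ~in_E)]   # A \ E and its complement

for x in X:
    dA, dAc = dist_to(x, A), dist_to(x, Ac)
    dE, dEc = dist_to(x, E), dist_to(x, Ec)
    dAmE, dAmEc = dist_to(x, AmE), dist_to(x, AmEc)
    if dAc >= eps and dE >= eps:    # Proposition A.1: deep in A and far from E
        assert dAmEc >= eps
    if dAc >= eps and dEc < eps:    # Proposition A.2: deep in A but near E^c
        assert dAmE < eps
    if dA < eps and dE >= eps:      # Proposition A.3: near A but far from E
        assert dAmE < eps
print("all distance implications hold on this sample")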
A.2.
$\Lambda$-set decompositions
For completeness, we give further details about the decompositions by
$U$
sets in Proposition 2.5, namely
$\Lambda _\phi ^0(A\setminus E), \Lambda _\phi ^1(A\setminus E), A\cap E, A\setminus E, (A\setminus E)^{\mathsf{c}}, D_\phi (A;\,E),$
and
$D_\phi (E^{\mathsf{c}};\,A)$
. Table 1 is reproduced for ease of reference.
-
$\bullet$
$\Lambda _{\phi }^0(A\setminus E)$
is comprised of all
$U$
sets such that
$U_i\not \in A\setminus E$
and the points can be attacked by the adversary for the classifier
$A\setminus E$
. The
$U_i\not \in A\setminus E$
are all
$U$
sets such that the
$\Lambda$
-set for
$A$
has the superscript 0 or the
$\Lambda$
-set for
$E$
has the superscript 1, that is,
$U_3, U_4, U_5, U_6, U_9, U_{10}, U_{11},U_{12}$
and
$U_{13}$
. However,
$U_4, U_5, U_{12},$
and
$U_{13}$
are all unattacked by Table 1. Thus,
$\Lambda _{\phi }^0(A\setminus E)$
contains the attacked subsets of
$U_3, U_6, U_9, U_{10},$
and
$U_{11}$
. Hence,
\begin{equation*}\Lambda _{\phi }^0(A\setminus E) = \widehat U_3\cup \widehat U_6\cup \widehat U_9\cup \widehat U_{10} \cup \widehat U_{11}.\end{equation*}
-
$\bullet$
$\Lambda _{\phi }^1(A\setminus E)$
is comprised of all
$U$
sets such that
$U_i\in A\setminus E$
and the points can be attacked by the adversary for the classifier
$A\setminus E$
. The
$U_i\in A\setminus E$
are all
$U$
sets such that the
$\Lambda$
-set for
$A$
has the superscript 1 and the
$\Lambda$
-set for
$E$
has the superscript 0, that is,
$U_1, U_2, U_7,$
and
$U_8$
. By Table 1, the sets
$U_2, U_7$
and
$U_8$
belong entirely to
$\Lambda _{\phi }^1(A\setminus E)$
, so
$\Lambda _{\phi }^1(A\setminus E)$
contains those sets and the attacked subset of
$U_1$
. Hence,
\begin{equation*}\Lambda _{\phi }^1(A\setminus E) = \widehat U_{1}\cup U_2\cup U_7\cup U_8.\end{equation*}
-
$\bullet$
$A\cap E$
is comprised of all
$U$
sets such that the
$\Lambda$
-sets for
$A$
and
$E$
both have the superscript
$1$
. Hence,
\begin{equation*}A\cap E = U_3\cup U_4\cup U_5 \cup U_6.\end{equation*}
-
$\bullet$
$A\setminus E$
is comprised of all
$U$
sets such that the
$\Lambda$
-set for
$A$
has the superscript 1 and the
$\Lambda$
-set for
$E$
has the superscript
$0$
. Hence,
\begin{equation*}A\setminus E = U_1\cup U_2\cup U_7\cup U_8.\end{equation*}
-
$\bullet$
$(A\setminus E)^{\mathsf{c}}$
is comprised of all
$U$
sets not in
$A\setminus E$
, or alternatively, all
$U$
sets such that either the
$\Lambda$
-set for
$A$
has the superscript 0 or the
$\Lambda$
-set for
$E$
has the superscript
$1$
. Hence,
\begin{equation*}(A\setminus E)^{\mathsf{c}} = U_3\cup U_4\cup U_5\cup U_6\cup U_9\cup U_{10}\cup U_{11}\cup U_{12}\cup U_{13}.\end{equation*}
-
$\bullet$
Recall
$D_\phi (A;\,E) = w_0\rho _0(\Lambda _\phi ^0(A)\cap E) + w_1\rho _1(\Lambda _\phi ^1(A)\cap E)$
. The set
$E$
can be expressed in terms of
$\Lambda$
-sets by
$E =\Lambda _\phi ^1(E) \cup \tilde \Lambda _\phi ^1(E).$
By Table 1,
\begin{equation*} D_\phi (A;\,E)= w_0\rho _0(U_{11}\cup U_{12}) + w_1\rho _1(U_5\cup U_6).\end{equation*}
-
$\bullet$
Recall
$D_\phi (E^{\mathsf{c}};\,A) = w_0\rho _0(\Lambda _\phi ^1(E)\cap A) + w_1\rho _1(\Lambda _\phi ^0(E)\cap A)$
by the Complement Property of
$\Lambda$
-sets. The set
$A$
can be expressed in terms of
$\Lambda$
-sets by
$A = \Lambda _\phi ^1(A)\cup \tilde \Lambda _\phi ^1(A)$
. By Table 1,
\begin{equation*} D_\phi (E^{\mathsf{c}};\,A)= w_0\rho _0(U_{3}\cup U_{6}) + w_1\rho _1(U_2\cup U_7).\end{equation*}