In this chapter, we introduce attacks (threats) against machine learning. A primary aim of an attack is to cause the neural network to make errors. An attack may target the training dataset (its integrity or privacy), the training (deep learning) process, or the parameters of the DNN once trained. Alternatively, an attack may target vulnerabilities by discovering test samples that produce erroneous output. The attacks include: (i) test-time evasion attacks (TTEs), which make subtle changes to a test pattern that cause the classifier’s decision to change; (ii) data poisoning attacks, which corrupt the training set to degrade the accuracy of the trained model; (iii) backdoor attacks, a special case of data poisoning in which a subtle (backdoor) pattern is embedded into some training samples whose supervising labels are altered, so that the classifier learns to misclassify to a target class whenever the backdoor pattern is present; (iv) reverse-engineering attacks, which query a classifier to learn its decision-making rule; and (v) membership inference attacks, which seek information about the training set from queries to the classifier. Defenses aim to detect attacks and/or to proactively improve the robustness of machine learning. An overview is given of the three main attack types (TTEs, data poisoning, and backdoors) investigated in subsequent chapters.
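To make the notion of a test-time evasion attack concrete, the following is a minimal sketch of a one-step gradient-sign perturbation in PyTorch; the model, loss, and perturbation budget `eps` are illustrative assumptions, not a specific attack studied in later chapters.

```python
import torch
import torch.nn.functional as F

def fgsm_tte(model, x, y_true, eps=0.03):
    """One-step test-time evasion sketch: perturb x slightly so the classifier errs.

    model: a trained classifier (assumed to return logits); x: input batch in [0, 1];
    y_true: ground-truth labels; eps: per-pixel perturbation budget (illustrative).
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)
    loss.backward()
    # Move each pixel a small step in the direction that increases the loss.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```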
In this chapter, we focus on before/during-training backdoor defense, where the defender is also the training authority, with control of the training process and responsibility for providing an accurate, backdoor-free DNN classifier. Deploying a backdoor defense during training is supported by the fact that the training authority usually has more computational and storage resources than a downstream user of the trained classifier. Moreover, before/during-training detection could be easier than post-training detection because the defender has access to the (possibly poisoned) training set and, thus, to samples that contain the backdoor pattern. However, before/during-training detection is still highly challenging because it is unknown whether there is poisoning and, if so, which subset of samples (among many possible subsets) is poisoned. A detailed review of backdoor attacks (Trojans) is given, and an optimization-based reverse-engineering defense for training set cleansing, deployed before/during classifier training, is described. The defense is designed to detect backdoor attacks that use a human-imperceptible backdoor pattern, as widely considered in existing attacks and defenses. Detection of training set poisoning is achieved by reverse engineering (estimating) the pattern of a putative backdoor attack, considering each class as the possible target class of an attack.
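The following is a minimal sketch of the reverse-engineering idea: for a putative target class, estimate a small additive pattern that pushes training samples toward that class; an unusually small estimated pattern (relative to other classes) is evidence of an attack. The optimizer, the size penalty `lam`, and all other hyperparameters are illustrative assumptions, not the chapter's exact algorithm.

```python
import torch
import torch.nn.functional as F

def reverse_engineer_pattern(model, loader, target_class, n_steps=200, lam=1e-2, lr=0.1):
    """Estimate a small additive pattern that pushes a batch of training samples
    toward target_class.  `lam` trades off misclassification to the putative target
    class against pattern size; all hyperparameters here are illustrative.
    """
    x0, _ = next(iter(loader))                            # a batch from the (possibly poisoned) training set
    v = torch.zeros_like(x0[:1], requires_grad=True)      # one shared pattern, broadcast over the batch
    opt = torch.optim.Adam([v], lr=lr)
    y_t = torch.full((x0.shape[0],), target_class, dtype=torch.long)
    for _ in range(n_steps):
        opt.zero_grad()
        logits = model((x0 + v).clamp(0.0, 1.0))
        # Encourage misclassification to the putative target class, penalize pattern size.
        loss = F.cross_entropy(logits, y_t) + lam * v.norm()
        loss.backward()
        opt.step()
    return v.detach(), v.detach().norm().item()
```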
Previous chapters exclusively considered attacks against classifiers. In this chapter, we devise a backdoor attack and defense for deep regression or prediction models. Such models may be used, for example, to predict housing prices in an area given measured features, to estimate a city’s power consumption on a given day, or to price financial derivatives (where they replace complex equation solvers and vastly improve the speed of inference). The developed attack is made most effective by surrounding poisoned samples (with their mis-supervised target values) with clean samples, in order to localize the attack and thus make it evasive against detection. The developed defense uses a form of query-by-synthesis active learning that trades off depth (local error maximizers) and breadth of search. Both the developed attack and defense are evaluated for an application domain involving the pricing of a simple (single-barrier) financial option.
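A toy sketch of the attack's localization idea for a regression training set is given below; the trigger region, the target shift, and the poisoning fraction are invented for illustration and are not the chapter's specific construction.

```python
import numpy as np

def localized_regression_poison(X, y, center, radius=0.05, target_shift=0.2, frac=0.01):
    """Poison a small ball of feature space around `center`: shift the regression
    targets of a few samples inside the ball, leaving the surrounding clean samples
    intact so the attack stays local and hard to detect.  All parameters illustrative.
    """
    rng = np.random.default_rng(0)
    dist = np.linalg.norm(X - center, axis=1)
    inside = np.where(dist < radius)[0]
    n_poison = max(1, int(frac * len(X)))
    poisoned = rng.choice(inside, size=min(n_poison, len(inside)), replace=False)
    y_poisoned = y.copy()
    y_poisoned[poisoned] += target_shift          # mis-supervised target values
    return X, y_poisoned, poisoned
```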
In this chapter we describe unsupervised post-training defenses that do not make explicit assumptions regarding the backdoor pattern or how it was incorporated into clean samples. These backdoor defenses aim to be “universal.” They do not produce an estimate of the backdoor pattern (which may be valuable information as the basis for detecting backdoor triggers at test time, the subject of Chapter 10). We start by describing a universal backdoor detector that does not require any clean labeled data. This approach optimizes over the input image to the DNN, seeking the input that yields the maximum margin (for each putative target class of an attack). The premise here, under a winner-take-all decision rule, is that backdoors produce much larger classifier margins than those of un-attacked examples. Then a universal backdoor mitigation strategy is described that does leverage a small clean dataset. This optimizes a threshold (tamping down unusually large ReLU activations) for each neuron in the network. In each backdoor attack scenario described, different detection and mitigation strategies are compared, where some mitigation strategies are also known as “unlearning” defenses. Some universal backdoor defenses modify or augment the DNN itself, while others do not.
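A minimal sketch of the maximum-margin detection statistic is given below: for each putative target class, the input itself is optimized to maximize the classification margin, with no clean data required; the initialization, optimizer, and step count are illustrative assumptions, not the chapter's exact procedure.

```python
import torch

def max_margin_statistic(model, target_class, input_shape, n_steps=300, lr=0.05):
    """Estimate the maximum achievable classification margin for one putative target
    class by optimizing over the input (no clean data needed).  An unusually large
    maximum margin, relative to the other classes, is evidence of a backdoor.
    Initialization and hyperparameters are illustrative.
    """
    x = torch.rand(1, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        logits = model(x.clamp(0.0, 1.0))[0]
        others = torch.cat([logits[:target_class], logits[target_class + 1:]])
        margin = logits[target_class] - others.max()
        (-margin).backward()          # gradient ascent on the margin
        opt.step()
    with torch.no_grad():
        logits = model(x.clamp(0.0, 1.0))[0]
        others = torch.cat([logits[:target_class], logits[target_class + 1:]])
        return (logits[target_class] - others.max()).item()
```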
In this chapter we focus on post-training defense against backdoor data poisoning (Trojans). The defender has access to the trained DNN but not to the training set. The following are examples. (i) Proprietary: a customized DNN model purchased by a government or a company without data rights and without training set access. (ii) Legacy: the data is long forgotten or not maintained. (iii) Cell phone apps: the user has no access to the training set for the app’s classifier. It is also assumed that a clean labeled dataset (no backdoor poisoning) is available, with a small number of examples from each of the classes of the domain. This clean labeled dataset is insufficient for retraining, and its small size makes its availability a reasonable assumption. Reverse-engineering defenses (REDs) are described, including one that estimates putative backdoor patterns for each candidate (source class, target class) backdoor pair and then assesses an order-statistic p-value on the sizes of these perturbations. This approach is successful at detecting subtle backdoor patterns, including sparse patterns involving few pixels and global patterns where many pixels are modified subtly. A computationally efficient variant is presented. The method addresses additive backdoor embeddings as well as other embedding functions.
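The following sketch illustrates one way such an order-statistic p-value might be computed from the estimated perturbation sizes; the Gamma null model and detection threshold are illustrative choices rather than the chapter's exact formulation.

```python
import numpy as np
from scipy import stats

def order_statistic_pvalue(pert_sizes, alpha=0.05):
    """Given reverse-engineered perturbation sizes, one per candidate
    (source class, target class) pair, test whether the smallest is anomalously small.
    Works with the reciprocal sizes, excludes the largest reciprocal, fits a Gamma
    null to the rest, and computes an order-statistic p-value for the excluded
    maximum.  The Gamma null and the threshold alpha are illustrative choices.
    """
    r = 1.0 / np.asarray(pert_sizes, dtype=float)
    r_max = r.max()
    rest = np.delete(r, r.argmax())
    shape, loc, scale = stats.gamma.fit(rest, floc=0.0)
    # P[max of len(r) null draws >= r_max] under the fitted null distribution.
    p_single = stats.gamma.sf(r_max, shape, loc=loc, scale=scale)
    p_value = 1.0 - (1.0 - p_single) ** len(r)
    return p_value, p_value < alpha   # (p-value, attack detected?)
```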