So far we have explored classifiers with decision boundaries that are linear, or, in the case of the multiclass logistic regression, a combination of linear segments. In this chapter, we will expand what we have learned so far to classifiers that are capable of learning nonlinear decision boundaries. The classifiers that we will discuss here are called feed-forward neural networks, and are a generalization of both logistic regression and the perceptron. Despite the more complicated structures presented, we show that the key building blocks remain the same: the network is trained by minimizing a cost function. This minimization is implemented with backpropagation, which adapts the gradient descent algorithm introduced in the previous chapter to multilayer neural networks.
This chapter covers the perceptron, the simplest neural network architecture. In general, neural networks are machine learning architectures loosely inspired by the structure of biological brains. The perceptron is the simplest example of such architectures: it contains a single artificial neuron. The perceptron will form the building block for the more complicated architectures discussed later in the book. However, rather than starting directly with the discussion of this algorithm, we will start with something simpler: a children’s book and some fundamental observations about machine learning. From these, we will formalize our first machine learning algorithm, the perceptron.
In this chapter, we provide an implementation of the multilayer neural network described in Chapter 5, along with several of the best practices discussed in Chapter 6. Still keeping things fairly simple, our network will consist of two fully connected layers: a hidden layer and an output layer. Between these layers, we will include dropout and a nonlinearity. Further, we make use of two PyTorch classes: a Dataset and a DataLoader. The advantage of using these classes is that they make several things easy, including data shuffling and batching. Last, since the classifier’s architecture has become more complex, for optimization we transition from stochastic gradient descent to the Adam optimizer to take advantage of its additional features such as momentum and L2 regularization.
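The components named above can be sketched roughly as follows. This is a minimal illustration, not the book's implementation: the class names `ToyDataset` and `TwoLayerClassifier`, the random data, the layer sizes, the dropout rate, the ReLU nonlinearity, the ordering of dropout and nonlinearity, and the optimizer hyperparameters are all assumptions made for the example.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Wraps feature and label tensors so a DataLoader can shuffle and batch them."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


class TwoLayerClassifier(nn.Module):
    """Hidden layer -> dropout -> nonlinearity -> output layer (ordering is an assumption)."""

    def __init__(self, input_dim, hidden_dim, output_dim, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),   # hidden layer
            nn.Dropout(dropout),                # randomly zeroes activations during training
            nn.ReLU(),                          # nonlinearity (assumed; others are possible)
            nn.Linear(hidden_dim, output_dim),  # output layer producing class scores
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
features = torch.randn(64, 10)        # 64 made-up examples with 10 features each
labels = torch.randint(0, 2, (64,))   # made-up binary labels

# The DataLoader handles shuffling and batching.
loader = DataLoader(ToyDataset(features, labels), batch_size=16, shuffle=True)

model = TwoLayerClassifier(input_dim=10, hidden_dim=8, output_dim=2)
loss_fn = nn.CrossEntropyLoss()
# weight_decay adds an L2 penalty; Adam's momentum-like behavior comes from its betas.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

model.train()
for x, y in loader:                   # one pass over the data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                   # backpropagation
    optimizer.step()
```

Calling `model.eval()` before prediction switches dropout off, so the same input then yields the same output on every call.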
This chapter motivates the need for a book that covers both theoretical and practical aspects of deep learning for natural language processing. We summarize the content of the book, as well as aspects that are not within scope, and current limitations of deep learning in general.
In Chapters 10 and 12, we focused on two common usages of recurrent neural networks and transformer networks: acceptors and transducers. In this chapter, we discuss a third architecture for both recurrent neural networks and transformer networks: encoder-decoder methods. We introduce three encoder-decoder architectures, which enable important NLP applications such as machine translation. In particular, we discuss the sequence-to-sequence method of Sutskever et al. (2014), which couples an encoder long short-term memory with a decoder long short-term memory. We follow this method with the approach of Bahdanau et al. (2015), which extends the previous decoder with an attention component that produces a different encoding of the source text for each decoded word. Last, we introduce the complete encoder-decoder transformer network, which relies on three attention mechanisms: one within the encoder (which we discussed in Chapter 12), a similar one that operates over decoded words, and, importantly, an attention component that connects the input words with the decoded ones.
As mentioned in Chapter 8, the distributional similarity algorithms discussed there conflate all senses of a word into a single numerical representation (or embedding). For example, the word bank receives a single representation, regardless of its financial (e.g., as in the bank gives out loans) or geological (e.g., bank of the river) sense. This chapter introduces a solution for this limitation in the form of a new neural architecture called transformer networks, which learn contextualized embeddings of words that, as the name indicates, change depending on the context in which the words appear. That is, the word bank receives a different numerical representation for each of its instances in the two texts above because the contexts in which they occur are different. We also discuss several architectural choices that enabled the tremendous success of transformer networks: self-attention, multiple heads, stacking of multiple layers, and subword tokenization, as well as how transformers can be pretrained on large amounts of data through masked language modeling and next-sentence prediction.
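To make the idea of contextualized embeddings concrete, here is a toy sketch of a single self-attention step in plain Python. It is heavily simplified and rests on assumptions: the two-dimensional `STATIC` vectors are invented, and the queries, keys, and values are the input vectors themselves, whereas a real transformer uses learned projections, multiple heads, stacked layers, and position information.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections
    (queries = keys = values = the input vectors; a simplifying assumption)."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Score the current word against every word in the text, itself included.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # The output is a weighted average of all the input vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs


# Made-up static (context-independent) embeddings, one per word.
STATIC = {"bank": [1.0, 1.0], "loans": [2.0, 0.0], "river": [0.0, 2.0]}


def contextual(tokens):
    """Return one context-dependent vector per token."""
    return self_attention([STATIC[t] for t in tokens])


bank_fin = contextual(["bank", "loans"])[0]  # "bank" next to "loans"
bank_geo = contextual(["bank", "river"])[0]  # "bank" next to "river"
```

Although "bank" starts from the same static vector in both texts, its output vector is pulled toward "loans" in one case and toward "river" in the other, so the two instances end up with different representations.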
Up to this point, we have only discussed neural approaches for text classification (e.g., review and news classification) that handle the text as a bag of words. That is, we aggregate the words either by representing them as explicit features in a feature vector or by averaging their numerical representations (i.e., embeddings). Although this strategy completely ignores the order in which words occur in a sentence, it has been repeatedly shown to be a good solution for many practical natural language processing applications that are driven by text classification. Nevertheless, for many natural language processing tasks such as part-of-speech tagging, we need to capture the word-order information more explicitly. Sequence models capture exactly this scenario, where classification decisions must be made using not only the current information but also the context in which it appears. In particular, we discuss several types of recurrent neural networks, including stacked (or deep) recurrent neural networks, bidirectional recurrent neural networks, and long short-term memory networks. Last, we introduce conditional random fields, which extend recurrent neural networks with an extra layer that explicitly models transition probabilities between two cells.
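The following sketch illustrates why a bag-of-words representation loses word order. It uses plain Python and a made-up `EMBEDDINGS` table; the vectors and the helper name `average_embedding` are assumptions chosen only for the example.

```python
# Toy 2-dimensional embeddings; the values are invented for illustration.
EMBEDDINGS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [0.5, 0.5],
}


def average_embedding(tokens):
    """Average the embeddings of the known tokens, ignoring their order."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known tokens to average")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


a = average_embedding("dog bites man".split())
b = average_embedding("man bites dog".split())
print(a == b)  # True: both sentences collapse to the same vector
```

The two sentences mean different things, yet a classifier that sees only the averaged vector cannot tell them apart. That is the gap sequence models are meant to close.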
Deep Learning is becoming increasingly important in a technology-dominated world. However, the building of computational models that accurately represent linguistic structures is complex, as it involves an in-depth knowledge of neural networks, and the understanding of advanced mathematical concepts such as calculus and statistics. This book makes these complexities accessible to those from a humanities and social sciences background, by providing a clear introduction to deep learning for natural language processing. It covers both theoretical and practical aspects, and assumes minimal knowledge of machine learning, explaining the theory behind natural language in an easy-to-read way. It includes pseudo code for the simpler algorithms discussed, and actual Python code for the more complicated architectures, using modern deep learning libraries such as PyTorch and Hugging Face. Providing the necessary theoretical foundation and practical tools, this book will enable readers to immediately begin building real-world, practical natural language processing systems.
Machine translation technology is finding increasing application in health and medical settings that involve communication and interaction between people from diverse linguistic and cultural backgrounds. Machine translation tools offer low-cost and accessible solutions to help close the gap in cross-lingual health communication. The risks of machine translation need to be effectively controlled and properly managed to boost confidence in this developing technology among health professionals. This study integrates the methodological benefits of machine learning into machine translation quality evaluation and, more importantly, into the prediction of clinically relevant machine translation errors based on the linguistic features of the English source texts.
Access to health-related information is vital in our society. Official health websites provide essential and beneficial information to the general population. In particular, they can represent a crucial public service when this information is fundamental to fight against a new health threat – such as the COVID-19 pandemic. Yet, for these websites to achieve their ultimate informative goal, they need to ensure that their content is accessible to all users, especially to people with disabilities. Many of these websites – especially those from institutions operating in multilingual countries – offer their content in several languages, which, by definition, is an accessibility best practice. However, the level of accessibility achieved might not always be the same in all the language versions available. In fact, previous studies focusing on other types of multilingual websites have shown that localized versions are usually less accessible than the original ones. In this chapter, we present a research study that involved the examination of seventy-four official multilingual health sites to understand the current situation in terms of accessibility compliance. In particular, the home pages in two languages – English, original version, and Spanish, localized version – were checked against two specific success criteria (SC) from the Web Content Accessibility Guidelines (WCAG) current standard, using both automatic and manual evaluation methods. We observed that although overall accessibility scores were similar, the localized pages obtained worse results in the two SC analyzed more in depth – that is, language and title of the page. We contend that this finding could be explained by a lack of accessibility awareness or knowledge of those participating in the localization process. We thus advocate the existence of web professionals with an interdisciplinary background that could create multilingual accessible sites, providing an inclusive web experience for all.
In emergency care settings, there is a crucial need for automated translation tools. We focus here on the BabelDr system, a speech-enabled fixed-phrase translator used to improve communication in emergency settings between doctors and allophone patients. The aim of the chapter is two-fold. First, we will assess if a bidirectional version of the phraselator allowing patients to answer doctors’ questions by selecting pictures from open-source databases will improve user satisfaction. Second, we wish to evaluate pictograph usability in this context. Our hypotheses are that images will in fact help to improve patient satisfaction and that multiple factors influence pictograph usability. Factors of interest include not only the comprehensibility of the pictographs per se, but also how the images are presented to the user with respect to their number and ordering. We showed that most respondents prefer to use the interface with pictographs and that multiple factors influence participants’ ability to find a pictograph based on a written form, but that the comprehensibility of the individual pictographs is probably the most important.