With a machine learning approach and less focus on linguistic details, this gentle introduction to natural language processing develops fundamental mathematical and deep learning models for NLP under a unified framework. NLP problems are systematically organised by their machine learning nature, including classification, sequence labelling, and sequence-to-sequence problems. Topics covered include statistical machine learning and deep learning models, text classification and structured prediction models, generative and discriminative models, supervised and unsupervised learning with latent variables, neural networks, and transition-based methods. Rich connections are drawn between concepts throughout the book, equipping students with the tools needed to establish a deep understanding of NLP solutions, adapt existing models, and confidently develop innovative models of their own. Featuring a host of examples, intuition, and end of chapter exercises, plus sample code available as an online resource, this textbook is an invaluable tool for the upper undergraduate and graduate student.
Starting from this chapter, we discuss the main research topics of sentiment analysis and their state-of-the-art algorithms. Document sentiment classification (or document-level sentiment analysis) is perhaps the most extensively studied topic in the field of sentiment analysis so far, especially in its early days (see the surveys by Pang and Lee, 2008a; Liu, 2012). It aims to classify an opinion document (e.g., a product review) as expressing a positive or a negative opinion (or sentiment), which are called sentiment orientations or polarities. This task is referred to as document-level analysis because it considers each document as a whole and does not study entities or aspects inside the document or determine sentiments expressed about them. Arguably, this task is the one that popularized sentiment analysis research. Its limitations also motivated the fine-grained task of aspect-based sentiment analysis (Hu and Liu, 2004) (Chapters 5 and 6), which is widely used in practice today.
Opinion documents come in many different forms. So far, we have implicitly assumed that individual documents are independent of each other. In this chapter, we move on to two forms of social media contexts that involve extensive interactions among their participants and are also full of expressions of sentiments and opinions: debates/discussions and comments. The key characteristic of documents in such media is that they are not independent of each other, in contrast to stand-alone documents such as reviews and blog posts. The interactive exchanges and discussions among participants make these media forms much richer targets for analysis. Such interactions can be seen as relationships or links both among participants and among posts. Thus, we can not only perform sentiment analysis, as discussed in previous chapters, but also carry out other types of analyses that are characteristic of interactions – for example, grouping people into camps, discovering contentious issues of debates, mining agreement and disagreement expressions, discovering the arguing nature of pairs of participants, and so on. Because debates are exchanges of arguments and reasoning among participants who may be engaged in some kind of deliberation to achieve a common goal, it is interesting to study whether each participant in online debate forums gives reasoned arguments with justifiable claims via constructive debates, or whether a participant just exhibits dogmatism and egotistical clashes of ideologies. These tasks are important for many fields of social science, such as political science and communications. Central to these tasks are the sentiments of agreement and disagreement. These additional types of analyses are the focus of this chapter.
In this chapter, we discuss the quality of reviews. This topic is related to, yet different from, opinion spam detection: low-quality reviews may not be spam or fake reviews, and fake reviews may not be perceived as low-quality reviews by readers. Indeed, as we discussed in Chapter 12, it is very difficult to spot fake reviews simply by reading them. For this reason, fake reviews may also be seen as helpful or high-quality reviews if the imposters write their reviews early and craft them well.
As discussed in Chapter 2, in most sentiment analysis applications, one needs to study opinions from many people; due to the subjective nature of opinions, looking at only the opinion from a single person is usually not sufficient. To understand a large number of opinions, some form of summary is necessary. Definition 2.10 in Chapter 2 defined a structured opinion summary called aspect-based summary, also known as feature-based summary in the reports by Hu and Liu (2004) and Liu et al. (2005). Much of the opinion summarization research is based on this definition. This form of summary is also widely used in industry. For example, both Microsoft Bing and Google Product Search use aspect-based summary in their opinion analysis systems.
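One simple way to picture an aspect-based summary is as counts of positive and negative opinions for each aspect of each entity. The sketch below illustrates that reading only; it is not the book's Definition 2.10, and the function name `aspect_summary`, the tuple layout, and the sample opinions are hypothetical.

```python
from collections import Counter, defaultdict


def aspect_summary(opinions):
    """Count sentiments per (entity, aspect) pair.

    `opinions` is an iterable of (entity, aspect, sentiment) tuples.
    This layout is an assumption made for illustration.
    """
    summary = defaultdict(Counter)
    for entity, aspect, sentiment in opinions:
        summary[(entity, aspect)][sentiment] += 1
    # Convert to plain dicts so the result is easy to print and compare.
    return {key: dict(counts) for key, counts in summary.items()}


# Hypothetical sample data, not taken from any real review set.
opinions = [
    ("phone", "voice quality", "positive"),
    ("phone", "voice quality", "positive"),
    ("phone", "voice quality", "negative"),
    ("phone", "battery", "negative"),
]

summary = aspect_summary(opinions)
for (entity, aspect), counts in summary.items():
    print(entity, aspect, counts)
```

Keeping the counts per aspect, rather than one score per entity, is what lets a reader see which aspects drive the overall opinion.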
Apart from directly expressing positive or negative opinions about an entity and/or its aspects, one can also express opinions by comparing similar entities. Such opinions are called comparative opinions (Jindal and Liu, 2006a, 2006b). Comparative opinions have different semantic meanings from regular opinions as well as different syntactic forms. For example, a typical regular opinion sentence is “The voice quality of this phone is amazing,” and a typical comparative opinion sentence is “The voice quality of Moto X is better than that of iPhone 5.” This comparative sentence does not say that any phone’s voice quality is good or bad, but simply states a relative ordering in terms of voice quality of the two smartphones. Like regular sentences, comparative sentences can be opinionated or not-opinionated. The preceding comparative sentence is clearly opinionated because it explicitly expresses a comparative sentiment, while the sentence “Samsung Galaxy 4 is larger than iPhone 5” expresses no sentiment, at least not explicitly.
We almost always form the intent to perform an action before performing it. In many cases, we also talk about or write about our intents. Although the concept of intent has been investigated in philosophy and psychology, researchers in these fields are usually not concerned with the language used to state intent or how to infer intent from written language computationally, which is our objective in this chapter. The computational study of intent is just beginning, and our understanding of the problem remains limited.
This is the second chapter about aspect-based sentiment analysis. In Section 2.1.1, we defined each opinion as a quintuple (e, a, s, h, t), where e is an entity and a is one of its aspects, s is the sentiment about the aspect a, h is the opinion holder, and t is the time when the opinion is expressed. Chapter 5 focused on aspect-based sentiment classification, which determines s. This chapter addresses extraction of entities and aspects about which sentiments or opinions have been expressed.
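The quintuple (e, a, s, h, t) maps naturally onto a small record type. The following is a minimal sketch, assuming string-valued fields and a calendar date for t; the class name `Opinion`, the field names, and the example values are hypothetical and do not come from the book.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple (e, a, s, h, t)."""
    entity: str     # e: the entity the opinion is about
    aspect: str     # a: the aspect of the entity
    sentiment: str  # s: the sentiment about the aspect
    holder: str     # h: the opinion holder
    time: date      # t: when the opinion was expressed


# Hypothetical example values for illustration only.
op = Opinion(
    entity="Moto X",
    aspect="voice quality",
    sentiment="positive",
    holder="reviewer_1",
    time=date(2014, 1, 15),
)
print(op)
```

In these terms, aspect-based sentiment classification fills in `sentiment`, while the extraction task described here fills in `entity` and `aspect`.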
By now, it should be quite clear that words and phrases that convey positive or negative sentiment are instrumental for sentiment analysis. This chapter discusses how to compile such word lists. In the research literature, sentiment words are also called opinion words, polar words, or opinion-bearing words. Positive sentiment words such as beautiful, wonderful, and amazing are used to express some desired states or qualities, while negative sentiment words such as bad, awful, and poor are used to express some undesired states or qualities. In addition to individual words, there are sentiment phrases and idioms – for example, cost an arm and a leg. Collectively, they are called the sentiment lexicon (or opinion lexicon). From now on, when we say sentiment words, we mean both individual words and phrases.
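As a rough illustration of how such a lexicon might be stored and used, the sketch below keeps words and multiword phrases in one dictionary and sums their polarities over a text. The +1/-1 scoring, the negative label for the idiom, and the function names are assumptions for illustration, not a method taken from the book.

```python
import re

# Tiny illustrative lexicon built from the example words in the text.
# Polarity values (+1 / -1) are an assumed encoding.
LEXICON = {
    "beautiful": 1,
    "wonderful": 1,
    "amazing": 1,
    "bad": -1,
    "awful": -1,
    "poor": -1,
    "cost an arm and a leg": -1,  # assumption: idiom treated as negative
}


def find_sentiment_expressions(text):
    """Return (expression, polarity, count) for each lexicon entry found."""
    lowered = text.lower()
    hits = []
    for expression, polarity in LEXICON.items():
        # Word boundaries stop "bad" from matching inside "badge".
        pattern = r"\b" + re.escape(expression) + r"\b"
        count = len(re.findall(pattern, lowered))
        if count:
            hits.append((expression, polarity, count))
    return hits


def lexicon_score(text):
    """Sum polarities of all matched expressions (no negation handling)."""
    return sum(polarity * count
               for _, polarity, count in find_sentiment_expressions(text))


print(lexicon_score("The screen is beautiful but the battery is awful and poor."))
print(lexicon_score("It will cost an arm and a leg."))
```

Exact matching like this misses inflected forms such as "costs an arm and a leg" and ignores negation and context, so it is only a starting point.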
Following the natural progression of chapters, this chapter should focus on expression-level (word or phrase) sentiment classification, as the last two chapters were about document-level and sentence-level classifications. However, we leave that topic to Chapter 7. In this and the next chapter, we focus on aspect-based sentiment analysis (or opinion mining) to deal with the full sentiment analysis problem as defined in Section 2.1 – that is, classifying sentiments and extracting sentiment or opinion targets (entities and aspects).