Students will develop a practical understanding of data science with this hands-on textbook for introductory courses. This new edition is fully revised and updated, with numerous exercises and examples in the popular data science tool Python, a new chapter on using Python for statistical analysis, and a new chapter that demonstrates how to use Python within a range of cloud platforms. The many practice examples, drawn from real-life applications, range from small to big data and come to life in a new end-to-end project in Chapter 11. New 'Data Science in Practice' boxes highlight how concepts introduced work within an industry context and many chapters include new sections on AI and Generative AI. A suite of online material for instructors provides a strong supplement to the book, including lecture slides, solutions, additional assessment material and curriculum suggestions. Datasets and code are available for students online. This entry-level textbook is ideal for readers from a range of disciplines wishing to build a practical, working knowledge of data science.
Students will develop a practical understanding of data science with this hands-on textbook for introductory courses. This new edition is fully revised and updated, with numerous exercises and examples in the popular data science tool R, a new chapter on using R for statistical analysis, and a new chapter that demonstrates how to use R within a range of cloud platforms. The many practice examples, drawn from real-life applications, range from small to big data and come to life in a new end-to-end project in Chapter 11. New 'Data Science in Practice' boxes highlight how concepts introduced work within an industry context and many chapters include new sections on AI and Generative AI. A suite of online material for instructors provides a strong supplement to the book, including lecture slides, solutions, additional assessment material and curriculum suggestions. Datasets and code are available for students online. This entry-level textbook is ideal for readers from a range of disciplines wishing to build a practical, working knowledge of data science.
Quantifying differences between flow fields is a key challenge in fluid mechanics, particularly when evaluating the effectiveness of flow control or other problem parameters. Traditional vector metrics, such as the Euclidean distance, provide straightforward pointwise comparisons but can fail to distinguish distributional changes in flow fields. To address this limitation, we employ optimal transport (OT) theory, which is a mathematical framework built on probability and measure theory. By aligning Euclidean distances between flow fields in a latent space learned by an autoencoder with the corresponding OT geodesics, we seek to learn low-dimensional representations of flow fields that are interpretable from the perspective of unbalanced OT. As a demonstration, we apply this OT-based analysis to separated flows past a NACA 0012 airfoil with periodic heat flux actuation near the leading edge. The cases considered are at a chord-based Reynolds number of 23 000 and a free-stream Mach number of 0.3 for two angles of attack (AoA) of $6^\circ$ and $9^\circ$. For each angle of attack, we identify a two-dimensional embedding that succinctly captures the different effective regimes of flow responses and control performance, characterised by the degree of suppression of the separation bubble and secondary effects from laminarisation and trailing-edge separation. The interpretation of the latent representation is consistent across the two AoA, suggesting that the OT-based latent encoding is capable of extracting physical relationships that are common across the different suites of cases. This study demonstrates the potential utility of optimal transport in the analysis and interpretation of complex flow fields.
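The alignment idea in this abstract can be sketched in one dimension (a minimal illustration only: the study uses unbalanced OT on two-dimensional compressible flow fields and trains an autoencoder, none of which is reproduced here; `wasserstein_1d` and `alignment_loss` are hypothetical names, not from the paper):

```python
import numpy as np

def wasserstein_1d(u, v):
    # 1-D optimal transport (Wasserstein-1) distance between two
    # equal-size samples: sort both and average the displacement.
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

def alignment_loss(latent, fields):
    # Penalise mismatch between Euclidean distances in the latent
    # space and pairwise OT distances between the original fields.
    n = len(fields)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_latent = np.linalg.norm(latent[i] - latent[j])
            d_ot = wasserstein_1d(fields[i], fields[j])
            loss += (d_latent - d_ot) ** 2
    return loss / (n * (n - 1) / 2)
```

In this toy form, an encoder whose latent coordinates reproduce the pairwise OT distances drives `alignment_loss` to zero, which is the sense in which the learned embedding becomes "interpretable from the perspective of OT."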
This introductory article to Democratic Theory's special issue on the marginalized democracies of the world begins by presenting the lexical method for understanding democracy. It is argued that the lexical method is better than the normative and analytical methods at finding democracies in the world. The argument then turns to demonstrating, mainly through computational research conducted within the Google Books catalog, that an empirically demonstrable imbalance exists between the democracies mentioned in the literature. The remainder of the argument is devoted to explaining the value of working to correct this imbalance, which comes in at least three guises: (1) studying marginalized democracies can increase our options for alternative democratic actions and democratic innovations; (2) it leads to a conservation and public outreach project, which is epitomized in an “encyclopedia of the democracies”; and (3) it advocates for a decolonization of democracies’ definitions and practices and a decentering of academic democratic theory.
Cross-border philanthropy occurs across multiple dimensions simultaneously. Seemingly domestic actors become players in international spheres, shattering the idea of a domestic/international dichotomy with clear lines delineating these spaces. This line blurring obscures monetary flows and raises questions about nonprofit accountability in a transnational context. We present a study tracking money from US INGOs to Israeli NGOs, demonstrating the advantages and challenges of a big data approach and highlighting the importance of local partners.
Use of big data in the nonprofit sector is on the rise as a part of a trend toward “data-driven” management. While big data has its critics, few have addressed the fundamental ontological and epistemological issues big data presents for the nonprofit sector. In this article, we address some of these issues, including, most prominently, the notion that big data are value neutral and divorced from context. Drawing on data feminism, an intersectional feminist framework focusing on critically interrogating our experience with data and data-driven technologies, we examine the power differentials inherent in the construction of big data and challenge the claims, priorities, and inequities it produces specifically for nonprofit work. We conclude the article with a call for nonprofit scholars and practitioners to employ a data feminist framework to harness the power of big (and small) data for justice, equity, and co-liberation through nonprofit work.
In 2010 Milja Kurki explained that although scholars recognize that democracy is described in a variety of ways, they do not typically engage with its many and diverse descriptions. My aim in this agenda-setting research note is to tackle this quandary by first providing a minimum empirical account of democracy’s descriptions (i.e., a catalogue of 2,234 adjectives that have been used to describe democracy) and secondly by suggesting what democracy studies may gain by compiling this information. I argue that the catalogue of descriptors can be applied in four ways: (1) drilling down into the meaning of each description, (2) making taxonomies, (3) rethinking the phenomenology of democracy, and (4) visualizing democracy’s big data. Each of the four applications and its significance is explained in turn. This research note ends by looking back on the catalogue and its four applications.
Over the last decade, the field of political science has been exposed to two concomitant developments: a surge of Big Data (BD) and a growing demand for transparency. To date, however, we do not know the extent to which these two developments are compatible with one another. The purpose of this article is to assess, empirically, the extent to which BD political science (broadly defined) adheres to established norms of transparency in the discipline. To address this question, we develop an original dataset of 1555 articles drawn from the Web of Science database covering the period 2008–2019. In doing so, we also provide an assessment of the current level of transparency in empirical political science and quantitative political science in general. We find that articles using Big Data are significantly less likely than other, more traditional works of political science to share replication files. Our study also illustrates some of the promises and challenges associated with extracting data from Web of Science and similar databases.
We extracted around two million vowel tokens from a sample of sixty-four speakers (b. 1886–1965; 35M/29F; 16 African Americans/48 non-African Americans) across eight states in the American South in an NSF-funded project. We have validated automatic measurements with manual inspection of alignment samples and find that 87 percent of alignments are successful and another 6 percent are partially successful. This large body of tokens (big data) complements existing sociophonetic research by providing a more thorough, detailed picture of the phonetics of American English. We find that (1) there is a much wider range of realization for vowels than is typically represented, and (2) there is no central tendency for any vowel. Using spatial methods drawn from technical geography, we find that all distributions of tokens in vowel space are nonlinear. This suggests that traditional reliance on finding average acoustic properties of a vowel may be unrepresentative of what most speakers actually do. (3) Distributional patterns for vowels are fractal. When we break up the overall dataset into subgroups (e.g., male/female), the same nonlinear distributional pattern appears but with varying locations of highest density of tokens. These findings complement existing sociophonetic research and demonstrate methods by which variation can both be represented and analyzed.
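The abstract's point that average acoustic values may misrepresent a vowel's distribution can be sketched by locating the region of highest token density in (F1, F2) vowel space with a two-dimensional histogram (a minimal stand-in: the study's actual spatial methods from technical geography are not specified here, and `density_peak` is a hypothetical helper, not code from the project):

```python
import numpy as np

def density_peak(f1, f2, bins=50):
    # Bin tokens in (F1, F2) vowel space and return the centre of
    # the bin with the highest token count: the mode-like location
    # of highest density, which need not coincide with the mean.
    counts, f1_edges, f2_edges = np.histogram2d(f1, f2, bins=bins)
    i, j = np.unravel_index(np.argmax(counts), counts.shape)
    return (0.5 * (f1_edges[i] + f1_edges[i + 1]),
            0.5 * (f2_edges[j] + f2_edges[j + 1]))
```

For a skewed or multimodal token cloud, the density peak and the arithmetic mean diverge, which is the sense in which averages "may be unrepresentative of what most speakers actually do."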
In Chilling Effects, Jonathon W. Penney explores the increasing weaponization of surveillance, censorship, and new technology to repress and control us. With corporations, governments, and extremist actors using big data, cyber-mobs, AI, and other threats to limit our rights and freedoms, concerns about chilling effects – or how these activities deter us from exercising our rights – have become urgent. Penney draws on law, privacy, and social science to present a new conformity theory that highlights the dangers of chilling effects and their potential to erode democracy and enable a more illiberal future. He critiques conventional theories and provides a framework for predicting, explaining, and evaluating chilling effects in a range of contexts. Urgent and timely, Chilling Effects sheds light on the repressive and conforming effects of technology, state, and corporate power, and offers a roadmap of how to respond to their weaponization today and in the future.
This chapter describes the excitement surrounding scientific progress as a driver of medical progress in the Cold War and subsequent theoretical and practical challenges. Medicine, for skeptical theories, was a powerful example that there is no such thing as knowledge that continually approaches the truth, that even the body is historical, and that knowledge is always a tool of the powerful. From the medical side, some respondents were adamant that scientific knowledge about the body is “real” and that medicine is uniquely immune to uncertainties inherent in relativistic accounts of knowledge. The chapter concludes by analyzing two recent examples, evidence-based medicine and health artificial intelligence, which have been praised as objective examples of a particular kind of medical knowledge progress. Throughout, I show the implications for medical progress of larger debates about the progress of knowledge, as well as how an excessive focus on biomedical knowledge gains neglects other, important dimensions of progress.
The digital transformation of Chinese companies offers a new frontier for organizational research. Widespread use of workplace platforms creates rich archives of unobtrusive data, providing continuous, real-time insights into organizational life that traditional surveys cannot capture. The central challenge for scholars is turning this data abundance into meaningful theory. This special issue highlights three studies that meet this challenge by using innovative methods to convert granular data into valuable knowledge. The papers employ digital-context experiments, real-time behavioral tracking, and machine-learning-assisted theory building to study phenomena from interpersonal dynamics to crisis productivity. Looking ahead, we explore the potential of unstructured multimodal data and new AI tools to make complex analysis more accessible. We conclude with a research agenda calling for methodological rigor, interdisciplinary collaboration, and a firm balance between technological innovation and theoretical depth.
This chapter assesses the potential of technological tools to ensure voluntary compliance without coercion and improve the predictability of trustworthiness, focusing on the ethical challenges such differentiation might create.
Cutting-edge computational tools like artificial intelligence, data scraping, and online experiments are leading to new discoveries about the human mind. However, these new methods can be intimidating. This textbook demonstrates how Big Data is transforming the field of psychology, in an approachable and engaging way that is geared toward undergraduate students without any computational training. Each chapter covers a hot topic, such as social networks, smart devices, mobile apps, and computational linguistics. Students are introduced to the types of Big Data one can collect, the methods for analyzing such data, and the psychological theories we can address. Each chapter also includes discussion of real-world applications and ethical issues. Supplementary resources include an instructor manual with assignment questions and sample answers, figures and tables, and varied resources for students such as interactive class exercises, experiment demos, articles, and tools.
This chapter first describes how we measure data, and how its creation has skyrocketed in recent years. We then define Big Data and psychology for the purposes of the book, and motivate why their intersection is important to study. The chapter ends with a guide to how to use the book, and brief summaries of the upcoming chapters.
This textbook reflects the changing landscape of water management by combining the fields of satellite remote sensing and water management. Divided into three major sections, it begins by discussing the information that satellite remote sensing can provide about water, and then moves on to examine how it can address real-world management challenges, focusing on precipitation, surface water, irrigation management, reservoir monitoring, and water temperature tracking. The final part analyses governance and social issues that have recently been given more attention as the world reckons with social justice and equity aspects of engineering solutions. This book uses case studies from around the globe to demonstrate how satellite remote sensing can improve traditional water practices and includes end-of-chapter exercises to facilitate student learning. It is intended for advanced undergraduate and graduate students in water resource management, and as reference textbook for researchers and professionals.
Critics from across the political spectrum attack social media platforms for invading personal privacy. Social media firms famously suck in huge amounts of information about individuals who use their services (and sometimes others as well), and then monetize this data, primarily by selling targeted advertising. Many privacy advocates object to the very collection and use of this personal data by platforms, even if not shared with third parties. In addition, there is the ongoing (and reasonable) concern that the very existence of Big Data creates a risk of leaks. Further, aside from the problem of Big Data, the very existence of social media enables private individuals to invade the privacy of others by widely disseminating personal information. That social media firms’ business practices compromise privacy cannot be seriously doubted. But it is also true that Big Data lies at the heart of social media firms’ business models, permitting them to provide users with free services in exchange for data which they can monetize via targeted advertising. So unless regulators want to take free services away, they must tread cautiously in regulating privacy.
The area where social media has undoubtedly been most actively regulated is in their data and privacy practices. While no serious critic has proposed a flat ban on data collection and use (since that would destroy the algorithms that drive social media), a number of important jurisdictions including the European Union and California have imposed important restrictions on how websites (including social media) collect, process, and disclose data. Some privacy regulations are clearly justified, but insofar as data privacy laws become so strict as to threaten advertising-driven business models, the result will be that social media (and search and many other basic internet features) will stop being free, to the detriment of most users. In addition, privacy laws (and related rules such as the “right to be forgotten”) by definition restrict the flow of information, and so burden free expression. Sometimes that burden is justified, but especially when applied to information about public figures, suppressing unfavorable information undermines democracy. The chapter concludes by arguing that one area where stricter regulation is needed is protecting children’s data.
Physiologic data streaming and aggregation platforms such as Sickbay® and Etiometry are increasingly being used in the paediatric acute care setting. As these platforms gain popularity in clinical settings, there has been a parallel growth in scholarly interest. The primary aim of this study is to characterise research productivity utilising high-fidelity physiologic streaming data with Sickbay® or Etiometry in the acute care paediatric setting.
Methods:
A systematic review of the literature was conducted to identify paediatric publications using data from Sickbay® or Etiometry. The resulting publications were reviewed to characterise them and identify trends in these publications.
Results:
A total of 41 papers have been published over 9 years using either platform. This involved 179 authors across 21 institutions. Most studies used Sickbay®, involved cardiac patients, were single-centre, and did not use machine learning or artificial intelligence methods. The number of publications has increased significantly over the past 9 years, and the average number of citations per publication was 7.9.
Conclusion:
A total of 41 papers have been published over 9 years using Sickbay® or Etiometry data in the paediatric setting. Although the majority of these are single-centre and pertain to cardiac patients, growth in publication volume suggests increasing utilisation of high-fidelity physiologic data beyond clinical applications. Multicentre efforts may help increase the number of centres that can do such work and drive improvements in clinical care.
In the previous chapters, we built the basic foundation of satellite remote sensing. In this chapter we will explore a relatively recent innovation in information technology called cloud computing that has dramatically improved data accessibility and the practicality of applying large satellite remote sensing datasets for water management. Future chapters on specific targets and water management themes will have hands-on examples and assignments based on actual satellite data. Most of these chapters will assume prior knowledge of cloud computing for understanding and completing assignments. Since cloud computing is gradually proliferating in all walks of water management practice, the aim of this chapter is to introduce readers to cloud computing concepts and specific tools currently available for dealing with the very large satellite datasets on water.