Statistical modelling and machine learning offer a vast toolbox of inference methods with which to model the world, discover patterns and reach beyond the data to make predictions when the truth is not certain. This concise book provides a clear introduction to those tools and to the core ideas – probabilistic model, likelihood, prior, posterior, overfitting, underfitting, cross-validation – that unify them. A mixture of toy and real examples illustrates diverse applications ranging from biomedical data to treasure hunts, while the accompanying datasets and computational notebooks in R and Python encourage hands-on learning. Instructors can benefit from online lecture slides and exercise solutions. Requiring only first-year university-level knowledge of calculus, probability and linear algebra, the book equips students in statistics, data science and machine learning, as well as those in quantitative applied and social science programmes, with the tools and conceptual foundations to explore more advanced techniques.
Understanding change over time is a critical component of social science. However, data measured over time – time series – require their own set of statistical and inferential tools. In this book, Suzanna Linn, Matthew Lebo, and Clayton Webb explain the most commonly used time series models and demonstrate their applications using examples. The guide outlines the steps taken to identify a series, make determinations about exogeneity and endogeneity, and make appropriate modelling decisions and inferences. Detailing challenges and explaining key techniques not covered in most time series textbooks, the authors show how navigating between data and models, deliberately and transparently, allows researchers to explain their statistical analyses clearly to a broad audience.
Bridging theory and practice in network data analysis, this guide offers an intuitive approach to understanding and analyzing complex networks. It covers foundational concepts, practical tools, and real-world applications using Python frameworks including NumPy, SciPy, scikit-learn, graspologic, and NetworkX. Readers will learn to apply network machine learning techniques to real-world problems, transform complex network structures into meaningful representations, leverage Python libraries for efficient network analysis, and interpret network data and results. The book explores methods for extracting valuable insights across domains such as social networks, ecological systems, and brain connectivity. Hands-on tutorials and concrete examples develop intuition through visualization and mathematical reasoning. The book equips data scientists, students, and researchers working with network data with the skills to confidently tackle network machine learning projects, providing a robust toolkit for data science applications involving network-structured data.
This self-contained guide introduces two pillars of data science – probability theory and statistics – side by side, in order to illuminate the connections between statistical techniques and the probabilistic concepts they are based on. The topics covered in the book include random variables, nonparametric and parametric models, correlation, estimation of population parameters, hypothesis testing, principal component analysis, and both linear and nonlinear methods for regression and classification. Examples throughout the book draw from real-world datasets to demonstrate concepts in practice and confront readers with fundamental challenges in data science, such as overfitting, the curse of dimensionality, and causal inference. Code in Python reproducing these examples is available on the book's website, along with videos, slides, and solutions to exercises. This accessible book is ideal for undergraduate and graduate students, data science practitioners, and others interested in the theoretical concepts underlying data science methods.
Bringing together years of research into one useful resource, this text empowers the reader to creatively construct their own dependence models. Intended for senior undergraduate and postgraduate students, it takes a step-by-step look at the construction of specific dependence models, including exchangeable, Markov, moving average and, in general, spatio-temporal models. All constructions maintain a desired property: the marginal distribution is pre-specified and kept invariant. They do not separate the dependence from the marginals, and the mechanisms used to induce dependence are general enough to apply to a very large class of parametric distributions. All the constructions are based on appropriate definitions of three building blocks in a Bayesian analysis context: prior distribution, likelihood function and posterior distribution. All results are illustrated with examples and graphical representations. Applications with data and code are interspersed throughout the book, covering fields including insurance and epidemiology.
This enthusiastic introduction to the fundamentals of information theory builds from classical Shannon theory through to modern applications in statistical learning, equipping students with a uniquely well-rounded and rigorous foundation for further study. It introduces core topics such as data compression, channel coding, and rate-distortion theory using a unique finite block-length approach. With over 210 end-of-part exercises and numerous examples, students are introduced to contemporary applications in statistics, machine learning and modern communication theory. The textbook presents information-theoretic methods with applications in statistical learning and computer science, such as f-divergences, PAC-Bayes bounds and the variational principle, Kolmogorov's metric entropy, strong data processing inequalities, and entropic upper bounds for statistical estimation. Accompanied by a solutions manual for instructors, and additional standalone chapters on more specialized topics in information theory, this is the ideal introductory textbook for senior undergraduate and graduate students in electrical engineering, statistics, and computer science.
Brownian motion is an important topic in various applied fields where the analysis of random events is necessary. Introducing Brownian motion from a statistical viewpoint, this detailed text examines the distribution of quadratic plus linear or bilinear functionals of Brownian motion and demonstrates the utility of this approach for time series analysis. It also offers the first comprehensive guide on deriving the Fredholm determinant and the resolvent associated with such statistics. Presuming only a familiarity with standard statistical theory and the basics of stochastic processes, this book brings together a set of important statistical tools in one accessible resource for researchers and graduate students. Readers also benefit from online appendices, which provide probability density graphs and solutions to the chapter problems.
Introduction to Probability and Statistics for Data Science provides a solid course in the fundamental concepts, methods and theory of statistics for students in statistics, data science, biostatistics, engineering, and physical science programs. It teaches students to understand, use, and build on modern statistical techniques for complex problems. The authors develop the methods from both an intuitive and a mathematical angle, illustrating with simple examples how and why the methods work. More complicated examples, many of which incorporate data and code in R, show how the methods are used in practice. Through this guidance, students get the big picture of how statistics works and can be applied. The text covers modern topics such as regression trees, large-scale hypothesis testing, bootstrapping, MCMC, and time series, while de-emphasizing theoretical topics such as the Cramér-Rao lower bound and the Rao-Blackwell theorem. It features more than 250 high-quality figures, 180 of which involve actual data. Data and R code are available on the book's website so that students can reproduce the examples and do hands-on exercises.
Experiments have gained prominence in sociology in recent years. Increased interest in testing causal theories through experimental designs has ignited a debate about which experimental designs can facilitate scientific progress in sociology. This book discusses the implications of research interests for the design of experiments, identifies points of commonality and disagreement among the different perspectives within sociology, and elaborates on the rationales of each. It helps experimental sociologists find appropriate designs for answering specific research questions while alerting them to the challenges. Offering more than just a guide, this book explores both the historical roots of experimental sociology and the cutting-edge techniques of rigorous sociology. It concludes with a tantalizing peek into the future and provides a roadmap to the exciting prospects and uncharted territories of experimental sociology.
A vast literature exists on theories of public opinion – how to measure, analyze, predict, and influence it; however, there is no synthesis of best practices for interpreting public opinion: existing knowledge is disparate and spread across many disciplines. Polls, Pollsters, and Public Opinion presents a systematic analytical approach for understanding, predicting, and engaging public opinion. It tells the story through the eyes of the pollster and draws an analytical road map for examining public opinion, both conceptually and practically. Providing a theoretical and conceptual foundation, as well as debunking popular myths, this book delves into the science of polling, offering tools analysts can use to assess the quality of polls. It also introduces methods that can be used to predict elections and other socio-political outcomes while understanding the nuances of messaging, engaging, and moving public opinion.
This book provides statistics instructors and students with complete classroom material for a one- or two-semester course on applied regression and causal inference. It is built around 52 stories, 52 class-participation activities, 52 hands-on computer demonstrations, and 52 discussion problems that allow instructors and students to explore in a fun way the real-world complexity of the subject. The book fosters an engaging 'flipped classroom' environment with a focus on visualization and understanding. The book provides instructors with frameworks for self-study or for structuring the course, along with tips for maintaining student engagement at all levels, and practice exam questions to help guide learning. Designed to accompany the authors' previous textbook Regression and Other Stories, its modular nature and wealth of material allow this book to be adapted to different courses and texts or be used by learners as a hands-on workbook.
Survey research is in a state of crisis. People have become less willing to respond to polls, and recent misses in critical elections have undermined the field's credibility. Pollsters have developed many tools for dealing with the new environment, an increasing number of which rely on risky opt-in samples. Virtually all of these tools require that respondents in each demographic category are a representative sample of all people in that category, something that is unlikely to be reliably true. Polling at a Crossroads moves beyond such strong limitations, providing tools that work even when survey respondents are unrepresentative in complex ways. Case studies show how to avoid underestimating Trump support and how conventional polls exaggerate partisan differences. The book also helps us think in clear and sometimes counterintuitive ways, and points toward simple, low-cost changes that can better address contemporary polling challenges.
An emerging field in statistics, distributional regression facilitates the modelling of the complete conditional distribution, rather than just the mean. This book introduces generalized additive models for location, scale and shape (GAMLSS) – one of the most important classes of distributional regression. Taking a broad perspective, the authors consider penalized likelihood inference, Bayesian inference, and boosting as potential ways of estimating models and illustrate their usage in complex applications. Written by the international team who developed GAMLSS, the text's focus on practical questions and problems sets it apart. Case studies demonstrate how researchers in statistics and other data-rich disciplines can use the model in their work, exploring examples ranging from fetal ultrasounds to social media performance metrics. The R code and data sets for the case studies are available on the book's companion website, allowing for replication and further study.
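For orientation, the GAMLSS class described above can be sketched in standard notation (the symbols here are the usual textbook conventions, not quoted from this title): each distribution parameter gets its own additive predictor through its own link function.

```latex
y_i \sim \mathcal{D}(\mu_i, \sigma_i, \nu_i, \tau_i), \qquad
g_k(\theta_{ki}) = \mathbf{x}_{ki}^{\top}\boldsymbol{\beta}_k + \sum_{j} s_{jk}(x_{jki}),
\quad k = 1, \dots, 4,
```

where $(\theta_1,\dots,\theta_4) = (\mu, \sigma, \nu, \tau)$ are the location, scale and shape parameters, $g_k$ are link functions, and $s_{jk}$ are smooth (penalized) terms. Modelling $\sigma$, $\nu$ and $\tau$ alongside $\mu$ is what lets the conditional distribution, not just the mean, depend on covariates.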
The radical interdependence between humans who live together makes virtually all human behavior conditional. The behavior of individuals is conditional upon the expectations of those around them, and those expectations are conditional upon the rules (institutions) and norms (culture) constructed to monitor, reward, and punish different behaviors. As a result, nearly all hypotheses about humans are conditional – conditional upon the resources they possess, the institutions they inhabit, or the cultural practices that tell them how to behave. Interaction Models provides a stand-alone, accessible overview of how interaction models, which are frequently used across the social and natural sciences, capture the intuition behind conditional claims and context dependence. It also addresses the simple specification and interpretation errors that are, unfortunately, commonplace. By providing a comprehensive and unified introduction to the use and critical evaluation of interaction models, this book shows how they can be used to test theoretically-derived claims of conditionality.
Learn by doing with this user-friendly introduction to time series data analysis in R. This book explores the intricacies of managing and cleaning time series data of different sizes, scales and granularity, data preparation for analysis and visualization, and different approaches to classical and machine learning time series modeling and forecasting. A range of pedagogical features support students, including end-of-chapter exercises, problems, quizzes and case studies. The case studies are designed to stretch the learner, introducing larger data sets, enhanced data management skills, and R packages and functions appropriate for real-world data analysis. On top of providing commented R programs and data sets, the book's companion website offers extra case studies, lecture slides, videos and exercise solutions. Accessible to those with a basic background in statistics and probability, this is an ideal hands-on text for undergraduate and graduate students, as well as researchers in data-rich disciplines.
The 'data revolution' offers many new opportunities for research in the social sciences. Increasingly, social and political interactions can be recorded digitally, leading to vast amounts of new data available for research. This poses new challenges for organizing and processing research data. This comprehensive introduction covers the entire range of data management techniques, from flat files to database management systems. It demonstrates how established techniques and technologies from computer science can be applied in social science projects, drawing on a wide range of different applied examples. This book covers simple tools such as spreadsheets and file-based data storage and processing, as well as more powerful data management software like relational databases. It goes on to address advanced topics such as spatial data, text as data, and network data. This book is one of the first to discuss questions of practical data management specifically for social science projects. This title is also available as Open Access on Cambridge Core.
While the Poisson distribution is a classical statistical model for count data, it hinges on the constraining property that its mean equals its variance. This text instead introduces the Conway-Maxwell-Poisson distribution and motivates its use in developing flexible statistical methods based on its distributional form. This two-parameter model not only contains the Poisson distribution as a special case but, in its ability to account for data over- or under-dispersion, encompasses both the geometric and Bernoulli distributions. The resulting statistical methods serve in a multitude of ways, from exploratory data analysis to flexible modeling of varied statistical problems involving count data. The first comprehensive reference on the subject, this text contains numerous illustrative examples demonstrating R code and output. It is essential reading for academics in statistics and data science, as well as quantitative researchers and data analysts in economics, biostatistics and other applied disciplines.
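As a minimal sketch of the distribution this blurb refers to (the helper `cmp_pmf` and its truncation scheme are illustrative choices, not code from the book): the Conway-Maxwell-Poisson pmf is P(X = k) = λ^k / ((k!)^ν Z(λ, ν)) with normalizer Z(λ, ν) = Σ_j λ^j / (j!)^ν, so ν = 1 recovers the ordinary Poisson, ν < 1 allows over-dispersion, and ν > 1 under-dispersion.

```python
from math import exp, factorial, lgamma, log

def cmp_pmf(k, lam, nu, terms=200):
    """Conway-Maxwell-Poisson pmf with a truncated normalizer.

    P(X = k) = lam**k / ((k!)**nu * Z),  Z = sum_j lam**j / (j!)**nu.
    Computed on the log scale to avoid overflow in (j!)**nu.
    """
    logw = lambda j: j * log(lam) - nu * lgamma(j + 1)  # log of the j-th term
    log_terms = [logw(j) for j in range(terms)]
    m = max(log_terms)                                  # log-sum-exp shift
    Z = sum(exp(t - m) for t in log_terms)
    return exp(logw(k) - m) / Z

# Sanity check: nu = 1 matches the ordinary Poisson pmf
lam = 3.0
poisson_pmf_4 = lam**4 * exp(-lam) / factorial(4)
print(abs(cmp_pmf(4, lam, 1.0) - poisson_pmf_4) < 1e-9)  # True
```

The truncation at 200 terms is ample for moderate λ; a production implementation would bound the tail explicitly.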
During the past half-century, exponential families have attained a position at the center of parametric statistical inference. Theoretical advances have been matched, and more than matched, in the world of applications, where logistic regression by itself has become the go-to methodology in medical statistics, computer-based prediction algorithms, and the social sciences. This book is based on a one-semester graduate course for first year Ph.D. and advanced master's students. After presenting the basic structure of univariate and multivariate exponential families, their application to generalized linear models including logistic and Poisson regression is described in detail, emphasizing geometrical ideas, computational practice, and the analogy with ordinary linear regression. Connections are made with a variety of current statistical methodologies: missing data, survival analysis and proportional hazards, false discovery rates, bootstrapping, and empirical Bayes analysis. The book connects exponential family theory with its applications in a way that doesn't require advanced mathematical preparation.
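As a reminder of the structure such a course builds on (standard notation, not quoted from the book), a one-parameter exponential family has densities of the form

```latex
f_{\theta}(y) = \exp\{\theta\, y - \psi(\theta)\}\, f_0(y),
```

with natural parameter $\theta$, cumulant function $\psi(\theta)$, and carrier $f_0$. Logistic regression is the binary case: for $y \in \{0, 1\}$, taking the log-odds linear in the covariates, $\theta = \mathbf{x}^{\top}\boldsymbol{\beta}$, gives $P(y = 1 \mid \mathbf{x}) = 1 / (1 + e^{-\mathbf{x}^{\top}\boldsymbol{\beta}})$, which is why it sits naturally inside generalized linear model theory.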
This compact course is written for the mathematically literate reader who wants to learn to analyze data in a principled fashion. The language of mathematics enables clear exposition that can go quite deep, quite quickly, and naturally supports an axiomatic and inductive approach to data analysis. Starting with a good grounding in probability, the reader moves to statistical inference via topics of great practical importance – simulation and sampling, as well as experimental design and data collection – that are typically displaced from introductory accounts. The core of the book then covers both standard methods and such advanced topics as multiple testing, meta-analysis, and causal inference.