This chapter discusses how to build probabilistic models that include both discrete and continuous variables. Mathematically, this is achieved by defining them as random variables within the same probability space. In practice, the variables are manipulated using their marginal and conditional distributions. We define the conditional pmf of a discrete random variable given a continuous variable, and the conditional probability density of a continuous random variable given a discrete variable. We use these objects to build mixture models and apply them to model height in a population. Next, we describe Gaussian discriminant analysis, a classification method based on mixture models with Gaussian conditional distributions, and apply it to diagnose Alzheimer's disease. Then, we explain how to perform clustering using Gaussian mixture models and leverage the approach to cluster NBA players. Finally, we introduce the framework of Bayesian statistics which enables us to explicitly encode our uncertainty about model parameters, and use it to analyze poll data from the 2020 United States presidential election.
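The mixture construction described above can be sketched numerically. The toy parameters below (a two-component height model) are illustrative, not estimates from the chapter; the sketch evaluates the marginal density of the continuous variable and the conditional pmf of the discrete component via Bayes' rule:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-component mixture of adult height (cm): a discrete
# variable (the component label) combined with a continuous one (height).
weights = np.array([0.5, 0.5])        # pmf of the discrete component
means = np.array([162.0, 176.0])      # conditional means (assumed values)
stds = np.array([7.0, 7.5])           # conditional standard deviations

def mixture_pdf(x):
    """Marginal density of height: sum the joint over the discrete component."""
    return sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds))

def posterior_component(x):
    """Conditional pmf of the component given height x (Bayes' rule)."""
    joint = weights * norm.pdf(x, means, stds)
    return joint / joint.sum()
```

The same two functions are all that Gaussian discriminant analysis needs: classification amounts to evaluating `posterior_component` at an observed value.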
This chapter provides a comprehensive overview of the foundational concepts essential for scalable Bayesian learning and Monte Carlo methods. It introduces Monte Carlo integration and its relevance to Bayesian statistics, focusing on techniques such as importance sampling and control variates. The chapter outlines key applications, including logistic regression, Bayesian matrix factorization, and Bayesian neural networks, which serve as illustrative examples throughout the book. It also offers a primer on Markov chains and stochastic differential equations, which are critical for understanding the advanced methods discussed in later chapters. Additionally, the chapter introduces kernel methods in preparation for their application in scalable Markov Chain Monte Carlo (MCMC) diagnostics.
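As a minimal illustration of the Monte Carlo techniques surveyed here, the sketch below uses self-normalized importance sampling to estimate an expectation under one distribution using draws from another; the target, proposal, and test function are toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Importance-sampling sketch: estimate E[X^2] = 1 under N(0, 1) using a
# heavier-tailed N(0, 2) proposal (all choices here are illustrative).
def log_target(x):      # standard normal, up to an additive constant
    return -0.5 * x**2

def log_proposal(x):    # N(0, sd=2), up to the same constant convention
    return -0.5 * (x / 2.0) ** 2 - np.log(2.0)

x = rng.normal(0.0, 2.0, size=200_000)          # draws from the proposal
w = np.exp(log_target(x) - log_proposal(x))     # unnormalized weights
estimate = np.sum(w * x**2) / np.sum(w)         # self-normalized estimator
```

Self-normalization lets both densities be known only up to constants, which is exactly the situation in Bayesian posterior computation.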
Climate change profoundly affects plant phenology. An important parameter in research on plant dynamics is the plastochrone interval (PI), defined as the time interval between the formation of successive leaves. The PI has been used to evaluate seagrass demography and as a direct measure of shoot growth and age. Variations in PI determine the growth rates, maintenance, and success of seagrass beds. Global warming could affect the PI dynamics of Zostera marina and, consequently, alter the dynamics of seagrass beds. Using Bayesian linear regression with a time series composed of 316 biweekly sampling dates from 1998 to 2018, we evaluated PI dynamics in the Punta Banda Estuary in Baja California, Mexico. We found that the trend of the series was linear, with parameter values of β0 = 1.65 (SD ±0.19) and β1 = −0.012 (SD ±0.001). The Bayesian analysis of variance showed strong evidence of differences in the PI among years, with differences being 3.2 to 1.88 × 10⁶ times more probable than no differences. The largest differences were detected between cold and hot years. The climatology of the PI time series showed changes in seasonality over time; summer and autumn were the most perturbed seasons. Finally, by linking the PI estimates with the sea surface temperature anomalies for the complete series, a good inverse correspondence was observed between hot years and high PI, as well as cold years and low PI values, suggesting that climate change has affected the PI among years and seasons.
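A conjugate Bayesian linear regression of the kind used above can be sketched on a synthetic trend of the same shape. The design, noise level, and prior below are assumptions for illustration, not the study's seagrass data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Conjugate Bayesian linear regression sketch with known noise variance.
# The synthetic series mimics the shape of the reported fit
# (intercept ~1.65, slope ~-0.012) but is NOT the study's data.
n = 316
t = np.linspace(0.0, 20.0, n)              # years since the series start
X = np.column_stack([np.ones(n), t])       # design: intercept + linear trend
beta_true = np.array([1.65, -0.012])
noise_sd = 0.3                             # assumed observation noise
y = X @ beta_true + rng.normal(0.0, noise_sd, n)

# Prior beta ~ N(0, tau2 * I); the posterior is then Gaussian in closed form.
tau2 = 10.0
precision = X.T @ X / noise_sd**2 + np.eye(2) / tau2   # posterior precision
post_cov = np.linalg.inv(precision)
post_mean = post_cov @ (X.T @ y) / noise_sd**2
```

The posterior standard deviations (square roots of the diagonal of `post_cov`) play the role of the ±SD values quoted in the abstract.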
Extended redundancy analysis (ERA) is a statistical approach to component-based multivariate regression modeling that explores interrelationships among multiple sets of variables while combining regression with a data-reduction technique. Extant ERA models have assumed outcome variables of the same data type and have focused on estimating direct pathways only, without explicitly addressing mediation effects. In this paper, ERA is extended to handle multiple mediators and mixed types of outcome variables by adopting a Bayesian framework that takes into account the correlation structure among all of the outcome variables. An algorithm is developed that derives the joint posterior distribution of the parameters using Markov chain Monte Carlo. Simulations and an empirical dataset illustrate the usefulness of the proposed method.
When surveys contain direct questions about sensitive topics, participants may not provide their true answers. Indirect question techniques incentivize truthful answers by concealing participants’ responses in various ways. The Crosswise Model aims to do this by pairing a sensitive target item with a non-sensitive baseline item, and only asking participants to indicate whether their responses to the two items are the same or different. Selection of the baseline item is crucial to guarantee participants’ perceived and actual privacy and to enable reliable estimates of the sensitive trait. This research makes the following contributions. First, it describes an integrated methodology to select the baseline item, based on conceptual and statistical considerations. The resulting methodology distinguishes four statistical models. Second, it proposes novel Bayesian estimation methods to implement these models. Third, it shows that the new models introduced here improve efficiency over common applications of the Crosswise Model and may relax the required statistical assumptions. These three contributions facilitate applying the methodology in a variety of settings. An empirical application on attitudes toward LGBT issues shows the potential of the Crosswise Model. An interactive app and Python and MATLAB code support broader adoption of the model.
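The mechanics of the Crosswise Model can be sketched with its classical moment estimator on simulated responses. Here `p` is the known baseline-item probability; the prevalences are illustrative, not drawn from the LGBT application:

```python
import numpy as np

rng = np.random.default_rng(2)

# Crosswise-Model sketch: respondents report only whether their answers
# to the sensitive and baseline items agree; the moment estimator
# recovers the sensitive prevalence. All prevalences are illustrative.
p = 0.25          # known probability of "yes" on the baseline item
pi_true = 0.30    # true prevalence of the sensitive trait (simulated)
n = 100_000

sensitive = rng.random(n) < pi_true
baseline = rng.random(n) < p
same = sensitive == baseline              # the only thing respondents report

# E[same] = pi*p + (1-pi)*(1-p), so pi = (lambda + p - 1) / (2p - 1).
lam_hat = same.mean()
pi_hat = (lam_hat + p - 1.0) / (2.0 * p - 1.0)
```

Because individual "same/different" answers are uninformative about any one respondent, privacy is preserved while the aggregate estimator remains consistent.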
The posterior distribution of the bivariate correlation is analytically derived given a data set where x is completely observed but y is missing at random for a portion of the sample. Interval estimates of the correlation are then constructed from the posterior distribution in terms of highest density regions (HDRs). Various choices for the form of the prior distribution are explored. For each of these priors, the resulting Bayesian HDRs are compared with each other and with intervals derived from maximum likelihood theory.
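A highest density region can be computed from posterior draws as the shortest interval holding the desired mass (for a unimodal posterior). The sketch below uses synthetic draws for a correlation bounded in (−1, 1); the sampling distribution is an illustrative stand-in, not the paper's analytical posterior:

```python
import numpy as np

rng = np.random.default_rng(5)

def hdr_interval(samples, mass=0.95):
    """Shortest interval covering `mass` of the sample (unimodal case)."""
    sorted_s = np.sort(samples)
    n = len(sorted_s)
    k = int(np.ceil(mass * n))
    # Width of every interval containing exactly k consecutive order stats
    widths = sorted_s[k - 1:] - sorted_s[: n - k + 1]
    j = np.argmin(widths)
    return sorted_s[j], sorted_s[j + k - 1]

# Illustrative right-skewed posterior draws for a bounded correlation,
# built by transforming a normal on the Fisher-z scale.
draws = np.tanh(rng.normal(0.7, 0.15, size=50_000))
lo, hi = hdr_interval(draws)
```

For skewed posteriors like this one, the HDR is shorter than the equal-tailed interval with the same coverage, which is the motivation for reporting HDRs.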
This paper assesses the psychometric value of allowing test-takers choice in standardized testing. New theoretical results examine the conditions where allowing choice improves score precision. A hierarchical framework is presented for jointly modeling the accuracy of cognitive responses and item choices. The statistical methodology is disseminated in the ‘cIRT’ R package. An ‘answer two, choose one’ (A2C1) test administration design is introduced to avoid challenges associated with nonignorable missing data. Experimental results suggest that the A2C1 design and payout structure encouraged subjects to choose items consistent with their cognitive trait levels. Substantively, the experimental data suggest that item choices yielded comparable information and discrimination ability as cognitive items. Given there are no clear guidelines for writing more or less discriminating items, one practical implication is that choice can serve as a mechanism to improve score precision.
Recently, there has been a renewed interest in the four-parameter item response theory model as a way to capture guessing and slipping behaviors in responses. Research has shown, however, that the nested three-parameter model suffers from issues of unidentifiability (San Martín et al. in Psychometrika 80:450–467, 2015), which places concern on the identifiability of the four-parameter model. Borrowing from recent advances in the identification of cognitive diagnostic models, in particular, the DINA model (Gu and Xu in Stat Sin https://doi.org/10.5705/ss.202018.0420, 2019), a new model is proposed with restrictions inspired by this new literature to help with the identification issue. Specifically, we show conditions under which the four-parameter model is strictly and generically identified. These conditions inform the presentation of a new exploratory model, which we call the dyad four-parameter normal ogive (Dyad-4PNO) model. This model is developed by placing a hierarchical structure on the DINA model and imposing equality constraints on a priori unknown dyads of items. We present a Bayesian formulation of this model, and show that model parameters can be accurately recovered. Finally, we apply the model to a real dataset.
This paper proposes a general approach to accounting for individual differences in the extreme response style in statistical models for ordered response categories. This approach uses a hierarchical ordinal regression modeling framework with heterogeneous thresholds structures to account for individual differences in the response style. Markov chain Monte Carlo algorithms for Bayesian inference for models with heterogeneous thresholds structures are discussed in detail. A simulation and two examples based on ordinal probit models are given to illustrate the proposed methodology. The simulation and examples also demonstrate that failing to account for individual differences in the extreme response style can have adverse consequences for statistical inferences.
Cognitive diagnosis models are partially ordered latent class models and are used to classify students into skill mastery profiles. The deterministic inputs, noisy “and” gate model (DINA) is a popular psychometric model for cognitive diagnosis. Application of the DINA model requires content expert knowledge of a Q matrix, which maps the attributes or skills needed to master a collection of items. Misspecification of Q has been shown to yield biased diagnostic classifications. We propose a Bayesian framework for estimating the DINA Q matrix. The developed algorithm builds upon prior research (Chen, Liu, Xu, & Ying, in J Am Stat Assoc 110(510):850–866, 2015) and ensures the estimated Q matrix is identified. Monte Carlo evidence is presented to support the accuracy of parameter recovery. The developed methodology is applied to Tatsuoka’s fraction-subtraction dataset.
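The DINA response mechanism described above can be sketched directly: the ideal response η is 1 exactly when a student masters every skill the Q matrix requires for an item, and success probabilities then mix slipping and guessing. The toy Q matrix, skill profiles, and rates below are assumptions, not Tatsuoka's data:

```python
import numpy as np

# DINA-model sketch: ideal responses from a Q matrix and skill profiles,
# then success probabilities with slipping (s) and guessing (g).
Q = np.array([[1, 0],       # item 1 requires skill 1
              [0, 1],       # item 2 requires skill 2
              [1, 1]])      # item 3 requires both skills
alpha = np.array([[1, 0],   # student A masters skill 1 only
                  [1, 1]])  # student B masters both skills
s = np.array([0.1, 0.2, 0.1])   # slip probability per item
g = np.array([0.2, 0.2, 0.1])   # guess probability per item

# eta[i, j] = 1 iff student i has every skill item j requires
eta = (alpha @ Q.T == Q.sum(axis=1)).astype(int)

# P(correct) = (1 - s)^eta * g^(1 - eta)
p_correct = (1 - s) ** eta * g ** (1 - eta)
```

Misspecifying a single entry of `Q` changes `eta`, and hence the likelihood, for every student whose profile touches that skill, which is why Q-matrix estimation matters.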
Owen (1975) proposed an approximate empirical Bayes procedure for item selection in computerized adaptive testing (CAT). The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach but is no longer necessary given the computational power currently available for adaptive testing. This paper suggests several item selection criteria for adaptive testing which are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized.
Cultural Consensus Theory (CCT) models have been applied extensively across research domains in the social and behavioral sciences in order to explore shared knowledge and beliefs. CCT models operate on response data, in which the answer key is latent. The current paper develops methods to enhance the application of these models by developing the appropriate specifications for hierarchical Bayesian inference. A primary contribution is the methodology for integrating the use of covariates into CCT models. More specifically, both person- and item-related parameters are introduced as random effects that can respectively account for patterns of inter-individual and inter-item variability.
This Element highlights the employment within archaeology of classification methods developed in the fields of chemometrics, artificial intelligence, and Bayesian statistics. These operate in both high- and low-dimensional environments and often yield better results than traditional methods. Instead of a theoretical approach, it provides examples of how to apply these methods to real data, using lithic and ceramic archaeological materials as case studies. A detailed explanation of how to process the data in R (The R Project for Statistical Computing), along with the corresponding code, is also provided in this Element.
The design of gas turbine combustors for optimal operation at different power ratings is a multifaceted engineering task, as it requires the consideration of several objectives that must be evaluated under different test conditions. We address this challenge by presenting a data-driven approach that uses multiple probabilistic surrogate models derived from Gaussian process regression to automatically select optimal combustor designs from a large parameter space, requiring only a few experimental data points. We present two strategies for surrogate model training that differ in terms of required experimental and computational efforts. Depending on the measurement time and cost for a target, one of the strategies may be preferred. We apply the methodology to train three surrogate models under operating conditions where the corresponding design objectives are critical: reduction of NOx emissions, prevention of lean flame extinction, and mitigation of thermoacoustic oscillations. Once trained, the models can be flexibly used for different forms of a posteriori design optimization, as we demonstrate in this study.
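A Gaussian-process surrogate of the kind described can be sketched with a synthetic one-dimensional design objective. The objective function, kernel, and design points below are illustrative stand-ins, not the combustor data:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

# Surrogate-model sketch: GP regression over one design parameter.
# The objective is a synthetic stand-in for, e.g., NOx emissions.
def objective(x):
    return np.sin(3.0 * x) + 0.5 * x

X_train = rng.uniform(0.0, 2.0, size=(8, 1))   # a few "experiments"
y_train = objective(X_train.ravel())

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
gp.fit(X_train, y_train)

# Query the surrogate densely and pick a candidate optimal design
X_query = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)
best = X_query[np.argmin(mean)]
```

The predictive standard deviation `std` is what makes the surrogate probabilistic: it can guide where the next (expensive) experiment is most informative.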
In this Element, the authors introduce Bayesian probability and inference for social science students and practitioners starting from the absolute beginning and walk readers steadily through the Element. No previous knowledge is required other than that in a basic statistics course. At the end of the process, readers will understand the core tenets of Bayesian theory and practice in a way that enables them to specify, implement, and understand models using practical social science data. Chapters will cover theoretical principles and real-world applications that provide motivation and intuition. Because Bayesian methods are intricately tied to software, code in both R and Python is provided throughout.
This Element highlights the employment within archaeology of classification methods developed in the fields of chemometrics, artificial intelligence, and Bayesian statistics. These operate in both high- and low-dimensional environments and often yield better results than traditional methods. The basic principles and main methods are introduced, with recommendations for when to use them.
Bayesian statistical methods are among the best tools for investigating and calculating a quantity of interest from limited available data; they have recently entered the field of nuclear astrophysics and can be used to evaluate astrophysical S-factors, cross sections and, consequently, the nuclear reaction rates of Big Bang nucleosynthesis. This study calculates the astrophysical S-factor and the rate of the reaction T(d,n)4He, an important astrophysical reaction, at energies below the electron repulsive barrier. The analysis, performed in the R software, yields improved results compared with non-Bayesian methods for this reaction rate.
Gaussian graphical models are useful tools for conditional independence structure inference of multivariate random variables. Unfortunately, Bayesian inference of latent graph structures is challenging due to exponential growth of $\mathcal{G}_n$, the set of all graphs on n vertices. One approach that has been proposed to tackle this problem is to limit the search to subsets of $\mathcal{G}_n$. In this paper we study subsets that are vector subspaces, with the cycle space $\mathcal{C}_n$ as the main example. We propose a novel prior on $\mathcal{C}_n$ based on linear combinations of cycle basis elements and present its theoretical properties. Using this prior, we implement a Markov chain Monte Carlo algorithm, and show that (i) posterior edge inclusion estimates computed with our technique are comparable to estimates from the standard technique despite searching a smaller graph space, and (ii) the vector space perspective enables straightforward implementation of MCMC algorithms.
Under mild assumptions, we show that the exact convergence rate in total variation is also exact in weaker Wasserstein distances for the Metropolis–Hastings independence sampler. We develop new upper and lower bounds on the worst-case Wasserstein distance when the sampler is initialized at a point. For an arbitrary point initialization, we show that the convergence rate is the same and matches the convergence rate in total variation. We derive exact convergence expressions for more general Wasserstein distances when initialization is at a specific point. Using optimization, we construct a novel centered independent proposal to develop exact convergence rates in Bayesian quantile regression and many generalized linear model settings. We show that the exact convergence rate can be upper bounded in Bayesian binary response regression (e.g., logistic and probit) when the sample size and dimension grow together.
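The independence Metropolis–Hastings sampler studied here can be sketched in a few lines: the proposal is drawn independently of the current state, and the acceptance ratio corrects for the mismatch between proposal and target. The standard-normal target and wider-normal proposal below are toy choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Independence-sampler sketch: Metropolis-Hastings whose proposal does
# not depend on the current state (illustrative target and proposal).
def log_target(x):        # standard normal, up to a constant
    return -0.5 * x**2

def log_proposal_pdf(x):  # N(0, sd=1.5), up to the same constant convention
    return -0.5 * (x / 1.5) ** 2 - np.log(1.5)

n_iter = 50_000
chain = np.empty(n_iter)
x = 3.0                                    # deliberately poor start point
for i in range(n_iter):
    y = rng.normal(0.0, 1.5)               # proposal ignores current state x
    log_ratio = (log_target(y) - log_target(x)) \
              + (log_proposal_pdf(x) - log_proposal_pdf(y))
    if np.log(rng.random()) < log_ratio:
        x = y
    chain[i] = x
```

How quickly such a chain forgets the start point `x = 3.0`, measured in total variation or Wasserstein distance, is exactly the question the convergence rates above answer.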