The glomerular filtration rate (GFR), estimated from serum creatinine (SCr), is widely used in clinical practice for kidney function assessment, but SCr-based equations are limited by non-GFR determinants and may introduce inaccuracies across racial groups. Few studies have evaluated whether advanced modeling techniques enhance their performance.
Methods:
Using multivariable fractional polynomials (MFP), generalized additive models (GAM), random forests (RF), and gradient boosted machines (GBM), we developed four SCr-based GFR-estimating equations in a pooled data set from four cohorts (n = 4665). Their performance was compared to that of the refitted linear regression-based 2021 CKD-EPI SCr equation using bias (median difference between measured GFR [mGFR] and estimated GFR [eGFR]), precision, and accuracy metrics (e.g., P10 and P30, percentage of eGFR within 10% and 30% of mGFR, respectively) in a pooled validation data set from three additional cohorts (n = 2215).
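The following sketch, in Python with illustrative variable names (not the study's code), shows how these agreement metrics are conventionally computed from paired measured and estimated GFR values.

```python
import numpy as np

def gfr_agreement_metrics(mgfr, egfr):
    """Agreement metrics between measured (mGFR) and estimated (eGFR) GFR:
    bias = median(mGFR - eGFR); precision = interquartile range of the
    differences; P10/P30 = percentage of eGFR within 10%/30% of mGFR."""
    mgfr = np.asarray(mgfr, dtype=float)
    egfr = np.asarray(egfr, dtype=float)
    diff = mgfr - egfr
    rel_err = np.abs(egfr - mgfr) / mgfr
    q75, q25 = np.percentile(diff, [75, 25])
    return {
        "bias": float(np.median(diff)),
        "precision_iqr": float(q75 - q25),
        "P10": float(100 * np.mean(rel_err <= 0.10)),
        "P30": float(100 * np.mean(rel_err <= 0.30)),
    }

# Toy example for a validation subgroup (made-up values)
print(gfr_agreement_metrics(mgfr=[90, 60, 45, 30], egfr=[84, 68, 44, 41]))
```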
Results:
In the validation data set, the greatest bias and lowest accuracy were observed in Black individuals for all equations across subgroups defined by race, sex, age, and eGFR. The MFP and GAM equations performed similarly to the refitted CKD-EPI SCr equation, with slight improvements in P10 and P30 in subgroups including Black individuals and females. The GBM and RF equations demonstrated smaller biases but lower accuracy compared with the other equations. Overall, differences among equations were modest, both in the full validation data set and across subgroups.
Conclusions:
Our findings suggest that advanced methods provide limited improvement in SCr-based GFR estimation. Future research should focus on integrating novel biomarkers for GFR estimation and improving the feasibility of GFR measurement.
There is an odd contradiction in much of the empirical (experimental) literature: the data are analysed using statistical tools which presuppose that there is some noise or randomness in the data, yet the source and possible nature of that noise are rarely discussed explicitly. This paper argues that the noise should be brought out into the open, and its nature and implications openly discussed. Whether the statistical analysis involves testing or estimation, it is inevitably built upon some assumed stochastic structure for the noise. Different assumptions justify different analyses, which means that the appropriate type of analysis depends crucially on the stochastic nature of the noise. This paper explores such issues and argues that ignoring the noise can be dangerous.
This chapter reviews alternative methods proposed in the literature for estimating discrete-time stochastic volatility models and illustrates the details of their application. The methods reviewed are classified as either frequentist or Bayesian. The frequentist methods include the generalized method of moments, quasi-maximum likelihood, the empirical characteristic function, the efficient method of moments, and simulated maximum likelihood based on a Laplace importance sampler. The Bayesian methods include single-move Markov chain Monte Carlo, multimove Markov chain Monte Carlo, and sequential Monte Carlo.
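For orientation, the canonical discrete-time stochastic volatility model that these estimators target can be written (notation varies across the chapter's sources) as

\[
y_t = \exp(h_t/2)\,\varepsilon_t, \qquad
h_t = \mu + \phi\,(h_{t-1} - \mu) + \sigma_\eta\,\eta_t, \qquad
\varepsilon_t,\ \eta_t \overset{\text{iid}}{\sim} \mathcal{N}(0,1),
\]

where \(y_t\) is the (demeaned) return and \(h_t\) the latent log-volatility; it is the latent \(h_t\) that makes the likelihood an intractable high-dimensional integral and motivates the moment-based, simulation-based, and Monte Carlo methods listed above.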
We propose the Rényi information generating function (RIGF) and discuss its properties. A connection between the RIGF and the diversity index is proposed for discrete-type random variables. The relation between the RIGF and the Shannon entropy of order q > 0 is established and several bounds are obtained. The RIGF of the escort distribution is derived. Furthermore, we introduce the Rényi divergence information generating function (RDIGF) and discuss its behavior under monotone transformations. We present nonparametric and parametric estimators of the RIGF. A simulation study is carried out and a real data set relating to the failure times of electronic components is analyzed. A comparison study between the nonparametric and parametric estimators is made in terms of the standard deviation, absolute bias, and mean square error, and we observe superior performance for the newly proposed estimators. Some applications of the proposed RIGF and RDIGF are provided. For three coherent systems, we calculate the values of the RIGF and other well-established uncertainty measures, and similar behavior of the RIGF is observed. Further, a study regarding the usefulness of the RDIGF and RIGF as model selection criteria is conducted. Finally, three chaotic maps are considered and used to validate the proposed information generating function.
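The abstract does not reproduce the definitions; as background only, the classical information generating function (Golomb's) of a discrete distribution p and the Rényi entropy of order \(q\), which the RIGF ties together, are

\[
G_X(t) = \sum_{x} p(x)^{t}, \qquad
H_q(X) = \frac{1}{1-q}\,\log \sum_{x} p(x)^{q}, \quad q > 0,\ q \neq 1,
\]

with \(G_X'(1) = -H(X)\), the Shannon entropy; the proposed RIGF extends this generating-function idea in the Rényi sense, with the exact form given in the paper itself.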
Garbarino et al. (J Econ Sci Assoc. https://doi.org/10.1007/s40881-018-0055-4, 2018) describe a new method to calculate the probability distribution of the proportion of lies told in “coin flip” style experiments. I show that their estimates and confidence intervals are flawed. I demonstrate two better ways to estimate the probability distribution of what we really care about—the proportion of liars—and I provide R software to do this.
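As context only (not the authors' method), the simplest moment estimator of the proportion of liars under the standard design, where lying pays in only one direction, can be sketched in Python as follows; the function, its `p_win` argument (the probability that an honest report is the winning outcome), and the Wald-type interval are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def liar_share(n_reported_wins, n_subjects, p_win=0.5, conf=0.95):
    """Moment estimator of the proportion of liars in a coin-flip experiment.

    Under the standard design, P(report win) = p_win + (1 - p_win) * lam,
    where lam is the proportion of liars, so lam = (p_hat - p_win) / (1 - p_win).
    A Wald-type confidence interval is propagated from the binomial p_hat.
    """
    p_hat = n_reported_wins / n_subjects
    lam = (p_hat - p_win) / (1 - p_win)
    se_p = np.sqrt(p_hat * (1 - p_hat) / n_subjects)
    z = stats.norm.ppf(0.5 + conf / 2)
    lo = (p_hat - z * se_p - p_win) / (1 - p_win)
    hi = (p_hat + z * se_p - p_win) / (1 - p_win)
    return lam, (max(lo, 0.0), min(hi, 1.0))

# Example: 70 of 100 subjects report the winning outcome of a fair coin
print(liar_share(70, 100))
```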
Fully Bayesian estimation of item response theory models with logistic link functions suffers from low computational efficiency due to posterior density functions that do not have known forms. To improve algorithmic computational efficiency, this paper proposes a Bayesian estimation method that adopts a new data-augmentation strategy in uni- and multidimensional IRT models. The strategy is based on the Pólya–Gamma family of distributions, which provides closed-form posterior distributions for logistic-based models. In this paper, an overview of Pólya–Gamma distributions is given within a logistic regression framework. In addition, we provide details on deriving the conditional distributions of the IRT models, incorporating Pólya–Gamma variables into these conditional distributions to construct the Bayesian samplers, and drawing from the samplers so that faster convergence can be achieved. Simulation studies and applications to real datasets were conducted to demonstrate the efficiency and utility of the proposed method.
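A minimal sketch of the augmentation for plain Bayesian logistic regression, the building block reused in the logistic IRT samplers, is given below. It assumes the third-party `polyagamma` package for the PG draws, and the prior and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from polyagamma import random_polyagamma  # assumed third-party PG(h, z) sampler

def gibbs_logistic(X, y, n_iter=2000, prior_var=100.0, seed=0):
    """Pólya-Gamma Gibbs sampler for Bayesian logistic regression
    (Polson, Scott & Windle, 2013): conditional on the PG draws, the
    coefficient posterior is Gaussian, so every full conditional is closed form."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B0_inv = np.eye(p) / prior_var        # prior precision for beta ~ N(0, prior_var * I)
    kappa = y - 0.5
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        omega = random_polyagamma(1.0, X @ beta)             # omega_i | beta ~ PG(1, x_i' beta)
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B0_inv)
        m = V @ (X.T @ kappa)                                # zero prior mean assumed
        beta = rng.multivariate_normal(m, V)                 # beta | omega, y ~ N(m, V)
        draws[it] = beta
    return draws
```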
Quantitative psychology is concerned with the development and application of mathematical models in the behavioral sciences. Over time, models have become more complex, a consequence of the increasing complexity of research designs and experimental data, which is also a consequence of the utility of mathematical models in the science. As models have become more elaborate, the problems of estimating them have become increasingly challenging. This paper gives an introduction to a computing tool called automatic differentiation that is useful in calculating derivatives needed to estimate a model. As its name implies, automatic differentiation works in a routine way to produce derivatives accurately and quickly. Because so many features of model development require derivatives, the method has considerable potential in psychometric work. This paper reviews several examples to demonstrate how the methodology can be applied.
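As a minimal illustration (using the JAX library, one of several automatic differentiation tools, with a made-up logistic objective), the gradient and Hessian needed for estimation come directly from the objective definition, with no hand-derived formulas:

```python
import jax
import jax.numpy as jnp

def neg_log_lik(beta, X, y):
    """Negative log-likelihood of a simple logistic model; the kind of
    objective whose derivatives are needed for model estimation."""
    logits = X @ beta
    return jnp.sum(jnp.logaddexp(0.0, logits) - y * logits)

grad_fn = jax.grad(neg_log_lik)       # exact gradient via reverse-mode AD
hess_fn = jax.hessian(neg_log_lik)    # exact Hessian, useful for standard errors

X = jnp.array([[1.0, 0.5], [1.0, -1.2], [1.0, 0.3]])
y = jnp.array([1.0, 0.0, 1.0])
beta = jnp.zeros(2)
print(grad_fn(beta, X, y))
print(hess_fn(beta, X, y))
```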
Methodological development of the model-implied instrumental variable (MIIV) estimation framework has proved fruitful over the last three decades. Major milestones include Bollen’s (Psychometrika 61(1):109–121, 1996) original development of the MIIV estimator and its robustness properties for continuous endogenous variable SEMs, the extension of the MIIV estimator to ordered categorical endogenous variables (Bollen and Maydeu-Olivares in Psychometrika 72(3):309, 2007), and the introduction of a generalized method of moments estimator (Bollen et al., in Psychometrika 79(1):20–50, 2014). This paper furthers these developments by making several unique contributions not present in the prior literature: (1) we use matrix calculus to derive the analytic derivatives of the PIV estimator, (2) we extend the PIV estimator to apply to any mixture of binary, ordinal, and continuous variables, (3) we generalize the PIV model to include intercepts and means, (4) we devise a method to input known threshold values for ordinal observed variables, and (5) we enable a general parameterization that permits the estimation of means, variances, and covariances of the underlying variables to use as input into a SEM analysis with PIV. An empirical example illustrates a mixture of continuous variables and ordinal variables with fixed thresholds. We also include a simulation study to compare the performance of this novel estimator to WLSMV.
Nonlinear random coefficient models (NRCMs) for continuous longitudinal data are often used for examining individual behaviors that display nonlinear patterns of development (or growth) over time in measured variables. As an extension of this model, this study considers finite mixtures of NRCMs, which combine features of NRCMs with the idea of finite mixture (or latent class) models. The appeal of this model is that it allows the integration of intrinsically nonlinear functions where the data come from a mixture of two or more unobserved subpopulations, thus allowing the simultaneous investigation of intra-individual (within-person) variability, inter-individual (between-person) variability, and subpopulation heterogeneity. The effectiveness of this model under real data analytic conditions was examined with a Monte Carlo simulation study, carried out using an R routine developed specifically for this purpose. The R routine used maximum likelihood with the expectation–maximization algorithm. The design of the study mimicked the output obtained from running a two-class mixture model on task completion data.
This research concerns a mediation model, where the mediator model is linear and the outcome model is also linear but with a treatment–mediator interaction term and a residual correlated with the residual of the mediator model. Assuming the treatment is randomly assigned, parameters in this mediation model are shown to be partially identifiable. Under the normality assumption on the residuals of the mediator and the outcome, explicit full-information maximum likelihood estimates (FIMLE) of the model parameters are derived given the correlation between the residual for the mediator and the residual for the outcome, and a consistent variance matrix of these estimates is obtained. Currently, the coefficients of this mediation model are estimated using the iterative feasible generalized least squares (IFGLS) method, which was originally developed for seemingly unrelated regressions (SURs). We argue that this mediation model is not a system of SURs. While the IFGLS estimates are consistent, their variance matrix is not. Theoretical comparisons of the FIMLE variance matrix and the IFGLS variance matrix are conducted. Our results are demonstrated by simulation studies and an empirical study. The FIMLE method has been implemented in the freely available R package iMediate.
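In commonly used notation (the symbols here are illustrative, not taken from the paper), the model described is

\[
M = a_0 + a_1 T + e_M, \qquad
Y = b_0 + b_1 T + b_2 M + b_3 (T \times M) + e_Y, \qquad
\operatorname{corr}(e_M, e_Y) = \rho,
\]

with \(T\) randomized; because the mediator \(M\) enters the outcome equation as a regressor while \(e_M\) and \(e_Y\) are correlated, the system is not a set of seemingly unrelated regressions, which is the paper's central point.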
An observer is to make inference statements about a quantity p, called a propensity and bounded between 0 and 1, based on the observation that p does or does not exceed a constant c. The propensity p may have an interpretation as a proportion, as a long-run relative frequency, or as a personal probability held by some subject. Applications in medicine, engineering, political science, and, most especially, human decision making are indicated. Bayes solutions for the observer are obtained based on prior distributions in the mixture-of-beta-distributions family; these are then specialized to power-function prior distributions. Inference about log p and the log odds is considered. Multiple-action problems are considered in which the focus of inference shifts to the process generating the propensities p, both when a process parameter π is known to the subject and when it is unknown. Empirical Bayes techniques are developed for observer inference about c when π is known to the subject. A Bayes rule, a minimax rule and a beta-minimax rule are constructed for the subject when he is uncertain about π.
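As a sketch of the basic calculation (the notation here is illustrative): for a single beta prior component, with the mixture case being a weighted sum of such terms, the observer's posterior after learning only that \(p > c\) is the prior truncated to \((c, 1]\),

\[
\pi(p \mid p > c) \;=\; \frac{p^{\alpha-1}(1-p)^{\beta-1}}
{\displaystyle\int_c^1 u^{\alpha-1}(1-u)^{\beta-1}\,du},
\qquad c < p \le 1,
\]

and analogously truncated to \([0, c]\) when \(p \le c\) is observed; the power-function prior is the special case \(\beta = 1\).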
This chapter introduces communication and information theoretical aspects of molecular communication, relating molecular communication to existing techniques and results in communication systems. Communication models are discussed, as well as detection and estimation problems. The information theory of molecular communication is introduced, and calculation of the Shannon capacity is discussed.
Research on advice taking has demonstrated a phenomenon of egocentric discounting: people weight their own estimates more than advice from others. However, this research is mostly conducted in highly controlled lab settings with low or no stakes. We used unique data from a game show on Norwegian television to investigate advice taking in a high-stakes and highly public setting. Parallel to the standard procedure in judge–advisor systems studies, contestants give numerical estimates for several tasks and solicit advice (another estimate) from three different sources during the game. The average weight of advice was 0.58, indicating that contestants weighted advice more than their own estimates. Of potential predictors of weight of advice, we did not detect associations with the use of intuition (e.g., gut feeling, guessing) or with advice source (family, celebrities, average of viewers from hometown), but own estimation success (the proportion of previous rounds won) was associated with less weight of advice. Solicitation of advice was associated with higher stakes. Together with the relatively high weight on advice, this suggests that participants considered the advice valuable. On average, estimates did not improve much after advice taking, and the potential for improvement by averaging estimates and advice was negligible. We discuss different factors that could contribute to these findings, including stakes, solicited versus unsolicited advice, task difficulty, and high public scrutiny. The results suggest that highly controlled lab studies may not give an accurate representation of advice taking in high-stakes and highly public settings.
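The weight-of-advice measure standard in judge–advisor studies, which the reported 0.58 appears to follow given that values above 0.5 are read as weighting advice more than one's own estimate, is

\[
\mathrm{WOA} \;=\; \frac{\text{final estimate} - \text{initial estimate}}{\text{advice} - \text{initial estimate}},
\]

so 0 means ignoring the advice entirely, 1 means adopting it outright, and 0.5 means equal weighting; a mean of 0.58 therefore places slightly more weight on the advice than on the contestant's own estimate.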
This chapter elaborates on the calibration and validation procedures for the model. First, we describe our calibration strategy in which a customised optimisation algorithm makes use of a multi-objective function, preventing the loss of indicator-specific error information. Second, we externally validate our model by replicating two well-known statistical patterns: (1) the skewed distribution of budgetary changes and (2) the negative relationship between development and corruption. Third, we internally validate the model by showing that public servants who receive more positive spillovers tend to be less efficient. Fourth, we analyse the statistical behaviour of the model through different tests: validity of synthetic counterfactuals, parameter recovery, overfitting, and time equivalence. Finally, we make a brief reference to the literature on estimating SDG networks.
The polymer model provides a relatively simple and robust basis for estimating the standard Gibbs free energies of formation (ΔGfo) and standard enthalpies of formation (ΔHfo) of clay minerals and other aluminosilicates with an accuracy that is comparable to or better than can be obtained using alternative techniques. The model developed in the present study for zeolites entailed the selection of internally consistent standard thermodynamic properties for model components, calibration of adjustable model parameters using a linear regression technique constrained by ΔGfo and ΔHfo values retrieved from calorimetric, solubility, and phase-equilibrium experiments, and assessments of model accuracy based on comparisons of predicted values with experimental counterparts not included in the calibration dataset. The ΔGfo and ΔHfo predictions were found to average within ±0.2% and ±0.3%, respectively, of experimental values at 298.15 K and 1 bar. The latter result is comparable to the good accuracy that has been obtained by others using a more rigorous electronegativity-based model for ΔHfo that accounts explicitly for differences in zeolite structure based on differences in framework density and unit-cell volume. This observation is consistent with recent calorimetric studies indicating that enthalpies of transition from quartz to various pure-silica zeolite frameworks (zeosils) are small and only weakly dependent on framework type, and suggests that the effects on ΔHfo of differences in framework topology can be ignored for estimation purposes without incurring a significant loss of accuracy. The relative simplicity of the polymer model, together with its applicability to both zeolites and clay minerals, is based on a common set of experimentally determined and internally consistent thermodynamic properties for model components. These attributes are particularly well suited for studies of the effects of water-rock-barrier interactions on the long-term safety of geologic repositories for high-level nuclear waste (HLW).
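A schematic sketch of the regression step follows, with placeholder numbers rather than real thermodynamic data: each calibration phase's standard Gibbs free energy of formation is modelled as the stoichiometry-weighted sum of component contributions, which are then recovered by linear least squares.

```python
import numpy as np

# Schematic calibration of polymer-model component contributions by linear
# least squares: each calibration mineral's Delta G_f (kJ/mol) is treated as
# the stoichiometry-weighted sum of component contributions g_j.
# All numbers below are placeholders, not real thermodynamic data.
N = np.array([[2.0, 1.0, 6.0],     # moles of each model component per formula unit
              [1.0, 3.0, 8.0],
              [4.0, 0.0, 5.0],
              [3.0, 2.0, 7.0]])
dGf_obs = np.array([-5600.0, -7200.0, -4100.0, -6900.0])

g, residuals, rank, _ = np.linalg.lstsq(N, dGf_obs, rcond=None)
dGf_pred = N @ g
pct_err = 100.0 * (dGf_pred - dGf_obs) / np.abs(dGf_obs)   # accuracy check, in percent
print(g)
print(pct_err)
```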
For this book, we assume you’ve had an introductory statistics or experimental design class already! This chapter is a mini refresher of some critical concepts we’ll be using and lets you check you understand them correctly. The topics include understanding predictor and response variables, the common probability distributions that biologists encounter in their data, and the common techniques, particularly ordinary least squares (OLS) and maximum likelihood (ML), for fitting models to data and estimating effects, including their uncertainty. You should be familiar with confidence intervals and understand what hypothesis tests and P-values do and don’t mean. You should recognize that we use data to decide, but these decisions can be wrong, so you need to understand the risk of missing important effects and the risk of falsely claiming an effect. Decisions about what constitutes an “important” effect are central.
In this chapter we introduce and apply hidden Markov models to model and analyze dynamical data. Hidden Markov models are among the simplest dynamical models for systems evolving in a discrete state space at discrete time points. We first describe the evaluation of the likelihood relevant to hidden Markov models and introduce the concept of filtering. We then describe how to obtain maximum likelihood estimators using expectation maximization. We then broaden our discussion to the Bayesian paradigm and introduce the Bayesian hidden Markov model. In this context, we describe the forward filtering backward sampling algorithm and Monte Carlo methods for sampling from hidden Markov model posteriors. As hidden Markov models are flexible modeling tools, we present a number of variants, including the sticky hidden Markov model, the factorial hidden Markov model, and the infinite hidden Markov model. Finally, we conclude with a case study in fluorescence spectroscopy where we show how the basic filtering theory presented earlier may be extended to evaluate the likelihood of a second-order hidden Markov model.
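A minimal sketch of the forward (filtering) recursion that underlies the likelihood evaluation, written in Python with illustrative argument names (not the chapter's own code):

```python
import numpy as np
from scipy.special import logsumexp

def hmm_forward_filter(log_pi, log_A, log_obs):
    """Normalized forward (filtering) pass for a discrete-state HMM.

    log_pi  : (K,)   log initial state probabilities
    log_A   : (K, K) log transition probabilities, A[i, j] = p(z_t=j | z_{t-1}=i)
    log_obs : (T, K) log observation likelihoods p(y_t | z_t=k)
    Returns the filtered distributions p(z_t | y_{1:t}) and the log-likelihood,
    the quantity maximized by EM or targeted by Bayesian samplers.
    """
    T, K = log_obs.shape
    log_filtered = np.empty((T, K))
    log_a = log_pi + log_obs[0]
    loglik = logsumexp(log_a)
    log_filtered[0] = log_a - loglik
    for t in range(1, T):
        # predict with the transition kernel, then update with the new observation
        log_pred = logsumexp(log_filtered[t - 1][:, None] + log_A, axis=0)
        log_a = log_pred + log_obs[t]
        c = logsumexp(log_a)           # log p(y_t | y_{1:t-1})
        log_filtered[t] = log_a - c
        loglik += c
    return np.exp(log_filtered), loglik
```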
A relatively novel approach to autonomous navigation that employs platform dynamics as the primary process model raises new implementation challenges. These are related to: (i) potential numerical instabilities during longer flights; (ii) the quality of model self-calibration and its applicability to different flights; (iii) the establishment of a global estimation methodology when handling different initialisation flight phases; and (iv) the possibility of reducing computational load through model simplification. We propose a unified strategy for handling different flight phases that combines factorisation with a partial Schmidt–Kalman approach. We then investigate the stability of the in-air initialisation and the suitability of reusing pre-calibrated model parameters with their correlations. Without GNSS updates, we suggest setting a subset of the state vector as ‘considered’ states within the filter to remove their estimation from the remaining observations. We support all propositions with new empirical evidence: first in model-parameter self-calibration via optimal smoothing, and second by applying our methods to three test flights with dissimilar durations and geometries. Our experiments demonstrate a significant improvement in autonomous navigation quality for twelve different scenarios.
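A sketch of a measurement update with ‘considered’ states, under the usual consider (Schmidt) Kalman formulation, is given below; the function and mask names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def considered_update(x, P, z, H, R, considered):
    """Kalman measurement update with a subset of 'considered' states.

    Rows of the gain belonging to considered states are zeroed, so those
    states are not corrected by the observation, but the Joseph-form
    covariance update keeps their uncertainty and cross-correlations in P.
    `considered` is a boolean mask over the state vector.
    """
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # optimal gain
    K[considered, :] = 0.0                   # do not estimate considered states
    x_new = x + K @ (z - H @ x)
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph form, valid for suboptimal gain
    return x_new, P_new
```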
This chapter discusses the key elements involved in building a study. Planning empirical studies presupposes a decision about whether the major goal of the study is confirmatory (i.e., tests of hypotheses) or exploratory in nature (i.e., development of hypotheses or estimation of effects). Focusing on confirmatory studies, we discuss problems involved in obtaining an appropriate sample, controlling internal and external validity when designing the study, and selecting statistical hypotheses that mirror the substantive hypotheses of interest. Building a study additionally involves decisions about the statistical test strategy to be employed, the sample size required by this strategy to render the study informative, and the most efficient way to achieve this so that study costs are minimized without compromising the validity of inferences. Finally, we point to the many advantages of preregistering the study before data collection begins.
Cognitive diagnosis models originated in the field of educational measurement as a psychometric tool to provide finer-grained information more suitable for formative assessment. Typically, but not necessarily, these models classify examinees as masters or nonmasters on a set of binary attributes. This chapter aims to provide a general overview of the original models and of the extensions and methodological developments that have been made in the last decade. The main topics covered in this chapter include model estimation, Q-matrix specification, model fit evaluation, and procedures for gathering validity and reliability evidence. The chapter ends with a discussion of future trends in the field.
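For concreteness (the chapter does not single out a particular model), the widely used DINA model illustrates the master/nonmaster classification: with binary attributes \(\alpha_{ik}\) and Q-matrix entries \(q_{jk}\),

\[
\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{\,q_{jk}}, \qquad
P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\eta_{ij}}\, g_j^{\,1-\eta_{ij}},
\]

where \(s_j\) and \(g_j\) are the item's slip and guessing parameters and \(\eta_{ij}\) indicates whether examinee \(i\) has mastered all attributes the Q-matrix requires for item \(j\).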