Confidence intervals are ubiquitous in the presentation of social science models, data, and effects. When several intervals are plotted together, one natural inclination is to ask whether the estimates represented by those intervals are significantly different from each other. Unfortunately, there is no general rule or procedure that would allow us to answer this question from the confidence intervals alone. It is well known that using the overlap of 95% confidence intervals to perform significance tests at the 0.05 level does not work. Recent scholarship has developed and refined a set of tools for inferential confidence intervals that permit inference with the appropriate type I error rate in many different bivariate contexts. These are all based on the same underlying idea: identifying the multiple of the standard error (i.e., a new confidence level) such that the overlap in confidence intervals matches the desired type I error rate. These procedures remain stymied by multiple simultaneous comparisons. We propose an entirely new procedure for developing inferential confidence intervals that decouples testing from visualization and can overcome many of these problems in any visual testing scenario. We provide software in R and Stata to accomplish this goal.
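The underlying idea this abstract describes (choosing a new confidence level so that interval overlap matches the desired type I error rate) can be sketched in a few lines. This is a minimal Python illustration of that general idea, not the authors' R or Stata software; the function names are mine, and the unequal-SE rule shown is the standard one in this literature.

```python
from math import sqrt
from statistics import NormalDist

def inferential_level(alpha=0.05):
    """Confidence level whose intervals, for two independent estimates
    with equal standard errors, just fail to overlap exactly when the
    two-sided test of their difference rejects at level alpha."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_star = z_alpha / sqrt(2)            # multiplier applied to each SE
    return 2 * nd.cdf(z_star) - 1         # about 0.834, not 0.95

def inferential_halfwidths(s1, s2, alpha=0.05):
    """Unequal-SE version: half-widths z*·s_i chosen so that interval
    overlap matches the z-test of the difference at level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_star = z_alpha * sqrt(s1**2 + s2**2) / (s1 + s2)
    return z_star * s1, z_star * s2
```

With equal standard errors the matching level is roughly 83.4%, which is why judging significance from the overlap of ordinary 95% intervals is conservative.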
Large-sample theory establishes the asymptotic normality of the maximum likelihood estimator of the person parameter in the two-parameter logistic (2PL) model. In short tests, however, the assumption of normality can be grossly wrong. As a consequence, intended coverage rates may be exceeded and confidence intervals are revealed to be overly conservative. Methods belonging to higher-order asymptotic theory, more specifically saddlepoint approximations, are a convenient way to deal with small-sample problems. Confidence bounds obtained by these means hold the approximate confidence level for a broad range of the person parameter. Moreover, an approximation to the exact distribution makes it possible to compute median-unbiased estimates (MUE) that are as likely to overestimate as to underestimate the true person parameter. Additionally, in small samples, these MUE are less mean-biased than the often-used maximum likelihood estimator.
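For contrast, here is a sketch of the standard normal-approximation (Wald) interval for the 2PL person parameter that this abstract argues is inadequate in short tests. The Python code and the item parameters are hypothetical illustrations; the saddlepoint-based bounds themselves are beyond a short sketch.

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response with discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_wald_ci(items, resp, z=1.96):
    """Newton-Raphson MLE of theta, then the Wald interval
    theta_hat +/- z / sqrt(observed information)."""
    theta = 0.0
    for _ in range(50):
        score = sum(a * (x - p2pl(theta, a, b))
                    for (a, b), x in zip(items, resp))
        info = sum(a * a * p2pl(theta, a, b) * (1 - p2pl(theta, a, b))
                   for (a, b), _ in zip(items, resp))
        theta += score / info
    se = 1.0 / math.sqrt(info)
    return theta, (theta - z * se, theta + z * se)

# hypothetical (discrimination, difficulty) pairs and one response pattern
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5), (1.0, -0.5)]
resp = [1, 1, 0, 0, 1]
theta_hat, ci = mle_wald_ci(items, resp)
```

With only five items, intervals of this Wald type are exactly the case where the abstract's small-sample concerns apply.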
In the context of conditional maximum likelihood (CML) estimation, confidence intervals can be interpreted in three different ways, depending on the sampling distribution under which these confidence intervals contain the true parameter value with a certain probability. These sampling distributions are (a) the distribution of the data given the incidental parameters, (b) the marginal distribution of the data (i.e., with the incidental parameters integrated out), and (c) the conditional distribution of the data given the sufficient statistics for the incidental parameters. Results on the asymptotic distribution of CML estimates under sampling scheme (c) can be used to construct asymptotic confidence intervals using only the CML estimates. This is not possible for the results on the asymptotic distribution under sampling schemes (a) and (b). However, it is shown that the conditional asymptotic confidence intervals are also valid under the other two sampling schemes.
The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator of the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal level in many cases, and the corresponding intervals are therefore no longer confidence intervals in the strict sense of the definition. In the present work, confidence intervals are defined more precisely by exploiting the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.
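The relationship between confidence intervals and hypothesis testing that this kind of construction exploits is test inversion: the interval is the set of parameter values that no level-α test rejects. A minimal Python illustration, using a binomial proportion rather than the paper's IRT setting: the Clopper-Pearson interval obtained by inverting two one-sided exact tests.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact CI for a proportion by test inversion: the set of p0
    that neither one-sided exact test rejects at level alpha/2."""
    def root(f):
        # bisection, assuming f(0) < 0 < f(1) and f increasing in p
        lo, hi = 0.0, 1.0
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    lower = 0.0 if k == 0 else root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper
```

By construction the coverage of such an inverted interval is at least the nominal level for every parameter value, which is exactly the property the Wald approximation loses in short tests.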
Yuan and Chan (Psychometrika, 76, 670–690, 2011) recently showed how to compute the covariance matrix of standardized regression coefficients from covariances. In this paper, we describe a method for computing this covariance matrix from correlations. Next, we describe an asymptotic distribution-free (ADF; Browne in British Journal of Mathematical and Statistical Psychology, 37, 62–83, 1984) method for computing the covariance matrix of standardized regression coefficients. We show that the ADF method works well with nonnormal data in moderate-to-large samples using both simulated and real-data examples. R code (R Development Core Team, 2012) is available from the authors or through the Psychometrika online repository for supplementary materials.
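As background to the quantities involved, the standardized regression coefficients themselves can be computed directly from correlations; for two predictors there is a simple closed form (invert the 2×2 predictor correlation matrix and apply it to the criterion correlations). A Python sketch with a hypothetical function name; the paper's actual contribution, the covariance matrix of these coefficients, requires the delta-method machinery it describes.

```python
def std_betas_two_predictors(r_y1, r_y2, r_12):
    """Standardized coefficients for regressing y on x1 and x2,
    computed from the three pairwise correlations."""
    det = 1.0 - r_12 ** 2          # determinant of the predictor correlation matrix
    b1 = (r_y1 - r_y2 * r_12) / det
    b2 = (r_y2 - r_y1 * r_12) / det
    return b1, b2
```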
The use of U-statistics based on rank correlation coefficients in estimating the strength of concordance among a group of rankers is examined for cases where the null hypothesis of random rankings is not tenable. The studentized U-statistic is asymptotically distribution-free, and the Student-t approximation is used for small and moderate-sized samples. An approximate confidence interval is constructed for the strength of concordance. Monte Carlo results indicate that the Student-t approximation can be improved by estimating the degrees of freedom.
When the raters participating in a reliability study are a random sample from a larger population of raters, inferences about the intraclass correlation coefficient must be based on the three mean squares from the analysis of variance table summarizing the results: between subjects, between raters, and error. An approximate confidence interval for the parameter is presented as a function of these three mean squares.
Structural equation models (SEMs) are widely used for modeling complex multivariate relationships among measured and latent variables. Although several analytical approaches to interval estimation in SEM have been developed, a comprehensive review of these methods has been lacking. We review the popular Wald-type and lesser-known likelihood-based methods in linear SEM, emphasizing profile likelihood-based confidence intervals (CIs). Existing algorithms for computing profile likelihood-based CIs are described, including two newer algorithms that are extended to construct profile likelihood-based confidence regions (CRs). Finally, we illustrate the use of these CIs and CRs with two empirical examples, and provide practical recommendations on when to use Wald-type CIs and CRs versus profile likelihood-based CIs and CRs. OpenMx example code is provided in an Online Appendix for constructing profile likelihood-based CIs and CRs for SEM.
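A profile likelihood-based CI can be illustrated outside SEM with a one-parameter model: the interval is the set of parameter values whose deviance from the maximized log-likelihood stays below the χ²(1) critical value. A Python sketch for an exponential rate; the function name and data are mine, and the SEM case (which OpenMx handles) profiles one parameter while re-maximizing over the rest.

```python
from math import log

def profile_ci_exp_rate(data, crit=3.841):
    """95% profile-likelihood CI for an exponential rate: the set of
    lambda whose deviance 2*(loglik(mle) - loglik(lambda)) is below
    the chi-square(1) critical value 3.841."""
    n, s = len(data), sum(data)
    mle = n / s
    def deviance(lam):
        # loglik(lam) = n*log(lam) - lam*s
        return 2.0 * ((n * log(mle) - mle * s) - (n * log(lam) - lam * s))
    def solve(lo, hi, rising):
        # bisection for deviance == crit on a monotone stretch
        for _ in range(80):
            mid = 0.5 * (lo + hi)
            if (deviance(mid) > crit) == rising:
                hi = mid
            else:
                lo = mid
        return 0.5 * (lo + hi)
    lower = solve(1e-9 * mle, mle, rising=False)   # deviance falls toward the MLE
    upper = solve(mle, 50.0 * mle, rising=True)    # deviance rises past the MLE
    return lower, upper
```

Unlike a Wald interval, the resulting bounds need not be symmetric about the MLE, which is the main practical attraction of the profile approach.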
Chapter 7 introduces statistical power and effect size in hypothesis testing. Guidelines for interpretation of effect size, along with other sources of increasing statistical power, are provided. Point estimation and interval estimation and their relationship to population parameter estimates and the hypothesis-testing process are considered. Statistical significance is highly sensitive to large sample sizes. This means that researchers, in addition to selecting desired statistical significance p-values, need to know the magnitude of the treatment effect or the effect size of the behavior under consideration. Effect size determines sample size, and sample size is intimately related to statistical power or the likelihood of rejecting a false null hypothesis.
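The link between effect size, sample size, and power described in this chapter summary can be made concrete with the usual normal-approximation power formula for a two-sided two-sample test of a standardized effect d. A Python sketch using a textbook approximation, not notation from the chapter itself:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for
    standardized effect size d with n observations per group."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)   # expected value of the z statistic
    # probability of rejecting in either tail under the alternative
    return nd.cdf(ncp - z_alpha) + nd.cdf(-ncp - z_alpha)
```

For a medium effect (d = 0.5), about 64 participants per group yield roughly 80% power, the benchmark usually quoted in this context.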
If the results of a study reveal an interesting association between an exposure and a health outcome, there is a natural tendency to assume that it is real. (Note: we are considering whether two things are associated. This does not imply that one causes the other to occur.) However, before we can even contemplate this possibility we have to try to rule out other possible explanations for the results. There are three main ‘alternative explanations’ that we have to consider whenever we analyse epidemiological data or read the reports of others, whatever the study design; namely, could the results be due to chance, bias or error, or confounding? We discuss the first of these, chance, in this chapter and cover bias and confounding in Chapters 7 and 8, respectively.
Depression is highly prevalent in haemodialysis patients, and diet might play an important role. We therefore conducted this cross-sectional study to determine the association between dietary fatty acid (FA) consumption and the prevalence of depression in maintenance haemodialysis (MHD) patients. Dietary intake was assessed using a validated FFQ between December 2021 and January 2022. Daily intake of dietary FA was categorised into tertiles, with the lowest tertile used as the reference category. Depression was assessed using the Patient Health Questionnaire-9. Logistic regression and restricted cubic spline (RCS) models were applied to assess the relationship between dietary FA intake and the prevalence of depression. After adjustment for potential confounders, a higher intake of total FA (odds ratio (OR) T3 vs. T1 = 1.59; 95% confidence interval (CI) = 1.04, 2.46) and of saturated fatty acids (SFA) (OR T3 vs. T1 = 1.83; 95% CI = 1.19, 2.84) was associated with a higher prevalence of depressive symptoms. Significant positive linear trends were also observed (P < 0.05), except for SFA intake. Similarly, the prevalence of depression in MHD patients increased by 20% (OR = 1.20; 95% CI = 1.01, 1.43) for each standard-deviation increment in SFA intake. RCS analysis indicated an inverse U-shaped correlation between SFA intake and depression (P for nonlinearity > 0.05). The sensitivity analysis produced similar results, and no statistically significant association was observed in the subgroup analyses with significant interaction. In conclusion, higher total dietary FA and SFA intakes were positively associated with depressive symptoms among MHD patients. These findings inform future research exploring potential mechanisms underlying the association between dietary FA and depressive symptoms in MHD patients.
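The per-SD odds ratio and its CI reported above have the form of a standard Wald interval on the log-odds scale: exponentiate β̂ ± 1.96·SE. A Python sketch assuming the reported CI is of this Wald type; the back-calculated SE below is an illustration, not a figure from the study.

```python
from math import exp, log

def wald_or_ci(or_point, se_log_or, z=1.96):
    """Wald CI for an odds ratio: beta +/- z*SE on the log-odds
    scale, then exponentiate back to the OR scale."""
    beta = log(or_point)
    return exp(beta - z * se_log_or), exp(beta + z * se_log_or)

# back out an SE consistent with the reported per-SD CI (illustration only)
se = (log(1.43) - log(1.01)) / (2 * 1.96)
lo, hi = wald_or_ci(1.20, se)   # recovers roughly (1.01, 1.43)
```

The asymmetry of the interval around 1.20 on the OR scale reflects that the construction is symmetric only on the log scale.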
This rather long chapter constitutes part of the hike in our walk/hike/stroll set-up. We introduce the reader to the basics of stochastics (representing both probability and statistics) necessary for the more technical discussions of risk later. The path followed starts from the probability space (a theoretical concept we quickly leave aside); we then move to the notion of a random variable and its distribution function, including the most important discrete as well as continuous examples. Historical as well as pedagogical examples are included throughout to support the understanding of the new concepts introduced. These examples often show that there is more to randomness than meets the eye. For the applications discussed later, we will measure statistical uncertainty through the concept of confidence intervals. These can be based either on asymptotic theory involving the famous bell curve, the normal distribution, or on a form of resampling known as bootstrapping. Further, we add some tools that are very important for measuring and communicating risk; these include the concepts of return periods and quantile functions.
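The bootstrap idea mentioned above fits in a few lines: resample the data with replacement many times and take empirical quantiles of the recomputed statistic. A minimal Python sketch of a percentile-bootstrap CI for a mean; the function name and the data are illustrative.

```python
import random

def bootstrap_ci_mean(data, n_boot=5000, alpha=0.05, seed=1):
    """Percentile-bootstrap CI for the mean: resample with replacement,
    recompute the mean each time, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    lo = means[int(n_boot * alpha / 2)]        # 2.5th percentile
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]  # 97.5th percentile
    return lo, hi
```

No bell-curve assumption enters: the interval comes entirely from the resampling distribution, which is what makes the method attractive alongside the asymptotic approach.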
Chapter 10 covers inferences involving the mean of a single population when σ is known and includes the following specific topics, among others: estimating the population mean, μ, interval estimation, confidence intervals, hypothesis testing and interval estimation, effect size, type II error, and power.
Chapter 15 covers correlation and simple regression as inferential techniques and includes the following specific topics, among others: bivariate normal distribution, statistical significance test of correlation, confidence intervals, statistical significance of the b-weight, fit of the overall regression equation, R and R-squared, adjusted R-squared, regression diagnostics, residual plots, influential observations, discrepancy, leverage, influence, and power analyses.
Chapter 16 covers an introduction to multiple regression and includes the following specific topics, among others: confidence intervals, statistical significance of the b-weight, fit of the overall regression equation, R and R-squared, adjusted R-squared, semipartial correlation, partial slope, confounding, and statistical control.
Three experiments (N = 550) examined the effect of an interval construction elicitation method used in several expert elicitation studies on judgment accuracy. Participants made judgments about topics that were either searchable or unsearchable online using one of two order variations of the interval construction procedure. One group of participants provided their best judgment (one step) prior to constructing an interval (i.e., lower bound, upper bound, and a confidence rating that the correct value fell in the range provided), whereas another group of participants provided their best judgment last, after the three-step confidence interval was constructed. The overall effect of this elicitation method was not significant in 8 out of 9 univariate tests. Moreover, the calibration of confidence intervals was not affected by elicitation order. The findings warrant skepticism regarding the benefit of prior confidence interval construction for improving judgment accuracy.
We show how to elicit the beliefs of an expert in the form of a “most likely interval”, a set of future outcomes that are deemed more likely than any other outcome. Our method, called the Most Likely Interval elicitation rule (MLI), asks the expert for an interval and pays according to how well the answer compares to the actual outcome. We show that the MLI performs well in economic experiments, and satisfies a number of desirable theoretical properties such as robustness to the risk preferences of the expert.