Factor analysis for nonnormally distributed variables is discussed in this paper. The main difference between our approach and more traditional approaches is that not only second-order cross-products (such as covariances) but also higher-order cross-products are utilized. It turns out that under some conditions the parameters (factor loadings) can be uniquely determined. Two estimation procedures are discussed. One method gives Best Generalized Least Squares (BGLS) estimates but is computationally very demanding, particularly for large data sets. The other is a least squares method that is computationally much lighter. In one example the two methods are compared using the bootstrap; in another, real-life data are analyzed.
A cubic spline method for smoothing equipercentile equating relationships under the common item nonequivalent populations design is described. Statistical techniques based on bootstrap estimation are presented that are designed to aid in choosing an equating method/degree of smoothing. These include: (a) asymptotic significance tests that compare no equating and linear equating to equipercentile equating; (b) a scheme for estimating total equating error and for dividing total estimated error into systematic and random components. The smoothing technique and statistical procedures are explored and illustrated using data from forms of a professional certification test.
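The core equipercentile mapping underlying this method can be sketched in a few lines. The following is an illustrative, unsmoothed version using linear-interpolation quantiles, not the paper's spline-smoothed implementation; the function name and toy data are assumptions for the sketch.

```python
import numpy as np

def equipercentile_equate(x_scores, y_scores, x):
    """Unsmoothed equipercentile equating: map a form-X score x to the
    form-Y score with the same percentile rank, e_Y(x) = Q_Y(F_X(x))."""
    x_scores = np.asarray(x_scores, dtype=float)
    p = np.mean(x_scores <= x)       # percentile rank of x on form X
    return np.quantile(y_scores, p)  # score at the same percentile on form Y

# Toy check: if form Y is simply form X shifted up by 10 points,
# the equated score should come out near x + 10.
form_x = np.arange(0, 101, dtype=float)
form_y = form_x + 10.0
equated = equipercentile_equate(form_x, form_y, 50.0)
```

In practice the paper smooths this relationship with a cubic spline before equating, and the bootstrap resamples examinees to estimate the random component of equating error.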
Methods for comparing means are known to be highly nonrobust in terms of Type II errors. The problem is that slight shifts from normal distributions toward heavy-tailed distributions inflate the standard error of the sample mean. In contrast, the standard error of various robust measures of location, such as the one-step M-estimator, is relatively unaffected by heavy tails. Wilcox recently examined a method of comparing the one-step M-estimators of location corresponding to two independent groups that provided good control over the probability of a Type I error even for unequal sample sizes, unequal variances, and differently shaped distributions. There is a fairly obvious extension of this procedure to pairwise comparisons of more than two independent groups, but simulations reported here indicate that it is unsatisfactory. A slight modification of the procedure is found to give much better results, although some caution must be taken when there are unequal sample sizes and light-tailed distributions. An omnibus test is examined as well.
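The one-step M-estimator itself is a short computation. A minimal sketch follows, using Huber's psi with the conventional tuning constant K = 1.28 and MAD scaling; the comparison procedures discussed in the abstract build on this estimator, but the code is only an illustration of the estimator, not of those procedures.

```python
import numpy as np

def one_step_m(x, K=1.28):
    """One-step M-estimator of location: start from the median, flag points
    more than K robust standard deviations away, and adjust the mean of the
    remaining points by a Huber-psi correction for the flagged counts."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    madn = np.median(np.abs(x - med)) / 0.6745  # MAD rescaled for normality
    low = x < med - K * madn    # observations flagged on the left
    high = x > med + K * madn   # observations flagged on the right
    i1, i2 = low.sum(), high.sum()
    kept = x[~low & ~high]
    return (K * madn * (i2 - i1) + kept.sum()) / (x.size - i1 - i2)

rng = np.random.default_rng(0)
clean = rng.normal(10.0, 1.0, 50)
contaminated = np.concatenate([clean, [100.0, 120.0]])  # two gross outliers
# The one-step M-estimate barely moves under contamination; the mean shifts a lot.
m_clean, m_contam = one_step_m(clean), one_step_m(contaminated)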
The conventional setup for multi-group structural equation modeling requires a stringent condition of cross-group equality of intercepts before mean comparison with latent variables can be conducted. This article proposes a new setup that allows mean comparison without the need to estimate any mean structural model. By projecting the observed sample means onto the space of the common scores and the space orthogonal to that of the common scores, the new setup allows identifying and estimating the means of the common and specific factors, although, without replicate measures, variances of specific factors cannot be distinguished from those of measurement errors. Under the new setup, testing cross-group mean differences of the common scores is done independently from that of the specific factors. Such independent testing eliminates the conventional setup's requirement of cross-group equality of intercepts when testing cross-group equality of latent variable means with chi-square-difference statistics. The most appealing feature of the new setup is a validity index for mean differences, defined as the percentage of the sum of the squared observed mean differences that is due to the mean differences of the common scores. By analyzing real data with two groups, the new setup is shown to offer more information than is obtained under the conventional setup.
The paper takes up the problem of performing all pairwise comparisons among J independent groups based on 20% trimmed means. Currently, a method that stands out is the percentile-t bootstrap method where the bootstrap is used to estimate the quantiles of a Studentized maximum modulus distribution when all pairs of population trimmed means are equal. However, a concern is that in simulations, the actual probability of one or more Type I errors can drop well below the nominal level when sample sizes are small. A practical issue is whether a method can be found that corrects this problem while maintaining the positive features of the percentile-t bootstrap. Three new methods are considered here, one of which achieves the desired goal. Another method, which takes advantage of theoretical results by Singh (1998), performs almost as well but is not recommended when the smallest sample size drops below 15. In some situations, however, it gives substantially shorter confidence intervals.
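The percentile-t idea can be illustrated in the simplest one-sample setting. The sketch below is not the paper's multiple-comparison procedure: it merely bootstraps the Studentized 20% trimmed mean for a single sample, using the winsorized variance for the standard error as in Yuen's method, and inverts the bootstrap quantiles into a confidence interval.

```python
import numpy as np
from scipy import stats

def trimse(x, gamma=0.2):
    """Standard error of the gamma-trimmed mean via the winsorized
    sample variance (Yuen's formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    g = int(np.floor(gamma * n))
    xw = x.copy()
    xw[:g] = x[g]                 # winsorize the lower tail
    xw[n - g:] = x[n - g - 1]     # winsorize the upper tail
    return np.sqrt(xw.var(ddof=1)) / ((1 - 2 * gamma) * np.sqrt(n))

def percentile_t_ci(x, gamma=0.2, nboot=2000, alpha=0.05, seed=0):
    """Percentile-t bootstrap CI for the trimmed mean: bootstrap the
    Studentized statistic T* = (tm* - tm) / se*, then invert its quantiles."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    tm = stats.trim_mean(x, gamma)
    tstars = []
    for _ in range(nboot):
        xb = rng.choice(x, size=x.size, replace=True)
        tstars.append((stats.trim_mean(xb, gamma) - tm) / trimse(xb, gamma))
    lo, hi = np.quantile(tstars, [alpha / 2, 1 - alpha / 2])
    se = trimse(x, gamma)
    return tm - hi * se, tm - lo * se

x = np.random.default_rng(1).normal(5.0, 2.0, 40)
ci = percentile_t_ci(x)
```

The multiple-comparison versions studied in the paper replace the single Studentized statistic with the maximum over all pairs, which is what the Studentized maximum modulus distribution refers to.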
A test is proposed for the equality of the variances of k ≥ 2 correlated variables. Pitman's test for k = 2 reduces the null hypothesis to zero correlation between their sum and their difference. Its extension, eliminating nuisance parameters by a bootstrap procedure, is valid for any correlation structure between the k normally distributed variables. A Monte Carlo study for several combinations of sample sizes and number of variables is presented, comparing the level and power of the new method with previously published tests. Some nonnormal data are included, for which the empirical level tends to be slightly higher than the nominal one. The results show that our method is close in power to the asymptotic tests which are extremely sensitive to nonnormality, yet it is robust and much more powerful than other robust tests.
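Pitman's k = 2 reduction is simple enough to sketch directly: under bivariate normality, equality of the two variances is equivalent to zero correlation between the sum and the difference, since Cov(X + Y, X − Y) = Var(X) − Var(Y). The variable names and simulated data below are illustrative.

```python
import numpy as np
from scipy import stats

def pitman_test(x, y):
    """Pitman's test for equality of two correlated variances:
    Var(X) = Var(Y) iff Corr(X + Y, X - Y) = 0 under bivariate normality,
    so we test that correlation with the usual Pearson t test."""
    r, p = stats.pearsonr(x + y, x - y)
    return r, p

rng = np.random.default_rng(2)
n = 200
x = rng.normal(0.0, 1.0, n)
y = 0.5 * x + 3.0 * rng.normal(0.0, 1.0, n)  # correlated with x, much larger variance
r, p = pitman_test(x, y)   # r should be negative (Var(X) < Var(Y)), p very small
```

The extension described in the abstract replaces the normal-theory null distribution with a bootstrap one, which accommodates k > 2 variables and arbitrary correlation structure.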
Recently several new attempts have been made to find a robust method for comparing the variances of J dependent random variables. However, empirical studies have shown that all of these procedures can give unsatisfactory results. This paper examines several new procedures that are derived heuristically. One of these procedures was found to perform better than all of the robust procedures studied here, and so it is recommended for general use.
Efron's Monte Carlo bootstrap algorithm is shown to cause degeneracies in Pearson's r for sufficiently small samples. Two ways of preventing this problem when programming the bootstrap of r are considered.
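The degeneracy is easy to exhibit: when the data values are distinct, a bootstrap resample of n pairs is a single repeated pair with probability n^(1−n), in which case both resampled variables are constant and r is 0/0. The sketch below shows one common guard, redrawing degenerate resamples (screening resamples before computing r is the other option); the data are illustrative.

```python
import numpy as np

def bootstrap_r(x, y, nboot=1000, seed=0):
    """Bootstrap Pearson's r, redrawing any degenerate resample in which
    x* or y* is constant (zero range), so that r would be undefined.
    For distinct values this happens only when every draw picks the same
    pair, with probability n**(1 - n) per resample."""
    rng = np.random.default_rng(seed)
    n = len(x)
    rs, degenerate = [], 0
    while len(rs) < nboot:
        idx = rng.integers(0, n, n)
        xb, yb = x[idx], y[idx]
        if np.ptp(xb) == 0 or np.ptp(yb) == 0:  # degenerate resample
            degenerate += 1
            continue                            # redraw instead of dividing by zero
        rs.append(np.corrcoef(xb, yb)[0, 1])
    return np.array(rs), degenerate

# With n = 4, about 1 in 64 resamples is degenerate, so a run of 2000
# accepted resamples should encounter several.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])
rs, ndeg = bootstrap_r(x, y, nboot=2000)
```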
The two-sample problem for Cronbach’s coefficient $\alpha_C$, as an estimate of test or composite score reliability, has attracted little attention compared to the extensive treatment of the one-sample case. It is often necessary to compare the reliability of a test for different subgroups, for different tests, or for the short and long forms of a test. In this paper, we study statistical procedures for comparing two coefficients $\alpha_{C,1}$ and $\alpha_{C,2}$. The null hypothesis of interest is $H_0 : \alpha_{C,1} = \alpha_{C,2}$, which we test against one- or two-sided alternatives. For this purpose, resampling-based permutation and bootstrap tests are proposed for two-group multivariate non-normal models under the general asymptotically distribution-free (ADF) setting. These statistical tests ensure better control of the type-I error in finite or very small samples, where the standard ADF large-sample test may fail to attain the nominal significance level.
By proper choice of a studentized test statistic, the resampling tests are modified in order to be valid asymptotically even in non-exchangeable data frameworks. Moreover, extensions of this approach to other designs and reliability measures are discussed as well. Finally, the usefulness of the proposed resampling-based testing strategies is demonstrated in an extensive simulation study and illustrated by real data applications.
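A minimal sketch of the unstudentized permutation variant is given below: Cronbach's alpha computed from item and total-score variances, with subjects randomly reassigned to the two groups. The simulated groups are illustrative; the paper's studentized statistics additionally keep the tests asymptotically valid when the two groups are not exchangeable under the null.

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for an (n subjects x k items) score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def permutation_test_alpha(X1, X2, nperm=2000, seed=0):
    """Two-sided permutation test of H0: alpha_1 = alpha_2, obtained by
    randomly reassigning subjects to the two groups."""
    rng = np.random.default_rng(seed)
    obs = cronbach_alpha(X1) - cronbach_alpha(X2)
    pooled = np.vstack([X1, X2])
    n1 = X1.shape[0]
    count = 0
    for _ in range(nperm):
        perm = rng.permutation(pooled.shape[0])
        d = cronbach_alpha(pooled[perm[:n1]]) - cronbach_alpha(pooled[perm[n1:]])
        if abs(d) >= abs(obs):
            count += 1
    return obs, (count + 1) / (nperm + 1)

rng = np.random.default_rng(5)
f = rng.normal(size=(100, 1))                      # common factor
group1 = f + rng.normal(0.0, 0.5, size=(100, 5))   # reliable 5-item scale
group2 = rng.normal(size=(100, 5))                 # unreliable: pure noise
diff, p = permutation_test_alpha(group1, group2)   # large diff, small p
```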
Reporting effect size index estimates with their confidence intervals (CIs) can be an excellent way to simultaneously communicate the strength and precision of the observed evidence. We recently proposed a robust effect size index (RESI) that is advantageous over common indices because it is widely applicable to different types of data. Here, we use statistical theory and simulations to develop and evaluate RESI estimators and confidence/credible intervals that rely on different covariance estimators. Our results show that (1) counter to intuition, the randomness of covariates reduces coverage for chi-squared and F CIs; and (2) when the variance of the estimators is estimated, the non-central chi-squared and F CIs using the parametric and robust RESI estimators fail to cover the true effect size at the nominal level. Using the robust estimator along with the proposed nonparametric bootstrap or Bayesian (credible) intervals provides valid inference for the RESI, even when model assumptions may be violated. This work forms a unified effect size reporting procedure, such that effect sizes with confidence/credible intervals can be easily reported in an analysis of variance (ANOVA) table format.
In the distance approach to nonlinear multivariate data analysis the focus is on the optimal representation of the relationships between the objects in the analysis. In this paper two methods are presented for including weights in distance-based nonlinear multivariate data analysis. In the first method, weights are assigned to the objects while the second method is concerned with differential weighting of groups of variables. When each analysis variable defines a group the latter method becomes a variable weighting method. For objects the weights are assumed to be given; for groups of variables they may be given, or estimated. These weighting schemes can also be combined and have several important applications. For example, they make it possible to perform efficient analyses of large data sets, to use the distance-based variety of nonlinear multivariate data analysis as an addition to loglinear analysis of multiway contingency tables, and to do stability studies of the solutions by applying the bootstrap on the objects or the variables in the analysis. These and other applications are discussed, and an efficient algorithm is proposed to minimize the corresponding loss function.
The paper suggests new methods for comparing the medians corresponding to independent treatment groups. The procedures are based on the Harrell-Davis estimator in conjunction with a slight modification and extension of the bootstrap calibration technique suggested by Loh. Alternatives to the Harrell-Davis estimator are briefly discussed. For the special case of two treatment groups, the proposed procedure always had more power than the Fligner-Rust solution, as well as the procedure examined by Wilcox and Charlin. Included is an illustration, using real data, that comparing medians, rather than means, can yield a substantially different conclusion as to whether two distributions differ in terms of some measure of central location.
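The Harrell-Davis estimator at the heart of the procedure is a short computation: a smooth weighted average of all order statistics, with weights given by increments of a Beta distribution function. The sketch below is illustrative; SciPy also ships an implementation as `scipy.stats.mstats.hdquantiles`.

```python
import numpy as np
from scipy import stats

def harrell_davis(x, q=0.5):
    """Harrell-Davis estimator of the q-th quantile: a weighted average of
    the order statistics with Beta((n+1)q, (n+1)(1-q)) incomplete-beta
    weights, w_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    i = np.arange(1, n + 1)
    w = stats.beta.cdf(i / n, a, b) - stats.beta.cdf((i - 1) / n, a, b)
    return np.sum(w * x)

# For data symmetric about 6, the estimated median is 6 (the weights are
# symmetric when q = 0.5).
hd = harrell_davis(np.arange(1.0, 12.0))
```

The comparison procedure in the paper then bootstraps this estimator within each group and calibrates the resulting intervals following Loh's technique.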
A Monte Carlo experiment is conducted to investigate the performance of bootstrap methods in normal theory maximum likelihood factor analysis, both when the distributional assumption is satisfied and when it is violated. The parameters and functions of interest include unrotated loadings, analytically rotated loadings, and unique variances. The results reveal that (a) bootstrap bias estimation sometimes performs poorly for factor loadings and nonstandardized unique variances; (b) bootstrap variance estimation performs well even when the distributional assumption is violated; (c) bootstrap confidence intervals based on Studentized statistics are recommended; and (d) if a structural hypothesis about the population covariance matrix is taken into account, then the bootstrap distribution of the normal theory likelihood ratio test statistic is close to the corresponding sampling distribution, with a slightly heavier right tail.
Experience with real data indicates that psychometric measures often have heavy-tailed distributions. This is known to be a serious problem when comparing the means of two independent groups because heavy-tailed distributions can have a serious effect on power. Another problem that is common in some areas is outliers. This paper suggests an approach to these problems based on the one-step M-estimator of location. Simulations indicate that the new procedure provides very good control over the probability of a Type I error even when distributions are skewed, have different shapes, and the variances are unequal. Moreover, the new procedure has considerably more power than Welch's method when distributions have heavy tails, and it compares well to Yuen's method for comparing trimmed means. Wilcox's median procedure has about the same power as the proposed procedure, but Wilcox's method is based on a statistic that has a finite sample breakdown point of only 1/n, where n is the sample size. Comments on other methods for comparing groups are also included.
In this paper we propose a simple estimator for unbalanced repeated measures design models where each unit is observed at least once in each cell of the experimental design. The estimator does not require a model of the error covariance structure. Thus, circularity of the error covariance matrix and estimation of correlation parameters and variances are not necessary. Together with a weak assumption about the reason for the varying number of observations, the proposed estimator and its variance estimator are unbiased. As an alternative to confidence intervals based on the normality assumption, a bias-corrected and accelerated (BCa) bootstrap technique is considered. We also propose the naive percentile bootstrap for Wald-type tests, where the standard Wald test may break down when the number of observations is small relative to the number of parameters to be estimated. In a simulation study we illustrate the properties of the estimator and of the bootstrap techniques for calculating confidence intervals and conducting hypothesis tests in small and large samples under normality and non-normality of the errors. The results imply that the simple estimator is only slightly less efficient than an estimator that correctly assumes a block structure of the error correlation matrix, a special case of which is an equi-correlation matrix. Application of the estimator and the bootstrap technique is illustrated using data from a task switch experiment based on a within-subjects design with 32 cells and 33 participants.
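The bias-corrected and accelerated interval mentioned above is available off the shelf in SciPy. A minimal sketch for a single skewed sample follows; the lognormal data are illustrative, not the task-switch design from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=60)  # skewed, non-normal errors

# BCa bootstrap CI for the mean: the bias correction uses the fraction of
# bootstrap statistics below the observed one, and the acceleration is
# estimated by jackknife - both handled internally by scipy.stats.bootstrap.
res = stats.bootstrap((x,), np.mean, method='BCa',
                      confidence_level=0.95, n_resamples=4000,
                      random_state=rng)
ci = res.confidence_interval   # ci.low, ci.high
```

For multi-parameter Wald-type tests on small samples, the abstract instead recommends the naive percentile bootstrap of the test statistic.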
Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum likelihood (TS-ML) method. An R package bmem is developed to implement the four methods for mediation analysis with missing data in the structural equation modeling framework, and two real examples are used to illustrate the application of the four methods. The four methods are evaluated and compared under MCAR, MAR, and MNAR missing data mechanisms through simulation studies. Both MI and TS-ML perform well for MCAR and MAR data regardless of the inclusion of auxiliary variables and for AV-MNAR data with auxiliary variables. Although listwise deletion and pairwise deletion have low power and large parameter estimation bias in many studied conditions, they may provide useful information for exploring missing mechanisms.
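For the simplest of the four approaches, listwise deletion, the bootstrap of the indirect effect is easy to sketch. The R package in the paper is `bmem`; the Python code below is a stand-alone illustration with assumed function names and simulated data, not its API, and MI or TS-ML would be preferable under MAR.

```python
import numpy as np

def indirect_effect(X, M, Y):
    """a*b indirect effect from two OLS fits: M ~ X gives a, Y ~ X + M gives b."""
    a = np.polyfit(X, M, 1)[0]
    Z = np.column_stack([np.ones_like(X), X, M])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return a * coef[2]

def boot_indirect_listwise(X, M, Y, nboot=2000, seed=0):
    """Percentile bootstrap CI for the indirect effect after listwise
    deletion of rows with any missing value."""
    keep = ~(np.isnan(X) | np.isnan(M) | np.isnan(Y))
    X, M, Y = X[keep], M[keep], Y[keep]
    rng = np.random.default_rng(seed)
    n = X.size
    ests = [indirect_effect(X[idx], M[idx], Y[idx])
            for idx in rng.integers(0, n, (nboot, n))]
    return np.quantile(ests, [0.025, 0.975])

rng = np.random.default_rng(6)
n = 300
X = rng.normal(size=n)
M = 0.5 * X + rng.normal(size=n)            # true a = 0.5
Y = 0.4 * M + 0.2 * X + rng.normal(size=n)  # true b = 0.4, indirect = 0.2
M[rng.random(n) < 0.1] = np.nan             # 10% MCAR missingness in the mediator
ci = boot_indirect_listwise(X, M, Y)        # should exclude zero
```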
In most item response theory applications, model parameters need to be first calibrated from sample data. Latent variable (LV) scores calculated using estimated parameters are thus subject to sampling error inherited from the calibration stage. In this article, we propose a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores. BC modifies the quantile of the plug-in posterior, i.e., the posterior distribution of the LV evaluated at the estimated model parameters, to better match the corresponding quantile of the true posterior, i.e., the posterior distribution evaluated at the true model parameters, over repeated sampling of calibration data. Furthermore, to achieve better coverage of the fixed true LV score, we explore the use of BC in conjunction with Jeffreys’ prior. We investigate the finite-sample performance of BC via Monte Carlo simulations and apply it to two empirical data examples.
It is widely believed that a joint factor analysis of item responses and response time (RT) may yield more precise ability scores than those conventionally predicted from responses alone. For this purpose, a simple-structure factor model is often preferred, as it only requires specifying an additional measurement model for item-level RT while leaving the original item response theory (IRT) model for responses intact. The added speed factor indicated by item-level RT correlates with the ability factor in the IRT model, allowing RT data to carry additional information about respondents’ ability. However, parametric simple-structure factor models are often restrictive and fit poorly to empirical data, which prompts under-confidence in the suitability of a simple factor structure. In the present paper, we analyze the 2015 Programme for International Student Assessment mathematics data using a semiparametric simple-structure model. We conclude that a simple factor structure attains a decent fit after further parametric assumptions in the measurement model are sufficiently relaxed. Furthermore, our semiparametric model implies that the association between latent ability and speed/slowness is strong in the population, but the form of association is nonlinear. It follows that scoring based on the fitted model can substantially improve the precision of ability scores.
Chapter 8 presents random forests for regression, which – at least in some situations – may outperform the least-squares-based regression methods. The chapter discusses bagging in the context of regression applications of random forests, the algorithm for splitting nodes in regression trees, and the variable importance metrics applicable to regression.
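The ingredients the chapter covers can be sketched briefly with scikit-learn (assumed available; the simulated data are illustrative): bagged regression trees with a random feature subset considered at each split, out-of-bag error, and impurity-based variable importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 5))
# Only the first two features carry signal; the other three are noise.
y = 3.0 * X[:, 0] + np.sin(2.0 * X[:, 1]) + 0.1 * rng.normal(size=n)

# Random forest = bagging of regression trees plus a random subset of
# features ("sqrt" of the total here) evaluated at each split; the
# out-of-bag samples give a built-in estimate of generalization R^2.
rf = RandomForestRegressor(n_estimators=200, max_features="sqrt",
                           oob_score=True, random_state=0).fit(X, y)
oob_r2 = rf.oob_score_                 # out-of-bag R^2
importances = rf.feature_importances_  # should concentrate on features 0 and 1
```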
Researchers theorize that many real-world networks exhibit community structure where within-community edges are more likely than between-community edges. While numerous methods exist to cluster nodes into different communities, less work has addressed this question: given some network, does it exhibit statistically meaningful community structure? We answer this question in a principled manner by framing it as a statistical hypothesis test in terms of a general and model-agnostic community structure parameter. Leveraging this parameter, we propose a simple and interpretable test statistic used to formulate two separate hypothesis testing frameworks. The first is an asymptotic test against a baseline value of the parameter while the second tests against a baseline model using bootstrap-based thresholds. We prove theoretical properties of these tests and demonstrate how the proposed method yields rich insights into real-world datasets.