This chapter serves as a guide to common advanced statistical methods: multiple regression, two-way and three-way analysis of variance, logistic regression, multiple logistic regression, Spearman’s rho correlation, the Wilcoxon rank-sum test, and the Kruskal-Wallis test. Each explanation is accompanied by a software guide showing how to conduct the procedure and interpret the results. There is also a brief description of common multivariate procedures.
Chapter 5 teaches how data analysts can change the scale of a distribution by performing a linear transformation, which is the process of adding, subtracting, multiplying, or dividing the data by a constant. Adding or subtracting a constant will change the mean of a variable, but not its standard deviation or variance. Multiplying or dividing by a constant will change the mean, the standard deviation, and the variance of a dataset. A table shows how linear transformations change the values of models of central tendency and variability. One special linear transformation is the z-score. A distribution of z-scores always has a mean of 0 and a standard deviation of 1. Putting datasets on a common scale permits comparisons across different units. Linear transformations, like the z-score transformation, force the data to have the desired mean and standard deviation, yet they do not change the shape of the distribution – only its scale. Indeed, all scales are arbitrary, and scientists can use linear transformations to give their data any mean and standard deviation they choose.
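As a minimal sketch (not part of the chapter itself), the Python code below applies the z-score transformation using NumPy; the raw scores are invented for illustration.

    import numpy as np

    scores = np.array([72.0, 85.0, 90.0, 68.0, 95.0])  # hypothetical raw data

    # z = (x - mean) / standard deviation: a linear transformation that
    # rescales the data without changing the shape of the distribution.
    z_scores = (scores - scores.mean()) / scores.std()

    print(z_scores.mean())  # ~0.0 (floating-point rounding aside)
    print(z_scores.std())   # 1.0

The same pattern covers any linear transformation: adding a constant shifts the mean, while multiplying by a constant rescales the mean, the standard deviation, and the variance.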
A one-sample t-test is an NHST procedure that is appropriate when a z-test cannot be performed because the population standard deviation is unknown. The one-sample t-test follows all eight steps of the z-test, but requires modifications to accommodate the unknown population standard deviation. First, the formulas that used σ_y now use the estimated population standard deviation based on sample data instead. Second, degrees of freedom must be calculated. Finally, t-tests use a new probability distribution called a t-distribution.
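As a hedged sketch of how this looks in software, the example below runs a one-sample t-test with SciPy’s scipy.stats.ttest_1samp; the sample values and hypothesized mean are invented.

    import numpy as np
    from scipy import stats

    sample = np.array([101.0, 94.0, 108.0, 99.0, 103.0, 97.0])  # hypothetical data
    mu0 = 100.0  # hypothesized population mean

    # SciPy estimates the population standard deviation from the sample,
    # uses df = n - 1 degrees of freedom, and evaluates the observed t
    # against the t-distribution.
    t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
    print(t_stat, p_value, len(sample) - 1)  # t, p, degrees of freedom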
This chapter also explains more about p-values. First, when p is lower than α, the null hypothesis is always rejected. Second, when p is higher than α, the null hypothesis is always retained. Therefore, we can determine whether p is smaller or larger than α by noting whether the null hypothesis was rejected or retained at that α level. This chapter also discusses confidence intervals (CIs), which give a range of plausible values for a population parameter. CIs can vary in width, and the researcher chooses the confidence level; the 95% CI is most common in social science research. Finally, one-sample t-tests can be used to test the hypothesis that the sample mean is equal to any value of interest, not just the population mean.
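A minimal sketch of a 95% CI for a sample mean, computed by hand with SciPy’s t-distribution (the data are the same invented values as above):

    import numpy as np
    from scipy import stats

    sample = np.array([101.0, 94.0, 108.0, 99.0, 103.0, 97.0])  # hypothetical data
    n = len(sample)
    mean = sample.mean()
    sem = sample.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean

    # 95% CI: mean +/- t_critical * standard error, with df = n - 1.
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(mean - t_crit * sem, mean + t_crit * sem)

Choosing a higher confidence level (e.g., 99%) widens the interval; a lower level narrows it.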
A correlation coefficient can be used to predict dependent variable values using a procedure called linear regression. Two equations can be used to perform regression: the standardized regression equation and the unstandardized regression equation. Both produce a straight line that represents the predicted value on the dependent variable for a sample member with a given score on the independent (X) variable.
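As an illustrative sketch (the x and y values are invented), scipy.stats.linregress fits the unstandardized regression equation ŷ = a + bX:

    import numpy as np
    from scipy import stats

    x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # hypothetical independent variable
    y = np.array([3.1, 5.0, 6.8, 9.2, 10.9])   # hypothetical dependent variable

    # Unstandardized regression equation: y_hat = intercept + slope * x
    result = stats.linregress(x, y)
    y_hat = result.intercept + result.slope * x  # points on the regression line
    print(result.slope, result.intercept, result.rvalue)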
One statistical phenomenon to be aware of when making predictions is regression towards the mean, which occurs because a predicted dependent variable value is closer to the mean of the dependent variable than the person’s score on the independent variable was to the mean of the independent variable. This means that outliers and rare events can be difficult or impossible to predict via the regression equations.
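To see why, consider the standardized regression equation, in which the predicted z-score on the dependent variable equals the correlation multiplied by the z-score on the independent variable (ẑ_y = r · z_x). With a hypothetical r = .50, a person 2 standard deviations above the mean on X (z_x = 2.0) has a predicted score only 1 standard deviation above the mean on Y (ẑ_y = .50 × 2.0 = 1.0). Whenever |r| < 1, predictions are pulled toward the mean.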
There are important assumptions of Pearson’s r and regression: (1) a linear relationship between variables, (2) homogeneity of residuals, (3) an absence of a restriction of range, (4) a lack of outliers/extreme values that distort the relationship between variables, (5) subgroups within the sample are equivalent, and (6) interval- or ratio-level data for both variables. Violating any of these assumptions can distort the correlation coefficient.
All null hypothesis statistical significance testing (NHST) procedures follow eight steps: (1) form groups in the data, (2) define the null hypothesis (H0), (3) set alpha (α), (4) choose a one-tailed or a two-tailed test, (5) calculate the observed value, (6) find the critical value, (7) compare the observed value and the critical value, and (8) calculate an effect size. For a z-test the effect size is Cohen’s d, which can be interpreted as the number of standard deviations between the two means.
A z-test is the simplest NHST and tests the H0 that a sample’s dependent variable mean and a population’s dependent variable mean are equal. If H0 is retained, the difference between means is no greater than what would be expected from sampling error. If H0 is rejected, the null hypothesis is not a good statistical model for the data.
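A minimal sketch of steps 5–8 for a z-test, using invented population and sample values; the formulas z = (M − μ) / (σ/√N) and d = (M − μ) / σ are the standard ones:

    import numpy as np
    from scipy import stats

    mu, sigma = 100.0, 15.0       # hypothetical population mean and SD
    sample_mean, n = 106.0, 36    # hypothetical sample mean and size

    # Observed value: how many standard errors the sample mean
    # sits from the population mean.
    z_obs = (sample_mean - mu) / (sigma / np.sqrt(n))

    # Two-tailed p-value from the standard normal distribution.
    p_value = 2 * stats.norm.sf(abs(z_obs))

    # Cohen's d: the difference between means in standard deviation units.
    cohens_d = (sample_mean - mu) / sigma

    print(z_obs, p_value, cohens_d)  # 2.4, ~0.016, 0.4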
When conducting NHSTs, it is possible to make the wrong decision about the H0. A Type I error occurs when a person rejects a H0 that is actually true. A Type II error occurs when a person retains a H0 that is actually false. It is impossible to know whether a correct decision has been made or not.
All the NHSTs in previous chapters compare two dependent variable means. When there are three or more group means, it is possible to run an unpaired two-sample t-test for each pair of group means, but there are two problems with this strategy. First, as the number of groups increases, the number of t-tests required grows much faster: k groups require k(k - 1)/2 pairwise tests. Second, the risk of Type I error increases with each additional t-test.
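The arithmetic behind both problems can be sketched directly (assuming, for simplicity, that the tests are independent):

    from math import comb

    alpha = 0.05
    for k in (3, 5, 10):                   # number of groups
        m = comb(k, 2)                     # pairwise t-tests needed: k(k - 1)/2
        familywise = 1 - (1 - alpha) ** m  # Type I error risk across all m tests
        print(k, m, round(familywise, 3))
    # 3 groups -> 3 tests (~.143 risk); 10 groups -> 45 tests (~.901 risk)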
The analysis of variance (ANOVA) addresses both problems. Its null hypothesis is that all group means are equal. ANOVA follows the same eight steps as other NHST procedures and produces an effect size, η². The η² effect size can be interpreted in two ways. First, η² quantifies the percentage of dependent variable variance that is shared with the independent variable’s variance. Second, η² measures how much better the group mean functions as a predicted score when compared to the grand mean.
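A hedged sketch of a one-way ANOVA with SciPy, followed by η² computed by hand as the between-groups sum of squares divided by the total sum of squares; all scores are invented:

    import numpy as np
    from scipy import stats

    # Hypothetical scores for three groups.
    g1 = np.array([4.0, 5.0, 6.0, 5.5])
    g2 = np.array([6.5, 7.0, 8.0, 7.5])
    g3 = np.array([5.0, 5.5, 6.5, 6.0])

    f_stat, p_value = stats.f_oneway(g1, g2, g3)

    # Eta squared: SS_between / SS_total.
    scores = np.concatenate([g1, g2, g3])
    grand_mean = scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (g1, g2, g3))
    ss_total = ((scores - grand_mean) ** 2).sum()
    print(f_stat, p_value, ss_between / ss_total)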
ANOVA only indicates whether a difference exists among the means – not which means differ from which. To determine this, a post hoc test is frequently performed; the most common is Tukey’s test, which helps researchers identify the location of the difference(s).
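A minimal sketch of Tukey’s test using statsmodels’ pairwise_tukeyhsd, applied to the same invented data as above:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    scores = np.array([4.0, 5.0, 6.0, 5.5,    # group 1 (hypothetical)
                       6.5, 7.0, 8.0, 7.5,    # group 2
                       5.0, 5.5, 6.5, 6.0])   # group 3
    groups = np.repeat(["g1", "g2", "g3"], 4)

    # Compares every pair of group means while holding the familywise
    # Type I error rate at alpha.
    print(pairwise_tukeyhsd(scores, groups, alpha=0.05))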