In the previous chapter we presented a statistical model for answering questions about a single population mean μ, when σ was known. Because σ usually is not known in practice, we present in this chapter a statistical model that is appropriate for answering questions about a single population mean when σ is not known. We also present two other statistical models that are appropriate for answering questions about the equality of two population means.
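As a minimal sketch of the single-mean case when σ is not known, the example below computes the usual one-sample t statistic, t = (ȳ − μ₀) / (s / √n), where the sample standard deviation s stands in for σ. The scores and the hypothesized mean of 50 are hypothetical values chosen only for illustration, and the function name is ours.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD (n - 1 denominator) replaces the unknown sigma
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical data: eight scores tested against a hypothesized mean of 50.
scores = [52, 48, 55, 60, 45, 50, 58, 47]
t_stat, df = one_sample_t(scores, mu0=50)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The resulting statistic would be referred to a t distribution with n − 1 degrees of freedom rather than to the standard normal distribution.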
In the situations of statistical inference that we have discussed regarding the mean and the variance, the dependent variable Y was tacitly assumed to be at the interval level of measurement. In addition, the statistical tests (e.g., t and F) required the assumptions of normality and, sometimes, homogeneity of variance of the parent population distributions.
In the previous chapter, we remarked that even when a strong linear correlation exists between two variables, X and Y, it may not be possible to talk about either one of them as the cause of the other. Even so, there is nothing to keep us from using one of the variables to predict the other. In making predictions from one variable to another, the variable being predicted is called the dependent variable and the variable predicting the dependent variable is called the independent variable. The dependent variable may also be referred to as the criterion or outcome while the independent variable may also be referred to as the predictor or regressor.
In Chapter 13 we introduced a method for analyzing mean differences on a single dependent variable between two or more independent groups while controlling the risk of making a Type I error at some specified α level. We explained the connection between the method and its name, analysis of variance, and showed how the method was a generalization of the independent groups t-test.
Often, variables in their original form do not lend themselves well to comparison with other variables or to certain types of analysis. In addition, often we may obtain greater insight by expressing a variable in a different form. For these and other reasons, in this chapter we discuss four different ways to re-express or transform variables: applying linear transformations, applying nonlinear transformations, recoding, and combining. We also describe how to use syntax files to manage your data analysis.
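To illustrate the first of these, a linear transformation, the sketch below re-expresses temperatures from degrees Fahrenheit to degrees Celsius using new = a + b × old. The temperature values are hypothetical and the helper name is ours. The example also checks two standard consequences of a linear transformation: the mean is transformed by the same rule, and the standard deviation is multiplied by |b|.

```python
import statistics


def linear_transform(values, a, b):
    """Apply new = a + b * old to every value."""
    return [a + b * v for v in values]


# Hypothetical daily high temperatures in degrees Fahrenheit.
temps_f = [68.0, 77.0, 86.0, 95.0]

# Celsius = (Fahrenheit - 32) * 5/9, that is, a = -32 * 5/9 and b = 5/9.
a, b = -32 * 5 / 9, 5 / 9
temps_c = linear_transform(temps_f, a, b)

mean_f, sd_f = statistics.mean(temps_f), statistics.stdev(temps_f)
mean_c, sd_c = statistics.mean(temps_c), statistics.stdev(temps_c)

print("Celsius values:", [round(c, 2) for c in temps_c])
print("mean follows the same rule:", abs(mean_c - (a + b * mean_f)) < 1e-9)
print("SD is scaled by |b|:", abs(sd_c - abs(b) * sd_f) < 1e-9)
```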
According to Equation 17.2, and as discussed in Chapter 16, holding relative humidity constant at a fixed value, each one-degree increase in temperature is associated with an estimated 0.88 more ice cream bars sold on average. This is true regardless of the value at which relative humidity is held constant. Likewise, holding temperature constant at a fixed value, each one-percentage-point increase in relative humidity is associated with an estimated 0.40 more ice cream bars sold on average, regardless of the value at which temperature is held constant.
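The "regardless of the value" property follows from the additive form of the equation, which the short sketch below makes concrete. Only the two slopes, 0.88 and 0.40, come from the text. The intercept is not given here, so the value used is a hypothetical placeholder, as are the temperature and humidity values. Because the intercept cancels when two predictions are subtracted, the conclusion does not depend on it.

```python
B_TEMP = 0.88    # slope for temperature, from the text
B_HUMID = 0.40   # slope for relative humidity, from the text
INTERCEPT = 0.0  # hypothetical placeholder; the actual intercept is not given in the text


def predicted_sales(temp, humidity, intercept=INTERCEPT):
    """Predicted ice cream bars sold under an additive two-predictor model."""
    return intercept + B_TEMP * temp + B_HUMID * humidity


def temp_effect(humidity, temp=80.0):
    """Change in predicted sales for a one-degree increase in temperature."""
    return predicted_sales(temp + 1, humidity) - predicted_sales(temp, humidity)


# The one-degree effect is the same at every (hypothetical) humidity level.
effects = [temp_effect(h) for h in (40.0, 60.0, 80.0)]
for h, e in zip((40.0, 60.0, 80.0), effects):
    print(f"humidity = {h:.0f}%: change in predicted sales = {e:.2f}")
```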
In Chapter 15, we were concerned with predicting the number of ice cream bars we could expect to sell at the beach from the daily highest temperature. Our goal was to predict that number as accurately as possible so that we would take neither too much ice cream to the beach nor too little. Because of the linear shape depicted in the scatterplot and the strong relationship between number of ice cream sales and daily highest temperature (r = 0.887), we were able to construct a reasonably accurate simple linear regression equation for prediction with R² = 0.78. But we can try to do even better.
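In simple linear regression, R² is the square of the correlation r between the predictor and the outcome. The sketch below checks that relationship numerically on a small hypothetical data set (these are not the ice cream data from the text), fitting the line by ordinary least squares and computing R² as 1 − SSE/SST. The function names are ours.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Proportion of variance in y explained by the fitted line."""
    intercept, slope = fit_line(x, y)
    my = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst


# Hypothetical temperatures and sales, for illustration only.
temps = [70, 75, 80, 85, 90, 95]
sales = [140, 152, 149, 165, 171, 180]

r = pearson_r(temps, sales)
print("r squared equals R squared:", abs(r ** 2 - r_squared(temps, sales)) < 1e-9)
```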
As noted in Chapter 1, the function of descriptive statistics is to describe data. A first step in this process is to explore how the collection of values for each variable is distributed across the array of values possible. Because our concern is with each variable taken separately, we say that we are exploring univariate distributions. Tools for examining such distributions, including tabular and graphical representations, are presented in this chapter along with Stata commands for implementing these tools.
Welcome to the study of statistics! It has been our experience that many students face the prospect of taking a course in statistics with a great deal of anxiety, apprehension, and even dread. They fear not having the extensive mathematical background that they assume is required, and they fear that the contents of such a course will be irrelevant to their work in their fields of concentration.
A theoretical probability model is a mathematical representation of a class of experiments having certain specified characteristics in common, and from which we may derive the probabilities of outcomes of any experiment in the class. The extent to which the probabilities derived from the theoretical model are correct for a particular experiment depends on the extent to which the particular experiment possesses the characteristics required by the model.
As noted at the beginning of Chapter 7, a distinction exists between descriptive and inferential statistics. Whereas descriptive statistics are used to describe the data at hand, inferential statistics are used to draw inferences from results based on the data at hand to a larger population from which the data at hand have been selected. The data at hand form what is called a sample.
In Chapter 1, we made a distinction between descriptive and inferential statistics. We said that when the purpose of the research is to describe the data that have been (or will be) collected, we are in the realm of descriptive statistics. In descriptive statistics, because the data are collected on all the individuals about whom a conclusion is to be drawn, conclusions can be drawn with 100 percent certainty. In inferential statistics, on the other hand, the purpose of the research is not to describe the set of data that have been collected, but to generalize or make inferences based on them to a larger group called the population.