One way of generating hypotheses is to collect data and look for patterns. Often, however, it is difficult to see any underlying trend or feature from a set of data which is just a list of numbers. Graphs and descriptive statistics are very useful for summarising and displaying data in ways that may reveal patterns. This chapter describes the different types of data you are likely to encounter and discusses ways of displaying them.
Variables, experimental units and types of data
The particular attributes you measure when you collect data are called variables (e.g. body temperature, the numbers of a particular species of beetle per broad bean pod, the amount of fungal damage per leaf or the numbers of brown and albino mice). These data are collected from each experimental or sampling unit, which may be an individual (e.g. a human or a whale) or a defined item (e.g. a square metre of the seabed, a leaf or a lake). If you only measure one variable per experimental unit, the data set is univariate. Data for two variables per unit are bivariate. Data for three or more variables measured on the same experimental unit are multivariate.
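As a small illustration (with made-up numbers, using Python's NumPy), the three kinds of data set differ only in how many variables are recorded per experimental unit:

```python
import numpy as np

# Hypothetical measurements from five experimental units (e.g. five leaves).
# Univariate: one variable per unit.
univariate = np.array([3.1, 2.8, 3.5, 3.0, 2.9])            # fungal damage score

# Bivariate: two variables per unit (rows = units, columns = variables).
bivariate = np.array([[3.1, 12.0],
                      [2.8, 10.5],
                      [3.5, 14.2],
                      [3.0, 11.8],
                      [2.9, 11.1]])                          # damage, leaf area

# Multivariate: three or more variables per unit.
beetle_counts = np.array([[1.2], [0.9], [1.5], [1.1], [1.0]])
multivariate = np.hstack([bivariate, beetle_counts])

print(univariate.shape)    # one variable for 5 units
print(bivariate.shape)     # two variables
print(multivariate.shape)  # three variables
```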
This chapter describes some non-parametric tests for ratio, interval or ordinal scale univariate data. These tests do not use the predictable normal distribution of sample means, which is the basis of most parametric tests, to assess whether samples are from the same population. Because of this, non-parametric tests are generally not as powerful as their parametric equivalents, but if the data are grossly non-normal and cannot be satisfactorily improved by transformation, a non-parametric test is necessary.
Non-parametric tests are often called ‘distribution-free tests’, but most nevertheless assume that the samples being analysed are from populations with the same distribution. They should not be used where there are gross differences in distribution (including the variance) among samples, and the general rule discussed in Chapter 14, that the ratio of the largest to smallest sample variance should not exceed 4:1, also applies. Many non-parametric tests for ratio, interval or ordinal data calculate a statistic from a comparison of two or more samples and work in the following way.
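As a sketch (hypothetical count data with a long right tail), a typical two-sample non-parametric comparison such as the Mann–Whitney U test can be run with SciPy, after checking the variance-ratio rule of thumb:

```python
import numpy as np
from scipy import stats

# Hypothetical, grossly non-normal samples (counts with a long right tail).
sample_a = np.array([1, 2, 2, 3, 3, 4, 5, 18, 21, 25])
sample_b = np.array([4, 5, 6, 6, 7, 8, 9, 30, 35, 40])

# Rule of thumb: the largest-to-smallest sample variance ratio
# should not exceed 4:1 before applying the test.
var_a, var_b = sample_a.var(ddof=1), sample_b.var(ddof=1)
ratio = max(var_a, var_b) / min(var_a, var_b)

# Mann-Whitney U test: a non-parametric comparison of two independent samples.
u_stat, p_value = stats.mannwhitneyu(sample_a, sample_b, alternative='two-sided')
print(f"variance ratio = {ratio:.1f}, U = {u_stat}, P = {p_value:.3f}")
```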
A single-factor ANOVA (Chapter 11) is used to analyse univariate data from samples exposed to different levels or aspects of only one factor. For example, it could be used to compare the oxygen consumption of a species of intertidal crab (the response variable) at two or more temperatures (the factor), the growth of brain tumours (the response variable) exposed to a range of drugs (the factor), or the insecticide resistance of a moth (the response variable) from several different locations (the factor).
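As a minimal sketch of the first example (hypothetical oxygen consumption values at three temperatures; numbers and units are illustrative only), a single-factor ANOVA can be run with SciPy:

```python
from scipy import stats

# Hypothetical oxygen consumption (the response variable) of crabs at
# three temperatures (the factor); values are illustrative only.
temp_10 = [1.2, 1.1, 1.4, 1.3, 1.2]
temp_20 = [1.8, 1.9, 1.7, 2.0, 1.8]
temp_30 = [2.6, 2.4, 2.7, 2.5, 2.8]

# Single-factor (one-way) ANOVA: are the three sample means consistent
# with all samples coming from the same population?
f_stat, p_value = stats.f_oneway(temp_10, temp_20, temp_30)
print(f"F = {f_stat:.1f}, P = {p_value:.6f}")
```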
Often, however, life scientists obtain univariate data for a response variable in relation to more than one factor. Examples of two-factor experiments are the oxygen consumption of an intertidal crab at several combinations of temperature and humidity, the growth of brain tumours exposed to a range of drugs and different levels of radiation therapy, or the insecticide resistance of an agricultural pest from different locations and different host plants.
This chapter gives a summary of the essential concepts of probability needed for the statistical tests covered in this book. More advanced material that will be useful if you go on to higher-level statistics courses is in Section 7.6.
Probability
The probability of any event can only vary between zero (0) and one (1) (which correspond to 0% and 100%). If an event is certain to occur, it has a probability of 1. If an event is certain not to occur, it has a probability of 0.
This chapter is an introduction to four slightly more complex analyses of variance often used by life scientists: two-factor ANOVA without replication, ANOVA for randomised block designs, repeated-measures ANOVA and nested ANOVA. An understanding of these is not essential if you are reading this book as an introduction to biostatistics. If, however, you need to use more complex models, the explanations given here are straightforward extensions of the pictorial descriptions in Chapters 11 and 13 and will help with many of the models used to analyse more complex designs.
Two-factor ANOVA without replication
This is a special case of the two-factor ANOVA described in Chapter 13. Sometimes an orthogonal experiment with two independent factors has to be done without replication because there is a shortage of experimental subjects or the treatments are very expensive to administer. The simplest case of ANOVA without replication is a two-factor design. You cannot do a single-factor ANOVA without replication.
Sometimes when testing whether two or more samples have come from the same population, you may suspect that the value of the response variable is also affected by a second factor. For example, you might need to compare the concentration of lead in the tissues of rats from a contaminated mine site and several uncontaminated sites. You could do this by trapping 20 rats at the mine and 20 from each of five randomly chosen uncontaminated sites. These data for the response variable ‘lead concentration’ and the factor ‘site’ could be analysed by a single-factor ANOVA (Chapter 11), but there is a potential complication. Heavy metals such as lead often accumulate in the bodies of animals, so the concentration in an individual’s tissues will be determined by (a) the amount of lead it has been exposed to at each site and (b) its age. Therefore, if you have a wide age range of rats in each sample, the variance in lead concentration within each sample is also likely to be large. This will give a relatively large standard error of the mean and a large error term in ANOVA (Chapter 11).
An example for a comparison between a mine site and a relatively uncontaminated control site is given in Figures 18.1(a) and (b). The lead concentration at a particular age is always higher in a rat from the mine site compared to one from the control, so there appears to be a difference between sites, but a t test or single-factor ANOVA will not show a significant difference because the data are confounded by the wide age range of rats trapped at each. The second confounding factor is called the covariate.
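The idea behind adjusting for a covariate can be sketched numerically. The following simulation (hypothetical lead and age values, not data from the text) fits one pooled regression of lead concentration on age and then compares the two sites on the age-adjusted residuals, which is essentially what an analysis of covariance does:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: lead concentration rises with age at both sites,
# but is consistently higher at the mine site (true offset = 3 units).
age_mine = rng.uniform(1, 10, 20)
age_ctrl = rng.uniform(1, 10, 20)
lead_mine = 5 + 2 * age_mine + rng.normal(0, 1, 20)
lead_ctrl = 2 + 2 * age_ctrl + rng.normal(0, 1, 20)

# Fit one pooled regression of lead on age (the covariate) ...
age = np.concatenate([age_mine, age_ctrl])
lead = np.concatenate([lead_mine, lead_ctrl])
slope, intercept = np.polyfit(age, lead, 1)

# ... then compare sites on the age-adjusted values (the residuals).
resid = lead - (slope * age + intercept)
adjusted_diff = resid[:20].mean() - resid[20:].mean()
print(f"age-adjusted mine minus control difference: {adjusted_diff:.2f}")
```

Once the variation due to age is removed, the difference between sites stands out clearly, even though the raw samples overlap because of the wide age range.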
To generate hypotheses you often sample different groups or places (which is sometimes called a mensurative experiment because you usually measure something, such as height or weight, on each sampling unit) and explore these data for patterns or associations. To test hypotheses, you may do mensurative experiments, or manipulative experiments where you change a condition and observe the effect of that change on each experimental unit (like the experiment with millipedes and light described in Chapter 2). Often you may do several experiments of both types to test a particular hypothesis. The quality of your sampling and the design of your experiment can affect the outcome and therefore determine whether or not your hypothesis is rejected, so an appropriate experimental design is essential.
First, you should attempt to make your measurements as accurate and precise as possible so they are the best estimates of actual values. Accuracy is the closeness of a measured value to the true value. Precision is the ‘spread’ or variability of repeated measures of the same value. For example, a thermometer that consistently gives a reading corresponding to a true temperature (e.g. 20°C) is both accurate and precise. Another that gives a reading consistently higher (e.g. +10°C) than a true temperature is not accurate, but it is very precise. In contrast, a thermometer that gives a reading that fluctuates around a true temperature is not precise and will usually be inaccurate except when it occasionally happens to correspond to the true temperature.
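The thermometer example can be made concrete (hypothetical readings): bias measures accuracy (closeness of the mean reading to the true value), while the spread of repeated readings measures precision.

```python
import numpy as np

true_temp = 20.0  # known true temperature in degrees C

# Repeated readings from three hypothetical thermometers:
accurate_precise = np.array([20.0, 20.1, 19.9, 20.0, 20.0])
biased_precise   = np.array([30.0, 30.1, 29.9, 30.0, 30.0])  # consistent +10 offset
imprecise        = np.array([16.0, 24.0, 18.0, 23.0, 19.0])  # fluctuates widely

for name, readings in [("accurate and precise", accurate_precise),
                       ("biased but precise", biased_precise),
                       ("imprecise", imprecise)]:
    bias = readings.mean() - true_temp   # accuracy: closeness to the true value
    spread = readings.std(ddof=1)        # precision: variability of repeats
    print(f"{name}: bias = {bias:+.1f}, spread = {spread:.2f}")
```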
By now you are likely to have a very clear idea about how science is done. Science is the process of rational enquiry, which seeks explanations for natural phenomena. Scientific method was discussed in a very prescriptive way in Chapter 2 as the proposal of an hypothesis from which predictions are made and tested by doing experiments. Depending on the results, which may have to be analysed statistically, the decision is made to either retain or reject the hypothesis. This process of knowledge by disproof advances our understanding of the natural world and seems extremely impartial and hard to fault.
Unfortunately, this is not necessarily the case, because science is done by human beings who sometimes do not behave responsibly or ethically. For example, some scientists fail to give credit to those who have helped propose a new hypothesis. Others make up, change, ignore or delete results so their hypothesis is not rejected, omit details to prevent the detection of poor experimental design or deal unfairly with the work of others. Most scientists are not taught about responsible behaviour and are supposed to learn a code of conduct by example, but this does not seem to be a good strategy considering the number of cases of scientific irresponsibility that have recently been exposed. This chapter is about the importance of behaving responsibly and ethically when doing science.
This chapter explains how some parametric tests for comparing the means of one and two samples actually work. The first test is for comparing a single sample mean to a known population mean. The second is for comparing a single sample mean to an hypothesised value. These are followed by a test for comparing two related samples and a test for two independent samples.
The 95% confidence interval and 95% confidence limits
Chapter 8 described how 95% of the means of samples of a particular size, n, taken from a population with a known mean, μ, and standard deviation, σ, would be expected to occur within the range μ ± 1.96 × SEM. This range is called the 95% confidence interval, and the actual numbers marking the limits of that range (μ ± 1.96 × SEM) are called the 95% confidence limits.
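As a worked example (hypothetical population values), the confidence limits follow directly from the formula:

```python
import numpy as np

mu, sigma, n = 50.0, 8.0, 16   # hypothetical population mean, SD and sample size

sem = sigma / np.sqrt(n)       # standard error of the mean = 8 / 4 = 2
lower = mu - 1.96 * sem        # lower 95% confidence limit
upper = mu + 1.96 * sem        # upper 95% confidence limit
print(f"SEM = {sem}, 95% confidence interval = ({lower:.2f}, {upper:.2f})")
```

Here 95% of sample means are expected to lie between 46.08 and 53.92.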
Every time you make a decision to retain or reject an hypothesis on the basis of the probability of a particular result, there is a risk that this decision is wrong. There are two sorts of mistakes you can make and these are called Type 1 error and Type 2 error.
Type 1 error
A Type 1 error or false positive occurs when you decide the null hypothesis is false when in reality it is not. Imagine you have taken a sample from a population with known statistics of μ and σ and exposed the sample to a particular experimental treatment. Because the population statistics are known, you could test whether the sample mean was significantly different to the population mean by doing a Z test (Section 9.3).
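A minimal sketch of such a Z test, with hypothetical population statistics and sample mean:

```python
import numpy as np
from scipy import stats

mu, sigma = 100.0, 15.0   # known population mean and standard deviation
n = 25                    # sample size
sample_mean = 108.0       # hypothetical mean of the treated sample

# Z statistic: how many standard errors the sample mean lies from mu.
z = (sample_mean - mu) / (sigma / np.sqrt(n))

# Two-tailed probability of a result at least this extreme under the
# null hypothesis; if it is below 0.05 the null hypothesis is rejected,
# and a Type 1 error occurs whenever that rejection is wrong.
p_value = 2 * stats.norm.sf(abs(z))
print(f"Z = {z:.2f}, P = {p_value:.4f}")
```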
Before starting on experimental design and statistics, it is important to be familiar with how science is done. This is a summary of a very conventional view of scientific method.
Basic scientific method
These are the essential features of the ‘hypothetico-deductive’ view of scientific method (see Popper, 1968).
If you mention ‘statistics’ or ‘biostatistics’ to life scientists, they often look nervous. Many fear or dislike mathematics, but an understanding of statistics and experimental design is essential for graduates, postgraduates and researchers in the biological, biochemical, health and human movement sciences.
Since this understanding is so important, life science students are usually made to take some compulsory undergraduate statistics courses. Nevertheless, I found that a lot of graduates (and postgraduates) were unsure about designing experiments and had difficulty knowing which statistical test to use (and which ones not to!) when analysing their results. Some even told me they had found statistics courses ‘boring, irrelevant and hard to understand’.
Parametric analysis of variance assumes the data are from normally distributed populations with the same variance and that there is independence, both within and among treatments. If these assumptions are not met, an ANOVA may give you an unrealistic F statistic and therefore an unrealistic probability that several sample means are from the same population. Therefore, it is important to know how robust ANOVA is to violations of these assumptions and what to do if they are not met: in some cases it may be possible to transform the data to make variances more homogeneous or to give distributions that are better approximations to the normal curve.
This chapter discusses the assumptions of ANOVA, followed by three frequently used transformations. Finally, there are descriptions of two tests for the homogeneity of variances.
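As an illustrative sketch (simulated right-skewed data, not an example from the text), Levene's test in SciPy can be used to check the homogeneity of variances before and after a logarithmic transformation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical right-skewed samples whose variance increases with the
# mean, a common violation of the ANOVA assumptions.
a = rng.lognormal(mean=1.0, sigma=0.6, size=30)
b = rng.lognormal(mean=2.0, sigma=0.6, size=30)

# Levene's test for homogeneity of variances on the raw data ...
stat_raw, p_raw = stats.levene(a, b)

# ... and again after a log transformation, which often stabilises
# the variances of this kind of data.
stat_log, p_log = stats.levene(np.log(a), np.log(b))
print(f"raw data: P = {p_raw:.4f}; log-transformed: P = {p_log:.4f}")
```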
Often life scientists collect samples of multivariate data – where more than two variables have been measured on each sample – because univariate or bivariate data are unlikely to give enough detail to realistically describe the object, location or ecosystem being investigated. For example, when comparing three polluted and three unpolluted lakes, it is best to collect data for as many species of plants and animals as possible, because the pollutant may affect some species but not others. Data for only a few species are unlikely to realistically estimate the effect upon an aquatic community of several hundred species.
If all six lakes were relatively similar in terms of the species present and their abundance, it would suggest the pollutant has had little or no effect. In contrast, if the three polluted lakes were relatively similar to each other but very different to the three unpolluted ones, it would suggest the pollutant has had an effect. It would be very useful to have a way of assessing similarity (or its converse, dissimilarity) among samples of multivariate data.
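One widely used measure of this kind is the Bray–Curtis dissimilarity, sketched here with hypothetical species abundances for three lakes (the text does not prescribe this particular measure):

```python
import numpy as np
from scipy.spatial.distance import braycurtis

# Hypothetical abundances of five species in one polluted and two
# unpolluted lakes (columns = species).
polluted_1   = np.array([0, 2, 35, 1, 0])
unpolluted_1 = np.array([12, 30, 4, 18, 9])
unpolluted_2 = np.array([10, 28, 6, 15, 11])

# Bray-Curtis dissimilarity ranges from 0 (identical communities)
# to 1 (no species shared).
d_within  = braycurtis(unpolluted_1, unpolluted_2)
d_between = braycurtis(polluted_1, unpolluted_1)
print(f"unpolluted vs unpolluted: {d_within:.2f}")
print(f"polluted vs unpolluted:   {d_between:.2f}")
```

The two unpolluted lakes are much more similar to each other than either is to the polluted lake, which is the pattern that would suggest an effect of the pollutant.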
Why do life scientists need to know about experimental design and statistics?
If you work on living things, it is usually impossible to get data from every individual of the group or species in question. Imagine trying to measure the length of every anchovy in the Pacific Ocean, the haemoglobin count of every adult in the USA, the diameter of every pine tree in a plantation of 200 000 or the individual protein content of 10 000 prawns in a large aquaculture pond.
The total number of individuals of a particular species present in a defined area is often called the population. But because a researcher usually cannot measure every individual in the population (unless they are studying the few remaining members of an endangered species), they have to work with a very carefully selected subset containing several individuals (often called sampling units or experimental units) that they hope is a representative sample from which they can infer the characteristics of the population. You can also think of a population as the total number of artificial sampling units possible (e.g. the total number of 1 m² plots that would cover a whole coral reef) and your sample being the subset (e.g. 20 plots) you have to work upon.