Cambridge University Press
052183550X - Statistics Explained - An Introductory Guide for Life Scientists - by Steve McKillup
Frontmatter/Prelims
Statistics Explained is a reader-friendly introduction to experimental design and statistics for undergraduate students in the life sciences, particularly those who do not have a strong mathematical background. Hypothesis testing and experimental design are discussed first. Statistical tests are then explained using pictorial examples and a minimum of formulae. This class-tested approach, along with a well-structured set of diagnostic tables, will give students the confidence to choose an appropriate test with which to analyse their own data sets. Presented in a lively and straightforward manner, Statistics Explained will give readers the depth and background necessary to proceed to more advanced texts and applications. It will therefore be essential reading for all bioscience undergraduates, and will serve as a useful refresher course for more advanced students.
Steve McKillup is an Associate Professor of Biology in the School of Biological and Environmental Sciences at Central Queensland University, Rockhampton.
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
The Edinburgh Building, Cambridge CB2 2RU, UK
www.cambridge.org
Information on this title: www.cambridge.org/9780521835503
© S. McKillup 2005
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.
First published 2005
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
ISBN-13 978-0-521-83550-3 hardback
ISBN-10 0-521-83550-X hardback
ISBN-13 978-0-521-54316-3 paperback
ISBN-10 0-521-54316-9 paperback
Cambridge University Press has no responsibility for
the persistence or accuracy of URLs for external or
third-party internet websites referred to in this publication,
and does not guarantee that any content on such
websites is, or will remain, accurate or appropriate.
Preface
1 | Introduction
1.1 | Why do life scientists need to know about experimental design and statistics?
1.2 | What is this book designed to do?
2 | ‘Doing science’ – hypotheses, experiments, and disproof
2.1 | Introduction
2.2 | Basic scientific method
2.3 | Making a decision about an hypothesis
2.4 | Why can’t an hypothesis or theory ever be proven?
2.5 | ‘Negative’ outcomes
2.6 | Null and alternate hypotheses
2.7 | Conclusion
3 | Collecting and displaying data
3.1 | Introduction
3.2 | Variables, experimental units, and types of data
3.3 | Displaying data
3.4 | Displaying ordinal or nominal scale data
3.5 | Bivariate data
3.6 | Multivariate data
3.7 | Summary and conclusion
4 | Introductory concepts of experimental design
4.1 | Introduction
4.2 | Sampling – mensurative experiments
4.3 | Manipulative experiments
4.4 | Sometimes you can only do an unreplicated experiment
4.5 | Realism
4.6 | A bit of common sense
4.7 | Designing a ‘good’ experiment
4.8 | Conclusion
5 | Probability helps you make a decision about your results
5.1 | Introduction
5.2 | Statistical tests and significance levels
5.3 | What has this got to do with making a decision or statistical testing?
5.4 | Making the wrong decision
5.5 | Other probability levels
5.6 | How are probability values reported?
5.7 | All statistical tests do the same basic thing
5.8 | A very simple example – the chi-square test for goodness of fit
5.9 | What if you get a statistic with a probability of exactly 0.05?
5.10 | Statistical significance and biological significance
5.11 | Summary and conclusion
6 | Working from samples – data, populations, and statistics
6.1 | Using a sample to infer the characteristics of a population
6.2 | Statistical tests
6.3 | The normal distribution
6.4 | Samples and populations
6.5 | Your sample mean may not be an accurate estimate of the population mean
6.6 | What do you do when you only have data from one sample?
6.7 | Why are the statistics that describe the normal distribution so important?
6.8 | Distributions that are not normal
6.9 | Other distributions
6.10 | Other statistics that describe a distribution
6.11 | Conclusion
7 | Normal distributions – tests for comparing the means of one and two samples
7.1 | Introduction
7.2 | The 95% confidence interval and 95% confidence limits
7.3 | Using the Z statistic to compare a sample mean and population mean when population statistics are known
7.4 | Comparing a sample mean with an expected value
7.5 | Comparing the means of two related samples
7.6 | Comparing the means of two independent samples
7.7 | Are your data appropriate for a t test?
7.8 | Distinguishing between data that should be analysed by a paired sample test or a test for two independent samples
7.9 | Conclusion
8 | Type 1 and Type 2 errors, power, and sample size
8.1 | Introduction
8.2 | Type 1 error
8.3 | Type 2 error
8.4 | The power of a test
8.5 | What sample size do you need to ensure the risk of Type 2 error is not too high?
8.6 | Type 1 error, Type 2 error, and the concept of biological risk
8.7 | Conclusion
9 | Single factor analysis of variance
9.1 | Introduction
9.2 | Single factor analysis of variance
9.3 | An arithmetic/pictorial example
9.4 | Unequal sample sizes (unbalanced designs)
9.5 | An ANOVA does not tell you which particular treatments appear to be from different populations
9.6 | Fixed or random effects
10 | Multiple comparisons after ANOVA
10.1 | Introduction
10.2 | Multiple comparison tests after a Model I ANOVA
10.3 | An a-posteriori Tukey comparison following a significant result for a single factor Model I ANOVA
10.4 | Other a-posteriori multiple comparison tests
10.5 | Planned comparisons
11 | Two factor analysis of variance
11.1 | Introduction
11.2 | What does a two factor ANOVA do?
11.3 | How does a two factor ANOVA analyse these data?
11.4 | How does a two factor ANOVA separate out the effects of each factor and interaction?
11.5 | An example of a two factor analysis of variance
11.6 | Some essential cautions and important complications
11.7 | Unbalanced designs
11.8 | More complex designs
12 | Important assumptions of analysis of variance: transformations and a test for equality of variances
12.1 | Introduction
12.2 | Homogeneity of variances
12.3 | Normally distributed data
12.4 | Independence
12.5 | Transformations
12.6 | Are transformations legitimate?
12.7 | Tests for heteroscedasticity
13 | Two factor analysis of variance without replication, and nested analysis of variance
13.1 | Introduction
13.2 | Two factor ANOVA without replication
13.3 | A-posteriori comparison of means after a two factor ANOVA without replication
13.4 | Randomised blocks
13.5 | Nested ANOVA as a special case of a one factor ANOVA
13.6 | A pictorial explanation of a nested ANOVA
13.7 | A final comment on ANOVA – this book is only an introduction
14 | Relationships between variables: linear correlation and linear regression
14.1 | Introduction
14.2 | Correlation contrasted with regression
14.3 | Linear correlation
14.4 | Calculation of the Pearson r statistic
14.5 | Is the value of r statistically significant?
14.6 | Assumptions of linear correlation
14.7 | Summary and conclusion
15 | Simple linear regression
15.1 | Introduction
15.2 | Linear regression
15.3 | Calculation of the slope of the regression line
15.4 | Calculation of the intercept with the Y axis
15.5 | Testing the significance of the slope and the intercept of the regression line
15.6 | An example – mites that live in your hair follicles
15.7 | Predicting a value of Y from a value of X
15.8 | Predicting a value of X from a value of Y
15.9 | The danger of extrapolating beyond the range of data available
15.10 | Assumptions of linear regression analysis
15.11 | Further topics in regression
16 | Non-parametric statistics
16.1 | Introduction
16.2 | The danger of assuming normality when a population is grossly non-normal
16.3 | The value of making a preliminary inspection of the data
17 | Non-parametric tests for nominal scale data
17.1 | Introduction
17.2 | Comparing observed and expected frequencies – the chi-square test for goodness of fit
17.3 | Comparing proportions among two or more independent samples
17.4 | Bias when there is one degree of freedom
17.5 | Three-dimensional contingency tables
17.6 | Inappropriate use of tests for goodness of fit and heterogeneity
17.7 | Recommended tests for categorical data
17.8 | Comparing proportions among two or more related samples of nominal scale data
18 | Non-parametric tests for ratio, interval, or ordinal scale data
18.1 | Introduction
18.2 | A non-parametric comparison between one sample and an expected distribution
18.3 | Non-parametric comparisons between two independent samples
18.4 | Non-parametric comparisons among more than two independent samples
18.5 | Non-parametric comparisons of two related samples
18.6 | Non-parametric comparisons among three or more related samples
18.7 | Analysing ratio, interval, or ordinal data that show gross differences in variance among treatments and cannot be satisfactorily transformed
18.8 | Non-parametric correlation analysis
18.9 | Other non-parametric tests
19 | Choosing a test
19.1 | Introduction
20 | Doing science responsibly and ethically
20.1 | Introduction
20.2 | Dealing fairly with other people’s work
20.3 | Doing the experiment
20.4 | Evaluating and reporting results
20.5 | Quality control in science
References
Index