
The Misuse of Normality Tests as Gatekeepers for Research in Prehospital and Disaster Medicine

Published online by Cambridge University Press:  05 November 2025

Jeffrey Michael Franc*
Affiliation:
Associate Professor, Department of Emergency Medicine, University of Alberta; Visiting Professor in Disaster Medicine, Università del Piemonte Orientale; Adjunct Faculty, Harvard/BIDMC Disaster Medicine Fellowship; Editor-in-Chief, Prehospital and Disaster Medicine
*Correspondence: Jeffrey Michael Franc, Department of Emergency Medicine, 736c University Terrace, 8203-112 Street NW, Edmonton, AB, Canada T6G 2T4. E-mail: jeffrey.franc@ualberta.ca

Abstract

This editorial discusses the common practice of using normality tests as a preliminary step for choosing between parametric and non-parametric methods. The editorial argues that such pre-testing is theoretically unfounded and practically harmful, as parametric tests are robust to moderate deviations from normality, while reliance on normality tests can distort error rates and mislead researchers.

Information

Type
Editorial
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of World Association for Disaster and Emergency Medicine

Introduction

In prehospital and disaster medicine research, the choice between parametric and non-parametric statistical methods can be a confusing decision for authors. The question arises especially often in disaster medicine, where innovative and exploratory studies frequently involve small sample sizes.

It is common practice for researchers to employ tests of normality on their data as a preliminary step, or gatekeeper, in determining the appropriateness of parametric techniques. Frequently used tests of normality include the Kolmogorov-Smirnov test and the Shapiro-Wilk test.[1] Authors often apply one of these tests to their data and, depending on the result, decide to analyze the data using non-parametric methods (if the test refutes normality) or parametric methods (if it fails to refute normality).

Common statistical textbooks are not always the most practical guides for navigating this issue, as their explanations are often grounded in abstract theory that can be difficult for many applied researchers to translate into practice. This editorial seeks to describe why such a practice is both theoretically flawed and practically unsound and to demonstrate why the reliance on normality tests as gatekeepers should be discouraged.

Focus on the Sampling Distribution

At the core of parametric testing is the assumption that the sampling distribution of the test statistic, rather than the raw sample data, is normally distributed.[2] Normality must hold for the population under consideration (or, at minimum, for the sampling distribution of the statistic), not for the sample of data being analyzed.[3] This theoretical foundation clarifies that testing the normality of observed data is not a direct verification of the parametric assumptions. The emphasis lies on the distribution of the mean over repeated sampling, a concept that renders preliminary normality tests on raw data conceptually misplaced.

Normality Tests Do Not Confirm Normality

It is a theoretical fallacy to interpret a non-significant normality test as evidence that the data are normally distributed. Such tests can only indicate a lack of evidence to reject the null hypothesis of normality, not confirm it. A non-significant P value in a normality test means “we don’t have enough evidence to reject normality,” not “the data are normal.” Suppose a researcher has n = 12 measurements of a biomarker used to detect susceptibility to excess bleeding in blunt force trauma. Imagine that the true underlying population is a heavily right-skewed exponential distribution. The Shapiro-Wilk test is applied:

  • Test result: W = 0.95; P = .18.

  • Interpretation: P > .05 → “fail to reject” normality.

But here’s the catch: with such a small sample, the test doesn’t have the statistical power to pick up the strong skewness of the underlying distribution. If we simulated thousands of values from the same process, the skew would be obvious. The “non-significant” result is simply telling us this small dataset doesn’t provide enough evidence to say it isn’t normal, not that it truly is normal. This theoretical limitation underscores the unreliability of using normality tests as a basis for methodological decisions.
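This low-power scenario is easy to demonstrate by simulation. The sketch below is a hypothetical illustration (assuming NumPy and SciPy are available, with arbitrary seed and simulation counts): it repeatedly draws samples of n = 12 from an exponential population and records how often the Shapiro-Wilk test fails to reject normality.

```python
# Hypothetical simulation: how often does the Shapiro-Wilk test fail to
# reject normality when n = 12 observations are drawn from a heavily
# right-skewed (exponential) population?
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, n_sims, alpha = 12, 2000, 0.05

fail_to_reject = 0
for _ in range(n_sims):
    sample = rng.exponential(scale=1.0, size=n)  # truly non-normal data
    _, p = stats.shapiro(sample)
    if p > alpha:  # "non-significant" -> normality not refuted
        fail_to_reject += 1

# A substantial fraction of runs cannot detect the strong skew.
print(f"Failed to reject normality in {fail_to_reject / n_sims:.0%} of samples")
```

Despite every sample coming from a strongly skewed population, a substantial fraction of runs returns P > .05, exactly the situation described above.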

Robustness of Parametric Tests

From both a theoretical and a practical perspective, parametric tests are known to be robust to moderate deviations from normality. This is particularly true with larger sample sizes: although these tests remain valid in very small samples when the outcome variable is normally distributed, their principal strength lies in the fact that in any moderately large sample they retain validity regardless of the underlying distribution.[4] For instance, in linear regression with more than 10 observations per variable, violations of the normality assumption often do not noticeably affect results.[5] This robustness implies that a strict preliminary normality test is not required before applying parametric methods, as these tests tolerate non-normality to a reasonable extent.[6]
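A quick simulation illustrates this robustness. The hypothetical sketch below (assuming NumPy and SciPy, with arbitrary seed and sample sizes) draws two groups of 50 observations from the same heavily skewed exponential population, so any significant t-test result is a false positive; the empirical Type I error rate nonetheless stays close to the nominal 5%.

```python
# Hypothetical simulation: Type I error of the two-sample t-test when both
# groups come from the same skewed (exponential) population, n = 50 each.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_sims, alpha = 50, 5000, 0.05

rejections = 0
for _ in range(n_sims):
    a = rng.exponential(scale=1.0, size=n)
    b = rng.exponential(scale=1.0, size=n)  # same population: no true effect
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1  # any rejection here is a false positive

print(f"Empirical Type I error: {rejections / n_sims:.3f} (nominal {alpha})")
```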

Sample Size Sensitivity

In practical terms, normality tests exhibit paradoxical behavior depending on sample size: they often fail to detect real deviations in small samples and over-emphasize trivial deviations in large samples.[7] In small samples, where non-parametric testing is most useful, normality tests suffer from low statistical power and frequently fail to identify substantial departures from normality. In large samples, by contrast, normality tests become excessively sensitive, almost inevitably detecting statistically significant deviations, since no real-world dataset is perfectly normal. This may prompt researchers to choose a less powerful non-parametric test unnecessarily, diminishing the ability to detect true effects. As a result, researchers may be misled into using non-parametric methods when parametric ones would suffice, or vice versa, based solely on sample-size artifacts.
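The paradox can be shown directly. In the hypothetical sketch below (assuming NumPy and SciPy, with an arbitrary mildly skewed log-normal population chosen for illustration), the same modest deviation from normality is usually missed by the Shapiro-Wilk test at n = 15 but almost always flagged at n = 2,000.

```python
# Hypothetical simulation: Shapiro-Wilk rejection rates for the SAME mild
# deviation from normality (slightly skewed log-normal) at two sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 1000, 0.05

def rejection_rate(n):
    """Fraction of mildly skewed samples of size n flagged as non-normal."""
    rejected = 0
    for _ in range(n_sims):
        sample = np.exp(0.2 * rng.normal(size=n))  # mild log-normal skew
        if stats.shapiro(sample)[1] < alpha:
            rejected += 1
    return rejected / n_sims

small, large = rejection_rate(15), rejection_rate(2000)
print(f"n = 15:   rejects normality in {small:.0%} of samples")
print(f"n = 2000: rejects normality in {large:.0%} of samples")
```

The deviation is identical in both cases; only the sample size, and hence the verdict of the gatekeeper test, changes.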

Distortion of Error Rates

Finally, the most practical concern is that conditioning the choice of analysis on a normality test can distort Type I error rates. From a theoretical standpoint, the practice of double-testing inflates Type I error rates because the preliminary normality test and the main hypothesis test are not independent—they are both conducted on the same data, and the choice of test is conditional on the outcome of the first. As a result, the nominal significance level no longer reflects the true probability of a false positive, and the actual Type I error rate for the overall procedure exceeds the pre-specified threshold. By making methodological decisions contingent on a preliminary test, researchers risk inflating their overall error rate and undermining the integrity of their statistical inference.
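The two-stage procedure itself can be simulated. In the hypothetical sketch below (assuming NumPy and SciPy, with the Mann-Whitney U test standing in as the non-parametric alternative), both groups are drawn from the same skewed population, so every significant main-test result is a false positive. The exact size and direction of the distortion varies with the distribution and sample size; the essential point is that the pre-test and the main test are computed on the same data and are not independent.

```python
# Hypothetical simulation of the two-stage "pre-test then choose" procedure.
# Both groups come from the same skewed population, so every significant
# main-test result is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, n_sims, alpha = 10, 5000, 0.05

false_positives = 0
for _ in range(n_sims):
    a = rng.exponential(scale=1.0, size=n)
    b = rng.exponential(scale=1.0, size=n)
    # Stage 1: normality pre-test on each group.
    looks_normal = (stats.shapiro(a)[1] > alpha) and (stats.shapiro(b)[1] > alpha)
    # Stage 2: the main test is chosen CONDITIONALLY on stage 1 --
    # both stages use the same data, so they are not independent.
    if looks_normal:
        p = stats.ttest_ind(a, b)[1]
    else:
        p = stats.mannwhitneyu(a, b, alternative='two-sided')[1]
    if p < alpha:
        false_positives += 1

print(f"Type I error of the two-stage procedure: "
      f"{false_positives / n_sims:.3f} (nominal {alpha})")
```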

Rational Use of Normality Testing

Although this editorial argues strongly against the routine use of normality testing as a gatekeeper for choosing between parametric and non-parametric methods, there are cases where tests for normality are appropriate. Specifically, when the primary hypothesis of the study concerns the distribution of the data itself—for example, to determine whether a biological measurement follows a normal distribution or to compare the empirical distribution of a sample against a theoretical one—formal tests of normality are both relevant and justified. In such circumstances, normality tests serve as the object of inquiry rather than as a preliminary diagnostic tool, and their use can meaningfully advance scientific understanding.

Finally, a point about style in academic writing: most journals, including Prehospital and Disaster Medicine (PDM), require that every statistical test described in the Methods section be accompanied by corresponding results in the Results section. Therefore, if authors choose to use normality testing, they must ensure that the outcomes of these tests are explicitly and accurately presented in the Results section, rather than implied or omitted.

Conclusion

The reliance on normality tests as a preliminary step for determining whether to employ parametric or non-parametric methods is both theoretically unfounded and practically problematic. Researchers are encouraged to trust in the robustness of parametric tests and to make methodological choices based on a comprehensive analytical plan rather than on the outcome of a normality test. By doing so, the integrity of statistical inference is better preserved, and the choice of methods remains aligned with sound statistical principles.

Conflicts of interest

JMF is the CEO and founder of STAT59 and the Editor-in-Chief of Prehospital and Disaster Medicine.

Author Contribution

JMF is responsible for all content.

Use of AI Technology: ChatGPT Version 5 (OpenAI; San Francisco, California USA) was used for assistance with writing; however, the author remains responsible for all content.

References

1. Ghasemi A, Zahediasl S. Normality tests for statistical analysis: a guide for non-statisticians. Int J Endocrinol Metab. 2012;10(2):486-489.
2. Tsagris M, Pandis N. Normality test: is it really necessary? Am J Orthod Dentofacial Orthop. 2021;159(4):548-549.
3. Rochon J, Gondan M, Kieser M. To test or not to test: preliminary assessment of normality when comparing two independent samples. BMC Med Res Methodol. 2012;12(1):81.
4. Lumley T, Diehr P, Emerson S, Chen L. The importance of the normality assumption in large public health data sets. Annu Rev Public Health. 2002;23:151-169.
5. Schmidt AF, Finan C. Linear regression and the normality assumption. J Clin Epidemiol. 2018;98:146-151.
6. Knief U, Forstmeier W. Violating the normality assumption may be the lesser of two evils. Behav Res Methods. 2021;53(6):2576-2590.
7. Fagerland MW. t-tests, non-parametric tests, and large studies: a paradox of statistical practice? BMC Med Res Methodol. 2012;12:78.