Hostname: page-component-cb9f654ff-mx8w7 Total loading time: 0 Render date: 2025-08-15T20:43:48.721Z Has data issue: false hasContentIssue false

When a mean can be meaningless: evaluating mosquito infections with Plasmodium parasites

Published online by Cambridge University Press:  14 July 2025

Prince Chigozirim Ubiaru
Affiliation:
School of Biodiversity, One Health and Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, Scotland, UK
Lisa C. Ranford-Cartwright*
Affiliation:
School of Biodiversity, One Health and Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, Scotland, UK
*
Corresponding author: Lisa C. Ranford-Cartwright; Email: lisa.ranford-cartwright@glasgow.ac.uk

Abstract

Several malaria control measures aim to reduce infection levels in mosquitoes, and evaluation of these measures usually relies on experimental infections of mosquitoes or evaluation in field populations. Both require robust statistical tools to account for multiple variables and non-normal distributions of parasites in the vector host. We argue that a well-chosen generalized linear or mixed model is the most appropriate statistical tool for analysing and interpreting these biological data. We suggest specific methods to overcome datasets where some groups have zero/close to zero prevalence, or many zero counts of parasite numbers (as would be seen with an effective transmission blocking intervention). These methods are more broadly applicable across many parasitic infections with similar patterns of parasite numbers across hosts.

Information

Type
Review Article
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.

Introduction

Malaria is one of the most severe and life-threatening tropical diseases, affecting nearly half of the world’s population; in 2023 there were 263 million cases globally, and 597,000 deaths (World Health Organisation, 2024). People are infected through the bite of an infected female Anopheles mosquito. The disease can be treated by chemotherapy, usually with an Artemisinin-based combination therapy for the most pathogenic species Plasmodium falciparum, and for relapsing species like Plasmodium vivax, also with primaquine (World Health Organisation, 2024). Public health control strategies include intermittent preventive therapy and vector control, employed to reduce the disease burden significantly in malaria-endemic areas. However, these control strategies are challenged by parasite drug resistance (reviewed in Ippolito et al., Reference Ippolito, Moser, Kabuya, Cunningham and Juliano2021) and mosquito insecticide resistance (reviewed in Suh et al., Reference Suh, Elanga-Ndille, Tchouakui, Sandeu, Tagne, Wondji and Ndo2023).

Currently, many malaria control measures that are being developed aim to reduce the proportion of mosquitoes in a population that are capable of transmitting malaria. Examples include (i) interventions that aim to increase the refractoriness of mosquitoes to malaria parasites by genetic modification, paratransgenesis or alterations in mosquito microbiota (reviewed in Kefi et al., Reference Kefi, Cardoso-Jaime, Saab and Dimopoulos2024), like the use of a gene drive system within the mosquito vector to reduce and/or prevent the parasite’s survival and transmission; (ii) transmission-blocking drugs and vaccines (reviewed in Yu et al., Reference Yu, Wang, Luo, Zheng, Wang, Yang and Wang2022); and (iii) interventions that aim to reduce the age distributions of vector mosquitoes below the extrinsic incubation period (EIP). The EIP is the time it takes for Plasmodium parasites to develop, replicate and invade a mosquito’s salivary glands after she takes an infected blood meal. For example, the EIP of P. falciparum is between 12 and 14 days (Vaughan et al., Reference Vaughan, Noden and Beier1992; Ohm et al., Reference Ohm, Baldini, Barreaux, Lefevre, Lynch, Suh, Whitehead and Thomas2018), so for P. falciparum transmission to occur in humans, mosquito vectors must live beyond the EIP (Shaw et al., Reference Shaw, Holmdahl, Itoe, Werling, Marquette, Paton, Singh, Buckee, Childs and Catteruccia2020).

To evaluate the success of these control measures, it is necessary to compare the level of mosquito infection in a population before and after the intervention, or in experimental work, between control and treated mosquitoes given infectious blood meals. These studies require proper statistical models or tools that incorporate all the different measured factors to estimate the results, which are mainly presented as infection prevalence and oocyst intensity. Plasmodium-mosquito infection data, in common with other biological data with multiple measurements or experimental variables, are inherently ‘noisy’ natural systems (reviewed in Paterson and Lello, Reference Paterson and Lello2003). These types of data are usually complex, and require well-chosen statistical tools to fit and interpret them. Analysing these data types requires knowledge of the correct statistical tool to use, which is not usually simple. This often results in choosing a more straightforward statistical method to analyse the data, which may not be wholly appropriate.

Approaches for measuring mosquito infection

Experimentally, the impact of a control measure such as a transmission-blocking drug or antibody on mosquito infection uses 1 of 3 different methods: a direct membrane feeding assay (DMFA), a direct skin feed (DSF), or a standard membrane feeding assay (SMFA). In DMFA, blood containing transmission stages (mature gametocytes) from a treated human host is fed to female Anopheles mosquitoes using a water-jacketed membrane feeder (Ponnudurai et al., Reference Ponnudurai, Van Gemert, Bensink, Lensen and Meuwissen1987), whereas in DSF, female Anopheles mosquitoes are directly placed on the skin of a gametocyte carrier to feed (Bousema et al., Reference Bousema, Dinglasan, Morlais, Gouagna, Van Warmerdam, Awono-Ambene, Bonnet, Diallo, Coulibaly, Tchuinkam, Mulder, Targett, Drakeley, Sutherland, Robert, Doumbo, Touré, Graves, Roeffen, Sauerwein, Birkett, Locke, Morin, Wu and Churcher2012). In SMFA, the gametocytes are grown in vitro and fed to Anopheles mosquitoes using water-jacketed membrane feeders. DMFA and DSF are usually used in the field to check for the infectiousness of natural gametocyte carriers treated with, for example, potential transmission-blocking drugs. In contrast, SMFAs are used mainly for two purposes: first, to check for the effect of immune factors or transmission-blocking compounds in preventing the transmission of in vitro cultured mature gametocytes; second, to study the ability of different clones or isolates of P. falciparum to transmit to different Anopheles mosquitoes.

The resulting infection for SMFA (for the effect of immune factors/compounds), DMFA and DSF is characterized as transmission-reducing activity (TRA) and transmission-blocking activity (TBA) (Medley et al., Reference Medley, Sinden, Fleck, Billingsley, Tirawanchai and Rodriguez1993; Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012). TRA is the percentage inhibition in parasite oocyst intensity of infected mosquitoes, whereas TBA is the percentage inhibition in oocyst prevalence (Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012; Miura et al., Reference Miura, Swihart, Deng, Zhou, Pham, Diouf, Burton, Fay and Long2016). However, there has been debate about using TRA or TBA as the best readout to report for SMFA (Wu et al., Reference Wu, Sinden, Churcher, Tsuboi and Yusibov2015; Miura et al., Reference Miura, Swihart, Deng, Zhou, Pham, Diouf, Burton, Fay and Long2016; Swihart et al., Reference Swihart, Fay and Miura2018).

The result of mosquito infections from membrane feeding assays (MFAs) for any SMFA studies are commonly presented as infection prevalence and oocyst infection intensity; the former is the percentage (%) of mosquitoes infected, usually presented as a mean percentage of replicates, and the latter is the number of oocysts present on mosquito midguts (commonly known as parasite abundance), usually presented as a mean, a geometric mean, or median oocyst number, and occasionally presented for infected mosquitoes only (more properly parasite intensity). The standard membrane-feeding assay additionally includes variables such as the different batches of human serum used for the parasite culture and blood used for setting up the gametocyte cultures. All 3 assays include variables such as the size of each mosquito, the gametocyte number in the blood, and replicate effects, which could all influence the distribution of infection (Ponnudurai et al., Reference Ponnudurai, Lensen, Vangemert, Bensink, Bolmer and Meuwissen1989; Lyimo and Koella, Reference Lyimo and Koella1992).

Common methods for analysing P. falciparum mosquito infection data

A survey of papers from PubMed, filtered from the years 2009–2025 and presenting data on mosquito P. falciparum infection, showed that, out of 153 published papers reviewed, 78 (∼51%) either used one or a combination of the following tests to analyse their oocyst infection prevalence and intensity data: Chi-squared test, Student’s t-test/analysis of variance (ANOVA), Mann–Whitney test, Wilcoxon-signed rank, Kruskal–Wallis/Dunn test and Kolmogorov–Smirnov test (Table 1).

Table 1. Summary of statistical tests used for analysing P. falciparum mosquito infection between 2009 and 2025 in 153 published papers

Publications obtained from PubMed using the search terms ‘membrane feeding’ and ‘falciparum.’ TBA, transmission blocking activity; TRA, transmission-reducing activity.

Although improved mathematical methods/models for analysing mosquito infection data from an MFA have been published (Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012; Miura et al., Reference Miura, Deng, Tullo, Diouf, Moretz, Locke, Morin, Fay and Long2013; Swihart et al., Reference Swihart, Fay and Miura2018), many researchers analysed mosquito oocyst infection prevalence and intensity with statistical tests that used mean and median estimates only to analyse mosquito infection data. The use of tests based solely on mean or median values are often inadequate to reveal differences in mosquito infection levels, which could lead to poor interpretation of the overall result. The advantages and limitations of these statistical tests are summarized in Table 2.

Table 2. Advantages and limitations of common metrics used for analysing P. falciparum mosquito infection prevalence and oocyst intensity

Challenges with these common methods

Infection prevalence

Simple mean prevalence estimates are affected by a small number of very high or very low (zero) values in treatment groups, and errors are higher if replicates show high variability (usually due to uncontrolled experimental variables). Prevalence values are commonly close to zero, especially in field-collected mosquitoes. In addition, the noted variation in prevalence in replicates due to uncontrolled (but measurable) experimental variables, such as gametocyte density or mosquito size, is not incorporated into simple mean prevalence estimates.

Infection intensity

Typically, within an experiment, there are usually a few mosquitoes with very high oocyst numbers and some with zero or very low numbers, even within relatively homogeneous lab-reared mosquitoes fed on the same infectious blood meal. Oocyst numbers within mosquitoes typically exhibit overdispersion (Sinden et al., Reference Sinden, Dawes, Alavi, Waldock, Finney, Mendoza, Butcher, Andrews, Hill, Gilbert and Basáñez2007; Vaughan, Reference Vaughan2007; Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012) and are not normally distributed, i.e., not a symmetrically distributed dataset with a bell shape, and many values do not cluster in the central portion when plotted on the graph (reviewed in Bhandari, Reference Bhandari2023). Using mean values to estimate oocyst numbers will be highly influenced by extreme values and skewed distributions, and will usually overestimate the true value. Median values are often a better summary statistic, but for oocyst distributions, the median value will be zero if the prevalence is less than 50% (which it frequently is). Oocyst intensity data often more closely approximate a negative binomial (NB), often with zero-inflation (Miura et al., Reference Miura, Swihart, Deng, Zhou, Pham, Diouf, Fay and Long2019; Akorli et al., Reference Akorli, Ubiaru, Pradhan, Akorli and Ranford-Cartwright2022). This type of data cannot usually be transformed by standard methods e.g. logs, making a mean oocyst number an inappropriate measure.

How then should mosquito infection data with these types of distribution and with different experimental variables or treatments be analysed?

Use of generalized linear models

A generalized linear model (GLM) is a statistical method useful for analysing non-normal datasets. It models or analyses datasets with only fixed effects, which are explanatory variables (predictors or inputs/independent variables), such as different drug treatments or number of gametocytes in bloodmeal, with a constant effect across different individuals or groups. A GLM mainly consists of a linear predictor, a link function and an error distribution. A linear predictor is the part of the model where all the explanatory variables are combined to give or predict the result. A link function connects a linear predictor to what the model wants to predict, for example a log-link function is common for count data. The error distribution, also known as family, defines the distribution on which the model is based, for example Poisson or NB (McCullagh and Nelder, Reference McCullagh and Nelder1989). The model uses a linear formula to combine explanatory variables and predict outcomes using link functions and exponential family distributions, such as normal, Poisson or binomial distributions (Bolker et al., Reference Bolker, Brooks, Clark, Geange, Poulsen, Stevens and White2009). The type of family distribution to be chosen when using GLM depends on the nature of the dataset (e.g., binary, continuous or count data). This makes GLM flexible and valuable for different types of datasets.

Advantages of GLMs

A GLM analysis eliminates the challenge of data transformation with non-normal data and accommodates different types of datasets, as it allows response variables to have different distributions. A GLM approach is very flexible in analysing the relationship between variables using a link function that handles non-linear relationships, and predicts results that are simple to interpret.

A further extension of the GLM is the Generalised Linear Mixed model (GLMM), which can model datasets with random effects. Random effects are unmeasurable variables that differ among repeated measurements, such as differences between biological replicates due to unknown factors. GLMMs are better statistical tools than GLMs for analysing non-normal data involving random effects, or when random effects are the focus of the analyses. Defining model predictors in GLMM either as fixed or random effects is essential in modelling; the important criteria to consider before using GLMM and fitting random variables have been discussed previously (Harrison et al., Reference Harrison, Donaldson, Correa-Cano, Evans, Fisher, Goodwin, Robinson, Hodgson and Inger2018). The conditions of when to include a random effect in a model, and the challenges of using GLMM for non-experts, have also been discussed extensively (Bolker et al., Reference Bolker, Brooks, Clark, Geange, Poulsen, Stevens and White2009). One condition to fulfil before using random effects in a model is that at least 5 ‘levels or group’ (e.g., number of biological replicates or number of mosquito cages) must be achieved in the dataset to be able to estimate variance (Gelman and Hill, Reference Gelman and Hill2006; Kéry and Royle, Reference Kéry and Royle2015; Harrison et al., Reference Harrison, Donaldson, Correa-Cano, Evans, Fisher, Goodwin, Robinson, Hodgson and Inger2018; Gomes, Reference Gomes2022). As a result of this pitfall, non-experts could use GLMM inappropriately (Bolker et al., Reference Bolker, Brooks, Clark, Geange, Poulsen, Stevens and White2009), for example when there are fewer than five replicates of each experimental treatment.

A GLMM can be the best model for analysing mosquito infection data involving different parasite drug treatments, many biological replicates, multiple feeders/mosquito cages and mosquito populations, to account for random effects/or variability (Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012). However, for most SMFA experiments, the number of biological replicates rarely exceeds 5, which could rule out using GLMM to analyse such a dataset where biological replicate is a potential random effect.

The following examples illustrate the use of a GLM approach to investigate the impact of transmission blocking drugs on the prevalence and infection of P. falciparum in mosquitoes in an SMFA.

Example 1: GLM analysis of infection prevalence

Data: The data used for illustration (Table 3) are adapted from an examination of the transmission-blocking properties of antimalarial drugs. Transmission stages (gametocytes) of P. falciparum were grown in vitro, exposed to different drugs during development, and then used in experimental infections of Anopheles mosquitoes in an SMFA. The prevalence and intensity of the resulting oocyst infection was determined by mosquito dissection 10 days after the infectious blood meal, and recording the presence and number of oocysts.

Table 3. Raw data of prevalence and gametocyte density for three biological replicates

Gametocytes were treated with five different drugs (A–E), and the control group was untreated, and were then used to infect Anopheles mosquitoes by SMFA. Three biological replicates were performed. Values are prevalence of infection in mosquitoes (n = 30 mosquitoes examined for infection for each drug). Gametocyte (GC) density was evaluated by examination of at least 10 000 red blood cells (rbc) in a thin Giemsa-stained smear.

Analysis: To illustrate the statistical methods recommended here, the prevalence of infection was compared by the commonly used one-tailed Student’s t-test, and by standard logistic regression GLM, and logistic regression GLM with Firth’s bias reduction (using the R package logistf (Heinze et al., Reference Heinze, Ploner, Jiricka and Steiner2023)) (Table 4). Both GLM models incorporated the variables of gametocyte density in the blood meal and replicate as fixed variables. The significance of the experimental variables was tested using a backward elimination method (Burnham and Anderson, Reference Burnham and Anderson1998). This begins with a maximal model containing all experimental variables, and is then simplified by stepwise elimination of non-significant factors until the final model, containing only significant effects, is reached.

Table 4. Predicted prevalences obtained from the two GLM models (with 95% confidence intervals), with the mean prevalence (and standard error) for comparison

P values represent the significance of the difference in prevalence observed between the control and each drug-treated group, as produced by the different analysis methods (Student’s t-test, GLM, GLM/logistf). P values <0.05 are highlighted in bold.

Comparisons of each drug compared to the untreated control are also shown in Table 4. Predictions of prevalence were obtained for the 2 GLM methods (see supplementary material for R codes and packages used) for comparison with the mean prevalence over all 3 biological replicates (Table 4 and Figure 1).

Figure 1. Graphical representation of the predicted prevalence with 95% confidence intervals obtained by the two GLM methods: (A) standard logistic regression by GLM (B) logistic regression with Firth’s bias reduction (using the r package logistf).

Results: The simple comparison of mean prevalence using Student’s t-test identified significant differences of infection prevalence in mosquitoes between the untreated control and gametocytes treated with drug D or drug E, but with no significant effect of drugs A, B or C (Table 4). The GLM approaches allowed the inclusion of additional variables such as gametocyte density, which can be seen to vary between treatment groups and between replicates (Table 3), as well as differences in overall prevalence in the controls between replicates, which includes non-measured sources of variation. The standard logistic regression by GLM revealed a significantly lower infection prevalence in gametocytes treated with drug C compared to controls (Table 4), but this standard GLM approach was unsatisfactory in the group treated with drug E, where transmission was completely blocked. The use of logistic regression with Firth’s bias reduction, using the R package logistf, overcame this issue, and allowed comparisons of prevalence in each group from the model.

The use of GLM/logistf thus improved the estimate of the impact of the drug on transmission compared to the Student’s t-test analysis, revealing a significant difference between the untreated control and three of the five drugs tested, whereas the t-test analysis found only 2 drugs to have significant transmission-blocking effects.

Example 2: GLM estimation of the oocyst intensity data

Data: The experiment described in Example 1 included counts of oocyst numbers for individual mosquitoes, determined by dissection 10 days after the infectious blood meal. The group treated with drug E was removed from the analysis because there was zero prevalence, and therefore no oocyst numbers for which a distribution can be modelled. A summary of the data is shown in Table 5 and Figure 2.

Figure 2. Raw data of oocyst numbers for three biological replicates. Each individual dot represents a single mosquito.

Table 5. Mean and median oocyst numbers for three biological replicates of the infections described in Example 1.

Gametocytes were treated with four different drugs (A–D), and the control group was untreated. Values are oocyst intensity in mosquitoes (n = 30 mosquitoes examined for oocyst intensity for each drug).

Analysis: The analysis was performed using the methods commonly used in the published literature: comparisons of median oocyst numbers using non-parametric tests (Kruskal–Wallis), and of mean oocyst intensity by Student’s t-test/ANOVA (despite its likely unsuitability for the data distributions) (Table 6). The results were then compared with those obtained by the GLM approach (Table 7). For the GLM models, different drugs (A–D), gametocyte density and biological replicate were used as fixed variables, and the distributions tested were Poisson, quasi-Poisson, NB, zero-inflated negative binomial (ZINB) and hurdle NB. The best-fit GLM for the data was assessed using the lowest Akaike Information Criterion and the best prediction of the zero counts in the data set (see supplementary material for the R codes used). Predictions of oocyst intensity for each drug were obtained for the GLM method (Table 7 and Figure 3). Comparisons of each drug treatment compared to the untreated control are shown in Table 7.

Figure 3. Graphical representation of the predicted intensity with 95% confidence intervals obtained by the GLM approach (here, a zero-inflated negative binomial model).

Table 6. Significance tests for analysis of oocyst numbers (infection intensity) by two methods commonly used in the published literature (Kruskal–Wallis test and ANOVA)

The values are P values for Kruskal–Wallis and ANOVA tests, the latter analysed on pooled data and then individually by replicate. P values <0.05 are highlighted in bold.

Table 7. Predicted infection intensity obtained from the best-fit GLM model (with 95% confidence intervals), and significance of the difference in intensity observed between the control and each drug-treated group

P values <0.05 are highlighted in bold.

Results: A non-parametric test (Kruskal–Wallis test) identifies significant differences in oocyst numbers between the untreated control and infections in which gametocytes were treated with drugs B, C or D, but this approach does not permit the inclusion of additional variables, including replicate. Although the data violate the assumptions for analysis by parametric tests such as ANOVA because of the non-normal distribution and high numbers of zero values in some groups, the results are shown for completeness (Table 6). The GLM approach (Table 7) allows modelling to fit several different distributions and the selection of the most appropriate for the given data set, as well as allowing inclusion of additional variables. For these data, the best model (selected on the basis on the log likelihood and the prediction of zero events; see supplementary material) was the ZINB model including fixed effects of treatment and replicate (gametocyte density was not a significant explanatory factor, P = 0.064). The application of GLM improved the estimate of the drug effect compared to the non-parametric method (Kruskal–Wallis), revealing a significant difference between the untreated control and all of the drugs tested.

Insights from the analyses

The GLM approach illustrated here examines infection prevalence and oocyst intensity in mosquitoes with real data taken from an examination of the transmission-blocking properties of different drugs. As the number of biological replicates was 3, replicate was not included as a random effect but as a fixed effect in the GLM. To account for additional experimental variables and determine their effect on the prevalence of mosquito infection and oocyst intensity, the infection prevalence was analysed by fitting it to a binomial distribution (probability of infection) using logistic regression, while oocyst intensity raw data were fitted by ZINB and the results were compared with those obtained by the commonly seen analysis methods.

For prevalence of infection, the simplest Student’s t-test result did not identify a significant impact of drug C on transmission, which was however detected using the GLM/logistf approach (Table 4). For impact on oocyst numbers (Table 6), the simplest analysis using a Kruskal–Wallis analysis identified a significant difference in oocyst intensity following treatment with drugs A and B, but this effect was not seen in all three replicates, leading to the conclusion that drug A did not affect oocyst numbers. The GLM approach allowed the inclusion of replicate, and revealed a significant decrease in oocyst numbers between the untreated control and all the drugs tested (A, B, C and D). The use of a GLM approach thus improved the estimate of the impact of the intervention, and identified all four drugs tested as effective in lowering transmission oocyst intensity compared to the simple Kruskal–Wallis analysis.

Statistical interactions between different explanatory variables and multiple comparisons, which were not performed for this example dataset for simplicity, could also provide more insight. We focused on the main effects to give an example of the advantages of GLM over common, simple tests.

Recommendations

A well-chosen GLM (or where appropriate, GLMM) approach can handle the problematic issues identified with non-normal distributions and overdispersion; additional variables such as gametocyte density and biological replicates can be included, and their impact on mosquito infections can be assessed.

Infection prevalence can be analysed using a logistic regression GLM, incorporating additional measurable variables (such as gametocyte density or mosquito size) where they have a statistically significant effect. However, mosquito infection prevalence can be close to zero, or zero in some groups, especially when examining effective transmission-blocking interventions or infections from MFA. This results in a quasi-complete separation of data if standard logistic regression (basic GLM in R) is used to fit the data, and the software may report an error or warning, or the standard errors for the model will be extremely large (as seen in Figure 1A for drug E). Quasi-complete separation of data occurs when a predictor gives a perfect prediction of a response variable to a certain degree or for most values of the predictors, but not all values, for example, when treatment with a drug gives zero or close to zero infection (Albert and Anderson, Reference Albert and Anderson1984; Advanced Research Computing, 2021). This problem can be addressed by analysing the data using Firth’s bias reduction method (Firth, Reference Firth1993) for the GLM, which can be accomplished with the R package ‘logistf’ (Heinze et al., Reference Heinze, Ploner, Jiricka and Steiner2023). Logistf fits a logistic regression model with Firth’s bias reduction method, a penalized maximum likelihood method which reduces the bias and improves convergence in the presence of data with an excess or high number of zeros (Heinze et al., Reference Heinze, Ploner, Jiricka and Steiner2023). This provides an ideal solution to the problem of separation in logistic regression (Heinze and Schemper, Reference Heinze and Schemper2002).

For analysis of infection intensity, GLMs can fit a number of different distributions that more closely match the data. The user defines the distribution to be used, and the fit of the data to the chosen distribution can be assessed and the best distribution that reflects the data selected.

Previous studies have suggested that a NB regression model is usually better at explaining the distribution of Plasmodium oocysts in the mosquito than a normal or Poisson distribution (Medley et al., Reference Medley, Sinden, Fleck, Billingsley, Tirawanchai and Rodriguez1993; Billingsley et al., Reference Billingsley, Medley, Charlwood and Sinden1994). A NB model is a statistical model of success/failure outcomes that fits over-dispersed count data when the variance exceeds the mean (Cummings and Hardin, Reference Cummings and Hardin2019). A ZINB, which accounts for excess zeros in a dataset in two ways, i.e. ‘true zeros’ and ‘excess zeros’ (Cummings and Hardin, Reference Cummings and Hardin2019), can often give an even better estimation than the NB model, as has been shown for both P. berghei and P. falciparum experimental infections (Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012; Swihart et al., Reference Swihart, Fay and Miura2018).

Therefore, a better alternative to using mean or median estimates of oocyst intensity is the use of GLM or GLMM to fit the oocyst numbers (data) to alternative distributions to which they are better matched, e.g. NB or ZINB, which allows better estimates of infection intensity, as well as comparison between groups. A GLM approach also allows external variables (such as gametocyte density) to be included in the oocyst number estimates. We have applied this approach recently to model P. falciparum prevalence and oocyst intensity in mosquitoes under different gametocyte growth conditions (Pradhan et al., Reference Pradhan, Ubiaru and Ranford-Cartwright2024). Other authors have used a similar approach: a ZINB model that includes random effects to analyse mosquito oocyst intensity (Churcher et al., Reference Churcher, Blagborough, Delves, Ramakrishnan, Kapulu, Williams, Biswas, Da, Cohuet and Sinden2012; Miura et al., Reference Miura, Deng, Tullo, Diouf, Moretz, Locke, Morin, Fay and Long2013; McGuire et al., Reference McGuire, Miura, Wiethoff and Williamson2017; Swihart et al., Reference Swihart, Fay and Miura2018; Chan et al., Reference Chan, Wetzel, Reiling, Miura, Drew, Gilson, Anderson, Richards, Long, Suckow, Jenzelewski, Tsuboi, Boyle, Piontek and Beeson2019; Lee et al., Reference Lee, Wu, Hickey, Miura, Whitaker, Joshi, Volkin, Richter King and Plieskatt2019; Balam et al., Reference Balam, Miura, Ayadi, Konaté, Incandela, Agnolon, Guindo, Diakité, Olugbile, Nebie, Herrera, Long, Kajava, Diakité, Corradin, Herrera and Herrera2025; Naghizadeh et al., Reference Naghizadeh, Miura, Addo Ofori, Long, Sagara, Tiono, Plieskatt and Theisen2025).

Conclusions

The advantages of using GLM analyses, with appropriate distributions, to analyse the typically non-normal data obtained from human malaria parasite infections in Anopheles mosquitoes are presented here. In the examples given of analysis of both prevalence and intensity of infection, the GLM method identified more significant transmission blocking effects among the drugs tested than simpler analyses that are frequently used in publications.

By providing evidence of the added benefits of these methods, as well as the R codes used, we hope to persuade researchers to apply these analytical techniques. We recommend that researchers use logistic regression GLM, (or GLMM if appropriate) as an alternative to the use of mean/median estimates for the estimation and interpretation of mosquito infection prevalence with P. falciparum, and other parasitic infections with zero or close to zero infection prevalence. For infection intensity, the fitting of GLM (or GLMM) with different distributions, and then selecting the best model for the data, is a little more complex, but gives a more robust and informative analysis, avoiding the pitfalls and disadvantages of parametric (ANOVA) and simple non-parametric (Kruskal–Wallis) methods.

While the focus was on the use of these statistical approaches for the evaluation of mosquito infection data, such non-normal distributions are common in many parasitic diseases, where a minority of individuals harbour a majority of parasites, and the majority harbour low or zero numbers. These analytical methods can equally be utilized by researchers working on other parasitic control interventions, to make best use of their data.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0031182025100541.

Acknowledgements

We wish to thank Prof. Dan Haydon and Dr. Paul Johnson for suggesting the logistf package to address the challenges of modelling mosquito infection data with zero prevalence, and TETFUND, Nigeria, for sponsoring the PhD research of PCU.

Author contributions

PCU and LRC conceived and designed the review, performed the statistical analyses and wrote the article.

Financial support

This work was supported by a PhD scholarship awarded to PCU by TETFUND, Nigeria.

Competing interests

The authors declare there are no conflicts of interest.

Ethical standards

Not applicable.

References

Akorli, EA, Ubiaru, PC, Pradhan, S, Akorli, J and Ranford-Cartwright, L (2022) Bio-products from Serratia marcescens isolated from Ghanaian Anopheles gambiae reduce Plasmodium falciparum burden in vector mosquitoes. Frontiers in Tropical Diseases 3, 979615.Google Scholar
Albert, A and Anderson, JA (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika 71, 110.Google Scholar
Balam, S, Miura, K, Ayadi, I, Konaté, D, Incandela, NC, Agnolon, V, Guindo, MA, Diakité, S, Olugbile, S, Nebie, I, Herrera, SM, Long, C, Kajava, AV, Diakité, M, Corradin, G, Herrera, S and Herrera, MA (2025) Cross-reactivity of rPvs48/45, a recombinant Plasmodium vivax protein, with plasma from Plasmodium falciparum endemic areas of Africa. PLoS One 20, e0302605.Google Scholar
Bhandari, P (2023) Normal Distribution: Examples, Formulas, & Uses. Scribbr, Retrieved September 10 , 2024, from https://www.scribbr.com/statistics/normal-distribution/Google Scholar
Billingsley, PF, Medley, GF, Charlwood, D and Sinden, RE (1994) Relationship Between Prevalence and Intensity of Plasmodium falciparum Infection in Natural Populations of Anopheles Mosquitoes. The American Journal of Tropical Medicine and Hygiene 51, 260270.Google Scholar
Bolker, BM, Brooks, ME, Clark, CJ, Geange, SW, Poulsen, JR, Stevens, MH and White, JS (2009) Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology and Evolution 24, 127135.Google Scholar
Bousema, T, Dinglasan, RR, Morlais, I, Gouagna, LC, Van Warmerdam, T, Awono-Ambene, PH, Bonnet, S, Diallo, M, Coulibaly, M, Tchuinkam, T, Mulder, B, Targett, G, Drakeley, C, Sutherland, C, Robert, V, Doumbo, O, Touré, Y, Graves, PM, Roeffen, W, Sauerwein, R, Birkett, A, Locke, E, Morin, M, Wu, Y and Churcher, TS (2012) Mosquito feeding assays to determine the infectiousness of naturally infected Plasmodium falciparum gametocyte carriers. PLoS One 7, e42821.Google Scholar
Burnham, KP and Anderson, DR (1998) Model Selection and Inference: A Practical Information-Theoretical Approach. New York: Springer-Verlag.Google Scholar
Chan, JA, Wetzel, D, Reiling, L, Miura, K, Drew, DR, Gilson, PR, Anderson, DA, Richards, JS, Long, CA, Suckow, M, Jenzelewski, V, Tsuboi, T, Boyle, MJ, Piontek, M and Beeson, JG (2019) Malaria vaccine candidates displayed on novel virus-like particles are immunogenic and induce transmission-blocking activity. PLoS One 14, e0221733.Google Scholar
Churcher, TS, Blagborough, AM, Delves, M, Ramakrishnan, C, Kapulu, MC, Williams, AR, Biswas, S, Da, DF, Cohuet, A and Sinden, RE (2012) Measuring the blockade of malaria transmission–an analysis of the Standard Membrane Feeding Assay. International Journal for Parasitology 42, 10371044.Google Scholar
Cummings, TH and Hardin, JW (2019) Modeling count data with marginalized zero-inflated distributions. The Stata Journal 19, 499509.Google Scholar
Firth, D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80, 2738.Google Scholar
Gelman, A and Hill, J (2006) Data Analysis Using Regression and Multilevel/hierarchical Models. Cambridge: Cambridge University Press.Google Scholar
Gomes, DGE (2022) Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model? PeerJ 10, e12794.Google Scholar
Harrison, XA, Donaldson, L, Correa-Cano, ME, Evans, J, Fisher, DN, Goodwin, CED, Robinson, BS, Hodgson, DJ and Inger, R (2018) A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ 6, e4794.Google Scholar
Heinze, G, Ploner, M, Jiricka, L and Steiner, G (2023) logistf: Firth’s Bias-Reduced Logistic Regression https://CRAN.R-project.org/package=logistf. R package version 1.26.0.Google Scholar
Heinze, G and Schemper, M (2002) A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 24092419.Google Scholar
Ippolito, MM, Moser, KA, Kabuya, JB, Cunningham, C and Juliano, JJ (2021) Antimalarial Drug Resistance and Implications for the WHO Global Technical Strategy. Current Epidemiology Reports 8, 4662.Google Scholar
Kefi, M, Cardoso-Jaime, V, Saab, SA and Dimopoulos, G (2024) Curing mosquitoes with genetic approaches for malaria control. Trends in Parasitology 40, 487499.Google Scholar
Kéry, M and Royle, JA (2015) Applied Hierarchical Modeling in Ecology: analysis of Distribution, Abundance and Species Richness in R and BUGS: prelude and Static Models. Waltham: Academic Press.Google Scholar
Lee, SM, Wu, Y, Hickey, JM, Miura, K, Whitaker, N, Joshi, SB, Volkin, DB, Richter King, C and Plieskatt, J (2019) The Pfs230 N-terminal fragment, Pfs230D1+: Expression and characterization of a potential malaria transmission-blocking vaccine candidate. Malaria Journal 18, 356.Google Scholar
Lyimo, EO and Koella, JC (1992) Relationship between body size of adult Anopheles gambiae s.l. and infection with the malaria parasite Plasmodium falciparum. Parasitology 104(Pt 2), 233237.Google Scholar
McCullagh, P and Nelder, JA (1989) Generalized Linear Models. London; New York: Chapman and Hall.Google Scholar
McGuire, KA, Miura, K, Wiethoff, CM and Williamson, KC (2017) New adenovirus-based vaccine vectors targeting Pfs25 elicit antibodies that inhibit Plasmodium falciparum transmission. Malaria Journal 16, 254.Google Scholar
Medley, GF, Sinden, RE, Fleck, S, Billingsley, PF, Tirawanchai, N and Rodriguez, MH (1993) Heterogeneity in patterns of malarial oocyst infections in the mosquito vector. Parasitology 106(Pt 5), 441449.Google Scholar
Miura, K, Deng, B, Tullo, G, Diouf, A, Moretz, SE, Locke, E, Morin, M, Fay, MP and Long, CA (2013) Qualification of standard membrane-feeding assay with Plasmodium falciparum malaria and potential improvements for future assays. PLoS One 8, e57909.Google Scholar
Miura, K, Swihart, BJ, Deng, B, Zhou, L, Pham, TP, Diouf, A, Burton, T, Fay, MP and Long, CA (2016) Transmission-blocking activity is determined by transmission-reducing activity and number of control oocysts in Plasmodium falciparum standard membrane-feeding assay. Vaccine 34, 41454151.Google Scholar
Miura, K, Swihart, BJ, Deng, B, Zhou, L, Pham, TP, Diouf, A, Fay, MP and Long, CA (2019) Strong concordance between percent inhibition in oocyst and sporozoite intensities in a Plasmodium falciparum standard membrane-feeding assay. Parasites and Vectors 12, 206.Google Scholar
Naghizadeh, M, Miura, K, Addo Ofori, E, Long, C, Sagara, I, Tiono, AB, Plieskatt, J and Theisen, M (2025) Magnitude and durability of ProC6C-AlOH/Matrix-M(tm) vaccine-induced malaria transmission-blocking antibodies in Burkinabe adults from a Phase 1 randomized trial. Human Vaccines and Immunotherapeutics 21, 2488075.Google Scholar
Ohm, JR, Baldini, F, Barreaux, P, Lefevre, T, Lynch, PA, Suh, E, Whitehead, SA and Thomas, MB (2018) Rethinking the extrinsic incubation period of malaria parasites. Parasites and Vectors 11, 178.Google Scholar
Paterson, S and Lello, J (2003) Mixed models: Getting the best use of parasitological data. Trends in Parasitology 19, 370375.Google Scholar
Ponnudurai, T, Lensen, AHW, Vangemert, GJA, Bensink, MPE, Bolmer, M and Meuwissen, JHET (1989) Infectivity of Cultured Plasmodium-Falciparum Gametocytes to Mosquitos. Parasitology 98, 165173.Google Scholar
Ponnudurai, T, Van Gemert, GJ, Bensink, T, Lensen, AHW and Meuwissen, JHET (1987) Transmission blockade of Plasmodium falciparum: Its variability with gametocyte numbers and concentration of antibody. Transactions of the Royal Society of Tropical Medicine and Hygiene 81, 491493.Google Scholar
Pradhan, S, Ubiaru, PC and Ranford-Cartwright, L (2024) Simple supplementation of serum-free medium produces gametocytes of Plasmodium falciparum that transmit to mosquitoes. Malaria Journal 23, 275.Google Scholar
Shaw, WR, Holmdahl, IE, Itoe, MA, Werling, K, Marquette, M, Paton, DG, Singh, N, Buckee, CO, Childs, LM and Catteruccia, F (2020) Multiple blood feeding in mosquitoes shortens the Plasmodium falciparum incubation period and increases malaria transmission potential. PLoS Pathogens 16, e1009131.Google Scholar
Sinden, RE, Dawes, EJ, Alavi, Y, Waldock, J, Finney, O, Mendoza, J, Butcher, GA, Andrews, L, Hill, AV, Gilbert, SC and Basáñez, MG (2007) Progression of Plasmodium berghei through Anopheles stephensi is density-dependent. PLoS Pathogens 3, e195.Google Scholar
Suh, PF, Elanga-Ndille, E, Tchouakui, M, Sandeu, MM, Tagne, D, Wondji, C and Ndo, C (2023) Impact of insecticide resistance on malaria vector competence: A literature review. Malaria Journal 22, 19.Google Scholar
Swihart, BJ, Fay, MP and Miura, K (2018) Statistical methods for standard membrane-feeding assays to measure transmission blocking or reducing activity in malaria. Journal of the American Statistical Association 113, 534545.Google Scholar
Vaughan, JA (2007) Population dynamics of Plasmodium sporogony. Trends in Parasitology 23, 6370.Google Scholar
Vaughan, JA, Noden, BH and Beier, JC (1992) Population dynamics of Plasmodium falciparum sporogony in laboratory-infected Anopheles gambiae. The Journal of Parasitology 78, 716724.Google Scholar
World Health Organisation (2024). World Malaria Report 2024. Available: https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2024 (accessed 8 April 2025).Google Scholar
Wu, Y, Sinden, RE, Churcher, TS, Tsuboi, T and Yusibov, V (2015) Development of malaria transmission-blocking vaccines: From concept to product. Advances in Parasitology 89, 109152.Google Scholar
Yu, S, Wang, J, Luo, X, Zheng, H, Wang, L, Yang, X and Wang, Y (2022) Transmission-blocking strategies against malaria parasites during their mosquito stages. Frontiers in Cellular & Infection Microbiology 12, 820650.Google Scholar
Figure 0

Table 1. Summary of statistical tests used for analysing P. falciparum mosquito infection between 2009 and 2025 in 153 published papers

Figure 1

Table 2. Advantages and limitations of common metrics used for analysing P. falciparum mosquito infection prevalence and oocyst intensity

Figure 2

Table 3. Raw data of prevalence and gametocyte density for three biological replicates

Figure 3

Table 4. Predicted prevalences obtained from the two GLM models (with 95% confidence intervals), with the mean prevalence (and standard error) for comparison

Figure 4

Figure 1. Graphical representation of the predicted prevalence with 95% confidence intervals obtained by the two GLM methods: (A) standard logistic regression by GLM (B) logistic regression with Firth’s bias reduction (using the r package logistf).

Figure 5

Figure 2. Raw data of oocyst numbers for three biological replicates. Each individual dot represents a single mosquito.

Figure 6

Table 5. Mean and median oocyst numbers for three biological replicates of the infections described in Example 1.

Figure 7

Figure 3. Graphical representation of the predicted intensity with 95% confidence intervals obtained by the GLM approach (here, a zero-inflated negative binomial model).

Figure 8

Table 6. Significance tests for analysis of oocyst numbers (infection intensity) by two methods commonly used in the published literature (Kruskal–Wallis test and ANOVA)

Figure 9

Table 7. Predicted infection intensity obtained from the best-fit GLM model (with 95% confidence intervals), and significance of the difference in intensity observed between the control and each drug-treated group

Supplementary material: File

Ubiaru and Ranford-Cartwright supplementary material

Ubiaru and Ranford-Cartwright supplementary material
Download Ubiaru and Ranford-Cartwright supplementary material(File)
File 25.2 KB