Introduction
Malaria is one of the most severe and life-threatening tropical diseases, affecting nearly half of the world’s population; in 2023 there were 263 million cases and 597,000 deaths globally (World Health Organisation, 2024). People are infected through the bite of an infected female Anopheles mosquito. The disease can be treated by chemotherapy, usually with an artemisinin-based combination therapy for the most pathogenic species, Plasmodium falciparum, with the addition of primaquine for relapsing species such as Plasmodium vivax (World Health Organisation, 2024). Public health control strategies include intermittent preventive therapy and vector control, which are employed to reduce the disease burden significantly in malaria-endemic areas. However, these control strategies are challenged by parasite drug resistance (reviewed in Ippolito et al., 2021) and mosquito insecticide resistance (reviewed in Suh et al., 2023).
Many malaria control measures currently under development aim to reduce the proportion of mosquitoes in a population that are capable of transmitting malaria. Examples include (i) interventions that aim to increase the refractoriness of mosquitoes to malaria parasites by genetic modification, paratransgenesis or alterations in mosquito microbiota (reviewed in Kefi et al., 2024), such as the use of a gene drive system within the mosquito vector to reduce and/or prevent the parasite’s survival and transmission; (ii) transmission-blocking drugs and vaccines (reviewed in Yu et al., 2022); and (iii) interventions that aim to shift the age distribution of vector mosquitoes below the extrinsic incubation period (EIP). The EIP is the time it takes for Plasmodium parasites to develop, replicate and invade a mosquito’s salivary glands after she takes an infected blood meal. For example, the EIP of P. falciparum is between 12 and 14 days (Vaughan et al., 1992; Ohm et al., 2018), so for P. falciparum transmission to humans to occur, mosquito vectors must live beyond the EIP (Shaw et al., 2020).
To evaluate the success of these control measures, it is necessary to compare the level of mosquito infection in a population before and after the intervention or, in experimental work, between control and treated mosquitoes given infectious blood meals. These studies require statistical models or tools that incorporate all the measured factors when estimating the results, which are mainly presented as infection prevalence and oocyst intensity. Plasmodium-mosquito infection data, in common with other biological data with multiple measurements or experimental variables, come from inherently ‘noisy’ natural systems (reviewed in Paterson and Lello, 2003). These types of data are usually complex, and require well-chosen statistical tools to fit and interpret them. Choosing the correct statistical tool is rarely simple, and as a result a more straightforward statistical method is often chosen to analyse the data, which may not be wholly appropriate.
Approaches for measuring mosquito infection
Experimentally, the impact of a control measure such as a transmission-blocking drug or antibody on mosquito infection is measured using 1 of 3 different methods: a direct membrane feeding assay (DMFA), a direct skin feed (DSF), or a standard membrane feeding assay (SMFA). In DMFA, blood containing transmission stages (mature gametocytes) from a treated human host is fed to female Anopheles mosquitoes using a water-jacketed membrane feeder (Ponnudurai et al., 1987), whereas in DSF, female Anopheles mosquitoes are placed directly on the skin of a gametocyte carrier to feed (Bousema et al., 2012). In SMFA, the gametocytes are grown in vitro and fed to Anopheles mosquitoes using water-jacketed membrane feeders. DMFA and DSF are usually used in the field to check the infectiousness of natural gametocyte carriers treated with, for example, potential transmission-blocking drugs. In contrast, SMFAs are used mainly for two purposes: first, to check the effect of immune factors or transmission-blocking compounds in preventing the transmission of in vitro cultured mature gametocytes; second, to study the ability of different clones or isolates of P. falciparum to transmit to different Anopheles mosquitoes.
The resulting infection for SMFA (for the effect of immune factors/compounds), DMFA and DSF is characterized as transmission-reducing activity (TRA) and transmission-blocking activity (TBA) (Medley et al., 1993; Churcher et al., 2012). TRA is the percentage inhibition in parasite oocyst intensity of infected mosquitoes, whereas TBA is the percentage inhibition in oocyst prevalence (Churcher et al., 2012; Miura et al., 2016). However, there has been debate about whether TRA or TBA is the best readout to report for SMFA (Wu et al., 2015; Miura et al., 2016; Swihart et al., 2018).
The results of mosquito infections from membrane feeding assays (MFAs), including SMFA studies, are commonly presented as infection prevalence and oocyst infection intensity; the former is the percentage (%) of mosquitoes infected, usually presented as a mean percentage of replicates, and the latter is the number of oocysts present on mosquito midguts (commonly known as parasite abundance), usually presented as a mean, a geometric mean, or median oocyst number, and occasionally presented for infected mosquitoes only (more properly parasite intensity). The standard membrane feeding assay additionally includes variables such as the different batches of human serum used for the parasite culture and blood used for setting up the gametocyte cultures. All 3 assays include variables such as the size of each mosquito, the gametocyte number in the blood, and replicate effects, which could all influence the distribution of infection (Ponnudurai et al., 1989; Lyimo and Koella, 1992).
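The two readouts defined above can be written as simple percentage reductions relative to the control. The following is a minimal sketch in Python rather than the R code of the supplementary material; the function names and the oocyst counts are hypothetical, and it assumes that intensity is summarized as the arithmetic mean number of oocysts per dissected mosquito, which is only one of several summaries in use.

```python
from statistics import mean

def percent_inhibition(control_value, test_value):
    """Percentage reduction of the test value relative to the control value."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (1.0 - test_value / control_value)

def tra_tba(control_counts, test_counts):
    """Return (TRA, TBA) from per-mosquito oocyst counts.

    Assumption: intensity is the arithmetic mean over all dissected
    mosquitoes; prevalence is the fraction with at least one oocyst.
    """
    tra = percent_inhibition(mean(control_counts), mean(test_counts))
    prev_control = sum(c > 0 for c in control_counts) / len(control_counts)
    prev_test = sum(c > 0 for c in test_counts) / len(test_counts)
    tba = percent_inhibition(prev_control, prev_test)
    return tra, tba

# Hypothetical oocyst counts, for illustration only (not data from this article)
control = [0, 2, 5, 8, 10, 0, 3, 12]
treated = [0, 0, 1, 0, 2, 0, 0, 1]
tra, tba = tra_tba(control, treated)
print(f"TRA = {tra:.1f}%, TBA = {tba:.1f}%")
```

In this made-up example the intervention reduces intensity far more than prevalence, which illustrates why the choice of readout matters.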
Common methods for analysing P. falciparum mosquito infection data
A survey of papers from PubMed, filtered for the years 2009–2025 and presenting data on mosquito P. falciparum infection, showed that, out of 153 published papers reviewed, 78 (∼51%) used either one or a combination of the following tests to analyse their oocyst infection prevalence and intensity data: Chi-squared test, Student’s t-test/analysis of variance (ANOVA), Mann–Whitney test, Wilcoxon signed-rank test, Kruskal–Wallis/Dunn test and Kolmogorov–Smirnov test (Table 1).
Table 1. Summary of statistical tests used for analysing P. falciparum mosquito infection between 2009 and 2025 in 153 published papers

Publications obtained from PubMed using the search terms ‘membrane feeding’ and ‘falciparum.’ TBA, transmission blocking activity; TRA, transmission-reducing activity.
Although improved mathematical methods/models for analysing mosquito infection data from an MFA have been published (Churcher et al., 2012; Miura et al., 2013; Swihart et al., 2018), many researchers have analysed mosquito oocyst infection prevalence and intensity with statistical tests that use mean and median estimates only. Tests based solely on mean or median values are often inadequate to reveal differences in mosquito infection levels, which could lead to poor interpretation of the overall result. The advantages and limitations of these statistical tests are summarized in Table 2.
Table 2. Advantages and limitations of common metrics used for analysing P. falciparum mosquito infection prevalence and oocyst intensity

Challenges with these common methods
Infection prevalence
Simple mean prevalence estimates are affected by a small number of very high or very low (zero) values in treatment groups, and errors are higher if replicates show high variability (usually due to uncontrolled experimental variables). Prevalence values are commonly close to zero, especially in field-collected mosquitoes. In addition, the noted variation in prevalence in replicates due to uncontrolled (but measurable) experimental variables, such as gametocyte density or mosquito size, is not incorporated into simple mean prevalence estimates.
Infection intensity
Within an experiment, there are usually a few mosquitoes with very high oocyst numbers and some with zero or very low numbers, even among relatively homogeneous lab-reared mosquitoes fed on the same infectious blood meal. Oocyst numbers within mosquitoes typically exhibit overdispersion (Sinden et al., 2007; Vaughan, 2007; Churcher et al., 2012) and are not normally distributed, i.e., the data do not form a symmetrical, bell-shaped distribution with most values clustered around the centre (reviewed in Bhandari, 2023). Mean oocyst numbers are highly influenced by extreme values and skewed distributions, and will usually overestimate the typical oocyst burden. Median values are often a better summary statistic, but for oocyst distributions the median will be zero if the prevalence is less than 50% (which it frequently is). Oocyst intensity data often more closely approximate a negative binomial (NB) distribution, often with zero-inflation (Miura et al., 2019; Akorli et al., 2022). This type of data cannot usually be normalized by standard transformations, e.g. taking logs, making a mean oocyst number an inappropriate measure.
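The behaviour of the mean and median described above can be checked with a few lines of code. This is a minimal sketch in Python using hypothetical oocyst counts (not data from this article); the helper name summarise_counts is ours. A variance-to-mean ratio well above 1 is a simple indicator of overdispersion, since a Poisson distribution has a ratio of 1.

```python
from statistics import mean, median, variance

def summarise_counts(counts):
    """Summary statistics for per-mosquito oocyst counts."""
    m = mean(counts)
    return {
        "mean": m,
        "median": median(counts),
        # fraction of mosquitoes with at least one oocyst
        "prevalence": sum(c > 0 for c in counts) / len(counts),
        # a ratio above 1 indicates overdispersion relative to Poisson
        "variance_to_mean": variance(counts) / m if m > 0 else float("nan"),
    }

# Hypothetical counts: most mosquitoes uninfected, one heavily infected
oocysts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 44]
summary = summarise_counts(oocysts)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here a single heavily infected mosquito pulls the mean far above what most mosquitoes carry, while the median is zero because fewer than half of the mosquitoes are infected.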
How then should mosquito infection data with these types of distribution and with different experimental variables or treatments be analysed?
Use of generalized linear models
A generalized linear model (GLM) is a statistical method useful for analysing non-normal datasets. It models datasets with only fixed effects, which are explanatory variables (predictors or inputs/independent variables), such as different drug treatments or the number of gametocytes in the blood meal, with a constant effect across different individuals or groups. A GLM consists of three main components: a linear predictor, a link function and an error distribution. The linear predictor is the part of the model where all the explanatory variables are combined to predict the result. The link function connects the linear predictor to the quantity that the model predicts; for example, a log-link function is common for count data. The error distribution, also known as the family, defines the distribution on which the model is based, for example Poisson or NB (McCullagh and Nelder, 1989). The model thus uses a linear formula to combine explanatory variables and predict outcomes using link functions and exponential family distributions, such as normal, Poisson or binomial distributions (Bolker et al., 2009). The family to be chosen when using a GLM depends on the nature of the dataset (e.g., binary, continuous or count data). This makes GLMs flexible and valuable for different types of datasets.
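The roles of the linear predictor and the link function can be illustrated with a short sketch. It is written in Python, and the coefficient values and variable names are hypothetical, chosen only for illustration; they are not estimates from the data in this article. The linear predictor combines the explanatory variables on the link scale, and the inverse link converts it back to a probability (logit link, binomial family) or an expected count (log link, count families).

```python
import math

def linear_predictor(coefs, covariates):
    """eta = intercept + sum of (coefficient * covariate value)."""
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in covariates.items()
    )

def inv_logit(eta):
    """Inverse of the logit link: maps eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    """Inverse of the log link: maps eta to a positive expected count."""
    return math.exp(eta)

# Hypothetical coefficients on the logit scale (illustration only)
prevalence_coefs = {"intercept": 0.5, "drug": -1.5, "gc_density": 0.8}

control = {"drug": 0, "gc_density": 0.5}
treated = {"drug": 1, "gc_density": 0.5}

p_control = inv_logit(linear_predictor(prevalence_coefs, control))
p_treated = inv_logit(linear_predictor(prevalence_coefs, treated))
print(f"predicted prevalence: control {p_control:.2f}, treated {p_treated:.2f}")

# On the logit scale a coefficient is a log odds ratio
odds_ratio = (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))
print(f"odds ratio for drug: {odds_ratio:.3f}")
```

Because the drug coefficient acts on the logit scale, its exponential is the odds ratio for infection at any fixed gametocyte density.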
Advantages of GLMs
A GLM analysis eliminates the challenge of data transformation with non-normal data and accommodates different types of datasets, as it allows response variables to have different distributions. A GLM approach is very flexible in analysing the relationship between variables using a link function that handles non-linear relationships, and predicts results that are simple to interpret.
A further extension of the GLM is the generalized linear mixed model (GLMM), which can model datasets with random effects. Random effects are unmeasurable variables that differ among repeated measurements, such as differences between biological replicates due to unknown factors. GLMMs are better statistical tools than GLMs for analysing non-normal data involving random effects, or when random effects are the focus of the analyses. Defining model predictors in a GLMM as either fixed or random effects is essential in modelling; the important criteria to consider before using a GLMM and fitting random variables have been discussed previously (Harrison et al., 2018). The conditions under which a random effect should be included in a model, and the challenges of using GLMMs for non-experts, have also been discussed extensively (Bolker et al., 2009). One condition to fulfil before using random effects in a model is that the dataset must contain at least 5 levels or groups (e.g., number of biological replicates or number of mosquito cages) for the variance to be estimated (Gelman and Hill, 2006; Kéry and Royle, 2015; Harrison et al., 2018; Gomes, 2022). Because of this pitfall, non-experts could use GLMMs inappropriately (Bolker et al., 2009), for example when there are fewer than 5 replicates of each experimental treatment.
A GLMM can be the best model for analysing mosquito infection data involving different parasite drug treatments, many biological replicates, multiple feeders/mosquito cages and mosquito populations, to account for random effects or variability (Churcher et al., 2012). However, for most SMFA experiments, the number of biological replicates rarely exceeds 5, which could rule out using a GLMM to analyse such a dataset where biological replicate is a potential random effect.
The following examples illustrate the use of a GLM approach to investigate the impact of transmission-blocking drugs on the prevalence and intensity of P. falciparum infection in mosquitoes in an SMFA.
Example 1: GLM analysis of infection prevalence
Data: The data used for illustration (Table 3) are adapted from an examination of the transmission-blocking properties of antimalarial drugs. Transmission stages (gametocytes) of P. falciparum were grown in vitro, exposed to different drugs during development, and then used in experimental infections of Anopheles mosquitoes in an SMFA. The prevalence and intensity of the resulting oocyst infection was determined by mosquito dissection 10 days after the infectious blood meal, and recording the presence and number of oocysts.
Table 3. Raw data of prevalence and gametocyte density for three biological replicates

Gametocytes were treated with five different drugs (A–E), and the control group was untreated, and were then used to infect Anopheles mosquitoes by SMFA. Three biological replicates were performed. Values are prevalence of infection in mosquitoes (n = 30 mosquitoes examined for infection for each drug). Gametocyte (GC) density was evaluated by examination of at least 10 000 red blood cells (rbc) in a thin Giemsa-stained smear.
Analysis: To illustrate the statistical methods recommended here, the prevalence of infection was compared by the commonly used one-tailed Student’s t-test, by standard logistic regression GLM, and by logistic regression GLM with Firth’s bias reduction (using the R package logistf (Heinze et al., 2023)) (Table 4). Both GLM models incorporated the variables of gametocyte density in the blood meal and replicate as fixed variables. The significance of the experimental variables was tested using a backward elimination method (Burnham and Anderson, 1998). This begins with a maximal model containing all experimental variables, and is then simplified by stepwise elimination of non-significant factors until the final model, containing only significant effects, is reached.
Table 4. Predicted prevalences obtained from the two GLM models (with 95% confidence intervals), with the mean prevalence (and standard error) for comparison

P values represent the significance of the difference in prevalence observed between the control and each drug-treated group, as produced by the different analysis methods (Student’s t-test, GLM, GLM/logistf). P values <0.05 are highlighted in bold.
Comparisons of each drug with the untreated control are also shown in Table 4. Predictions of prevalence were obtained for the 2 GLM methods (see supplementary material for the R code and packages used) for comparison with the mean prevalence over all 3 biological replicates (Table 4 and Figure 1).

Figure 1. Graphical representation of the predicted prevalence with 95% confidence intervals obtained by the two GLM methods: (A) standard logistic regression by GLM; (B) logistic regression with Firth’s bias reduction (using the R package logistf).
Results: The simple comparison of mean prevalence using Student’s t-test identified significant differences in infection prevalence in mosquitoes between the untreated control and gametocytes treated with drug D or drug E, but no significant effect of drugs A, B or C (Table 4). The GLM approaches allowed the inclusion of additional variables such as gametocyte density, which can be seen to vary between treatment groups and between replicates (Table 3), as well as differences in overall prevalence in the controls between replicates, which include non-measured sources of variation. The standard logistic regression by GLM revealed a significantly lower infection prevalence in mosquitoes fed gametocytes treated with drug C compared to controls (Table 4), but this standard GLM approach was unsatisfactory for the group treated with drug E, where transmission was completely blocked. The use of logistic regression with Firth’s bias reduction, using the R package logistf, overcame this issue, and allowed comparisons of prevalence in each group from the model.
The use of GLM/logistf thus improved the estimate of the impact of the drug on transmission compared to the Student’s t-test analysis, revealing a significant difference between the untreated control and three of the five drugs tested, whereas the t-test analysis found only 2 drugs to have significant transmission-blocking effects.
Example 2: GLM estimation of the oocyst intensity data
Data: The experiment described in Example 1 included counts of oocyst numbers for individual mosquitoes, determined by dissection 10 days after the infectious blood meal. The group treated with drug E was removed from the analysis because there was zero prevalence, and therefore no oocyst numbers for which a distribution can be modelled. A summary of the data is shown in Table 5 and Figure 2.

Figure 2. Raw data of oocyst numbers for three biological replicates. Each individual dot represents a single mosquito.
Table 5. Mean and median oocyst numbers for three biological replicates of the infections described in Example 1.

Gametocytes were treated with four different drugs (A–D), and the control group was untreated. Values are oocyst intensity in mosquitoes (n = 30 mosquitoes examined for oocyst intensity for each drug).
Analysis: The analysis was performed using the methods commonly used in the published literature: comparisons of median oocyst numbers using non-parametric tests (Kruskal–Wallis), and of mean oocyst intensity by Student’s t-test/ANOVA (despite its likely unsuitability for the data distributions) (Table 6). The results were then compared with those obtained by the GLM approach (Table 7). For the GLM models, different drugs (A–D), gametocyte density and biological replicate were used as fixed variables, and the distributions tested were Poisson, quasi-Poisson, NB, zero-inflated negative binomial (ZINB) and hurdle NB. The best-fit GLM for the data was assessed using the lowest Akaike Information Criterion and the best prediction of the zero counts in the data set (see supplementary material for the R code used). Predictions of oocyst intensity for each drug were obtained for the GLM method (Table 7 and Figure 3). Comparisons of each drug treatment with the untreated control are shown in Table 7.
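The two selection criteria named above (lowest Akaike Information Criterion and best prediction of the zero counts) can be illustrated on a small scale. The following is a simplified sketch in Python, not the R code of the supplementary material. It assumes intercept-only models with no covariates, uses hypothetical counts, and estimates the NB dispersion by the method of moments rather than by maximum likelihood, so the AIC values are approximate.

```python
import math
from statistics import mean, variance

def poisson_loglik(counts, mu):
    """Log-likelihood of the counts under a Poisson distribution with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def nb_loglik(counts, mu, k):
    """Log-likelihood under a negative binomial with mean mu and
    dispersion k (variance = mu + mu**2 / k)."""
    return sum(
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu))
        for y in counts
    )

def aic(loglik, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

# Hypothetical oocyst counts (illustration only)
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 44]
n = len(counts)
mu = mean(counts)
# Moment estimate of the dispersion; only valid when variance > mean
k = mu ** 2 / (variance(counts) - mu)

aic_poisson = aic(poisson_loglik(counts, mu), 1)  # one parameter: mu
aic_nb = aic(nb_loglik(counts, mu, k), 2)         # two parameters: mu and k

observed_zeros = counts.count(0)
zeros_poisson = n * math.exp(-mu)        # expected zeros under Poisson
zeros_nb = n * (k / (k + mu)) ** k       # expected zeros under NB

print(f"AIC: Poisson {aic_poisson:.1f}, NB {aic_nb:.1f}")
print(f"zeros: observed {observed_zeros}, "
      f"Poisson {zeros_poisson:.2f}, NB {zeros_nb:.2f}")
```

With these made-up counts the NB distribution has the lower AIC and predicts a number of zeros close to that observed, whereas the Poisson distribution predicts almost none. A full analysis would fit each candidate model, including its covariates, by maximum likelihood.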

Figure 3. Graphical representation of the predicted intensity with 95% confidence intervals obtained by the GLM approach (here, a zero-inflated negative binomial model).
Table 6. Significance tests for analysis of oocyst numbers (infection intensity) by two methods commonly used in the published literature (Kruskal–Wallis test and ANOVA)

The values are P values for Kruskal–Wallis and ANOVA tests, the latter analysed on pooled data and then individually by replicate. P values <0.05 are highlighted in bold.
Table 7. Predicted infection intensity obtained from the best-fit GLM model (with 95% confidence intervals), and significance of the difference in intensity observed between the control and each drug-treated group

P values <0.05 are highlighted in bold.
Results: A non-parametric test (Kruskal–Wallis test) identifies significant differences in oocyst numbers between the untreated control and infections in which gametocytes were treated with drugs B, C or D, but this approach does not permit the inclusion of additional variables, including replicate. Although the data violate the assumptions for analysis by parametric tests such as ANOVA because of the non-normal distribution and high numbers of zero values in some groups, the results are shown for completeness (Table 6). The GLM approach (Table 7) allows several different distributions to be fitted and the most appropriate one to be selected for the given data set, as well as allowing inclusion of additional variables. For these data, the best model (selected on the basis of the log likelihood and the prediction of zero events; see supplementary material) was the ZINB model including fixed effects of treatment and replicate (gametocyte density was not a significant explanatory factor, P = 0.064). The application of GLM improved the estimate of the drug effect compared to the non-parametric method (Kruskal–Wallis), revealing a significant difference between the untreated control and all of the drugs tested.
Insights from the analyses
The GLM approach illustrated here examines infection prevalence and oocyst intensity in mosquitoes with real data taken from an examination of the transmission-blocking properties of different drugs. As the number of biological replicates was 3, replicate was not included as a random effect but as a fixed effect in the GLM. To account for additional experimental variables and determine their effect on the prevalence of mosquito infection and oocyst intensity, the infection prevalence was analysed by fitting it to a binomial distribution (probability of infection) using logistic regression, while oocyst intensity raw data were fitted by ZINB and the results were compared with those obtained by the commonly seen analysis methods.
For prevalence of infection, the simple Student’s t-test did not identify a significant impact of drug C on transmission, which was however detected using the GLM/logistf approach (Table 4). For impact on oocyst numbers (Table 6), the simple Kruskal–Wallis analysis identified a significant difference in oocyst intensity following treatment with drugs A and B, but this effect was not seen in all three replicates, leading to the conclusion that drug A did not affect oocyst numbers. The GLM approach allowed the inclusion of replicate, and revealed a significant decrease in oocyst numbers between the untreated control and all the drugs tested (A, B, C and D). The use of a GLM approach thus improved the estimate of the impact of the intervention, and, unlike the simple Kruskal–Wallis analysis, identified all four drugs tested as effective in lowering oocyst intensity.
Statistical interactions between different explanatory variables and multiple comparisons, which were not performed for this example dataset for simplicity, could also provide more insight. We focused on the main effects to give an example of the advantages of GLM over common, simple tests.
Recommendations
A well-chosen GLM (or where appropriate, GLMM) approach can handle the problematic issues identified with non-normal distributions and overdispersion; additional variables such as gametocyte density and biological replicates can be included, and their impact on mosquito infections can be assessed.
Infection prevalence can be analysed using a logistic regression GLM, incorporating additional measurable variables (such as gametocyte density or mosquito size) where they have a statistically significant effect. However, mosquito infection prevalence can be close to zero, or zero in some groups, especially when examining effective transmission-blocking interventions or infections from MFA. This results in quasi-complete separation of the data if standard logistic regression (basic GLM in R) is used to fit the data, and the software may report an error or warning, or the standard errors for the model will be extremely large (as seen in Figure 1A for drug E). Quasi-complete separation occurs when a predictor predicts the response variable perfectly for most, but not all, values of the predictor, for example when treatment with a drug gives zero or close to zero infection (Albert and Anderson, 1984; Advanced Research Computing, 2021). This problem can be addressed by analysing the data using Firth’s bias reduction method (Firth, 1993) for the GLM, which can be accomplished with the R package ‘logistf’ (Heinze et al., 2023). Logistf fits a logistic regression model with Firth’s bias reduction method, a penalized maximum likelihood method which reduces the bias and improves convergence in the presence of data with an excess or high number of zeros (Heinze et al., 2023). This provides an ideal solution to the problem of separation in logistic regression (Heinze and Schemper, 2002).
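The separation problem and the effect of a bias-reducing penalty can be seen in the simplest possible case: one treated group compared with one control, with no other covariates. The sketch below is in Python, with hypothetical numbers and a hypothetical function name. It relies on the understanding that, for this special case of a single binary predictor, Firth’s penalized estimate coincides with adding 0.5 to each cell of the 2 × 2 table. It is not a substitute for logistf, which is needed once covariates such as gametocyte density or replicate are included.

```python
import math

def log_odds_ratio(infected_ctrl, n_ctrl, infected_trt, n_trt, add=0.0):
    """Log odds ratio (treated vs control) from a 2 x 2 table.

    `add` is a constant added to every cell; add=0.5 is the simple
    correction assumed here to match Firth's penalty for one binary
    predictor. Returns None when a cell is zero, i.e. when the ordinary
    maximum likelihood estimate is not finite (separation).
    """
    a = infected_trt + add            # treated, infected
    b = n_trt - infected_trt + add    # treated, uninfected
    c = infected_ctrl + add           # control, infected
    d = n_ctrl - infected_ctrl + add  # control, uninfected
    if min(a, b, c, d) == 0:
        return None
    return math.log((a * d) / (b * c))

# Hypothetical group: transmission completely blocked by the drug
ml_estimate = log_odds_ratio(20, 30, 0, 30)
penalized_estimate = log_odds_ratio(20, 30, 0, 30, add=0.5)

print("maximum likelihood:", ml_estimate)
print(f"with 0.5 added to each cell: {penalized_estimate:.2f}")
```

The unpenalized estimate does not exist because no treated mosquito was infected, whereas the penalized estimate is finite and strongly negative, so the group can still be compared with the control.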
For analysis of infection intensity, GLMs can fit a number of different distributions that more closely match the data. The user defines the distribution to be used, and the fit of the data to the chosen distribution can be assessed and the best distribution that reflects the data selected.
Previous studies have suggested that a NB regression model is usually better at explaining the distribution of Plasmodium oocysts in the mosquito than a normal or Poisson distribution (Medley et al., 1993; Billingsley et al., 1994). A NB model is a statistical model of success/failure outcomes that fits over-dispersed count data when the variance exceeds the mean (Cummings and Hardin, 2019). A ZINB, which accounts for excess zeros in a dataset in two ways, i.e. ‘true zeros’ and ‘excess zeros’ (Cummings and Hardin, 2019), can often give an even better estimation than the NB model, as has been shown for both P. berghei and P. falciparum experimental infections (Churcher et al., 2012; Swihart et al., 2018).
Therefore, a better alternative to using mean or median estimates of oocyst intensity is the use of GLM or GLMM to fit the oocyst numbers (data) to alternative distributions to which they are better matched, e.g. NB or ZINB, which allows better estimates of infection intensity, as well as comparison between groups. A GLM approach also allows external variables (such as gametocyte density) to be included in the oocyst number estimates. We have applied this approach recently to model P. falciparum prevalence and oocyst intensity in mosquitoes under different gametocyte growth conditions (Pradhan et al., 2024). Other authors have used a similar approach: a ZINB model that includes random effects to analyse mosquito oocyst intensity (Churcher et al., 2012; Miura et al., 2013; McGuire et al., 2017; Swihart et al., 2018; Chan et al., 2019; Lee et al., 2019; Balam et al., 2025; Naghizadeh et al., 2025).
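The way a ZINB combines two sources of zeros can be made explicit by writing out its probability mass function. This is a minimal sketch in Python; the parameter names (mu for the NB mean, k for the dispersion, pi_zero for the probability of an excess zero) and their values are our own choices for illustration and are not estimates from this article.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial probability of count y, with mean mu and
    dispersion k (variance = mu + mu**2 / k)."""
    log_p = (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu))
    )
    return math.exp(log_p)

def zinb_pmf(y, mu, k, pi_zero):
    """Zero-inflated NB: with probability pi_zero the count is an excess
    zero; otherwise it is drawn from the NB, which can itself give zero."""
    from_nb = (1.0 - pi_zero) * nb_pmf(y, mu, k)
    return pi_zero + from_nb if y == 0 else from_nb

# Hypothetical parameter values (illustration only)
mu, k, pi_zero = 5.0, 0.5, 0.3

p_zero_nb = nb_pmf(0, mu, k)
p_zero_zinb = zinb_pmf(0, mu, k, pi_zero)
total = sum(zinb_pmf(y, mu, k, pi_zero) for y in range(3000))
zinb_mean = sum(y * zinb_pmf(y, mu, k, pi_zero) for y in range(3000))

print(f"P(0 oocysts): NB {p_zero_nb:.3f}, ZINB {p_zero_zinb:.3f}")
print(f"probabilities sum to {total:.6f}; ZINB mean {zinb_mean:.3f}")
```

The zero-inflated model assigns more probability to zero than the NB alone, and its mean is the NB mean scaled by the proportion of mosquitoes that are not excess zeros.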
Conclusions
The advantages of using GLM analyses, with appropriate distributions, to analyse the typically non-normal data obtained from human malaria parasite infections in Anopheles mosquitoes are presented here. In the examples given of analysis of both prevalence and intensity of infection, the GLM method identified more significant transmission blocking effects among the drugs tested than simpler analyses that are frequently used in publications.
By providing evidence of the added benefits of these methods, as well as the R code used, we hope to persuade researchers to apply these analytical techniques. We recommend that researchers use logistic regression GLM (or GLMM if appropriate) as an alternative to the use of mean/median estimates for the estimation and interpretation of mosquito infection prevalence with P. falciparum, and other parasitic infections with zero or close to zero infection prevalence. For infection intensity, the fitting of GLM (or GLMM) with different distributions, and then selecting the best model for the data, is a little more complex, but gives a more robust and informative analysis, avoiding the pitfalls and disadvantages of parametric (ANOVA) and simple non-parametric (Kruskal–Wallis) methods.
While the focus was on the use of these statistical approaches for the evaluation of mosquito infection data, such non-normal distributions are common in many parasitic diseases, where a minority of individuals harbour a majority of parasites, and the majority harbour low or zero numbers. These analytical methods can equally be utilized by researchers working on other parasitic control interventions, to make best use of their data.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0031182025100541.
Acknowledgements
We wish to thank Prof. Dan Haydon and Dr. Paul Johnson for suggesting the logistf package to address the challenges of modelling mosquito infection data with zero prevalence, and TETFUND, Nigeria, for sponsoring the PhD research of PCU.
Author contributions
PCU and LRC conceived and designed the review, performed the statistical analyses and wrote the article.
Financial support
This work was supported by a PhD scholarship awarded to PCU by TETFUND, Nigeria.
Competing interests
The authors declare there are no conflicts of interest.
Ethical standards
Not applicable.