
Systematic reviews and meta-analysis in nutrition research

Published online by Cambridge University Press:  03 September 2019

George A. Kelley*
Affiliation:
Meta-Analytic Research Group, School of Public Health, Department of Biostatistics, Robert C. Byrd Health Sciences Center, West Virginia University, P.O. Box 9190, Morgantown, WV 26506-9190, USA
Kristi S. Kelley
Affiliation:
Meta-Analytic Research Group, School of Public Health, Department of Biostatistics, Robert C. Byrd Health Sciences Center, West Virginia University, P.O. Box 9190, Morgantown, WV 26506-9190, USA
*Corresponding author: George A. Kelley, email gkelley@hsc.wvu.edu

Abstract

There exists an ever-increasing number of systematic reviews, with or without meta-analysis, in the field of nutrition. Concomitant with this increase is their growing use to guide future research as well as practice- and policy-based decisions. Given this increased production and consumption, there is a need to educate both producers and consumers of systematic reviews, with or without meta-analysis, on how to conduct and evaluate high-quality reviews of this nature in nutrition. The purpose of this paper is to address this gap. In the present manuscript, the different types of systematic reviews, with or without meta-analysis, are described, as are their major elements, including methodology and interpretation, with a focus on nutrition. It is hoped that this non-technical information will be helpful to producers, reviewers and consumers of systematic reviews, with or without meta-analysis, in the field of nutrition.

Type
Full Papers
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s) 2019

Systematic reviews with meta-analyses have the potential to play an important role in quantitatively synthesising evidence when numerous studies on a similar topic exist, especially when disagreement persists among those studies. The potential strengths of meta-analysis include (1) increased statistical power for primary outcomes, (2) the ability to reach agreement when original studies yield conflicting findings, (3) improved effect size estimates and (4) the ability to answer questions not addressed in the original trials(Reference Sacks, Berrier and Reitman1). In addition, meta-analyses provide the opportunity to generate hypotheses that can be tested in subsequent original trials. Furthermore, systematic reviews, with or without meta-analysis, often play a major role in guideline development(Reference Zhang, Akl and Schunemann2). In a recent special issue of The American Statistician devoted entirely to P values, Wasserstein et al. suggested that, since a single study is usually not definitive, meta-analysis is critical to determining the uncertainty in the evidence(Reference Wasserstein, Schirm and Lazar3). Given this potential value, the number of systematic reviews, with or without meta-analysis, has increased dramatically over approximately the last 40 years. For example, a simple PubMed search conducted by the authors on 10 May 2019, using the search phrase “systematic review” OR meta-analy*, yielded four citations in 1978 v. 31 295 in 2018, the most recent complete year for which data were available. The number of systematic reviews with meta-analyses in the area of nutrition has also increased dramatically over the same period. A simple PubMed search conducted by the authors on 10 May 2019, using the search phrase (“systematic review” OR meta-analy*) AND (food OR beverages OR diet OR nutrition), yielded one citation in 1978 v. 2743 in 2018.
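The PubMed counts above were obtained through the PubMed web interface. A minimal sketch of how such a year-by-year count could be re-run programmatically is shown below, assuming the NCBI E-utilities ESearch endpoint and its `rettype=count`, `datetype=pdat`, `mindate` and `maxdate` parameters; the function names are hypothetical. Counts returned today will differ from those reported here, because PubMed continues to index older records and its query processing changes over time.

```python
"""Re-run a PubMed count for one publication year (assumes NCBI E-utilities ESearch)."""
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The nutrition search phrase reported in the text, with straight quotes
NUTRITION_QUERY = ('("systematic review" OR meta-analy*) '
                   'AND (food OR beverages OR diet OR nutrition)')


def build_count_url(term: str, year: int) -> str:
    """Return an ESearch URL that asks only for the number of hits in one year."""
    params = {
        "db": "pubmed",
        "term": term,
        "rettype": "count",   # return the hit count rather than a list of PMIDs
        "retmode": "json",
        "datetype": "pdat",   # restrict on publication date
        "mindate": str(year),
        "maxdate": str(year),
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_count(term: str, year: int) -> int:
    """Query PubMed (requires network access) and return the count as an integer."""
    with urlopen(build_count_url(term, year), timeout=30) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])
```

For example, `fetch_count(NUTRITION_QUERY, 2018)` would return the current count for 2018. NCBI asks users without an API key to send no more than about three requests per second, so a loop over many years should pause between calls.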

Types of systematic reviews

Table 1 lists the different types of systematic reviews, each of which is described below.

Table 1. Types of systematic reviews

AD, aggregate data; IPD, individual participant/patient data.

Scoping reviews

While no single universal definition exists, a scoping review may be best defined as a type of research synthesis that aims to ‘map the literature on a particular topic or research area and provide an opportunity to identify key concepts; gaps in the research; and types and sources of evidence to inform practice, policymaking, and research’(Reference Daudt, van Mossel and Scott4). Thus, scoping reviews can be beneficial from both a research and a practice perspective. To illustrate their use in the field of nutrition, Amouzandeh et al. recently conducted a scoping review of the validity, reliability and conceptual alignment of food literacy measures for adults(Reference Amouzandeh, Fingland and Vidgen5). The authors concluded that most tools provided a theoretical framework and were valid and reliable(Reference Amouzandeh, Fingland and Vidgen5). In addition, they believed that their results would assist practitioners in selecting and developing tools for the measurement of food literacy(Reference Amouzandeh, Fingland and Vidgen5). As with other types of reviews, the number of scoping reviews in the field of nutrition is increasing. For example, a PubMed search conducted on 11 May 2019, using the search phrase (“scoping review” OR “systematic scoping review” OR “scoping report” OR “scope of the evidence” OR “rapid scoping review” OR “structured literature review” OR “scoping project” OR “scoping meta review”) AND (food OR beverages OR diet OR nutrition), showed that the number of citations had increased from one in 1981 to 161 in 2018, the most recent complete year for which data were available. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) provides an excellent guide, including a checklist, for conducting and reporting a scoping review(Reference Tricco, Lillie and Zarin7).
Checklists such as the PRISMA series provide very helpful information to producers, reviewers and consumers (clinicians, guideline developers, etc.) for ensuring that high-quality reviews are conducted. Therefore, the authors advocate that journals require the appropriate checklist when authors submit their manuscript for publication consideration.

Systematic reviews of previous systematic reviews

Given the proliferation of systematic reviews, with or without meta-analysis, on the same topic, there is now a need to assess these previous reviews. As an example of a systematic review of previous systematic reviews (SRPSR) in nutrition, Agostoni et al. recently conducted a SRPSR on the long-term effects of dietary nutrient intake during the first 2 years of life in healthy infants from developed countries(Reference Agostoni, Guz-Mark and Marderfeld8). The overall conclusion of the authors was that a large degree of uncertainty currently exists on the health effects of differences in early nutrition among healthy full-term infants(Reference Agostoni, Guz-Mark and Marderfeld8).

There are at least two important reasons for conducting a SRPSR. First, for those wishing to conduct their own systematic review, with or without meta-analysis, such a review can help justify the conduct of a new or updated review. If a new or updated review is deemed warranted, then this information should be included in the introduction section of the new or updated review. Ideally, this should include reference to a previously published SRPSR. If, after searching the literature, the authors believe that no previous reviews exist, then this should be stated. The inclusion of this information may be especially important given the recent criticism regarding the publication of redundant reviews on the same topic(Reference Ioannidis9). Fig. 1 depicts a stepwise process suggested by the authors for moving from a SRPSR to one’s own review, details of which can be found elsewhere(Reference Kelley and Kelley10). Briefly, a major decision that needs to be made is whether a new systematic review, with or without meta-analysis, is needed. The Cochrane Collaboration recommends that the decision to conduct another systematic review be based on needs and priorities, with consideration of strategic importance, practical aspects of organising the review and the impact of another review(11). The Agency for Healthcare Research and Quality in the United States approaches this from a needs-based perspective in which the focus is on stakeholder impact as well as currency and necessity(Reference Shojania, Sampson and Ansari12). A determination is then made to create, archive or continue surveillance(Reference Shojania, Sampson and Ansari12). The Panel for Updating Guidance for Systematic Reviews (PUGS) developed consensus guidance and a checklist for when and how to perform another systematic review(Reference Garner, Hopewell and Chandler13).
This process includes assessing whether the question and any previous review(s) remain current, identifying relevant new methods, studies or other information that may justify another review, and assessing the potential impact of another review(Reference Garner, Hopewell and Chandler13). The PUGS guidelines and checklist may be the most suitable method for researchers interested in conducting another systematic review, with or without meta-analysis. Any new review should also address an important research question, something that should be explained in the introduction section of the manuscript.

Fig. 1. Suggested stepwise approach for deciding whether a new or updated systematic review, with or without meta-analysis, should be conducted. Adapted from Kelley & Kelley(Reference Kelley and Kelley10). SRPSR, systematic reviews of previous systematic reviews.

A second reason for conducting a SRPSR is that, given the large number of reviews of this type on many of the same topics, these reviews need to be evaluated so that decision makers (clinicians, guideline developers, policymakers, etc.) have the information they need to make informed choices on the topic of interest. A simple PubMed search conducted by the authors on 10 May 2019, using the search criteria ‘(“systematic review of previous systematic reviews” OR “umbrella review” OR “overview of reviews” OR “review of reviews” OR “summary of systematic reviews” OR “meta-reviews”) AND (food OR beverages OR diet OR nutrition)’, yielded 173 citations associated with nutrition-related SRPSR in 2018, the most recent complete year for which data were available. As part of the conduct of a SRPSR, the quality and/or risk of bias of each included systematic review, with or without meta-analysis, should be evaluated. Instruments for this purpose include, but are not limited to, (1) A MeaSurement Tool to Assess systematic Reviews 2 (AMSTAR 2)(Reference Shea, Reeves and Wells14), (2) Risk of Bias in Systematic Reviews (ROBIS)(Reference Whiting, Savovic and Higgins15), (3) Grading of Recommendations, Assessment, Development and Evaluations (GRADE)(Reference Guyatt, Oxman and Akl16) and (4) Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2)(Reference Whiting, Rutjes and Westwood17). The importance of SRPSR is supported by a recent thematic series devoted to this topic(Reference McKenzie and Brennan18Reference Lunny, Brennan and McDonald20). In addition, Ballard & Montgomery provide methodological guidance, including a four-item checklist, for evaluating a SRPSR(Reference Ballard and Montgomery21). Finally, for the reasons previously given, as well as to improve efficiency and avoid research waste(Reference McKenzie and Brennan18), the authors believe that funding agencies should support high-quality SRPSR.
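To illustrate how an overall judgement can be derived from item-level appraisals, the sketch below encodes the AMSTAR 2 rating scheme proposed by its developers, in which confidence in a review is rated high, moderate, low or critically low according to the number of critical flaws and non-critical weaknesses. The set of critical items (2, 4, 7, 9, 11, 13 and 15) follows the developers' suggestion, but review teams may adapt it to their topic. The function name is hypothetical, and whether a 'partial yes' answer counts as a failed item is left to the reviewers' judgement.

```python
# AMSTAR 2 items suggested as critical domains: protocol registered in advance (2),
# adequate literature search (4), justification of excluded studies (7), risk of
# bias of included studies (9), appropriate meta-analytic methods (11), risk of
# bias considered when interpreting results (13) and publication bias assessed (15).
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})
ALL_ITEMS = frozenset(range(1, 17))  # AMSTAR 2 has 16 items


def amstar2_confidence(failed_items, critical_items=CRITICAL_ITEMS):
    """Return the overall confidence rating for one systematic review.

    failed_items: numbers of the AMSTAR 2 items that the review did not satisfy.
    """
    failed = set(failed_items)
    if not failed <= ALL_ITEMS:
        raise ValueError(f"unknown AMSTAR 2 items: {sorted(failed - ALL_ITEMS)}")
    critical_flaws = len(failed & critical_items)
    non_critical_weaknesses = len(failed - critical_items)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"  # one critical flaw, with or without non-critical weaknesses
    if non_critical_weaknesses > 1:
        return "moderate"
    return "high"  # no or one non-critical weakness
```

Keeping the critical items as a parameter makes any topic-specific adaptation explicit and reportable in the methods of the SRPSR.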
Detailed information regarding SRPSR can be found elsewhere(Reference McKenzie and Brennan18Reference Pollock, Fernandes and Becker28).

Systematic review without meta-analysis

The Cochrane Collaboration defines a systematic review as a ‘review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyse data from the studies that are included in the review’(Reference Higgins and Green6). The key characteristics of a systematic review include (1) clearly stated objectives with predefined eligibility criteria for studies, (2) an explicit, reproducible methodology, (3) a systematic search that attempts to identify all studies that meet the eligibility criteria, (4) an assessment of the validity of the findings of the included studies (risk of bias, etc.) and (5) a systematic presentation and synthesis of the characteristics and findings of the included studies(Reference Higgins and Green6). A systematic review without a meta-analysis is often conducted because the authors judge that the studies cannot be combined quantitatively, either because they are too different or because their results cannot be expressed in a common metric. Deciding whether studies are combinable is rarely straightforward, since no two studies are exactly alike, nor should they be. For example, some reviewers may decide a priori that the studies will be too different to combine quantitatively (apples and oranges), while others may decide that the eligible studies can be combined (fruit salad). If a meta-analysis is not included, the reason for not doing so should be stated in the research synthesis subsection of the Methods section of the manuscript, and the results should be synthesised qualitatively. As an example, Calder et al. conducted a systematic review without meta-analysis on the effects of increasing arachidonic acid intake on PUFA status, metabolism and health-related outcomes in humans(Reference Calder, Campoy and Eilander29).
Based on twenty-two articles from fourteen randomised controlled trials, the authors concluded that insufficient evidence currently exists to support any recommendation regarding the specific health effects of arachidonic acid intake(Reference Calder, Campoy and Eilander29). The original PRISMA statement provides guidance, including a checklist, for conducting and reporting a systematic review, with or without meta-analysis(Reference Liberati, Altman and Tetzlaff30).

Systematic review with meta-analysis

A systematic review with meta-analysis is similar to a systematic review without a meta-analysis with the exception that the former includes a quantitative synthesis, that is, meta-analysis of the data. Generally, systematic reviews with a meta-analysis consist of the following types: (1) aggregate data (AD) meta-analysis, (2) individual participant/patient data (IPD) meta-analysis, (3) network meta-analysis (NMA), which can be based on either AD or IPD and (4) non-inferiority (NI) meta-analysis (AD or IPD).

Aggregate data meta-analysis. An AD meta-analysis is a quantitative approach in which summary data (for example, sample sizes, means and standard deviations) for outcomes of interest (kJ consumed, cholesterol intake, etc.) are abstracted from previously published studies and then pooled for analysis. This is by far the most common type of meta-analysis conducted today, and it often focuses on pairwise comparisons, for example, changes in an intervention v. a control group. A simple PubMed search conducted by the authors on 13 May 2019, using the search string (“systematic review” OR meta-analy*) AND (food OR beverages OR diet OR nutrition) NOT (“individual participant data” OR “individual patient data” OR “IPD” OR “systematic review of previous systematic reviews” OR “umbrella review” OR “overview of reviews” OR “review of reviews” OR “summary of systematic reviews” OR “meta-reviews”), yielded one citation in 1978 v. 2557 in 2018, the most recent complete year for which data were available. As an example of an AD meta-analysis in nutrition, Zhang et al. conducted a systematic review with meta-analysis on the efficacy and safety of iron supplementation in patients with heart failure and iron deficiency(Reference Zhang, Zhang and Du31). Based on nine randomised controlled trials representing 789 patients who received iron therapy, significant improvements were observed in the 6-min walk test and peak oxygen consumption, and fewer patients were hospitalised for heart failure(Reference Zhang, Zhang and Du31). No associations were found for total re-hospitalisation or mortality(Reference Zhang, Zhang and Du31).
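A minimal sketch of the pooling step of an AD meta-analysis is given below, assuming a continuous outcome summarised as a raw mean difference. Each study's mean difference and its variance are computed from abstracted sample sizes, means and standard deviations, and the studies are then combined with inverse-variance weights under a fixed-effect model and a DerSimonian-Laird random-effects model, with Cochran's Q and I² describing heterogeneity. The random-effects estimator is one common choice, not necessarily the method used in any study cited here, and the class name, function names and trial data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmSummary:
    """Summary data abstracted for one study arm (hypothetical field names)."""
    n: int
    mean: float
    sd: float


def mean_difference(treatment: ArmSummary, control: ArmSummary) -> tuple[float, float]:
    """Raw mean difference (treatment minus control) and its sampling variance."""
    md = treatment.mean - control.mean
    var = treatment.sd ** 2 / treatment.n + control.sd ** 2 / control.n
    return md, var


def pool(effects, variances, z=1.96):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects pooling."""
    k = len(effects)
    if k == 0 or k != len(variances):
        raise ValueError("need one variance per effect size")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # DerSimonian-Laird moment estimator of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    random_effect = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "fixed": fixed, "Q": q, "tau2": tau2, "I2": i2,
        "random": random_effect,
        "ci95": (random_effect - z * se_re, random_effect + z * se_re),
    }


# Hypothetical summary data (intervention arm, control arm) for three trials
trials = [
    (ArmSummary(40, -2.1, 3.0), ArmSummary(38, -0.4, 3.2)),
    (ArmSummary(25, -1.5, 2.8), ArmSummary(27, -0.9, 2.5)),
    (ArmSummary(60, -2.6, 3.5), ArmSummary(58, -0.2, 3.4)),
]
effects, variances = zip(*(mean_difference(t, c) for t, c in trials))
print(pool(effects, variances))
```

In practice, dedicated meta-analysis software (for example, the R packages metafor or meta) would normally be used, since it also offers alternative between-study variance estimators and small-sample corrections.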

As previously mentioned, the original PRISMA statement provides guidance, including a checklist, for conducting and reporting a systematic review with AD meta-analysis(Reference Liberati, Altman and Tetzlaff30). In addition, recent guidance for conducting systematic reviews and meta-analyses of observational studies in aetiology is also available(Reference Dekkers, Vandenbroucke and Cevallos32) and the Cochrane Handbook provides extensive information on the conduct of systematic reviews with AD meta-analysis(Reference Higgins and Green6).

Individual participant/patient data meta-analysis. An IPD meta-analysis is a systematic review that includes a meta-analysis based on IPD and is often conducted by a consortium made up of a large number of investigators, such as the European Consortium that recently conducted an IPD meta-analysis on vitamin D and mortality(Reference Gaksch, Jorde and Grimnes33). Since de-identified IPD is usually not available from the published reports of the original studies, it needs to be requested from the author(s). Considered the ‘gold standard’ of meta-analyses, the potential advantages of an IPD meta-analysis, described in detail elsewhere(Reference Riley, Lambert and Abo-Zaid34), include, but are not limited to, ‘standardizing statistical analyses in each study; deriving desired summary results directly, independent of study reporting; checking modelling assumptions; and assessing participant-level effects, interactions and non-linear trends’(Reference Riley35). However, one of the major disadvantages of an IPD meta-analysis is the difficulty of retrieving original data from study authors, with retrieval rates of 25–100 % reported across different subject areas(Reference Kelley, Kelley and Tran36Reference Polanin39). Incomplete retrieval can, in turn, increase the risk of bias. While at least one approach has been recommended for integrating both IPD and AD(Reference Riley, Lambert and Staessen40), one is still left with only AD from those studies for which IPD cannot be retrieved. A second disadvantage of an IPD v. AD meta-analysis is the increased time and resources required. For example, one study estimated the costs of a previous IPD meta-analysis(Reference Steinberg, Smith and Stroup41) to be eight times greater than those of an AD meta-analysis(Reference Cooper and Patall42).
Finally, several studies have shown a lack of statistically and practically important differences between AD and IPD meta-analyses when an identical, or nearly identical, number of studies is included(Reference Steinberg, Smith and Stroup41, Reference Olkin and Sampson43Reference Tudur Smith, Marcucci and Nolan45). Despite these disadvantages, the number of IPD meta-analyses is increasing, including in the field of nutrition. A simple PubMed search conducted by the authors on 13 May 2019, using the search string (“systematic review” OR meta-analy*) AND (food OR beverages OR diet OR nutrition) AND (“individual participant data” OR “individual patient data” OR “IPD”) NOT (“systematic review of previous systematic reviews” OR “umbrella review” OR “overview of reviews” OR “review of reviews” OR “summary of systematic reviews” OR “meta-reviews”), yielded one citation in 2002 v. twenty-six in 2018, the most recent year for which complete data were available. As an example in the field of nutrition, Smelt et al. recently conducted an IPD meta-analysis of randomised controlled trials on the effects of vitamin B12 and folic acid supplementation on routine haematological parameters in adults 60 years of age and older(Reference Smelt, Gussekloo and Bermingham46). The authors concluded that there is currently a lack of evidence to support an effect of supplementation on haematological parameters in community-dwelling adults 60 years of age and older with low concentrations of vitamin B12 and folate(Reference Smelt, Gussekloo and Bermingham46). A set of PRISMA guidelines, including a checklist, for conducting and reporting an IPD meta-analysis (PRISMA-IPD) is available(Reference Stewart, Clarke and Rovers47). Additional details regarding the conduct of an IPD meta-analysis have been reported elsewhere(Reference Higgins and Green6, Reference Riley, Lambert and Abo-Zaid34, Reference Tierney, Vale and Riley48).
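One advantage of IPD, deriving summary results directly rather than relying on what each study reported, can be illustrated with a two-stage approach: each study's raw participant data are first analysed separately, and the resulting estimates are then pooled as in an AD meta-analysis. The sketch below assumes a continuous outcome, a simple mean difference per study and fixed-effect pooling; the function names and data are hypothetical, and one-stage models that analyse all participants simultaneously are a common alternative.

```python
import math
from statistics import mean, variance


def study_effect(treatment_values, control_values):
    """Stage 1: mean difference (treatment minus control) and its variance from raw data.

    Each arm needs at least two participants for the sample variance.
    """
    md = mean(treatment_values) - mean(control_values)
    var = (variance(treatment_values) / len(treatment_values)
           + variance(control_values) / len(control_values))
    return md, var


def two_stage_fixed_effect(studies):
    """Stage 2: inverse-variance fixed-effect pooling of the per-study estimates.

    studies: list of (treatment_values, control_values) pairs, one per study.
    """
    estimates = [study_effect(t, c) for t, c in studies]
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * md for w, (md, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


# Hypothetical participant-level outcomes from two small trials
studies = [
    ([2.0, 4.0, 6.0, 5.0], [1.0, 3.0, 5.0, 2.0]),
    ([3.5, 4.0, 2.5], [2.0, 3.0, 2.5]),
]
print(two_stage_fixed_effect(studies))
```

Because stage 1 works from the raw data, the same analysis (for example, the same covariate adjustment or outcome definition) can be applied in every study before pooling.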

Network meta-analysis. A more recent and increasingly used approach, including in the field of nutrition(Reference Schwingshackl, Buyken and Chaimani49), is the conduct of a systematic review with NMA, usually based on AD rather than IPD. NMA, also known as ‘multiple treatments meta-analysis’ or ‘mixed treatment comparisons meta-analysis’, is a type of meta-analysis that compares at least three treatments and includes both direct (comparing two treatments head to head) and indirect (comparing two treatments via a common comparator) evidence. One of the major reasons for its increased use is the ability to include multiple treatments in the same analysis, thereby facilitating treatment recommendations. For example, Galaviz et al. recently conducted an NMA on the real-world impact of global diabetes prevention interventions on diabetes incidence, body weight and glucose(Reference Galaviz, Weber and Straus50). The overall conclusion of the authors’ NMA of sixty-three studies was that real-world lifestyle modification strategies can reduce diabetes risk(Reference Galaviz, Weber and Straus50). A simple PubMed search conducted by the authors on 14 May 2019, using the search string (“network meta-analysis” OR “multiple treatments meta-analysis” OR “mixed treatment comparisons meta-analysis”) AND (food OR beverages OR diet OR nutrition) NOT (“systematic review of previous systematic reviews” OR “umbrella review” OR “overview of reviews” OR “review of reviews” OR “summary of systematic reviews” OR “meta-reviews”), yielded one citation in 2007 v. thirty-three in 2018, the most recent year for which complete data were available. Not surprisingly, NMA is more time and resource intensive than a traditional AD meta-analysis, given the large number of treatments usually included and the inclusion of both direct and indirect evidence.
PRISMA guidelines, including a checklist, for conducting and reporting an NMA (PRISMA-NMA) are available(Reference Hutton, Salanti and Caldwell51). Additional details regarding this emerging and important approach have been described elsewhere(Reference Laws, Kendall and Hawkins52Reference Doi and Barendregt55).
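The indirect evidence used in an NMA can be illustrated with the adjusted indirect comparison described by Bucher and colleagues: if treatments A and B have each been compared with a common comparator C, the indirect estimate of A v. B is the difference between the two pooled effects, and its variance is the sum of their variances. The sketch below is a minimal illustration assuming the two sets of trials are independent and the effects are on an additive scale; a full NMA fits all comparisons jointly and checks the consistency of direct and indirect evidence. The function name and values are hypothetical.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc, z=1.96):
    """Bucher adjusted indirect comparison of A v. B via common comparator C.

    d_ac, d_bc   : pooled effects of A v. C and B v. C on an additive scale
                   (e.g. mean differences or log odds ratios)
    se_ac, se_bc : their standard errors
    Returns the indirect A v. B effect, its standard error and a 95 % CI.
    """
    d_ab = d_ac - d_bc
    # Variances add because the A v. C and B v. C trials are assumed independent
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return d_ab, se_ab, (d_ab - z * se_ab, d_ab + z * se_ab)


# Hypothetical: diet A v. usual care and diet B v. usual care, weight change in kg
print(indirect_comparison(-3.0, 0.3, -1.0, 0.4))
```

With these hypothetical inputs, the indirect A v. B difference is -2.0 kg with a standard error of 0.5, larger than either input standard error; this loss of precision is one reason an NMA combines indirect with direct evidence wherever head-to-head trials exist.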

Non-inferiority meta-analysis. The most recent, and still infrequent, type of meta-analysis to emerge is the NI meta-analysis. A NI meta-analysis attempts to assess whether a new intervention is no worse than a reference intervention by more than a prespecified amount, the NI margin(Reference Brittain, Fay and Follmann56). A major challenge of a NI meta-analysis is choosing an appropriate NI margin(Reference Brittain, Fay and Follmann56). These types of meta-analyses can be based on either AD or IPD and can also take the form of an NMA (AD or IPD)(Reference Schmidli, Wandel and Neuenschwander57). While the authors are not aware of any NI meta-analyses in the field of nutrition, Acuna et al. recently conducted a NI meta-analysis that examined the quality of surgical outcomes using laparoscopic v. open resection for rectal cancer(Reference Acuna, Chesney and Ramjist58). Based on their analysis of fourteen randomised controlled trials, the authors concluded that laparoscopy was non-inferior to open surgery for rectal cancer(Reference Acuna, Chesney and Ramjist58, Reference Acuna, Chesney and Amarasekera59). More detailed information regarding NI meta-analyses can be found elsewhere(Reference Brittain, Fay and Follmann56, Reference Schmidli, Wandel and Neuenschwander57, Reference Liberati and D’Amico60).
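The role of the NI margin can be seen in the usual decision rule, sketched below under stated assumptions: the pooled difference (new minus reference) and its standard error are taken as already estimated, and non-inferiority is concluded only if the two-sided 95 % CI lies entirely on the acceptable side of the margin. The function name and margin values are hypothetical; in a real NI meta-analysis the margin must be justified in advance on clinical and statistical grounds.

```python
def non_inferior(diff, se, margin, higher_is_better=True, z=1.96):
    """Decide non-inferiority of a new intervention from a pooled difference.

    diff   : pooled effect, new minus reference
    se     : standard error of diff
    margin : positive NI margin on the same scale as diff
    Returns True when the two-sided 95 % CI lies wholly on the acceptable
    side of the margin (equivalent to a one-sided test at the 2.5 % level).
    """
    if margin <= 0:
        raise ValueError("the NI margin must be positive")
    lower, upper = diff - z * se, diff + z * se
    if higher_is_better:
        # The new intervention may be worse (lower) by less than the margin
        return lower > -margin
    # For outcomes where higher is worse (e.g. complication rates)
    return upper < margin


# Hypothetical: a beneficial outcome with an NI margin of 0.5 units
print(non_inferior(0.0, 0.1, 0.5), non_inferior(-0.4, 0.1, 0.5))
```

The second call shows how a pooled estimate that looks close to zero can still fail to establish non-inferiority when its CI crosses the margin, which is why the choice of margin dominates the conclusion.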

Primary components of systematic reviews with meta-analysis

Given that traditional AD meta-analyses still dominate the literature, the rest of this manuscript will centre on this type of quantitative review, while noting that much of this information can be applied to many of the other types of systematic reviews with meta-analysis described previously. For more detailed information, readers are referred to the PRISMA guidelines, including a twenty-seven-item checklist, for the conduct and reporting of systematic reviews with AD meta-analysis(Reference Liberati, Altman and Tetzlaff30).

Overview

Similar to most research studies, a manuscript reporting a systematic review with meta-analysis should broadly consist of abstract, introduction, methods, results, discussion and conclusion(s) sections.

Abstract

The structure of the abstract of a systematic review with meta-analysis generally mirrors that of an original study. The PRISMA guidelines provide specific information, including a twelve-item checklist, regarding what to report in the abstract of a systematic review, with or without meta-analysis(Reference Beller, Glasziou and Altman61). However, adhering to all items in the checklist may be difficult given the word limits that journals and conferences impose on abstracts. Thus, one may have to prioritise the most important information to be included, especially since many readers may not read beyond the abstract. For example, Saint et al. reported that almost two-thirds (63 %) of internists read only the abstracts of medical journal articles(Reference Saint, Christakis and Saha62). Given this, a clear and concise abstract would seem to be important.

Introduction

In the introduction section of the manuscript, the authors should provide a strong rationale for why the present study is needed. This should include the importance of the issue to be addressed as well as a review of prior research on the topic. In the authors’ experience, producers of systematic reviews with meta-analysis usually describe the importance of the topic adequately but often provide insufficient information regarding previous original studies on the topic, as well as previous systematic reviews with meta-analysis, if any, to justify their own systematic review with meta-analysis. The former is important because conflicting findings among previous original studies are often the very reason for conducting reviews of this nature. The latter is equally important because of the increasing concern about redundant systematic reviews, with or without meta-analysis, that is, reviews that add little value(Reference Ioannidis9). If the authors are not aware of any previous systematic reviews with meta-analysis on the topic, then this should be stated. For example, in a systematic review with AD meta-analysis of randomised controlled trials examining the impact of modified dietary interventions on maternal glucose control and neonatal birth weight, Yamamoto et al. cited three previous systematic reviews and meta-analyses related to the topic, none of which specifically addressed their question regarding the impact of modified dietary interventions on detailed maternal glycaemic parameters, including changes in glucose-related variables(Reference Yamamoto, Kellett and Balsells63). As previously mentioned, one approach to help justify one’s own work, though more time-consuming and resource intensive, is to conduct and publish a systematic review of previous systematic reviews with meta-analysis on the topic and describe this in the introduction section of the manuscript(Reference Kelley and Kelley10).
Finally, the end of the introduction should clearly delineate the purpose/objective(s)/research question(s) of the intended systematic review with AD meta-analysis.

Methods and results

Any systematic review, with or without meta-analysis, should follow an a priori research plan and, at a minimum, register the protocol in a systematic review registry such as PROSPERO(Reference Page, Shamseer and Tricco64). The registration number should be reported at the beginning of the methods section of the paper. Registering a systematic review with meta-analysis is important for (1) promoting transparency, (2) helping to reduce potential bias and (3) helping to avoid unintended duplication of effort(Reference Stewart, Moher and Shekelle65). Registration is beneficial for researchers, commissioning and funding organisations, journal editors and peer reviewers(Reference Stewart, Moher and Shekelle65). Given these benefits, the authors advocate that journals require all manuscript submissions to include a registration number before being considered for peer review. In addition to registering the protocol in PROSPERO, authors should consider publishing their protocol in a peer-reviewed journal, thereby extending its reach and possibly improving their study design. As an example, Asghari et al. recently published a protocol for a systematic review with AD meta-analysis in which they plan to examine the effects of vitamin D supplementation on serum 25-hydroxyvitamin D concentration in children and adolescents(Reference Asghari, Farhadnejad and Hosseinpanah66). The PRISMA group provides detailed guidelines, including a seventeen-item checklist, for developing and reporting the protocol for a systematic review, with or without meta-analysis (PRISMA-P)(Reference Shamseer, Moher and Clarke67). To advance research in the field, the authors also advocate that peer-reviewed journals consider publishing high-quality protocols, including requiring a completed PRISMA-P checklist upon submission.

Congruent with the PRISMA guidelines(Reference Liberati, Altman and Tetzlaff30), the methods section of a systematic review with AD meta-analysis should usually be partitioned into the following subsections: (1) study eligibility, (2) data sources, (3) study selection, (4) data abstraction, (5) risk of bias assessment and (6) data synthesis.

Study eligibility. This section should describe the criteria that studies must meet to be included in a systematic review with AD meta-analysis. To aid in determining eligible studies as well as in searching the literature, one may consider using the PICO or PICOS framework(Reference Liberati, Altman and Tetzlaff30). Where applicable, the PICO/PICOS structure includes participants/population (P), interventions (I), comparisons (C), outcomes (O) and study design/setting (S)(Reference Liberati, Altman and Tetzlaff30). For example, in a recent systematic review with AD meta-analysis on dietary patterns, bone mineral density and fracture risk, the PICOS framework included an open population (P), dietary patterns as the intervention (I), other dietary patterns as the comparison (C), bone mineral density, bone mineral content or fracture as the outcomes (O) and observational study designs (S)(Reference Denova-Gutierrez, Mendez-Sanchez and Munoz-Aguirre68). For observational studies dealing with aetiology, the population, exposure, control and outcomes framework has recently been suggested(Reference Dekkers, Vandenbroucke and Cevallos32). In addition, the study designs included should be reported. For example, in a meta-analysis that examined the effects of Ca intake on breast cancer risk, the population consisted of females, the exposure was Ca intake (dietary and/or supplemental), the control/comparator was no dietary or supplemental Ca intake, the outcome was breast cancer risk and the study designs included were prospective cohort, case–control or case–cohort studies(Reference Hidayat, Chen and Zhang69).
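Writing eligibility criteria down explicitly makes screening decisions reproducible. As a minimal sketch, the criteria from the Ca intake and breast cancer example above are encoded below as simple rules that return either inclusion or the first reason for exclusion. The record fields, category labels and function name are hypothetical simplifications, and the comparator criterion is omitted for brevity; in practice, screening is carried out by at least two independent reviewers, and such rules serve only to document the criteria they apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical, simplified description of one candidate study."""
    citation: str
    population: str
    exposure: str
    outcome: str
    design: str


# Criteria paraphrasing the Ca intake and breast cancer example;
# the comparator criterion is omitted here for brevity.
ELIGIBLE_EXPOSURES = {"dietary Ca", "supplemental Ca", "dietary and supplemental Ca"}
ELIGIBLE_DESIGNS = {"prospective cohort", "case-control", "case-cohort"}


def screen(record: StudyRecord) -> tuple[bool, str]:
    """Return (eligible, reason), naming the first criterion the study fails."""
    checks = [
        (record.population == "female", "population not female"),
        (record.exposure in ELIGIBLE_EXPOSURES, "exposure not Ca intake"),
        (record.outcome == "breast cancer", "outcome not breast cancer risk"),
        (record.design in ELIGIBLE_DESIGNS, "ineligible study design"),
    ]
    for passed, reason in checks:
        if not passed:
            return False, reason
    return True, "meets all criteria"
```

Returning a reason alongside each decision also yields, as a by-product, the reasons for exclusion that may be reported for excluded studies.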

In addition to describing potentially eligible studies, reasons for excluding studies may also be provided, although it is reasonable to assume that any study not meeting one's eligibility criteria would be excluded. However, this does not preclude including a supplementary file of excluded citations, with the reason(s) for exclusion given after each reference. A systematic review may include studies in any language, especially given the free online language translators that are currently available. However, there is no clear consensus on whether limiting a systematic review to English-language articles published in peer-reviewed journals increases bias(Reference Higgins and Green6). In addition, studies may be derived from both published and unpublished sources (master's theses, dissertations, abstracts from conference proceedings, clinical trials registries, etc.). However, van Driel et al. concluded that (1) the difficulty in retrieving unpublished work could lead to selection bias, (2) many unpublished trials are eventually published, (3) the methodological quality of such studies is poorer than that of published studies and (4) the effort and resources required to obtain unpublished work may not be warranted(Reference van Driel, De Sutter and De Maeseneer70).

Data sources. The data sources subsection of the methods describes the sources used to locate potentially eligible studies. While there will always be a margin of search error, the goal is to obtain as many studies as possible that meet one's eligibility criteria. To achieve this goal, the electronic databases searched (PubMed, Embase, etc.) should be listed, together with the search criteria used for each. While there is no clear consensus, it has been suggested that at least two electronic databases be searched(Reference Higgins and Green6), because no one database indexes all journals. While a minimum of two databases is one suggestion(Reference Higgins and Green6), Bramer et al. recently suggested that at least Embase, MEDLINE, Web of Science and Google Scholar be searched to ensure adequate coverage(Reference Bramer, Rethlefsen and Kleijnen71). However, Google Scholar may not be worth the time and effort, given its lack of sensitivity and specificity(Reference Vine72). For researchers who do not have easy access to Embase but can access Scopus, searching the latter may be acceptable, since Scopus has been reported to provide 100 % coverage of both MEDLINE and Embase(Reference Burnham73). It is also relevant to point out that MEDLINE is nested within the PubMed database. If grey literature is included, sources such as the ProQuest master's theses and dissertations and the System for Information on Grey Literature in Europe databases could be searched. The detailed search strategy for at least one of the electronic databases searched, for example, PubMed, should be included, either embedded in the text or as a supplementary file. To ensure adequate coverage, it is recommended that nutritionists search a minimum of three databases, including (1) PubMed, (2) Embase or Scopus and (3) Web of Science.

In addition to searching electronic databases, other methods should be used, such as cross-referencing from retrieved studies, searching clinical trials databases, hand-searching selected journals and expert review. The start and end dates for all searches should be provided, including the reason(s) for the chosen start date. Finally, the name(s) of the individual(s) who conducted the searches should also be provided(Reference Liberati, Altman and Tetzlaff30).

Study selection. The study selection section describes the process used to select studies. To reduce the risk of study selection bias, studies should be screened by at least two people working independently of each other. Those individuals should then meet and compare their selections for agreement. Before discrepancies are resolved, the level of agreement between the selectors may be reported; a common statistic for this is Cohen's kappa (κ)(Reference Cohen74). If agreement cannot be reached for one or more studies when the selectors meet, at least one other person should make a recommendation. For all excluded studies, the reason(s) for exclusion should be recorded. One broad way to categorise exclusions is to follow the PICOS structure: (1) participants/population, (2) intervention, (3) comparison, (4) outcomes, (5) study design/setting and (6) other. The names of all individuals involved in the study selection process, including their roles, should also be provided.
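As a minimal sketch of the agreement statistic mentioned above, Cohen's κ compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer's marginal proportions. The function name and the "include"/"exclude" labels are illustrative assumptions; for published analyses, an established implementation, for example scikit-learn's cohen_kappa_score function or a statistical package, would normally be preferred.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers' categorical decisions on the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both reviewers must classify the same, non-empty set of records")
    n = len(rater_a)
    # Observed agreement: proportion of records given the same decision by both reviewers
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each reviewer's marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1:
        # Both reviewers used one identical category throughout; kappa is undefined,
        # so perfect agreement is reported here by convention
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical title/abstract decisions from two independent reviewers
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.78
```

Here the reviewers agree on 9 of 10 records (observed agreement 0·90), but because both exclude most records, the chance-expected agreement is already 0·54, so κ is noticeably lower than the raw percentage agreement.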

Data abstraction. The data abstraction/extraction section describes the process used to code the eligible studies. A first step is to provide a brief description of how the codebooks were developed to abstract data, including a list and description of the information that was coded. Generally, this may include (1) study characteristics (authors, year of publication, journal, study design, etc.), (2) participant characteristics (age, gender, race/ethnicity, morbidities, etc.), (3) intervention characteristics (length of study, etc.) and (4) outcome characteristics (sample sizes, means, standard deviations, etc.). Additional information for abstracting data, including for complex meta-analyses, is provided elsewhere(Reference Pedder, Sarri and Keeney75). The same process for selecting studies should be used for abstracting data. In addition, the authors should provide information on the process used for obtaining missing data. If no attempt was made to obtain missing data, then this should be stated.

Risk of bias assessment. A systematic review, with or without meta-analysis, should usually include some type of risk of bias assessment for each included study. It is important here to distinguish between risk of bias and study quality, a distinction that, in the authors' more than 25 years of experience reviewing manuscripts and grant proposals, appears to be often overlooked. The Cochrane Collaboration recommends that the focus be on risk of bias, amongst other factors, given that the ultimate goal is to determine the degree to which the results of the included studies should be believed(Reference Higgins and Green6). This focus also overcomes the uncertainty in differentiating between the quality of the conduct of a study and the quality of its reporting(Reference Higgins and Green6). While this does not preclude the use of study quality scales, their potential limitations should be clearly delineated in the manuscript. However, as previously mentioned, using quality scales to decide which studies should be included or excluded is strongly discouraged, given the difficulty in distinguishing between the quality of the reporting of a study and the quality of its conduct(Reference Higgins and Green6). There are at least eighty-six risk of bias/study quality assessment instruments(Reference Sanderson, Tatt and Higgins76). Seehra et al. reported that the Cochrane risk of bias tool was the most commonly used tool for assessing randomised controlled trials (26·1 %), while the Newcastle–Ottawa scale, a study quality instrument, was used most commonly for assessing non-randomised studies (15·3 %), including case–control and cohort studies(Reference Seehra, Pandis and Koletsi77).
However, since the time of this publication, the Cochrane Collaboration has updated its risk of bias tool for randomised controlled trials(Reference Higgins, Sterne and Savović78) and also created an instrument for assessing the risk of bias in non-randomised studies that compare the health effects of two or more interventions(Reference Sterne, Hernán and Reeves79). For authors, the important point here is to carefully consider the instrument(s) to be used and to provide a rationale for the choice(s). For example, authors may choose to use both a risk of bias assessment instrument and a study quality tool. Finally, the processes for evaluating the risk of bias and/or study quality are the same as those for selecting studies and extracting data. While not without limitations, the risk of bias and/or study quality results can help consumers of meta-analyses make decisions regarding the strengths and potential limitations of the included studies.

Data synthesis (effect size calculation). The data synthesis component of a systematic review can be either qualitative or quantitative (meta-analysis). The focus here will be on the meta-analytic approach. The initial step in conducting a meta-analysis is deciding on the method that will be used to calculate a common effect size for each outcome from each study so that the findings can be pooled into an overall result. The calculation of an effect size traditionally requires sample sizes as well as measures of central tendency (e.g. means) and dispersion (e.g. standard deviations). If feasible, effect sizes should be calculated and reported in the original metric, for example, kJ/d, primarily because this is believed to be easier for consumers (nutritionists, clinicians, policymakers, etc.) to understand. However, in many situations, a standardised mean difference effect size (Hedges' g, Cohen's d, etc.) may be necessary if the outcome of interest is assessed using different scales; an example is the effects of dietary improvement on symptoms of depression and anxiety, which were assessed using different scales(Reference Firth, Marx and Dash80). Another strength of the standardised mean difference effect size is that it can be calculated from a number of different test statistics (t tests, F ratios, correlations, etc.)(Reference Higgins and Green6, Reference Borenstein, Hedges and Higgins81). However, one potential weakness of the standardised mean difference effect size is that consumers may find it difficult to understand. For example, it is usually much easier for consumers to understand and interpret a decrease in resting systolic blood pressure of 8 mmHg v. a mean reduction of 0·50 standard deviation units.
Given the former, it is recommended that the original metric be used if all of the studies for the outcome of interest report the results for that outcome using the same metric or if the results can be converted into a metric that is easier for the reader to interpret, for example, converting total cholesterol (TC) from mg/dl to mmol/l by multiplying TC in mg/dl by 0·02586. If the outcome of interest is assessed using different instruments with various scales that cannot be converted into a more easily understood metric, then the standardised mean difference effect size is recommended. If the standardised mean difference effect size is used, we recommend that results based on the original scale, including variance statistics, also be reported in a table or figure.
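The two options described above can be sketched as follows, assuming each study reports group means, standard deviations and sample sizes (treatment minus control, so reductions are negative). The mean difference keeps the original metric, here total cholesterol converted from mg/dl to mmol/l with the factor given in the text (the factor is specific to cholesterol), while Hedges' g uses the standard pooled-SD and small-sample-correction formulas as presented, for example, by Borenstein et al. Function names and the study values are hypothetical.

```python
import math

# Conversion factor for total cholesterol (TC) given in the text: mg/dl x 0.02586 = mmol/l
TC_MG_DL_TO_MMOL_L = 0.02586


def tc_to_mmol_l(value_mg_dl):
    """Convert a TC mean or SD from mg/dl to mmol/l (a linear rescaling)."""
    return value_mg_dl * TC_MG_DL_TO_MMOL_L


def mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Raw mean difference (treatment minus control) in the original metric, with its SE."""
    md = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return md, se


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Hedges' g (small-sample-corrected standardised mean difference) and its variance."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled  # Cohen's d
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    return j * d, j ** 2 * var_d


# Hypothetical study reporting mean changes in TC in mg/dl (treatment v. control)
md, se = mean_difference(tc_to_mmol_l(-15), tc_to_mmol_l(30), 25,
                         tc_to_mmol_l(-3), tc_to_mmol_l(28), 24)
g, var_g = hedges_g(-15, 30, 25, -3, 28, 24)
print(round(md, 2), round(g, 2))
```

Because the conversion is linear, means and standard deviations are both multiplied by the same factor; the mean difference (about -0·31 mmol/l here) changes units, whereas g is unit-free and is unchanged by the conversion, which is exactly why it is harder for readers to interpret.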

Data synthesis (effect size pooling). After deciding on the metric used to pool results, a decision needs to be made on the type of model that will be used to pool them. However, before that decision is made, the investigators need to decide which study designs to include. For intervention studies, we recommend that only randomised controlled trials be included, because randomisation is the only way to control for confounders that are unknown or unmeasured, and because non-randomised controlled trials and single-group trials tend to overestimate the effects of healthcare interventions(Reference Sacks, Chalmers and Smith82, Reference Schulz, Chalmers and Hayes83). For observational studies, we recommend that case–control, cross-sectional, and retrospective and prospective cohort study designs be analysed separately. These separate results can easily be displayed in a table and/or forest plot.

For pooling, there is currently no clear consensus on the single best model for combining results, highlighting the need for a large simulation study that tests the different models under various conditions. With a focus on frequentist meta-analysis, two basic types of models have historically been used: the traditional fixed-effect model and the random-effects model. In a traditional fixed-effect model, the assumption is that all the included studies share the same common (true) effect size. Thus, any differences in the observed effects are considered to be the result of within-study sampling error, and between-study variance is not accounted for. In contrast, random-effects models assume that the true effect size may differ between studies, so that the observed effects vary because of both within-study sampling error and between-study variance. Thus, random-effects models attempt to account for both within- and between-study variance. Multiple random-effects models exist, each using a different statistical approach to estimate the between-study variance(Reference DerSimonian and Kacker84–Reference Sidik and Jonkman89). Therefore, if a random-effects model is used, authors should report and cite the specific random-effects model used, since different models can lead to different results(Reference Zeng and Lin90). The most commonly used, but not necessarily the best, model is the original random-effects, method-of-moments approach of DerSimonian & Laird(Reference Dersimonian and Laird85). Its common use is most likely a consequence of its longevity and its presence in numerous statistical packages for meta-analysis. Nevertheless, caution may be warranted in the a priori use of the traditional fixed-effect model and of the various random-effects models that are currently available(Reference DerSimonian and Kacker84–Reference Sidik and Jonkman89). For the traditional fixed-effect model, the issue is that potential between-study variance is not accounted for.
For random-effects models, the attempt to account for between-study variance usually results in wider CI but also in an increased mean squared error, which is a problem. In addition, the pooled mean effect from random-effects models is not always more conservative than that from the traditional fixed-effect model(Reference Poole and Greenland91). Alternatively, fixed-effect models with robust error estimation may currently be the best choice(Reference Doi, Barendregt and Khan92–Reference Doi, Furuya-Kanamori and Thalib94). In the presence of statistical homogeneity, these models collapse into the traditional fixed-effect model. The inverse variance heterogeneity (IVhet) and quality effects (QE) models are both examples of fixed-effect models with robust error estimation(Reference Doi, Barendregt and Khan92, Reference Doi, Barendregt and Khan93). Both have been shown to be more robust than the traditional DerSimonian and Laird approach with regard to coverage probabilities(Reference Doi, Barendregt and Khan92, Reference Doi, Barendregt and Khan93). The IVhet model uses the estimator of the fixed-effect model but, importantly, has a quasi-likelihood-based variance structure(Reference Doi, Barendregt and Khan92), while the QE model weights studies using a quality score for each study, derived from a pre-existing or self-developed scale(Reference Doi, Barendregt and Khan93). The two models are related in that the IVhet model is the QE model with the quality of all studies set to be equal. Thus, no quality scores need to be imputed when using the IVhet model(Reference Doi, Barendregt and Khan93).
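To make the differences between these models concrete, the sketch below computes the traditional fixed-effect, DerSimonian–Laird random-effects and IVhet estimates from study effect sizes and their within-study variances. The IVhet variance used here (the sum, over studies, of each squared normalised inverse-variance weight multiplied by that study's variance plus τ²) reflects our reading of the published description and is an assumption; results should be checked against MetaXL or the Stata admetan module before use. The QE model is not shown because it additionally requires study quality scores. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def _moments(effects, variances):
    """Shared quantities: inverse-variance weights, their sum, fixed-effect mean, DL tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]
    total = sum(w)
    fe = sum(wi * y for wi, y in zip(w, effects)) / total
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = total - sum(wi ** 2 for wi in w) / total
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # method of moments, truncated at 0
    return w, total, fe, tau2


def fixed_effect(effects, variances):
    _, total, fe, _ = _moments(effects, variances)
    return fe, math.sqrt(1 / total)


def dersimonian_laird(effects, variances):
    _, _, _, tau2 = _moments(effects, variances)
    w_re = [1 / (v + tau2) for v in variances]  # tau^2 added to each within-study variance
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re))


def ivhet(effects, variances):
    w, total, fe, tau2 = _moments(effects, variances)
    norm = [wi / total for wi in w]  # normalised fixed-effect weights (sum to 1)
    # Point estimate is the fixed-effect estimate; each study's variance is inflated by tau^2
    var = sum(nw ** 2 * (v + tau2) for nw, v in zip(norm, variances))
    return fe, math.sqrt(var)


def ci95(estimate, se):
    z = NormalDist().inv_cdf(0.975)
    return estimate - z * se, estimate + z * se


effects, variances = [0.0, 4.0], [1.0, 1.0]  # two hypothetical, clearly discordant results
for name, model in [("FE", fixed_effect), ("DL", dersimonian_laird), ("IVhet", ivhet)]:
    est, se = model(effects, variances)
    print(name, round(est, 2), [round(b, 2) for b in ci95(est, se)])
```

In this toy example all three models give the same point estimate, but the fixed-effect interval ignores the obvious between-study variance and is much narrower. When τ² is zero, the IVhet variance reduces algebraically to the fixed-effect variance, consistent with the statement that these models collapse into the fixed-effect model under homogeneity; when study variances differ, the IVhet point estimate stays at the fixed-effect value while the DerSimonian–Laird estimate shifts towards smaller studies.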

While acknowledging the current and ever-changing state of the evidence, as well as the prioritisation of coverage probabilities over point estimates, we recommend that the IVhet and QE models be used when conducting an AD meta-analysis(Reference Doi, Barendregt and Khan92–Reference Doi, Furuya-Kanamori and Thalib94). However, it is also important to understand that no statistical model is perfect. In addition, the choice of model will often depend on how a meta-analyst poses the question and on the modelling assumptions made a priori, including what the parameter of interest is. Both the IVhet and QE models are currently available in a free, easy-to-use Excel meta-analysis add-in program (MetaXL)(Reference Barendregt and Doi95). A Stata module (admetan) is also available to execute the IVhet and QE models.

Irrespective of model choice, and assuming a frequentist approach is used, pooled results should typically be reported using point estimates and 95 % CI as well as z- or t-based α values. While not specific to meta-analysis, when reporting and interpreting results one should consider the recent recommendations in an editorial by Wasserstein et al.(Reference Wasserstein, Schirm and Lazar3) as well as the rest of an entire issue of The American Statistician devoted to the use of, and over-reliance on, 'statistical significance'. Similar recommendations were made in a recent commentary by Amrhein et al.(Reference Amrhein, Greenland and McShane96).

In addition to 95 % CI(Reference Amrhein, Greenland and McShane96), 95 % prediction intervals (PI) may also be reported when findings are pooled using models such as random-effects(Reference Higgins, Thompson and Spiegelhalter97). The concept behind PI is that they describe how the true effects are distributed around the summary effect(Reference Higgins, Thompson and Spiegelhalter97), that is, the range within which the effect in a new study similar to those included would be expected to lie. This is in contrast to point estimates and CI, which provide estimates of the overall effect and its precision, respectively(Reference Higgins, Thompson and Spiegelhalter97). From an applied perspective, PI may make more sense because they help to convey uncertainty about whether an intervention works or not(Reference Higgins, Thompson and Spiegelhalter97). However, it has been recommended that caution be exercised in drawing strong conclusions from 95 % PI because of coverage problems(Reference Partlett and Riley98). In addition, it has been suggested that because PI assume that the included trials are sufficiently homogeneous, that is, that patient populations and comparator treatments are interchangeable, the resulting estimates may not be accurate if this assumption is not met(Reference Kriston99). As an example of PI use in nutrition, Cariolou et al. recently conducted an AD meta-analysis on the association between 25-hydroxyvitamin D deficiency and mortality in children with acute or critical conditions(Reference Cariolou, Cupp and Evangelou100). Based on a random-effects model, the pooled OR for the risk of mortality in vitamin D deficient v. vitamin D non-deficient acute and critically ill children was 1·81 (95 % CI 1·24, 2·64). However, the 95 % PI (0·71, 4·20) was much wider and included 1, indicating much less certainty regarding this association(Reference Cariolou, Cupp and Evangelou100).
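A minimal sketch of the usual random-effects prediction interval (as proposed by Higgins et al.) is shown below: the summary effect plus or minus a t quantile with k − 2 degrees of freedom multiplied by √(τ² + SE²), with τ² estimated by the DerSimonian–Laird method. For ratio measures such as OR, the calculation is done on the log scale and the limits are exponentiated afterwards. To keep the example dependency-free, the t quantile is passed in (it could be obtained, for example, from scipy.stats.t.ppf); the function name and the study values are hypothetical.

```python
import math


def dl_prediction_interval(effects, variances, t_crit, z_crit=1.959964):
    """DerSimonian-Laird summary effect, its 95 % CI and an approximate 95 % PI.

    t_crit is the 97.5th percentile of the t distribution with k - 2 degrees
    of freedom (for example, scipy.stats.t.ppf(0.975, k - 2)).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("a prediction interval needs at least three studies")
    w = [1 / v for v in variances]
    total = sum(w)
    fe = sum(wi * y for wi, y in zip(w, effects)) / total
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, effects))
    c = total - sum(wi ** 2 for wi in w) / total
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    ci = (mu - z_crit * se, mu + z_crit * se)
    # The PI widens the CI by adding tau^2 to the squared SE of the summary effect
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return mu, ci, (mu - half_width, mu + half_width)


# Hypothetical log odds ratios and their variances from five studies
log_or = [0.9, 0.2, 0.6, 0.1, 0.8]
var = [0.04, 0.09, 0.05, 0.12, 0.06]
mu, ci, pi = dl_prediction_interval(log_or, var, t_crit=3.182446)  # t(0.975, df = 3)
print([round(math.exp(x), 2) for x in (mu, *ci, *pi)])  # OR, CI and PI on the OR scale
```

Because τ² is added to the squared standard error and a t quantile replaces 1·96, the PI is always at least as wide as the CI, which is the pattern seen in the Cariolou et al. example above.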

As in original studies, it is important to examine and report data on heterogeneity and inconsistency in meta-analysis. In meta-analysis, heterogeneity refers to any type of variability between studies and may be categorised broadly as clinical (patient characteristics, etc.), methodological (blinding, allocation concealment, etc.) and statistical (variability in the observed effects beyond that expected from chance alone)(Reference Higgins and Green6). The Cochran Q statistic is typically used to examine heterogeneity(Reference Cochran101), while the I² statistic, derived from Q, is used to examine inconsistency(Reference Higgins, Thompson and Deeks102). The Q statistic is a test of statistical significance and, because of its low power when few studies are included, heterogeneity is typically considered significant if the alpha (α) value is < 0·10 rather than < 0·05(Reference Higgins, Thompson and Deeks102). I² is a relative measure that ranges from 0 to 100 %, with higher values representing greater inconsistency(Reference Higgins, Thompson and Deeks102), while τ² is an absolute measure of between-study heterogeneity. However, like any statistic, Q, I² and τ² are not perfect with respect to explaining all the potential sources of heterogeneity(Reference Ioannidis, Patsopoulos and Evangelou103).
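The three statistics described above can be computed from each study's effect size and within-study variance, as in the sketch below: Q is the inverse-variance weighted sum of squared deviations from the fixed-effect mean, I² = (Q − df)/Q expressed as a percentage and floored at zero, and τ² is the DerSimonian–Laird moment estimate. The function name and example values are hypothetical; the P value for Q, which needs the χ² distribution with k − 1 degrees of freedom, is left out here but could be obtained, for example, with scipy.stats.chi2.sf.

```python
def heterogeneity(effects, variances):
    """Cochran's Q (with its df), I^2 in % and the DerSimonian-Laird tau^2."""
    k = len(effects)
    w = [1 / v for v in variances]
    total = sum(w)
    fe = sum(wi * y for wi, y in zip(w, effects)) / total  # fixed-effect mean
    q = sum(wi * (y - fe) ** 2 for wi, y in zip(w, effects))
    df = k - 1
    # I^2: percentage of total variability attributed to between-study differences
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = total - sum(wi ** 2 for wi in w) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # absolute, on the effect-size scale
    return q, df, i2, tau2


# Hypothetical mean changes in TC (mmol/l) and their variances from four studies
q, df, i2, tau2 = heterogeneity([-0.30, -0.15, -0.25, -0.40], [0.010, 0.020, 0.015, 0.030])
print(round(q, 2), df, round(i2, 1), round(tau2, 4))
```

Note that I² is relative (it depends on how precise the included studies are), whereas τ² is expressed in the squared units of the effect size, which is why the two can tell different stories.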

A standard graphical method for reporting the results from each study as well as the overall pooled effect is the forest plot. An example of a forest plot using the IVhet model(Reference Doi, Barendregt and Khan92) is shown in Fig. 2(Reference Kelley, Kelley and Roberts104). Although uncommon, given the different ways in which data are reported, sample sizes as well as mean changes and standard deviations from each intervention group may also be displayed in a forest plot. However, to reduce bias, restricting inclusion to studies that report data in exactly the same way is strongly discouraged if the treatment effect and variance from each study can be calculated from other reported statistics.

Fig. 2. Forest plot example of diet-induced changes in total cholesterol (TC) in adults based on the inverse variance heterogeneity (IVhet) model. The black squares represent mean changes in TC from each study while the left and right extremes of the squares represent the corresponding 95 % CI, that is, compatibility intervals for the mean changes. The middle of the black diamond represents the pooled mean change in TC, while the left and right extremes of the diamond represent the corresponding 95 % CI of the pooled mean change. The vertical dashed line represents the pooled mean change in TC while the solid vertical line represents zero (0) effect. As can be seen, the pooled 95 % CI did not include zero (0), suggesting compatibility regarding the association between diet and reductions in TC. The results for Cochran's Q statistic, the P value for Q and I² suggest a lack of heterogeneity and inconsistency. The ES represents effect size changes in TC in mmol/l, while % weight represents the percentage weight attributed by each study to the overall pooled mean effect. Results were similar when the two results by Stefanick et al. were pooled into one overall ES. Data adapted from Kelley et al.(Reference Kelley, Kelley and Roberts104).

Data synthesis (small-study effects). An assessment of potential small-study effects (publication bias, etc.) is usually important in meta-analysis. Historically, this has most often been assessed qualitatively using some type of funnel plot and quantitatively using Egger's test(Reference Egger, Davey Smith and Schneider105), though other methods exist for both(Reference Sterne, Gavaghan and Egger106, Reference Furuya-Kanamori, Barendregt and Doi107). Briefly, a funnel plot is a scatterplot in which the precision of each included study (standard error, inverse of the standard error, etc.) is plotted on the vertical (y) axis and the effect size for each included study (mean difference, standardised mean difference, OR, etc.) is plotted on the horizontal (x) axis. In the absence of small-study effects, the points should resemble an inverted funnel, with smaller studies showing greater dispersion, that is, larger standard errors, at the bottom of the plot and larger studies showing less dispersion towards the top. If smaller studies without statistically significant effects are missing, the funnel plot will appear asymmetrical, with a gap in one of the bottom corners of the plot. However, funnel plots can be difficult to interpret(Reference Lau, Ioannidis and Terrin108). An example of a funnel plot using the same data as for the forest plot(Reference Kelley, Kelley and Roberts104) is shown in Fig. 3. Egger's regression–intercept test examines whether the y intercept equals zero in a linear regression of the normalised effect estimate, that is, the estimate divided by its standard error, against precision, that is, the reciprocal of the standard error of the estimate(Reference Egger, Davey Smith and Schneider105). Unfortunately, the power of Egger's test to detect asymmetry is low when the number of studies is small(Reference Sterne, Sutton and Ioannidis109).
Present recommendations suggest that if there are at least ten studies, a funnel plot and Egger's test may be used to examine small-study effects if the outcome of interest is continuous, for example, changes in TC. However, since the publication of these recommendations, an alternative qualitative (Doi plot) and quantitative (Luis Furuya-Kanamori (LFK) index) approach has been suggested, which has been reported to make asymmetry easier to visualise (Doi plot) and to have greater diagnostic accuracy in differentiating between asymmetry and no asymmetry (LFK index)(Reference Furuya-Kanamori, Barendregt and Doi107). Rather than plotting precision v. effect in a scatterplot, the Doi plot uses a normal quantile plot v. effect, providing better visualisation of asymmetry(Reference Furuya-Kanamori, Barendregt and Doi107). The LFK index, which is based on the Doi plot, quantifies asymmetry, with a value of zero (0) representing perfect symmetry and thus no apparent small-study effects(Reference Furuya-Kanamori, Barendregt and Doi107). It is based on the idea that, in a symmetrical plot, a vertical line drawn on the horizontal (x) axis at the effect size with the lowest absolute z score would divide the plot into two regions with equal areas. The LFK index then quantifies the difference between these two regions in terms of the areas below the plot and the difference in the number of studies included in each arm of the plot(Reference Furuya-Kanamori, Barendregt and Doi107). Values within ±1, outside ±1 but within ±2, and outside ±2 are considered to represent no, minor and major asymmetry, respectively(Reference Furuya-Kanamori, Barendregt and Doi107). An example of the Doi plot and LFK index using the same data as for our previous examples is shown in Fig. 4.
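The regression behind Egger's test, as described above, can be sketched with ordinary least squares: each study's effect divided by its standard error is regressed on its precision (1/SE), and the intercept is compared with zero using a t statistic with k − 2 degrees of freedom. This is a dependency-free illustration only; the function name is hypothetical, a two-sided P value could be obtained, for example, as 2 × scipy.stats.t.sf(|t|, k − 2), and the low power noted above applies fully when few studies are available.

```python
import math


def egger_test(effects, std_errors):
    """Egger's regression test: normalised effect (y / se) regressed on precision (1 / se).

    Returns the intercept, its standard error and the t statistic (k - 2 df).
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1 / s for s in std_errors]                   # precision
    z = [y / s for y, s in zip(effects, std_errors)]  # normalised effect estimate
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    # Ordinary least-squares residual variance and standard error of the intercept
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(rss / (k - 2) * (1 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat


# Hypothetical mean changes in TC (mmol/l) and their standard errors
intercept, se_intercept, t_stat = egger_test(
    [-0.35, -0.20, -0.28, -0.45, -0.22, -0.30, -0.60, -0.25, -0.18, -0.50],
    [0.10, 0.05, 0.08, 0.15, 0.06, 0.09, 0.20, 0.07, 0.04, 0.18])
print(round(intercept, 2), round(t_stat, 2))
```

The slope of this regression estimates the underlying effect, while the intercept captures the tendency of less precise studies to report systematically larger (or smaller) standardised effects.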

Fig. 3. Example of funnel plot based on diet-induced changes in total cholesterol (TC) following a dietary intervention. The solid vertical line represents the overall pooled mean change in TC in mmol/l after a dietary intervention. The x-axis represents changes in TC in mmol/l from each study while the y-axis represents the inverse of the standard error for changes in TC from each study. Each dot represents changes in TC plotted against its precision. In the absence of small-study effects, the plot should resemble a pyramid or inverted funnel, with scatter due to sampling variation. In the presence of potential small-study effects, the results from smaller studies with smaller/null findings will be missing in that region of the plot. While difficult to interpret, especially given the small number of effect estimates, there do not appear to be any small-study effects. Results were similar when the two results by Stefanick et al. were pooled into one overall effect size. Data adapted from Kelley et al.(Reference Kelley, Kelley and Roberts104).

Fig. 4. Example of Doi plot based on diet-induced changes in total cholesterol (TC) following a dietary intervention. The vertical line on the horizontal (x) axis represents the effect size (ES) with the lowest absolute z score, dividing the plot into two regions with the same areas. Visualisation of the plot suggests no asymmetry and thus no small-study effects such as publication bias. The obtained Luis Furuya-Kanamori index of 0·30 also suggests no asymmetry. Results were similar when the two results by Stefanick et al. were pooled into one overall ES. Data adapted from Kelley et al.(Reference Kelley, Kelley and Roberts104).

Data synthesis (influence and cumulative meta-analysis). Many meta-analyses include a small number of trials. For example, it has been reported that the typical number of studies included in a Cochrane systematic review is six(Reference Mallett and Clarke110). Given the former, it is usually relevant to conduct influence analysis with each study deleted from the model once in order to examine the effect that each study has on the overall results. Fig. 5 provides an example of influence analysis using the same data as for our other examples(Reference Kelley, Kelley and Roberts104).
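The leave-one-out procedure described above can be sketched as follows, assuming each study contributes an effect size and its within-study variance. The pooling function follows our reading of the published IVhet description (fixed-effect point estimate, with each study's variance inflated by the DerSimonian–Laird τ²) and should be treated as an assumption to be checked against MetaXL or admetan; any other pooling model could be substituted. Names and data are hypothetical.

```python
import math
from statistics import NormalDist


def ivhet_pool(effects, variances, level=0.95):
    """IVhet-style pooled estimate and CI (sketch of the published variance formula)."""
    k = len(effects)
    inv = [1 / v for v in variances]
    total = sum(inv)
    w = [x / total for x in inv]  # normalised inverse-variance weights
    pooled = sum(wj * y for wj, y in zip(w, effects))
    q = sum(x * (y - pooled) ** 2 for x, y in zip(inv, effects))
    c = total - sum(x ** 2 for x in inv) / total
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    se = math.sqrt(sum(wj ** 2 * (v + tau2) for wj, v in zip(w, variances)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return pooled, pooled - z * se, pooled + z * se


def leave_one_out(labels, effects, variances):
    """Re-pool the results with each study omitted once."""
    results = []
    for i, label in enumerate(labels):
        keep = [j for j in range(len(effects)) if j != i]
        pooled, lo, hi = ivhet_pool([effects[j] for j in keep],
                                    [variances[j] for j in keep])
        results.append((label, pooled, lo, hi))
    return results


# Hypothetical mean changes in TC (mmol/l) and their variances
for label, pooled, lo, hi in leave_one_out(["Study A", "Study B", "Study C", "Study D"],
                                           [-0.30, -0.15, -0.25, -0.40],
                                           [0.010, 0.020, 0.015, 0.030]):
    print(f"{label} omitted: {pooled:.2f} ({lo:.2f}, {hi:.2f})")
```

Each row corresponds to one line of an influence plot such as Fig. 5: if omitting any single study materially shifts the pooled estimate or moves its CI to include zero, that study deserves closer scrutiny.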

Fig. 5. Influence analysis based on the inverse variance heterogeneity model with each result deleted from the overall analysis once. The black squares represent mean changes in total cholesterol (TC) with the corresponding study deleted from the model, while the left and right extremes of the squares represent the corresponding 95 % CI for the mean changes. As can be seen, changes ranged from –0·21 to –0·28 mmol/l, and none of the 95 % CI included zero (0). These findings suggest that no one result had a significant impact on the overall findings. Results were similar when the two results by Stefanick et al. were pooled into one overall effect size (ES). Data adapted from Kelley et al.(Reference Kelley, Kelley and Roberts104).

In addition to influence analysis, it is often relevant to conduct cumulative meta-analysis, traditionally ranked by year of publication, to examine the accumulation of results over time(Reference Clarke, Brice and Chalmers111). The inclusion of findings from a cumulative meta-analysis can aid in making more informed choices based on past years of research and can lead to more timely and increased use of successful interventions in practice(Reference Clarke, Brice and Chalmers111). Using this method, findings are re-pooled as each additional study is added to the model. An example of cumulative meta-analysis using the same data as for our previous examples is shown in Fig. 6.
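Cumulative meta-analysis as described above amounts to sorting studies by publication year and re-pooling after each addition. The sketch below assumes each study is given as (label, year, effect, variance); the pooling function follows our reading of the published IVhet description and is an assumption to be checked against MetaXL or admetan. Names and data are hypothetical.

```python
import math
from statistics import NormalDist


def ivhet_pool(effects, variances, level=0.95):
    """IVhet-style pooled estimate and CI (sketch of the published variance formula)."""
    k = len(effects)
    inv = [1 / v for v in variances]
    total = sum(inv)
    w = [x / total for x in inv]  # normalised inverse-variance weights
    pooled = sum(wj * y for wj, y in zip(w, effects))
    q = sum(x * (y - pooled) ** 2 for x, y in zip(inv, effects))
    c = total - sum(x ** 2 for x in inv) / total
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    se = math.sqrt(sum(wj ** 2 * (v + tau2) for wj, v in zip(w, variances)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return pooled, pooled - z * se, pooled + z * se


def cumulative_meta(studies):
    """studies: (label, year, effect, variance) tuples; re-pooled in order of year."""
    ordered = sorted(studies, key=lambda s: s[1])  # stable sort keeps ties in input order
    results = []
    for i in range(1, len(ordered) + 1):
        subset = ordered[:i]
        pooled, lo, hi = ivhet_pool([s[2] for s in subset], [s[3] for s in subset])
        results.append((subset[-1][0], subset[-1][1], pooled, lo, hi))
    return results


# Hypothetical studies: (label, year, mean change in TC in mmol/l, variance)
studies = [("Study C", 1998, -0.25, 0.015), ("Study A", 1990, -0.30, 0.010),
           ("Study D", 2005, -0.40, 0.030), ("Study B", 1994, -0.15, 0.020)]
for label, year, pooled, lo, hi in cumulative_meta(studies):
    print(f"up to {label} ({year}): {pooled:.2f} ({lo:.2f}, {hi:.2f})")
```

Each printed row corresponds to one line of a cumulative forest plot such as Fig. 6, showing the pooled estimate and its CI using all studies published up to that year.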

Fig. 6. Cumulative meta-analysis ranked by year and based on the inverse variance heterogeneity model. The black circles represent mean changes in total cholesterol (TC) with the corresponding study and all earlier studies pooled, while the left and right extremes of the circles represent the corresponding 95 % CI for the pooled mean changes. As can be seen, 95 % CI that do not include zero (0) have been observed since 1998. Results were similar when the two results by Stefanick et al. were pooled into one overall effect size (ES). Data adapted from Kelley et al.(Reference Kelley, Kelley and Roberts104).

Data synthesis (subgroup and/or meta-regression analysis). Given an adequate number of studies, subgroup and/or meta-regression analyses may be conducted to explore the effect of selected covariates, for example, age, on the outcome(s) of interest, for example, changes in fat mass as a result of a weight-loss intervention. Traditionally, these are based on weights derived from fixed- and random-effects models and, more recently, from approaches such as the IVhet and QE models, details for all of which have been described elsewhere(Reference Higgins and Green6, Reference Borenstein, Hedges and Higgins81, Reference Doi, Barendregt and Khan92, Reference Doi, Barendregt and Khan93, Reference Xu and Doi112, Reference Lopez-Lopez, Van den Noortgate and Tanner-Smith113). While investigators may be inclined to conduct such analyses only when statistically significant heterogeneity and/or a large amount of inconsistency is found, this is generally not advised, given the current limitations of measures of heterogeneity and inconsistency(Reference Higgins, Thompson and Deeks114). With respect to the number of studies needed to conduct analyses such as meta-regression, no firm consensus currently exists. However, as a broad recommendation, and while recognising that any definitive number is potentially arbitrary given the numerous factors to consider, we support the recommendation of Fu et al. that there be at least six studies per covariate for a continuous variable, for example, age, and at least four studies per group for a categorical variable, for example, sex (female, male)(Reference Fu, Gartlehner and Grant115). Excluding dose–response analyses, the minimum of four studies per group for a categorical variable is also recommended for any subgroup analyses conducted. If multiple meta-regression analysis is conducted, one should also consider conducting and reporting the results of all simple meta-regression analyses performed.
This may be especially relevant because studies are not randomly allocated to covariates in meta-analysis. Consequently, such analyses are regarded as observational and exploratory, and any findings would need to be tested in original studies. For categorical variables such as sex, there may be too few studies in one or more categories to conduct any type of meta-regression or subgroup comparison. If this occurs, there are more than two categories, and it is scientifically plausible to do so, one may collapse categories so that at least two remain, and then conduct the meta-regression and/or subgroup analyses. If this is not possible, one may instead consider a sensitivity analysis that omits the results from the category with fewer studies to see how this affects the overall results. For example, if results are available from ten studies, eight in males and two in females, one may run the analyses with only the results from the males and compare them with the overall pooled results.
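The steps above (checking the number of studies, fitting a simple meta-regression and running the category-omission sensitivity analysis) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a replacement for dedicated meta-analysis software. The check `enough_studies` is our reading of the Fu et al. rule of thumb (six studies per continuous covariate, four per category). `meta_regression` is an inverse-variance weighted least-squares fit: with `tau2=0` it is a fixed-effect meta-regression, and supplying an externally estimated `tau2` gives a simple mixed-effects version. The sensitivity analysis uses fixed-effect pooling only for brevity; the same comparison applies with IVhet or random-effects weights. All names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def enough_studies(n_studies, n_continuous=0, category_sizes=()):
    """Fu et al. rule of thumb (as read here): >= 6 studies per continuous
    covariate and >= 4 studies in every category of a categorical covariate."""
    if n_studies < 6 * n_continuous:
        return False
    return all(size >= 4 for size in category_sizes)


def meta_regression(x, effects, variances, tau2=0.0):
    """Inverse-variance weighted least-squares regression of effect size on one
    covariate. Returns (intercept, slope, standard error of the slope)."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sqrt(1.0 / sxx)


def pooled_fixed(effects, variances):
    """Inverse-variance (fixed-effect) pooled estimate, used here for brevity."""
    w = [1.0 / v for v in variances]
    return sum(wi * y for wi, y in zip(w, effects)) / sum(w)


def omit_smallest_category(studies):
    """studies: list of (category, effect, variance). Returns the overall pooled
    estimate, the estimate without the category with the fewest studies, and
    the name of the omitted category."""
    counts = Counter(category for category, _, _ in studies)
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    smallest = min(counts, key=counts.get)
    kept = [s for s in studies if s[0] != smallest]
    overall = pooled_fixed([s[1] for s in studies], [s[2] for s in studies])
    restricted = pooled_fixed([s[1] for s in kept], [s[2] for s in kept])
    return overall, restricted, smallest


if __name__ == "__main__":
    # Hypothetical: mean age per study v. change in fat mass (kg)
    age = [25, 32, 41, 48, 55, 63]
    change = [-2.1, -1.9, -1.6, -1.5, -1.2, -0.9]
    var = [0.20, 0.15, 0.25, 0.10, 0.30, 0.20]
    if enough_studies(len(age), n_continuous=1):
        _, slope, se = meta_regression(age, change, var)
        print(f"slope per year of age: {slope:.3f} (SE {se:.3f})")

    # Hypothetical: eight studies in males, two in females
    studies = [("male", -1.0, 0.1)] * 8 + [("female", 0.0, 0.1)] * 2
    overall, restricted, omitted = omit_smallest_category(studies)
    print(f"all studies: {overall:.2f}; without {omitted}: {restricted:.2f}")
```

In the second example, `enough_studies(10, category_sizes=(8, 2))` returns `False`, which is exactly the situation in which the text suggests a sensitivity analysis rather than a formal subgroup comparison.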

One aspect of meta-analysis in nutrition, as well as in other fields, is that some studies conduct and report highest v. lowest tertile comparisons. However, these are almost always difficult to translate into recommendations, because what is considered high and low differs across studies and the categories may overlap. Indeed, some low categories could be minimal and well below current recommended daily allowances, while other categories could be close to pharmacological. Since nutritionists tend to prefer a recommended intake that can be applied with confidence to various populations and groups, it is recommended that such comparisons be conducted using a dose–response approach instead. This consists of modelling the association between the exposure and the outcome to estimate the increase or decrease associated with a one-unit, or other appropriate, change in exposure(Reference Dekkers, Vandenbroucke and Cevallos32). For example, using linear dose–response meta-analysis, Morze et al. found no significant associations between a 10-g/d increase in chocolate intake and either heart failure (relative risk = 0·99, 95 % CI 0·94, 1·04) or type 2 diabetes (relative risk = 0·94, 95 % CI 0·88, 1·01)(Reference Morze, Schwedhelm and Bencic116). However, small inverse associations were observed for CHD (relative risk = 0·96, 95 % CI 0·93, 0·99) and stroke (relative risk = 0·90, 95 % CI 0·82, 0·98)(Reference Morze, Schwedhelm and Bencic116). Greenland & Longnecker(Reference Greenland and Longnecker117), Hartemink et al.(Reference Hartemink, Boshuizen and Nagelkerke118) and Xu et al.(Reference Xu and Doi112) provide detailed information regarding dose–response methods for meta-analysis.
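The idea of a linear dose–response analysis can be illustrated with a simplified two-stage sketch in Python: (1) within each study, regress the log relative risk on dose (relative to the reference category) through the origin, weighting by the inverse variance; (2) pool the study-specific slopes by inverse variance; and (3) express the result per chosen increment, for example 10 g/d. This is a deliberate simplification: the method of Greenland & Longnecker also accounts for the correlation between category-specific estimates that share the same reference group, which this sketch ignores, and it pools with fixed-effect weights only. Function names and data are hypothetical.

```python
from math import exp, log, sqrt

Z_95 = 1.959964  # two-sided 95 % standard normal quantile


def study_slope(doses, rrs, se_log_rr, ref_dose=0.0):
    """Slope of log RR per unit of exposure for one study.

    Fits log(RR) = beta * (dose - ref_dose) through the origin, because the
    reference category has log RR = 0 by definition; weights are 1/SE^2.
    Simplification: ignores covariance between the category estimates.
    """
    num = den = 0.0
    for dose, rr, se in zip(doses, rrs, se_log_rr):
        x = dose - ref_dose
        w = 1.0 / se ** 2
        num += w * x * log(rr)
        den += w * x * x
    return num / den, sqrt(1.0 / den)


def pool_slopes(slopes, ses):
    """Inverse-variance (fixed-effect) pooling of study-specific slopes."""
    w = [1.0 / se ** 2 for se in ses]
    beta = sum(wi * b for wi, b in zip(w, slopes)) / sum(w)
    return beta, sqrt(1.0 / sum(w))


def rr_per_increment(beta, se, increment):
    """RR and 95 % CI for a given positive increase in exposure."""
    return (exp(beta * increment),
            exp((beta - Z_95 * se) * increment),
            exp((beta + Z_95 * se) * increment))


if __name__ == "__main__":
    # Hypothetical studies: exposure categories in g/d, RR v. 0 g/d, SE(log RR)
    a = study_slope([5, 15, 30], [0.98, 0.95, 0.90], [0.05, 0.05, 0.06])
    b = study_slope([10, 25], [0.97, 0.93], [0.04, 0.05])
    beta, se = pool_slopes([a[0], b[0]], [a[1], b[1]])
    rr, lo, hi = rr_per_increment(beta, se, 10)
    print(f"RR per 10 g/d: {rr:.2f} (95 % CI {lo:.2f}, {hi:.2f})")
```

Under the log-linear assumption, an estimate can also be rescaled to another increment: an RR of 0·96 per 10 g/d corresponds to roughly 0·96² ≈ 0·92 per 20 g/d.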

Data synthesis (practically relevant information). An aspect that is sometimes overlooked when conducting a meta-analysis is the need to provide practically relevant information to readers. In addition to reporting both absolute and relative results whenever possible, metrics such as the number needed to treat (NNT)(Reference Higgins and Green6, Reference da Costa, Rutjes and Johnston119) and percentile improvement based on Cohen’s U3 index(Reference Cohen120) could be considered when appropriate. For example, using the diet and TC data from our previous examples(Reference Kelley, Kelley and Roberts104), the method of Hasselblad and Hedges for estimating the NNT from continuous data(Reference Hasselblad and Hedges121) and a control group risk of 30 %, the NNT for diet-associated reductions in TC was 5, meaning that, on average, five people would need to diet for one additional person to reduce their TC compared with not dieting. Using the same data, Cohen’s U3 index for percentile improvement was 16·9, meaning an improvement from the 50th to the 66·9th percentile. One should also consider both the clinical and population health importance of any findings from a meta-analysis. For example, a 2-mmHg reduction in resting systolic blood pressure as a result of lower sodium intake may not be very important at the patient level but may have significant implications at the population level, given that a reduction of this size across a population has been associated with a 4 % reduction in CHD and a 6 % reduction in stroke(Reference Stamler, Rose and Stamler122).
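Both metrics can be computed from a standardised mean difference (d). The following Python sketch assumes the Hasselblad and Hedges logistic conversion, ln(OR) = d × π/√3, and a user-supplied proportion of "responders" in the control group. It also assumes that d is oriented so that positive values favour the intervention (for TC, a larger reduction). Percentile improvement is taken as 100 × Φ(d) − 50, where Φ is the standard normal cumulative distribution function. The function names and the example value of d are hypothetical and are not the values underlying the results reported above.

```python
from math import ceil, erf, exp, inf, pi, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def u3_percentile_improvement(d):
    """Cohen's U3 expressed as percentile points gained from the 50th."""
    return 100.0 * normal_cdf(d) - 50.0


def odds_ratio_from_d(d):
    """Hasselblad and Hedges logistic conversion: ln(OR) = d * pi / sqrt(3)."""
    return exp(d * pi / sqrt(3.0))


def nnt_from_d(d, control_risk):
    """NNT implied by d, given the proportion of responders among controls."""
    odds_ratio = odds_ratio_from_d(d)
    c = control_risk
    # Risk difference implied by OR and c; algebraically equal to
    # OR*c / (1 - c + OR*c) - c, and exactly zero when OR == 1
    risk_difference = c * (1.0 - c) * (odds_ratio - 1.0) / (1.0 + c * (odds_ratio - 1.0))
    # Absolute value: a negative d yields a number needed to harm instead
    return inf if risk_difference == 0.0 else 1.0 / abs(risk_difference)


if __name__ == "__main__":
    d = 0.5  # hypothetical standardised mean difference
    nnt = nnt_from_d(d, control_risk=0.30)
    print(f"NNT = {nnt:.2f} (conventionally rounded up to {ceil(nnt)})")
    print(f"percentile improvement = {u3_percentile_improvement(d):.1f}")
```

NNTs are conventionally rounded up to the next whole number, so that the number of people who must be treated is never understated.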

Data synthesis (strength of evidence). An assessment of the strength of the evidence for the outcome(s) of interest should usually be conducted and reported. One of the most commonly used instruments is GRADE, details of which are provided elsewhere(Reference Guyatt, Oxman and Vist123). In brief, GRADE is a subjective tool that assesses the strength of evidence for a specific outcome across five areas: (1) risk of bias, (2) imprecision, (3) inconsistency, (4) indirectness and (5) publication bias(Reference Guyatt, Oxman and Vist123). For each of these areas, the evidence can be rated down by one or two levels. The evidence can also be rated up by one or two levels if there is a large effect, and by one level if a dose–response relationship is observed or if all plausible confounding would reduce a demonstrated effect, or increase the effect if no effect was observed(Reference Guyatt, Oxman and Vist123). In GRADE, risk of bias focuses on study limitations, including lack of allocation concealment and blinding, incomplete accounting of participants and outcome events, selective outcome reporting and any other limitations that reviewers believe may affect the outcome(Reference Guyatt, Oxman and Vist123). Imprecision is the degree of uncertainty about the findings, for example, a wide CI around the estimate of effect, while inconsistency signifies unexplained heterogeneity in results(Reference Guyatt, Oxman and Vist123). Indirectness evaluates whether the included studies directly compare the interventions and populations of interest and measure outcomes considered important by participants, for example, self-reported health-related quality of life as a result of weight loss in obese participants.
Lastly, publication bias refers to the selective publication of studies according to the direction or magnitude of their results, so that benefits may be exaggerated and harms underestimated(Reference Guyatt, Oxman and Vist123). The overall certainty of the evidence is then rated by the authors as (1) very low, (2) low, (3) moderate or (4) high(Reference Guyatt, Oxman and Vist123). As an example of the use of GRADE in nutrition, Baranski et al. rated the overall strength of evidence as moderate or high for the majority of parameters for which significant differences were detected in a systematic review with meta-analysis of compositional differences between organic and non-organic crops and crop-based foods(Reference Baranski, Srednicka-Tober and Volakakis124).
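Because GRADE relies on reviewer judgement, no code can perform the rating itself. The bookkeeping, however, can be made explicit. The Python sketch below assumes the usual GRADE starting points (evidence from randomised trials starts at "high", evidence from observational studies at "low"). It then applies the downgrades (0, 1 or 2 levels per domain) and upgrades described above and clamps the result to the four certainty levels. As a simplification, it applies upgrades unconditionally, whereas in practice upgrading is mainly considered for observational evidence without serious limitations. The function and argument names are hypothetical.

```python
LEVELS = ("very low", "low", "moderate", "high")
DOMAINS = ("risk of bias", "imprecision", "inconsistency",
           "indirectness", "publication bias")


def grade_certainty(randomised, downgrades, large_effect=0,
                    dose_response=False, plausible_confounding=False):
    """Tally GRADE-style rating changes into an overall certainty level."""
    # Assumed starting point: 'high' for randomised evidence, 'low' otherwise
    score = 3 if randomised else 1
    for domain, levels in downgrades.items():
        if domain not in DOMAINS:
            raise ValueError(f"unknown GRADE domain: {domain!r}")
        if levels not in (0, 1, 2):
            raise ValueError("each domain is rated down by 0, 1 or 2 levels")
        score -= levels
    if large_effect not in (0, 1, 2):
        raise ValueError("a large effect rates up by 0, 1 or 2 levels")
    score += large_effect + int(dose_response) + int(plausible_confounding)
    # Clamp to the four available certainty levels
    return LEVELS[max(0, min(len(LEVELS) - 1, score))]


if __name__ == "__main__":
    # Hypothetical: randomised evidence with serious risk of bias and imprecision
    print(grade_certainty(True, {"risk of bias": 1, "imprecision": 1}))
```

In this example, two single-level downgrades from a "high" starting point give "low" certainty.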

Discussion and conclusions

Where appropriate, the discussion and conclusions sections of a systematic review with meta-analysis should include (1) a summary of the overall findings, (2) a discussion of how the findings compare with previous research on the topic, (3) the potential clinical, public health and policy implications of the findings, (4) directions for future research with respect to both the reporting of future studies on the topic and additional studies that might be needed, for example, on the dose–response effects of vitamin D on bone mineral density, and (5) the strengths and potential limitations of one’s systematic review with meta-analysis. With respect to the latter, one of the inherent limitations of any AD systematic review with meta-analysis is the potential for ecological fallacy(Reference Rucker and Schumacher125). The PRISMA guidelines provide further detail regarding items to include in the discussion and conclusion sections of a systematic review with meta-analysis(Reference Liberati, Altman and Tetzlaff30).

With respect to interpretation on the part of the consumer, the results of a systematic review with meta-analysis should be considered, broadly, with respect to several potential factors. First and foremost, were any statistically significant findings also practically important? Second, were the included studies representative of the populations, exposures and outcomes of interest and deemed to be important? Third, do the potential benefits outweigh the risks involved? Fourth, is the evidence considered to be strong?

Finally, meta-analysis, like many fields today, is progressing at a rapid pace. As a result, it is very difficult for general statisticians, biostatisticians and other relevant professionals to stay current unless they have a specific and ongoing focus on this burgeoning field. Given this, we strongly recommend that both a content expert and a meta-analytic expert be included in any meta-analysis that is conducted.

Conclusion

The number of systematic reviews, with or without meta-analysis, is increasing in the field of nutrition. The purpose of this article was to provide producers, reviewers and consumers of these important reviews with a non-technical introduction, with a focus on nutrition. It is hoped that this information will be helpful to all three groups in the field of nutrition.

Acknowledgements

No funding was received for this work.

G. A. K. was responsible for the conception and design, acquisition of data, analysis and interpretation of data, drafting the initial manuscript and revising it critically for important intellectual content. K. S. K. was responsible for the conception and design, acquisition of data, drafting the initial manuscript and revising all drafts critically for important intellectual content. Both authors read and approved the final manuscript.

There are no conflicts of interest.

Patient consent

Not required.

Data sharing statement

All data are available upon request from the corresponding author.

References

Sacks, HS, Berrier, J, Reitman, D, et al. (1987) Meta-analysis of randomized controlled trials. N Engl J Med 316, 450–455.
Zhang, Y, Akl, EA & Schunemann, HJ (2018) Using systematic reviews in guideline development: the GRADE approach. Res Synth Methods (epublication ahead of print version 14 July 2018).
Wasserstein, RL, Schirm, AL & Lazar, NA (2019) Moving to a world beyond “p < 0.05”. Am Stat 73, 1–19.
Daudt, HM, van Mossel, C & Scott, SJ (2013) Enhancing the scoping study methodology: a large, inter-professional team’s experience with Arksey and O’Malley’s framework. BMC Med Res Methodol 13, 48.
Amouzandeh, C, Fingland, D & Vidgen, HA (2019) A scoping review of the validity, reliability and conceptual alignment of food literacy measures for adults. Nutrients 11, E801.
Higgins, JPT & Green, S (editors) (2011) Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration. www.cochrane-handbook.org
Tricco, AC, Lillie, E, Zarin, W, et al. (2018) PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation: the PRISMA-ScR statement. Ann Intern Med 169, 467–473.
Agostoni, C, Guz-Mark, A, Marderfeld, L, et al. (2019) The long-term effects of dietary nutrient intakes during the first 2 years of life in healthy infants from developed countries: an umbrella review. Adv Nutr 10, 489–501.
Ioannidis, JPA (2016) The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses. Milbank Q 94, 485–514.
Kelley, GA & Kelley, KS (2018) Systematic reviews and cancer research: a suggested stepwise approach. BMC Cancer 18, 9.
Cochrane (2016) Editorial and publishing policy resource. http://community.cochrane.org/editorial-and-publishing-policy-resource (accessed November 2017).
Shojania, KG, Sampson, M, Ansari, MT, et al. (2007) Updating Systematic Reviews: Technical Review No. 16. Rockville, MD: Agency for Healthcare Research and Quality.
Garner, P, Hopewell, S, Chandler, J, et al. (2016) When and how to update systematic reviews: consensus and checklist. BMJ 354, i3507.
Shea, BJ, Reeves, BC, Wells, G, et al. (2017) AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 358, j4008.
Whiting, P, Savovic, J, Higgins, JP, et al. (2016) ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol 69, 225–234.
Guyatt, G, Oxman, AD, Akl, EA, et al. (2011) GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol 64, 383–394.
Whiting, PF, Rutjes, AW, Westwood, ME, et al. (2011) QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 155, 529–536.
McKenzie, JE & Brennan, SE (2017) Overviews of systematic reviews: great promise, greater challenge. Syst Rev 6, 185.
Lunny, C, Brennan, SE, McDonald, S, et al. (2017) Toward a comprehensive evidence map of overview of systematic review methods: paper 1-purpose, eligibility, search and data extraction. Syst Rev 6, 231.
Lunny, C, Brennan, SE, McDonald, S, et al. (2018) Toward a comprehensive evidence map of overview of systematic review methods: paper 2-risk of bias assessment; synthesis, presentation and summary of the findings; and assessment of the certainty of the evidence. Syst Rev 7, 31.
Ballard, M & Montgomery, P (2017) Risk of bias in overviews of reviews: a scoping review of methodological guidance and four-item checklist. Res Synth Methods 8, 92–108.
Gates, A, Gates, M, Duarte, G, et al. (2018) Evaluation of the reliability, usability, and applicability of AMSTAR, AMSTAR 2, and ROBIS: protocol for a descriptive analytic study. Syst Rev 7, 85.
Pieper, D, Waltering, A, Holstiege, J, et al. (2018) Quality ratings of reviews in overviews: a comparison of reviews with and without dual (co-)authorship. Syst Rev 7, 63.
Hunt, H, Pollock, A, Campbell, P, et al. (2018) An introduction to overviews of reviews: planning a relevant research question and objective for an overview. Syst Rev 7, 39.
Fusar-Poli, P & Radua, J (2018) Ten simple rules for conducting umbrella reviews. Evid Based Ment Health 21, 95–100.
Pollock, A, Campbell, P, Brunton, G, et al. (2017) Selecting and implementing overview methods: implications from five exemplar overviews. Syst Rev 6, 145.
Pieper, D, Pollock, M, Fernandes, RM, et al. (2017) Epidemiology and reporting characteristics of overviews of reviews of healthcare interventions published 2012–2016: protocol for a systematic review. Syst Rev 6, 73.
Pollock, M, Fernandes, RM, Becker, LA, et al. (2016) What guidance is available for researchers conducting overviews of reviews of healthcare interventions? A scoping review and qualitative metasummary. Syst Rev 5, 190.
Calder, PC, Campoy, C, Eilander, A, et al. (2019) A systematic review of the effects of increasing arachidonic acid intake on PUFA status, metabolism and health-related outcomes in humans. Br J Nutr 121, 1201–1214.
Liberati, A, Altman, DG, Tetzlaff, J, et al. (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med 151, W65–W94.
Zhang, S, Zhang, F, Du, M, et al. (2019) Efficacy and safety of iron supplementation in patients with heart failure and iron deficiency: a meta-analysis. Br J Nutr 121, 841–848.
Dekkers, OM, Vandenbroucke, JP, Cevallos, M, et al. (2019) COSMOS-E: guidance on conducting systematic reviews and meta-analyses of observational studies of etiology. PLoS Med 16, e1002742.
Gaksch, M, Jorde, R, Grimnes, G, et al. (2017) Vitamin D and mortality: individual participant data meta-analysis of standardized 25-hydroxyvitamin D in 26916 individuals from a European consortium. PLOS ONE 12, e0170791.
Riley, RD, Lambert, PC & Abo-Zaid, G (2010) Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ 340, c221.
Riley, RD (2010) Commentary: like it and lump it? Meta-analysis using individual participant data. Int J Epidemiol 39, 1359–1361.
Kelley, GA, Kelley, KS & Tran, ZV (2002) Retrieval of individual patient data for an exercise meta-analysis. Am J Med Sport 4, 350–354.
Riley, RD, Simmonds, MC & Look, MP (2007) Evidence synthesis combining individual patient data and aggregate data: a systematic review identified current practice and possible methods. J Clin Epidemiol 60, 431–439.
Kelley, GA & Kelley, KS (2016) Retrieval of individual participant data for exercise meta-analyses may not be worth the time and effort. Biomed Res Int 2016, 5059041.
Polanin, JR (2018) Efforts to retrieve individual participant data sets for use in a meta-analysis result in moderate data sharing but many data sets remain missing. J Clin Epidemiol 98, 157–159.
Riley, RD, Lambert, PC, Staessen, JA, et al. (2008) Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat Med 27, 1870–1893.
Steinberg, KK, Smith, SJ, Stroup, DF, et al. (1997) Comparison of effect size estimates from a meta-analysis of summary data from published studies and from a meta-analysis using individual patient data for ovarian cancer studies. Am J Epidemiol 145, 917–925.
Cooper, H & Patall, EA (2009) The relative benefits of meta-analysis conducted with individual participant data versus aggregated data. Psychol Methods 14, 165–176.
Olkin, I & Sampson, A (1998) Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics 54, 317–322.
Mathew, T & Nordstrom, K (1999) On the equivalence of meta-analysis using literature and using individual patient data. Biometrics 55, 1221–1223.
Tudur Smith, C, Marcucci, M, Nolan, SJ, et al. (2016) Individual participant data meta-analyses compared with meta-analyses based on aggregate data. Cochrane Database Syst Rev, issue 9, MR000007.
Smelt, AF, Gussekloo, J, Bermingham, LW, et al. (2018) The effect of vitamin B12 and folic acid supplementation on routine haematological parameters in older people: an individual participant data meta-analysis. Eur J Clin Nutr 72, 785–795.
Stewart, LA, Clarke, M, Rovers, M, et al. (2015) Preferred reporting items for systematic review and meta-analyses of individual participant data: the PRISMA-IPD Statement. JAMA 313, 1657–1665.
Tierney, JF, Vale, C, Riley, R, et al. (2015) Individual participant data (IPD) meta-analyses of randomised controlled trials: guidance on their use. PLoS Med 12, e1001855.
Schwingshackl, L, Buyken, A & Chaimani, A (2019) Network meta-analysis reaches nutrition research. Eur J Nutr 58, 1–3.
Galaviz, KI, Weber, MB, Straus, A, et al. (2018) Global diabetes prevention interventions: a systematic review and network meta-analysis of the real-world impact on incidence, weight, and glucose. Diabetes Care 41, 1526–1534.
Hutton, B, Salanti, G, Caldwell, DM, et al. (2015) The PRISMA extension statement for reporting of systematic reviews incorporating network meta-analyses of health care interventions: checklist and explanations. Ann Intern Med 162, 777–784.
Laws, A, Kendall, R & Hawkins, N (2014) A comparison of national guidelines for network meta-analysis. Value Health 17, 642–654.
Rouse, B, Chaimani, A & Li, TJ (2017) Network meta-analysis: an introduction for clinicians. Intern Emerg Med 12, 103–111.
Riley, RD, Jackson, D, Salanti, G, et al. (2017) Multivariate and network meta-analysis of multiple outcomes and multiple treatments: rationale, concepts, and examples. BMJ 358, j3932.
Doi, SAR & Barendregt, JJ (2018) A generalized pairwise modelling framework for network meta-analysis. Int J Evid Based Healthc 16, 187–194.
Brittain, EH, Fay, MP & Follmann, DA (2012) A valid formulation of the analysis of noninferiority trials under random effects meta-analysis. Biostatistics 13, 637–649.
Schmidli, H, Wandel, S & Neuenschwander, B (2013) The network meta-analytic-predictive approach to non-inferiority trials. Stat Methods Med Res 22, 219–240.
Acuna, SA, Chesney, TR, Ramjist, JK, et al. (2019) Laparoscopic versus open resection for rectal cancer: a noninferiority meta-analysis of quality of surgical resection outcomes. Ann Surg 269, 849–855.
Acuna, SA, Chesney, TR, Amarasekera, ST, et al. (2018) Defining non-inferiority margins for quality of surgical resection for rectal cancer: a Delphi consensus study. Ann Surg Oncol 25, 3171–3178.
Liberati, A & D’Amico, R (2010) Commentary: the debate on non-inferiority trials: ‘when meta-analysis alone is not helpful’. Int J Epidemiol 39, 1582–1583.
Beller, EM, Glasziou, PP, Altman, DG, et al. (2013) PRISMA for abstracts: reporting systematic reviews in journal and conference abstracts. PLoS Med 10, e1001419.
Saint, S, Christakis, DA, Saha, S, et al. (2000) Journal reading habits of internists. J Gen Intern Med 15, 881–884.
Yamamoto, JM, Kellett, JE, Balsells, M, et al. (2018) Gestational diabetes mellitus and diet: a systematic review and meta-analysis of randomized controlled trials examining the impact of modified dietary interventions on maternal glucose control and neonatal birth weight. Diabetes Care 41, 1346–1361.
Page, MJ, Shamseer, L & Tricco, AC (2018) Registration of systematic reviews in PROSPERO: 30,000 records and counting. Syst Rev 7, 32.
Stewart, L, Moher, D & Shekelle, P (2012) Why prospective registration of systematic reviews makes sense. Syst Rev 1, 7.
Asghari, G, Farhadnejad, H, Hosseinpanah, F, et al. (2018) Effect of vitamin D supplementation on serum 25-hydroxyvitamin D concentration in children and adolescents: a systematic review and meta-analysis protocol. BMJ Open 8, e021636.
Shamseer, L, Moher, D, Clarke, M, et al. (2015) Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ 349, g7647.
Denova-Gutierrez, E, Mendez-Sanchez, L, Munoz-Aguirre, P, et al. (2018) Dietary patterns, bone mineral density, and risk of fractures: a systematic review and meta-analysis. Nutrients 10, E1922.
Hidayat, K, Chen, GC, Zhang, R, et al. (2016) Calcium intake and breast cancer risk: meta-analysis of prospective cohort studies. Br J Nutr 116, 158–166.
van Driel, ML, De Sutter, A, De Maeseneer, J, et al. (2009) Searching for unpublished trials in Cochrane reviews may not be worth the effort. J Clin Epidemiol 62, 838–844.
Bramer, WM, Rethlefsen, ML, Kleijnen, J, et al. (2017) Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study. Syst Rev 6, 245.
Vine, R (2006) Google Scholar. J Med Libr Assoc 94, 97–99.
Burnham, JF (2006) Scopus database: a review. Biomed Digit Libr 3, 1.
Cohen, J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70, 213–220.
Pedder, H, Sarri, G, Keeney, E, et al. (2016) Data extraction for complex meta-analysis (DECiMAL) guide. Syst Rev 5, 212.
Sanderson, S, Tatt, ID & Higgins, JP (2007) Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol 36, 666–676.
Seehra, J, Pandis, N, Koletsi, D, et al. (2016) Use of quality assessment tools in systematic reviews was varied and inconsistent. J Clin Epidemiol 69, 179–184.
Higgins, JPT, Sterne, JAC, Savović, J, et al. (2016) A revised tool for assessing risk of bias in randomized trials. Cochrane Database Syst Rev 10, Suppl. 1, 29–31.
Sterne, JA, Hernán, MA, Reeves, BC, et al. (2016) ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355, i4919.
Firth, J, Marx, W, Dash, S, et al. (2019) The effects of dietary improvement on symptoms of depression and anxiety: a meta-analysis of randomized controlled trials. Psychosom Med 81, 265–280.
Borenstein, M, Hedges, L, Higgins, J, et al. (2009) Introduction to Meta-analysis. Chichester, West Sussex: John Wiley & Sons.
Sacks, HS, Chalmers, TC, Smith, H (1982) Randomized versus historical controls for clinical trials. Am J Med 72, 233–240.
Schulz, KF, Chalmers, I, Hayes, R, et al. (1995) Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. J Am Med Assoc 273, 408–412.
DerSimonian, R & Kacker, R (2007) Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials 28, 105–114.
Dersimonian, R & Laird, N (1986) Meta-analysis in clinical trials. Control Clin Trials 7, 177–188.
Dersimonian, R & Laird, N (2015) Meta-analysis in clinical trials revisited. Contemp Clin Trials 45, 139–145.
Biggerstaff, BJ & Tweedie, RL (1997) Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Stat Med 16, 753–768.
Sidik, K & Jonkman, JN (2002) A simple confidence interval for meta-analysis. Stat Med 21, 3153–3159.
Sidik, K & Jonkman, JN (2007) A comparison of heterogeneity variance estimators in combining results of studies. Stat Med 26, 1964–1981.
Zeng, D & Lin, DY (2015) On random-effects meta-analysis. Biometrika 102, 281–294.
Poole, C & Greenland, S (1999) Random-effects meta-analyses are not always conservative. Am J Epidemiol 150, 469–475.
Doi, SA, Barendregt, JJ, Khan, S, et al. (2015) Advances in the meta-analysis of heterogeneous clinical trials I: the inverse variance heterogeneity model. Contemp Clin Trials 45, 130–138.
Doi, SA, Barendregt, JJ, Khan, S, et al. (2015) Advances in the meta-analysis of heterogeneous clinical trials II: the quality effects model. Contemp Clin Trials 45, 123–129.
Doi, SAR, Furuya-Kanamori, L, Thalib, L, et al. (2017) Meta-analysis in evidence-based healthcare: a paradigm shift away from random effects is overdue. Int J Evid Based Healthc 15, 152–160.
Barendregt, JJ & Doi, SA (2016) Meta XL, 5.3 ed. Queensland, Australia: EpiGear International Pty Ltd.
Amrhein, V, Greenland, S & McShane, B (2019) Scientists rise up against statistical significance. Nature 567, 305–307.
Higgins, JP, Thompson, SG & Spiegelhalter, DJ (2009) A re-evaluation of random-effects meta-analysis. J R Stat Soc Series A 172, 137–159.
Partlett, C & Riley, RD (2017) Random effects meta-analysis: coverage performance of 95% confidence and prediction intervals following REML estimation. Stat Med 36, 301–317.
Kriston, L (2013) Dealing with clinical heterogeneity in meta-analysis. Assumptions, methods, interpretation. Int J Methods Psychiatr Res 22, 1–15.
Cariolou, M, Cupp, MA, Evangelou, E, et al. (2019) Importance of vitamin D in acute and critically ill children with subgroup analyses of sepsis and respiratory tract infections: a systematic review and meta-analysis. BMJ Open 9, e027666.
Cochran, WG (1954) The combination of estimates from different experiments. Biometrics 10, 101–129.
Higgins, JPT, Thompson, SG, Deeks, JJ, et al. (2003) Measuring inconsistency in meta-analyses. BMJ 327, 557–560.
Ioannidis, JP, Patsopoulos, NA & Evangelou, E (2007) Uncertainty in heterogeneity estimates in meta-analyses. BMJ 335, 914–916.
Kelley, GA, Kelley, KS, Roberts, S, et al. (2012) Comparison of aerobic exercise, diet or both on lipids and lipoproteins in adults: a meta-analysis of randomized controlled trials. Clin Nutr 31, 156–167.
Egger, M, Davey Smith, G, Schneider, M, et al. (1997) Bias in meta-analysis detected by a simple graphical test. BMJ 315, 629–634.
Sterne, JAC, Gavaghan, D & Egger, M (2000) Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 53, 1119–1129.
Furuya-Kanamori, L, Barendregt, JJ & Doi, SAR (2018) A new improved graphical and quantitative method for detecting bias in meta-analysis. Int J Evid Based Healthc 16, 195–203.
Lau, J, Ioannidis, JP, Terrin, N, et al. (2006) The case of the misleading funnel plot. BMJ 333, 597–600.
Sterne, JA, Sutton, AJ, Ioannidis, JP, et al. (2011) Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 343, d4002.
Mallett, S & Clarke, M (2002) The typical Cochrane review. How many trials? How many participants? Int J Technol Assess Health Care 18, 820–823.
Clarke, M, Brice, A & Chalmers, I (2014) Accumulating research: a systematic account of how cumulative meta-analyses would have provided knowledge, improved health, reduced harm and saved resources. PLOS ONE 9, e102670.
Xu, C & Doi, SAR (2018) The robust error meta-regression method for dose-response meta-analysis. Int J Evid Based Healthc 16, 138–144.
Lopez-Lopez, JA, Van den Noortgate, W, Tanner-Smith, EE, et al. (2017) Assessing meta-regression methods for examining moderator relationships with dependent effect sizes: a Monte Carlo simulation. Res Synth Methods 8, 435–450.
Higgins, J, Thompson, S, Deeks, J, et al. (2002) Statistical heterogeneity in systematic reviews of clinical trials: a critical appraisal of guidelines and practice. J Health Serv Res Policy 7, 51–61.
Fu, R, Gartlehner, G, Grant, M, et al. (2011) Conducting quantitative synthesis when comparing medical interventions: AHRQ and the Effective Health Care Program. J Clin Epidemiol 64, 1187–1197.
Morze, J, Schwedhelm, C, Bencic, A, et al. (2019) Chocolate and risk of chronic disease: a systematic review and dose–response meta-analysis. Eur J Nutr (epublication ahead of print version 25 February 2019).
Greenland, S & Longnecker, MP (1992) Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. Am J Epidemiol 135, 1301–1309.
Hartemink, N, Boshuizen, HC, Nagelkerke, NJ, et al. (2006) Combining risk estimates from observational studies with different exposure cutpoints: a meta-analysis on body mass index and diabetes type 2. Am J Epidemiol 163, 1042–1052.
da Costa, BR, Rutjes, AW, Johnston, BC, et al. (2012) Methods to convert continuous outcomes into odds ratios of treatment response and numbers needed to treat: meta-epidemiological study. Int J Epidemiol 41, 1445–1459.
Cohen, J (1988) Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
Hasselblad, V & Hedges, LV (1995) Meta-analysis of screening and diagnostic tests. Psychol Bull 117, 167–178.
Stamler, J, Rose, G, Stamler, R, et al. (1989) INTERSALT study findings. Public health and medical care implications. Hypertension 14, 570–577.
Guyatt, GH, Oxman, AD, Vist, GE, et al. (2008) GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 336, 924–926.
Baranski, M, Srednicka-Tober, D, Volakakis, N, et al. (2014) Higher antioxidant and lower cadmium concentrations and lower incidence of pesticide residues in organically grown crops: a systematic literature review and meta-analyses. Br J Nutr 112, 794–811.
Rucker, G & Schumacher, M (2008) Simpson’s paradox visualized: the example of the rosiglitazone meta-analysis. BMC Med Res Methodol 8, 34.

Table 1. Types of systematic reviews

Fig. 1. Suggested stepwise approach for deciding whether a new or updated systematic review, with or without meta-analysis, should be conducted. Adapted from Kelley & Kelley(10). SRPSR, systematic reviews of previous systematic reviews.

Fig. 2. Forest plot example of diet-induced changes in total cholesterol (TC) in adults based on the inverse variance heterogeneity (IVhet) model. The black squares represent mean changes in TC from each study, while the left and right extremes of the squares represent the corresponding 95 % CI, that is, compatibility intervals for the mean changes. The middle of the black diamond represents the pooled mean change in TC, while the left and right extremes of the diamond represent the corresponding 95 % CI of the pooled mean change. The vertical dashed line represents the pooled mean change in TC, while the solid vertical line represents zero (0) effect. As can be seen, the pooled 95 % CI did not include zero (0), suggesting that the data are compatible with an association between diet and reductions in TC. The results for Cochran’s Q statistic, the P value for Q and I² suggest a lack of heterogeneity and inconsistency. ES, effect size (change in TC, mmol/l); % weight, percentage weight attributed by each study to the overall pooled mean effect. Results were similar when the two results by Stefanick et al. were pooled into one overall ES. Data adapted from Kelley et al.(104).
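The IVhet pooling behind a forest plot such as Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used to produce the figure: it assumes the usual IVhet formulation, in which inverse variance (fixed-effect) weights give the point estimate and a DerSimonian-Laird τ² is added to each study's variance only when computing the variance of the pooled estimate. The function name `ivhet_pool` and the five-study data set are hypothetical.

```python
import math
from statistics import NormalDist


def ivhet_pool(effects, ses, level=0.95):
    """Pool study effects with an IVhet-style model (sketch).

    Assumes inverse variance (fixed-effect) weights for the point estimate
    and a DerSimonian-Laird tau^2 added to each study's variance when the
    variance of the pooled estimate is computed.
    """
    k = len(effects)
    variances = [se ** 2 for se in ses]
    w = [1.0 / v for v in variances]            # inverse variance weights
    w_sum = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / w_sum

    # Cochran's Q, I^2 (%) and the DerSimonian-Laird tau^2
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Weights stay fixed; each study's variance is inflated by tau^2.
    # With tau^2 = 0 this reduces to the fixed-effect variance 1/sum(w).
    var = sum((wi / w_sum) ** 2 * (vi + tau2) for wi, vi in zip(w, variances))
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    return {
        "pooled": pooled,
        "ci": (pooled - half, pooled + half),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
        "weight_pct": [100 * wi / w_sum for wi in w],  # the "% weight" column
    }


# Hypothetical mean changes in TC (mmol/l) and SEs from five imaginary
# studies; these are NOT the Kelley et al. data shown in Fig. 2.
effects = [-0.30, -0.20, -0.25, -0.15, -0.28]
ses = [0.10, 0.08, 0.12, 0.09, 0.11]
res = ivhet_pool(effects, ses)
lo, hi = res["ci"]
print(f"pooled = {res['pooled']:.3f} mmol/l, 95% CI {lo:.3f} to {hi:.3f}")
print(f"Q = {res['Q']:.2f}, I2 = {res['I2']:.1f}%, tau2 = {res['tau2']:.4f}")
```

Because τ² never changes the weights in this formulation, the IVhet point estimate equals the fixed-effect estimate and heterogeneity only widens the CI. With these homogeneous toy data, Q falls below its degrees of freedom, so τ² and I² are both truncated to zero.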

Fig. 3. Example of funnel plot based on diet-induced changes in total cholesterol (TC) following a dietary intervention. The solid vertical line represents the overall pooled mean change in TC in mmol/l after a dietary intervention. The x-axis represents the change in TC (mmol/l) from each study, while the y-axis represents the inverse of the standard error (precision) of that change. Each dot represents the change in TC from one study plotted against its precision. In the absence of small-study effects, the plot should resemble a pyramid or inverted funnel, with scatter due to sampling variation. In the presence of potential small-study effects, results from smaller, less precise studies with small or null findings may be missing from one side of the lower part of the plot, making it asymmetrical. While the plot is difficult to interpret, especially given the small number of effect estimates, there do not appear to be any small-study effects. Results were similar when the two results by Stefanick et al. were pooled into one overall effect size. Data adapted from Kelley et al.(104).
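The axes of a funnel plot such as Fig. 3 can be reproduced with a short sketch that returns each study's (effect, precision) point together with pseudo 95 % confidence limits around the pooled line; those limits trace the inverted funnel described in the caption. The function `funnel_coordinates` and the data are hypothetical, the pooled line is assumed to be the inverse variance weighted mean, and drawing the plot is left to whatever graphics library is at hand.

```python
from statistics import NormalDist


def funnel_coordinates(effects, ses, pooled, level=0.95):
    """Return funnel-plot points and pseudo confidence limits (sketch).

    x = study effect, y = precision (1/SE). For each precision, the pseudo
    limits pooled +/- z*SE outline the inverted funnel expected when only
    sampling variation (no heterogeneity or bias) is at work.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    points = [(y, 1.0 / se) for y, se in zip(effects, ses)]
    limits = [(1.0 / se, pooled - z * se, pooled + z * se) for se in ses]
    return points, limits


# Hypothetical data (not the Kelley et al. studies); the inverse variance
# weighted mean is used as the pooled vertical line.
effects = [-0.30, -0.20, -0.25, -0.15, -0.28]
ses = [0.10, 0.08, 0.12, 0.09, 0.11]
w = [1 / se ** 2 for se in ses]
pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
points, limits = funnel_coordinates(effects, ses, pooled)
for (y, prec), (_, lo, hi) in zip(points, limits):
    flag = "inside" if lo <= y <= hi else "outside"
    print(f"effect {y:+.2f}  precision {prec:5.1f}  "
          f"pseudo-limits {lo:+.3f} to {hi:+.3f}  {flag}")
```

The pseudo limits are widest for the least precise studies at the bottom of the plot and narrow towards the top, which is why a gap on one side of the lower region is the visual cue for small-study effects.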

Fig. 4. Example of Doi plot based on diet-induced changes in total cholesterol (TC) following a dietary intervention. The vertical line on the horizontal (x) axis marks the effect size (ES) with the lowest absolute z score, dividing the plot into two regions whose areas should be approximately equal in the absence of asymmetry. Visualisation of the plot suggests no asymmetry and thus no evidence of small-study effects such as publication bias. The obtained Luis Furuya-Kanamori (LFK) index of 0·30 also suggests no asymmetry, since values within ±1 are considered to represent no asymmetry. Results were similar when the two results by Stefanick et al. were pooled into one overall ES. Data adapted from Kelley et al.(104).

Fig. 5. Influence analysis based on the inverse variance heterogeneity model, with each result deleted from the overall analysis once. The black squares represent mean changes in total cholesterol (TC) with the corresponding study deleted from the model, while the left and right extremes of the squares represent the corresponding 95 % CI for the mean changes. As can be seen, changes ranged from –0·21 to –0·28 mmol/l, and none of the 95 % CI overlapped zero. These findings suggest that no single result had a substantial influence on the overall findings. Results were similar when the two results by Stefanick et al. were pooled into one overall effect size (ES). Data adapted from Kelley et al.(104).
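The leave-one-out influence analysis summarised in Fig. 5 amounts to re-pooling the data k times, dropping one study each time. The sketch below assumes an IVhet-style model (inverse variance weights for the point estimate, with a DerSimonian-Laird τ² added to each study's variance when computing the pooled variance); the helper names `ivhet` and `leave_one_out` and the data are hypothetical.

```python
import math
from statistics import NormalDist


def ivhet(effects, ses, level=0.95):
    """IVhet-style pooled mean and CI (sketch, assumptions as stated above)."""
    v = [se ** 2 for se in ses]
    w = [1.0 / vi for vi in v]
    w_sum = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / w_sum
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0   # DerSimonian-Laird
    var = sum((wi / w_sum) ** 2 * (vi + tau2) for wi, vi in zip(w, v))
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    return pooled, (pooled - half, pooled + half)


def leave_one_out(labels, effects, ses):
    """Re-pool the data once per study, omitting that study each time."""
    out = []
    for i, label in enumerate(labels):
        rest_y = effects[:i] + effects[i + 1:]
        rest_se = ses[:i] + ses[i + 1:]
        pooled, ci = ivhet(rest_y, rest_se)
        out.append((label, pooled, ci))
    return out


# Hypothetical studies (not the Kelley et al. data).
labels = ["Study A", "Study B", "Study C", "Study D", "Study E"]
effects = [-0.30, -0.20, -0.25, -0.15, -0.28]
ses = [0.10, 0.08, 0.12, 0.09, 0.11]
results = leave_one_out(labels, effects, ses)
for label, pooled, (lo, hi) in results:
    print(f"{label} omitted: {pooled:+.3f} ({lo:+.3f} to {hi:+.3f})")
excludes_zero = all(hi < 0 or lo > 0 for _, _, (lo, hi) in results)
print("every 95% CI excludes zero:", excludes_zero)
```

The final check mirrors the reading of an influence plot: if every leave-one-out 95 % CI stays on the same side of zero, no single study drives the overall conclusion.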

Fig. 6. Cumulative meta-analysis ranked by year and based on the inverse variance heterogeneity model. The black circles represent the pooled mean changes in total cholesterol (TC) after each study is added to all earlier studies, while the left and right extremes of the circles represent the corresponding 95 % CI for the pooled mean changes. As can be seen, pooled 95 % CI that did not overlap zero have been observed since 1998. Results were similar when the two results by Stefanick et al. were pooled into one overall effect size (ES). Data adapted from Kelley et al.(104).
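A cumulative meta-analysis like the one in Fig. 6 can be sketched by sorting studies by year and re-pooling after each addition. This assumes an IVhet-style model (inverse variance weights, with a DerSimonian-Laird τ² added to each study's variance when computing the pooled variance); with a single study τ² cannot be estimated and is set to zero. The function names and the (year, effect, SE) data are hypothetical.

```python
import math
from statistics import NormalDist


def ivhet(effects, ses, level=0.95):
    """IVhet-style pooled mean and CI (sketch, assumptions as stated above)."""
    v = [se ** 2 for se in ses]
    w = [1.0 / vi for vi in v]
    w_sum = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / w_sum
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = w_sum - sum(wi ** 2 for wi in w) / w_sum
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0   # 0 for a single study
    var = sum((wi / w_sum) ** 2 * (vi + tau2) for wi, vi in zip(w, v))
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    return pooled, (pooled - half, pooled + half)


def cumulative_meta(studies):
    """studies: (year, effect, se) tuples; re-pool after each, in year order."""
    ordered = sorted(studies, key=lambda s: s[0])
    out = []
    for k in range(1, len(ordered) + 1):
        ys = [s[1] for s in ordered[:k]]
        ses = [s[2] for s in ordered[:k]]
        pooled, ci = ivhet(ys, ses)
        out.append((ordered[k - 1][0], pooled, ci))
    return out


# Hypothetical (year, mean change in TC in mmol/l, SE) tuples.
studies = [(2003, -0.15, 0.09), (1990, -0.30, 0.10), (2010, -0.28, 0.11),
           (1995, -0.20, 0.08), (1998, -0.25, 0.12)]
for year, pooled, (lo, hi) in cumulative_meta(studies):
    note = "excludes 0" if hi < 0 or lo > 0 else "includes 0"
    print(f"through {year}: {pooled:+.3f} ({lo:+.3f} to {hi:+.3f}) {note}")
```

Reading down the output shows the first year from which the pooled CI excludes zero, which is the kind of statement made about 1998 in a cumulative plot.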