Effectiveness of stop smoking interventions among adults: protocol for an overview of systematic reviews and an updated systematic review | Systematic Reviews

The evidence review will be completed by the Evidence Review and Synthesis Centre (ERSC) at the Ottawa Hospital Research Institute. A working group (WG) of Task Force members and external content experts was formed for development of the topic, refinement of the key questions and scope, and rating of outcomes. Outcomes were rated on a scale of 1 to 9 according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) methodology; those rated as critical (mean score 7 to 9) and important (mean score 4 to 6) for decision-making were selected. Patients identified through patient engagement activities conducted by the St. Michael’s Hospital Knowledge Translation Program have also rated the outcomes. The process of incorporating patient priorities is described in the CTFPHC’s Patient Engagement Protocol (https://canadiantaskforce.ca/methods/patient-preferences-protocol/).

Reporting of this protocol was guided by the PRISMA Statement for Protocols (PRISMA-P) to the extent possible and where appropriate [76] (Additional file 1). The protocol is registered in PROSPERO (https://www.crd.york.ac.uk/PROSPERO/) (CRD42018099691, CRD42018099692). The final overview will be reported using the Preferred Reporting Items for Overviews of systematic reviews including harms pilot checklist (PRIO-harms) [77], and the updated systematic review will be reported using PRISMA [78].

A team of clinical and content experts will be consulted at key points during the conduct of the evidence review. Amendments to this protocol will be noted in the final report.

Stage 1: Overview of systematic reviews of stop smoking interventions

Guidelines for the conduct of overviews of reviews are currently lacking [79]. Given this current gap, the methodology for this overview will be guided by the Cochrane Handbook of Systematic Reviews of Interventions (Chapter 22) [80] as well as other available reports on overview methodology [79, 81,82,83,84,85].

Literature search

The search strategy will be developed and tested through an iterative process by an experienced medical information specialist in consultation with the review team. We will search Ovid MEDLINE®, Ovid MEDLINE® Epub Ahead of Print, In-Process & Other Non-Indexed Citations, PsycINFO, Embase Classic + Embase, and the Cochrane Library on Wiley. Databases will be searched from 2008 to the current date. The draft search strategy can be found in Additional file 2. The search strategy will be peer-reviewed using the PRESS 2015 guideline [86]. Results of the PRESS reviews will be provided in an appendix in the final report.

We will search for unpublished literature and reports of ongoing and completed reports using the Canadian Agency for Drugs and Technologies in Health (CADTH) Grey Matters checklist [87] and through searches of the following websites: CADTH, Ontario Tobacco Research Unit, The Canadian Partnership Against Cancer (cancerview.ca), SurgeonGeneral.gov, Philip Morris, Foundation for a Smoke-free World, Public Health England, Tobacco.org, Truth Initiative, Physicians for a Smoke-Free Canada, Centers for Disease Control and Prevention Smoking and Health Resource Library, Canadian Cancer Society, American Cancer Society, American Thoracic Society, US National Cancer Institute, US National Comprehensive Cancer Network, National Institute for Health and Care Excellence, World Health Organization Framework Convention on Tobacco Control, World Health Organization’s International Clinical Trials Registry Platform, OpenTrials.net, International Prevention Research Institute, North American Quitline Consortium website, and the Ottawa Heart Institute’s Ottawa Model for Smoking Cessation. We will also scan the bibliographies of relevant reviews and other identified overviews for grey literature and references not identified in our database search. Grey literature searching will be restricted to English and French language documents and will be limited to what can be completed within 1 week by one reviewer.

Eligibility criteria

KQ1a and KQ1b will examine interventions that can be delivered or referred to in the primary care setting. This includes certain behavioural change interventions, pharmacotherapies, e-cigarettes, exercise interventions, and alternative therapies (Table 1). Interventions that cannot be delivered or referred to by a wide variety of primary care practitioners (e.g. quit-to-win contests, biomedical risk assessment, aversive smoking, incentivized cessation) as well as specific behavioural counselling techniques (e.g. motivational interviewing, stage of change-based counselling) which require specialized training that has been shown to vary [88] and may not be readily available to all primary care practitioners will be excluded. We will also exclude reviews on broader public health interventions (e.g. mass media, taxation, packaging restrictions) as well as those on broad lifestyle interventions not specific to tobacco smoking behaviour and that do not attempt to isolate for the effect of our included interventions (i.e. when delivered as part of a multifaceted lifestyle intervention). Generally, pharmacotherapies that are not approved by Health Canada as smoking cessation aids (e.g. clonidine, lobeline, anxiolytics, nortriptyline, opioid antagonists, silver acetate, rimonabant) or not available in Canada (e.g. Nicobrevin, Nicobloc, nicotine vaccines, mecamylamine) will be excluded. However, due to their ease of access, an exception will be made for St. John’s Wort (sold in various forms in pharmacies and health stores across Canada), cytisine, and S-adenosylmethionine (SAMe) (licensed natural health products).

Systematic reviews for KQ1a and KQ1b will be selected for inclusion according to the eligibility criteria outlined in Table 1 [89, 90].

In addition to the other interventions listed in Table 1, the intent of KQ1a/b is to capture reviews which examine behavioural change interventions (e.g. practitioner advice, counselling, self-help interventions). These reviews may provide information on the active components of these interventions, referred to as behavioural change techniques. Examples of such techniques include providing information on consequences of smoking, explaining the importance of abrupt cessation, strengthening ex-smoker identity, and receiving prompt commitment from the patient [50]. If there is sufficient data, subgroup analysis by behavioural change technique or clusters of techniques will be performed for KQ1a/b (see the “Subgroup analysis” section).

While the intent of KQ1a/b is to synthesize reviews of behavioural change interventions (these reviews may or may not report the behavioural change techniques used as part of these interventions), the intent of KQ1c is to capture reviews which specifically examine the effectiveness of behavioural change techniques or cluster of techniques. A taxonomy of behavioural change techniques used in smoking cessation interventions will guide the coding of techniques encountered in the literature [50].

Eligibility of reviews for KQ1c will be evaluated in consultation with the WG on a case-by-case basis with selection for inclusion dependent on applicability to the primary care setting. For example, the WG may decide to include behavioural change interventions outside of those listed in Table 2 or may decide to include reviews in specialty settings if the review examines behavioural change techniques that can reasonably be applied in primary care. Selection of reviews for KQ1c will be guided by the eligibility criteria outlined in Table 2. All decisions regarding the selection of reviews will be reported in the completed review.

Study selection

Duplicates will be identified and removed using Reference Manager [91]. Title and abstract and full-text screening will be conducted using an online systematic review managing software, Distiller Systematic Review (DistillerSR) Software© [92]. Two reviewers will independently screen the title and abstracts of citations using the liberal accelerated method (i.e. a second reviewer verifies records excluded by a first reviewer). References will be randomized, and screening will be done concurrently to ensure that each reviewer cannot determine whether a given reference was excluded by another reviewer. The full text of potentially relevant citations will be retrieved, and two reviewers will independently assess the article for relevancy. If unclear whether a review is eligible after duplicate review, a third person will be consulted before excluding the review. Conflicts will be resolved by consensus or by consulting with a third team member. The reasons for exclusion at full-text screening will be documented.
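The liberal accelerated rule described above can be sketched as a small decision function. This is a minimal illustration only and not how DistillerSR implements screening; the function name, the "include"/"exclude" vote labels, and the "pending" state are assumptions introduced here.

```python
def liberal_accelerated_decision(votes):
    """Return the screening status of one citation.

    votes: list of "include" / "exclude" decisions recorded so far by
    different reviewers (hypothetical labels, not DistillerSR's own).
    """
    if "include" in votes:
        # A single reviewer is enough to advance a record to the next level.
        return "advance"
    if votes.count("exclude") >= 2:
        # An exclusion only stands once a second reviewer has verified it.
        return "excluded"
    # No votes yet, or a single exclusion that still awaits verification.
    return "pending"


for votes in (["include"], ["exclude"], ["exclude", "exclude"]):
    print(votes, "->", liberal_accelerated_decision(votes))
```

Under this rule a record can never be dropped on the strength of one reviewer's judgement, which is the safeguard the method is meant to provide.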

Both screening forms will be piloted by reviewers prior to commencement of screening, with adjustments made, as needed, to maximize efficiency. If necessary, articles will be ordered via interlibrary loan. Only those received within 30 days will be included. Exclusions due to unavailability of articles will be noted.

A list of potentially relevant reviews available only in abstract form will be made available, but these studies will not be included in the overview.

Data mapping and overlap detection

Given the proliferation of systematic reviews [81], we anticipate that we will encounter multiple systematic reviews covering the same research question (i.e. population, intervention, comparison, outcomes, time points, and settings). Such reviews are expected to rely on the same evidence base (i.e. same studies and data); therefore, inclusion of these overlapping systematic reviews may potentially bias the overview findings as the same primary studies are counted more than once [93].

While there is currently no optimal approach for addressing the issue of overlapping reviews [79], existing options include the following: (1) limiting inclusion to a single systematic review using a priori established criteria or (2) including all available reviews and computing the degree of overlap [79, 81, 93]. Limiting inclusion to a single systematic review for a given research question may result in missing data, and while inclusion of all available reviews may improve comprehensiveness, it also increases workload and complexity [81].

To detect and address overlapping systematic reviews, we will first map the research questions (i.e. population, intervention, comparator, outcomes, time points, setting) and characteristics (i.e. date of last search, comprehensiveness, and quality) of all eligible systematic reviews. Where there are multiple reviews addressing the same research question, we will compare the review characteristics and exclude those which are “superseded by a later review, or (contain) no additional (studies) compared with a review of similar, or higher, methodological quality” [79, 94]. For example, an up-to-date, high-quality systematic review may report on a single intervention (e.g. acupuncture) while another review, of lower methodological quality and with an older search date, may report on a number of alternative therapies including acupuncture. Although superseded by the former in terms of quality and recency, the latter review captures evidence on additional interventions. Inclusion of both reviews would be necessary to capture all available information on alternative therapies for smoking cessation. In this particular example, we would rely on the former review for data on acupuncture and on the latter for all other interventions (i.e. excluding acupuncture). As described by Pollock et al., the decision to exclude reviews based on these criteria can be a complex process often due to slight differences in research questions [94]. The criteria above will be used as a guide; with the pool of candidate reviews in hand, information will be mapped to facilitate decisions about potential exclusion. Decisions to exclude reviews due to redundancy will be tracked and documented in a table of characteristics of excluded reviews.

In cases where overlapping data cannot be avoided (i.e. overlapping reviews with similar search dates, quality, and comprehensiveness), we will include overlapping reviews and calculate the degree of overlap using the corrected covered area (CCA) [83, 93]. Although reporting the degree of overlap is recommended, it does not minimize or omit potential bias caused by inclusion of overlapping reviews [83, 93]. The CCA is calculated using the formula below, where N is the total number of studies across reviews (including multiple occurrences of the same study), r is the number of unique (first occurrence) studies, and c is the number of reviews.

$$CCA=\frac{N-r}{rc-r}$$

The correction subtracts the first occurrence of each primary study (r), which diminishes the impact of large reviews that may add area but not necessarily overlap. The higher the CCA value, the greater the overlap among reviews: when expressed as a percentage, a CCA of 0–5 represents slight overlap, 6–10 moderate overlap, 11–15 high overlap, and > 15 very high overlap.
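As an illustration, the CCA can be computed from a simple map of reviews to the primary studies they include. The sketch below is a minimal example under stated assumptions: the review labels and study identifiers are hypothetical, the CCA is converted to a percentage before it is categorized, and values falling between the listed bands (e.g. 5.4) are assigned using the upper bound of each band.

```python
def corrected_covered_area(reviews):
    """Compute CCA = (N - r) / (r*c - r) as a proportion.

    reviews: dict mapping a review label to the set of primary-study
    identifiers it includes (labels and identifiers are hypothetical).
    """
    c = len(reviews)                                          # number of reviews
    n_total = sum(len(studies) for studies in reviews.values())  # N, with repeats
    r = len(set().union(*reviews.values()))                   # unique studies
    if c < 2 or r == 0:
        raise ValueError("CCA needs at least two reviews and one study")
    return (n_total - r) / (r * c - r)


def overlap_category(cca_percent):
    """Map a CCA percentage onto the bands given in the text (assumed cut-offs)."""
    if cca_percent <= 5:
        return "slight"
    if cca_percent <= 10:
        return "moderate"
    if cca_percent <= 15:
        return "high"
    return "very high"


example = {
    "Review A": {"s1", "s2", "s3"},
    "Review B": {"s2", "s3", "s4"},
    "Review C": {"s3", "s5"},
}
cca = corrected_covered_area(example)
print(f"CCA = {cca:.1%} ({overlap_category(cca * 100)} overlap)")
```

In this made-up example N = 8, r = 5, and c = 3, so CCA = (8 − 5)/(15 − 5) = 0.30.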

Mapping of review characteristics will be conducted by a single reviewer. The decision to exclude a review, using the criteria described above, will be made by two reviewers via discussion, with review by the guideline WG. Where overlapping reviews are included, concordance of results/conclusions will be explored (see the “Discordance” section of the manuscript).

Quality assessment of systematic reviews

The methodological quality of reviews will be evaluated according to the AMSTAR 2 instrument (Additional file 3). This updated version of the original AMSTAR tool allows for the appraisal of systematic reviews of randomized and non-randomized studies of interventions [95]. We will evaluate each review against the 16-item instrument. An overall rating of quality will be assigned according to the algorithm suggested by Shea et al. [95]. Reviews failing to meet any of the seven critical AMSTAR 2 items will be deemed to have a “critical flaw” while non-fulfillment of the remaining items will be deemed a “non-critical weakness” of the review (Additional file 4). Reviews with one critical flaw will receive a low rating, and reviews with more than one critical flaw will receive a critically low rating. Reviews with no critical flaws will be considered either high or moderate quality depending on the number of non-critical weaknesses (i.e. high-quality reviews have a maximum of one non-critical weakness and moderate-quality reviews have more than one non-critical weakness). Aside from decisions on inclusion related to assessing duplicate or overlapping reviews, reviews will not need to meet a particular threshold for methodological quality to be included.
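The rating rule can be written as a short function. The sketch below is an interpretation rather than an official AMSTAR 2 implementation: it reads the rule as one critical flaw giving a low rating and more than one giving a critically low rating, and the function and parameter names are assumptions introduced here.

```python
def amstar2_overall_rating(critical_flaws, non_critical_weaknesses):
    """Return the overall confidence rating for one review.

    critical_flaws: number of critical AMSTAR 2 items not met.
    non_critical_weaknesses: number of remaining items not met.
    """
    if critical_flaws < 0 or non_critical_weaknesses < 0:
        raise ValueError("counts cannot be negative")
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    # No critical flaws: the rating depends on non-critical weaknesses only.
    if non_critical_weaknesses <= 1:
        return "high"
    return "moderate"


print(amstar2_overall_rating(critical_flaws=0, non_critical_weaknesses=3))
```

Note that non-critical weaknesses only affect the rating when there are no critical flaws; a review with one critical flaw is rated low regardless of how many other items it meets.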

The quality of systematic reviews will be evaluated by one reviewer and verified by another. Disagreements regarding by-item and overall rating of quality will be resolved by consensus or third-party adjudication if consensus cannot be reached.

Data extraction and management

Data extraction forms will be developed a priori in DistillerSR and pilot tested on a sample of studies to adjust forms, where needed, to maximize efficiency. Full data abstraction will be completed by one reviewer and verified by a second reviewer. Disagreements will be resolved by consensus or third party adjudication if consensus cannot be reached.

Additional file 5 lists draft items to be collected from reviews during data extraction. We will extract data as synthesized and/or reported in the reviews. We will not consult primary studies for the purpose of data extraction, risk of bias assessment, or for verifying the accuracy of the data reported in the systematic reviews.

We will collect data regarding outcomes of interest as reported by review authors. For reviews reporting a meta-analysis, we will collect the pooled effect estimates, corresponding confidence intervals, and results of statistical tests for heterogeneity (e.g. number of studies, number of participants, chi-square, Cochran’s Q, corresponding p values, I²).

For network meta-analyses, ideally sufficient evidence from direct comparisons will be available, and treatment effect estimates along with measures of uncertainty from those analyses will be extracted. However, where little to no evidence from direct comparisons is available and indirect comparison data exist, we will extract both analyses and determine extent of consistency of results and make appropriate interpretations. For indirect comparison analyses, effect estimates and corresponding credible intervals will be collected from indirect comparisons. We will extract and transparently describe if and how authors’ ranking of treatments was used, ensuring appropriateness; ranking may take the form of rank probabilities, mean/median rank, surface under the cumulative ranking (SUCRA) curve, or a P-score [96,97,98].

For outcomes where a pooled analysis was not performed, how data are extracted will be informed by authors’ reporting. For example, if effect estimates from primary studies are reported, then a range of those effects could be extracted. In the absence of optimal quantitative data, a narrative summary of findings will be extracted from the reviews. Data will be collected for all reported and relevant (see Table 1) time points of follow-up.

Where reviews partially overlap with the scope of interest, such that a subset of studies may be conducted in a different population (e.g. adolescents), setting (not relevant to primary care), or other relevant parameter, we will attempt to determine whether the analyses undertaken are sufficiently direct to the overview question by considering the relative contribution of those studies to the analysis, subject to adequate reporting of this information. How these analyses are handled (inclusion versus exclusion) will be reviewed with the WG for their input; those decisions and any accompanying uncertainty in the applicability of the included results will be detailed in the report.

Subgroup analysis

The overview will seek information on various factors that would typically be considered variables for effect modification. In the case of an overview, we expect to encounter reviews that have undertaken subgroup or meta-regression analyses. There may also be reviews through the process of defining scope that would have focused their interest according to a particular factor, such as evaluating the effects of an intervention in a particular setting. Reviews addressing both of these approaches will be included. Variables of interest listed below are those that we have considered as being potentially important effect modifiers that would influence the development of guideline recommendations or implementation considerations. According to guidance, we have restricted subgroup analysis to characteristics that are measured at baseline rather than after randomization [99].

Populations

• Fewer versus more quit attempts (specific groupings will depend on what is found in the literature)

• Opportunistic versus individuals seeking treatment

• Baseline level of nicotine dependence (e.g. using a validated scale or cigarettes per day as a proxy)

• By demographic factors (age, SES, sex, ethnicity, LGBTQ+)

• By comorbid conditions (e.g. mental illness, HIV infection, cardiovascular disease, COPD, obesity, substance use disorder)

• By pregnancy status

Intervention-related variables

• Dose, type, duration, number of sessions

• Specific forms of an intervention (e.g. yoga as a form of exercise)

• KQ1a/b: behavioural change technique (e.g. providing information on consequences of smoking, explaining the importance of abrupt cessation, receiving prompt commitment from the patient)

Settings

• Family medicine clinics

• Walk-in clinics

• Smoking cessation clinics

• Urgent care facilities

• Emergency departments

• Public health units

• Pharmacies

• Dental offices

• Behavioural health/substance use treatment facilities (ambulatory or outpatient)

• Telehealth

Other variables

Evidence synthesis

While there are both simple (e.g. comparing 95% confidence intervals, statistical test of summary estimates) and complex (e.g. Bucher method, network meta-analysis) methods available for indirect comparisons of treatments across reviews, all approaches are based on the assumption that the primary studies are similar [85, 100]. This would require overview authors to be familiar with the primary study literature and not to rely solely on review authors’ reporting of the primary studies [85]. Given that we will not have opportunity to read and become familiar with the primary study reports themselves, conducting network meta-analyses or informal indirect comparisons of interventions will not be performed. As noted above, any existing network meta-analyses located in the literature will be included and commented on.

Similarly, subgroup analyses within reviews will provide evidence for effect modification. For factors that comprise the focused scope of a given review, as described in the previous section, we will provide the appropriate statements relating to interpretation but be unable to perform comparisons across reviews in the absence of the direct familiarity with the primary studies. Where possible, we will evaluate the credibility of subgroup analyses [99, 101, 102].

A narrative synthesis of the available evidence will be provided to support appropriate interpretation. GRADE tables will be used to present this information in tabular form in a way that avoids juxtaposition that may lead readers to make inappropriate comparisons [83, 85, 103]. Comparisons across reviews with similar scope will be limited to an assessment of the extent of concordance or discordance of the review results and, for discordance, an exploration of a potential explanation.

Discordance

Reviews that overlap in terms of scope may present discordant results and/or conclusions due to variation in eligibility criteria, data extraction, risk of bias assessment, data synthesis approach, or interpretation of the results [104]. In those instances, we will investigate the source(s) of discordance using the algorithm developed by Jadad et al. as a guide [104, 105].

Where overlapping reviews of similar quality rely on the exact same studies, we will investigate whether discordance was due to differences in data extraction (e.g. reviews may have extracted data at different time points of follow-up or reviews may vary regarding definitions of outcomes or outcome measurement methods), heterogeneity testing (e.g. reviews differ in their investigation of clinical and methodological heterogeneity and in the decision on whether to conduct a meta-analysis), or the synthesis approach (e.g. quantitative versus qualitative synthesis or the statistical methods used).

If overlapping reviews do not rely on the exact same studies, we will investigate differences in the eligibility criteria. If similar, we will evaluate whether discordance is attributable to differences in the search strategies (e.g. number and type of databases searched, whether grey literature was searched) or in the application of the eligibility criteria. If reviews use different eligibility criteria, Jadad et al. [105] recommend comparing the publication status of primary studies (e.g. whether there are differences in the inclusion of unpublished reports), evaluation of the methodological quality of primary studies (e.g. differences across reviews regarding the assessment of quality of primary studies and how quality was used in interpreting the results of the review), language restrictions, and quantitative synthesis [105].

In addition to exploring sources of discordance, we will categorize discordance as follows: (1) direction of effect (i.e. reviews report results in opposite directions), (2) magnitude of effect (i.e. reviews report results in the same direction but differ in the size of the effect estimate), and (3) statistical significance (i.e. statistical significance reached in one review but not others) [105].

Quality of the body of evidence

The Task Force endorses the use of GRADE methodology for assessing the quality of the body of evidence for critical and important outcomes [106]. Currently, there are no methods to evaluate the strength of evidence across systematic reviews [83]. For each outcome of interest reported in each individual review, we will provide GRADE assessments by intervention/comparison [107]. We will not evaluate the strength of the evidence across reviews.

For reviews that have used GRADE methods, we will provide results for the overall quality of evidence, including reasons for downgrading. If available, we will also report the ratings for each of the five domains of GRADE (i.e. risk of bias, imprecision, indirectness, inconsistency, publication bias). We will not consult primary studies as a quality control measure.

If GRADE methods were not used in a given review, we will attempt to conduct GRADE assessments using information available in the review (e.g. risk of bias assessments). This will likely be challenging due to reporting issues; therefore, we will provide our best interpretation based on the available information and note any limitations. For systematic reviews that include a network meta-analysis, using information reported in the review, we will evaluate the quality of evidence using the GRADE extension for network meta-analysis [108]. As above, we will not consult primary studies for the purpose of conducting GRADE assessments. We will make note if it is not possible to conduct GRADE for a given review or outcome.

Stage 2: Updated systematic review on electronic cigarettes for smoking cessation

Literature search

The search strategy for this update will be developed using the search strategy of the candidate systematic review, once identified. The search strategy of the candidate review will be evaluated and modified as necessary. Databases will be searched from the last search date of the review. Using the OVID platform, we will search Ovid MEDLINE®, Ovid MEDLINE® Epub Ahead of Print, In-Process & Other Non-Indexed Citations, Embase Classic + Embase, and PsycINFO. We will also search the Cochrane Library on Wiley. The final search will be peer-reviewed using the PRESS 2015 guideline [86]. Results of the PRESS reviews will be provided in an appendix in the final report. The grey literature will be searched using the same approach outlined for the overview of reviews.

Eligibility criteria

Studies will be selected for inclusion using the criteria outlined in Table 3.

Study selection and data extraction

Study selection and data extraction will follow the same process described for the overview of reviews. Where study eligibility is unclear, authors will be contacted by email twice over 2 weeks for additional information.

We will collect both self-reported and biochemically validated tobacco abstinence and relapse. Data will be collected for all reported and relevant (see Table 3) time points of follow-up. Where needed, we will convert data (e.g. standard error to standard deviation) to facilitate consistent presentation of results across studies. Authors will be contacted by email twice over 2 weeks if any information is missing or unclear. Refer to Additional file 6 for a list of draft items to be collected during data extraction.
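As an example of such a conversion, the standard deviation of a group mean can be recovered from its standard error as SD = SE × √n. The sketch below illustrates this, together with the large-sample conversion from a 95% confidence interval of a mean; the latter is not mentioned in the protocol and is included only as an assumed, commonly used companion. Function names are hypothetical.

```python
import math


def sd_from_se(se, n):
    """Standard deviation from the standard error of a mean (SD = SE * sqrt(n))."""
    if n <= 0 or se < 0:
        raise ValueError("n must be positive and se non-negative")
    return se * math.sqrt(n)


def sd_from_ci95(lower, upper, n):
    """Standard deviation from a 95% CI of a mean.

    Assumes a normal approximation (CI width = 2 * 1.96 * SE), which is
    reasonable for large samples only.
    """
    se = (upper - lower) / (2 * 1.96)
    return sd_from_se(se, n)


print(sd_from_se(0.5, 100))
print(round(sd_from_ci95(9.02, 10.98, 100), 2))
```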

We will consult studies included in the original review to ensure that all outcomes of interest (Table 3) have been captured.

Risk of bias assessment

For consistency, risk of bias assessments/quality appraisal will be performed for all available studies (i.e. studies included in the original review and newly identified studies). The risk of bias of randomized and non-randomized controlled trials will be assessed by one reviewer using the Cochrane risk of bias (ROB) tool [109] (Additional file 7). We will consider industry funding under the “other sources of bias” domain of the tool. A modified version of the Scottish Intercollegiate Guidelines Network critical appraisal tool [110] (Additional file 8), which accounts for potential sources of bias including that arising from industry funding, will be used to evaluate the quality of prospective cohort studies. Verification will be done by a second reviewer. Disagreements will be resolved by consensus or third-party adjudication.

Some domains are outcome-specific and will be assessed at the outcome level. Overall risk of bias for the body of evidence will be evaluated according to the importance of domains, the likely direction of bias, and the likely magnitude of bias [109]. The Agency for Healthcare Research and Quality guidance will be followed for evaluating risk of bias for outcome and analysis reporting bias [111].

Analysis

Study characteristics will be summarized narratively and presented in summary tables. Where possible, relative and absolute effects with 95% confidence intervals will be calculated for the GRADE summary of findings and evidence profile tables. Risk ratios and risk differences will be used to report effects for dichotomous data. For calculating the risk difference from meta-analyzed data, we will use the median baseline risk for the control group in the included studies, although we may perform sensitivity analysis using differing baseline risks if thought to be suitable. For continuous outcomes, mean difference (i.e. difference in means) effect measures will be used for outcomes using the same measure and standardized mean differences for outcomes using different measures, consistent with GRADE guidance [112].
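To illustrate the risk difference calculation, a pooled risk ratio can be applied to the median control group risk, so that RD = baseline risk × (RR − 1), with the same transformation applied to the confidence limits. The sketch below assumes this arithmetic, which the protocol does not spell out; the function name and the example numbers are hypothetical.

```python
from statistics import median


def absolute_effect(control_risks, rr, rr_low, rr_high):
    """Convert a pooled risk ratio and its 95% CI into risk differences.

    control_risks: control-group risks (events / participants) from the
    included studies; their median is used as the assumed baseline risk.
    Returns (baseline, rd, rd_low, rd_high) as proportions.
    """
    baseline = median(control_risks)

    def to_rd(ratio):
        # Risk in the intervention group is baseline * ratio, so the
        # difference from baseline is baseline * (ratio - 1).
        return baseline * (ratio - 1)

    return baseline, to_rd(rr), to_rd(rr_low), to_rd(rr_high)


baseline, rd, rd_low, rd_high = absolute_effect(
    control_risks=[0.05, 0.10, 0.20], rr=1.5, rr_low=1.2, rr_high=1.9
)
print(f"{rd * 1000:.0f} more per 1000 "
      f"(from {rd_low * 1000:.0f} to {rd_high * 1000:.0f} more)")
```

A sensitivity analysis with a different baseline risk only requires replacing the median with the alternative value.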

Meta-analysis

We will examine the extent of clinical and methodological heterogeneity to determine appropriateness of performing meta-analysis. Cochran’s Q (considered statistically significant at p < 0.10) and the I² statistic will be used to assess the statistical heterogeneity across included studies [113, 114]. If appropriate, data from the original systematic review will be meta-analyzed with data from newly identified studies, using random effects models. For time-to-event data, the hazard ratio will be pooled using the generic inverse variance method. Analyses will be stratified by study design. For observational studies, we will use adjusted risk estimates in the meta-analysis.
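The heterogeneity statistics and a random effects pooled estimate can be illustrated with a generic inverse variance calculation. The sketch below uses the DerSimonian and Laird estimator of between-study variance, which is an assumption on our part because the protocol specifies only "random effects models"; the function name and example data are hypothetical. The p value for Q is omitted because it requires a chi-square distribution function (for example `scipy.stats.chi2.sf`).

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. log risk ratios or log hazard ratios).

    effects: per-study effect estimates on an additive scale.
    variances: their within-study variances (squared standard errors).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed effect (inverse variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "q": q, "df": df, "i2": i2, "tau2": tau2,
        "pooled": pooled, "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
    }


result = dersimonian_laird(effects=[0.0, 2.0], variances=[0.5, 0.5])
print(result)
```

For ratio measures the inputs would be log-transformed estimates, and the pooled value and its confidence limits would be exponentiated before reporting.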

Should meta-analysis not be appropriate due to considerable heterogeneity, the range of effects will be presented and results will be discussed narratively. Studies will also be presented in a forest plot without a pooled risk estimate. Clinical and methodological sources of heterogeneity will also be explored using subgroup, sensitivity, and/or meta-regression analyses, depending on how data are reported in studies. We will follow previously published guidance for meta-regression [115].

Sparse binary data and studies with zero events

Results will be synthesized narratively if studies report rare events. The risk difference will be used for outcomes (e.g. serious adverse events) where at least one intervention group contains zero events.
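The reason for preferring the risk difference here is that it remains defined when a group has zero events, whereas a risk ratio does not when the comparator group has none. The sketch below computes a single-study risk difference with a simple large-sample standard error; this formula is an assumption for illustration, the function name is hypothetical, and the approximation is known to be crude when events are rare.

```python
import math


def risk_difference(events_trt, n_trt, events_ctl, n_ctl):
    """Risk difference (intervention minus control) and its standard error."""
    if n_trt <= 0 or n_ctl <= 0:
        raise ValueError("group sizes must be positive")
    p_trt = events_trt / n_trt
    p_ctl = events_ctl / n_ctl
    rd = p_trt - p_ctl
    # Large-sample (Wald) standard error; it collapses to 0 when both
    # groups have zero events, so such studies need separate handling.
    se = math.sqrt(p_trt * (1 - p_trt) / n_trt + p_ctl * (1 - p_ctl) / n_ctl)
    return rd, se


rd, se = risk_difference(events_trt=0, n_trt=100, events_ctl=5, n_ctl=100)
print(f"RD = {rd:.3f}, SE = {se:.4f}")
```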

Subgroup analysis

If there are sufficient data, the following subgroup analyses will be conducted:

• Fewer versus more quit attempts (specific groupings will depend on what is found in the literature)

• Opportunistic versus individuals seeking treatment

• Baseline level of nicotine dependence (e.g. using a validated scale or cigarettes per day as a proxy)

• By demographic factors (age, SES, sex, ethnicity, LGBTQ+)

• By comorbid conditions (e.g. mental illness, HIV infection, cardiovascular disease, COPD, obesity, substance use disorder)

• By use of other substances (alcohol, cannabis, opioids)

• By pregnancy status

• By setting (e.g. family medicine clinics, walk-in clinics, urgent care facilities)

• Nicotine content (groupings will depend on what is found in the literature)

• Intensity of behavioural therapy (groupings will depend on what is found in the literature)

• Duration of e-cigarette usage as part of the intervention (groupings will depend on what is found in the literature)

• By type or generation of e-cigarette device

• By industry funding

Sensitivity analysis

Sensitivity analyses restricted to low risk of bias studies may be performed. Sensitivity analyses may also be performed to explore statistical heterogeneity or to evaluate the impact of various decisions made during the conduct of the review.

Small study effects

To evaluate small study effects, a combination of graphical aids and/or statistical tests will be performed if there are at least 10 studies in the analysis.

Software

The Cochrane Review Manager software version 5.3 [116] will be used to conduct analyses. Where needed, Comprehensive Meta-Analysis (CMA) or Stata may be used.

Grading the quality of evidence and interpretation

For critical and important outcomes, the GRADE framework [106, 117] will be used to assess the quality of the evidence.