INSERM U1153, Hôpital Hôtel-Dieu, 1 place du Parvis Notre-Dame, 75004 Paris, France

Centre Cochrane Français, Paris, France

Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY, USA

Stanford Prevention Research Center, Department of Medicine and Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA

Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA, USA

Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA

Université Paris Descartes - Sorbonne Paris Cité, Paris, France

INSERM CIE 4, Paris, France

Assistance Publique-Hôpitaux de Paris, Hôpital Européen Georges Pompidou, Unité de Recherche Clinique, Paris, France

Assistance Publique-Hôpitaux de Paris, Hôpital Hôtel-Dieu, Centre d’Epidémiologie Clinique, Paris, France

Abstract

Background

Networks of trials assessing several treatment options available for the same condition are increasingly considered. Randomized trial evidence may be missing because of reporting bias. We propose a test for reporting bias in trial networks.

Methods

We test whether there is an excess of trials with statistically significant results across a network of trials. The observed number of trials with nominally statistically significant p-values across the network is compared with the expected number. The performance of the test (type I error rate and power) was assessed using simulation studies under different scenarios of selective reporting bias. Examples are provided for networks of antidepressant and antipsychotic trials, where reporting biases have been previously demonstrated by comparing published to Food and Drug Administration (FDA) data.

Results

In simulations, the test maintained the type I error rate and was moderately powerful after adjustment for type I error rate, except when the between-trial variance was substantial. In all, a positive test result increased moderately or markedly the probability of reporting bias being present, while a negative test result was not very informative. In the two examples, the test gave a signal for an excess of statistically significant results in the network of published data but not in the network of FDA data.

Conclusion

The test could be useful to document an excess of significant findings in trial networks, providing a signal for potential publication bias or other selective analysis and outcome reporting biases.

Background

Reporting bias arises from the tendency of researchers, pharmaceutical companies and journals to publish trial results based on the direction, magnitude and statistical significance of the results.

Numerous statistical tests have been introduced to detect the presence of reporting bias in conventional meta-analyses.

A classic situation involving many meta-analyses of randomized trials with the same setting, disease and outcome is network meta-analysis.

Methods

Test of bias in a conventional meta-analysis

We consider a meta-analysis of n trials and let O denote the observed number of trials with nominally statistically significant results (p-value below a pre-specified α level, with α = 0.05).

To estimate the expected number E of trials with statistically significant results, we estimate the probability that each trial i yields a statistically significant result by its power 1 − β_i to detect a plausible effect size θ at the α level, so that E = Σ_{i=1}^{n} (1 − β_i).

The true effect size θ is unknown; it may be approximated by the fixed-effect or random-effects summary estimate of the meta-analysis or by the effect estimate of the largest (i.e., most precise) trial.

We test whether the observed number O exceeds the expected number E at a significance level α′ using a binomial probability test. We set α′ = 0.10, as is typical for selective reporting bias tests. Consequently, we would reject the null hypothesis in favor of excess significant findings if O > E with p < α′.
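To make the computation concrete, here is a minimal Python sketch of the binomial probability test described above, assuming (as in the Methods) that the number of significant trials can be approximated by a binomial distribution with common per-trial probability E/n; the trial powers in the example are hypothetical.

```python
from math import comb

def excess_significance_test(powers, observed_significant):
    """Compare the observed number O of significant trials with the
    expected number E = sum_i (1 - beta_i), using a one-sided binomial
    probability test with common parameter p0 = E / n."""
    n = len(powers)
    expected = sum(powers)          # E
    p0 = expected / n               # common binomial parameter
    # one-sided p-value: P(X >= O) for X ~ Binomial(n, p0)
    p_value = sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
                  for k in range(observed_significant, n + 1))
    return expected, p_value

# hypothetical meta-analysis: 10 trials, 9 with significant results
powers = [0.5, 0.6, 0.4, 0.7, 0.55, 0.5, 0.65, 0.45, 0.6, 0.5]
E, p = excess_significance_test(powers, 9)
# reject in favor of excess significance if O > E and p < 0.10
```

Here E = 5.45 while O = 9, and the one-sided p-value falls below α′ = 0.10, so this hypothetical meta-analysis would give a signal for excess significance.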

For binary outcome data, let us assume that we observe e_{Ei} and e_{Ci} events in n_{Ei} and n_{Ci} patients in the experimental and control groups of trial i, respectively. The power 1 − β_i is estimated as the power of the two-sided Fisher’s exact test to detect, in n_{Ei} and n_{Ci} patients at the specified α level, the difference between the group event probabilities implied by the plausible effect size θ and the observed data e_{Ei}, e_{Ci}, n_{Ei}, n_{Ci}.
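The power of Fisher’s exact test has no simple closed form; one way to estimate it is by Monte Carlo simulation. The sketch below, with a pure-Python implementation of the two-sided Fisher test and illustrative event probabilities, is an assumed implementation detail, not the authors’ exact procedure.

```python
import random
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table
    [[a, b], [c, d]] (events / non-events in two groups), summing the
    hypergeometric probabilities no larger than that of the observed table."""
    n1, n2, m = a + b, c + d, a + c          # row totals, events total
    total = comb(n1 + n2, m)
    def pmf(k):
        return comb(n1, k) * comb(n2, m - k) / total
    p_obs = pmf(a)
    lo, hi = max(0, m - n2), min(m, n1)
    return sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-9))

def simulated_power(p_e, p_c, n_e, n_c, alpha=0.05, n_sim=1000, seed=1):
    """Estimate 1 - beta_i: the probability that Fisher's exact test is
    significant at alpha, for assumed group event probabilities p_e, p_c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        e_e = sum(rng.random() < p_e for _ in range(n_e))
        e_c = sum(rng.random() < p_c for _ in range(n_c))
        if fisher_exact_p(e_e, n_e - e_e, e_c, n_c - e_c) < alpha:
            hits += 1
    return hits / n_sim

# e.g. power to detect risks 0.5 vs 0.2 with 50 patients per group
power_i = simulated_power(0.5, 0.2, 50, 50)
```

With these illustrative inputs the estimated power is high (roughly 0.85–0.9); smaller trials or effects closer to the null would yield much lower 1 − β_i values.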

For continuous outcome data, let us assume that we observe the means and standard deviations m_{Ei}, sd_{Ei} and m_{Ci}, sd_{Ci} from n_{Ei} and n_{Ci} patients in the experimental and control groups, respectively, of trial i. The power 1 − β_i is estimated as the power of the two-sided t-test to detect the plausible effect size θ in n_{Ei} and n_{Ci} patients at the specified α level, given m_{Ei}, sd_{Ei}, n_{Ei} and m_{Ci}, sd_{Ci}, n_{Ci}.
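For the continuous case, the power computation can be approximated with a normal approximation to the two-sided t-test (an exact calculation would use the noncentral t distribution); this sketch and its numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_sample(theta, sd_e, sd_c, n_e, n_c, alpha=0.05):
    """Normal approximation to the power of the two-sided two-sample
    t-test to detect a mean difference theta, given the group standard
    deviations and sample sizes."""
    se = sqrt(sd_e ** 2 / n_e + sd_c ** 2 / n_c)   # SE of the mean difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    ncp = abs(theta) / se                          # standardized effect
    return NormalDist().cdf(ncp - z_crit) + NormalDist().cdf(-ncp - z_crit)

# e.g. detecting a difference of 0.5 SD with 64 patients per group
power = approx_power_two_sample(0.5, 1.0, 1.0, 64, 64)
```

With 64 patients per group and a standardized difference of 0.5, the approximate power is close to the textbook value of 0.80.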

Test of bias in a network of trials

We consider that the network of trials can be described as J meta-analyses of n_j trials each. We estimate the expected number E_j of trials with statistically significant results for each meta-analysis across the network by assuming a true effect size θ_j for each meta-analysis. The expected number of trials with statistically significant results across the whole network is then E = Σ_{j=1}^{J} E_j, which is compared with the observed number O across the network using the binomial probability test described above.
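Under the same binomial approximation, extending the test to a network only requires summing the per-meta-analysis expected and observed counts before applying the binomial test; a sketch with hypothetical inputs:

```python
from math import comb

def network_excess_significance(powers_by_meta, observed_by_meta):
    """Pool the expected numbers E_j and observed numbers O_j of
    significant trials over the J meta-analyses of the network, then
    apply the one-sided binomial test with p0 = E / N."""
    N = sum(len(powers) for powers in powers_by_meta)   # total no. of trials
    E = sum(sum(powers) for powers in powers_by_meta)   # E = sum_j E_j
    O = sum(observed_by_meta)                           # O = sum_j O_j
    p0 = E / N
    p_value = sum(comb(N, k) * p0 ** k * (1 - p0) ** (N - k)
                  for k in range(O, N + 1))
    return E, O, p_value

# hypothetical network of J = 2 meta-analyses
E, O, p = network_excess_significance([[0.5, 0.5, 0.5], [0.6, 0.6]], [3, 2])
```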

Detection of excess significant findings does not mean that all pairwise meta-analyses included in the network have been equally affected by the same bias. Even if the bias is exchangeable across all pairwise meta-analyses in the network, some pairwise meta-analyses may be affected more than others, and some may not be affected at all. For instance, consider 3 meta-analyses with 10 trials each in which, because of reporting bias, 10% of the evidence disappears into a “file drawer”: it is within the range of chance that these 3 file-drawer trials come one from each meta-analysis, or that all 3 come from the same meta-analysis. However, detection of excess significance suggests that selective reporting bias may have affected the results of this whole body of evidence, and inferences should therefore be made cautiously.

Simulation studies

We assessed the type I error rate and power of the test using Monte Carlo simulation studies. These simulations were based on binary outcome data. The protocol for the simulation studies is described in detail in the Additional file.

**Protocol of simulation studies.**


For a given meta-analysis (i.e., a given pair of experimental and control treatments), we set the number of trials n and the between-trial variance τ², and we generated the number of events and non-events in the experimental and control groups for each trial under a random-effects model. We used values of τ² of 0.02, 0.08 and 0.25.
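As an illustration of this data-generating step, the following sketch draws each trial’s true log odds ratio from N(θ, τ²) and then samples event counts; the control-group risk and group size used here are hypothetical choices, not values from the protocol.

```python
import random
from math import exp, log, sqrt

def simulate_meta_analysis(n_trials, theta, tau2, p_control=0.2,
                           n_per_group=100, seed=7):
    """Generate 2x2 trial data under a random-effects model: each trial's
    true log odds ratio is drawn from N(theta, tau2), then events are
    sampled binomially in each group. p_control and n_per_group are
    illustrative assumptions."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        log_or = rng.gauss(theta, sqrt(tau2))
        odds_c = p_control / (1 - p_control)
        odds_e = odds_c * exp(log_or)
        p_e = odds_e / (1 + odds_e)
        e_e = sum(rng.random() < p_e for _ in range(n_per_group))
        e_c = sum(rng.random() < p_control for _ in range(n_per_group))
        trials.append((e_e, n_per_group, e_c, n_per_group))
    return trials

# e.g. one simulated meta-analysis of 6 trials with average OR 0.75
trials = simulate_meta_analysis(6, log(0.75), 0.02)
```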

To simulate reporting bias affecting trials in a given meta-analysis, we considered a selection model that links the probability of trial selection to both trial size and intensity of treatment effect.
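One possible shape for such a selection model is sketched below: significant results are always reported, while non-significant results from smaller trials are more likely to be suppressed. This is an illustrative stand-in, not the published selection model; the threshold, size scale and `severity` parameter are all assumptions.

```python
def selection_probability(p_value, n_total, severity=0.6):
    """Illustrative selection model linking the probability that a trial
    is reported to its significance and its size: significant trials are
    always reported; non-significant results from smaller trials are
    suppressed with a probability that grows with `severity`."""
    if p_value < 0.05:
        return 1.0                               # significant: always reported
    size_factor = min(1.0, n_total / 1000.0)     # large trials resist suppression
    return max(0.0, 1.0 - severity * (1.0 - size_factor))
```

For example, a non-significant trial with 100 patients would be reported with probability 0.46 under these assumed parameters, while a non-significant mega-trial would always be reported.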

We simulated a network of trials as J meta-analyses, drawing the true average treatment effect of each meta-analysis j such that log θ_j ∼ N(ψ, ν). We considered 4 distinct scenarios by setting ψ_1 = log(0.75) or ψ_2 = log(0.95) and ν_1 = 0.02 or ν_2 = 0.08. The sets of realizations are reported in the table below. We set the numbers of trials n_j for each meta-analysis and took the between-trial variance τ² as 0.02, 0.08 or 0.25. These values were based on the characteristics of a large sample of Cochrane meta-analyses.

| Treatment effect ψ | Dispersion of treatment effect ν | No. of meta-analyses J | True average treatment effects θ_j |
|---|---|---|---|
| log(0.75) | ν_1 = 0.02 | 6 | 0.793, 0.889, 0.741, 0.954, 0.569, 0.684 |
| log(0.75) | ν_2 = 0.08 | 6 | 0.808, 0.725, 0.876, 0.698, 0.699, 0.395 |
| log(0.95) | ν_1 = 0.02 | 6 | 0.796, 1.172, 1.000, 1.171, 1.099, 0.883 |
| log(0.95) | ν_2 = 0.08 | 6 | 0.491, 1.214, 0.936, 0.977, 1.451, 0.754 |
| log(0.75) | ν_1 = 0.02 | 10 | 0.658, 0.852, 0.696, 0.889, 0.741, 0.722, 0.645, 0.683, 0.816, 0.796 |
| log(0.75) | ν_2 = 0.08 | 10 | 0.978, 0.432, 1.149, 0.706, 0.432, 0.751, 0.679, 0.653, 0.624, 0.568 |
| log(0.95) | ν_1 = 0.02 | 10 | 1.081, 1.089, 0.767, 0.973, 0.781, 0.763, 1.266, 0.816, 1.066, 0.992 |
| log(0.95) | ν_2 = 0.08 | 10 | 0.845, 0.637, 1.030, 0.799, 0.541, 0.851, 1.063, 1.307, 0.674, 0.732 |

We set the relative effects θ_j, expressed as odds ratios, by drawing log θ_j from N(ψ, ν).

Reporting bias was induced for each of the J meta-analyses, with the severity of bias for meta-analysis j drawn from a uniform distribution over a pre-specified support; we considered several choices for its lower and upper bounds.

For each scenario, we generated 10,000 datasets. We assessed the empirical type I error rate and power for scenarios without and with reporting bias, respectively. Because the empirical type I error rates of the tests may differ, we also estimated powers adjusted for type I error rate.
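A common way to implement this adjustment, sketched here under the assumption that per-dataset p-values are available, is to reject at the empirical quantile of the null-simulation p-values rather than at the nominal threshold:

```python
def adjusted_power(null_p_values, alt_p_values, nominal_level=0.10):
    """Power adjusted for type I error rate: the rejection threshold is
    set to the nominal-level quantile of the p-values simulated under the
    null, so tests with inflated (or deflated) type I error rates are
    compared at the same empirical error rate."""
    threshold = sorted(null_p_values)[int(nominal_level * len(null_p_values))]
    return sum(p < threshold for p in alt_p_values) / len(alt_p_values)
```

For instance, if the null p-values are uniform, the threshold reduces to the nominal 0.10 and the adjusted power equals the usual empirical power.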

In the case of a conventional meta-analysis, we also assessed, for comparison purposes, the performance of the test introduced by Rücker et al., which is based on a weighted regression of the arcsine-transformed observed risks with explicit modeling of between-trial heterogeneity.

Application of the test with 2 trial networks

We provide two illustrations of the test for networks of antidepressant and antipsychotic trials, in which strong and weak selective reporting biases, respectively, have been convincingly demonstrated by comparing FDA data with published data (see the figure below).

**Networks of antidepressant and antipsychotic trials.**

Networks of antidepressant trials

For the antidepressant trials, we used 2 star-shaped networks created from US Food and Drug Administration (FDA) reviews of antidepressant trials and their matching publications.

Networks of antipsychotic trials

Turner et al. also used FDA data to assess whether the apparent efficacy of second-generation antipsychotics had been influenced by reporting bias.

Results

Simulation studies in a conventional meta-analysis

Complete results for the case of a conventional meta-analysis are reported in the Additional file.

**Results of simulation studies for conventional meta-analysis.**


Simulation studies in a network of trials

Type I error rate

Results for scenarios without reporting bias are presented in the figure below. The empirical type I error rate was in good agreement with the pre-specified significance level of 0.10 when the between-trial variance within a meta-analysis was small (τ²=0.02). Conversely, error inflation was substantial with substantial between-trial variance within a meta-analysis (τ²=0.25), except when the true effect size was estimated as the treatment effect estimate of the largest trial in the meta-analysis, in which case the empirical type I error rate remained in good agreement with the pre-specified significance level of 0.10.

**Type I error rate of the extended tests for reporting bias in a network of trials.**

Power

Results for scenarios with reporting bias are presented in the figure below.

**Adjusted power of the extended tests for reporting bias in a network of trials.**

**Additional results of simulation studies for trial networks.**


We also ran simulations with another set of vectors of true average treatment effects θ_j. Results were similar (not shown).

Likelihood ratio

Likelihood ratios of a positive test result indicated that the proposed test had a modest (with substantial heterogeneity) to high (with little or no heterogeneity) effect in increasing the likelihood of bias. Likelihood ratios of a negative test result indicated that the proposed test had a weak to moderate effect in decreasing the likelihood of bias (see the figure below).

**Likelihood ratios of the extended tests for reporting bias in a network of trials.**
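The likelihood ratios summarized above have the usual diagnostic-test form; with hypothetical empirical rates:

```python
def likelihood_ratios(empirical_power, empirical_type_i_error):
    """LR+ = P(positive test | bias) / P(positive test | no bias);
       LR- = P(negative test | bias) / P(negative test | no bias)."""
    lr_positive = empirical_power / empirical_type_i_error
    lr_negative = (1 - empirical_power) / (1 - empirical_type_i_error)
    return lr_positive, lr_negative

# e.g. empirical power 0.5 at empirical type I error rate 0.10
lr_pos, lr_neg = likelihood_ratios(0.5, 0.10)
```

In this hypothetical case a positive result multiplies the odds of bias by 5, while a negative result only roughly halves them, matching the pattern that positive results are more informative than negative ones.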

Application of the test with 2 trial networks

Networks of antidepressant trials

When we considered the fixed-effect summaries as the plausible effect sizes, the observed number of trials with significant results across the network of published data was larger than the expected number (41 observed versus 35.3 expected, p = 0.05), whereas there was no such signal in the network of FDA data (38 versus 34.5, p = 0.24; see the table below).

**Antidepressant trials** (published data: N = 51 trials; FDA data: N = 74 trials)

| Plausible effects | O (Published) | E (Published) | p (Published) | O (FDA) | E (FDA) | p (FDA) |
|---|---|---|---|---|---|---|
| Fixed-effect summary | 41 | 35.3 | 0.05 | 38 | 34.5 | 0.24 |
| Random-effects summary | 41 | 35.6 | 0.06 | 38 | 34.6 | 0.25 |
| Largest trial | 41 | 31.3 | 0.002 | 38 | 29.0 | 0.02 |

**Antipsychotic trials** (published data: N = 20 trials; FDA data: N = 24 trials)

| Plausible effects | O (Published) | E (Published) | p (Published) | O (FDA) | E (FDA) | p (FDA) |
|---|---|---|---|---|---|---|
| Fixed-effect summary | 19 | 18.1 | 0.43 | 20 | 19.6 | 0.53 |
| Random-effects summary | 19 | 18.4 | 0.50 | 20 | 34.6 | 0.56 |
| Largest trial | 19 | 16.1 | 0.10 | 20 | 18.6 | 0.36 |
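As a rough consistency check on the antidepressant published-data row (O = 41, E = 35.3 over N = 51 trials), the common-binomial approximation from the Methods yields a one-sided p-value in the vicinity of the reported 0.05 (the exact value depends on the approximation used):

```python
from math import comb

# antidepressant trials, published data: N = 51, O = 41, E = 35.3
N, O, E = 51, 41, 35.3
p0 = E / N
p_value = sum(comb(N, k) * p0 ** k * (1 - p0) ** (N - k)
              for k in range(O, N + 1))
```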

Networks of antipsychotic trials

When we considered the fixed-effect summaries as the plausible effect sizes, we found no evidence of an excess of statistically significant results in either the network of published data or the network of FDA data. Results were similar when using the random-effects summaries as plausible effect sizes. However, when using the estimates of the largest trials as plausible effect sizes, there was a signal for an excess of statistically significant results in the network of published data but not in the network of FDA data.

Discussion

In this paper, we proposed a test for reporting bias in networks of trials. The test is based on the observed and expected numbers of trials with statistically significant results across the network. In simulation studies, we found that the type I error rate of the proposed test was in agreement with the nominal type I error level and that the proposed test had overall moderate power after adjustment for type I error rate, except when the between-trial variance was substantial in which case the empirical type I error rate was considerably inflated and the empirical power was low. In all, a positive test result increases modestly or even markedly the probability of reporting bias being present, while a negative test result is not very informative.

The proposed test fits well with the widespread notion that a treatment effect estimate has to pass a cut-off of statistical significance, resulting in an aversion to null results or, conversely, in significance chasing.

In a network of trials, conventional tests for asymmetry could be applied to each meta-analysis constituting the network. If reporting bias is detected in any pairwise comparison, meta-analysts have a signal that they should interpret the synthesis results with caution. However, the number of trials addressing each pairwise comparison may often be limited (<10 trials for each pairwise comparison), which would prevent this approach from documenting or excluding reporting bias appropriately.

Decisions about conclusiveness and dissemination of research findings are commonly based on statistical significance (only) and this practice is likely to affect any network of evidence. We acknowledge that the exchangeability assumption may not be tenable in contexts in which reporting biases may affect the network in a systematically unbalanced way. For instance, only some pairwise comparisons could be affected and not others. However, the proposed test could still be useful in such networks if there are sufficient numbers of trials across the affected pairwise comparisons.

To estimate the expected number of trials with statistically significant results, estimates of the unknown true effect sizes for each meta-analysis in the network are required. In this regard, the true effect sizes are approximated by pooled estimates using just the very trials that are suspected to be affected by a meta-bias. However, we note that, in the presence of reporting bias, the fixed-effect or random-effects summary effects are likely to be biased and overestimate the true effect. Even the effect size from the largest trial in the meta-analysis may be biased sometimes, and often there may be no large enough trial. Consequently, the plausible effects used are conservative in testing for excess statistical significance. Moreover, because a random-effects meta-analysis gives larger relative weights to smaller trials than does a fixed-effect meta-analysis, the random-effects summary may be the farthest from the true effect in the presence of reporting bias. This may explain why the power of the proposed test was poorer when using the random-effects summaries as plausible effects. Finally, we explored the use of the single most precise trials and found that the proposed test had fair power. Therefore, for applications at the network level, we propose using either the result of the largest (most precise) trial or the fixed-effect summary. The former choice may often have a minor advantage.

The proposed test relied on further assumptions. We assumed that the observed number of trials with significant findings could be modelled by a common binomial parameter. However, because trial size varies within each meta-analysis, and because the numbers of trials and the numbers of trials with significant findings vary across the meta-analyses constituting the network, the distribution of the total number of trials with significant findings is more complex. Moreover, we estimated the plausible effects based only on direct evidence and by considering that different comparisons in the network were independent. Other options may be considered for plausible effects. First, estimates from a consistency network meta-analysis model may be used as plausible effects. In the examples of antidepressant and antipsychotic trials, there were no closed loops, so this option was not relevant. But in cases where there are closed loops and network meta-analysis estimates can be obtained, it would be useful to compare the results of the excess significance test using the network meta-analysis estimates as plausible effects in a sensitivity analysis. However, differential reporting bias may lead to violation of the consistency assumption.

A potential concern with the proposed test is that reporting bias and between-trial heterogeneity may be confounded. In fact, the type I error rate was inflated with increased between-trial variance. This is a typical issue with all tests of reporting bias introduced for conventional meta-analysis. However, we observed this finding with a between-trial variance equal to 0.25 (only 25% of meta-analyses have this extent of heterogeneity).

Although we illustrated the test with star-shaped networks of trials, the test can be used for networks of trials with closed loops. In this regard, a particular strength of our simulation studies is that the sets of realizations could reflect networks of placebo-controlled trials, head-to-head trials, or both. Moreover, multi-arm trials are frequent in networks of trials. The proposed test may handle them if a single pairwise comparison is considered from each multi-arm trial, so that each trial contributes only once.

Our simulation studies have several limitations. First, values of variables were derived from a large sample of Cochrane meta-analyses to obtain realistic scenarios.

Some other modeling approaches have been introduced recently to investigate the extent of reporting bias in a network meta-analysis. Mavridis et al. presented a Bayesian implementation of the Copas selection model extended to network meta-analyses.

Conclusion

The proposed excess significance test could be useful to provide a statistical signal indicating an excess of significant findings in clinical trial networks. If such a signal is detected across the network of trials or in specific pairwise comparisons by conventional approaches, the network of trials and its meta-analyses should be considered with caution.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LT, JPAI, GC and PR drafted the manuscript, and read and approved the final manuscript.

Acknowledgements

The authors thank Laura Smales (BioMedEditing, Toronto, Canada) for copy-editing the manuscript.

Financial disclosure

Grant support was from the European Union Seventh Framework Programme (FP7 – HEALTH.2011.4.1-2) under grant agreement no. 285453.

Pre-publication history

The pre-publication history for this paper can be accessed here: