Estimating the mean and variance from the median, range, and the size of a sample
1 Indiana University Northwest, Department of Mathematics, Gary, IN, 46408 USA
2 Interdisciplinary Oncology Program, H. Lee Moffitt Cancer Center and Research Institute at the University of South Florida, Tampa, FL, USA
BMC Medical Research Methodology 2005, 5:13. doi:10.1186/1471-2288-5-13. Published: 20 April 2005
Researchers performing a meta-analysis of continuous outcomes from clinical trials usually need the mean and the variance (or standard deviation) of each outcome in order to pool data. However, published reports of clinical trials sometimes give only the median, the range, and the size of the trial.
In this article we use elementary inequalities and approximations to estimate the mean and the variance for such trials. Our estimation is distribution-free, i.e., it makes no assumption about the distribution of the underlying data.
We found two simple formulas that estimate the mean from the median (m), the low and high ends of the range (a and b, respectively), and the sample size (n). Using simulations, we show that the median itself can be used to estimate the mean when the sample size is larger than 25. For smaller samples, the new formula devised in this paper should be used. We also estimated the variance of an unknown sample from the median, the low and high ends of the range, and the sample size. Our estimate performed best in our simulations for very small samples (n ≤ 15). For moderately sized samples (15 < n ≤ 70), our simulations show that range/4 is the best estimator of the standard deviation (and hence of the variance). For large samples (n > 70), range/6 is the best estimator of the standard deviation (and hence of the variance).
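The selection rules above can be sketched in a few lines of Python. This is only an illustration, not the authors' code. The sample-size thresholds and the range/4 and range/6 rules come from this abstract. The two small-sample formulas are not stated here, so the sketch assumes the forms usually quoted from the body of the paper: mean ≈ (a + 2m + b)/4 and variance ≈ [(a − 2m + b)²/4 + (b − a)²]/12. The function name `estimate_mean_sd` is hypothetical.

```python
import math


def estimate_mean_sd(m, a, b, n):
    """Estimate (mean, sd) from median m, minimum a, maximum b, sample size n.

    Thresholds follow the abstract; the small-sample formulas are assumed
    (they are not given in the abstract itself).
    """
    if not a <= m <= b:
        raise ValueError("expected a <= m <= b")
    if n < 1:
        raise ValueError("sample size must be positive")

    data_range = b - a

    # Mean: the median itself for n > 25, otherwise the assumed formula.
    if n > 25:
        mean = float(m)
    else:
        mean = (a + 2 * m + b) / 4  # assumed small-sample formula

    # Standard deviation: choose the estimator by sample size.
    if n <= 15:
        # Assumed small-sample variance formula.
        variance = ((a - 2 * m + b) ** 2 / 4 + data_range ** 2) / 12
        sd = math.sqrt(variance)
    elif n <= 70:
        sd = data_range / 4
    else:
        sd = data_range / 6

    return mean, sd


if __name__ == "__main__":
    # Hypothetical trial arm: median 10, range 4 to 20, at three sample sizes.
    for size in (10, 40, 100):
        print(size, estimate_mean_sd(10, 4, 20, size))
```

Returning the standard deviation rather than the variance keeps the result in the units of the outcome; squaring it gives the variance when a pooling method needs it.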
We also include an illustrative example of the potential value of our method using reports from the Cochrane review on the role of erythropoietin in anemia due to malignancy.
Using these formulas, we hope to help meta-analysts use clinical trials in their analysis even when not all of the information is available and/or reported.