Email updates

Keep up to date with the latest news and content from BMC Research Notes and BioMed Central.

Open Access Highly Accessed Technical Note

Meta-analyses and Forest plots using a microsoft excel spreadsheet: step-by-step guide focusing on descriptive data analysis

Jeruza L Neyeloff1*, Sandra C Fuchs12 and Leila B Moreira12

Author affiliations

1 Post Graduate Program of Cardiology, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

2 Hospital de Clínicas de Porto Alegre, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil

For all author emails, please log on.

Citation and License

BMC Research Notes 2012, 5:52  doi:10.1186/1756-0500-5-52


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1756-0500/5/52


Received:4 August 2011
Accepted:20 January 2012
Published:20 January 2012

© 2012 Neyeloff et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Meta-analyses are necessary to synthesize data obtained from primary research, and in many situations reviews of observational studies are the only available alternative. General purpose statistical packages can meta-analyze data, but usually require external macros or coding. Commercial specialist software is available, but may be expensive and focused in a particular type of primary data. Most available softwares have limitations in dealing with descriptive data, and the graphical display of summary statistics such as incidence and prevalence is unsatisfactory. Analyses can be conducted using Microsoft Excel, but there was no previous guide available.

Findings

We constructed a step-by-step guide to perform a meta-analysis in a Microsoft Excel spreadsheet, using either fixed-effect or random-effects models. We have also developed a second spreadsheet capable of producing customized forest plots.

Conclusions

It is possible to conduct a meta-analysis using only Microsoft Excel. More important, to our knowledge this is the first description of a method for producing a statistically adequate but graphically appealing forest plot summarizing descriptive data, using widely available software.

Background

Meta-analyses and systematic reviews are necessary to synthesize the ever-growing data obtained from primary research. Performing a search on Pubmed limiting to the type of article, the Mesh term "meta-analysis" will wield 4223 results in 2010 only. Although reviews of interventional studies, especially clinical trials, provide the best evidence, there are several situations in which observational studies are the only alternative. Meta-analyses of these studies are becoming more common, particularly after publication of the MOOSE statement [1]. Some of the studies are not concerned with the assessment of relative risks or odds ratios, but are focused on a summary statistics of incidence or prevalence.

General purpose statistical packages such as SPSS, Stata, SAS, and R can be used to perform meta-analyses, but it is not their primary function and hence they all require external macros or coding. These can be downloaded, but are not always easy for the researcher to understand or customize. Additionally, the first three programs do not have free access, with prices ranging from $250 to over $30,000 depending on version and country. R is a very resourceful open source package, but its use in health is still limited, due mostly to the need of programming instead of a point-and-click interface.

There are some software packages specifically developed to conduct meta-analyses. RevMan [2] is a freeware program from the Cochrane Collaboration that requires the researcher to fill all steps of a systematic review. It only accepts effect sizes in traditional formats. Metawin [3] and Comprehensive Metanalysis (CMA) [4] are commercial software that have user friendly interfaces. The former only accepts three types of primary data, while the latter has a purchase cost, but accepts more types of data. It can perform advanced analyses, but there are still limitations regarding graphic display, particularly of descriptive data, since CMA does not allow customization of the forest plot produced. Finally, there is also Meta-Analysis Made Easy (MIX) [5], an add-on for Excel. It can be used for analysis of descriptive data selecting the input type to "continuous", but the free version does not allow for analysis of original data, only build in datasets. Some other options are no longer available, as FAST*PRO [6], and others are still currently under development, as Meta-Analyst [7].

Another option would be to analyze data using directly Microsoft Excel. Although it has a purchase cost, it is usually already installed in most computers, bundled with Microsoft Office package. Most researchers would be uncomfortable entering all the formulas themselves, since they may seem complex at first. However, if the calculations are done in steps, statistics like Q and I2 can be computed with basic arithmetic operations. Borestein et al [8] cites the impossibility of producing forest plots as an important limitation, but we have developed a method to turn a scatter plot into a statistically correct forest plot, allowing the researcher to take advantage of all excel formatting tools. Our work is separated into two spreadsheets, so researchers can use both to conduct all calculations or simply the second one if they have already analyzed the data in any other software, but want an appealing graphical way of presenting it [Additional file 1].

Additional file 1. Meta-analyses and forest plots in MS Excel. This file contains both spreadsheets developed.

Format: XLSX Size: 26KB Download fileOpen Data

Findings

Technical notes

The method described here was designed on a laptop with Intel Core Duo 2.2 GHz processor, 4 GB RAM, running Windows Seven 64 bit and Microsoft Office Excel 2007. The spreadsheets were later tested on Excel 2003, with no differences found in either the calculations or graphs.

The outcome of meta-analyses is the effect summary. However, some reviews may only aim in combining rates or prevalences; technically these cannot be called "effects", since there is nothing "causing" it, and the correct term would be single group summary. We will refer to both these estimates simply as "outcome" in order to avoid confusion, and maintain only the abbreviation as es to follow textbooks standard.

Since we have established that the limitation of the existing software packages is handling descriptive data, we will be using rates in our example so that the difference in the final forest plot is more overt. The data could be the prevalence of smoking in a country or the incidence of myocardial infarction in high risk patients. We chose to use theoretical numbers so we could openly distribute the spreadsheets, test particular formulas and compare results obtained with other software. All formulas are presented in traditional equations and also in excel format.

Steps 1 and 2 always require adjustments according to study type and outcome. Columns in light grey in spreadsheet 1 are the ones to be adapted, while columns in dark grey do not require any modification regardless of study type (this includes all further steps of the guide). The necessary adjustments can be easily found on methodological books [8-10].

Cell B14 should be filled with the number of studies being analyzed. There are annotations on the spreadsheet that pop up when the mouse pointer is upon selected cells, so the downloaded file can be used without constant consultation of the full article. The explanation for the formulas and detailing of steps are not present on the spreadsheet though. A recently published paper by Schriger et al [11] reviewed over 300 systematic reviews and highlighted important aspects of producing forest plots, which were considered in developing this approach.

Steps in analyzing data and producing a forest plot

Spreadsheet 1-analysis (Figure 1)

1. Calculating the outcome (effect size, es)

In our example we have the number of events and the number of subjects in columns B and C, so we can simply compute the rate in column D as <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M1">View MathML</a> or D3 = B3/C3 in Excel. It is the same from D3 to D12, and copy and paste will automatically adjust the cell numbers. This copying and pasting should be done for steps 1 through 6 and in step 9 B.1.

thumbnailFigure 1. Spreadsheet 1: Analysis This spreadsheet contains the calculations necessary for the analyses. Input in light gray columns must be adapted according to effect size type. Calculations in dark grey columns are the same for any effect size type.

2. Calculating Standard Error (SE)

All SE can be derived from the formula <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M2">View MathML</a>, but there are simplified derived equations for different types of studies. Since we are using rates, we can use <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M3">View MathML</a> or <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M4">View MathML</a>, the same formula used in CMA. In excel this will be E3 = D3/SQRT(D3*C3).

3. Computing variance (Var)

This formula is simple: Var = SE2. In Excel, F3 = E3^2.

4. Computing individual study weights (w)

We must weight each study with the inverse of its variance, so <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M5">View MathML</a> or G3 = 1/F3 in Excel.

5. Computing each weighted effect size (w*es)

This is computed multiplying each effect size by the study weight. If we are not using any corrections on the weight (meaning, single effect model) this equation will result again in the study size for some types of studies. In excel, this will be H3 = G3*D3

6. Other necessary variables (w*es2 and w2)

We will need two other variables in order to calculate the Q statistics (columns I and J of spreadsheet 1). In excel this will be I3 = G3*(D3 ^ 2) and J3 = G3 ^ 2.

Now we need to sum all values of each variable. In our spreadsheet they are in line 14, labeled "Sums": G14 = SUM (G3:G12), H14 = SUM (H3:H12), I14 = SUM (I3:I12), J14 = SUM (J3:J12)

7. Calculating Q

The Q test measures heterogeneity among studies, and works like a t test. It is calculated as the weighted sum of squared differences between individual study effects and the pooled effect across studies, with the weights being those used in the pooling method. Q is distributed as a chi-square statistic with k (number of studies) minus 1 degrees of freedom. Our null hypothesis is that all studies are equal. To test that, we need to calculate Q and compare it against a table of critical values. If our calculated Q is lower than that of the table's, than we fail to reject the null hypothesis (and hence the studies are similar).

The formula is <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M6">View MathML</a>, but in our spreadsheet it will be simply B17 = I14 - ((H14 ^ 2)/G14) since we already have all the sums.

8. Calculating I2

The I2 was proposed as a method to quantify heterogeneity, and it is expressed in percentage of the total variability in a set of effect sizes due to true heterogeneity, that is, to between-studies variability. The formula is <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M7">View MathML</a>, where "df" stands for "degrees of freedom", simply the total number of studies (k) minus 1. In excel, B18 = ((B17 - B15)/B17)*100.

9. Deciding on effect summary <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M8">View MathML</a> model.

If heterogeneity is low, we can use a fixed effect model, that assumes the effect size is the same in our parameter population, and differences in studies are just from sampling error. However, if we think our sample populations may differ from each other, we can use a random effects model. Many researchers will choose this model even if heterogeneity is low. In our example, Q is higher than 16.919, the critical value for 9 degrees of freedom found in a chi-square distribution, and I2 is 49%, so we have moderate heterogeneity [12]. We must decide whether the data is possible to meta-analyze, and if so we may choose to proceed to a random effects models.

A. Fixed effects Model

Our effect summary is <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M9">View MathML</a>, or B20 = (H14/G14). The standard error is <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M10">View MathML</a>, or B21 = RAIZ (1/G14). With the <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M11">View MathML</a> we calculate the 95% Confidence Interval, as <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M12">View MathML</a>. In Excel, B22 = B20 - (1.96*B21) and C22 = B20 - (1.96*B21). In our example we will not use these results.

B. Random effects model

Since we are assuming that variability is not only due to sampling error, but also to variability in the population of effects, in this model the weight of each study will be adjusted with a constant (v) that represents this.

B1. The formula is <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M13">View MathML</a>. We have all these information, except for <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M14">View MathML</a>. We can compute w2 in column J with J3 = G3 ^ 2, and then its sum with J14 = SOMA (J3: J12). Now, applying the formula, M16 = (B17 - B15)/(G14 - (J14/G14)).

B2. Once we have the constant, we can calculate new weight for each study, using <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M15">View MathML</a>. In excel, L3 = 1/((E3 ^ 2)+$M $16). We need the $ to fix cell M16, or else it will change when we copy the equation to cells L4 to L12.

B3. Now we repeat steps 5 to 8, but using our new weight Wv. The results are in columns M, N and O. Applying the Q and I2 formulas we have now an acceptable Q and low heterogeneity. We calculate our effect summary as <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M16">View MathML</a>, and standard error as <a onClick="popup('http://www.biomedcentral.com/1756-0500/5/52/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1756-0500/5/52/mathml/M17">View MathML</a>.

In excel: F20 = M14/L14, F21 = SQRT (1/L14), F22 = F20 - (1.96*F21) and G22 = F20+(1.96*F21). The confidence intervals are broader than the ones calculated with fixed effect model, however, little change in the effect summary is expected.

Analyzing these numbers in CMA we achieved exactly the same results. - [Additional files 2 and 3].

Additional file 2. CMA calculations fixed effect. This is a portable document format (pdf) of the calculations performed by the software Comprehensive Meta-Analysis, when calculating the effect summary using fixed effect model. It is provided so readers may compare the calculations and results obtained using Microsoft Excel spreadsheet and the commercial software.

Format: PDF Size: 822KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Additional file 3. CMA calculations random effects. This is a portable document format (pdf) of the calculations performed by the software Comprehensive Meta-Analysis, when calculating the effect summary using random effects model. It is provided so readers may compare the calculations and results obtained using Microsoft Excel spreadsheet and the commercial software.

Format: PDF Size: 862KB Download file

This file can be viewed with: Adobe Acrobat ReaderOpen Data

Spreadsheet 2-forest plot (Figure 2)

thumbnailFigure 2. Spreadsheet 2: Forest Plot This spreadsheet contains the final forest plot. Data must be manually entered, either after using spreadsheet 1 or any other analysis software.

Columns A-G have the studies information. The user can insert each study effect size and confidence interval directly into columns D, F and G if he has the data. In our example we copied the calculations from spreadsheet 1, and also the values of the random effects model effect summary.

1. Make sure the information is the way we want it displayed. In our example, we wanted the rates in percentages, so column I = column D*100.

2. We usually read the lower and upper confidence interval as a value, but excel understands it as a difference to the mean. This is key to obtain a proper forest plot. These values are J2 = I2 - (100*F2) and K2 = I2 + (100* F2). Again, we multiply by 100 to have it in percentage.

3. In order to have each study in a different line, we will assign ordinal numbers to the studies. Our effect summary must be number 1 if we want it in the bottom of the graph. This is done manually in column H of our spreadsheet.

4. We are ready to build the graph. Insert > Graph > Scatter Plot. X values will be column I, lines 2-12, and Y values column H, lines 2-12.

5. We must now add the error bars. In Excel 2007 this is done in the Layout tab, clicking the "Error Bar" button on the right side. In Excel 2003 we must right click on the data series (points on the graph) and click "format data series", then chose the "X error bar" tab. In this window we mark the option "personalized values", and then assign columns J and K, lines 2 to 12, to the lower and upper value.

6. To insert the line marking the summary effect value we will add another data series. First we manually build this data set in the spreadsheet. Then right click on the graph > Select Data. Click on "add", and chose X values as column C, lines 15 to 26, and Y values as columns B, lines 15 to 26. A new set of points will appear on the graph. Right-click on any of the new dots and select "format data series". Then we will choose "no marker" and "solid line" on the Marker Options and Line Color tabs.

7. We can now format the X axis, right-clicking on it. In our example we want it to begin on 10 and end on 28, interval of 2 units. It is not our case, but if the researcher is dealing with relative data, then "logarithmic scale" must be marked.

8. The graph is ready. The user can format colors, outlines, shadows and sizes. In our example we changed the summary effect to a diamond shape. This is done by selecting only one dot (double click) and then right clicking it.

9. For presentation we recommend copying and pasting the graph over a table with study information (Figure 3).

thumbnailFigure 3. Comparison of Forest Plots Comparison of forest plots produced using our spreadsheet (left) and CMA (right).

Conclusion

We have constructed a guide to aid researchers interested in meta-analyzing data using a spreadsheet. To the best of our knowledge there is no prior step-by-step approach, but it should be noted that all formulas and methodology were previously publicly available.

The main limitation of analyzing data in a spreadsheet is the potential for errors by typing incorrect formulas. We believe that a step-by-step approach as those presented in this article with all formulas already incorporated in the excel format can help minimize this possibility. The guide presented also does not handle advanced analyses such as multiple regression. However, this is not frequently used in summarizing descriptive data. All sensitivity analysis must be done manually, including and excluding each study of the effect summary calculations, but this limitation is also present in other softwares.

Microsoft Excel is part of the Microsoft Office Package, and therefore it is not free of costs. However, for those who already have the package, this use of Excel could amplify its utility offering an alternative for customizing the graphic presentation of the forest plot.

The main limitation of the forest plot is that all studies are represented by squares of the same size, instead of proportional to study weight. We did not feel this could overshadow all other formatting possibilities, since study weight can also be estimated by the confidence interval width.

In conclusion, it is possible to meta-analyze data using a Microsoft Excel spreadsheet, using either fixed effect or random effects model. The main advantages of this approach are the understanding of the complete process and formulas, and the use of widely available software. It is also possible and simple to make a forest plot using excel. Since displaying results in a graphically appealing but also statistically correct way is usually a problem to most researchers, we believe the method presented here could be of great use. Figure 3 compares the graph obtained with our method and with CMA software.

Availability and requirements

Project name: Meta-analyses and Forest Plots using a Microsoft Excel spreadsheet: step-by-step guide focusing on descriptive data analysis;

Project home page: none;

Operating systems: any OS supporting Microsoft Excel;

Programming language: not-applicable;

Other requirements: Microsoft Excel 2003 or higher;

License: Creative Commons Attribution 3.0 Unported (CC BY 3.0);

Restrictions to use by non-academics: none

Availability of supporting data

The spreadsheets mentioned and the CMA files used for comparison of statistics are available as complementary material.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JLN conceived the article, designed the spreadsheets, and drafted the manuscript. LBM and SCF revised the manuscript and approved the final version.

Acknowledgements

This study was funded by Conselho Nacional de Pesquisas (CNPq) and Fundo de Incentivo à Pesquisa do Hospital de Clínicas de Porto Alegre (FIPE-HCPA).

References

  1. Stroup DF, Berlin JA, Morton SC, et al.: Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group.

    JAMA 2000, 283:2008-12. PubMed Abstract | Publisher Full Text OpenURL

  2. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2011 [http://ims.cochrane.org/revman] webcite

    Review Manager (RevMan) [Computer program]. Version 5.1 OpenURL

  3. Rosenberg M: MetaWin: Statistical Software for Meta-Analysis Version 2. [http://www.metawinsoft.com] webcite

    Sunderland, Massachusetts: Sinauer Associates; 2000.

  4. Borenstein M, Hedges L, Higgins J, Rothstein H: Comprehensive Meta-analysis Version 2. [http://www.meta-analysis.com] webcite

    Biostat, Englewood NJ; 2005.

  5. Bax L, Yu L-M, Ikeda N, et al.: Development and validation of MIX: comprehensive free software for meta-analysis of causal research data.

    BMC Med Res Methodol 2006, 6:50. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  6. Eddy DM: FAST*PRO: Software for meta-analysis by the confidence profile method. Academic Press; 1992. OpenURL

  7. Wallace BC, Schmid CH, Lau J, et al.: Meta-Analyst: software for meta-analysis of binary, continuous and diagnostic data.

    BMC Med Res Methodol 2009, 9:80. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  8. Borenstein M, Hedges LV, Higgins JPT, et al.: Introduction to Meta-Analysis. 1st edition. Wiley; 2009. OpenURL

  9. Lipsey MW, Wilson D: Practical Meta-Analysis. 1st edition. Sage Publications, Inc; 2000. OpenURL

  10. Egger M, Smith GD, Altman D: Systematic Reviews in Health Care: Meta-Analysis in Context. 2nd edition. BMJ Books; 2001. OpenURL

  11. Schriger DL, Altman DG, Vetter JA, et al.: Forest plots in reports of systematic reviews: a cross-sectional study reviewing current practice.

    Int J Epidemiol 2010, 39:421-9. PubMed Abstract | Publisher Full Text OpenURL

  12. Higgins JPT, Thompson SG, Deeks JJ, et al.: Measuring inconsistency in meta-analyses.

    BMJ 2003, 327:557-60. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL