Abstract
Background
The Risk of Ovarian Malignancy Algorithm (ROMA) and human epididymis protein 4 (HE4) appear to be promising predictors of epithelial ovarian cancer (EOC); however, studies comparing the diagnostic performance of ROMA, HE4 and CA125 report conflicting results.
Methods
Electronic databases (MEDLINE/PUBMED, EMBASE, Web of Science, Google Scholar, the Cochrane Library and ClinicalTrials.gov) and the bibliographies of full texts were searched for relevant abstracts. All included studies were assessed with QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2). The value of ROMA for predicting EOC was systematically evaluated, and the predictive performances of ROMA, HE4 and CA125 were compared within the same populations. Sensitivity, specificity, DOR (diagnostic odds ratio), LR± (positive and negative likelihood ratios) and AUC (area under the receiver operating characteristic curve) were summarized with a bivariate model. Subgroup and sensitivity analyses were used to explore heterogeneity.
Results
Data on 7792 tests were retrieved from 11 studies. The overall estimates of ROMA for EOC prediction were: sensitivity 0.89 (95% CI 0.84-0.93), specificity 0.83 (95% CI 0.77-0.88) and AUC 0.93 (95% CI 0.90-0.95). For EOC prediction, HE4 had higher specificity than CA125 (0.93, 95% CI 0.87-0.96 vs 0.84, 95% CI 0.76-0.90), whereas CA125 had a higher AUC than HE4 (0.88, 95% CI 0.85-0.91 vs 0.82, 95% CI 0.78-0.85). For ovarian cancer (OC) prediction, CA125 had a higher AUC than HE4 (0.89, 95% CI 0.85-0.91 vs 0.79, 95% CI 0.76-0.83). In the comparison of all three tests for EOC prediction, sensitivity was higher for ROMA (0.86, 95% CI 0.81-0.91) than for HE4 (0.80, 95% CI 0.73-0.85), and specificity ranked HE4 (0.94, 95% CI 0.90-0.96) > ROMA (0.84, 95% CI 0.79-0.88) > CA125 (0.78, 95% CI 0.73-0.83).
Conclusions
ROMA is helpful for distinguishing epithelial ovarian cancer from benign pelvic masses. HE4 is not better than CA125 for either EOC or OC prediction. ROMA is a promising predictor of epithelial ovarian cancer that may replace CA125, but its use requires further exploration.
Background
Ovarian cancer is the leading cause of death from gynecologic cancers in the United States and the fifth leading cause of cancer death in women (Link 1). Non-specific clinical manifestations are the main obstacle to the early diagnosis of ovarian cancer [1]. Cancer antigen 125 (CA125) was the only FDA-approved biomarker for ovarian cancer before 2008. CA125 is indicated as an aid in detecting residual ovarian carcinoma in patients who have undergone first-line therapy and are being considered for diagnostic second-look procedures. Although serum CA125 is elevated in 80% of patients with advanced-stage epithelial ovarian cancer (EOC) [2], it is increased in only 50% of patients with stage I EOC [3]. In addition, serum CA125 is elevated in various benign gynecological diseases (including endometriosis) [4] and in non-gynecologic malignancies [5]. Therefore, considerable efforts are underway to identify new serum biomarkers that, alone or in combination with CA125, improve EOC detection [6,7].
With the use of high-throughput technologies, a large number of new biomarkers have been discovered [8-10]. Human epididymis protein 4 (HE4) is among the most promising [11]. High levels of HE4 are found in the serum of patients with EOC, especially in serous and endometrioid cancers [12]. Unlike CA125, HE4 is not overexpressed in endometriosis and other benign gynecological diseases [11]. In 2008, HE4 became the first biomarker for EOC after CA125 to be approved by the U.S. Food and Drug Administration (FDA), as an aid in monitoring recurrence or progressive disease in patients with epithelial ovarian cancer. However, studies conflict on the relative sensitivity of HE4 and CA125 [5,13-16].
Moore and colleagues [17] developed a multianalyte assay, the Risk of Ovarian Malignancy Algorithm (ROMA™), which combines the results of the HE4 EIA (enzyme immunoassay), the ARCHITECT CA 125 II™ assay and menopausal status into a numerical score that predicts malignancy when an ovarian mass is found clinically. Although ROMA™ received U.S. FDA clearance in September 2011, its diagnostic accuracy compared with CA125 or HE4 alone remains controversial [13,16-18]. Here we aim to clarify the conflicting results on the diagnostic accuracy of ROMA and on the comparative performance of ROMA, HE4 and CA125.
Methods
Data sources and search strategy
We followed the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) guidelines [19] and the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Link 2). MEDLINE (through the PubMed interface), EMBASE, Web of Science, Google Scholar, the Cochrane Library and ClinicalTrials.gov were searched up to 22 December 2011. Reference lists of the identified articles were searched manually. No language restrictions were applied. Search terms were based on standardized National Library of Medicine MeSH terms and free text. The search strategies for all databases were adapted from the PubMed strategy (Additional file 1: Table S1).
Additional file 1. Table S1. Search strategies.
Two authors (RXT and WPL) independently screened the search results by title and abstract. The full texts of the selected articles were reviewed independently by two other authors (KC and LLY) to determine inclusion. Disagreements were resolved by consulting a third author (MC).
Inclusion criteria
Studies that investigated both serum HE4 and CA125 as diagnostic tests or calculated the ROMA algorithm were included if (1) they were cross-sectional; (2) they were performed in the same population presenting with a pelvic mass; (3) all serum specimens were collected preoperatively; (4) histological diagnoses were available for all subjects; and (5) sufficient data were reported to reconstruct the fourfold table.
Studies that recruited participants without a pelvic mass, that contained obviously erroneous data, or whose ROC curve analysis included healthy individuals were excluded. Case–control studies were also excluded, because this design tends to overestimate or underestimate the diagnostic performance of a test [20].
Data extraction
The data extracted from each study included: author; year; country; design; recruitment; age; menopausal status; test method (e.g. chemiluminescence immunoassay); number of patients; sensitivity; specificity; and cut-off value. Fourfold (2×2) tables were reconstructed. Two reviewers (FKL and RXT) independently extracted the data for each study and consulted a third reviewer (MC) when they disagreed. When important data were not reported in the original studies, the authors were contacted by email.
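The reconstruction of fourfold tables from reported sensitivity, specificity and group sizes, and the accuracy measures derived from them (LR± and DOR), can be sketched in Python as follows. This is an illustrative sketch, not the authors' extraction procedure: the names `FourfoldTable`, `reconstruct_table` and `accuracy_measures` are hypothetical, and both the rounding rule and the 0.5 continuity correction for zero cells are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FourfoldTable:
    tp: int  # index test positive, cancer on histology
    fp: int  # index test positive, benign on histology
    fn: int  # index test negative, cancer on histology
    tn: int  # index test negative, benign on histology


def reconstruct_table(sensitivity: float, specificity: float,
                      n_cancer: int, n_benign: int) -> FourfoldTable:
    """Back-calculate cell counts from reported accuracy and group sizes.

    Rounding to the nearest integer is an assumption; published percentages
    are themselves rounded, so a cell may be off by one.
    """
    tp = round(sensitivity * n_cancer)
    tn = round(specificity * n_benign)
    return FourfoldTable(tp=tp, fp=n_benign - tn, fn=n_cancer - tp, tn=tn)


def accuracy_measures(t: FourfoldTable, correction: float = 0.5) -> dict:
    """Sensitivity, specificity, LR+, LR- and DOR from one 2x2 table.

    The correction is added to every cell only when some cell is zero,
    so that the ratios stay finite (an assumed convention).
    """
    cells = (t.tp, t.fp, t.fn, t.tn)
    c = correction if 0 in cells else 0.0
    tp, fp, fn, tn = (x + c for x in cells)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return {"sensitivity": sens, "specificity": spec,
            "lr_pos": lr_pos, "lr_neg": lr_neg, "dor": lr_pos / lr_neg}
```

For a hypothetical study with 100 cancers and 400 benign masses reporting a sensitivity of 0.89 and a specificity of 0.83, `reconstruct_table` gives TP = 89, FP = 68, FN = 11 and TN = 332, and the DOR equals TP·TN/(FP·FN).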
Index tests and reference standard
The Risk of Ovarian Malignancy Algorithm (ROMA™) is a qualitative serum test that combines the results of the HE4 EIA (enzyme immunometric assay), the ARCHITECT CA 125 II™ assay and menopausal status into a numerical score. Accordingly, the index tests for HE4 and CA125 in this meta-analysis were specified as EIAs and chemiluminescence immunoassays, respectively. The ROMA algorithm is described by Moore and colleagues [17].
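As a sketch of how a ROMA value is computed, the snippet below applies a predictive index built from the natural logarithms of HE4 (pmol/L) and CA125 (U/mL), with separate equations for premenopausal and postmenopausal women, and converts the index to a percentage with the logistic function. The coefficients are those commonly quoted for ROMA (Moore and colleagues); they are reproduced here as an assumption and should be verified against [17] and the assay labelling before any use. The function name `roma_score` is hypothetical, and the assay-specific cut-offs used to classify risk are deliberately left out.

```python
import math

# (intercept, coefficient of ln HE4, coefficient of ln CA125).
# Assumed values as commonly quoted for ROMA; verify against reference [17].
ROMA_COEFFS = {
    "premenopausal": (-12.0, 2.38, 0.0626),
    "postmenopausal": (-8.09, 1.04, 0.732),
}


def roma_score(he4_pmol_l: float, ca125_u_ml: float, postmenopausal: bool) -> float:
    """Return the ROMA value, i.e. the predicted probability of malignancy in %."""
    if he4_pmol_l <= 0 or ca125_u_ml <= 0:
        raise ValueError("marker concentrations must be positive")
    key = "postmenopausal" if postmenopausal else "premenopausal"
    intercept, b_he4, b_ca125 = ROMA_COEFFS[key]
    # Predictive index on the log-odds scale (natural logarithms).
    pi = intercept + b_he4 * math.log(he4_pmol_l) + b_ca125 * math.log(ca125_u_ml)
    # Logistic transform to a probability, expressed as a percentage.
    return 100 * math.exp(pi) / (1 + math.exp(pi))
```

Because the premenopausal equation weights HE4 far more heavily than CA125, the sketch reflects the design intent of letting HE4 offset the low specificity of CA125 in younger women.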
The reference standard was histopathological diagnosis. In all studies, surgical staging of ovarian cancer followed the criteria of FIGO (International Federation of Gynecology and Obstetrics) [21] (Link 3). Early stage was defined as FIGO stage I or II, and advanced stage as FIGO stage III or IV.
Methodological quality assessment
The methodological quality of each study was evaluated with the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) [22] quality items. Because overall scores are not helpful for interpreting study quality [23], they were not used in the QUADAS-2 evaluation. Doubts were resolved by discussion. QUADAS-2 includes an item on blinding between the index tests and the reference standard, but not on blinding between index tests. We therefore added one item addressing the validity of this comparative question to the Risk of Bias part of Domain 2 (Index Test) of QUADAS-2 [22]: “Were the results of index tests interpreted without knowledge of each other?” The answers (Yes, No or Unclear) were used to help assess the risk of bias of the included studies. As suggested in the Concerns Regarding Applicability part of Domain 2 (Index Test) of QUADAS-2 [22], variations in test technology, execution, or interpretation may affect estimates of the diagnostic accuracy of a test; if index test methods differed from those specified in the review question, concerns about applicability might exist.
The index tests for HE4 and CA125 in this meta-analysis were specified as EIAs and chemiluminescence immunoassays, respectively. For HE4, chemiluminescence immunoassays are more sensitive than the specified EIAs, so pooling studies that used them might introduce bias; such HE4 tests were therefore considered High Concern Regarding Applicability. Similarly, for CA125, EIA and RIA (radioimmunoassay) are less sensitive and less stable than chemiluminescence immunoassays, so studies using either EIA or RIA were considered High Concern Regarding Applicability. Because the ROMA test uses the results of the CA125 and HE4 tests from the same study, ROMA was considered High Concern Regarding Applicability when either the HE4 or the CA125 test was rated High Concern Regarding Applicability.
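The applicability rules above amount to a small decision table. The hypothetical Python sketch below encodes them using the assay labels that appear in this review (EIA, RIA, CMIA, CLEIA, ECLIA); treating any HE4 method other than EIA as high concern is our assumption, since the text only discusses chemiluminescence assays for HE4.

```python
from enum import Enum


class Concern(Enum):
    LOW = "low"
    HIGH = "high"


# Chemiluminescence immunoassay families named in this review.
CHEMILUMINESCENCE = {"CMIA", "CLEIA", "ECLIA"}


def he4_concern(method: str) -> Concern:
    # HE4 was specified as EIA; other methods (e.g. CMIA) deviate from it.
    return Concern.LOW if method == "EIA" else Concern.HIGH


def ca125_concern(method: str) -> Concern:
    # CA125 was specified as a chemiluminescence assay; EIA and RIA deviate.
    return Concern.LOW if method in CHEMILUMINESCENCE else Concern.HIGH


def roma_concern(he4_method: str, ca125_method: str) -> Concern:
    # ROMA inherits a high concern from either of its component tests.
    if Concern.HIGH in (he4_concern(he4_method), ca125_concern(ca125_method)):
        return Concern.HIGH
    return Concern.LOW
```

For example, a study measuring HE4 by EIA and CA125 by CMIA yields a low concern for ROMA, whereas replacing either assay with a non-specified method flips ROMA to high concern.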
Data analysis plan
The statistical analysis followed these steps: (1) qualitatively describing the findings; (2) testing for heterogeneity and threshold effects; (3) identifying sources of heterogeneity by subgroup analysis; and (4) choosing an appropriate model and pooling estimates statistically. Univariate [24] and bivariate [25] models are the two main options for diagnostic meta-analysis. When a positive correlation exists between the true positive rate (TPR) and the false positive rate (FPR), the bivariate model is more appropriate [26].
Heterogeneity across studies was displayed with forest plots and quantified with I² estimates [27]. The main advantage of I² is that it does not inherently depend on the number of studies included in the meta-analysis. I² values below 25% were regarded as low heterogeneity, values between 25% and 50% as moderate heterogeneity, and values of 50% or higher as high heterogeneity. If heterogeneity was low, a univariate meta-analysis model was used (Meta-DiSc software version 1.4 [28]). If heterogeneity was moderate to high, the Spearman correlation coefficient between logit(TPR) and logit(FPR) was calculated (Meta-DiSc software version 1.4); a positive coefficient denoted the presence of a threshold effect. If the coefficient was positive, a bivariate model and an HSROC (Hierarchical Summary Receiver Operating Characteristic) curve were estimated and plotted; if negative, summary estimates were pooled without an HSROC curve [24,29]; and if zero, summary estimates were pooled in the same way as for low heterogeneity.
An influence analysis re-estimated the meta-analysis omitting each study in turn (STATA version 10.0) to confirm the stability of the analysis model. Publication bias was investigated with Deeks' funnel plot and the associated asymmetry test [30]. Subgroups were analyzed hierarchically by menopausal status, FIGO stage and Concern Regarding Applicability of the index test methods. In some studies, patients with low malignant potential (LMP) or borderline (BL) tumors were classified into the EOC group; these studies were analyzed separately as subgroup EOC (LMP/BL). Subgroups with fewer than four studies were analyzed with the univariate model, because the bivariate model requires at least four studies [26]. Summary estimates and 95% CIs (confidence intervals) for sensitivity, specificity, DOR, LR± and AUC were calculated (STATA version 10.0 [31,32]). HSROC (hierarchical summary receiver operating characteristic) plots were shown when appropriate. Estimates of different tests were compared with z-tests.
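The I² statistic is derived from Cochran's Q as I² = (Q − df)/Q, truncated at zero. The stdlib-only sketch below computes it for one accuracy measure (for example sensitivity, with one (TP, TP + FN) pair per study) and applies the cut-offs stated above. It assumes inverse-variance weights on the logit scale and a 0.5 continuity correction; Meta-DiSc, which was used in this analysis, may differ in such details, and all function names are hypothetical.

```python
import math


def logit_with_variance(events, total, correction=0.5):
    """Logit of a proportion and its approximate variance.

    The continuity correction is added to both cells only when one is zero.
    """
    if events in (0, total):
        events, total = events + correction, total + 2 * correction
    p = events / total
    return math.log(p / (1 - p)), 1 / events + 1 / (total - events)


def i_squared(pairs):
    """I^2 (%) from Cochran's Q over (events, total) pairs, one per study."""
    ys, ws = [], []
    for events, total in pairs:
        y, v = logit_with_variance(events, total)
        ys.append(y)
        ws.append(1 / v)  # inverse-variance weight
    pooled = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    return 0.0 if q == 0 else max(0.0, (q - df) / q) * 100


def heterogeneity_level(i2):
    # Cut-offs as stated in the Methods: <25% low, 25-50% moderate, >=50% high.
    if i2 < 25:
        return "low"
    return "moderate" if i2 < 50 else "high"
```

Applied to the reported values, `heterogeneity_level(71.6)` and `heterogeneity_level(80.7)` both return "high", which is the branch that triggers the threshold-effect check.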
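The leave-one-out logic of the influence analysis is illustrated below with a deliberately simpler estimator than the bivariate model used here: a fixed-effect, inverse-variance pooled proportion on the logit scale. The sketch shows only the omit-and-re-estimate loop; the function names and the 0.5 continuity correction are assumptions.

```python
import math


def pooled_proportion(pairs):
    """Fixed-effect inverse-variance pooled proportion on the logit scale.

    `pairs` holds (events, total) per study, e.g. (TP, TP + FN) for
    sensitivity. Deliberately simpler than the bivariate model in the paper.
    """
    num = den = 0.0
    for events, total in pairs:
        if events in (0, total):  # 0.5 continuity correction (assumed)
            events, total = events + 0.5, total + 1.0
        y = math.log(events / (total - events))  # logit of the proportion
        w = 1.0 / (1.0 / events + 1.0 / (total - events))  # inverse variance
        num += w * y
        den += w
    return 1.0 / (1.0 + math.exp(-num / den))


def leave_one_out(pairs):
    """Pooled estimate re-computed with each study omitted in turn."""
    return [pooled_proportion(pairs[:i] + pairs[i + 1:]) for i in range(len(pairs))]
```

Comparing each leave-one-out value with `pooled_proportion(pairs)` over all studies mirrors, in simplified form, the stability check summarized in Additional file 3: Table S2.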
Results
Search results
Of the 267 references identified from the 6 databases, 11 articles [13-18,33-37] met the inclusion criteria and were included in the meta-analysis (Figure 1).
Figure 1. Flowchart of selection of eligible studies.
Characteristics of the included studies are summarized in Table 1. In total, 7792 tests from 2878 patients presenting with a pelvic mass at risk of ovarian cancer were retrieved. Of the 11 studies, 6 [15,17,18,34,36,37] enrolling 1547 patients investigated the performance of ROMA for EOC prediction; 5 [16,33-36] with 883 patients compared HE4 and CA125 for OC prediction; 4 [13,15,18,36] with 715 patients compared HE4 and CA125 for EOC prediction; and 3 [15,18,36] with 482 patients compared ROMA, HE4 and CA125 for EOC prediction. In all studies, the patient spectrum was considered representative: all enrolled participants presented with a pelvic mass of suspected ovarian origin, had not received any previous treatment and were scheduled for surgical intervention. The prevalence of proven ovarian cancer ranged from 7.86% to 63.1% across studies (overall prevalence of EOC 18.5%). The study by Holcomb and colleagues [14] had the lowest prevalence (7.86%) because it investigated only premenopausal women.
Table 1. Characteristics of studies included in the analysis
Methods of index tests
All 11 included studies measured serum HE4 and CA125. For HE4, 8 studies [13,16-18,33,35-37] used EIA (enzyme immunoassay) and the other 3 [14,15,34] used CMIA (chemiluminescent microparticle immunoassay). For CA125, 5 studies [14,15,17,34,37] used CMIA, 3 [16,35,36] used EIA, and 3 [13,18,33] used RIA (radioimmunoassay), CLEIA (chemiluminescence enzyme immunoassay) and ECLIA (electrochemiluminescence immunoassay), respectively. CMIA, CLEIA and ECLIA are chemiluminescence immunoassays, which are more sensitive than EIA or RIA. As described under Methodological quality assessment in the Methods section, HE4 tests using CMIA and CA125 tests using EIA or RIA were regarded as High Concern Regarding Applicability. ROMA tests were considered High Concern Regarding Applicability when either the HE4 or the CA125 test was rated High Concern Regarding Applicability (Figure 2).
Figure 2. Results of the QUADAS-2 quality assessment. (a) Proportion of studies with low, high, or unclear risk of bias. (b) Proportion of studies with low, high, or unclear Concerns Regarding Applicability. The three horizontal bars represent the index tests HE4, CA125 and ROMA, respectively.
Methodological quality of all included studies
The quality of the included studies was assessed with the QUADAS-2 tool (Figure 2 and Table 2). In 9 [13,14,16-18,34-37] of the 11 studies, the index tests (HE4/CA125) were interpreted without knowledge of the reference standard (histopathological diagnosis); in the other 2 studies [15,33] this was unclear. In 5 of the 11 studies [14,16,34-36], the results of the index tests (HE4 and CA125) were interpreted without knowledge of each other; in the other 6 studies [13,15,17,18,33,37] this blinding was unclear. Accordingly, for the item “Could the conduct or interpretation of the index test have introduced bias?” in Domain 2 of QUADAS-2, 5 studies [14,16,34-36] were at low risk of bias, 1 study [13] was at high risk of bias and the risk was unclear in 5 studies [15,17,18,33,37]. For Patient Selection (Domain 1 of QUADAS-2), 4 [16-18,34] of the 11 studies were considered at low risk of bias because they enrolled patients consecutively; 2 studies [14,15] were regarded as high risk of bias, and the risk was unclear in the other 5 [13,33,35-37].
Table 2. QUADAS-2 quality items results
Performance of ROMA for predicting EOC
Forest plots of the sensitivity and specificity of ROMA for EOC prediction are shown in Figure 3.
The summary estimates (95% CIs) were: sensitivity 0.89 (0.84-0.93), specificity 0.83 (0.77-0.88) and AUC 0.93 (0.90-0.95) (Table 3). Heterogeneity was high for both sensitivity (I² = 71.6%) and specificity (I² = 80.7%).
A threshold effect was present according to the prespecified criterion of a positive Spearman correlation coefficient (0.657, p = 0.156), although this correlation was not statistically significant. The bivariate model was therefore used to pool estimates. The HSROC plot shows the summary estimates of sensitivity and specificity together with the confidence and prediction regions (Figure 4).
Subgroup analysis revealed variability in the pooled estimates (Table 3), and we compared these estimates between subgroups to investigate the performance of ROMA. Across all subgroups, the AUC of ROMA for EOC detection ranged from 0.88 to 0.97. ROMA performed better in the whole EOC population (AUC 0.93, 95% CI 0.90-0.95) than in either the premenopausal subgroup (EOC-preM; AUC 0.88, 95% CI 0.85-0.91) or the postmenopausal subgroup (EOC-postM; AUC 0.89, 95% CI 0.86-0.92). ROMA performed better in the EOC-advanced stage group (AUC 0.88, 95% CI 0.85-0.91) than in both the whole EOC population and the EOC-early stage group (AUC 0.88, 95% CI 0.83-0.93). Moreover, ROMA performed better in the EOC population than in the OC population (AUC 0.89, 95% CI 0.87-0.92).
ROMA had lower sensitivity in the premenopausal subgroup (EOC-preM; 0.82, 95% CI 0.67-0.91) than in the postmenopausal subgroup (EOC-postM; 0.93, 95% CI 0.89-0.96). Specificity was higher in the whole EOC group (0.83, 95% CI 0.77-0.88) than in both the EOC-early stage (0.76, 95% CI 0.73-0.79) and EOC-advanced stage (0.76, 95% CI 0.73-0.79) groups. ROMA had higher sensitivity in the EOC-advanced stage group (0.98, 95% CI 0.94-1.00) than in the whole EOC population (0.90, 95% CI 0.84-0.93) and the EOC-early stage group (0.81, 95% CI 0.71-0.89). In the subgroups defined by Concern Regarding Applicability of the test methods, ROMA had higher specificity in the high-concern group (EOC-methods High concern; 0.87, 95% CI 0.83-0.90) than in both the low-concern group (EOC-methods Low concern; 0.75, 95% CI 0.72-0.78) and the whole EOC population. Finally, no differences were found in the other summary estimates within the EOC, EOC (LMP/BL) and OC groups, except for the AUC between the EOC and OC groups (Table 3).
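Subgroup comparisons such as premenopausal versus postmenopausal sensitivity can be approximated from the reported estimates and 95% CIs with a z-test. The sketch below is one possible approach, not necessarily the one used in STATA here: it assumes that each CI is symmetric on the logit scale (so a standard error can be back-calculated from its width) and that the two estimates are independent, which does not hold when different tests are applied to the same patients. The function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% standard normal quantile


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def se_from_ci(lower: float, upper: float) -> float:
    """Approximate logit-scale standard error from a reported 95% CI."""
    return (logit(upper) - logit(lower)) / (2 * Z_975)


def compare_proportions(est_a, ci_a, est_b, ci_b):
    """Two-sided z-test for the difference of two independent summary proportions.

    Returns (z, p). CI bounds must lie strictly between 0 and 1.
    """
    se = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = (logit(est_a) - logit(est_b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p
```

For instance, `compare_proportions(0.82, (0.67, 0.91), 0.93, (0.89, 0.96))` contrasts the premenopausal and postmenopausal ROMA sensitivities reported above; an upper bound of 1.00, as for the advanced-stage sensitivity, would need a continuity adjustment before this approximation can be used.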
The Deeks' funnel plot for ROMA in EOC detection appeared symmetrical (Additional file 2: Figure S1), and the funnel plot asymmetry test showed little sign of publication bias (regression coefficient −3.73; p = 0.617). When any single study was omitted, the summary estimates (sensitivity, specificity and DOR) were close to those obtained with all eligible studies (Figure 5 and Additional file 3: Table S2).
Additional file 2. Figure S1. Deeks' funnel plot for ROMA.
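Deeks' asymmetry test regresses the log diagnostic odds ratio on 1/√ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) is the effective sample size of the diseased (n₁) and non-diseased (n₂) groups, weighting each study by its ESS; a slope far from zero suggests funnel plot asymmetry. The stdlib-only sketch below returns the slope, its t statistic and the residual degrees of freedom. The 0.5 cell correction and the function name are assumptions, and a two-sided p-value still requires a t distribution (for example `2 * scipy.stats.t.sf(abs(t), df)`).

```python
import math


def deeks_test(tables):
    """Deeks' funnel-plot asymmetry regression over (tp, fp, fn, tn) tuples.

    Weighted least squares of ln(DOR) on 1/sqrt(ESS), weights = ESS.
    Returns (slope, t statistic, residual degrees of freedom).
    Needs at least three studies with differing ESS.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)
        # 0.5 added to every cell (assumed) so that ln(DOR) is always defined
        ln_dor = math.log((tp + 0.5) * (tn + 0.5) / ((fp + 0.5) * (fn + 0.5)))
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    df = len(xs) - 2
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / df / sxx)
    return slope, slope / se_slope, df
```

Because rescaling all weights by a constant changes the residual variance and the weighted sum of squares by the same factor, the t statistic does not depend on whether the ESS weights are normalized.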
Additional file 3. Table S2. Influence analysis of individual studies for the diagnostic performance of ROMA. Estimates were pooled with the bivariate model. Excluding any individual study resulted in only small changes in sensitivity (sen), specificity (spe) or diagnostic odds ratio (DOR) compared with the estimates from all eligible studies. None of the differences was significant (p > 0.05).
Figure 3. Forest plots of paired sensitivity and specificity for ROMA.
Table 3. Summary estimates of ROMA for EOC and OC prediction
Figure 4. Hierarchical summary receiver operating characteristic (HSROC) curve and results of bivariate analysis for ROMA to predict EOC. Shown are the estimates of each study (squares), the summary point (solid circle), the 95% confidence region (small ellipse), the 95% prediction region (large ellipse) and the HSROC curve (solid line). Each square represents one study in the meta-analysis, and its size indicates the size of that study.
Figure 5. Influence analysis of individual studies for the performance of ROMA to predict EOC. The meta-analysis was re-estimated by omitting each study in turn. The diamonds represent the estimates from the remaining studies, with their 95% confidence intervals (solid lines) passing through their centers.

Figure 6.
Figure 7.
Figure 8.
Figure 9.
Figure 10.
Figure 11.
Figure 12.
Figure 13.