Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Research article

Merging microarray data from separate breast cancer studies provides a robust prognostic test

Lei Xu1*, Aik Choon Tan1, Raimond L Winslow1 and Donald Geman12

Author affiliations

1 The Institute for Computational Medicine and Center for Cardiovascular Bioinformatics and Modeling, Johns Hopkins University, Baltimore, MD 21218, USA

2 Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA

For all author emails, please log on.

Citation and License

BMC Bioinformatics 2008, 9:125  doi:10.1186/1471-2105-9-125

Published: 27 February 2008

Abstract

Background

There is an urgent need for new prognostic markers of breast cancer metastases to ensure that newly diagnosed patients receive appropriate therapy. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of developing distant metastases. However, due to the small sample sizes of individual studies, the overlap among signatures is almost zero and their predictive power is often limited. Integrating microarray data from multiple studies in order to increase sample size is therefore a promising approach to the development of more robust prognostic tests.

Results

In this study, by using a highly stable data aggregation procedure based on expression comparisons, we have integrated three independent microarray gene expression data sets for breast cancer and identified a structured prognostic signature consisting of 112 genes organized into 80 pair-wise expression comparisons. A classical likelihood ratio test based on these comparisons, essentially weighted voting, achieves 88.6% sensitivity and 54.6% specificity in an independent external test set of 154 samples. The test is highly informative in assessing the risk of developing distant metastases within five years (hazard ratio 9.3 with 95% CI 2.9–29.9).

Conclusion

Rank-based features provide a stable way to integrate patient data from separate microarray studies due to invariance to data normalization, and such features can be combined into a useful predictor of distant metastases in breast cancer within a statistical modeling framework which begins to capture gene-gene interactions. Upon further confirmation on large-scale independent data, such prognostic signatures and tests could provide a powerful tool to guide adjuvant systemic treatment that could greatly reduce the cost of breast cancer treatment, both in terms of toxic side effects and health care expenditures.