Email updates

Keep up to date with the latest news and content from BMC Systems Biology and BioMed Central.

Open Access Research article

Genomic distance entrained clustering and regression modelling highlights interacting genomic regions contributing to proliferation in breast cancer

Tim J Dexter*, David Sims, Costas Mitsopoulos, Alan Mackay, Anita Grigoriadis, Amar S Ahmad and Marketa Zvelebil

Author Affiliations

Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Road, London, SW3 6JB, UK

For all author emails, please log on.

BMC Systems Biology 2010, 4:127  doi:10.1186/1752-0509-4-127

Published: 8 September 2010

Abstract

Background

Genomic copy number changes and regional alterations in epigenetic states have been linked to grade in breast cancer. However, the relative contribution of specific alterations to the pathology of different breast cancer subtypes remains unclear. The heterogeneity and interplay of genomic and epigenetic variations means that large datasets and statistical data mining methods are required to uncover recurrent patterns that are likely to be important in cancer progression.

Results

We employed ridge regression to model the relationship between regional changes in gene expression and proliferation. Regional features were extracted from tumour gene expression data using a novel clustering method, called genomic distance entrained agglomerative (GDEC) clustering. Using gene expression data in this way provides a simple means of integrating the phenotypic effects of both copy number aberrations and alterations in chromatin state. We show that regional metagenes derived from GDEC clustering are representative of recurrent regions of epigenetic regulation or copy number aberrations in breast cancer. Furthermore, detected patterns of genomic alterations are conserved across independent oestrogen receptor positive breast cancer datasets. Sequential competitive metagene selection was used to reveal the relative importance of genomic regions in predicting proliferation rate. The predictive model suggested additive interactions between the most informative regions such as 8p22-12 and 8q13-22.

Conclusions

Data-mining of large-scale microarray gene expression datasets can reveal regional clusters of co-ordinate gene expression, independent of cause. By correlating these clusters with tumour proliferation we have identified a number of genomic regions that act together to promote proliferation in ER+ breast cancer. Identification of such regions should enable prioritisation of genomic regions for combinatorial functional studies to pinpoint the key genes and interactions contributing to tumourigenicity.