Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer
- Equal contributors
1 Department of Genetics, Institute for Cancer Research, Oslo University Hospital, Radiumhospitalet, Montebello, 0310 Oslo, Norway
2 Department of Biostatistics, Institute of Basic Medical Science, University of Oslo, Norway
3 Biomedical Research Group, Department of Informatics, University of Oslo, Norway
4 Statistics for Innovation (sfi)², NR - Norwegian Computing Centre, Norway
5 Institute for Clinical Medicine, University of Oslo, Norway
6 Institute for Clinical Epidemiology and Molecular Biology (EpiGen), Faculty of Medicine, Division Akershus University Hospital, University of Oslo, Norway
BMC Bioinformatics 2011, 12:197 doi:10.1186/1471-2105-12-197. Published: 24 May 2011
Elucidating the exact relationship between gene copy number and expression would enable identification of the regulatory mechanisms underlying abnormal gene expression and of the biological pathways through which it is regulated. Most current approaches rely either on linear correlation or on nonparametric tests of association that are insensitive to the exact shape of the relationship. Based on knowledge of enzyme kinetics and gene regulation, we would expect the functional shape of the relationship to be gene dependent and to reflect the gene regulatory mechanisms involved. Here, we propose a statistical approach to investigate and distinguish between linear and nonlinear dependencies between DNA copy number alteration and mRNA expression.
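The abstract does not state which regression models are compared. As a minimal sketch only, assuming that "nonlinear" can be approximated by a cubic polynomial in the log R value and that candidate models are compared by the Bayesian information criterion (BIC) (both are our assumptions, not the authors' method), a per-gene classification into "no dependence", "linear" and "nonlinear" could look like this. The helpers `bic`, `fit_rss` and `classify_dependence` are hypothetical names.

```python
import numpy as np


def bic(rss, n, k):
    """BIC of a Gaussian least-squares fit, up to an additive constant."""
    # Guard against log(0) if a model fits the data (numerically) perfectly.
    return n * np.log(max(rss / n, 1e-300)) + k * np.log(n)


def fit_rss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of `degree`."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))


def classify_dependence(copy_number, expression, nonlinear_degree=3):
    """Hypothetical helper: label one gene's copy-number/expression dependence
    as 'none', 'linear' or 'nonlinear' by picking the lowest-BIC model.

    Assumption: a cubic polynomial stands in for a 'nonlinear' shape.
    """
    x = np.asarray(copy_number, dtype=float)
    y = np.asarray(expression, dtype=float)
    n = x.size
    # Candidate models: intercept only, straight line, cubic polynomial.
    degrees = {"none": 0, "linear": 1, "nonlinear": nonlinear_degree}
    scores = {
        label: bic(fit_rss(x, y, d), n, d + 1)  # a degree-d fit has d + 1 coefficients
        for label, d in degrees.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Simulated data only: 102 hypothetical tumors with log R values in [-1, 1].
    rng = np.random.default_rng(0)
    log_r = rng.uniform(-1.0, 1.0, 102)
    linear_expr = 2.0 * log_r + rng.normal(0.0, 0.3, 102)
    saturating_expr = np.tanh(4.0 * log_r) + rng.normal(0.0, 0.1, 102)
    for name, expr in [("linear", linear_expr), ("saturating", saturating_expr)]:
        label, _ = classify_dependence(log_r, expr)
        print(name, "->", label)
```

BIC is used here because it penalizes the extra cubic terms, so a gene is only called nonlinear when the curved fit improves on the straight line by more than chance would explain; a nested F-test or a spline-based model would be equally reasonable choices.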
We applied the proposed method to DNA copy numbers derived from Illumina 109K SNP-CGH arrays (using the log R values) and to expression data from Agilent 44K mRNA arrays, focusing on commonly aberrated genomic loci in a collection of 102 breast tumors. Regression analysis was used to identify the type of relationship (linear or nonlinear), and subsequent pathway analysis revealed that genes displaying a linear relationship were overall associated with substantially different biological processes than genes displaying a nonlinear relationship. Among genes with a linear relationship, we found significant association with canonical pathways, including purine and pyrimidine metabolism (for both deletions and amplifications), estrogen metabolism (linear amplification) and the BRCA-related DNA damage response (linear deletion). Among genes displaying a nonlinear relationship, the top canonical pathways were specific signaling pathways such as PTEN and PI3K/AKT (nonlinear amplification) and Wnt/β-catenin and IL-2 signaling (nonlinear deletion). Both amplifications and deletions pointed to the same affected pathways, identifying cancer as the top significant disease and cell cycle, cell signaling and cellular development as significant networks.
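The abstract does not name the tool or statistical test behind the pathway analysis. As a hedged illustration of what a "significant association with a canonical pathway" typically means, a generic over-representation test asks whether a gene list (for example, the genes classified as linear) contains more members of a pathway than expected by chance. The sketch below computes a right-tailed hypergeometric p-value, which is equivalent to a one-sided Fisher's exact test; the function `enrichment_p_value` and its arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Hypothetical helper: right-tailed hypergeometric p-value P(X >= n_overlap).

    n_universe: total genes considered (e.g. all genes in the tested loci)
    n_pathway:  genes in the universe annotated to the pathway
    n_selected: genes in the tested list (e.g. genes with a linear relationship)
    n_overlap:  selected genes that are also pathway members
    """
    upper = min(n_pathway, n_selected)
    if not 0 <= n_overlap <= upper:
        raise ValueError("n_overlap must lie between 0 and min(n_pathway, n_selected)")
    # Exact integer arithmetic: sum the hypergeometric numerators over the upper tail.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    # Python's int / int true division stays accurate even for very large integers.
    return tail / comb(n_universe, n_selected)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper: 12 of 100 selected genes fall
    # in a 40-gene pathway, where about 4 would be expected by chance.
    print(enrichment_p_value(n_universe=1000, n_pathway=40, n_selected=100, n_overlap=12))
```

When many pathways are tested, the resulting p-values would normally also be corrected for multiple testing; that step is omitted from this sketch.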
This paper presents a novel approach to assessing the validity of the dependence of expression data on copy number data, and this approach may help identify the drivers of carcinogenesis.