BMC Bioinformatics

Open Access Highly Accessed Methodology article

Identifying differential correlation in gene/pathway combinations

Rosemary Braun1*, Leslie Cope2 and Giovanni Parmigiani2

Author Affiliations

1 National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA

2 The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21205, USA

BMC Bioinformatics 2008, 9:488 doi:10.1186/1471-2105-9-488

Published: 18 November 2008

Abstract

Background

An important emerging trend in the analysis of microarray data is to incorporate known pathway information a priori. Expression level "summaries" for pathways, obtained from the expression data for the genes constituting the pathway, permit the inclusion of pathway information, reduce the high dimensionality of microarray data, and can elucidate gene-interaction dependencies that are not already accounted for through known pathway identification.

Results

We present a novel method for the analysis of microarray data that identifies joint differential expression in gene-pathway pairs. The method uses known gene-pathway memberships to compute a summary expression level for each pathway as a whole. Correlations between a pathway's expression summary and the expression levels of genes not already known to be associated with that pathway provide clues to gene-interaction dependencies that are not already accounted for through known pathway identification. Statistically significant differences between gene-pathway correlations in phenotypically different cells (e.g., where the expression level of a single gene and a given pathway summary correlate strongly in normal cells but weakly in tumor cells) may indicate biologically relevant gene-pathway interactions. Here, we detail the methodology and present the results of applying it to two gene-expression datasets, identifying gene-pathway pairs that exhibit differential joint expression by phenotype.
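The workflow described above (summarize a pathway, correlate the summary with an outside gene in each phenotype, then compare the two correlations) can be illustrated with a minimal sketch. The abstract does not specify how the pathway summary is computed or which significance test is used, so the choices here are assumptions made purely for illustration: the summary is taken to be the per-sample mean of z-scored member genes, the correlation is Pearson's, and the comparison is a Fisher z-test for two independent correlations. All function names and the toy data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def zscore(values):
    """Standardize one gene's expression values across samples."""
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def pathway_summary(member_rows):
    """ASSUMED summary: per-sample mean of the z-scored member genes.

    member_rows holds one list of expression values per pathway gene,
    all measured on the same samples in the same order.
    """
    standardized = [zscore(row) for row in member_rows]
    return [fmean(col) for col in zip(*standardized)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_correlation(r_a, n_a, r_b, n_b):
    """ASSUMED test: Fisher z-test for two independent correlations.

    Returns the z statistic and a two-sided p-value. Requires n > 3
    samples per group.
    """
    limit = 1.0 - 1e-12  # keep atanh finite when |r| is exactly 1
    z_a = math.atanh(max(-limit, min(limit, r_a)))
    z_b = math.atanh(max(-limit, min(limit, r_b)))
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (z_a - z_b) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Toy data (hypothetical): one candidate gene and a two-gene pathway,
# measured on six "normal" and six "tumor" samples.
normal_gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
normal_members = [[1.1, 2.0, 2.9, 4.2, 5.1, 5.8],
                  [0.9, 2.1, 3.2, 3.9, 4.8, 6.1]]
tumor_gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
tumor_members = [[3.0, 1.0, 4.0, 1.5, 5.0, 2.0],
                 [2.0, 6.0, 1.0, 5.0, 3.0, 4.0]]

r_normal = pearson(normal_gene, pathway_summary(normal_members))
r_tumor = pearson(tumor_gene, pathway_summary(tumor_members))
z_stat, p_value = differential_correlation(
    r_normal, len(normal_gene), r_tumor, len(tumor_gene)
)
print(f"r_normal={r_normal:.3f} r_tumor={r_tumor:.3f} "
      f"z={z_stat:.2f} p={p_value:.4f}")
```

In a real analysis this comparison would be repeated over many gene-pathway pairs, so some correction for multiple testing would be needed; the sketch omits that step.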

Conclusion

The method described herein provides a means by which interactions among large numbers of genes may be identified by incorporating known pathway information to reduce the dimensionality of gene interactions. The method is efficient and easily applied to data sets of ~10² arrays. Application of this method to two publicly available cancer data sets yields suggestive and promising results. This method has the potential to complement gene-at-a-time techniques for microarray analysis by indicating relationships between pathways and genes that have not previously been identified and that may play a role in disease.