This article is part of the supplement: Proceedings of the Fifth Annual MCBIOS Conference. Systems Biology: Bridging the Omics

Open Access Proceedings

From microarray to biology: an integrated experimental, statistical and in silico analysis of how the extracellular matrix modulates the phenotype of cancer cells

Mikhail G Dozmorov1*, Kimberly D Kyker1, Paul J Hauser1, Ricardo Saban4, David D Buethe1, Igor Dozmorov3, Michael B Centola4, Daniel J Culkin1 and Robert E Hurst1,2*

Author Affiliations

1 Department of Urology, Oklahoma University Health Sciences Centre, Oklahoma City, OK 73104, USA

2 Department of Biochemistry and Molecular Biology, Oklahoma University Health Sciences Centre, Oklahoma City, OK 73104, USA

3 Microarray Core Facility, Oklahoma Medical Research Foundation, Oklahoma City, OK 73104, USA

4 Department of Physiology, Oklahoma University Health Sciences Centre, Oklahoma City, OK 73104, USA

BMC Bioinformatics 2008, 9(Suppl 9):S4  doi:10.1186/1471-2105-9-S9-S4

Published: 12 August 2008


A statistically robust and biologically based approach to the analysis of microarray data is described. It integrates independent biological knowledge and data with a global F-test for finding genes of interest, which minimizes the need for replicates when the analysis is used for hypothesis generation. First, each microarray is normalized to its noise level around zero, and the microarray dataset is then globally adjusted by robust linear regression. Second, genes of interest that capture significant responses to experimental conditions are selected by finding those whose expression shows significantly higher variance than that of genes showing only technical variability. Clustering the expression data and identifying expression-independent properties of the genes of interest, including upstream transcriptional regulatory elements (TREs), ontologies, and networks or pathways, organizes the data into a biologically meaningful system. We demonstrate that when the number of genes of interest is inconveniently large, identifying a subset of "beacon genes" representing the largest changes will identify pathways or networks altered by biological manipulation. The entire dataset is then used to complete the picture outlined by the "beacon genes." This allows construction of a structured model of a system that can generate biologically testable hypotheses. We illustrate this approach by comparing cells cultured on plastic with cells cultured on an extracellular matrix; the approach organizes a dataset of over 2,000 genes of interest from a genome-wide scan of transcription. The resulting model was confirmed by comparing the predicted pattern of TREs with experimental determination of active transcription factors.
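The noise normalization and variance-based selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `noise_fraction` parameter, and the fixed `critical_ratio` threshold are all hypothetical; a real analysis would derive the cutoff from the F distribution for the relevant degrees of freedom and would also apply the robust-regression adjustment, which is omitted here.

```python
from statistics import mean, stdev, variance


def normalize_to_noise(values, noise_fraction=0.5):
    """Rescale one array so its presumed noise spots have mean 0 and SD 1.

    Assumption: the lowest `noise_fraction` of intensities stand in for
    non-expressed (noise-only) spots. The source does not say how the
    noise population is identified.
    """
    ordered = sorted(values)
    n_noise = max(2, int(len(ordered) * noise_fraction))
    noise = ordered[:n_noise]
    center, spread = mean(noise), stdev(noise)
    if spread == 0:
        raise ValueError("noise spots have zero spread; cannot normalize")
    return [(v - center) / spread for v in values]


def select_genes_of_interest(expression, technical_variance, critical_ratio):
    """Keep genes whose variance across conditions exceeds technical variance.

    expression: dict mapping gene name -> list of normalized values,
        one per experimental condition.
    technical_variance: variance attributed to technical variability alone.
    critical_ratio: hypothetical stand-in for an F critical value.
    Returns a dict mapping each selected gene to its variance ratio.
    """
    selected = {}
    for gene, values in expression.items():
        ratio = variance(values) / technical_variance
        if ratio > critical_ratio:
            selected[gene] = ratio
    return selected


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    array = [1.0, 2.0, 3.0, 10.0]
    print(normalize_to_noise(array))

    expression = {
        "flat": [1.0, 1.1, 0.9, 1.0],
        "responsive": [1.0, 5.0, 9.0, 1.0],
    }
    print(select_genes_of_interest(expression, 0.01, 10.0))
```

In this toy input, the gene labeled "responsive" varies far more across conditions than the assumed technical variance and is retained, while "flat" is not. Selecting on a variance ratio, rather than on pairwise comparisons of replicated groups, is what lets this kind of screen run with few replicates.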