Open Access Highly Accessed Research article

An integrative ChIP-chip and gene expression profiling to model SMAD regulatory modules

Huaxia Qin (1), Michael WY Chan (1,5), Sandya Liyanarachchi (1), Curtis Balch (4), Dustin Potter (1), Irene J Souriraj (1), Alfred SL Cheng (1,6), Francisco J Agosto-Perez (1), Elena V Nikonova (7), Pearlly S Yan (1), Huey-Jen Lin (2), Kenneth P Nephew (4), Joel H Saltz (3), Louise C Showe (7), Tim HM Huang (1) and Ramana V Davuluri (1,7)*

Author Affiliations

1 Human Cancer Genetics Program, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA

2 Division of Medical Technology, School of Allied Medical Professions, The Ohio State University, Columbus, OH 43210, USA

3 Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA

4 Medical Sciences, Indiana University School of Medicine, Bloomington, IN 47405, USA

5 Department of Life Science and Institute of Molecular Biology, National Chung Cheng University, Min-Hsiung, Chia-Yi 621, Taiwan, Republic of China

6 Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong SAR, PR China

7 Center for Systems and Computational Biology, Molecular and Cellular Oncogenesis Program, The Wistar Institute, Philadelphia, PA, USA

BMC Systems Biology 2009, 3:73  doi:10.1186/1752-0509-3-73

Published: 17 July 2009

Additional files

Additional file 1:

Figure S1. Reproducibility of ChIP-chip experiments. Normalized log ratios (immunoprecipitated DNA over total input DNA) of the biological replicate experiments (0 hrs untreated or 3 hrs TGF-β1-treated) are plotted as smooth scatter plots. Binding ratios for the 150 significant genes are indicated by red dots. The overall correlation coefficient of each plot is also shown.

Format: PPT Size: 699KB

Additional file 2:

Figure S2. Reproducibility of expression microarrays. Dye intensities (log2) from the technical replicate experiments (0 hrs untreated; 3, 6 and 12 hrs TGF-β1-treated) are plotted as scatter plots. Expression data for the 150 significant genes are indicated by red dots. The overall correlation coefficient of each plot is also shown.

Format: PPT Size: 1009KB

Additional file 3:

Figure S3. Cluster analysis of expression microarrays. Data from the expression microarrays were used to perform cluster analysis. The replicates at each time point were technical replicates and are labeled "Rep1" and "Rep2". The scale bar is "1-correlation", so the shorter the distance, the stronger the correlation. The results show that data from the treated and the untreated experiments group into two different clusters.

Format: PPT Size: 59KB
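
The "1-correlation" scale used in Figure S3 can be illustrated with a minimal sketch. It assumes Pearson correlation, which the caption does not specify, and the function names and input vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_distance(x, y):
    """Distance on the '1-correlation' scale: 0 for perfectly correlated
    profiles, 2 for perfectly anti-correlated ones.
    Assumption: Pearson correlation (not stated in the caption)."""
    return 1.0 - pearson(x, y)

# Hypothetical expression profiles, for illustration only.
rep1 = [1.0, 2.0, 3.0, 4.0]
rep2 = [2.0, 4.0, 6.0, 8.0]       # perfectly correlated with rep1
reversed_profile = [4.0, 3.0, 2.0, 1.0]  # perfectly anti-correlated

d_same = correlation_distance(rep1, rep2)
d_opposite = correlation_distance(rep1, reversed_profile)
```

On this scale, replicates that track each other closely sit at a distance near 0, which is why short branch lengths in the dendrogram indicate strong correlation.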

Additional file 4:

Table S1. List of 150 putative TGF-β/SMAD target genes and their expression levels. Genes are sorted from most change to least change according to binding response to treatment.

Format: XLS Size: 40KB

Additional file 5:

Supplementary Tables S2, S3 and S4. Table S2. Distribution of TGF-β/SMAD target genes. Table S3. Misclassification rates by CART and RF modeling with three synexpression groups. Table S4. Selection of known SMAD co-regulators by RF.

Format: DOC Size: 66KB

Additional file 6:

Table S5. Predicted modules for TGF-β/SMAD target genes. Column 1 shows the SMAD target gene; column 2 gives the predicted SMAD module; columns 3 and 4 show the predicted and observed groups, respectively.

Format: XLS Size: 28KB

Additional file 7:

Figure S4. A graphical representation of overlapping molecular and cellular functions in SMAD responsive (from Affymetrix array data) and SMAD target (from ChIP-chip) gene sets from Ingenuity Pathway Analysis. Overlapping molecular and cellular functions are shown for 73 IPA and 145 SMAD-module predicted targets, sorted by p-value. The significance of each function was calculated by Fisher's exact test (see Methods).

Format: PPT Size: 568KB
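
As a rough illustration of how a Fisher's exact test p-value for functional over-representation can be computed, the sketch below evaluates the one-sided (right-tailed) test on a 2x2 table. The choice of a one-sided test, the function name, and the counts are assumptions made for illustration; they are not taken from the article or from the pathway analysis software.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of observing a
    top-left count of at least `a` (hypergeometric upper tail).
    """
    row1 = a + b          # e.g. genes in the target set
    col1 = a + c          # e.g. genes annotated with the function
    n = a + b + c + d     # all genes considered
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total

# Hypothetical counts: 3 of 4 target genes carry the annotation,
# versus 1 of 4 background genes.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

A small p-value would indicate that the function is annotated to more genes in the set than expected by chance given the margins.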

Additional file 8:

Figure S5. Choosing the best CART model by a step-wise forward variable selection procedure.

Figure S5A: Mean error rate as a function of the number of variables in the CART model for dataset 1 (up- vs. down-regulated targets) and dataset 2 (sustained up- vs. transient up-regulated targets). The 30 most important variables ranked by RF were added by step-wise forward selection, starting with the most important variable. Each error rate is the sum of the error rates of the two classes, estimated by 10-fold cross-validation. The error rates first dropped and then increased as the number of independent variables grew. The best CART models, in terms of the lowest overall error rate, consisted of 4 variables for up- vs. down-regulated targets and 3 variables for sustained up- vs. transient up-regulated targets.

Figure S5B: Sensitivity versus 1-specificity plot of the CART models. The down-regulated target class and the transient up-regulated class were selected as the positive group for datasets 1 and 2, respectively. The sensitivity and specificity values were derived from the confusion matrix on the test data reported by the CART software. The point closest to the upper left corner (1-specificity = 0, sensitivity = 1) on each plot is indicated with an arrow and marks the best model in terms of the balance between sensitivity and specificity. Equal misclassification costs were used for both datasets. Consequently, the model with optimal sensitivity and specificity values was also the model with the lowest overall error rate.

Format: PPT Size: 91KB
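
The two selection criteria described for Figure S5 can be sketched as follows. This is a minimal illustration, not the CART software's procedure: the sensitivity and specificity values are invented, the function names are hypothetical, and "closest to the upper left corner" is assumed to mean Euclidean distance in the (1-specificity, sensitivity) plane.

```python
import math

# Hypothetical (sensitivity, specificity) per number of variables in the
# model; these values are invented for illustration only.
models = {
    2: (0.60, 0.70),
    3: (0.70, 0.70),
    4: (0.80, 0.75),
    5: (0.70, 0.65),
}

def summed_error(sens, spec):
    """Sum of the two class error rates: (1 - sensitivity) + (1 - specificity)."""
    return (1.0 - sens) + (1.0 - spec)

def best_by_error(models):
    """Model size with the lowest summed class error rate."""
    return min(models, key=lambda k: summed_error(*models[k]))

def closest_to_corner(models):
    """Model size whose ROC point lies nearest to (1-specificity = 0,
    sensitivity = 1). Assumption: Euclidean distance."""
    def distance(k):
        sens, spec = models[k]
        return math.hypot(1.0 - spec, 1.0 - sens)
    return min(models, key=distance)

size_by_error = best_by_error(models)
size_by_roc = closest_to_corner(models)
```

In this toy example both criteria select the 4-variable model, mirroring the agreement the caption reports under equal misclassification costs; in general the two criteria need not coincide.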

Additional file 9:

Table S6. List of 67 sequences containing SBEs from the published literature. Column 1 shows the number of experimentally known binding sites (SBEs) within each target gene; columns 2, 3, 4 and 5 give the gene symbol, Unigene ID, Accession ID and Gene ID, respectively; columns 6 and 7 give the start and end positions of the SBE relative to the transcription start site; column 8 gives the SBE; columns 9 to 12 give the chromosomal location of the SBE (genomic coordinates are according to Human NCBI Build 35, Rat Nov. 2004 (rn4) assembly and Mouse NCBI Build 36); columns 13 and 14 give the sequence around the SBE (-50 to +50 around the SBE) and its length, respectively.

Format: XLS Size: 43KB