Open Access Highly Accessed Research article

Exploiting combinatorial cultivation conditions to infer transcriptional regulation

Theo A Knijnenburg1*, Johannes H de Winde2, Jean-Marc Daran2, Pascale Daran-Lapujade2, Jack T Pronk2, Marcel JT Reinders1 and Lodewyk FA Wessels1,3

Author Affiliations

1 Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands

2 Industrial Microbiology, Department of Biotechnology, Delft University of Technology, Julianalaan 67, 2628 BC Delft, The Netherlands

3 Department of Molecular Biology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

BMC Genomics 2007, 8:25  doi:10.1186/1471-2164-8-25

Published: 22 January 2007

Additional files

Additional file 1:

Discretized Clustering With Linear Mapping. An Excel file containing all the modules derived with the proposed method. Also, significantly enriched annotation categories and transcription factors are given.

Format: XLS Size: 3MB

Additional file 6:

Microscopic pictures of Saccharomyces cerevisiae. Microscopic pictures of Saccharomyces cerevisiae grown in aerobic carbon-limited (left) and anaerobic carbon-limited (right) chemostats. The cells were sampled from the fermenters and observed directly under an optical microscope equipped with a camera. The same observations were made for the other nutrient limitations, but those results were not photographed.

Format: JPEG Size: 763KB

Additional file 2:

Discretized Clustering Without Linear Mapping. An Excel file containing all the modules derived with the proposed method, but without applying the linear mapping. Significantly enriched annotation categories and transcription factors are also given.

Format: XLS Size: 2MB

Additional file 4:

Enrichment of annotation categories with and without application of the linear mapping. After performing the hypergeometric tests on modules created both with and without the linear mapping, we select, for every annotation category from GO, KEGG and MIPS, the smallest P-value (highest enrichment in a particular module) under each approach. The minimal P-values obtained with the linear mapping are plotted against those obtained without it. Note that for the Gene Ontology (GO) we consider two types of data. The first indicates whether a gene is assigned to a particular leaf (biological process) in the GO annotation tree; we refer to it as 'GO leaves'. The second associates a gene located in a certain leaf not only with that leaf but also with all nodes between the leaf and the root of the GO tree; we refer to it as 'GO comp'.

Format: EPS Size: 1.9MB
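
The selection of minimal P-values described for Additional file 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the representation of modules and categories as Python sets, and the use of the upper-tail hypergeometric probability as the enrichment P-value are all assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_category, n_module, n_overlap):
    """Upper-tail hypergeometric P-value, P(X >= n_overlap).

    n_genes:    size of the gene universe (hypothetical parameter name)
    n_category: genes in the universe annotated with the category
    n_module:   genes in the module
    n_overlap:  genes in the module that carry the annotation
    """
    total = comb(n_genes, n_module)
    upper = min(n_category, n_module)
    tail = sum(
        comb(n_category, k) * comb(n_genes - n_category, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def min_p_per_category(modules, categories, n_genes):
    """For each category, return the smallest P-value over all modules."""
    best = {}
    for name, cat_genes in categories.items():
        best[name] = min(
            hypergeom_enrichment_p(
                n_genes, len(cat_genes), len(module), len(module & cat_genes)
            )
            for module in modules
        )
    return best


# Toy data (invented for illustration only).
toy_modules = [{"g1", "g2", "g3"}, {"g4", "g5"}]
toy_categories = {"toy_category": {"g1", "g2", "g3"}}
toy_result = min_p_per_category(toy_modules, toy_categories, n_genes=10)
```

Running `min_p_per_category` once on the modules found with the linear mapping and once on those found without it would give the two sets of minimal P-values that are plotted against each other.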

Additional file 5:

The TF circle for modules uncovered without applying the linear mapping. Similar to Figure 4, except now the proposed method is applied without performing the linear mapping.

Format: EPS Size: 4.2MB

Additional file 7:

Procedure to derive the discretized representation of a gene using only three nutrient limitations in the regression. Similar to Figure 2. For this gene, no good linear relationship could be found using all four nutrient limitations. However, when the carbon limitation is left out, a good linear relationship exists (see b).

Format: EPS Size: 4.3MB

Additional file 8:

Procedure to derive the discretized representation of a gene using the mean-offset correction. Similar to Figure 2. For this gene, no good linear relationship could be found using either all four nutrient limitations or any set of three. Therefore, the slope is fixed to one and only the offset is computed (see b).

Format: EPS Size: 4.3MB
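
When the slope is fixed to one, fitting y = x + b by least squares reduces to taking the (weighted) mean of the differences y - x. The sketch below illustrates this. It is an assumption-laden simplification: `x` and `y` are hypothetical expression vectors for the two conditions being mapped onto each other, and the optional `weights` argument stands in for whatever weighting the heteroscedastic regression uses, which is not specified here.

```python
def mean_offset(x, y, weights=None):
    """Offset b minimising sum_i w_i * (y_i - (x_i + b))**2 with slope fixed to 1.

    Setting the derivative with respect to b to zero gives
    b = sum_i w_i * (y_i - x_i) / sum_i w_i.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if weights is None:
        weights = [1.0] * len(x)  # unweighted case: plain mean of differences
    diffs = [yi - xi for xi, yi in zip(x, y)]
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


# Toy values (invented for illustration only).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [0.5, 1.5, 2.5, 3.5]
toy_offset = mean_offset(toy_x, toy_y)
```

In the unweighted case the offset is simply the difference between the mean of `y` and the mean of `x`, which is consistent with the name "mean-offset correction".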

Additional file 3:

Permutation tests. For the differentially expressed genes in the dataset, the estimated parameters (scaling factor (slope) and offset) from the heteroscedastic regression are compared with those obtained by applying the same regression strategy to 1000 datasets in which the eight condition labels were randomly permuted. The P-values that represent the variability of the slope and the offset were also computed for all permutations. In the top-left plot, the red line indicates the number of genes with a P-value lower than Pcutoff, which is shown on the x-axis. The blue line indicates the false discovery rate (FDR) as a function of Pcutoff. Here, the FDR is defined as the median number of genes with P < Pcutoff in the permuted datasets divided by the number of genes with P < Pcutoff in the original dataset. The top-right plot displays the same features for the offset. In these top plots, if the scaling factor is smaller than zero, the P-values of both scaling factor and offset are set to 1. A slope smaller than zero implies that the best linear relationship is found when the expression pattern is inverted, which makes no sense from a biological perspective. The bottom-left plot displays the distribution of the computed scaling factors, estimated using Parzen density estimation. From the 1000 permuted datasets, 1000 different distributions were estimated. These are shown as an error-bar plot, which indicates the standard deviation of the permuted distribution. It is clearly visible that, for the correct labels, only a few genes exhibit a slope smaller than zero. Additionally, for most genes the slope is between zero and one and the offset is smaller than zero, indicating that the majority of genes have a higher expression when grown in the presence of oxygen. The bottom-right plot displays the original and permuted distributions for the offset.

Format: EPS Size: 488KB
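
The FDR definition given for Additional file 3 can be written out as a short sketch. The function names and the toy P-values are hypothetical, and the regression that produces the P-values and slopes is assumed to have been run already; only the masking of negative slopes and the FDR ratio are shown.

```python
from statistics import median


def mask_negative_slopes(p_values, slopes):
    """Set the P-value to 1 for every gene whose fitted slope is below zero."""
    return [1.0 if s < 0 else p for p, s in zip(p_values, slopes)]


def count_below(p_values, cutoff):
    """Number of genes with P < cutoff."""
    return sum(1 for p in p_values if p < cutoff)


def permutation_fdr(original_p, permuted_p, cutoff):
    """Median count in the permuted datasets divided by the original count.

    original_p: P-values from the dataset with correct condition labels
    permuted_p: one list of P-values per permuted dataset
    Returns None when no gene passes the cutoff in the original data,
    because the ratio is then undefined.
    """
    n_original = count_below(original_p, cutoff)
    if n_original == 0:
        return None
    n_permuted = median([count_below(p, cutoff) for p in permuted_p])
    return n_permuted / n_original


# Toy values (invented for illustration only).
toy_original = [0.001, 0.002, 0.2, 0.5]
toy_permuted = [
    [0.001, 0.3, 0.4, 0.9],
    [0.2, 0.3, 0.4, 0.9],
    [0.004, 0.3, 0.6, 0.7],
]
toy_fdr = permutation_fdr(toy_original, toy_permuted, cutoff=0.01)
```

Evaluating `permutation_fdr` over a range of cutoffs would trace out a curve of the kind described for the blue line in the top plots.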