Research article
Unraveling gene regulatory networks from time-resolved gene expression data -- a measures comparison study
1 Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany
2 Potsdam Institute for Climate Impact Research (PIK), Telegraphenberg A 31, D-14473 Potsdam, Germany
3 Department of Physics, Humboldt University of Berlin, Campus Adlershof, Newtonstr. 15, D-12489 Berlin, Germany
4 Systems Biology and Mathematical Modeling Group, Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam, Germany
5 Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 25, D-14476 Potsdam, Germany
6 Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB243UE, UK
BMC Bioinformatics 2011, 12:292 doi:10.1186/1471-2105-12-292
Published: 19 July 2011
Additional files
Additional file 1:
Supplement Figures. Figure 1: Performance of the identity scoring scheme using different measures operating
on vectors, in terms of the ROC curves, where the true positive rate (tpr) is plotted against the false positive rate (fpr). The results shown here are obtained from the Euclidean distance (μEC), the Ls norm (μL) and the Manhattan distance (μMA), as well as from the dynamic time warping (μW) with the step patterns symmetric1, symmetric2 and asymmetric. Figure 2: ROC curves obtained for the ID scoring scheme using the simple, conditional and partial Pearson correlation (μP and its conditional and partial variants), where the diagonal of the cross-correlation matrix is set to 0, when a significance
test (by reshuffling of the time series) is applied. Figure 3: Evaluation of the ID scoring scheme using information-theoretic measures: simple, conditional and residual
mutual information (μI and its conditional and residual variants) when a significance test by reshuffling is applied. Figure 4: ROC curves for the mutual coarse-grained information rate, the conditional coarse-grained information rate (CCIR) used as a similarity, and the CCIR represented as a distance, within the identity scoring scheme. Figure 5: (a) The ROC curves obtained for the simple, conditional and partial Granger causality index (μG and its conditional and partial variants) using the identity scoring scheme are shown. (b) The panel illustrates the associated
results under consideration of significance (simple significance test by reshuffling
of the time series). Figure 6: ROC curves obtained for the Spearman correlation coefficient μS using the CLR, MRNET and ARACNE scoring schemes. Figure 7: Reconstruction from noisy data (noise level 0.3). ROC curves of (a) the Granger and partial Granger causality, the mutual and conditional coarse-grained information rates, and the conditional mutual information, as well as (b) the distance measures: Ls norm, Euclidean distance, Manhattan distance and dynamic time warping with the step
pattern symmetric1, symmetric2 and asymmetric. Figure 8: The role of interpolation
and sampling: simulated expression time series of 100 equally sampled data points
(black line) and the effect of spline interpolation through a subsample consisting of the following data points
of the original series: 1|2|3|6|9|15|25|39|63|99 (green line). Figure 9: Artefacts introduced in the reconstruction procedure (measure:
μI, scoring scheme: ID) by interpolation of short, coarsely sampled time series. The left panel shows the
corresponding ROC curves in the noise-free case for 10 points equally sampled in time, whereas the right
panel presents the same results for 10 points, unequally sampled. The unequal sampling
in time is the same as in Figure 8. Figure 10: ROC curves for selected measures and algorithms obtained in the noise-free case, using
unequally sampled data without interpolation. The sampling is the same as in the previous
two figures, including the following data points of a simulated series of 100 points:
1|2|3|6|9|15|25|39|63|99. Figure 11: ROC curves obtained from the reconstruction of an E. coli network of 100 genes, an S. cerevisiae network of 100 genes and an E. coli network of 200 genes. (a)-(i) show the results using various similarity measures together
with the ID scoring scheme: (a) Euclidean distance μEC, (b) Manhattan distance μMA, (c) Ls norm μL, (d) Kendall's rank correlation μK, (e) Pearson correlation μP, (f) conditional Pearson correlation, (g) mutual information of symbol vectors, (h) mean of symbol sequence similarity and the mutual information of symbol vectors, and (i) conditional mutual information. Moreover, the results using Kendall's rank correlation μK together with the (j) MRNET, (k) CLR, and (l) ARACNE scoring schemes are shown. Figure 12: Summary statistics for the top-ranked measures/scoring
schemes for increasing noise intensities (noise level 0.5). Similar approaches are
grouped together. The first group in cyan refers to the different measures applied
together with the ID scoring scheme. Green stands for the CLR scoring scheme, orange for MRNET, yellow for ARACNE, magenta for AWE and violet for TS. Furthermore, blue groups together all measures applied with a combination of scoring
schemes. Figure 13: Summary statistics ((a), (c) and (e) area under the ROC curve, as well as (b), (d) and (f) Youden index) for the top-ranked measures/scoring schemes as a function of the noise intensity
for varying lengths of the time series. The results in (a) and (b) are obtained from
8 time points, those in (c) and (d) from 10 time points, and those in (e) and (f)
from 20 time points. Figure 14: (a) Illustration of the network and its degree distribution
for 100 genes in E. coli. Here and in the following figures, p(k) is the frequency of nodes with total degree k, p_in(k) is the frequency of nodes with in-degree k, and p_out(k) is the frequency of nodes with out-degree k. Furthermore, the network and its degree distribution for (b) 100 genes in S. cerevisiae, and (c) 200 genes in E. coli ar
Format: PDF. Size: 3.2 MB.
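The figure descriptions above rely on ROC curves, the area under the ROC curve and the Youden index as summary statistics. The following is a minimal sketch, not taken from the article, of how these quantities might be computed from a matrix of pairwise scores and a known network. It assumes that the score matrix is used directly as the link ranking (as the name of the identity scoring scheme suggests), that higher scores indicate more likely links (distances would have to be negated first), and that the Youden index is taken as the maximum of tpr - fpr over all thresholds. The function names roc_points, auc and youden_index are hypothetical.

```python
import numpy as np

def roc_points(scores, truth):
    """Return (fpr, tpr) arrays from thresholding the off-diagonal scores.

    scores: square array, higher value = stronger predicted link (assumption).
    truth:  square 0/1 array of the known network.
    Tied scores are not merged into a single threshold in this sketch.
    """
    n = scores.shape[0]
    off_diag = ~np.eye(n, dtype=bool)           # ignore self-interactions
    s = scores[off_diag]
    y = truth[off_diag].astype(bool)
    order = np.argsort(-s, kind="stable")       # strongest predictions first
    y = y[order]
    tp = np.cumsum(y)                           # true positives per cut-off
    fp = np.cumsum(~y)                          # false positives per cut-off
    tpr = np.concatenate(([0.0], tp / y.sum()))
    fpr = np.concatenate(([0.0], fp / (~y).sum()))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def youden_index(fpr, tpr):
    """Largest vertical distance between the ROC curve and the diagonal."""
    return float(np.max(tpr - fpr))

# Toy example: three genes, one true (undirected) link between genes 0 and 1.
truth = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 0]])
scores = np.array([[0.0, 0.9, 0.2],
                   [0.9, 0.0, 0.1],
                   [0.2, 0.1, 0.0]])
fpr, tpr = roc_points(scores, truth)
print(auc(fpr, tpr), youden_index(fpr, tpr))
```

In this toy case the true link receives the highest score, so both statistics reach their maximum value of 1. A ranking no better than chance would give an area near 0.5 and a Youden index near 0.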


