# Unraveling gene regulatory networks from time-resolved gene expression data -- a measures comparison study

^{1} Interdisciplinary Center for Dynamics of Complex Systems, University of Potsdam, Campus Golm, Karl-Liebknecht-Str. 24, D-14476 Potsdam, Germany

^{2} Potsdam Institute for Climate Impact Research (PIK), Telegraphenberg A 31, D-14473 Potsdam, Germany

^{3} Department of Physics, Humboldt University of Berlin, Campus Adlershof, Newtonstr. 15, D-12489 Berlin, Germany

^{4} Systems Biology and Mathematical Modeling Group, Max Planck Institute for Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam, Germany

^{5} Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 25, D-14476 Potsdam, Germany

^{6} Institute for Complex Systems and Mathematical Biology, University of Aberdeen, Aberdeen AB24 3UE, UK

*BMC Bioinformatics* 2011, **12**:292
doi:10.1186/1471-2105-12-292

### Additional files

**Additional file 1:**

**Supplement Figures**.

Figure 1: Performance of the identity scoring scheme using different measures operating on vectors, in terms of the *ROC* curves, where the false positive rate (*fpr*) vs. the true positive rate (*tpr*) is plotted. The results shown here are obtained from the Euclidean distance (*μ*_{EC}), the *L*^{s} norm (*μ*_{L}) and the Manhattan distance (*μ*_{MA}), as well as from the dynamic time warping (*μ*_{W}) with the step pattern symmetric1, symmetric2 and asymmetric.

Figure 2: *ROC* curves obtained for the *ID* scoring scheme using the simple (*μ*_{P}), conditional and partial Pearson correlation, where the diagonal of the cross-correlation matrix is set to 0, when a significance test (by reshuffling of the time series) is applied.

Figure 3: Evaluation of the *ID* scoring scheme using information-theoretic measures: simple (*μ*_{I}), conditional and residual mutual information, when a significance test by reshuffling is applied.

Figure 4: *ROC* curves for the mutual coarse-grained information rate, the conditional coarse-grained information rate (*similarity*), and the *CCIR* represented as a distance (*distance*), in the framework of the identity scoring scheme.

Figure 5: (a) The *ROC* curves obtained for the simple (*μ*_{G}), conditional and partial Granger causality index using the identity scoring scheme are shown. (b) The panel illustrates the associated results under consideration of significance (simple significance test by reshuffling of the time series).

Figure 6: *ROC* curves obtained for the Spearman correlation coefficient *μ*_{S} using the *CLR*, *MRNET* and the *ARACNE* scoring scheme.

Figure 7: Reconstruction from noisy data (noise level 0.3). *ROC* curves of (a) the Granger (*μ*_{G}) and partial Granger causality, the mutual and conditional coarse-grained information rates, and the conditional mutual information, as well as (b) the distance measures: *L*^{s} norm, Euclidean distance, Manhattan distance and dynamic time warping with the step pattern symmetric1, symmetric2 and asymmetric.

Figure 8: The role of interpolation and sampling: simulated expression time series of 100 equally sampled data points (black line), and the effect of (spline) interpolation (green line), including the following data points of the original series: 1, 2, 3, 6, 9, 15, 25, 39, 63, 99.

Figure 9: Artefacts introduced in the reconstruction procedure (measure: *μ*_{I}, scoring scheme: *ID*) by interpolation of short, coarsely sampled time series. The left panel shows the corresponding *ROC* curves in the noise-free case for 10 points equally sampled in time, whereas the right panel presents the same results for 10 points, unequally sampled. The unequal sampling in time is the same as in Figure 8.

Figure 10: *ROC* curves for selected measures and algorithms obtained in the noise-free case, using unequally sampled data without interpolation. The sampling is the same as in the previous two figures, including the following data points of a simulated series of 100 points: 1, 2, 3, 6, 9, 15, 25, 39, 63, 99.

Figure 11: *ROC* curves obtained from the reconstruction of an *E. coli* network of 100 genes, a *S. cerevisiae* network of 100 genes and an *E. coli* network of 200 genes. (a)-(i) show the results using various similarity measures together with the *ID* scoring scheme: (a) Euclidean distance *μ*_{EC}, (b) Manhattan distance *μ*_{MA}, (c) *L*^{s} norm *μ*_{L}, (d) Kendall's rank correlation *μ*_{K}, (e) Pearson correlation *μ*_{P}, (f) conditional Pearson correlation, (g) mutual information of symbol vectors, (h) mean of symbol sequence similarity and the mutual information of symbol vectors, and (i) conditional mutual information. Moreover, the results using Kendall's rank correlation *μ*_{K} together with the (j) *MRNET*, (k) *CLR*, and (l) *ARACNE* scoring scheme are shown.

Figure 12: Summary statistics for the top-ranked measures/scoring schemes for increasing noise intensities (noise level 0.5). Similar approaches are grouped together. The first group in cyan refers to the different measures applied together with the *ID* scoring scheme. Green stands for the *CLR* scoring scheme, orange for *MRNET*, yellow for *ARACNE*, magenta for *AWE* and violet for *TS*. Furthermore, blue groups together all measures applied with a combination of scoring schemes.

Figure 13: Summary statistics ((a), (c) and (e) area under the *ROC* curve, as well as (b), (d) and (f) *YOUDEN* index) for the top-ranked measures/scoring schemes as a function of the noise intensity for varying lengths of the time series. The results in (a) and (b) are obtained from 8 time points, those in (c) and (d) from 10 time points, and those in (e) and (f) from 20 time points.

Figure 14: (a) Illustration of the network and its degree distribution for 100 genes in *E. coli*. Here and in the following figures *p*(*k*) is the frequency of nodes with total degree *k*, *p*_{in}(*k*) is the frequency of nodes with an in-degree *k*, and *p*_{out}(*k*) is the frequency of nodes with an out-degree *k*. Furthermore, the network and its degree distribution for (b) 100 genes in *S. cerevisiae*, and (c) 200 genes in *E. coli* ar

Format: PDF. Size: 3.2MB.
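The supplement captions repeatedly refer to *ROC* curves (false positive rate vs. true positive rate), the area under the *ROC* curve and the *YOUDEN* index. As a minimal sketch of how these summary statistics can be computed from a list of edge scores, the following Python example may be helpful. The function names (`roc_points`, `auc`, `youden_index`) and the toy scores are illustrative assumptions, not taken from the study; the sketch also assumes that a higher score indicates a more likely regulatory link and uses the standard definition of the Youden index as the maximum of *tpr* - *fpr*.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (fpr, tpr)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Return (fpr, tpr) pairs obtained by sweeping a threshold over the scores.

    scores: one score per candidate gene pair (assumed: higher = more likely a link).
    labels: 1 if the pair is a true regulatory link, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true link and one non-link")

    # Rank candidate pairs by decreasing score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once a whole group of tied scores has been counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def youden_index(points: Sequence[Point]) -> float:
    """Maximum of tpr - fpr over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


# Toy example with made-up scores (hypothetical, not data from the study).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(toy_scores, toy_labels)
print(curve)
print(auc(curve), youden_index(curve))
```

For distance-type measures, where small values indicate similarity, the scores would have to be negated (or otherwise inverted) before being passed to `roc_points`; this is a design choice of the sketch, not a statement about the procedure used in the paper.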