(Both) Statistics of the sign-inference process on the regulatory network of E. coli from complete expression profiles. The signed interaction graph is used to generate sets of N random artificial expression profiles that cover the whole network. Each set of N profiles is then used with the unsigned interaction graph to recover regulatory roles. X-axis: number N of expression profiles in the dataset. Y-axis: percentage of recovered signs in the unsigned interaction graph. Each set of N random profiles was generated 100 times; the distribution of the recovered signs is plotted as a boxplot. The continuous line corresponds to the theoretical formula Y = M1 + M2 (1 - (1 - p)^X), where X is the number of expression profiles; M1 denotes the number of single incoming regulations inferred with probability one from any complete profile (using the naive inference algorithm), and M2 denotes the number of signs inferred with a probability p (0 < p < 1) per experiment. (Left) Statistics using the whole E. coli regulatory network. We estimated that at most 37.3% of the network can be inferred from a small number of different complete profiles. Among the inferred regulations, we estimated that M1 = 609 signs are inferred with probability one from any complete expression profile. The remaining M2 = 811 signs are inferred with a probability whose average is p = 0.049 per experiment. Hence, 30 perturbation experiments are enough to infer 33% of the network. (Right) Statistics using only the core of the former graph (see the definition of a core in the text). We estimated M1 = 18 and M2 = 9, implying that the maximum rate of inference is 47.4%. Since p = 0.0011, the number of expression profiles required to obtain a given percentage of inference is greater than when using the whole network (N = 100 to infer 33% of the network).
Veber et al. BMC Bioinformatics 2008 9:228 doi:10.1186/1471-2105-9-228
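The theoretical curve in the caption can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the total number of edges in each graph is not stated in the caption, so it is back-calculated here from the quoted maximum inference rates (37.3% and 47.4%), which is an assumption.

```python
def expected_inferred_signs(n_profiles, m1, m2, p):
    """Expected number of inferred signs after n_profiles experiments.

    Implements Y = M1 + M2 * (1 - (1 - p)^X) from the caption:
    m1 signs are always inferred, and each of the m2 remaining signs
    is missed by one experiment with probability (1 - p).
    """
    return m1 + m2 * (1.0 - (1.0 - p) ** n_profiles)


def inferred_fraction(n_profiles, m1, m2, p, max_rate):
    """Fraction of the network inferred after n_profiles experiments.

    ASSUMPTION: the total edge count is not given in the caption, so it
    is derived from the quoted maximum rate as (m1 + m2) / max_rate.
    """
    total_edges = (m1 + m2) / max_rate
    return expected_inferred_signs(n_profiles, m1, m2, p) / total_edges


# Values quoted in the caption.
whole = dict(m1=609, m2=811, p=0.049, max_rate=0.373)
core = dict(m1=18, m2=9, p=0.0011, max_rate=0.474)

whole_at_30 = inferred_fraction(30, **whole)
core_at_100 = inferred_fraction(100, **core)

print(f"whole network, N = 30:  {whole_at_30:.1%}")
print(f"core graph,    N = 100: {core_at_100:.1%}")
```

Under these assumptions both fractions come out close to the 33% quoted in the caption, and the curve saturates at M1 + M2 as N grows.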