Additional file 1.

Table S1 - Imputation. The whole-genome expression data (wild type and mutant) contain 20.4% missing values. Clustering and network inference require complete observations, so we imputed these missing values following an approach similar to that of Albrecht et al. [52]. First, we removed 1253 genes (rows) that had 100% missing values (genes not spotted on the chip). We then tested the following imputation methods: K-nearest-neighbour imputation from the R package 'impute' [53]; and, from the R package 'pcaMethods' [38], probabilistic Principal Component Analysis (PPCA), Bayesian PCA (BPCA), Singular Value Decomposition imputation (SVDimpute), PCA by non-linear iterative partial least squares (NIPALS), neural-network-based non-linear PCA (NLPCA), and Local Least Squares (LLS) imputation. The wild-type and mutant data were concatenated and imputed jointly, since more data improves imputation. For testing, we extracted the largest sub-matrix consisting of complete observations (5566 genes with no missing values). We then constructed a test matrix by randomly introducing artificial missing values into this sub-matrix, keeping the distribution of missing values within each column the same as in the original matrix. We applied the different imputation methods to the test data and compared the imputed values with the original data in terms of the root mean square error (RMSE). In the first step, we ran each method separately over a range of parameter settings to identify its optimal parameter values. In the second step, we applied each method with its optimal parameter settings to 500 random test matrices. Finally, we compared the methods using their mean RMSE values. The table summarizes the results of both steps.
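The evaluation scheme above (mask a complete sub-matrix with per-column missing rates copied from the original data, impute, score by RMSE, tune parameters, then average over many random masks) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline. The KNN imputer is a simplified stand-in for the R 'impute' package, and a column-mean baseline replaces the pcaMethods methods. RMSE is computed over the artificially masked entries only. The function names, toy data, missing rates, candidate k values and repetition counts are all hypothetical.

```python
import numpy as np


def mask_like_original(complete, col_missing_rate, rng):
    """Copy `complete` and set a fraction of each column to NaN.

    Column j receives round(col_missing_rate[j] * n_rows) missing values,
    reproducing the per-column missing-value distribution of the original.
    """
    test = complete.astype(float).copy()
    n_rows = complete.shape[0]
    for j, rate in enumerate(col_missing_rate):
        rows = rng.choice(n_rows, size=int(round(rate * n_rows)), replace=False)
        test[rows, j] = np.nan
    return test


def mean_impute(data):
    """Hypothetical baseline: replace each NaN by its column mean."""
    out = data.copy()
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = np.nanmean(data, axis=0)[cols]
    return out


def knn_impute(data, k=5):
    """Simplified row-wise KNN imputation (assumed stand-in, not R 'impute').

    Distance = mean squared difference over columns observed in both rows;
    entry (i, j) = mean of column j over the k nearest rows observing j.
    """
    out = data.copy()
    observed = ~np.isnan(data)
    filled = np.where(observed, data, 0.0)
    for i in np.where(~observed.all(axis=1))[0]:
        shared = observed & observed[i]          # columns observed in both rows
        n_shared = shared.sum(axis=1)
        sq_diff = ((filled - filled[i]) ** 2 * shared).sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = sq_diff / n_shared
        dist[n_shared == 0] = np.inf             # no overlap: not a neighbour
        dist[i] = np.inf                         # a row is not its own neighbour
        for j in np.where(~observed[i])[0]:
            cand = np.where(observed[:, j] & np.isfinite(dist))[0]
            if cand.size == 0:
                out[i, j] = np.nanmean(data[:, j])   # fall back to column mean
                continue
            nearest = cand[np.argsort(dist[cand])[:k]]
            out[i, j] = data[nearest, j].mean()
    return out


def rmse(truth, imputed, mask):
    """Root mean square error over the artificially masked entries only."""
    return float(np.sqrt(np.mean((truth[mask] - imputed[mask]) ** 2)))


def mean_rmse(complete, col_missing_rate, impute, n_rep, seed=0):
    """Average RMSE of `impute` over `n_rep` random test matrices."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rep):
        test = mask_like_original(complete, col_missing_rate, rng)
        scores.append(rmse(complete, impute(test), np.isnan(test)))
    return float(np.mean(scores))


if __name__ == "__main__":
    # Hypothetical toy data standing in for the complete sub-matrix.
    rng = np.random.default_rng(1)
    complete = rng.normal(size=(150, 1)) @ rng.normal(size=(1, 10))
    complete += 0.1 * rng.normal(size=complete.shape)
    col_rate = np.full(complete.shape[1], 0.2)   # assumed per-column rates

    # Step 1: choose k on a few random test matrices.
    best_k = min((1, 5, 10, 20), key=lambda k: mean_rmse(
        complete, col_rate, lambda d: knn_impute(d, k), n_rep=5))
    # Step 2: compare methods on more test matrices (the text used 500).
    methods = [("mean", mean_impute),
               (f"knn k={best_k}", lambda d: knn_impute(d, best_k))]
    for name, impute in methods:
        print(name, round(mean_rmse(complete, col_rate, impute, n_rep=20, seed=2), 3))
```

Because `mean_rmse` reseeds its generator for every call, each method (and each candidate k) is scored on the same sequence of masked test matrices, so the mean RMSE values are directly comparable, in the spirit of the method comparison described above.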

Format: XLS Size: 8KB


Linde et al. BMC Systems Biology 2012 6:6   doi:10.1186/1752-0509-6-6