This article is part of the supplement: Probabilistic Modeling and Machine Learning in Structural and Systems Biology.

Robust imputation method for missing values in microarray data

1 Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
2 Department of Statistics, College of Natural Science, Seoul National University, San 56-1, Shin Lim-Dong, Kwanak-ku, Seoul, 151-742, Korea
BMC Bioinformatics 2007, 8(Suppl 2):S6 doi:10.1186/1471-2105-8-S2-S6
Abstract

Background
Missing values are often encountered when analyzing microarray gene expression data. Most multivariate statistical methods proposed for microarray data analysis cannot be applied when the data have missing values, and numerous imputation algorithms have been proposed to estimate them. In this study, we develop a robust least squares estimation with principal components (RLSP) method by extending the local least squares imputation (LLSimpute) method. The basic idea of our method is to employ quantile regression to estimate the missing values, using the estimated principal components of a selected set of similar genes.

Results
The performance of the proposed method was evaluated using the normalized root mean square error and compared with previously proposed imputation methods. The proposed RLSP method clearly outperformed the weighted k-nearest neighbors imputation (kNNimpute) method and the LLSimpute method, and gave results competitive with the Bayesian principal component analysis (BPCA) method.

Conclusion
Adapting the principal components of the selected genes and employing the quantile regression model improved the robustness and accuracy of missing value imputation. Thus, according to our empirical studies, the proposed RLSP method is more robust and accurate than the widely used kNNimpute and LLSimpute methods.
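The idea outlined in the abstract (select similar genes, take their principal components, then fit a quantile regression) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: neighbors are chosen by Euclidean distance among genes with no missing values, the quantile is fixed at 0.5 (median regression), and the median fit is approximated by iteratively reweighted least squares. The function names `median_regression` and `impute_gene` and the parameters `k` and `n_pc` are hypothetical.

```python
import numpy as np

def median_regression(X, y, n_iter=100, eps=1e-8):
    """Approximate 0.5-quantile (least absolute deviation) fit via IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        resid = np.abs(y - X @ beta)
        # Square root of the IRLS weights 1/|r|, floored to avoid division by zero
        sw = 1.0 / np.sqrt(np.maximum(resid, eps))
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

def impute_gene(data, target, k=10, n_pc=2):
    """Fill the NaN entries of one gene (row) of a genes x samples matrix."""
    row = data[target]
    miss = np.isnan(row)
    obs = ~miss
    # Assumption: candidate neighbors are genes without any missing value
    complete = np.where(~np.isnan(data).any(axis=1))[0]
    complete = complete[complete != target]
    # Distance to the target gene, using only its observed samples
    dist = np.linalg.norm(data[complete][:, obs] - row[obs], axis=1)
    nbrs = data[complete[np.argsort(dist)[:k]]]
    # Principal component scores of the selected genes (one score row per sample)
    Xc = nbrs.T - nbrs.T.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_pc].T
    design = np.column_stack([np.ones(len(row)), scores])
    # Fit on observed samples, predict the missing ones
    beta = median_regression(design[obs], row[obs])
    filled = row.copy()
    filled[miss] = design[miss] @ beta
    return filled

# Toy data: 30 genes driven by 2 latent factors across 12 samples (noise-free)
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 12))
data = truth.copy()
data[0, [3, 7]] = np.nan  # hide two values of gene 0
filled = impute_gene(data, target=0)
```

In this noise-free toy case every gene is an exact combination of the two latent factors, so the hidden values should be recovered almost exactly. With real data, the median fit is intended to limit the influence of outlying expression values compared with an ordinary least squares fit.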









