Open Access Research article

Missing value imputation for microarray gene expression data using histone acetylation information

Qian Xiang, Xianhua Dai, Yangyang Deng, Caisheng He, Jiang Wang, Jihua Feng and Zhiming Dai

Department of Electronics & Communications Engineering, School of Information Science and Technology, Sun Yat-Sen University, 135 West Xin'gang Road, Guangzhou, PR China

BMC Bioinformatics 2008, 9:252 doi:10.1186/1471-2105-9-252

Published: 29 May 2008

Abstract

Background

Accurately estimating missing values in microarray data is an important pre-processing step, because many expression profile analyses in bioinformatics require complete datasets. Although several methods have been suggested, their performance is not satisfactory for datasets with high missing percentages.

Results

This paper explores the feasibility of performing missing value imputation with the help of gene regulatory mechanisms. We present an imputation framework called the histone acetylation information aided imputation method (HAIimpute). It incorporates histone acetylation information into the conventional KNN (k-nearest neighbor) and LLS (local least squares) imputation algorithms for the final prediction of the missing values. The experimental results indicate that using acetylation information provides significant improvements in microarray imputation accuracy. The HAIimpute methods consistently improve on widely used methods such as KNN and LLS in terms of normalized root mean squared error (NRMSE). In addition, the genes imputed by the HAIimpute methods are more highly correlated with the original complete genes in terms of Pearson correlation coefficients. Furthermore, the proposed methods also outperform GOimpute, an existing related method that uses functional similarity as external information.
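As a rough illustration of the baseline and the accuracy metric named above, the following Python sketch shows a plain KNN imputation and an NRMSE calculation. It is not the HAIimpute method, whose details are not given in this abstract. The function names `knn_impute` and `nrmse` are hypothetical, the restriction of neighbors to genes with no missing values is a simplifying assumption, and the NRMSE formula used here (RMSE over the masked entries divided by the standard deviation of the true values) is one commonly used definition that may differ from the one in the paper.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in a genes x samples matrix with a distance-weighted KNN average.

    Simplifying assumption: only genes with no missing values serve as neighbors.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    candidates = np.where(~missing.any(axis=1))[0]
    if candidates.size == 0:
        raise ValueError("need at least one gene without missing values")
    for g in np.where(missing.any(axis=1))[0]:
        obs = ~missing[g]
        # Distance to each candidate over the samples observed in the target gene.
        d = np.sqrt(((X[candidates][:, obs] - X[g, obs]) ** 2).mean(axis=1))
        order = np.argsort(d)[:k]
        nearest = candidates[order]
        # Inverse-distance weights; the small constant avoids division by zero.
        w = 1.0 / (d[order] + 1e-12)
        filled[g, ~obs] = (w[:, None] * X[nearest][:, ~obs]).sum(axis=0) / w.sum()
    return filled

def nrmse(imputed, truth, mask):
    """RMSE over the masked entries, normalized by the std of the true values there."""
    diff = imputed[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.std(truth[mask])

# Toy example: hide two known entries, impute them, and score the result.
truth = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.0, 2.0, 3.0, 4.0],
                  [10.0, 20.0, 30.0, 40.0],
                  [1.0, 2.0, 3.0, 4.0]])
mask = np.zeros_like(truth, dtype=bool)
mask[0, 0] = mask[0, 3] = True
observed = truth.copy()
observed[mask] = np.nan
filled = knn_impute(observed, k=2)
score = nrmse(filled, truth, mask)
```

A per-gene Pearson correlation between an imputed row and its original can be computed in the same setting with `np.corrcoef(filled[g], truth[g])[0, 1]`.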

Conclusion

We demonstrated that using histone acetylation information can greatly improve imputation performance, especially at high missing percentages. This idea can be generalized to various imputation methods to improve their performance. Moreover, as more knowledge accumulates on gene regulatory mechanisms beyond histone acetylation, the performance of our approach can be further improved and verified.

