An integrated hierarchical Bayesian approach to normalizing left-censored microRNA microarray data
1 Department of Biometrics Research, Merck Research Laboratories, Rahway, NJ 07065, USA
2 Department of Safety Assessment, Merck Research Laboratories, West Point, PA 19486, USA
3 Present Address: Discovery Informatics, Infinity Pharmaceuticals, 780 Memorial Drive, Cambridge, MA 02139, USA
BMC Genomics 2013, 14:507 doi:10.1186/1471-2164-14-507 Published: 26 July 2013
MicroRNAs (miRNAs) are small endogenous ssRNAs that regulate target gene expression post-transcriptionally through the RNAi pathway. Normalization, which aims to remove between-array systematic bias, is a critical pre-processing step for detecting differentially expressed miRNAs. Most normalization methods adopted for miRNA data are the same methods used to normalize mRNA data; however, miRNA data differ substantially from mRNA data, mainly because a possibly larger proportion of miRNA probes is differentially expressed and a much larger percentage of miRNA probes is left-censored below the detection limit (DL). Taking these unique characteristics of miRNA data into account, we present a hierarchical Bayesian approach that integrates normalization, missing data imputation, and feature selection in the same model.
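To make the notion of left-censoring concrete, the following is a minimal sketch of how a probe below the detection limit can enter a likelihood. It is not the hierarchical model described here: it assumes a single normal distribution for log-intensities, and the function names, the use of `None` to mark a censored probe, and the parameters `mu` and `sigma` are all hypothetical choices made for illustration. An observed value contributes its normal density, while a censored value contributes only the probability of falling at or below the DL.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and s.d. sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(intensities, detection_limit, mu, sigma):
    """Log-likelihood of log-intensities under left-censoring at the DL.

    A value of None (assumed convention) or any value at or below the
    detection limit is treated as censored: all we know is X <= DL.
    """
    total = 0.0
    for y in intensities:
        if y is None or y <= detection_limit:
            # Censored probe: contributes the probability mass below the DL.
            total += math.log(normal_cdf(detection_limit, mu, sigma))
        else:
            # Observed probe: contributes the usual density term.
            total += normal_logpdf(y, mu, sigma)
    return total


# Illustrative values only: three observed probes and two censored ones.
example = [7.2, 6.5, None, 8.1, None]
loglik = censored_loglik(example, detection_limit=5.0, mu=6.0, sigma=1.5)
```

Substituting the DL (or zero) for censored probes and treating them as observed would replace the cumulative-probability term with a density term, which is one way in which naive handling of censored probes can distort downstream estimates.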
Results from both simulated and real data suggest that the Bayesian method outperforms other widely used normalization methods in detecting truly differentially expressed miRNAs. In addition, our findings demonstrate the necessity of miRNA data normalization and the robustness of our Bayesian approach to violations of the standard assumptions adopted in mRNA normalization methods.
Our study indicates that normalization procedures can have a profound impact on the detection of truly differentially expressed miRNAs. Although the proposed Bayesian method was formulated to handle normalization issues in miRNA data, we expect that biomarker discovery with other high-dimensional profiling techniques in which a significant proportion of data points is left-censored (e.g., proteomics) might also benefit from this approach.