Proportion statistics to detect differentially expressed genes: a comparison with log-ratio statistics
1 Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, 55455, USA
2 Cardiac Rhythm Disease Management, Medtronic, Mounds View, MN, 55112, USA
3 Department of Mathematics and Computer Science, Biola University, La Mirada, CA 90639, USA
BMC Bioinformatics 2011, 12:228 doi:10.1186/1471-2105-12-228
Published: 7 June 2011
In genetic transcription research, gene expression is typically reported in a test sample relative to a reference sample. Laboratory assays that measure gene expression levels, from Q-RT-PCR to microarrays to RNA-Seq experiments, compare two samples with respect to the same genetic sequence of interest. Standard practice is to use the log2-ratio as the measure of relative expression. This measurement has drawbacks, including ratios that become unstable when the denominator is small. This paper suggests an alternative estimate based on a proportion that is just as simple to calculate and just as intuitive, with the added benefit of greater numerical stability.
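The contrast between the two measures can be illustrated with a minimal sketch. The text above does not state the proportion formula, so the form used here, test / (test + reference), is an assumption, and the function names `log2_ratio` and `proportion` and the expression values are hypothetical and chosen only for illustration.

```python
import math


def log2_ratio(test, reference):
    """Log2-ratio of test to reference expression.

    Returns NaN when either value is not positive, since the
    ratio (or its logarithm) is undefined in that case.
    """
    if test <= 0 or reference <= 0:
        return math.nan
    return math.log2(test / reference)


def proportion(test, reference):
    """Assumed proportion form: test / (test + reference).

    Bounded between 0 and 1; undefined only when both values are 0.
    """
    total = test + reference
    if total <= 0:
        return math.nan
    return test / total


# Hypothetical (test, reference) expression values, including
# small and zero reference values where the ratio is unstable.
pairs = [(100.0, 100.0), (200.0, 100.0), (50.0, 2.0), (50.0, 1.0), (50.0, 0.0)]

for t, r in pairs:
    print(f"test={t:6.1f} ref={r:6.1f} "
          f"log2-ratio={log2_ratio(t, r):7.3f} proportion={proportion(t, r):.3f}")
```

Under this assumed form, moving the reference value from 2 to 1 shifts the log2-ratio by a full unit while the proportion changes only in the second decimal place, and a reference value of 0 leaves the log2-ratio undefined while the proportion is still 1.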
An analysis of two groups of mice measured with 16 cDNA microarrays gave similar results for the previously used methods and our proposed methods. In a study of liver and kidney samples measured with RNA-Seq, we found that proportion statistics could detect additional differentially expressed genes that are usually classified as missing by ratio statistics. Additionally, simulations demonstrated that one of our proposed proportion-based test statistics was robust to deviations from distributional assumptions, whereas none of the other methods examined were.
To measure relative expression between two samples, the proportion estimates that we propose yield equivalent results to the log2-ratio under most circumstances and better results than the log2-ratio when expression values are close to zero.