Open Access Methodology article

Two-part permutation tests for DNA methylation and microarray data

Markus Neuhäuser*, Tanja Boes and Karl-Heinz Jöckel

Author Affiliations

Institute for Medical Informatics, Biometry and Epidemiology, University of Duisburg-Essen, Hufelandstr. 55, D-45122 Essen, Germany

BMC Bioinformatics 2005, 6:35  doi:10.1186/1471-2105-6-35

Published: 22 February 2005

Abstract

Background

One important application of microarray experiments is to identify differentially expressed genes. Often, small and negative expression levels are clipped off and set equal to an arbitrarily chosen cutoff value before a statistical test is carried out. This yields two types of data: truncated values and original observations. Because the truncated values are not just another point on the continuum of possible values, it is appropriate to combine two statistical tests in a two-part model rather than to use standard statistical methods. A similar situation arises when DNA methylation data are investigated: there are null values (undetectable methylation) and observed positive values. For such data, we propose a two-part permutation test.
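For orientation, a two-part test statistic is commonly built by adding a squared test for the proportion of non-null values to a squared test on the positive values only. The display below sketches that form under stated assumptions: the symbols are introduced here for illustration, with m_i the sample size of group i, k_i its number of values above the null or cutoff level, Z_W the standardized Wilcoxon rank-sum statistic computed on those positive values (a t statistic is another common choice), and Z_B set to 0 when its variance is zero. The asymptotic p-value refers X^2 to a chi-square distribution with 2 degrees of freedom, whose upper tail has the closed form shown.

```latex
\begin{aligned}
X^2 &= Z_B^2 + Z_W^2, \\
Z_B &= \frac{\hat p_1 - \hat p_2}{\sqrt{\bar p\,(1-\bar p)\,(1/m_1 + 1/m_2)}},
\quad \hat p_i = \frac{k_i}{m_i}, \quad \bar p = \frac{k_1 + k_2}{m_1 + m_2}, \\
p_{\text{asym}} &= P\left(\chi^2_2 \ge X^2\right) = \exp\left(-X^2/2\right).
\end{aligned}
```

Under this sketch, data without null values give \hat p_1 = \hat p_2 = 1 and hence Z_B = 0, yet Z_W^2 would still be referred to \chi^2_2 rather than \chi^2_1.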

Results

The proposed permutation test leads to smaller p-values than the original two-part test; we found this for both DNA methylation data and microarray data. A simulation study confirmed this result and showed that the two-part permutation test is, on average, more powerful. When there are no null or truncated values, the new test also reduces to a standard test without any loss of power.

Conclusion

The two-part permutation test can be used in routine analyses because it reduces to a standard test when there are positive values only. Further advantages of the new test are that it allows other test statistics to be used to construct the two-part test and that it does not rely on any asymptotic distribution. The latter advantage is particularly important for the analysis of microarrays, since sample sizes are usually small.
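The idea of replacing the asymptotic reference distribution with a permutation distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the statistic is the squared two-proportion z statistic plus the squared tie-corrected Wilcoxon z statistic on the values above a cutoff, and that the p-value is approximated by random permutations (Monte Carlo) rather than by complete enumeration. All function names and the `cutoff`, `n_perm` and `seed` parameters are hypothetical. Values at or below `cutoff` are treated as null or truncated: use 0 for methylation data, or the clipping value for microarray data.

```python
import numpy as np


def _rank_sum_z(a, b):
    """Normal-approximation z of the Wilcoxon rank-sum test (midranks, tie-corrected)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        return 0.0  # no positive values in one group: rank part contributes nothing
    pooled = np.concatenate([a, b])
    n = n1 + n2
    _, inv, counts = np.unique(pooled, return_inverse=True, return_counts=True)
    midranks = np.cumsum(counts) - (counts - 1) / 2.0  # average rank per distinct value
    w = midranks[inv][:n1].sum()  # rank sum of the first group
    mean = n1 * (n + 1) / 2.0
    ties = float(np.sum(counts**3 - counts))
    var = n1 * n2 / 12.0 * ((n + 1) - ties / (n * (n - 1)))
    return 0.0 if var <= 0 else (w - mean) / np.sqrt(var)


def _proportion_z(k1, m1, k2, m2):
    """Two-sample z for the difference in proportions of values above the cutoff."""
    p = (k1 + k2) / (m1 + m2)
    var = p * (1 - p) * (1 / m1 + 1 / m2)
    return 0.0 if var <= 0 else (k1 / m1 - k2 / m2) / np.sqrt(var)


def two_part_statistic(x, y, cutoff=0.0):
    """Hypothetical two-part statistic Z_B**2 + Z_W**2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xp, yp = x[x > cutoff], y[y > cutoff]
    zb = _proportion_z(len(xp), len(x), len(yp), len(y))
    zw = _rank_sum_z(xp, yp)
    return zb**2 + zw**2


def two_part_permutation_test(x, y, cutoff=0.0, n_perm=9999, seed=0):
    """Monte Carlo permutation p-value: reshuffle group labels, recompute the statistic."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    pooled = np.concatenate([x, y])
    obs = two_part_statistic(x, y, cutoff)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        # small tolerance so permutations equivalent to the observed split always count
        if two_part_statistic(perm[:len(x)], perm[len(x):], cutoff) >= obs - 1e-9:
            hits += 1
    return obs, (hits + 1) / (n_perm + 1)


# Hypothetical example data; 0.0 marks undetectable methylation.
x = [0.0, 0.0, 1.3, 2.1, 0.0, 3.4]
y = [0.8, 0.0, 0.5, 0.0, 0.0, 0.0]
stat, p_perm = two_part_permutation_test(x, y, n_perm=999)
p_asym = np.exp(-stat / 2)  # upper tail of chi-square with 2 degrees of freedom
```

In this sketch, if no value in either group is at or below the cutoff, Z_B is 0 for every permutation. The procedure then becomes a permutation test of Z_W squared alone, i.e., a two-sided Wilcoxon-type permutation test, which mirrors the reduction property described above.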