Handling multiple testing while interpreting microarrays with the Gene Ontology Database
1 Department of Biological Sciences, Rochester Institute of Technology, 85 Lomb Memorial Drive, Rochester, NY 14623, USA
2 Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA
3 Department of Genetics, Yale University, New Haven, CT 06520, USA
4 Yale Center for Medical Informatics, 300 George St. Suite 501, New Haven, CT 06511, USA
BMC Bioinformatics 2004, 5:124. doi:10.1186/1471-2105-5-124. Published: 6 September 2004
Multiple research groups are developing software tools that analyze microarray data in the context of genetic knowledgebases, each using different methods. A common problem for many of these tools is how to correct for multiple statistical testing: simple corrections are overly conservative, and more sophisticated corrections are currently impractical. A careful study of the distribution one would expect by chance, for example through simulation, may guide the development of an appropriate correction that is not computationally prohibitive.
We present the results of a preliminary study of the distribution one would expect by chance when analyzing sets of genes extracted from the Drosophila, S. cerevisiae, Wormbase, and Gramene databases using the Gene Ontology Database.
We found that the estimated distribution is irregular and cannot be predicted outside of a particular set of genes. Permutation-based simulations may therefore be necessary to determine the confidence in the results of such analyses.
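To make the idea of a permutation-based simulation concrete, the following Python sketch estimates the chance distribution of the smallest per-term p-value for random gene sets of a fixed size, and uses it to adjust an observed p-value. It is only an illustration under stated assumptions, not the procedure used in this study: the one-sided hypergeometric enrichment test, the function names, and the `annotations` mapping (term label to set of annotated genes) are all hypothetical choices made for the example.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def min_p_value(gene_set, annotations, n_genes):
    """Smallest enrichment p-value over all terms for one gene set."""
    best = 1.0
    for members in annotations.values():
        k = len(gene_set & members)
        if k == 0:
            continue  # no overlap, so the p-value is 1 and cannot be the minimum
        best = min(best, hypergeom_sf(k, n_genes, len(members), len(gene_set)))
    return best


def permutation_null(annotations, all_genes, set_size, n_perm, seed=0):
    """Minimum p-values obtained from n_perm randomly drawn gene sets."""
    rng = random.Random(seed)
    genes = sorted(all_genes)  # fixed order keeps the sampling reproducible
    return [
        min_p_value(set(rng.sample(genes, set_size)), annotations, len(genes))
        for _ in range(n_perm)
    ]


def adjusted_p(observed_min_p, null):
    """Fraction of random sets whose best p-value is at least as extreme."""
    hits = sum(p <= observed_min_p for p in null)
    return (1 + hits) / (1 + len(null))


if __name__ == "__main__":
    # Toy data: 30 made-up genes and three made-up terms (assumptions for illustration).
    all_genes = {f"g{i}" for i in range(30)}
    annotations = {
        "TERM_A": {f"g{i}" for i in range(0, 6)},
        "TERM_B": {f"g{i}" for i in range(4, 15)},
        "TERM_C": {f"g{i}" for i in range(20, 23)},
    }
    null = permutation_null(annotations, all_genes, set_size=5, n_perm=200)
    observed = min_p_value({"g0", "g1", "g2", "g3", "g25"}, annotations, len(all_genes))
    print(adjusted_p(observed, null))
```

Because the null distribution is rebuilt from the annotation structure of each gene universe and set size, this kind of adjustment does not rely on the distribution being regular or transferable between gene sets, at the cost of repeating the enrichment test once per permutation.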