Dissecting systems-wide data using mixture models: application to identify affected cellular processes
1 Department of Toxicogenetics, Leiden University Medical Centre, P.O. Box 9503, 2300 RA Leiden, the Netherlands
2 Department of Oncology, Radiology and Clinical Immunology, Academic Hospital, 751 85 Uppsala, Sweden
3 Department of Medical Statistics, Leiden University Medical Centre, P.O. Box 9604, 2300 RA Leiden, the Netherlands
BMC Bioinformatics 2005, 6:177 doi:10.1186/1471-2105-6-177
Published: 14 July 2005
Functional analysis of data from genome-scale experiments, such as microarrays, requires a comprehensive selection of differentially expressed genes. Under many conditions, a considerable proportion of the genes is differentially expressed, so that choosing the selection criteria becomes a trade-off between including false positives and missing truly affected genes (false negatives).
We developed an analytical method to determine a p-value threshold for a microarray experiment that depends on the quality and design of the data set. To this end, populations of p-values are modeled as mathematical functions whose parameters are estimated in an unsupervised manner. The strength of the method is exemplified by its application to a published gene expression data set of sporadic and familial breast tumors with BRCA1 or BRCA2 mutations.
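The idea of modeling a population of p-values as a mixture and deriving a threshold from the fitted parameters can be illustrated with a minimal sketch. The abstract does not state which functional forms were used, so the choice below is an assumption made purely for illustration: a uniform component for unaffected genes plus a Beta(a, 1) component (0 < a < 1) for differentially expressed genes, fitted by expectation-maximization. The function names `fit_uniform_beta_mixture` and `posterior_threshold`, and the rule of cutting where the posterior probability of differential expression reaches 0.5, are likewise hypothetical and not taken from the paper.

```python
import numpy as np


def fit_uniform_beta_mixture(pvals, n_iter=1000, tol=1e-9):
    """EM fit of the density f(p) = pi0 + (1 - pi0) * a * p**(a - 1).

    pi0 is the proportion of null (uniform) p-values; a in (0, 1) is the
    shape of the assumed Beta(a, 1) component for affected genes.
    """
    # Clip to avoid log(0); work with log p throughout.
    log_p = np.log(np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0))
    pi0, a = 0.5, 0.5  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each p-value is non-null.
        alt = (1.0 - pi0) * a * np.exp((a - 1.0) * log_p)
        resp = alt / (pi0 + alt)
        # M-step: closed-form weighted maximum-likelihood updates.
        new_pi0 = float(np.clip(1.0 - resp.mean(), 1e-6, 1.0 - 1e-6))
        new_a = float(np.clip(-resp.sum() / np.sum(resp * log_p), 1e-3, 0.999))
        converged = abs(new_pi0 - pi0) < tol and abs(new_a - a) < tol
        pi0, a = new_pi0, new_a
        if converged:
            break
    return pi0, a


def posterior_threshold(pi0, a, posterior=0.5):
    """p-value at which P(non-null | p) equals `posterior` (capped at 1)."""
    odds = posterior / (1.0 - posterior)
    # Solve (1 - pi0) * a * p**(a - 1) = odds * pi0 for p.
    ratio = odds * pi0 / ((1.0 - pi0) * a)
    return float(min(1.0, ratio ** (1.0 / (a - 1.0))))


# Simulated example (not real data): 70% null genes, 30% affected genes.
rng = np.random.default_rng(0)
null_p = rng.uniform(size=14000)
alt_p = rng.uniform(size=6000) ** (1.0 / 0.2)  # inverse-CDF draw from Beta(0.2, 1)
pi0_hat, a_hat = fit_uniform_beta_mixture(np.concatenate([null_p, alt_p]))
threshold = posterior_threshold(pi0_hat, a_hat)
print(f"pi0={pi0_hat:.3f}  a={a_hat:.3f}  threshold={threshold:.4f}")
```

In this sketch the threshold follows from the fitted mixture rather than from a fixed convention such as p < 0.05, so it shifts with the estimated proportion of affected genes and with how strongly their p-values concentrate near zero.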
We present an objective and unsupervised way to set thresholds adapted to the quality and design of the experiment. The resulting mathematical description of the data sets of genome-scale experiments enables a probabilistic approach in systems biology.