Curated disease-chemical enrichment versus prediction lists for the prostate, lung, and breast cancer datasets. For each prediction list, we selected chemicals that ranked within α = 10⁻⁴, 10⁻³, 10⁻², and 0.05. The x-axis of each panel shows the -log10(threshold), with the total number of chemicals found at that threshold in parentheses. We tested whether the highly ranked chemicals found under each threshold were enriched for chemicals with a known curated association with the cancer in question. The y-axis shows the -log10(p-value) for this enrichment. The solid round red marker represents the enrichment test for the actual disease on which the predictions were based; the number underneath gives the total number of chemicals in the prediction list that had a curated association with the disease and the percentage this represents of all curated relations for that disease. We estimated accuracy and precision by computing disease-chemical enrichment for all other diseases; false positives are offset in black and true negatives are in yellow. The false positive rate is bracketed and in italics. Examples of false positives are annotated in blue italics, along with the number of chemicals in the prediction list corresponding to that disease and the percentage found among all curated relations for that disease. We computed this validation enrichment for (A) prostate cancer, (B) lung cancer from nonsmokers, and (C) non-tumorigenic breast cancers.
Patel and Butte BMC Medical Genomics 2010 3:17 doi:10.1186/1755-8794-3-17
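The enrichment calculation described in the caption can be illustrated with a minimal sketch. The caption does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, as are the function names (`hypergeom_sf`, `enrichment_score`, `false_positive_rate`), the toy chemical identifiers, and the 0.05 cutoff used to call another disease a false positive.

```python
from math import comb, log10


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrichment_score(predicted, curated, universe):
    """Test whether predicted chemicals are enriched for curated ones.

    Assumption: a one-sided hypergeometric test over a shared chemical
    universe; the source does not state which test was used.
    Returns (overlap count, p-value, -log10 p-value).
    """
    universe = set(universe)
    predicted = set(predicted) & universe
    curated = set(curated) & universe
    overlap = len(predicted & curated)
    p = hypergeom_sf(overlap, len(universe), len(curated), len(predicted))
    return overlap, p, -log10(p)


def false_positive_rate(predicted, other_diseases, universe, alpha=0.05):
    """Fraction of other diseases that also appear significantly enriched.

    Assumption: a disease counts as a false positive when its enrichment
    p-value falls below alpha; this cutoff is illustrative only.
    """
    hits = sum(
        1
        for chems in other_diseases.values()
        if enrichment_score(predicted, chems, universe)[1] < alpha
    )
    return hits / len(other_diseases)


# Toy data (hypothetical identifiers, not taken from the study).
universe = [f"c{i}" for i in range(20)]
curated = {"c0", "c1", "c2", "c3", "c4"}      # curated for the target disease
predicted = {"c0", "c1", "c2", "c10"}         # chemicals passing a threshold
other_diseases = {
    "d_unrelated": {"c15", "c16", "c17"},
    "d_overlap": {"c0", "c1", "c2", "c10"},
}

overlap, p_value, neg_log_p = enrichment_score(predicted, curated, universe)
fpr = false_positive_rate(predicted, other_diseases, universe)
print(overlap, p_value, neg_log_p, fpr)
```

In this sketch, the `overlap` count corresponds to the number printed under the red marker, `neg_log_p` to the y-axis value, and `fpr` to the bracketed false positive rate. Repeating the call once per threshold would give one point per x-axis position.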