Software
CTen: a web-based platform for identifying enriched cell types from heterogeneous microarray data
1 JST ERATO KAWAOKA Infection-induced Host Responses Project, Tokyo, Japan
2 The Systems Biology Institute, Tokyo, Japan
3 Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, Wisconsin, USA
4 Institute of Medical Science, Division of Virology, Department of Microbiology and Immunology, University of Tokyo, Tokyo, Japan
5 Sony Computer Science Laboratories, Inc, Tokyo, Japan
6 Open Biology Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
BMC Genomics 2012, 13:460 doi:10.1186/1471-2164-13-460
Published: 6 September 2012
Additional files
Additional file 1:
A list of the cell types currently available in CTen.
Format: XLS Size: 24KB
Additional file 2:
The enrichment performance of the mouse HECS database for select HECS criteria and enrichment scores. We evaluated (1) whether the precise cutoff for defining a HECS gene affects the enrichment performance and (2) for each cutoff, which values of the enrichment score best minimize the false positive rate (FPR) without impacting the true positive rate (TPR). We reconstructed the HECS database by defining the HECS assignment threshold as (A) 5, (B) 10, (C) 15, and (D) 20 times the median. Then, from the Mouse MOE430 Gene Atlas dataset, we took the top 10% of the most highly expressed genes for each cell type. From this 10%, we randomly sampled between 500 and 4000 genes 3 times to create 288 gene lists. Using the same procedures described in the CTen implementation, these lists were analyzed for cell type enrichment against each HECS database constructed. The ROC curve illustrates that the sensitivity (TPR) and the FPR are not greatly affected by the HECS assignment threshold selected. Furthermore, on each figure, we show the performance expected for selected values of the enrichment score. Selecting enrichment scores of 2 or higher results in a reasonably low FPR, but the FPR can be improved significantly by demanding enrichment scores as high as ~25 before the TPR is affected.
Format: PDF Size: 44KB
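As an illustration of the procedure described for this file, the following Python sketch assigns HECS genes at a chosen multiple of the median and draws random gene lists from the most highly expressed genes of a cell type. It is not the CTen implementation: the function names, the toy expression values, the strict "greater than" comparison, and the choice to take the median across cell types for each gene are assumptions made for illustration only.

```python
import random
import statistics


def build_hecs(expression, fold):
    """Map each cell type to the genes it expresses above fold x median.

    expression: {gene: {cell_type: value}}.
    Assumption: the median is taken per gene, across all cell types.
    """
    hecs = {}
    for gene, profile in expression.items():
        cutoff = fold * statistics.median(profile.values())
        for cell_type, value in profile.items():
            if value > cutoff:
                hecs.setdefault(cell_type, set()).add(gene)
    return hecs


def sample_gene_lists(expression, cell_type, sizes, repeats, rng, top_fraction=0.10):
    """Draw random gene lists from the most highly expressed genes of a cell type."""
    ranked = sorted(expression, key=lambda g: expression[g][cell_type], reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_fraction))]
    lists = []
    for size in sizes:
        for _ in range(repeats):
            # Cap the list size at the number of genes available.
            lists.append(rng.sample(top, min(size, len(top))))
    return lists


# Hypothetical toy expression matrix (not data from the Gene Atlas).
expression = {
    "g1": {"T cell": 100, "B cell": 5, "macrophage": 4},
    "g2": {"T cell": 3, "B cell": 90, "macrophage": 2},
    "g3": {"T cell": 10, "B cell": 12, "macrophage": 11},
    "g4": {"T cell": 2, "B cell": 2, "macrophage": 80},
}

for fold in (5, 10, 15, 20):
    print(fold, build_hecs(expression, fold))

# The toy sizes and fraction are scaled down; the text uses 500 to 4000 genes
# sampled 3 times from the top 10% of genes.
toy_lists = sample_gene_lists(
    expression, "T cell", sizes=[1, 2], repeats=3, rng=random.Random(0), top_fraction=0.5
)
print(len(toy_lists))
```

Rebuilding the database once per threshold, as in the loop above, keeps the sampled gene lists fixed so that only the HECS definition varies between runs.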
Additional file 3:
The enrichment performance of the human HECS database for select HECS criteria and enrichment scores. We evaluated (1) whether the precise cutoff for defining a HECS gene affects the enrichment performance and (2) for each cutoff, which values of the enrichment score best minimize the false positive rate (FPR) without impacting the true positive rate (TPR). We reconstructed the HECS database by defining the HECS assignment threshold as (A) 5, (B) 10, and (C) 15 times the median. Then, from the Human U133A/GNF1H Gene Atlas dataset, we took the top 10% of the most highly expressed genes for each cell type. From this 10%, we randomly sampled between 500 and 4000 genes 3 times to create 252 gene lists. Using the same procedures described in the CTen implementation, these lists were analyzed for cell type enrichment against each HECS database constructed. The ROC curve illustrates that the sensitivity (TPR) and the FPR are not greatly affected by the HECS assignment threshold selected. Furthermore, on each figure, we show the performance expected for selected values of the enrichment score. Selecting enrichment scores of 2 or higher results in a reasonably low FPR, but the FPR can be improved significantly by demanding enrichment scores as high as ~20 before the TPR is affected.
Format: PDF Size: 41KB
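The TPR and FPR obtained at a given enrichment score cutoff, which the ROC curves summarize, can be computed as in the minimal Python sketch below. How a call was judged correct in the original evaluation is not stated here, so the sketch assumes each (gene list, cell type) result carries a label saying whether that cell type was the source of the list. The function name and the scores are hypothetical and are not results from CTen.

```python
def roc_point(scored_calls, cutoff):
    """Return (TPR, FPR) when results with score >= cutoff are called enriched.

    scored_calls: iterable of (enrichment_score, is_true_cell_type) pairs.
    """
    tp = fp = fn = tn = 0
    for score, is_true in scored_calls:
        called = score >= cutoff
        if called and is_true:
            tp += 1
        elif called and not is_true:
            fp += 1
        elif not called and is_true:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical scores for illustration only.
scored_calls = [
    (40.0, True),
    (30.0, True),
    (12.0, False),
    (3.0, False),
    (1.0, False),
    (0.5, False),
]

for cutoff in (2, 25, 35):
    print(cutoff, roc_point(scored_calls, cutoff))
```

Sweeping the cutoff over a range of values and plotting the resulting (FPR, TPR) pairs traces out one ROC curve per HECS database.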
Additional file 4:
A heatmap of the percentage of HECS genes shared by any two cell types in the mouse (upper right) and human (lower left) databases.
Format: PDF Size: 112KB
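One way to compute a pairwise sharing percentage of the kind shown in this heatmap is sketched below in Python. The denominator used for the published heatmap is not given here, so the sketch assumes a symmetric definition: shared genes divided by the union of the two HECS gene sets. The function name and gene sets are hypothetical.

```python
from itertools import combinations


def shared_percentages(hecs):
    """Percentage of HECS genes shared by each pair of cell types.

    hecs: {cell_type: set of gene IDs}.
    Assumption: percentage = 100 * |A and B| / |A or B| (symmetric in A and B).
    """
    result = {}
    for a, b in combinations(sorted(hecs), 2):
        union = hecs[a] | hecs[b]
        shared = hecs[a] & hecs[b]
        result[(a, b)] = 100.0 * len(shared) / len(union) if union else 0.0
    return result


# Hypothetical HECS gene sets for illustration only.
toy_hecs = {
    "B cell": {"g1", "g2", "g3"},
    "T cell": {"g2", "g3", "g4"},
    "macrophage": {"g5"},
}

for pair, pct in shared_percentages(toy_hecs).items():
    print(pair, pct)
```

Because the measure is symmetric, one triangle of the matrix is enough per species, which is consistent with showing mouse and human in opposite halves of a single heatmap.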
Additional file 5:
The highest ranked cell types identified by CTen. Using the GNF1M_plus_macrophage_small dataset from BioGPS, the top 2–10% most highly expressed genes for the tissues shown were analyzed in CTen. The enrichment scores from CTen were ranked from highest to lowest, and the heatmap illustrates the top 3 most enriched cell types (columns) for each lymphocyte dataset tested (row labels). BM = bone marrow.
Format: PDF Size: 43KB
Additional file 6:
Expected enrichment scores for random gene lists. In CTen, we analyzed 150 lists of 100–400 randomly selected (A) mouse and (B) human Entrez Gene IDs, which produced a distribution of enrichment scores. The distributions were fit to a gamma distribution using the MASS package in R. Here, we show the density histogram and fitted gamma function (left-hand axis) and the probability distribution function (right-hand axis). The red bar highlights the enrichment score below which 95% of the scores expected for random lists fall (α = 0.95 at enrichment scores of 1.66 and 1.67 in the mouse and human data, respectively).
Format: PDF Size: 43KB
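The idea of fitting a gamma distribution to random-list scores and reading off the score at the 0.95 level can be sketched in Python as follows. This is not the analysis used for the figure: it substitutes a method-of-moments fit for the fit performed with the MASS package in R, uses simulated scores in place of CTen output, and all function names are hypothetical.

```python
import math
import random
import statistics


def fit_gamma_moments(scores):
    """Method-of-moments estimate of the gamma (shape, scale)."""
    mean = statistics.fmean(scores)
    var = statistics.variance(scores)
    return mean * mean / var, var / mean


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def gamma_quantile(p, shape, scale):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, scale
    while gamma_cdf(hi, shape, scale) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Simulated stand-in for the enrichment scores of 150 random gene lists.
rng = random.Random(0)
scores = [rng.gammavariate(2.0, 0.4) for _ in range(150)]

shape, scale = fit_gamma_moments(scores)
threshold = gamma_quantile(0.95, shape, scale)
print(shape, scale, threshold)
```

Scores above the resulting threshold would be exceeded by only about 5% of random lists under the fitted distribution, which is the role the red bar plays in the figure.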
Additional file 7:
A list of genes upregulated in the lungs of mice infected with influenza virus, and the full results of analyzing this list in DAVID.
Format: XLS Size: 426KB


