Table 2

Numbers of genes and relations in the integrated databases

Statistics of the integrated databases

Database            Organism       # of genes   # of relations
PID+KEGG+TRANSFAC   Homo sapiens   8173         9308
Reactome            Homo sapiens   538          31240


Statistics for each of the three source databases

Database   # of TFs   # of target genes parsed   # of paired regulatory relations parsed
TRANSFAC   157        825                        529625

Database     # of pathways   # of genes, proteins, enzymes parsed   # of relations parsed
PID + KEGG   197             18937                                  8880
  PID        60
  KEGG       137


We integrated three public databases: PID (version dated July 15, 2008), KEGG (release 47.0, July 1, 2008), and TRANSFAC (version 7.0). We then eliminated duplicated reactions and elements, which left 8173 genes and 9308 interactions. To assess the importance of genes within each filtered pathway, we also computed the degree centrality and betweenness centrality of each node. To cross-validate our experimental results, the degree and betweenness centrality of genes were also calculated on the Reactome database [31]. Pathways downloaded from PID and KEGG were parsed in batch. Because a gene (or protein) may be involved in several pathways, some genes were counted more than once. The number of parsed entities (including genes, proteins, and enzymes) was therefore 18937. Likewise, one gene may be regulated by several TFs, and one TF may regulate numerous target genes. As a result, the total number of paired regulatory relations parsed was 529625.
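The integration and centrality steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes each source has already been parsed into (gene, gene) pairs. It treats interactions as undirected when removing duplicates, and it uses Brandes' algorithm for unnormalized betweenness centrality. The edge lists and the helper names (`merge_relations`, `build_adjacency`, `degree_centrality`, `betweenness_centrality`) are hypothetical.

```python
from collections import deque

# Hypothetical toy edge lists standing in for relations parsed from each source;
# the gene pairs are illustrative, not taken from PID, KEGG or TRANSFAC.
SOURCES = {
    "pid": [("TP53", "MDM2"), ("MDM2", "CDKN1A")],
    "kegg": [("MDM2", "TP53"), ("TP53", "BAX")],
    "transfac": [("TP53", "CDKN1A"), ("TP53", "BAX")],
}


def merge_relations(sources):
    """Union all edge lists, dropping duplicate relations.

    Assumption: relations are treated as undirected, so (A, B) and (B, A)
    count as the same interaction; self-loops are discarded.
    """
    edges = set()
    for pairs in sources.values():
        for a, b in pairs:
            if a != b:
                edges.add(frozenset((a, b)))
    return edges


def build_adjacency(edges):
    """Map each gene to the set of genes it interacts with."""
    adj = {}
    for edge in edges:
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Raw degree: the number of distinct interaction partners of each gene."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair (s, t) was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


edges = merge_relations(SOURCES)
adj = build_adjacency(edges)
print(len(adj), "genes,", len(edges), "interactions")
print(degree_centrality(adj))
print(betweenness_centrality(adj))
```

On this toy input, the six listed pairs collapse to four unique interactions. TP53 is the only gene that lies on shortest paths between other genes. For directed regulatory relations such as TF-to-target pairs, one would keep ordered tuples instead of frozensets, run the BFS over out-neighbours only, and skip the final halving.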

Chao et al. BMC Medical Genomics 2011 4:23   doi:10.1186/1755-8794-4-23
