Table 4

Correlations of 14 clusters generated by the GO clustering method and Valle's categories of Human Disease Genes

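Columns: GO-Clusters (Information Theoretic Distance) 1 to 14, then Total Genes in Valle's Category. Rows: Valle's Protein Function Categories. Each row lists only its non-empty cluster counts, in left-to-right order; the cluster number of each count is not recoverable from this listing.

No. | Valle's Protein Function Category | Non-empty GO-cluster counts (left to right) | Total Genes in Valle's Category
1 | Unknown | 3, 4, 2, 8, 4, 2, 1, 1, 8, 2, 2, 2 | 39
2 | Enzyme | 4, 10, 4, 5, 2, 2, 48, 55, 7, 48, 1, 40, 6 | 232
3 | Transcription factor | 1, 7, 45, 23, 2, 1 | 79
4 | Receptor | 7, 11, 2, 1, 58, 1, 6 | 86
5 | Hormone | 11, 1, 2 | 14
6 | Channel | 28, 4 | 32
7 | Trans-membrane Transporter | 19, 15, 1 | 35
8 | EC Transport | 1, 1, 3, 1, 1, 3 | 10
9 | Modulator of protein function | 5, 16, 4, 17, 8, 6, 3, 8, 17, 7, 2, 12 | 105
10 | Other | 4, 2, 5, 5, 3, 3, 1, 1, 3 | 27
11 | Extracellular matrix component | 8, 5, 1, 2, 2, 17, 2, 14, 3 | 54
12 | Intracellular matrix component | 3, 1, 1, 2, 23, 19, 1 | 50
13 | Immunoglobulin | 1, 2, 1 | 4
14 | Cell Signaling | 12, 2, 1, 1, 1, 1, 1, 1 | 20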


Mapping of the 14 GO clusters to Valle's 14 protein function categories of human disease genes (HDGs). Each number in the table is the count of HDGs that fall in the given cluster and category. By design, several clusters could map to the same protein function category, but no cluster could map to more than one category. The bold underlined number in each GO cluster marks the Valle category selected for that cluster, and its HDGs are counted as true positives. The other numbers in the cluster are counted as false positives in the evaluation. Valle's categories "Unknown" and "Other" were not evaluated because of their ambiguity.
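The evaluation rule described in the caption can be illustrated with a short Python sketch. It is not the authors' code. The counts below are hypothetical toy values, not those of Table 4, and the function name `evaluate_clusters` is invented for illustration. The sketch also makes two assumptions the caption does not state: that the category selected for a cluster is the one with the largest count, and that the unevaluated categories "Unknown" and "Other" are left out of both the true positive and false positive counts.

```python
# Hypothetical toy counts (NOT the values of Table 4).
# Keys are categories; each list holds the counts for clusters 1..3.
counts = {
    "Enzyme": [40, 3, 1],
    "Receptor": [2, 30, 4],
    "Channel": [1, 5, 20],
    "Unknown": [8, 2, 50],
}

# Categories the caption says were not evaluated.
EXCLUDED = {"Unknown", "Other"}


def evaluate_clusters(counts, excluded=EXCLUDED):
    """Return (cluster number, selected category, TP, FP) for each cluster.

    Assumption: the selected category is the evaluated category with the
    largest count in the cluster; excluded categories are ignored entirely.
    """
    n_clusters = len(next(iter(counts.values())))
    results = []
    for j in range(n_clusters):
        # Counts of this cluster over the evaluated categories only.
        column = {cat: row[j] for cat, row in counts.items()
                  if cat not in excluded}
        selected = max(column, key=column.get)
        true_pos = column[selected]
        false_pos = sum(column.values()) - true_pos
        results.append((j + 1, selected, true_pos, false_pos))
    return results


for cluster, category, tp, fp in evaluate_clusters(counts):
    print(f"cluster {cluster}: {category}, TP={tp}, FP={fp}, "
          f"precision={tp / (tp + fp):.2f}")
```

Because each cluster selects its category independently, two clusters can select the same category, which matches the many-to-one mapping the caption allows.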

Chen et al. BMC Bioinformatics 2007 8(Suppl 3):S7   doi:10.1186/1471-2105-8-S3-S7
