Table 1

Summary of gold standard networks

| Dataset | Domain | Instances | Nodes | Edges | Average In-degree |
|---|---|---|---|---|---|
| Statlog (Australian Credit Approval) | Industry | 690 | 15 | 33 | 2.20 |
| Breast Cancer | Biology | 699 | 10 | 20 | 2.00 |
| Car Evaluation | Industry | 1,728 | 7 | 9 | 1.29 |
| Cleveland Heart Disease | Biology | 303 | 14 | 22 | 1.57 |
| Credit Approval | Industry | 690 | 16 | 35 | 2.19 |
| Diabetes | Biology | 768 | 9 | 13 | 1.44 |
| Glass Identification | Industry | 214 | 10 | 17 | 1.70 |
| Statlog (Heart) | Biology | 270 | 14 | 21 | 1.50 |
| Hepatitis | Biology | 155 | 20 | 36 | 1.80 |
| Iris | Biology | 150 | 5 | 8 | 1.60 |
| Nursery | Industry | 12,960 | 9 | 14 | 1.56 |
| Statlog (Vehicle Silhouettes) | Industry | 846 | 19 | 40 | 2.11 |
| Congressional Voting Records | Political | 436 | 17 | 46 | 2.71 |


This table describes the datasets used in this study. Dataset gives the name of the dataset in the UCI Machine Learning Repository. Domain gives a rough indication of the dataset's subject area. Instances gives the number of instances in the original dataset. Nodes gives the number of variables in the dataset, which is also the number of nodes in the corresponding Bayesian network. Edges gives the number of edges in the optimal Bayesian network learned from the original dataset; this network is the gold standard used throughout the rest of the evaluation. Average In-degree gives the average number of parents per variable in the learned Bayesian network, that is, Edges divided by Nodes.
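Because every edge in a directed network contributes exactly one parent, the average in-degree should equal the edge count divided by the node count. The sketch below recomputes that ratio from the Nodes and Edges values in Table 1 and compares it with the reported column. It is only an illustrative consistency check, not code from the study; the names `ROWS`, `average_in_degree`, and `check_rows` are hypothetical, and the tolerance assumes the reported values were rounded to two decimals.

```python
# (dataset, nodes, edges, reported average in-degree), copied from Table 1.
ROWS = [
    ("Statlog (Australian Credit Approval)", 15, 33, 2.20),
    ("Breast Cancer", 10, 20, 2.00),
    ("Car Evaluation", 7, 9, 1.29),
    ("Cleveland Heart Disease", 14, 22, 1.57),
    ("Credit Approval", 16, 35, 2.19),
    ("Diabetes", 9, 13, 1.44),
    ("Glass Identification", 10, 17, 1.70),
    ("Statlog (Heart)", 14, 21, 1.50),
    ("Hepatitis", 20, 36, 1.80),
    ("Iris", 5, 8, 1.60),
    ("Nursery", 9, 14, 1.56),
    ("Statlog (Vehicle Silhouettes)", 19, 40, 2.11),
    ("Congressional Voting Records", 17, 46, 2.71),
]


def average_in_degree(nodes: int, edges: int) -> float:
    """Each directed edge adds one parent, so the mean parent count is edges / nodes."""
    return edges / nodes


def check_rows(rows, tol=0.005):
    """Return rows whose reported value differs from edges / nodes by more than
    two-decimal rounding allows (assumed tolerance, not from the paper)."""
    mismatches = []
    for name, nodes, edges, reported in rows:
        computed = average_in_degree(nodes, edges)
        # Small epsilon guards against floating-point noise at the boundary.
        if abs(computed - reported) > tol + 1e-9:
            mismatches.append((name, computed, reported))
    return mismatches


if __name__ == "__main__":
    for name, nodes, edges, reported in ROWS:
        computed = average_in_degree(nodes, edges)
        print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
    print("mismatches:", check_rows(ROWS))
```

Under these assumptions, an empty mismatch list means the Average In-degree column agrees with the Nodes and Edges columns for every dataset.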

Liu et al. BMC Bioinformatics 2012 13(Suppl 15):S14   doi:10.1186/1471-2105-13-S15-S14
