Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA

Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA

Clinical and Translational Science Institute, University of Pittsburgh, Pittsburgh, PA, USA

Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA

Department of Computer Science, Northeastern Illinois University, Chicago, IL, USA

Abstract

Background

Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is multifactor dimensionality reduction (MDR).

Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best, or even well, when we try to learn the correct model without knowledge of the number of SNPs in that model.

Results

We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer's GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called α performed best. This score performed better than the other BN scoring criteria and MDR at detecting the hardest-to-detect models in the simulated data and at substantiating previously reported associations in the real data set.

Conclusions

We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of a hyperparameter α appears more promising than a number of alternatives.

Background

The advent of high-throughput genotyping technology has brought the promise of identifying genetic variations that underlie common diseases such as hypertension, diabetes mellitus, cancer and Alzheimer's disease. However, our knowledge of the genetic architecture of common diseases remains limited; this is in part due to the complex relationship between the genotype and the phenotype. One likely reason for this complex relationship arises from gene-gene and gene-environment interactions. So an important challenge in the analysis of high-throughput genetic data is the development of computational and statistical methods to identify gene-gene interactions. In this paper we apply Bayesian network scoring criteria to identifying gene-gene interactions from genome-wide association study (GWAS) data.

As background we review gene-gene interactions, GWAS, Bayesian networks, and modeling gene-gene interactions using Bayesian networks.

Epistasis

In Mendelian diseases, a genetic variant at a single locus may give rise to the disease. In contrast, many common diseases appear to be influenced by variants at multiple loci, and the effect of a variant at one locus can depend on the genotype at another locus; this latter phenomenon is called epistasis.

The ability to identify epistasis from genomic data is important in understanding the inheritance of many common diseases. For example, studying genetic interactions in cancer is essential to further our understanding of cancer mechanisms at the genetic level. It is known that cancerous cells often develop due to mutations at multiple loci, whose joint biological effects lead to uncontrolled growth. But many cancer-associated mutations and interactions among the mutated loci remain unknown. For example, highly penetrant cancer susceptibility genes, such as BRCA1 and BRCA2, are linked to breast cancer.

Recently, machine-learning and data mining techniques have been developed to identify epistatic interactions in genomic data. Such methods include combinatorial methods, set association analysis, genetic programming, neural networks and random forests.

GWAS

The most common genetic variation is the single nucleotide polymorphism (SNP): a single base-pair position in the genome at which alternative alleles occur in the population.

The advent of high-throughput technologies has enabled genome-wide association studies (GWAS), in which hundreds of thousands of SNPs are genotyped in cohorts of diseased and healthy individuals.

An important challenge in the analysis of genome-wide data sets is the identification of epistatic loci that interact in their association with disease. Many existing methods for epistasis learning such as combinatorial methods cannot handle a high-dimensional GWAS data set. For example, if we only investigated all 0, 1, 2, 3 and 4-SNP combinations when there are 500,000 SNPs, we would need to investigate 2.604 × 10^{21} combinations. Researchers are just beginning to develop new approaches for learning epistatic interactions using a GWAS data set.
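The combination count above can be checked directly; this short sketch just sums the binomial coefficients for subsets of size 0 through 4:

```python
from math import comb

# Number of ways to choose 0 to 4 SNPs out of 500,000 candidates.
n_combinations = sum(comb(500_000, k) for k in range(5))
print(f"{n_combinations:.3e}")  # about 2.604 × 10^21
```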

Bayesian Networks

Bayesian networks (BNs) compactly represent a joint probability distribution. A BN consists of a directed acyclic graph (DAG) G whose nodes are random variables {X_{1}, X_{2},...,X_{n}}, together with a conditional probability distribution for each node given each joint state of its parents. The DAG satisfies the Markov condition: each node is probabilistically independent of its nondescendents given its parents. If PA_{i} is the set of parent nodes of X_{i}, then the joint probability distribution factors as P(x_{1}, x_{2},...,x_{n}) = ∏_{i} P(x_{i} | pa_{i}).
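The factorization can be illustrated on a toy three-node chain; the CPT values below are invented purely for the example:

```python
from itertools import product

# A toy BN: X1 -> X2 -> X3, all binary. CPT numbers are hypothetical.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (x2, x1)
p_x3_given_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # key: (x3, x2)

def joint(x1, x2, x3):
    """P(x1, x2, x3) as the product of each node's probability given its parents."""
    return p_x1[x1] * p_x2_given_x1[(x2, x1)] * p_x3_given_x2[(x3, x2)]

# A valid factorized joint distribution sums to 1 over all joint states.
total = sum(joint(*xs) for xs in product([0, 1], repeat=3))
```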

BNs are often developed by first specifying a DAG that satisfies the Markov condition relative to our belief about the probability distribution, and then determining the conditional distributions for this DAG. One common way to specify the edges in the DAG is to include the edge X_{1} → X_{2} only if X_{1} is a direct cause of X_{2}.

An example BN

**An example BN**. A BN that models lung disorders. This BN is intentionally simple to illustrate concepts; it is not intended to be clinically complete.

Both the parameters and the structure of a BN can be learned from data. The structure is ordinarily learned by searching over candidate DAGs and evaluating each candidate with a scoring criterion; we discuss such criteria next.

BN Scoring Criteria

We review several BN scoring criteria for scoring DAG models in the case where all variables are discrete since this is the case for the application we will consider. BN scoring criteria can be broadly divided into Bayesian and information-theoretic scoring criteria.

Bayesian scoring criteria

The Bayesian scoring criteria compute the probability of the data conditional on a DAG model, starting from a prior probability distribution on the possible DAG models. Given data D on discrete variables {X_{1}, X_{2},...,X_{n}} and a DAG model G, the Bayesian score is

score(G : D) = ∏_{i=1}^{n} ∏_{j=1}^{q_i} [Γ(a_{ij}) / Γ(a_{ij} + N_{ij})] ∏_{k=1}^{r_i} [Γ(a_{ijk} + N_{ijk}) / Γ(a_{ijk})],   (1)

where r_{i} is the number of states of X_{i}, q_{i} is the number of different values the parents of X_{i} in G can jointly assume, a_{ijk} is the prior belief concerning the number of times X_{i} took its k-th state when its parents took their j-th joint state, N_{ijk} is the number of times in the data that X_{i} took its k-th state when its parents took their j-th joint state, a_{ij} = Σ_{k} a_{ijk}, and N_{ij} = Σ_{k} N_{ijk}.

The Bayesian score given by Equation 1 assumes that our prior belief concerning each unknown parameter in each DAG model is represented by a Dirichlet distribution, where the hyperparameters a_{ijk} are the parameters of this distribution. Cooper and Herskovits set each a_{ijk} equal to 1, which assigns a uniform prior distribution to the value of each parameter (prior ignorance as to its value). Setting all hyperparameters to 1 yields the K2 score.

The K2 score does not necessarily assign the same score to Markov equivalent DAG models. Two DAGs are Markov equivalent if they entail the same conditional independencies. A score that does assign the same value to Markov equivalent DAGs is obtained by setting a_{ijk} = α/(r_{i}q_{i}), where α is a single hyperparameter, r_{i} is the number of states of X_{i}, and q_{i} is the number of different values the parents of X_{i} can jointly assume. When we use a prior equivalent sample size α in this way, the resulting score is called the BDeu score.

The Bayesian score does not explicitly include a DAG penalty term; model complexity is controlled implicitly through the hyperparameters a_{ijk}. Silander et al. showed that the model learned using the BDeu score is highly sensitive to the choice of α.

Minimum description length scoring criteria

The minimum description length (MDL) principle selects the model that minimizes the total number of bits needed to encode both the model and the data given the model. For a DAG model G and data D containing N records, the MDL score is

MDL(G : D) = (log_{2} N / 2) Σ_{i=1}^{n} d_{i} − Σ_{i=1}^{n} Σ_{j=1}^{q_i} Σ_{k=1}^{r_i} N_{ijk} log_{2} P(x_{ik} | pa_{ij}),   (2)

where d_{i} is the number of parameters needed to represent the conditional probability distributions associated with the i-th node, r_{i} is the number of states of X_{i}, x_{ik} is the k-th state of X_{i}, q_{i} is the number of different values the parents of X_{i} can jointly assume, pa_{ij} is the j-th joint state of the parents of X_{i}, and the probabilities are estimated from the data. The first term is the DAG penalty; the second term is the number of bits needed to encode the data. Lower scores are better.

Other MDL scores assign different DAG penalties and therefore differ in the first term in Equation 2, but encode the data the same way. For example, the AIC-based score replaces the per-parameter penalty of (log_{2} N)/2 with a constant; we denote this score MDL_{AIC}. In the DDAG Model section (acronym DDAG is defined in that section) we give an MDL score designed specifically for scoring BNs representing epistatic interactions.
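Equation 2 can be sketched directly, with the estimated probabilities written as N_ijk / N_ij; the function and argument names below are illustrative, not the paper's:

```python
from math import log2

def mdl_score(counts_per_node, d_per_node, n_records):
    """MDL score (Equation 2): DAG penalty plus bits to encode the data.

    counts_per_node[i][j][k] = N_ijk; d_per_node[i] = d_i, the number of
    parameters for node i. Lower scores are better.
    """
    penalty = (log2(n_records) / 2) * sum(d_per_node)
    data_bits = 0.0
    for node_counts in counts_per_node:
        for n_j in node_counts:
            n_ij = sum(n_j)
            for n_ijk in n_j:
                if n_ijk > 0:
                    # -N_ijk * log2 of the estimated probability N_ijk / N_ij
                    data_bits -= n_ijk * log2(n_ijk / n_ij)
    return penalty + data_bits
```

For a single binary node with one parameter and 16 records split 8/8, the penalty is 2 bits and the data cost is 16 bits, for a score of 18.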

Minimum message length scoring criterion

Another score based on information theory is the minimum message length (MML) score. Like MDL, MML measures the length of a joint encoding of the model and the data; in the variant we evaluate, the model-encoding term depends on d_{i}, the number of parameters stored for the i-th node, and the data-encoding term is based on the K2 score mentioned previously.

To learn a DAG model from data, we can score all DAG models using one of the scores just discussed and then choose the highest scoring model. However, when the number of variables is not small, the number of candidate DAGs is prohibitively large. Moreover, the BN structure learning problem has been shown to be NP-hard, so in practice heuristic search algorithms are used.
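One common heuristic is greedy forward selection; a minimal sketch for choosing the parents of a single target node is shown below. The function and parameter names are illustrative, and `score` stands for any of the criteria above (higher taken as better):

```python
def greedy_parent_search(candidate_snps, score, max_parents=4):
    """Greedy forward selection of a target node's parents.

    `score` maps a frozenset of SNPs to a number (higher is better). Each
    step adds the single SNP whose addition most improves the score,
    stopping when no addition helps or `max_parents` is reached.
    """
    parents = frozenset()
    best = score(parents)
    while len(parents) < max_parents:
        moves = [(score(parents | {s}), parents | {s})
                 for s in candidate_snps if s not in parents]
        if not moves:
            break
        top_score, top_set = max(moves, key=lambda m: m[0])
        if top_score <= best:
            break
        best, parents = top_score, top_set
    return parents, best
```

With a toy score that rewards the true pair {"a", "b"} and penalizes extra SNPs, the search recovers exactly that pair.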

In the large sample limit, all the scoring criteria favor a model that most succinctly represents the generative distribution. However, for practical sized data sets, the results can be quite disparate. Silander et al. found that the structure learned using the BDeu score is highly sensitive to α, with the learned structure changing as α ranged over values from 10^{-20} to 34,000. Although researchers have recommended various ways for choosing α, no single choice has been established as best.

Detecting Epistasis Using BNs

BNs have been applied to learning epistatic interactions from GWAS data sets. Han et al. modeled SNP interactions with BN structures but evaluated candidate models using a χ^{2} test instead of a BN scoring criterion. Verzilli et al. likewise applied graphical models to learning genetic associations from case-control data.

Jiang et al. used BNs to represent epistatic interactions and developed the BNMBL score (presented below), showing that it outperformed MDR at identifying the interacting SNPs when all and only the 2-SNP models were scored.

In real data sets, we ordinarily do not know the number of SNPs that influence the phenotype. BNMBL may not perform as well when models containing more than two SNPs are also scored. Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best, or even well, when we try to learn the correct model without knowledge of the number of SNPs in that model. We provide results of experiments investigating this performance in the Results section.

Diagnostic BNs Containing SNP Variables

BN diagnostic systems that contain SNP information have also been learned from data. For example, Sebastiani et al. learned a BN from genetic and clinical data that predicts the risk of overt stroke in individuals with sickle cell anemia.

Results

We first describe the BN model used to model SNP interactions associated with disease. Next, we develop a BN score tailored to this model and list the other BN scores that are evaluated. Finally, we provide the results of experiments that evaluate the various BN scores and MDR using simulated data and a real GWAS data set.

The DDAG Model

We use BNs to model the relationships among SNPs and a phenotype such as disease susceptibility. Given a set of SNPs {S_{1}, S_{2},...,S_{n}} and a disease variable D, we restrict attention to DAG models in which the only edges are from SNPs to D; that is, some subset of the SNPs constitutes the parent set of D, and there are no other edges. We call such a restricted model a DDAG.

An example DAG

**An example DAG**. A DAG showing probabilistic relationships among SNPs and a disease

An example DDAG

**An example DDAG**. A DDAG showing probabilistic relationships between SNPs and a disease

The number of DAGs that can be constructed is prohibitively large when the number of nodes is not small. For example, there are ~4.2 × 10^{18} possible DAGs for a domain with ten variables. In contrast, there are only 2^{n} DDAGs, where n is the number of SNPs; for ten SNPs there are 2^{10} = 1024 DDAGs. Though the model space of DDAGs is much smaller than the space of DAGs, it still remains exponential in the number of variables. In the studies reported here, we search in the space of DDAGs.
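Because a DDAG is determined entirely by which SNPs are parents of D, the space can be enumerated as subsets; the sketch below also verifies the count of 1- to 4-SNP models over 20 SNPs used in the experiments reported later:

```python
from itertools import combinations
from math import comb

n = 10
# A DDAG is determined by the subset of SNPs chosen as parents of D,
# so enumerating subsets enumerates DDAGs.
ddags = [frozenset(s) for k in range(n + 1) for s in combinations(range(n), k)]
assert len(ddags) == 2 ** n  # 1024 DDAGs for ten SNPs

# Restricting to models with 1 to 4 SNPs over 20 candidate SNPs gives
# the 6195 DDAGs scored per data set in the experiments.
assert sum(comb(20, k) for k in range(1, 5)) == 6195
```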

The BN Minimum Bit Length (BNMBL) Score

An MDL score called BNMBL that is adapted to DDAGs is developed next. Each parameter (conditional probability) in a DAG model learned from data is a fraction whose precision, given N data items, is about 1/√N; storing it therefore requires about (log_{2} N)/2 bits.

Suppose that a DDAG contains k SNPs. Then there are 3^{k} joint states of the parents of D, and so the number of parameters associated with the disease node is 3^{k}. If we approximate the precision for each of these parameters by (log_{2} N)/2 bits, the DDAG penalty is

(log_{2} N / 2) (3^{k} + 2k).   (3)

The multiplier 2 appears in the second term because each SNP has three values. We need store only two of the three parameters corresponding to the SNP states, since the value of the remaining parameter is uniquely determined given the other two. No multiplier appears in the first term because the disease node has only two values. When we use this DAG penalty in an MDL score (Equation 2), we call the score MDL_{Epi}.
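The penalty just described can be sketched numerically. This is a reconstruction from the prose above (one parameter per joint parent state of the binary disease node, two parameters per three-valued SNP, each at (log2 N)/2 bits); the published BNMBL constants may differ:

```python
from math import log2

def bnmbl_penalty(k, n_records):
    """DDAG penalty for a model whose disease node has k SNP parents.

    3**k parameters for the binary disease node (no multiplier), plus
    2 parameters for each of the k three-valued SNPs, each parameter
    stored with (log2 N)/2 bits of precision.
    """
    return (log2(n_records) / 2) * (3 ** k + 2 * k)
```

For example, with k = 2 SNPs and N = 16 records the penalty is 2 × (9 + 4) = 26 bits.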

BN Scoring Criteria Evaluated

We evaluated the performance of MDR; three MDL scores: MDL_{Epi}, MDL_{Suz}, and MDL_{AIC}; two Bayesian scores: the K2 score and the BDeu score (denoted by its hyperparameter α); and the information-theoretic MML score. For the BDeu score we performed a sensitivity analysis over the following values of α: 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 42, 54, and 162. Two versions of several scores were evaluated, indicated by subscripts 1 and 2 (e.g., Epi1 and Epi2). The penalty term for MDL_{Epi} that is given in Equation 3 is for version 2.

After describing the results obtained using simulated data, we show those for real data.

Simulated Data Results

We evaluated the scoring criteria using simulated data sets that were developed from 70 genetic models with different heritabilities, minor allele frequencies and penetrance values. Each model consists of a probabilistic relationship in which 2 SNPs combined are correlated with the disease, but neither SNP is individually correlated. Each data set has sample size equal to 200, 400, 800, or 1600, and there are 7000 data sets of each size. More details of the datasets are given in the Methods section.

For each of the simulated data sets, we scored all 1-SNP, 2-SNP, 3-SNP, and 4-SNP DDAGs. The total number of DDAGs scored for each data set was therefore 6195. Since in a real setting we would not know the number of SNPs in the model generating the data, all models were treated equally in the learning process; that is, no preference was given to 2-SNP models.

We say that a method correctly identifies the model for a data set when the model containing exactly the two interacting SNPs receives the highest score.
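The evaluation loop just described can be sketched as follows: score every 1- to 4-SNP parent set and check whether the winner is exactly the generating pair. The helper names and the toy score in the usage note are ours, not the paper's:

```python
from itertools import combinations

def best_model(n_snps, score, max_size=4):
    """Highest scoring DDAG among all 1- to max_size-SNP parent sets."""
    candidates = [frozenset(c)
                  for k in range(1, max_size + 1)
                  for c in combinations(range(n_snps), k)]
    return max(candidates, key=score)

def is_correct(n_snps, score, true_pair):
    """True when the model containing exactly the two interacting SNPs wins."""
    return best_model(n_snps, score) == frozenset(true_pair)
```

With a toy score that rewards SNPs 3 and 7 and penalizes extraneous SNPs, the correct 2-SNP model wins.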

Accuracies of scoring criteria

| Rank | Scoring criterion | 200 | 400 | 800 | 1600 | Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | BDeu α = 15 | 4379 | 5426 | 6105 | 6614 | 22524 |
| 2 | BDeu α = 12 | 4438 | 5421 | 6070 | 6590 | 22519 |
| 3 | BDeu α = 18 | 4227 | 5389 | 6095 | 6625 | 22336 |
| 4 | BDeu α = 9 | 4419 | 5349 | 5996 | 6546 | 22313 |
| 5 | BDeu α = 21 | 3989 | 5286 | 6060 | 6602 | 21934 |
| 6 | BDeu α = 6 | 4220 | 5165 | 5874 | 6442 | 21701 |
| 7 | MML1 | 4049 | 5111 | 5881 | 6463 | 21504 |
| 8 | BDeu α = 24 | 3749 | 5156 | 5991 | 6562 | 21448 |
| 9 | MDR | 4112 | 4954 | 5555 | 5982 | 20603 |
| 10 | BDeu α = 3 | 3839 | 4814 | 5629 | 6277 | 20559 |
| 11 | Epi2 | 3571 | 4791 | 5648 | 6297 | 20307 |
| 12 | BDeu α = 30 | 3285 | 4779 | 5755 | 6415 | 20234 |
| 13 | MML2 | 3768 | 4914 | 5754 | 5780 | 20216 |
| 14 | Epi1 | 2344 | 5225 | 6065 | 6553 | 20187 |
| 15 | Suz1 | 3489 | 4580 | 5521 | 6215 | 19805 |
| 16 | BDeu α = 36 | 2810 | 4393 | 5464 | 6150 | 18817 |
| 17 | BDeu α = 42 | 2310 | 4052 | 5158 | 5895 | 17415 |
| 18 | K2 | 1850 | 3475 | 5095 | 6116 | 16536 |
| 19 | Suz2 | 2245 | 3529 | 4684 | 5673 | 16131 |
| 20 | BDeu α = 54 | 1651 | 3297 | 4492 | 5329 | 14769 |
| 21 | AIC2 | 3364 | 3153 | 2812 | 2520 | 11847 |
| 22 | AIC1 | 2497 | 1967 | 1462 | 1126 | 7052 |
| 23 | BDeu α = 162 | 26 | 476 | 1300 | 2046 | 3848 |

The number of times out of 7000 data sets that each scoring criterion identified the correct model for sample sizes of 200, 400, 800, and 1600. The last column gives the total accuracy over all sample sizes. The scoring criteria are listed in descending order of total accuracy.

The ability of the highest ranking score (BDeu α = 15) to identify the correct model was compared to that of the next six highest ranking scores using the McNemar chi-square test. BDeu α = 15 performed significantly better (p < 0.05) than the scores ranked fourth through seventh (BDeu α = 9, BDeu α = 21, BDeu α = 6, and MML1), but not significantly better than BDeu α = 12 or BDeu α = 18.
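For paired correct/incorrect outcomes on the same data sets, the McNemar statistic depends only on the discordant counts; a minimal sketch (without continuity correction, which may differ from the variant the authors used):

```python
from math import erfc, sqrt

def mcnemar_p(b, c):
    """McNemar chi-square p-value from the two discordant counts.

    b = data sets where only the first criterion was correct,
    c = data sets where only the second was. No continuity correction;
    the statistic has 1 degree of freedom.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x/2))
    return erfc(sqrt(chi2 / 2))
```

Equal discordant counts give p = 1 (no evidence of a difference), while strongly unbalanced counts give a very small p.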

Statistical comparison of accuracies of scoring criteria

| Rank | Scoring criterion | p-value |
| --- | --- | --- |
| 1 | BDeu α = 15 | NA |
| 2 | BDeu α = 12 | 0.996 |
| 3 | BDeu α = 18 | 0.076 |
| 4 | BDeu α = 9 | 0.046 |
| 5 | BDeu α = 21 | 4.086 × 10^{-8} |
| 6 | BDeu α = 6 | 3.468 × 10^{-14} |
| 7 | MML1 | 1.200 × 10^{-20} |

Comparison of the highest ranking scoring criterion (BDeu α = 15) with the next six highest ranking scoring criteria using the McNemar chi-square test. Each entry is the p-value for the comparison.

The accuracy results penalize a score for choosing a model that contains the two interacting SNPs along with extraneous SNPs. To measure how well each criterion detected the interacting SNPs regardless of extra SNPs, we also computed the recall of the highest scoring model for each data set, defined as

recall = t / 2,

where t is the number of the two interacting SNPs that appear in the highest scoring model.

In the recall results, BDeu scores with large values of α now appear at the top of the list, while Suz1 and Suz2, which have the largest DAG penalties of the MDL scores, appear at the bottom. MDR again performed well but substantially worse than the best performing scores.
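The recall of a selected model with respect to the interacting SNPs can be computed with simple set arithmetic (a sketch; the SNP identifiers below are arbitrary):

```python
def recall(selected_snps, interacting_snps):
    """Fraction of the truly interacting SNPs captured by the chosen model."""
    hits = len(set(selected_snps) & set(interacting_snps))
    return hits / len(interacting_snps)
```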

Recall for scoring criteria

| Rank | Scoring criterion | 200 | 400 | 800 | 1600 | Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | BDeu α = 162 | 5259 | 6043 | 6566 | 6890 | 24758 |
| 2 | AIC2 | 5204 | 5969 | 6511 | 6849 | 24533 |
| 3 | AIC1 | 5186 | 5960 | 6481 | 6830 | 24457 |
| 4 | BDeu α = 54 | 5223 | 5941 | 6473 | 6813 | 24450 |
| 5 | K2 | 5303 | 5962 | 6371 | 6747 | 24383 |
| 6 | BDeu α = 42 | 5203 | 5902 | 6425 | 6794 | 24324 |
| 7 | BDeu α = 36 | 5181 | 5866 | 6395 | 6768 | 24210 |
| 8 | BDeu α = 30 | 5147 | 5816 | 6352 | 6754 | 24069 |
| 9 | BDeu α = 24 | 5080 | 5767 | 6300 | 6725 | 23872 |
| 10 | BDeu α = 21 | 5031 | 5733 | 6265 | 6704 | 23733 |
| 11 | MDR | 4870 | 5710 | 6324 | 6748 | 23652 |
| 12 | BDeu α = 18 | 4973 | 5681 | 6230 | 6681 | 23565 |
| 13 | BDeu α = 15 | 4902 | 5622 | 6183 | 6647 | 23354 |
| 14 | Epi1 | 4984 | 5529 | 6105 | 6575 | 23193 |
| 15 | BDeu α = 12 | 4786 | 5531 | 6119 | 6605 | 23041 |
| 16 | BDeu α = 9 | 4649 | 5416 | 6026 | 6547 | 22638 |
| 17 | BDeu α = 6 | 4383 | 5219 | 5901 | 6453 | 21956 |
| 18 | MML1 | 4151 | 5159 | 5903 | 6473 | 21686 |
| 19 | MML2 | 3881 | 4969 | 5780 | 6412 | 21042 |
| 20 | Epi2 | 3895 | 4901 | 5715 | 6329 | 20840 |
| 21 | BDeu α = 3 | 3953 | 4862 | 5652 | 6285 | 20752 |
| 22 | Suz1 | 3618 | 4696 | 5595 | 6251 | 20160 |
| 23 | Suz2 | 2500 | 3712 | 4811 | 5737 | 17760 |

The sum of the recall for each scoring criterion over 7000 data sets for sample sizes of 200, 400, 800, and 1600. The last column gives the total recall over all sample sizes. The scoring criteria are listed in descending order of total recall.

Perhaps the smaller DAG penalty is not the only reason that the BDeu scores with larger values of α performed best at recall. It is possible that the BDeu scores with larger values of α can better detect the interacting SNPs than the BDeu scores with smaller values, but that the scores with larger values do poorly at scoring the correct model (the one with only the two interacting SNPs) highest because they too often pick a larger model containing those SNPs. To investigate this possibility, we examined how well the scores discovered models 55-59 (see Supplementary Table 1), which are the hardest models to detect.

The following tables show these results. The highest ranking score (BDeu α = 54) is compared to the next five highest ranking scores using the McNemar chi-square test. The BDeu scores with large values of α performed significantly better than all the other scores.

Accuracies of scoring criteria on most difficult models

| Rank | Scoring criterion | 200 | 400 | 800 | 1600 | Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | BDeu α = 54 | 14 | 48 | 167 | 352 | 581 |
| 2 | BDeu α = 162 | 1 | 21 | 146 | 355 | 563 |
| 3 | BDeu α = 36 | 13 | 46 | 155 | 318 | 532 |
| 4 | BDeu α = 21 | 12 | 43 | 106 | 289 | 450 |
| 5 | BDeu α = 18 | 11 | 37 | 91 | 274 | 413 |
| 6 | MDR | 3 | 25 | 79 | 245 | 352 |
| 7 | BDeu α = 12 | 7 | 25 | 65 | 215 | 312 |
| 8 | AIC2 | 16 | 33 | 80 | 138 | 267 |
| 9 | BDeu α = 9 | 5 | 20 | 48 | 186 | 259 |
| 10 | Epi1 | 4 | 16 | 47 | 179 | 246 |
| 11 | MML1 | 2 | 7 | 23 | 140 | 172 |
| 12 | BDeu α = 3 | 3 | 6 | 13 | 86 | 108 |
| 13 | Epi2 | 0 | 1 | 4 | 72 | 77 |
| 14 | Suz1 | 0 | 1 | 2 | 41 | 44 |

The number of times out of 500 that each scoring criterion identified the correct model in the case of the most difficult models (55-59) for sample sizes of 200, 400, 800, and 1600. The last column gives the total accuracy over all sample sizes. The scoring criteria are listed in descending order of accuracy.

Statistical comparison of accuracies of scoring criteria on most difficult models

| Scoring criterion | p-value |
| --- | --- |
| BDeu α = 54 | NA |
| BDeu α = 162 | 0.610 |
| BDeu α = 36 | 0.147 |
| BDeu α = 21 | 4.870 × 10^{-5} |
| BDeu α = 18 | 1.080 × 10^{-7} |
| MDR | 7.254 × 10^{-14} |

Comparison of the highest ranking scoring criterion (BDeu α = 54) with the next five highest ranking scoring criteria using the McNemar chi-square test. Each entry is the p-value for the comparison.

The BDeu scores with large α values discovered the difficult models best, though they performed poorly on average when all models were considered. An explanation for this phenomenon is that these scores can indeed find interacting SNPs better than scores with smaller values of α. However, when the interacting SNPs are fairly easy to identify, their larger DAG penalties make it harder for them to identify the correct model relative to other scores. On the other hand, when the SNPs are hard to detect, their better detection capability more than compensates for their increased DAG penalty. The additional file provides an illustrative example of this behavior.

**Illustrative Example of Better Large α Performance**. This file provides an illustrative example to demonstrate a possible explanation for the better performance of the BDeu score at larger values of α on hard-to-detect genetic models.


GWAS Data Results

We evaluated the scoring criteria using a late onset Alzheimer's disease (LOAD) GWAS data set. LOAD is the most common form of dementia among people over 65 years of age. It is a progressive neurodegenerative disease that affects memory, thinking, and behavior. The only genetic risk factor for LOAD that has been consistently replicated involves the apolipoprotein E (APOE) gene. The ε4 APOE genotype increases the risk of development of LOAD, while the ε2 genotype is believed to have a protective effect.

The LOAD GWAS data set that we analyzed was collected and analyzed by Reiman et al.; the study and data set are described further in the Methods section.

To analyze this Alzheimer GWAS data set, for a representative subset of the scores evaluated above we determined the 25 highest scoring 1-SNP to 4-SNP models; the results appear in the table below.

Evaluation of scoring criteria concerning detection of GAB2 SNPs

| Rank | α = 3 | α = 12 | α = 21 | α = 54 | α = 162 | α = 1000 | K2 | MML1 | MDLn | Suz1 | Epi2 | MDR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 4 | 4 | 4 | 4 G | 4 G | 4 | 4 G | 4 G | 4 G | 3 | 4 G | 4 |
| 2 | 4 | 4 | 4 | 4 G | 4 G | 4 | 4 G | 4 G | 4 G | 3 G | 4 G | 4 |
| 3 | 4 | 4 | 4 G | 4 G | 4 | 4 | 4 G | 4 G | 4 G | 3 G | 4 G | 4 |
| 4 | 4 | 4 | 4 | 4 G | 4 G | 4 | 4 | 4 | 4 G | 3 | 4 G | 4 |
| 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 G | 3 | 3 | 4 |
| 6 | 4 | 4 | 4 | 4 | 4 G | 4 | 4 | 4 | 4 | 3 G | 4 G | 4 G |
| 7 | 4 | 4 | 4 | 4 G | 4 | 4 | 4 | 4 | 4 | 3 G | 4 | 4 |
| 8 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 G | 3 G | 4 | 4 G |
| 9 | 4 | 4 | 4 | 4 G | 4 | 4 | 4 | 4 | 4 G | 3 G | 4 G | 4 |
| 10 | 4 | 4 | 4 | 4 G | 4 G | 4 | 4 | 3 G | 4 | 2 | 4 G | 4 G |
| 11 | 4 | 4 | 4 G | 4 G | 4 G | 4 G | 4 G | 4 | 4 | 3 | 4 | 4 |
| 12 | 4 | 4 G | 4 G | 4 | 4 G | 4 G | 4 G | 4 | 4 | 3 G | 4 | 4 G |
| 13 | 4 | 4 | 4 | 4 G | 4 G | 4 G | 4 | 4 G | 4 G | 3 G | 4 | 4 |
| 14 | 4 | 4 | 4 | 4 | 4 G | 4 | 4 G | 4 G | 4 | 3 | 3 G | 4 |
| 15 | 4 | 4 | 4 G | 4 G | 4 G | 4 | 4 G | 3 G | 4 | 3 G | 4 G | 4 G |
| 16 | 4 | 4 | 4 G | 4 G | 4 G | 4 G | 4 G | 4 | 4 | 3 G | 3 G | 4 |
| 17 | 4 | 4 | 4 | 4 G | 4 G | 4 | 4 G | 3 | 4 G | 3 | 4 | 4 G |
| 18 | 4 | 4 | 4 G | 4 G | 4 G | 4 G | 4 | 4 G | 4 G | 3 G | 4 | 4 |
| 19 | 4 | 4 | 4 G | 4 G | 4 G | 4 G | 4 | 4 G | 4 G | 3 G | 4 | 4 |
| 20 | 4 | 4 | 4 | 4 G | 4 G | 4 G | 4 | 4 G | 4 | 3 G | 4 G | 4 G |
| 21 | 4 | 4 | 4 | 4 G | 4 | 4 G | 4 | 4 G | 4 G | 3 G | 4 G | 4 G |
| 22 | 4 | 4 G | 4 | 4 G | 4 G | 4 G | 4 | 4 | 4 G | 3 | 4 G | 4 |
| 23 | 4 | 4 G | 4 | 4 G | 4 | 4 G | 4 G | 4 | 4 G | 3 G | 4 | 4 |
| 24 | 4 | 4 | 4 G | 4 G | 4 G | 4 | 4 | 4 | 4 G | 3 G | 4 G | 4 |
| 25 | 4 | 4 | 4 | 4 | 4 G | 4 | 4 | 4 | 4 G | 3 G | 3 | 4 |
| Total # G | 0 | 3 | 7 | 19 | 18 | 10 | 10 | 11 | 16 | 17 | 14 | 8 |
| # Diff G | 0 | 2 | 3 | 7 | 6 | 4 | 4 | 4 | 8 | 8 | 8 | 6 |

Information about the 25 highest scoring models for a variety of scoring criteria. The number on the left in a cell is the number of SNPs in the model, and the letter G appears to the right of that number if a GAB2 SNP appears in the model. The second to the last row shows the total number of models in the top 25 that contained a GAB2 SNP. The last row shows the total number of different GAB2 SNPs appearing in the top 25 models.

We included two new scores in this analysis. The first score is the BDeu score with α = 1000. We did this to test whether we can get good recall with arbitrarily high values of α. The second new score is an MDL score with no DAG penalty (labelled MDLn in the table). We did this to investigate the recall for the MDL score when we constrain the highest scoring model to be one containing four parent loci.

These results substantiate our hypothesis that larger values of α (54 and 162) can better detect the interacting SNPs. For each of the BDeu scores, the 25 highest-scoring models each contain 4 parent loci. However, when α equals 54 or 162, 19 and 18 respectively of the 25 highest-scoring models contain a GAB2 SNP, whereas for α equal to 12 only 3 of them contain a GAB2 SNP, and for α equal to 3 none of them do. The results for α equal to 1000 are not very good, indicating that we cannot obtain good results for arbitrarily large values of α. The MDL scores (MDLn, Suz1 and Epi2) all performed well, with the Suz1 score never selecting a model with more than 3 parent loci. This result indicates that the larger DAG penalty seems to have helped us home in on the interacting SNPs. All the MDL scores detected the highest number of different GAB2 SNPs, namely 8. In comparison, MDR did not perform very well, having only 8 models of the top 25 containing GAB2 SNPs and none of the top 5 containing GAB2 SNPs.

Discussion

We compared the performance of a number of BN scoring criteria when identifying interacting SNPs from simulated genetic data sets. Each data set contained 20 SNPs with two interacting SNPs and was generated from one of 70 different epistasis models. Jiang et al. previously evaluated BNMBL on such data by scoring only the 2-SNP models; here we scored all models containing up to four SNPs, as would be necessary when the number of interacting SNPs is unknown.

The accuracy results show that the BDeu scores with moderate values of α (9 through 18) performed best at scoring the correct model highest, whereas the recall results and the difficult-model results show that the BDeu scores with large values of α (54 and 162) performed best at detecting the interacting SNPs.

We evaluated the performance of a subset of the BN scores used in the simulated data analysis on a LOAD GWAS data set. The effectiveness of each score was judged according to how well it substantiated the previously obtained result that the GAB2 gene is associated with LOAD. As shown in the GAB2 results above, the BDeu scores with large values of α and the MDL scores substantiated this result best.

Overall, our results are mixed. Although scores with moderate values of α performed better at actually scoring the correct model highest using simulated data sets, scores with larger values of α performed better at recall, at detecting models that are hardest to detect, and at substantiating previous results using a real data set. Our main goal is to develop a method that can discover SNPs associated with a disease from real data. Therefore, based on the results reported here, it seems that it is more promising to use the BDeu score with large values of α (54-162), rather than smaller values.

The MDL scores also performed well in the case of the real data set. An explanation for their poorer performance with the simulated data sets is that their DAG penalties are either too large or too small. If we simply used an MDL score with no DAG penalty, we should be able to discover interacting SNPs well (as indicated by the MDLn results on the real data set), but such a score has no means of preferring the correct model over larger models that contain the interacting SNPs.

Another consideration which was not investigated here is the possible increase in false positives with increased detection capability. That is, although the BDeu score with large values of α performed best at recall and at identifying hard-to-detect models, perhaps these scores may also score some incorrect models higher, and at a given threshold might have more false positives. Further research is needed to investigate this matter.

Additional file

Conclusions

Our results indicate that representing epistatic interactions using BNs and scoring them using a BN scoring criterion holds promise for identifying epistatic relationships. Furthermore, they show that the use of the BDeu score with large values of α (54-162) can yield the best results on some data sets. Compared to MDR and other BN scoring criteria, these BDeu scores performed substantially better at detecting the hardest-to-detect models using simulated data sets, and at confirming previous results using a real GWAS data set.

Methods

Simulated Data Sets

Each simulated data set was developed from one of 70 epistasis models described in Velez et al.

Each model represents a probabilistic relationship in which two SNPs together are correlated with the disease, but neither SNP is individually predictive of disease. The relationships represent various degrees of penetrance, heritability, and minor allele frequency. The models are distributed uniformly among seven broad-sense heritabilities ranging from 0.01 to 0.40 (0.01, 0.025, 0.05, 0.10, 0.20, 0.30, and 0.40) and two minor allele frequencies (0.2 and 0.4).

Data sets were generated with a case-control ratio (ratio of individuals with the disease to those without the disease) of 1:1. To create one data set, a model was fixed; based on that model, data were then generated for the two SNPs related to the disease in the model, 18 other unrelated SNPs, and the disease status. For each of the 70 models, 100 data sets were generated, for a total of 7000 data sets. This procedure was followed for data set sizes of 200, 400, 800, and 1600.
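The generation procedure can be sketched as follows. This is illustrative only, not the exact procedure of the cited study: genotypes are sampled under Hardy-Weinberg proportions, disease status is drawn from a hypothetical penetrance table over the two interacting SNPs, and sampling continues until the 1:1 case-control ratio is met:

```python
import random

def simulate_dataset(penetrance, maf, n_unrelated, n_records, rng):
    """Sketch of generating one balanced case-control data set.

    penetrance[(g1, g2)] = P(disease | genotypes of the two interacting SNPs);
    genotypes are 0/1/2 copies of the minor allele, sampled under
    Hardy-Weinberg proportions with minor allele frequency `maf`.
    """
    def genotype():
        return sum(rng.random() < maf for _ in range(2))

    cases, controls, half = [], [], n_records // 2
    while len(cases) < half or len(controls) < half:
        g1, g2 = genotype(), genotype()
        row = [g1, g2] + [genotype() for _ in range(n_unrelated)]
        diseased = rng.random() < penetrance[(g1, g2)]
        bucket = cases if diseased else controls
        if len(bucket) < half:  # discard records once a bucket is full
            bucket.append(row + [int(diseased)])
    return cases + controls
```

For example, with 18 unrelated SNPs a record has 20 SNP columns plus a disease column, and a data set of size 200 contains exactly 100 cases and 100 controls.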

GWAS Data Set

Several LOAD GWA studies have been conducted. We utilized data from one such study by Reiman et al., which analyzed a discovery cohort and two replication cohorts, stratified by APOE ε4 carrier status. In the discovery cohort, the 10 SNPs most strongly associated with LOAD among APOE ε4 carriers (p-values on the order of 10^{-8} to 1 × 10^{-7}) were located in the GRB2-associated binding protein 2 (GAB2) gene on chromosome 11q14.1. Associations with LOAD for 6 of these SNPs were confirmed in the two replication cohorts. Combined data from all three cohorts exhibited significant association between LOAD and all 10 GAB2 SNPs. These 10 SNPs were not significantly associated with LOAD in the APOE ε4 non-carriers.

Implementation

We implemented the methods for learning and scoring DDAGs using BN scoring criteria in the Java programming language. MDR v. mdr-2.0_beta_5 (available from the MDR project website) was used for the MDR experiments.

Authors' contributions

XJ conceived the study, developed the DDAG model and the BNMBL score, conducted the experiments, and drafted the manuscript. RN identified the BN scores that were evaluated, performed the statistical analysis, and conceived and wrote the additional file.

Acknowledgements

The research reported here was funded in part by grant 1K99LM010822-01 from the National Library of Medicine.