Bioinformatics Research Center, Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA

Abstract

Background

Determining the genes responsible for certain human traits can be challenging when the underlying genetic model takes a complicated form such as heterogeneity (in which different genetic models can result in the same trait) or epistasis (in which genes interact with other genes and the environment). Multifactor Dimensionality Reduction (MDR) is a widely used method that effectively detects epistasis; however, it does not perform well in the presence of heterogeneity partly due to its reliance on cross-validation for internal model validation. Cross-validation allows for only one “best” model and is therefore inadequate when more than one model could cause the same trait. We hypothesize that another internal model validation method known as a three-way split will be better at detecting heterogeneity models.

Results

We test this hypothesis with a simulation study that compares the ability of MDR to detect heterogeneity models under the two internal model validation techniques. We simulated a range of disease models with both main effects and gene-gene interactions across a range of effect sizes. We assessed the performance of each method using several definitions of power.

Conclusions

Overall, the power of MDR to detect heterogeneity models was relatively poor, especially under more conservative (strict) definitions of power. While the overall power was low, our results show that the cross-validation approach greatly outperformed the three-way split approach in detecting heterogeneity. This would motivate using cross-validation with MDR in studies where heterogeneity might be present. These results also emphasize the challenge of detecting heterogeneity models and the need for further methods development.

Background

An important problem in human genetics is the challenge of identifying polymorphisms that are associated with high disease risk. This task can be difficult because the underlying genetic models of many common human diseases, such as heart disease and Type II diabetes, are complex in their genetic etiology

To address these problems, a number of new approaches have been developed to try to detect interactions

MDR is a nonparametric procedure that reduces the dimensionality of the data by classifying each genotype as either high-risk or low-risk and then uses internal model validation, typically either five-fold or ten-fold cross-validation (CV), to select the best model

One drawback of MDR with CV is that it is computationally intensive because it performs an exhaustive search of all possible combinations of factors. Further, the use of m-fold CV for internal model validation requires that the MDR algorithm be executed m times for each possible combination, which adds to the computation time. To help reduce the required computation an alternative internal model validation method, the three-way split (3WS), has been incorporated into the MDR algorithm
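To give a sense of scale for the exhaustive search, the following minimal Python sketch counts how many candidate models are fit. The function name and the example values (20 loci, interactions up to order 2, five folds) are illustrative assumptions and are not taken from this study.

```python
from math import comb

def count_model_evaluations(n_loci, max_order, n_folds=1):
    """Number of times a candidate model is fit in an exhaustive search.

    Illustrative assumption: every combination of 1..max_order loci
    is fit once per cross-validation fold.
    """
    n_combinations = sum(comb(n_loci, order) for order in range(1, max_order + 1))
    return n_folds * n_combinations

# Hypothetical example: 20 loci, one- and two-locus models.
single_pass = count_model_evaluations(20, 2)            # 20 + 190 combinations
five_fold = count_model_evaluations(20, 2, n_folds=5)   # repeated for each fold
```

Under these assumptions the fold count multiplies the work directly, which is the cost that a single-pass validation scheme is meant to avoid.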

Another drawback of MDR is that it performs poorly in the presence of genetic heterogeneity

The purpose of the present study is to compare the effectiveness of MDR with CV to that of MDR with 3WS in situations wherein genetic heterogeneity is present. This is accomplished through simulating genetic data exhibiting heterogeneity and evaluating the success of the two internal model validation methods at identifying the correct underlying models. It is necessary to use simulated data because we must know the true underlying model in order to assess the accuracy of the predicted model and such information is not known with real data.

Methods

Multifactor Dimensionality Reduction (MDR)

MDR is a widely used data mining technique that performs an exhaustive search of all possible genes and combinations of genes to find the best model for a certain genetic trait. Consider a sample (n_{1} cases and n_{0} controls) for which the genotypes at K loci are known and for which the largest interaction is believed to involve k terms.

The first step in the MDR algorithm is to enumerate all possible combinations of k loci. For each combination of loci the number of cases and controls are counted for every possible combination of genotypes. For genes with two possible alleles each locus has three possible genotypes, so the data can be classified into 3^{k} genotypic combinations. We will refer to each such combination as a multifactor class. The ratio of cases to controls is calculated for each multifactor class using the sample data and this value is used to classify each multifactor class as either high-risk or low-risk. In the case of balanced data, meaning data with an equal number of cases and controls, the multifactor classes with a case-to-control ratio exceeding one are considered high-risk while those with a ratio below one are considered low-risk. In general the threshold is n_{1}/n_{0}. This high-risk/low-risk parameterization serves to reduce the high dimensionality of the data.
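The classification step described above can be sketched in Python as follows. This is a minimal illustration and not the implementation used in this study: the function name, the 0/1/2 genotype coding, and the handling of classes with no controls or with a ratio exactly equal to the threshold are assumptions.

```python
from collections import Counter

def classify_multifactor_classes(genotypes, status, loci):
    """Label each observed multifactor class as high-risk (True) or low-risk (False).

    genotypes: list of per-individual genotype tuples, coded 0/1/2 (assumed coding)
    status:    list of 1 (case) or 0 (control), in the same order
    loci:      indices of the k loci that form the candidate model
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # equals 1 for balanced data

    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[j] for j in loci)  # the multifactor class of this individual
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    high_risk = {}
    for cell in set(cases) | set(controls):
        if controls[cell] == 0:
            high_risk[cell] = True  # assumption: classes containing only cases are high-risk
        else:
            # assumption: a ratio exactly equal to the threshold is treated as low-risk
            high_risk[cell] = cases[cell] / controls[cell] > threshold
    return high_risk

# Toy data: two loci, three cases and three controls.
toy_genotypes = [(0, 1), (0, 1), (0, 1), (2, 2), (2, 2), (2, 2)]
toy_status = [1, 1, 0, 0, 0, 1]
toy_labels = classify_multifactor_classes(toy_genotypes, toy_status, loci=(0, 1))
```

In the toy data the class (0, 1) holds two cases and one control and is labeled high-risk, while (2, 2) holds one case and two controls and is labeled low-risk.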

After each multifactor class is categorized as high-risk or low-risk, the observed data are compared to the resulting model to determine what proportion of the observations are classified correctly. The goal is to find the model that minimizes the misclassification rate. When the sample does not include an equal number of cases and controls, balanced accuracy, the mean of sensitivity and specificity, is used
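As a small illustration of why balanced accuracy is preferred for unbalanced samples, the sketch below computes it as the mean of sensitivity and specificity. The function name and the toy data are assumptions made for this example.

```python
def balanced_accuracy(predicted_high_risk, status):
    """Mean of sensitivity and specificity.

    predicted_high_risk: list of booleans (True if the individual's class is high-risk)
    status:              list of 1 (case) or 0 (control)
    """
    pairs = list(zip(predicted_high_risk, status))
    tp = sum(1 for p, y in pairs if p and y == 1)
    fn = sum(1 for p, y in pairs if not p and y == 1)
    tn = sum(1 for p, y in pairs if not p and y == 0)
    fp = sum(1 for p, y in pairs if p and y == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Hypothetical unbalanced sample: 2 cases and 6 controls.
toy_status = [1, 1, 0, 0, 0, 0, 0, 0]
toy_prediction = [True, False, False, False, False, False, False, True]
toy_ba = balanced_accuracy(toy_prediction, toy_status)
toy_plain = sum(p == bool(y) for p, y in zip(toy_prediction, toy_status)) / len(toy_status)
```

Here plain accuracy is 0.75 because the controls dominate, whereas balanced accuracy is about 0.67 because only half of the cases are classified correctly.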

**Workflow of the MDR process.**

Cross-validation (CV)

CV is the internal model validation method most commonly used with MDR. Before running the MDR algorithm on any data the full dataset is split into m equal intervals. One of these intervals is considered the testing set while the other m-1 intervals make up the training set. MDR is run on the training data for each of the m possible splits of the data. That is, for each possible combination of k loci MDR is run m times with a different interval being excluded from the analysis each time. After the high-risk and low-risk categories are determined using the training set, the predictive capability of the resulting model is determined using the testing set. For each split of the data and each size of interaction the model that maximizes the prediction accuracy, meaning the one that minimizes the misclassification rate for the testing data, is considered the best model for that size interaction. This process is illustrated in Figure

**Workflow of the five-fold cross-validation process.**

The number of times that a particular model is identified as the best model across the m subsets of the data is known as the cross-validation consistency. The model chosen as the best overall model is the one that has both the highest prediction accuracy and the highest cross-validation consistency. If the model that maximizes prediction accuracy differs from the model that maximizes cross-validation consistency, then the more parsimonious model is chosen.
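The selection rule can be sketched as follows. This is an illustrative reading of the rule and not the code of any MDR package: the function names, the representation of a model as a tuple of locus indices, and the fallback when the two candidate models have the same size are assumptions.

```python
from collections import Counter

def cv_consistency(best_per_fold):
    """Count how often each model was the best model across the m folds."""
    return Counter(tuple(sorted(model)) for model in best_per_fold)

def select_final_model(candidates):
    """Pick the overall best model.

    candidates: {model (tuple of loci): (prediction_accuracy, cv_consistency)}
    """
    by_accuracy = max(candidates, key=lambda m: candidates[m][0])
    by_consistency = max(candidates, key=lambda m: candidates[m][1])
    if by_accuracy == by_consistency:
        return by_accuracy
    # The two criteria disagree: choose the more parsimonious model (fewer loci).
    # Assumption: if both have the same size, keep the more accurate one.
    return min((by_accuracy, by_consistency), key=len)

# Hypothetical five-fold result: locus 3 alone wins four folds, loci (3, 7) win one.
toy_consistency = cv_consistency([(3,), (3,), (7, 3), (3,), (3,)])
toy_choice = select_final_model({(3,): (0.61, 4), (3, 7): (0.63, 3)})
```

In the toy example the two-locus model has slightly higher prediction accuracy, but the one-locus model has higher consistency, so the smaller model is returned.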

Three-way split (3WS)

3WS is an internal model validation method that has only recently been implemented with MDR. For this procedure, the full dataset is randomly split into three parts: a training set to build initial models, a testing set to narrow the list of potential models, and a validation set to choose the best model and assess its predictive capability. It has been shown that the proportion of the data included in each split does not make a major difference in the resulting model, but the optimal split, and the one we use, is a 2:2:1 ratio

When MDR is performed on the training set all possible combinations of loci for each size combination up to size k are considered. The top x models for each size are chosen based on balanced accuracy and these models move on to the testing set. The value of x is arbitrary and is chosen by the user. A common practice, and the one we use in our analysis, is to set x equal to K, the total number of loci being considered. This “rule of thumb” was proposed based on the results of a parameter sweep comparing the performance of MDR with different splits of the data
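The 2:2:1 split and the top-x filter can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names, the fixed random seed, the reading of the ratio as training:testing:validation, and the rounding of set sizes are choices made for this example.

```python
import random

def three_way_split(n_samples, ratio=(2, 2, 1), seed=0):
    """Randomly partition sample indices into training, testing, and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratio)
    n_train = n_samples * ratio[0] // total
    n_test = n_samples * ratio[1] // total
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # remaining individuals
    return train, test, validation

def top_models(scores, x):
    """Keep the x models with the highest balanced accuracy.

    scores: {model (tuple of loci): balanced accuracy on the training set}
    """
    return sorted(scores, key=scores.get, reverse=True)[:x]

# Hypothetical example: 1000 individuals and three one-locus models.
toy_train, toy_test, toy_validation = three_way_split(1000)
toy_top = top_models({(1,): 0.55, (2,): 0.60, (3,): 0.52}, x=2)
```

With 1000 individuals this yields sets of 400, 400, and 200, and only the two best-scoring models are carried forward to the testing set.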

**Workflow of the three-way split approach.**

Data simulation

To determine if MDR with 3WS can better detect genetic heterogeneity than MDR with CV we performed a simulation-based study so that we could calculate the empirical power for both methods (since theoretical power calculations are not possible with MDR). Factors of interest considered were the number of loci in the true disease model, the structure of the true model, the odds ratio, and the level of heterogeneity. In particular, genetic heterogeneity models consisting of two one-locus models or two two-locus models were simulated. The one-locus models involved additive or recessive effects while the two-locus models followed an XOR model, which is an epistatic model that has been previously discussed in the literature. The models were simulated with a heritability (h^{2}) of .05, which is a very low genetic signal compared to many genetic diseases. The penetrance tables for the models simulated are shown in Figure

**Penetrance tables for the models simulated.**

We simulated a total of 21 genetic heterogeneity models. Table

| Simulation | Disease loci | Model type | Level of heterogeneity | First model: odds ratio | First model: contribution | Second model: odds ratio | Second model: contribution |
|---|---|---|---|---|---|---|---|
| 1 | 2 | additive | 25/75 | 1.5 | 25% | 1.5 | 75% |
| 2 | 2 | additive | 25/75 | 2 | 25% | 2 | 75% |
| 3 | 2 | additive | 25/75 | 1.5 | 25% | 2 | 75% |
| 4 | 2 | additive | 25/75 | 2 | 25% | 1.5 | 75% |
| 5 | 2 | additive | 50/50 | 1.5 | 50% | 1.5 | 50% |
| 6 | 2 | additive | 50/50 | 2 | 50% | 2 | 50% |
| 7 | 2 | additive | 50/50 | 1.5 | 50% | 2 | 50% |
| 8 | 2 | recessive | 25/75 | 1.5 | 25% | 1.5 | 75% |
| 9 | 2 | recessive | 25/75 | 2 | 25% | 2 | 75% |
| 10 | 2 | recessive | 25/75 | 1.5 | 25% | 2 | 75% |
| 11 | 2 | recessive | 25/75 | 2 | 25% | 1.5 | 75% |
| 12 | 2 | recessive | 50/50 | 1.5 | 50% | 1.5 | 50% |
| 13 | 2 | recessive | 50/50 | 2 | 50% | 2 | 50% |
| 14 | 2 | recessive | 50/50 | 1.5 | 50% | 2 | 50% |
| 15 | 4 | XOR | 25/75 | 1.5 | 25% | 1.5 | 75% |
| 16 | 4 | XOR | 25/75 | 2 | 25% | 2 | 75% |
| 17 | 4 | XOR | 25/75 | 1.5 | 25% | 2 | 75% |
| 18 | 4 | XOR | 25/75 | 2 | 25% | 1.5 | 75% |
| 19 | 4 | XOR | 50/50 | 1.5 | 50% | 1.5 | 50% |
| 20 | 4 | XOR | 50/50 | 2 | 50% | 2 | 50% |
| 21 | 4 | XOR | 50/50 | 1.5 | 50% | 2 | 50% |

Analysis

All 100 datasets for each of the 21 simulations were analyzed using MDR with five-fold CV and MDR with 3WS. This was done using the MDR package available for the statistical software R

We collected the output from these MDR procedures to assess the accuracy of the final models. Power was calculated as the percentage of times out of the 100 datasets for each simulation that the final model met some specified criterion. We initially computed a conservative estimate of power for which this criterion was that the final predicted model included all of the true disease loci and no false positive loci. It was immediately apparent that both methods did a poor job finding the entire correct model. We therefore defined several more liberal types of power to assess how often each method found at least one of the two models included in the heterogeneity model. For the power labeled mod1 a trial was considered a success if at least the locus or loci of the first of the two models contributing to the heterogeneity model was included in the final predicted model. For the power labeled onlymod1 the requirement was that the final predicted model be exactly the first of the two simulated models contributing to the overall model with no additional loci included. The power definitions mod2 and onlymod2 are analogous to mod1 and onlymod1, but for the second of the two models. We also defined a power, labeled nofalse, that considered a trial a success if the predicted model included any number of correct loci and no false positive loci.
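The power definitions above can be expressed compactly with set comparisons. The following Python sketch is an illustration of those definitions and not the code used in the analysis: the function names, the integer labels for loci, and the requirement that nofalse needs a non-empty predicted model are assumptions.

```python
def power_outcomes(predicted, model1, model2):
    """Evaluate each success criterion for one dataset's final predicted model."""
    predicted, model1, model2 = set(predicted), set(model1), set(model2)
    true_loci = model1 | model2
    return {
        "conservative": predicted == true_loci,   # all true loci, no false positives
        "mod1": model1 <= predicted,              # first model found, extras allowed
        "onlymod1": predicted == model1,          # exactly the first model
        "mod2": model2 <= predicted,              # second model found, extras allowed
        "onlymod2": predicted == model2,          # exactly the second model
        # assumption: at least one locus must be predicted for nofalse to count
        "nofalse": bool(predicted) and predicted <= true_loci,
    }

def empirical_power(predictions, model1, model2):
    """Percentage of datasets whose final model meets each criterion."""
    outcomes = [power_outcomes(p, model1, model2) for p in predictions]
    return {
        name: 100 * sum(o[name] for o in outcomes) / len(outcomes)
        for name in outcomes[0]
    }

# Hypothetical two-locus heterogeneity model: locus 1 (first model), locus 2 (second model).
toy_predictions = [{2}, {2, 5}, {1, 2}, {7}]
toy_power = empirical_power(toy_predictions, model1={1}, model2={2})
```

In this toy set of four predicted models, the second model is recovered in three of them (mod2 = 75), but only one prediction is exactly the second model (onlymod2 = 25).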

Differences in performance between the two internal model validation methods were tested using an analysis of variance (ANOVA), implemented in SAS v9.2

Results and discussion

MDR was rarely able to detect the true disease model for both the two-locus and four-locus heterogeneity models regardless of whether it was implemented with 3WS or five-fold CV. This is an expected result given previous studies that have examined the power of MDR to detect heterogeneity

**Power results for both the three-way split (3WS) and cross-validation (CV) implementations of MDR where power is defined as the percentage of times that both models were identified (with no false positive and no false negative loci).**

Since the conservative power estimates did not provide much information as to which internal model validation method has better performance, 3WS and CV were compared using more liberal estimates of power. These alternative forms of power will be referred to as mod1, mod2, onlymod1, onlymod2, and nofalse. The criterion for mod1 and mod2 was that the final predicted model include all of the true disease loci in either the first or second of the two models contributing to the overall heterogeneity model. This is not as stringent as the conservative power, which required all the true disease loci from both of the contributing models to be included in the final predicted model. By relaxing the requirement for the method to be considered a success we saw an improvement in performance and the emergence of differences between the two methods. This approach is similar to previous studies that considered heterogeneity

**Power results for both the three-way split (3WS) and cross-validation (CV) implementations of MDR where power is defined as the percentage of times that one of the underlying models was identified (with no false negative loci but allowing false positive loci).** The results for Model 1 are shown in **A**, and the results for Model 2 are shown in **B**.

Two more stringent definitions of power that accounted for the inclusion of false positives in the final predicted model, onlymod1 and onlymod2, saw similar improvements in power (when compared to conservative power) for MDR implemented with CV. These definitions of power required that exactly one of the two contributing models be identified with no additional loci included in the final predicted model. For MDR implemented with CV there was a drastic improvement in terms of finding the second model and a minor improvement in terms of detecting the first model. However, MDR implemented with 3WS had very little success detecting either model. Figure

**Power results for both the three-way split (3WS) and cross-validation (CV) implementations of MDR where power is defined as the percentage of times that one of the underlying models was identified (with no false positive and no false negative loci).** The results for Model 1 are shown in **A**, and the results for Model 2 are shown in **B**.

In fact, MDR implemented with 3WS tends to choose a larger final model than MDR implemented with CV. For the two-locus heterogeneity models the mean size of the final predicted model was 1.89 for 3WS (mode of 2) and 1.21 for CV (mode of 1). For the four-locus heterogeneity models the mean size of the final predicted model was 3.99 for 3WS (mode of 4) and 1.84 for CV (mode of 2). Since 3WS tended to choose the largest possible final model, it had poorer power in terms of producing final models that included exactly one of the two contributing models with no additional loci. It also had a tendency to produce fewer models with no false positives (power labeled nofalse). For all 21 simulations, MDR implemented with CV produced a final predicted model with no false positives in at least six more of the 100 datasets than MDR implemented with 3WS did. In most cases the difference between the two methods was much greater with the disparity between number of datasets yielding a predicted model with no false positives getting as high as 91. This is illustrated in Figure

**Power results for both the three-way split (3WS) and cross-validation (CV) implementations of MDR where power is defined as the percentage of times that any of the correct loci were identified (allowing false negative loci but not false positive loci).**

For both the two-locus and four-locus heterogeneity models, MDR implemented with CV tended to outperform MDR implemented with 3WS based on the more liberal definitions of power. Statistical significance (at α = .05) was achieved for mod2 (p-value = .0056), onlymod1 (p-value = .0012), onlymod2 (p-value < .0001), and nofalse (p-value < .0001). The greatest differences in performance were seen with onlymod2 and nofalse, where CV had extremely high power while 3WS had minimal power. The only liberal definition of power that did not show a significant difference was mod1. This lack of significance resulted more from the poor performance of MDR implemented with CV than from the strong performance of MDR implemented with 3WS. Many of the models that needed to be identified to be considered a success for this type of power contributed only 25% to the overall heterogeneity model, so they were extremely hard to detect. While the performance of MDR implemented with CV was about the same for mod1 as for onlymod1, there was a significant difference between CV and 3WS based on onlymod1 because MDR implemented with 3WS almost never identified the first model without including any additional loci.

The results of the ANOVA used to evaluate the simulation experiment are shown in Table

| Effect | Conservative | mod1 | onlymod1 | mod2 | onlymod2 | nofalse |
|---|---|---|---|---|---|---|
| internal model validation method | 0.1637 | 0.5136 | 0.0012 | 0.0056 | <.0001 | <.0001 |
| level of heterogeneity | 0.2482 | <.0001 | 0.0003 | 0.0005 | 0.001 | 0.0733 |
| model type | 0.0006 | 0.0147 | 0.1672 | 0.0004 | 0.0109 | 0.155 |
| odds ratio (OR) | 0.0444 | 0.2025 | 0.7075 | 0.18 | 0.3708 | 0.0003 |

In terms of computing time, MDR implemented with 3WS was approximately five times faster than MDR implemented with CV. This is consistent with results published by Winham et al.

Conclusion

While MDR implemented with CV has been effective at detecting disease models exhibiting epistasis, it has been shown to have a dramatic decrease in power in the presence of genetic heterogeneity

Both 3WS and CV perform extremely poorly in terms of detecting the full heterogeneity model. Neither method did significantly better than the other in this respect, but neither performed well enough to have any practical utility. Looking at more liberal definitions of power, for which it was considered a success if MDR detected one of the two models contributing to the overall genetic heterogeneity model, differences in performance arise. In particular, MDR implemented with CV is significantly better at detecting models that contribute at least 50% to the overall genetic heterogeneity model. There is not, however, a significant difference in the ability of the two methods to detect models that contribute at most 50% to the overall model. This can be attributed primarily to the extremely poor performance of both methods in regard to detecting the less prevalent model.

When the inclusion of false positives into the model predicted by MDR was considered, it was found that MDR implemented with CV is far better than MDR implemented with 3WS at finding exactly one of the two models contributing to the overall genetic heterogeneity model without including any additional loci. The average final model size for MDR implemented with 3WS was about twice that of MDR implemented with CV. This was expected based on previous findings

Ultimately, MDR does not appear to be able to effectively detect models exhibiting genetic heterogeneity regardless of the internal model validation method used. Therefore, some other approach must be developed to find this type of model. Ritchie et al.

Abbreviations

MDR: Multifactor dimensionality reduction; CV: Cross-validation; 3WS: three-way split.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JG and HS performed the analysis, helped design the study, and drafted the manuscript. DMR and AMR helped design the study and edited the manuscript. All the authors approved the final version of the manuscript.

Acknowledgements

This project was supported by NSF-CSUMS project DMS-0703392 (PI: Sujit Ghosh).