Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33146, USA

Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA

Abstract

Background

Complex binary traits are influenced by many factors, including the main effects of many quantitative trait loci (QTLs), epistatic effects involving more than one QTL, environmental effects and the effects of gene-environment interactions. Although a number of QTL mapping methods for binary traits have been developed, an efficient and powerful method that can handle both main and epistatic effects of a relatively large number of possible QTLs is still lacking.

Results

In this paper, we use a Bayesian logistic regression model as the QTL model for binary traits that includes both main and epistatic effects. Our logistic regression model employs hierarchical priors for the regression coefficients similar to the ones used in the Bayesian LASSO linear model for multiple QTL mapping for continuous traits. We develop efficient empirical Bayesian algorithms to infer the logistic regression model. Our simulation study shows that our algorithms can easily handle a QTL model with a large number of main and epistatic effects on a personal computer, and outperform the five other methods examined (the LASSO, HyperLasso, BhGLM, RVM and the single-QTL mapping method based on logistic regression) in terms of power of detection and false positive rate. The utility of our algorithms is also demonstrated through analysis of a real data set. A software package implementing the empirical Bayesian algorithms in this paper is freely available upon request.

Conclusions

The EBLASSO logistic regression method can handle a large number of effects, possibly including main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple-QTL mapping of complex binary traits.

Background

Quantitative traits are usually influenced by multiple quantitative trait loci (QTLs), environmental factors and their interactions.

A number of statistical methods have been developed to identify QTLs for binary traits in experimental crosses. Single-QTL mapping methods

Since hundreds or thousands of genomic loci or markers are usually genotyped and involved in QTL mapping studies, including all these markers and their possible interactions in a single model for multiple-QTL mapping leads to a huge number of model variables, typically much larger than the sample size. This not only entails a huge computational burden beyond the reach of the existing QTL mapping methods mentioned earlier, but also may reduce the power of detection and/or increase the false discovery rate. Two techniques have been proposed to handle the second problem: variable or model selection and shrinkage. More specifically, model selection using the Akaike or Bayesian information criterion (AIC or BIC) and variable selection based on stepwise logistic regression and the BIC have been proposed in

Recently, we developed an efficient empirical Bayesian LASSO (EBLASSO) algorithm for multiple-QTL mapping for continuous traits, which is capable of handling a large number of markers and their interactions simultaneously

Methods

Logistic regression model

Let y_{i} = 0 or 1 denote the binary trait of the ith individual, i = 1, 2, ⋯, n, and let **y** = [y_{1}, y_{2}, ⋯, y_{n}]^{T}. With m markers genotyped, the logistic regression model including both main and epistatic effects is:

logit(p_{i}) = β_{0} + Σ_{j=1}^{m} x_{ij}β_{j} + Σ_{j<j′} x_{ij}x_{ij′}β_{jj′},  (1)

where β_{j} and β_{jj′} are regression coefficients, and logit(p_{i}) is defined as:

logit(p_{i}) = log[p_{i}/(1 − p_{i})], with p_{i} = Pr(y_{i} = 1).  (2)

The widely adopted Cockerham genetic model specifies the value of the dummy variable x_{ij} for the possible genotypes at marker j. For an F_{2} design, there are two possible main effects, named additive and dominance effects. The Cockerham model defines the values of the additive effect as −1, 0 and 1 for the three genotypes and the values of the dominance effect as −0.5 and 0.5 for homozygotes and heterozygotes, respectively. For simplicity, we only consider additive effects in (1), although the methods developed in this paper are also applicable to the model with dominance effects.

Let us define **x**_{Gi} = [x_{i1}, x_{i2}, ⋯, x_{im}]^{T} and let **β**_{G} be the vector of the corresponding main-effect regression coefficients; let **x**_{GGi} be a vector collecting the pairwise products x_{ij}x_{ij′}, j < j′, and **β**_{GG} be a vector consisting of the corresponding regression coefficients. We further define **x**_{i} = [1, **x**_{Gi}^{T}, **x**_{GGi}^{T}]^{T} and **β** = [β_{0}, **β**_{G}^{T}, **β**_{GG}^{T}]^{T}, so that (1) can be written compactly as logit(p_{i}) = **x**_{i}^{T}**β**.  (3)

From (3), we can express p_{i} as follows:

p_{i} = exp(**x**_{i}^{T}**β**) / [1 + exp(**x**_{i}^{T}**β**)].  (4)

Note that there are many more candidate effects in **β** than samples, but typically only a small number of entries of **β** are nonzero, and thus we have a sparse model. We will exploit this sparsity to develop efficient methods to infer **β**.
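To make the construction of **x**_{i} concrete, the sketch below (our illustration, not the authors' released software; the function names are ours) builds the design matrix with an intercept column, the main-effect columns and all pairwise epistatic columns, then evaluates p_{i} = exp(**x**_{i}^{T}**β**)/[1 + exp(**x**_{i}^{T}**β**)]:

```python
import numpy as np

def build_design(X):
    """Build x_i = [1, main effects, pairwise epistatic products].

    X: (n, m) matrix of genotype codes (e.g. 1, 0, -1 under the additive
    Cockerham coding). Returns an (n, 1 + m + m(m-1)/2) design matrix.
    """
    n, m = X.shape
    cols = [np.ones((n, 1)), X]
    for j in range(m - 1):
        # epistatic terms x_ij * x_ij' for all j' > j
        cols.append(X[:, j:j + 1] * X[:, j + 1:])
    return np.hstack(cols)

def logistic_prob(D, beta):
    """p_i = exp(x_i^T beta) / (1 + exp(x_i^T beta))."""
    return 1.0 / (1.0 + np.exp(-(D @ beta)))
```

With m markers, the design has 1 + m + m(m−1)/2 columns, which is why the number of candidate effects grows quadratically with the number of markers.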

Prior distributions for the regression coefficients

We assign a noninformative uniform prior to β_{0}, i.e., p(β_{0}) ∝ 1. For each remaining coefficient β_{i}, we use a three-level hierarchical prior. At the first level, β_{i} follows a normal distribution with zero mean and variance σ_{i}^{2}: β_{i} ∼ N(0, σ_{i}^{2}). At the second level, σ_{i}^{2} follows an exponential distribution, p(σ_{i}^{2}) = Exp(σ_{i}^{2}; λ), with parameter λ; at the third level, a gamma distribution is placed on this parameter, yielding the normal-exponential-gamma (NEG) prior.

The three-level hierarchical model has two hyperparameters, (a, b), whose values need to be chosen appropriately; their selection is discussed below.
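To see why this hierarchical prior promotes sparsity, one can sample from it. The parameterization below (σ_{i}^{2} exponential with a rate drawn from a gamma distribution; the values a = 5, b = 1 are purely illustrative assumptions, not the paper's settings) produces a marginal prior on β_{i} that is sharply peaked at zero with heavy tails, shrinking most coefficients to zero while leaving large QTL effects lightly penalized:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed illustrative parameterization of the three levels:
# rate ~ Gamma(a, b), sigma_i^2 ~ Exp(rate), beta_i ~ N(0, sigma_i^2).
a, b = 5.0, 1.0
rate = rng.gamma(shape=a, scale=b, size=n)
sigma2 = rng.exponential(scale=1.0 / rate)
beta = rng.normal(0.0, np.sqrt(sigma2))

# A Gaussian has excess kurtosis 0; the sampled marginal prior is far
# heavier-tailed, which is the source of the sparsity-inducing shrinkage.
excess_kurtosis = np.mean(beta**4) / np.var(beta)**2 - 3.0
```

For these illustrative values the theoretical excess kurtosis is 5, versus 0 for a normal prior of the same variance.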

Empirical Bayesian algorithm for the BLASSO-NEG model (EBLASSO-NEG)

Using the EB approach, we first estimate the variances σ_{i}^{2}, and then find the posterior distribution of **β** based on the estimated variances.

Let us write the likelihood of the observed traits as p(**y** | **β**) = ∏_{i=1}^{n} p_{i}^{y_{i}}(1 − p_{i})^{1−y_{i}}.  (5)

where the joint posterior in (6) is proportional to the likelihood multiplied by the hierarchical prior. We then integrate out **β** in (6) to obtain the marginal posterior distribution of the variances.

Let α_{i} = 1/σ_{i}^{2}, and let **A** be a diagonal matrix with **α** on its diagonal. Suppose that in the current iteration we have an estimate **Â** of **A**.

The gradient **g** and the Hessian matrix **H** of the log posterior of **β** are given by:

where **β**_{MAP} is obtained with the Newton–Raphson method.

If we postulate a linear model whose MAP estimate coincides with **β**_{MAP}, we can approximate the posterior distribution of **β** in this linear model as follows. The linear model implies that the posterior distribution of **β** is approximately normal, with mean **β**_{MAP} and covariance −**H**^{−1} (Laplace's method).

Therefore, in each iteration we obtain the following equivalent linear model:

where **ŷ** is the pseudo-response of the equivalent linear model, and the distribution of its noise follows from the Laplace approximation.

Algorithm 1 (EBLASSO-NEG)

1. Initialization: choose values for the hyperparameters. The first variable, say x_{j}, is identified by its marginal association with **y**; initialize β_{0} = logit(p_{0}), with p_{0} being the proportion of y_{i} = 1 in the dataset. Compute the corresponding α_{j}.

2. While the convergence criteria are not satisfied

3. Given **Â**, use the Newton–Raphson method to find **β**_{MAP}.

4. Calculate the equivalent linear model (in particular, **ŷ**).

5. Apply the EBLASSO algorithm to the equivalent linear model to update **Â**.

6. End while

7. Output **β**_{MAP} and its posterior covariance **Σ**.

Note that the algorithm starts with a logistic regression model containing only one variable and then iteratively adds variables with a finite α_{i} to the model. The number of variables in the model, N_{m}, is typically much smaller than the total number of possible variables, which keeps the computation of the gradient **g** and the Hessian matrix **H** inexpensive.
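Step 3 of Algorithm 1 can be sketched as follows (our illustration, not the released software, written in the standard penalized-logistic form: with **A** = diag(**α**), the log posterior has gradient Xᵀ(y − p) − Aβ and Hessian −(XᵀWX + A), and Newton–Raphson climbs to **β**_{MAP}):

```python
import numpy as np

def map_logistic(X, y, alpha, n_iter=50, tol=1e-8):
    """Newton-Raphson MAP estimate of beta under beta ~ N(0, A^{-1}),
    A = diag(alpha). Notation is assumed for illustration.

    X: (n, k) design matrix (first column of ones for the intercept);
    y: (n,) binary responses; alpha: (k,) prior precisions (use ~0 for
    the intercept to mimic its flat prior).
    """
    k = X.shape[1]
    beta = np.zeros(k)
    A = np.diag(alpha)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        g = X.T @ (y - p) - alpha * beta        # gradient of log posterior
        W = p * (1.0 - p)                       # logistic weights
        H = -(X.T * W) @ X - A                  # Hessian of log posterior
        step = np.linalg.solve(H, g)            # Newton step: beta - H^{-1} g
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Because only the variables currently in the model enter X, the k × k Hessian stays small even when the number of candidate effects is huge.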

The convergence criteria in Algorithm 1 are defined as: 1) no effect can be added to or deleted from the model; 2) the likelihood change between two consecutive iterations is less than a pre-specified small value; and 3) the total change of **α** between two consecutive iterations is less than a pre-specified small value. In step 5, the EBLASSO algorithm is applied to the equivalent linear model.

The values of the hyperparameters (a, b) are determined by cross validation, with **y** as the response variable.

Suppose that
_{n} entries. Then the posterior distribution of the corresponding _{n} × 1 regression coefficient
**Σ**. For the

We next derive an efficient EB algorithm for the BLASSO-NE model, which simplifies the hyperparameter selection, since we only need to determine the optimal value of a single hyperparameter λ.

Empirical Bayesian algorithm for the BLASSO-NE model (EBLASSO-NE)

The prior distribution of σ_{i}^{2} is an exponential distribution with parameter λ, which leads to a new formula for updating **A** in each iteration. The EBLASSO-NE algorithm then uses the same steps as the EBLASSO-NEG, except that the new formula is used in step 5 to find **Â**.

Suppose that the postulated linear model after step 4 is available; then **α** can be found as:

where **ŷ** is the response variable of the equivalent linear model. Each α_{i} is estimated iteratively by maximizing the log marginal posterior distribution ℓ(**α**) with the other parameters fixed. Specifically,

where
ℓ(**α**) has a unique global maximum, and the optimal α_{i} is given in closed form by equation (12).

where
_{i}^{2} + 8_{i}^{2}.

**Additional file: Derivation of equation (12); replicates 2 and 3 for the simulations with only main effects.**


The EBLASSO-NE algorithm has the same steps as those in Algorithm 1 but with the following two modifications. First, we need to choose a value for the hyperparameter λ; we set the maximum value λ_{max} = 1.5λ_{lasso}, based on our simulations showing that this maximum value usually gives only one nonzero regression coefficient. Second, when applying the EBLASSO algorithm in step 5 of Algorithm 1, equation (12) is used instead of equation (8) to update **A**.
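The closed-form update (12) itself is not reproduced here, but its role can be illustrated numerically. In the sketch below (our illustration: the per-coefficient criterion is written in an assumed sparse-Bayesian form with statistics s_i and q_i, plus an exponential-prior term −λ/α_i; this is not the paper's exact expression), strong evidence (q_i² well above s_i) yields a finite α_i, so the effect stays in the model, while weak evidence drives α_i toward infinity, pruning the effect:

```python
import numpy as np

def objective(alpha_i, s_i, q_i, lam):
    """Assumed per-coefficient criterion: sparse-Bayesian marginal
    log-likelihood plus the log exponential prior on sigma_i^2 = 1/alpha_i."""
    ll = 0.5 * (np.log(alpha_i) - np.log(alpha_i + s_i)
                + q_i**2 / (alpha_i + s_i))
    return ll - lam / alpha_i

def best_alpha(s_i, q_i, lam, grid=None):
    """Grid-search stand-in for a closed-form update such as (12)."""
    if grid is None:
        grid = np.logspace(-3, 6, 2000)
    return grid[np.argmax(objective(grid, s_i, q_i, lam))]
```

For example, with s_i = 10 strong evidence q_i = 10 gives a small finite α_i (a large prior variance, effect retained), whereas q_i = 0.5 pushes α_i to the top of the grid (effect pruned).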

Note that since the EBLASSO-NEG model for logistic regression considered in this paper uses the same prior as the one used by the EBLASSO for linear regression considered in

Results

Simulation study

A total of 481 evenly spaced markers with distance d = 5 cM between adjacent markers were simulated; the correlation between the genotypes of two adjacent markers was R = e^{−2d} = 0.9048 (with d in Morgans), since the Haldane map function was assumed. The markers were simulated for an F_{2} family derived from the cross of two inbred lines. The dummy variable for the three genotypes, A_{1}A_{1}, A_{1}A_{2} and A_{2}A_{2}, of individual i at marker j was defined as x_{ij} = 1, 0, −1, respectively. Two simulation setups were employed. In the first setup, 20 markers were QTLs with main effects but without interactions. In the second setup, 10 main and 10 epistatic effects were simulated; a marker could have both main and epistatic effects, while the two markers involved in an interaction effect did not necessarily have main effects. The QTLs were selected randomly with varying distances (5 cM - 500 cM) and effect sizes (in the range between −1.28 and 2.19). Note that QTLs were assumed to coincide with markers in both simulation setups. If QTLs are not on markers, they may still be detected, since the correlation between a QTL and a nearby marker is high, although a slightly larger sample size may be needed to achieve the same power of detection.
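The genotype simulation just described can be sketched as follows (our illustration: each F_{2} individual is formed from two independent gamete haplotypes, each simulated as a Markov chain whose allele flips between adjacent markers with recombination fraction r = (1 − e^{−2d})/2 under the Haldane map function):

```python
import numpy as np

def simulate_f2_genotypes(n, n_markers, d_morgan=0.05, seed=1):
    """Simulate F2 genotypes at evenly spaced markers.

    Genotypes A1A1, A1A2, A2A2 are coded 1, 0, -1 as in the additive
    Cockerham model; adjacent markers then have correlation e^{-2d}.
    """
    rng = np.random.default_rng(seed)
    r = 0.5 * (1.0 - np.exp(-2.0 * d_morgan))   # recombination fraction

    def haplotype():
        h = np.empty((n, n_markers), dtype=int)
        h[:, 0] = rng.integers(0, 2, size=n)
        for j in range(1, n_markers):
            flip = rng.random(n) < r
            h[:, j] = np.where(flip, 1 - h[:, j - 1], h[:, j - 1])
        return h

    # genotype code = (# of A1 alleles) - 1, i.e. 1, 0 or -1
    return haplotype() + haplotype() - 1
```

With d = 5 cM = 0.05 Morgans this reproduces the stated adjacent-marker correlation R = e^{−0.1} ≈ 0.9048.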

The EBLASSO-NEG and EBLASSO-NE algorithms were implemented in C and can be called from the R environment.

Simulation result for the dataset with only main effects

The genotypes of n = 1000 individuals were simulated as described above. Let **x**_{i} be a 20 × 1 vector containing the genotypes of the 20 QTLs of individual i, and let **β** contain the corresponding effect sizes. Then the probability p_{i} was computed from the logistic model and used to generate the binary trait y_{i}.

| **Locus** | **True β** | **EBLASSO-NE**^{a} | **EBLASSO-NEG** | **LASSO** | **HyperLasso** | **BhGLM** | **RVM** | **Single QTL** |
|---|---|---|---|---|---|---|---|---|
| 11 | 1.99 | 0.76(0.22) | 1.61(0.22) | 0.51(0.73) | 1.67(0.30) | 1.50(0.27) | 3.87(0.60) | 0.67(0.13) |
| 26 | 1.81 | 0.54(0.19) | 1.23(0.21) | − | 0.93(0.38) | 1.07(0.28) | 1.53(0.47)^{b} | 0.73(0.13) |
| 42 | −1.28 | −0.34(0.17) | −0.72(0.21)^{b} | − | −1.02(0.29) | −0.95(0.24) | − | − |
| 48 | −0.91 | −0.40(0.19) | −0.82(0.21) | − | −1.12(0.28)^{b} | −0.91(0.23) | − | − |
| 72 | 1.28 | − | − | − | − | − | − | 0.77(0.14) |
| 73 | 1.81 | 1.37(0.21) | 1.91(0.24) | 1.03(0.85) | 2.37(0.32) | 2.16(0.28) | 4.73(0.59) | 0.80(0.14) |
| 123 | 0.63 | − | − | − | − | − | − | − |
| 127 | −0.63 | − | − | − | − | − | − | − |
| 161 | 0.44 | 0.30(0.15)^{b} | 0.59(0.19)^{b} | − | 0.82(0.25)^{b} | − | 1.15(0.35) | 0.57(0.13) |
| 181 | 0.99 | 0.38(0.20)^{b} | − | − | − | − | − | 1.67(0.18) |
| 182 | 2.19 | 1.60(0.29) | 2.86(0.31) | 1.26(0.86) | 3.34(0.38) | −2.73(0.37) | 5.01(0.72) | 1.89(0.19) |
| 185 | 1.29 | 0.36(0.17)^{b} | 0.56(0.19)^{b} | 0.27(0.43)^{b} | 1.00(0.27)^{b} | 0.73(0.32) | 2.38(0.69)^{b} | 1.44(0.16) |
| 221 | −0.75 | − | −0.36(0.16)^{b} | − | − | − | − | − |
| 243 | −0.57 | −0.34(0.15) | −0.41(0.16) | −0.26(0.33) | −0.75(0.24) | −0.69(0.20) | −1.74(0.45) | − |
| 262 | −1.28 | − | − | − | − | − | − | − |
| 268 | 0.91 | − | − | − | − | − | 2.90(0.62) | − |
| 270 | 0.57 | − | − | − | − | − | − | − |
| 274 | −0.99 | − | − | − | − | − | −1.90(0.46)^{b} | − |
| 361 | 0.41 | 0.30(0.16)^{b} | 0.40(0.16)^{b} | 0.15(0.56)^{b} | 0.77(0.24)^{b} | −0.72(0.21)^{b} | 1.80(0.40)^{b} | − |
| 461 | 0.51 | − | − | − | − | − | − | − |
| Parameter(s) | | | | | | | | |
| CPU time (s) | | 25.56 | 1.31 | 1.67 | 1.90 | 20.64 | 54.70 | 8.84 |
| True/false positive | | 11/2^{c} | 11/1^{c} | 6/4^{c} | 10/1^{c} | 9/0^{c} | 17/18^{c} | 8/25^{d} |

^{a}The estimated marker effect is shown with its standard error in parentheses.

^{b}The estimated marker effect was obtained from a neighboring marker (≤ 20 cM) rather than from the marker with the true effect.

^{c}Number of true/false positive effects detected.

^{d}Number of effects with a p-value < 10^{-4} after Bonferroni correction was applied.

The average log likelihood (denoted as L̄) obtained from ten-fold cross validation was used to select the hyperparameters.

For the EBLASSO-NE, we first calculated λ_{max} as described earlier. We then chose a set of values for λ, decreasing from λ_{max} to 0.001 at a step of 0.35 on the logarithmic scale. Ten-fold cross validation using this set of values identified an optimal value denoted as λ_{1}. We next zoomed in on the interval of length 0.01 centered at λ_{1}, and performed cross validation using ten more values equally spaced in the interval. This procedure identified the optimal λ giving the largest average log likelihood.
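The grid of candidate λ values and the ten-fold split can be written down directly (λ_max below is only a placeholder; in practice it is 1.5λ_lasso as described earlier):

```python
import numpy as np

# Candidate lambda values for cross validation: decreasing from lambda_max
# toward 0.001 with a step of 0.35 on the logarithmic scale.
lambda_max = 0.5   # placeholder; the text uses lambda_max = 1.5 * lambda_lasso
lambdas = np.exp(np.arange(np.log(lambda_max), np.log(0.001), -0.35))

def kfold_indices(n, k=10, seed=0):
    """Random k-fold split; each candidate lambda is scored by the average
    predictive log likelihood over the held-out folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)
```

Consecutive grid points differ by a factor of e^{−0.35} ≈ 0.70, so the grid covers roughly three orders of magnitude with about 20 values.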

| **Algorithm** | **Parameters**^{a} | **L̄ ± se**^{b} |
|---|---|---|
| EBLASSO-NE | 0.0011 | −0.39 ± 0.03 |
| | 0.0022 | −0.42 ± 0.03 |
| | 0.0447 | −0.42 ± 0.04 |
| | 0.0500 | −0.36 ± 0.02^{c} |
| | 0.0631 | −0.39 ± 0.02 |
| | 0.1259 | −0.41 ± 0.03 |
| | 0.2512 | −0.40 ± 0.01 |
| EBLASSO-NEG | (−0.5, 0.05) | −0.38 ± 0.03 |
| | (0.01, 0.05) | −0.37 ± 0.02 |
| | (1, 0.05) | −0.47 ± 0.02 |
| | (0.01, 5) | −0.39 ± 0.03 |
| | (0.01, 6) | −0.36 ± 0.02^{c} |
| | (0.01, 7) | −0.37 ± 0.02 |
| LASSO | 0.1037 | −0.56 ± 0.02 |
| | 0.0516 | −0.44 ± 0.03 |
| | 0.0257 | −0.37 ± 0.04 |
| | 0.0128 | −0.35 ± 0.05^{c} |
| | 0.0064 | −0.36 ± 0.06 |

^{a}Parameters are λ for the EBLASSO-NE and the LASSO, and (a, b) for the EBLASSO-NEG.

^{b}The average log likelihood and standard error were obtained from ten-fold cross validation.

^{c}The optimal log likelihood and corresponding parameter(s) chosen for comparison with other methods.

The optimal values of the parameters (a_{1}, b_{1}) corresponding to the largest average log likelihood were identified first; one parameter was then fixed while the other was varied over a finer grid, and the pair giving the largest cross-validated log likelihood was selected.

For the LASSO-logistic regression, the cross validation procedure decreased λ from λ_{max} to λ_{min} = 0.001λ_{max} with a step size of [log(λ_{max}) − log(λ_{min})]/100 on the logarithmic scale. The largest average log likelihood and the corresponding optimal λ were then selected.

The HyperLasso employed the same Bayesian NEG hierarchical prior for the marker effects and estimated the posterior modes using a numerical algorithm; it outputs point estimates of **β** without a confidence interval or a p-value.

| **Shape** | **Type I error α** | **Inverse scale** | **True/False positive effects**^{a} |
|---|---|---|---|
| 0.05 | 0.1 | 1.7 × 10^{-3} | 10/1^{b} |
| | 0.05 | 1.5 × 10^{-3} | 9/2 |
| | 0.01 | 1.4 × 10^{-3} | 10/2 |
| 0.01 | 0.1 | 9.8 × 10^{-4} | 9/1 |
| | 0.05 | 8.8 × 10^{-4} | 9/1 |
| | 0.01 | 7.9 × 10^{-4} | 9/1 |
| | 0.1 | 5.2 × 10^{-4} | 8/1 |
| | 0.05 | 4.7 × 10^{-4} | 8/1 |
| | 0.01 | 4.2 × 10^{-4} | 8/1 |
| | 0.1 | 3.6 × 10^{-4} | 7/0 |
| | 0.05 | 3.2 × 10^{-4} | 7/0 |
| | 0.01 | 2.9 × 10^{-4} | 7/0 |

^{a}Number of true/false positive effects detected.

^{b}The optimal results chosen for comparison with other methods.

The BhGLM method assumes that each entry of **β** follows a normal distribution; its two hyperparameters were varied over the values shown below.

| **Parameter 1** | **Parameter 2** | **True/False positive effects**^{a} |
|---|---|---|
| 10^{-5} | 10^{-5} | 9/0 |
| 10^{-4} | | 9/0 |
| 10^{-3} | | 9/0 |
| 10^{-2} | | 9/0 |
| 10^{-1} | | 9/0 |
| 10^{-5} | 10^{-4} | 9/0 |
| 10^{-4} | | 9/0 |
| 10^{-3} | | 9/0^{b} |
| 10^{-2} | | 9/0 |
| 10^{-1} | | 9/0 |
| 10^{-5} | 10^{-3} | 9/0 |
| 10^{-4} | | 9/0 |
| 10^{-3} | | 9/0 |
| 10^{-2} | | 9/0 |
| 10^{-1} | | 9/0 |

^{a}Number of true/false positive effects detected.

^{b}The optimal results chosen for comparison with other methods.

The RVM for classification

Single QTL mapping with logistic regression was performed marker by marker. Effects with a p-value < 10^{-4} were considered significant, which identified 8 true and 25 false positive effects. The estimated sizes of the true effects and their standard errors are depicted in the table above. We also relaxed the p-value threshold above 10^{-4} so that the number of true positive effects detected increased to 11, the same level as that of our EBLASSO-NEG and EBLASSO-NE methods, but then the number of false positive effects increased to 27.
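A minimal sketch of this single-QTL baseline (our illustration, not the software actually used: a separate two-parameter logistic regression per marker, with a two-sided Wald p-value from the normal approximation):

```python
import numpy as np
from math import erf, sqrt

def wald_scan(X, y, n_iter=25):
    """Fit logit(p) = b0 + b1*x for each marker column of X separately and
    return the two-sided Wald p-value of b1. Combined with a threshold
    such as 10^-4, markers with small p-values are declared QTLs."""
    n, m = X.shape
    pvals = np.empty(m)
    for j in range(m):
        D = np.column_stack([np.ones(n), X[:, j]])
        b = np.zeros(2)
        for _ in range(n_iter):                  # Newton-Raphson for the MLE
            p = 1.0 / (1.0 + np.exp(-(D @ b)))
            W = p * (1.0 - p)
            b = b + np.linalg.solve((D.T * W) @ D, D.T @ (y - p))
        p = 1.0 / (1.0 + np.exp(-(D @ b)))
        H = (D.T * (p * (1.0 - p))) @ D          # observed information
        se = sqrt(np.linalg.inv(H)[1, 1])        # asymptotic std. error of b1
        z = abs(b[1]) / se
        pvals[j] = 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))
    return pvals
```

Because each marker is tested in isolation, effects at correlated neighboring markers inflate the number of significant tests, which is consistent with the large false positive count reported above.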

As shown in Table

It is well known that the LASSO typically selects only one variable among a set of highly correlated variables. This phenomenon is indeed observed from the results in Tables

All simulations were performed on a personal computer (PC) with a 3.4 GHz Intel Pentium D CPU and 2 GB of memory running Windows XP, except that the HyperLasso was run on an IBM BladeCenter cluster with computing nodes using 2.6 GHz Xeon or 2.2 GHz Opteron CPUs running Linux. The speeds of the EBLASSO-NEG, the LASSO and the HyperLasso are comparable, and faster than those of the other methods. The speeds of the EBLASSO-NE, the BhGLM and the single-QTL mapping method are comparable.

Simulation results for the model with main and epistatic effects

In the second simulation setup, 10 main and 10 epistatic effects were simulated. The genotypes of the 1000 individuals were generated as before, and the design matrix **X**, including all main and pairwise interaction terms, was of size 1000 × 115,921. QTL mapping was performed with all methods described earlier. However, the BhGLM method did not converge, and the RVM ran out of memory due to the large number of nonzero effects included in the model; thus, these two methods did not yield any results.

| **Locus 1** | **Locus 2** | **True β** | **EBLASSO-NE**^{a} | **EBLASSO-NEG** | **LASSO** | **HyperLasso** | **Two-locus test** |
|---|---|---|---|---|---|---|---|
| 11 | 11 | 1.99 | 0.83(0.12) | 1.66(0.19) | 0.72(0.65) | 2.21(0.25) | 0.88(0.10) |
| 26 | 26 | 1.81 | 0.46(0.11) | 1.42(0.18) | 0.39(0.55) | 1.73(0.23) | 0.56(0.09) |
| 42 | 42 | −1.28 | −0.36(0.11) | −0.87(0.20) | − | −1.59(0.21)^{b} | − |
| 48 | 48 | −0.91 | −0.19(0.09)^{b} | −0.68(0.19)^{b} | −0.14(0.55)^{b} | − | − |
| 72 | 72 | 1.28 | 1.01(0.16) | 2.53(0.20) | 0.92(1.18) | 3.17(0.27) | 1.08(0.10) |
| 73 | 73 | 1.81 | 0.40(0.14) | − | − | − | 1.04(0.10) |
| 182 | 182 | 2.19 | 0.50(0.14) | 1.57(0.26) | 0.51(0.96) | 2.03(0.30) | 1.23(0.10) |
| 185 | 185 | 1.29 | 0.69(0.14) | 1.49(0.26) | 0.57(0.91) | 1.88(0.30) | 1.23(0.10) |
| 262 | 262 | −1.28 | −0.24(0.09) | −0.70(0.15) | −0.15(0.46) | −0.78(0.19)^{b} | − |
| 268 | 268 | 0.91 | − | − | − | − | − |
| 5 | 6 | 1.28 | 0.42(0.13) | 1.11(0.22) | 0.40(0.63) | 1.63(0.28) | − |
| 6 | 39 | 1.29 | 0.38(0.15)^{b} | 1.37(0.23)^{b} | 0.15(1.16)^{b} | 1.28(0.35)^{b} | − |
| 42 | 220 | 1.99 | 0.23(0.13) | 1.99(0.25)^{b} | − | 2.47(0.32) | 0.77(0.14) |
| 81 | 200 | −1.28 | −0.36(0.13)^{b} | −1.02(0.22)^{b} | −0.15(1.42)^{b} | −1.22(0.27)^{b} | − |
| 87 | 164 | 1.81 | 0.44(0.17) | 1.73(0.25) | 0.24(1.44) | 2.15(0.32)^{b} | − |
| 87 | 322 | 2.19 | 0.90(0.15) | 2.10(0.25) | 0.74(0.66) | 2.44(0.30) | 0.79(0.13) |
| 118 | 278 | −1.28 | −0.29(0.12) | −0.76(0.20) | −0.19(1.29) | −0.99(0.26) | − |
| 328 | 404 | −0.99 | −0.21(0.12)^{b} | − | −0.15(0.73)^{b} | −1.15(0.30)^{b} | − |
| 373 | 400 | −0.91 | −0.22(0.12)^{b} | −1.12(0.22)^{b} | −0.19(0.87) | −1.23(0.27) | − |
| 431 | 439 | 1.81 | 0.24(0.13) | 1.37(0.24) | − | 1.58(0.29)^{b} | − |
| Parameter(s) | | | | | | | α = 0.01 |
| CPU time (s) | | | 2037.4 | 268.6 | 62.7 | 1094.6 | 2936.0 |
| True/False positive | | | 19/5^{c} | 17/4^{c} | 15/26^{c} | 17/7^{c} | 8/18^{d} |

^{a}The estimated marker effect is shown with its standard error in parentheses. A row with identical loci is a main effect; a row with two different loci is an epistatic effect.

^{b}The estimated marker effect was obtained from a neighboring marker (≤ 20 cM) rather than from the marker with the true effect.

^{c}Number of true/false positive effects detected.

^{d}Number of effects with a p-value < 10^{-7} after Bonferroni correction was applied.

The same cross-validation procedures described earlier were performed to choose the optimal values of the hyperparameters for the EBLASSO-NE, the EBLASSO-NEG and LASSO, and the results for several values of hyperparameters are presented in Table

| **Algorithm** | **Parameters**^{a} | **L̄ ± se**^{b} |
|---|---|---|
| EBLASSO-NE | 0.0631 | −0.44 ± 0.04 |
| | 0.0891 | −0.41 ± 0.04 |
| | 0.1259 | −0.39 ± 0.03 |
| | 0.1600 | −0.37 ± 0.01^{c} |
| | 0.1778 | −0.42 ± 0.04 |
| | 0.2512 | −0.53 ± 0.04 |
| | 0.3548 | −0.47 ± 0.04 |
| EBLASSO-NEG | (−0.4, 0.05) | −0.40 ± 0.05 |
| | (−0.2, 0.05) | −0.20 ± 0.05 |
| | (−0.1, 0.05) | −0.10 ± 0.05 |
| | (−0.2, 0.01) | −0.35 ± 0.02 |
| | (−0.2, 0.1) | −0.33 ± 0.02^{c} |
| | (−0.2, 0.5) | −0.35 ± 0.02 |
| LASSO | 0.1027 | −0.57 ± 0.01 |
| | 0.0511 | −0.47 ± 0.02 |
| | 0.0254 | −0.37 ± 0.02^{c} |
| | 0.0127 | −0.37 ± 0.03 |
| | 0.0063 | −0.39 ± 0.04 |

^{a}Parameters are λ for the EBLASSO-NE and the LASSO, and (a, b) for the EBLASSO-NEG.

^{b}The average log likelihood and standard error were obtained from ten-fold cross validation.

^{c}The optimal log likelihood and corresponding parameter(s) chosen for comparison with other methods.

For the HyperLasso, we again examined 12 pairs of values for the hyperparameters and chose as optimal the pair (with inverse scale on the order of 10^{-4}) that gave the best tradeoff between the true and false positive effects. We also used a two-locus logistic regression model, logit(p_{i}) = β_{0} + β_{1}x_{i} + β_{2}x_{j} + β_{3}x_{i} ⋅ x_{j}, to test pairs of loci; effects with a p-value < 10^{-7} were considered significant. Detailed QTL mapping results for the HyperLasso and the two-locus test are also given in the tables above.
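The two-locus model can be sketched as follows (our illustration; we use a Wald test on the epistatic coefficient β_{3}, while the test statistic actually used in the comparison may differ):

```python
import numpy as np
from math import erf, sqrt

def two_locus_test(xi, xj, y, n_iter=25):
    """Fit logit(p) = b0 + b1*xi + b2*xj + b3*xi*xj by Newton-Raphson and
    return the two-sided Wald p-value for the epistatic coefficient b3."""
    n = len(y)
    D = np.column_stack([np.ones(n), xi, xj, xi * xj])
    b = np.zeros(4)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(D @ b)))
        W = p * (1.0 - p)
        b = b + np.linalg.solve((D.T * W) @ D, D.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(D @ b)))
    H = (D.T * (p * (1.0 - p))) @ D              # observed information
    se = sqrt(np.linalg.inv(H)[3, 3])
    z = abs(b[3]) / se
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))
```

Testing every pair of markers requires m(m−1)/2 such fits, hence the stringent Bonferroni-corrected threshold (10^{-7}) used for significance.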

| **Shape** | **Type I error α** | **Inverse scale** | **True/False positive effects**^{a} |
|---|---|---|---|
| 0.05 | 0.1 | 8.5 × 10^{-4} | 17/18 |
| | 0.05 | 7.6 × 10^{-4} | 19/18 |
| | 0.01 | 6.8 × 10^{-4} | 19/19 |
| 0.01 | 0.1 | 4.9 × 10^{-4} | 17/7^{b} |
| | 0.05 | 4.4 × 10^{-4} | 17/8 |
| | 0.01 | 3.9 × 10^{-4} | 17/8 |
| | 0.1 | 1.3 × 10^{-4} | 5/0 |
| | 0.05 | 1.1 × 10^{-4} | 5/0 |
| | 0.01 | 1.0 × 10^{-4} | 8/0 |
| | 0.1 | 1.1 × 10^{-4} | 5/0 |
| | 0.05 | 1.0 × 10^{-4} | 5/0 |
| | 0.01 | 0.9 × 10^{-4} | 5/0 |

^{a}Number of true/false positive effects detected.

^{b}The optimal results chosen for comparison with other methods.

As shown in Table

Real data analysis

We used mouse data published by Masinde et al., consisting of F_{2} mice derived from the cross of a fast healer inbred line (MRL/MPj) and a slow healer inbred line (SJL/J). At age 3 weeks, a 2-mm hole was punched in the lower cartilaginous part of each ear of each F_{2} mouse using a metal ear puncher. The fast healer mice healed completely within 21 days after the ear punch (complete closure of the holes), while the holes of the slow healer mice remained open after 21 days. Some of the F_{2} mice healed partially, and these mice were phenotypically coded as 1 if the holes were < 0.7 mm and as 0 if the holes were > 0.7 mm. This dataset consisted of the genotypes of 119 markers across the mouse genome from 633 samples. Samples with more than 10% missing markers or with a missing phenotype were removed, resulting in a 532 × 119 genotype matrix with 3.28% missing values. These missing genotypes were inferred from neighboring markers. The total number of possible effects is 119 main effects plus 119 × 118/2 = 7021 pairwise epistatic effects, i.e., 7140 in all.
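The preprocessing just described can be sketched as follows (a simplified stand-in: samples with too many missing genotypes are dropped, and each remaining missing value is filled from the nearest non-missing marker of the same individual, whereas the original analysis presumably used a genetic-model-based inference):

```python
import numpy as np

def preprocess(G, max_missing=0.10):
    """Drop samples exceeding the missing-genotype threshold, then fill
    each remaining missing value from the nearest non-missing marker.

    G: (n_samples, n_markers) array with np.nan for missing genotypes.
    """
    keep = np.mean(np.isnan(G), axis=1) <= max_missing
    G = G[keep].copy()
    for i, j in zip(*np.where(np.isnan(G))):
        row = G[i]
        known = np.where(~np.isnan(row))[0]
        if known.size:                        # nearest informative neighbor
            G[i, j] = row[known[np.argmin(np.abs(known - j))]]
    return G
```

Nearest-neighbor filling is reasonable here because adjacent markers are highly correlated, as noted in the simulation setup.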

We carried out QTL mapping for this dataset using the EBLASSO-NE, the EBLASSO-NEG, the LASSO and the HyperLasso, since the simulation results presented earlier show that these four methods offer better performance than the other methods. Ten-fold cross validation for the EBLASSO-NE, the EBLASSO-NEG and the LASSO was performed with the same procedures used in the simulation studies to obtain optimal values of the hyperparameters. For the EBLASSO-NE,

| **Marker/Marker pair IDs**^{a} | **Position (Chr, cM)** | **EBLASSO-NE**^{b} | **EBLASSO-NEG** | **LASSO** | **HyperLasso** |
|---|---|---|---|---|---|
| D1mit334^{d} | (1, 49.2) | −0.15(0.28) | − | −0.37(0.19) | −0.80(0.18) |
| D3mit217^{d} | (3, 43.7) | −0.20(0.30) | −0.62(0.13) | −0.42(0.20) | − |
| D4mit214^{d} | (4, 21.9) | −0.24(0.30) | − | −0.46(0.16) | −0.81(0.20) |
| D6mit261^{d} | (6, 29.5) | −0.18(0.29) | −0.42(0.12) | −0.56(0.15) | −0.78(0.18) |
| D9mit270^{d} | (9, 41.5) | −0.25(0.31) | −0.72(0.13) | −0.39(0.23) | −0.57(0.24) |
| D9mit182 | (9, 53.6) | −0.25(0.32) | − | −0.51(0.20) | −0.80(0.26) |
| D13mit228^{d} | (13, 45.9) | −0.12(0.27)^{c} | −0.40(0.12)^{c} | 0.38(0.20) | 0.91(0.19) |
| (D1mit19; D17mit176) | (1, 37.2; 17, 12.0) | 0.32(0.37) | 0.60(0.19) | 0.89(0.24) | 0.92(0.30) |
| (D4mit31^{d}; Dxmit208) | (4, 50.3; 20, 18.6) | 0.19(0.33) | 0.72(0.18) | 0.71(0.23) | 0.69(0.29) |
| (D7mit246; D11mit242) | (7, 12.0; 11, 31.9) | 0.20(0.33) | 0.48(0.16) | − | 0.71(0.25) |

^{a}Paired markers in parentheses are markers involved in an epistatic effect. Only effects detected by at least three of the four algorithms are shown.

^{b}Parameters for each method are those chosen as described in the text.

^{c}The estimated marker effect was obtained from a neighboring marker D13mit35 (59.0 cM).

^{d}Markers identified previously by Masinde et al.

Masinde

Some of the identified QTLs are in positions close to the genes that are up-regulated in expression profiles obtained during the inflammation stage of wound healing

Discussion

Our EBLASSO-NEG algorithm is based on a Bayesian logistic regression model that uses the same three-level hierarchical prior for the regression coefficients as the one used in the Bayesian LASSO linear regression model

The LASSO-logistic regression was applied to GWAS to identify genomic loci associated with complex diseases

Another prior distribution commonly employed in Bayesian shrinkage is the mixture of normal and inverse-χ^{2} distributions, as used in the Bayesian linear regression model for continuous traits

Our EBLASSO-NEG and EBLASSO-NE use the same Laplace’s method originally proposed in

Our EBLASSO-NEG and EBLASSO-NE estimate the variance of regression coefficients iteratively. In each iteration, Laplace’s method is first used to obtain an approximation of the posterior distribution, which results in an equivalent linear regression model. Then, the EBLASSO-NEG uses the EBLASSO algorithm we developed in

The full Bayesian methods

The MIM methods

We demonstrated that our EBLASSO-NE and EBLASSO-NEG can easily handle a model with 115,922 variables. If many more variables are involved, e.g., in GWAS or in QTL mapping with high-order interactions, our methods can be combined with a variable screening method such as the sure independence screening (SIS)
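A minimal sketch of such a screening step (our illustration; the retention size n/log(n) is a common default from the SIS literature, not a value taken from this paper):

```python
import numpy as np

def sis_screen(X, y, keep=None):
    """Sure independence screening: rank each column of X by its absolute
    marginal correlation with y and keep the top `keep` columns. The EB
    algorithms would then be run on the retained columns only."""
    n = X.shape[0]
    if keep is None:
        keep = int(n / np.log(n))             # common SIS default
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    denom[denom == 0] = np.inf                # guard constant columns
    corr = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(corr)[::-1][:keep])
```

Screening reduces the number of candidate effects to a size the joint model can comfortably handle, at the cost of possibly discarding variables whose effects are only visible jointly.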

Conclusions

We have developed two algorithms named EBLASSO-NEG and EBLASSO-NE for the inference of Bayesian logistic regression models for multiple QTL mapping. While the EBLASSO-NEG is an extension of the EBLASSO

Abbreviations

AIC: Akaike information criterion; BIC: Bayesian information criterion; BhGLM: Bayesian hierarchical generalized linear model; BLASSO: Bayesian LASSO; EB: Empirical Bayes; EBLASSO: Empirical Bayesian LASSO; EM: Expectation-maximization; GWAS: Genome-wide association study; LASSO: Least absolute shrinkage and selection operator; MAP: Maximum a posteriori; NEG: Normal-exponential-gamma; QTL: Quantitative trait locus; RVM: Relevance vector machine; SIS: Sure independence screening.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AH participated in the design of the algorithms, developed the computer programs, performed the simulations and data analyses, and drafted the manuscript. SX participated in the development of the algorithms, designed simulation study, participated in the data analyses and helped to draft the manuscript. XC conceived of and coordinated the study, developed the algorithms, participated in the data analyses and drafted the manuscript. All authors read and approved the final manuscript.

Acknowledgments

This work was supported by the National Science Foundation (NSF) under NSF CAREER Award no. 0746882 to XC and by the Agriculture and Food Research Initiative (AFRI) of the USDA National Institute of Food and Agriculture under the Plant Genome, Genetics and Breeding Program 2007-35300-18285 to SX.