Department of Statistics & Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7566, USA

Department of Genetics, North Carolina State University, Raleigh, NC 27695-7566, USA

Abstract

Background

Although many experiments measure multiple traits, most studies map quantitative trait loci (QTL) for each trait separately using single trait analysis. Single trait analysis does not take advantage of possible genetic and environmental correlations between traits. In this paper, we propose a novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL for inbred line crosses. We also develop a novel score-based method, suitable for the MTMIM model, for estimating the genome-wide significance level of putative QTL effects. The MTMIM method is implemented in the freely available and widely used Windows QTL Cartographer software.

Results

Throughout the paper, we provide compelling empirical evidence that: (1) the score-based threshold maintains a proper type I error rate and tends to keep the false discovery rate within an acceptable level; (2) the MTMIM method can deliver better parameter estimates and power than the single trait multiple interval mapping method; (3) an analysis of

Conclusions

The MTMIM method represents a convenient statistical framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL and of QTL by environment interaction, to estimate the total genotypic variance-covariance matrix between traits, and to decompose that matrix into QTL-specific variance-covariance matrices, thereby providing more detail on the genetic architecture of complex traits.

Background

Many traits that are important to agriculture, human health and evolutionary biology are quantitative in nature, influenced by multiple genes. Efficient and robust identification and mapping onto genomic positions of those genes is a very important goal in quantitative genetics. The availability of genome-wide molecular markers provides the means for us to locate and map those quantitative trait loci (QTL) in a systematic way. Since the publication of interval mapping method for QTL genome-wide scan

Although single trait QTL mapping methods have been applied in many studies to estimate the genetic basis and architecture of complex traits, these methods did not utilize the information of genetic and environmental correlations between traits, and are not ideal for data analysis. Multiple trait analysis however can take these into account and also can formally test a number of hypotheses concerning the nature of genetic correlations, such as pleiotropy vs. close linkage and genotype by environment interaction

Multiple trait CIM

In what follows, we motivate MTMIM modeling from a practical point of view, describe the MTMIM statistical model, build the likelihood function, derive parameter estimators, extend the score-based threshold method

We organize this paper in a manner such that a reader less interested in the mathematical aspect of the modeling could skip the analytical derivations while being able to understand the main points regarding multiple trait multiple interval mapping of QTL.

A motivating example

We use data from a cross between fruit flies _{1} hybrids. F_{1} females were then crossed to each parental species to produce two backcross populations of males,

All variables related to the posterior lobe (PC1, ADJPC1 and AREA) were reported to be highly correlated between themselves in both BM1 and BS1, correlation larger than 0.82

We carried out MIM analysis of PC1 and ADJPC1 in the pooled samples of BM1 and BM2 (n=192+299), hereafter referred to as the BM data, and we found statistical evidence for seventeen genomic regions harboring QTL (Figure

**LRT profile of separate MIM analyses of PC1 and ADJPC1, and MTMIM analysis of PC1 and ADJPC1 (Joint) for the BM data.** LRT profiles are shown with the 10% genome-wide significance level. Tick marks on the horizontal axis represent positions of genetic markers on chromosomes X, 2 and 3 (from left to right). Bold triangles below the horizontal axis indicate positions of mapped QTL in separate and joint analyses. Map distances are expressed in centiMorgans according to Haldane’s mapping function.

Positions of mapped QTL in regions 4, 5, 7, 10, 11, 13, 14, 16 and 17 (Figure

Results and discussion

Type I error

The results show clearly an excellent agreement between estimated type I error and nominal level in the range of 1 to 15% (Figure

**Estimated and expected genome-wide type I error.** Estimated and expected type I error, in percentage, of LRT when using the genome-wide score-based threshold to assess significance level of putative QTL in genome-wide scan of 1000 replicates.

Model size (results not shown)

The number of QTL in the MTMIM model of scenario SI was much closer to the simulated parameter (five QTL) when compared to scenario SII, for any genome-wide significance level. While a QTL in both scenarios has to exceed very similar thresholds to be declared significant in the forward selection, the number of traits affected by a QTL is rather different between the two scenarios. In scenario SI all QTL have effect on all traits, while in scenario SII a QTL may have effect either on one, two or three traits. Therefore, model overparametrization makes the detection of QTL with effects on one and two traits in scenario SII more difficult. Lastly, our results show that in general the number of mapped QTL is closer to the simulated (five QTL) in the MTMIM than in the MIM model.

FDR

FDR is a very important measure of quality control in statistical analysis

Estimates of FDR (%) in the MIM and MTMIM models as observed in scenarios SI, SII and SIII across genome-wide significance levels (1, 5 and 10%) and LOD-

| Analysis (trait) | LOD-d | SI 1% | SI 5% | SI 10% | SII 1% | SII 5% | SII 10% | SIII 1% | SIII 5% | SIII 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| MIM (T1) | 1.0 | 9.1 | 9.1 | 9.9 | 8.9 | 9.2 | 10.0 | 7.2 | 7.9 | 8.7 |
| | 1.5 | 3.9 | 4.4 | 5.3 | 3.7 | 4.3 | 5.3 | 2.8 | 3.5 | 4.1 |
| | 2.0 | 2.0 | 2.7 | 3.6 | 1.8 | 2.2 | 3.0 | 1.4 | 1.9 | 2.3 |
| MIM (T2) | 1.0 | 8.0 | 8.7 | 8.9 | 7.9 | 8.6 | 9.6 | 6.2 | 7.0 | 7.8 |
| | 1.5 | 3.9 | 4.2 | 4.7 | 3.2 | 4.1 | 5.4 | 3.1 | 3.7 | 4.5 |
| | 2.0 | 2.0 | 2.3 | 3.0 | 1.2 | 2.2 | 3.6 | 1.2 | 2.1 | 2.8 |
| MIM (T3) | 1.0 | 10.7 | 9.6 | 9.9 | 12.4 | 13.8 | 18.0 | – | – | – |
| | 1.5 | 3.8 | 4.2 | 4.9 | 7.5 | 9.0 | 11.4 | – | – | – |
| | 2.0 | 1.8 | 2.3 | 3.1 | 4.8 | 6.5 | 8.5 | – | – | – |
| MTMIM | 1.0 | 4.6 | 5.4 | 6.9 | 8.5 | 9.2 | 10.0 | 5.6 | 7.8 | 8.4 |
| | 1.5 | 1.9 | 2.7 | 4.0 | 3.3 | 4.1 | 4.9 | 2.9 | 5.2 | 5.7 |
| | 2.0 | 1.1 | 1.9 | 3.3 | 1.4 | 2.4 | 3.2 | 2.2 | 4.1 | 4.5 |

Power

Results of power for the MIM and MTMIM models of all three scenarios clearly show a remarkable increment in power as genome-wide significance levels grow less stringent, for any LOD-

Power (%) of QTL identification in the MIM and MTMIM models as observed in scenarios SI, SII and SIII across genome-wide significance levels (1, 5, and 10%) and LOD-1.5 support interval.

| Analysis (trait) | QTL | SI 1% | SI 5% | SI 10% | SII 1% | SII 5% | SII 10% | SIII 1% | SIII 5% | SIII 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| MIM (T1) | Q1 | 66.8 | 82.0 | 86.6 | 65.8 | 80.2 | 84.2 | 67.6 | 77.2 | 79.6 |
| | Q2 | 63.6 | 81.8 | 87.6 | 59.8 | 78.2 | 81.8 | – | – | – |
| | Q3 | 67.4 | 81.6 | 87.2 | 63.2 | 81.2 | 85.8 | 75.2 | 87.0 | 90.2 |
| | Q4 | 66.4 | 81.8 | 87.0 | 63.4 | 78.4 | 83.4 | – | – | – |
| | Q5 | 66.8 | 83.6 | 86.4 | 65.6 | 82.0 | 87.2 | 70.2 | 78.4 | 81.6 |
| MIM (T2) | Q1 | 64.8 | 80.0 | 88.2 | – | – | – | – | – | – |
| | Q2 | 64.8 | 80.0 | 84.8 | 74.4 | 85.4 | 89.8 | 64.2 | 74.2 | 76.4 |
| | Q3 | 65.6 | 79.8 | 83.4 | 76.4 | 86.0 | 90.0 | 76.4 | 88.4 | 91.2 |
| | Q4 | 66.0 | 82.4 | 87.0 | 77.4 | 87.6 | 92.0 | 74.6 | 86.0 | 88.0 |
| | Q5 | 68.4 | 83.0 | 88.8 | – | – | – | – | – | – |
| MIM (T3) | Q1 | 65.6 | 81.4 | 86.0 | – | – | – | – | – | – |
| | Q2 | 63.2 | 80.0 | 86.6 | – | – | – | – | – | – |
| | Q3 | 65.6 | 80.4 | 84.0 | 53.4 | 70.6 | 77.8 | – | – | – |
| | Q4 | 65.4 | 80.8 | 87.8 | – | – | – | – | – | – |
| | Q5 | 65.4 | 83.0 | 88.6 | – | – | – | – | – | – |
| MTMIM | Q1 | 98.8 | 99.4 | 99.4 | 53.8 | 71.0 | 78.2 | 65.4 | 65.2 | 70.0 |
| | Q2 | 98.0 | 98.0 | 98.2 | 89.0 | 94.4 | 95.6 | 64.6 | 66.6 | 68.0 |
| | Q3 | 97.0 | 97.4 | 97.4 | 96.6 | 97.0 | 97.2 | 94.4 | 96.4 | 97.0 |
| | Q4 | 98.4 | 98.8 | 99.0 | 87.6 | 93.2 | 94.6 | 74.8 | 77.4 | 78.2 |
| | Q5 | 98.6 | 98.6 | 98.6 | 57.2 | 71.8 | 78.4 | 65.6 | 66.2 | 68.0 |

Results of power (10% genome-wide significance level and LOD-1.5) to identify QTL in the MTMIM model show that QTL affecting more traits have higher chances of being identified in the forward selection. In scenario SI, which is the most favorable among all three scenarios, all QTL have effects on all traits. Therefore, all QTL were correctly identified very often, power ≥ 97

In scenarios SII and SIII, we decomposed power of QTL identification (10% genome-wide significance level and LOD-1.5) into three nonoverlapping subsets (Table

Decomposition of total power (P_{total} in Table
_{trait}) with 10% genome-wide significance level and LOD-1.5 support interval. In SII, subsets (1, 0, 0), (1, 1, 0) and (1, 1, 1) contain replicates with QTL affecting T1 only, T1 and T2, and T1, T2 and T3, respectively. In SIII, subsets (1, 0), (0, 1) and (1, 1) contain replicates with QTL affecting T1 only, T2 only, and T1 and T2, respectively. The QTL-trait to the overall power ratio (ratio=P_{trait} /P_{total}) is also presented.

| Scenario | Subset | Measure | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|---|---|
| SII | (1,0,0) | P_{trait} | 66.4 | 1.2 | 0.0 | 0.8 | 64.0 |
| | | ratio | 0.85 | 0.01 | 0.00 | 0.01 | 0.82 |
| | (1,1,0) | P_{trait} | 4.2 | 86.4 | 5.0 | 87.2 | 8.2 |
| | | ratio | 0.05 | 0.90 | 0.05 | 0.92 | 0.10 |
| | (1,1,1) | P_{trait} | 0.8 | 6.6 | 89.0 | 5.8 | 0.2 |
| | | ratio | 0.01 | 0.07 | 0.92 | 0.06 | 0.00 |
| SIII | (1,0) | P_{trait} | 36.8 | 2.8 | 3.4 | 1.0 | 46.0 |
| | | ratio | 0.53 | 0.04 | 0.04 | 0.01 | 0.68 |
| | (0,1) | P_{trait} | 2.8 | 36.2 | 4.0 | 49.6 | 1.2 |
| | | ratio | 0.04 | 0.53 | 0.04 | 0.63 | 0.02 |
| | (1,1) | P_{trait} | 30.4 | 29.0 | 89.6 | 27.6 | 20.8 |
| | | ratio | 0.43 | 0.43 | 0.92 | 0.35 | 0.31 |
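As a small worked check of the ratio defined in the caption (ratio = P_{trait} / P_{total}), the sketch below recomputes three entries. It assumes that P_{total} is the MTMIM power at the 10% genome-wide level reported in the power table (78.2 for Q1 in SII, 68.0 for Q5 in SIII, 97.0 for Q3 in SIII); the function name `power_ratio` is hypothetical and used only for illustration.

```python
def power_ratio(p_trait: float, p_total: float) -> float:
    """Share of the total power contributed by one subset, rounded to two decimals."""
    if p_total <= 0:
        raise ValueError("total power must be positive")
    return round(p_trait / p_total, 2)


# SII, subset (1,0,0), Q1: 66.4 of the 78.2 total
ratio_sii_q1 = power_ratio(66.4, 78.2)
# SIII, subset (1,0), Q5: 46.0 of the 68.0 total
ratio_siii_q5 = power_ratio(46.0, 68.0)
# SIII, subset (1,1), Q3: 89.6 of the 97.0 total
ratio_siii_q3 = power_ratio(89.6, 97.0)
print(ratio_sii_q1, ratio_siii_q5, ratio_siii_q3)
```

Under that assumption the recomputed values agree with the tabulated ratios for these entries.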

Mean position of QTL

Our simulations show that mean estimates of QTL position in the MIM and MTMIM models have no qualitative difference and are in close agreement with the simulated parameters (Table

Means of QTL position (cM), LOD-

| Analysis (Trait) | QTL | Position: Parameter | Position: Estimate | Coverage (d = 1) | Coverage (d = 1.5) | Coverage (d = 2) | Length (d = 1) | Length (d = 1.5) | Length (d = 2) |
|---|---|---|---|---|---|---|---|---|---|
| MIM (T1) | Q1 | 23 [1] | 23.7 (0.31) | 91.4 | 95.7 | 99.3 | 21.7 (0.42) | 29.4 (0.55) | 37.3 (0.66) |
| | Q2 | 15 [2] | 14.6 (0.31) | 92.2 | 95.8 | 98.1 | 21.1 (0.38) | 27.7 (0.55) | 34.9 (0.73) |
| | Q3 | 45 [3] | 45.4 (0.38) | 88.8 | 95.8 | 98.2 | 23.7 (0.49) | 33.0 (0.67) | 41.9 (0.81) |
| | Q4 | 67 [5] | 66.9 (0.29) | 92.2 | 95.8 | 98.4 | 20.2 (0.35) | 26.7 (0.51) | 35.4 (0.79) |
| | Q5 | 53 [6] | 52.9 (0.33) | 93.4 | 98.8 | 99.6 | 21.3 (0.43) | 28.7 (0.56) | 36.4 (0.68) |
| MIM (T2) | Q2 | 15 [2] | 14.7 (0.30) | 92.6 | 97.4 | 98.7 | 21.0 (0.88) | 27.9 (0.55) | 34.1 (0.67) |
| | Q3 | 45 [3] | 45.2 (0.35) | 90.6 | 95.9 | 98.3 | 22.3 (0.38) | 29.8 (0.56) | 39.1 (0.74) |
| | Q4 | 67 [5] | 67.0 (0.27) | 95.3 | 98.1 | 99.6 | 19.6 (0.33) | 26.1 (0.49) | 32.6 (0.67) |
| MIM (T3) | Q3 | 45 [3] | 44.7 (0.45) | 88.8 | 94.6 | 96.8 | 25.3 (0.55) | 35.3 (0.74) | 46.2 (0.88) |
| MTMIM | Q1 | 23 [1] | 23.5 (0.32) | 89.5 | 95.6 | 97.6 | 20.0 (0.38) | 26.4 (0.47) | 33.1 (0.56) |
| | Q2 | 15 [2] | 14.4 (0.22) | 93.1 | 97.8 | 98.9 | 16.2 (0.25) | 21.0 (0.33) | 25.3 (0.39) |
| | Q3 | 45 [3] | 44.9 (0.18) | 92.8 | 97.2 | 99.4 | 13.1 (0.22) | 17.2 (0.28) | 20.7 (0.33) |
| | Q4 | 67 [5] | 67.6 (0.19) | 94.2 | 97.5 | 98.9 | 15.6 (0.23) | 20.3 (0.31) | 24.2 (0.39) |
| | Q5 | 53 [6] | 52.8 (0.37) | 89.5 | 97.8 | 99.8 | 19.7 (0.41) | 26.1 (0.51) | 32.6 (0.60) |

Coverage and length of LOD-d support interval

In Table

Mean effect of QTL

The average of effects of QTL in scenario SI (Table

Mean effect of QTL in the MIM and MTMIM models as observed in scenarios SI, SII and SIII with 10% genome-wide significance level and LOD-1.5 support interval. Standard errors of means are in parentheses.

| Trait | QTL | Parameter | SI MIM | SI MTMIM | SII MIM | SII MTMIM | SIII MIM | SIII MTMIM |
|---|---|---|---|---|---|---|---|---|
| T1 | Q1 | 0.52 | 0.57 (0.006) | 0.51 (0.007) | 0.56 (0.005) | 0.56 (0.005) | 0.57 (0.006) | 0.56 (0.011) |
| | Q2 | 0.52 | 0.56 (0.006) | 0.51 (0.006) | 0.56 (0.006) | 0.52 (0.007) | – | 0.20 (0.019) |
| | Q3 | 0.52 | 0.56 (0.006) | 0.52 (0.006) | 0.54 (0.005) | 0.51 (0.007) | 0.57 (0.005) | 0.52 (0.008) |
| | Q4 | 0.52 | 0.55 (0.006) | 0.51 (0.006) | 0.55 (0.006) | 0.52 (0.006) | – | 0.13 (0.015) |
| | Q5 | 0.52 | 0.56 (0.006) | 0.52 (0.007) | 0.55 (0.006) | 0.56 (0.005) | 0.58 (0.005) | 0.58 (0.013) |
| T2 | Q1 | 0.52 | 0.55 (0.007) | 0.50 (0.007) | – | 0.00 (0.004) | – | 0.23 (0.016) |
| | Q2 | 0.52 | 0.56 (0.005) | 0.51 (0.006) | 0.57 (0.006) | 0.54 (0.007) | 0.58 (0.006) | 0.55 (0.009) |
| | Q3 | 0.52 | 0.56 (0.005) | 0.52 (0.006) | 0.57 (0.005) | 0.54 (0.007) | 0.57 (0.005) | 0.54 (0.008) |
| | Q4 | 0.52 | 0.55 (0.005) | 0.50 (0.006) | 0.57 (0.005) | 0.55 (0.006) | 0.58 (0.006) | 0.60 (0.008) |
| | Q5 | 0.52 | 0.55 (0.006) | 0.52 (0.007) | – | 0.00 (0.005) | – | 0.09 (0.015) |
| T3 | Q1 | 0.52 | 0.56 (0.005) | 0.52 (0.006) | – | 0.00 (0.005) | – | – |
| | Q2 | 0.52 | 0.55 (0.005) | 0.51 (0.007) | – | 0.01 (0.004) | – | – |
| | Q3 | 0.52 | 0.55 (0.005) | 0.51 (0.006) | 0.51 (0.006) | 0.44 (0.008) | – | – |
| | Q4 | 0.52 | 0.55 (0.005) | 0.52 (0.007) | – | 0.00 (0.003) | – | – |
| | Q5 | 0.52 | 0.56 (0.006) | 0.53 (0.008) | – | 0.00 (0.004) | – | – |

The effects of all QTL were overestimated in the MIM model. This phenomenon is expected because estimation is conditional on detection, the so-called “Beavis effect”

Pleiotropic versus closely linked nonpleiotropic QTL

In scenario SIII, after selecting an MTMIM model in the forward selection, each mapped pleiotropic QTL was tested against the alternative of closely linked nonpleiotropic QTL. In the bivariate model, we performed a two-dimensional search for positions of putative closely linked nonpleiotropic QTL in the neighborhood of the position of each pleiotropic QTL, as suggested in

Because Q3 was simulated as being pleiotropic, rejection of the pleiotropic hypothesis for Q3 provides a measure of type I error. On the other hand, Q1 and Q2, and Q4 and Q5, were simulated as pairs of closely linked nonpleiotropic QTL. Therefore, rejection of the pleiotropic hypothesis at these QTL provides a measure of power. Under our simulation setting, the LRT performed better than the AICc: it kept the best balance between type I error and power. The estimated frequency of rejecting pleiotropy for Q3 (4%) using the LRT agrees very well with the expected 5% nominal error rate, and the estimated frequencies of rejecting pleiotropy for Q1 (38%) and Q2 (36%) are satisfactorily high, taking into account that Q1 and Q2 are close to each other in a linkage map whose markers are widely spaced (10 cM from marker to marker). The AICc criterion showed higher power for Q1 (45%) and Q2 (45%), but at the cost of a high type I error for Q3 (15%). Moreover, because Q4 and Q5 are 15 cM apart, the frequency of rejecting pleiotropy using the LRT for these two QTL (41 and 48%, respectively) is higher than for Q1 (38%) and Q2 (36%), which are 10 cM apart.
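To make the AICc comparison concrete, the following is a minimal sketch of the standard small-sample corrected Akaike criterion applied to a pleiotropic model and a closely linked nonpleiotropic alternative. The function names, the log-likelihood values and the parameter counts are hypothetical placeholders, not quantities from our simulations; the sketch only illustrates the decision rule "prefer the model with the smaller AICc".

```python
def aicc(log_lik: float, n_params: int, n_obs: int) -> float:
    """Corrected Akaike information criterion (smaller is better)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = -2.0 * log_lik + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)


def prefers_linked_model(ll_pleio: float, k_pleio: int,
                         ll_linked: float, k_linked: int, n_obs: int) -> bool:
    """True when the closely linked nonpleiotropic model has the smaller AICc."""
    return aicc(ll_linked, k_linked, n_obs) < aicc(ll_pleio, k_pleio, n_obs)


# Hypothetical values: one extra position parameter in the linked model.
small_gain = prefers_linked_model(-100.0, 5, -99.5, 6, n_obs=100)
large_gain = prefers_linked_model(-100.0, 5, -95.0, 6, n_obs=100)
print(small_gain, large_gain)
```

A small gain in log-likelihood does not offset the penalty for the extra parameter, whereas a large gain does; the LRT instead compares twice the log-likelihood difference with a fixed critical value.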

Motivating example revisited

Motivated by the fact that the joint analysis of PC1 and ADJPC1 in the

The LRT profiles of genome-wide scan in the BM data (Figure

Estimates of QTL position (
^{5} ) are also shown.

| QTL | MIM PC1: Position^{a} | MIM PC1: Effect | MIM ADJPC1: Position^{a} | MIM ADJPC1: Effect | MTMIM (GEM-NR): Position^{a} | MTMIM (GEM-NR): Effect on PC1 | MTMIM (GEM-NR): Effect on ADJPC1 | MTMIM (ECM): Effect on PC1 | MTMIM (ECM): Effect on ADJPC1 |
|---|---|---|---|---|---|---|---|---|---|
| **Chromosome X** | | | | | | | | | |
| 1 | 1 | 0.0020 | 1 | 0.0165 | 1 | 0.0021 | 0.0175 | 0.0021 | 0.0175 |
| 2 | 20 | 0.0018 | 20 | 0.0284 | 20 | 0.0017 | 0.0275 | 0.0017 | 0.0275 |
| **Chromosome 2** | | | | | | | | | |
| 3 | – | – | 1 | 0.0304 | 1 | 0.0007 | 0.0293 | 0.0007 | 0.0293 |
| 4 | 14 | 0.0018 | 17 | 0.0215 | 17 | 0.0018 | 0.0220 | 0.0018 | 0.0220 |
| 5 | 26 | 0.0017 | 30 | 0.0141 | 29 | 0.0012 | 0.0146 | 0.0011 | 0.0146 |
| 6 | 71 | 0.0016 | – | – | 70 | 0.0017 | -0.0048^{ns} | 0.0017 | -0.0048^{ns} |
| 7 | 111 | 0.0009 | 116 | 0.0147 | 116 | 0.0011 | 0.0176 | 0.0011 | 0.0177 |
| 8 | 144 | 0.0012 | 144 | 0.0091 | 144 | 0.0011 | 0.0082 | 0.0011 | 0.0082 |
| **Chromosome 3** | | | | | | | | | |
| 9 | 5 | 0.0013 | – | – | 4 | 0.0011 | 0.0107 | 0.0011 | 0.0107 |
| 10 | 17 | 0.0022 | 16 | 0.0503 | 17 | 0.0022 | 0.0427 | 0.0022 | 0.0426 |
| 11 | 48 | 0.0033 | 44 | 0.0279 | 45 | 0.0027 | 0.0253 | 0.0027 | 0.0254 |
| 12 | – | – | 54 | 0.0235 | 54 | 0.0007^{ns} | 0.0255 | 0.0007 | 0.0254 |
| 13 | 82 | 0.0033 | 83 | 0.0391 | 83 | 0.0034 | 0.0394 | 0.0034 | 0.0394 |
| 14 | 112 | 0.0009 | 116 | 0.0324 | 115 | 0.0009 | 0.0257 | 0.0009 | 0.0257 |
| 15 | 129 | 0.0015 | – | – | 128 | 0.0012 | 0.0094^{ns} | 0.0012 | 0.0094^{ns} |
| 16 | 147 | 0.0007 | 146 | 0.0116 | 145 | 0.0009 | 0.0092 | 0.0009 | 0.0092 |
| 17 | 169 | 0.0021 | 166 | 0.0268 | 167 | 0.0021 | 0.0273 | 0.0021 | 0.0273 |
| Total QTL | 15 | | 14 | | 17 | | | | |

^{a} Estimated position (cM) of QTL from the leftmost genetic marker on the chromosome.

^{ns} Nonsignificant main effect tested with the LRT and 5% significance level. The critical value of the LRT was obtained from the chi-squared distribution function with one degree of freedom.

2.761

31.73

31.73

521.6

2.358

–

2.369

31.48

–

453.0

31.48

453.2

Taken together, the MIM models of PC1 and ADJPC1 showed statistical evidence of twelve genomic regions with statistically significant QTL affecting both traits, and five regions with statistically significant QTL affecting only one of the traits (regions 3, 6, 9, 12 and 15 shown in Figure

Positions of QTL in regions 4, 5, 7, 10, 11, 13, 14, 16 and 17 (Figure

Partition of the phenotypic variance-covariance matrix between PC1 and ADJPC1 in terms of their environmental and genotypic components, as estimated in the MTMIM model, shows that most of the phenotypic variance-covariance between these traits is due to the genotypic component (Table

**QTL**

**Traits**

**QTL**

**1**

**2**

**3**

**4**

**5**

**6**

**7**

**8**

**9**

**10**

**11**

**12**

**13**

**14**

**15**

**16**

**17**

PC1

1

0.11

0.93

0.12

1.49

0.00

0.08

0.01

0.07

0.00

0.03

0.00

-0.01

0.00

0.05

0.01

0.07

0.00

-0.02

-0.01

-0.11

-0.03

-0.28

-0.01

-0.13

0.01

0.06

-0.01

-0.10

-0.01

-0.05

0.00

0.02

0.00

0.03

ADJ

0.93

7.69

1.49

16.20

0.08

1.06

0.07

0.64

0.03

0.30

-0.01

0.11

0.05

0.57

0.07

0.54

-0.02

-0.16

-0.11

-1.32

-0.28

-2.44

-0.13

-1.74

0.06

0.61

-0.10

-1.23

-0.05

-0.39

0.02

0.21

0.03

0.28

PC1

2

0.08

1.21

0.00

0.00

0.00

0.00

0.00

0.02

0.00

-0.03

0.01

0.10

0.01

0.09

0.00

0.02

0.00

0.05

-0.01

-0.16

0.00

-0.05

0.01

0.20

-0.01

-0.13

-0.01

-0.10

0.00

0.00

-0.01

-0.10

ADJ

1.21

19.13

0.00

-0.05

0.00

0.05

0.02

0.26

-0.03

0.17

0.10

1.57

0.09

0.92

0.02

0.29

0.05

0.81

-0.16

-1.83

-0.05

-1.09

0.20

2.66

-0.13

-2.57

-0.10

-1.00

0.00

0.04

-0.10

-1.36

PC1

3

0.01

0.52

0.05

1.30

0.02

0.47

0.00

0.09

0.00

-0.03

0.00

-0.04

0.00

0.07

0.00

-0.02

-0.01

-0.22

0.00

-0.08

-0.01

-0.28

0.00

-0.01

0.00

0.04

0.00

0.04

0.00

-0.04

ADJ

0.52

21.64

1.30

24.16

0.47

9.37

0.09

-0.57

-0.03

-0.63

-0.04

-0.51

0.07

1.06

-0.02

-0.55

-0.22

-3.36

-0.08

-3.13

-0.28

-5.17

-0.01

-0.20

0.04

0.49

0.04

0.70

-0.04

-0.72

PC1

4

0.10

1.18

0.07

0.92

0.03

0.15

0.00

0.00

-0.01

-0.05

0.01

0.11

0.00

-0.02

-0.03

-0.32

-0.01

-0.16

-0.02

-0.24

0.01

0.13

0.01

0.07

0.01

0.06

0.00

-0.03

ADJ

1.18

14.08

0.92

11.44

0.15

-1.12

0.00

-0.03

-0.05

-0.45

0.11

1.14

-0.02

-0.36

-0.32

-3.32

-0.16

-2.94

-0.24

-2.88

0.13

2.22

0.07

0.64

0.06

0.68

-0.03

-0.41

PC1

5

0.02

0.31

0.03

0.13

0.00

0.04

0.00

-0.01

0.00

0.05

0.00

-0.02

-0.01

-0.13

0.00

-0.07

-0.01

-0.10

0.00

0.05

0.00

0.03

0.00

0.03

0.00

0.03

ADJ

0.31

4.06

0.13

-0.93

0.04

0.57

-0.01

-0.05

0.05

0.50

-0.02

-0.39

-0.13

-1.40

-0.07

-1.44

-0.10

-1.19

0.05

0.83

0.03

0.32

0.03

0.37

0.03

0.38

PC1

6

0.07

-0.20

0.02

0.16

0.01

0.02

0.00

-0.01

-0.01

-0.06

-0.01

-0.02

0.00

-0.01

0.00

0.00

0.00

0.05

0.00

0.01

0.00

0.01

0.00

0.01

ADJ

-0.20

0.57

0.16

-1.08

0.02

-0.17

-0.01

0.08

-0.06

0.39

-0.02

0.16

-0.01

0.08

0.00

0.03

0.05

-0.33

0.01

-0.05

0.01

-0.07

0.01

-0.09

PC1

7

0.03

0.49

0.03

0.32

0.00

0.01

0.00

0.08

0.00

0.03

0.00

0.02

0.00

0.02

0.00

-0.10

0.00

-0.05

0.00

-0.03

-0.01

-0.11

ADJ

0.49

7.76

0.32

3.09

0.01

0.09

0.08

1.38

0.03

0.30

0.02

0.35

0.02

0.21

-0.10

-2.07

-0.05

-0.56

-0.03

-0.41

-0.11

-1.61

PC1

8

0.03

0.22

0.00

0.00

0.00

0.04

0.00

-0.02

0.00

0.00

0.00

0.03

0.00

-0.04

0.00

-0.01

0.00

-0.01

0.00

-0.02

ADJ

0.22

1.54

0.00

0.01

0.04

0.37

-0.02

-0.13

0.00

0.01

0.03

0.26

-0.04

-0.44

-0.01

-0.11

-0.01

-0.04

-0.02

-0.15

9

0.03

0.30

0.08

1.24

0.04

0.34

0.01

0.15

-0.01

-0.08

0.00

0.01

0.00

0.00

0.00

0.00

0.00

0.00

0.30

2.88

1.24

16.13

0.34

3.25

0.15

2.34

-0.08

-0.85

0.01

0.14

0.00

0.00

0.00

-0.01

0.00

-0.05

10

0.12

2.32

0.13

1.93

0.02

0.73

0.02

0.29

0.00

0.06

0.00

-0.01

0.00

-0.04

-0.01

-0.20

2.32

45.50

1.93

24.29

0.73

18.94

0.29

4.25

0.06

1.36

-0.01

-0.07

-0.04

-0.61

-0.20

-3.05

11

0.18

1.64

0.07

1.66

0.15

1.58

0.01

0.27

0.01

0.08

0.00

-0.01

-0.01

-0.14

1.64

15.19

1.66

24.92

1.58

16.29

0.27

3.81

0.08

0.68

-0.01

-0.14

-0.14

-1.48

12

0.01

0.38

0.05

1.16

0.00

0.14

0.00

0.07

0.00

0.00

0.00

-0.07

0.38

14.74

1.16

20.87

0.14

4.55

0.07

0.89

0.00

-0.04

-0.07

-1.41

13

0.27

3.12

0.05

0.91

0.04

0.39

0.01

0.17

0.01

0.09

3.12

36.41

0.91

15.12

0.39

3.66

0.17

1.89

0.09

1.06

14

0.02

0.53

0.04

0.70

0.02

0.31

0.01

0.31

0.53

15.11

0.70

8.66

0.31

5.01

0.31

5.53

15

0.04

0.29

0.03

0.30

0.04

0.38

0.29

2.27

0.30

2.82

0.38

3.73

16

0.02

0.18

0.05

0.57

0.18

2.03

0.57

6.83

17

0.11

1.44

1.44

18.55

Total

2.36

31.48

31.48

453.20

The possibility of fitting many traits and many QTL in the MTMIM model imposes a severe burden on parameter estimation, both in terms of reliability of parameter estimates (accuracy) and computation time (speed). The GEM-NR and ECM algorithms are two alternative approaches suitable for parameter estimation in such complex models. We evaluated these two algorithms with the BM data by fitting an MTMIM model for PC1 and ADJPC1. The results (Figure

**Comparison of performances between ECM and GEM-NR algorithms.** The algorithms are compared in terms of the number of iterations required for convergence of the likelihood function. Both algorithms were applied to an MTMIM model of traits PC1 and ADJPC1 of the BM data. The algorithms were said to have converged whenever the difference between the natural logarithm of the likelihood function of two consecutive iterations was smaller than or equal to 10^{−4}. (**A**) shows the values of the natural logarithm of the likelihood function at each iteration [log_{e} (L_{k})] until convergence was reached. The GEM-NR algorithm began with 5 iterations of the ECM algorithm. Therefore, the first 5 iterations produced identical values of the likelihood function in both algorithms, and for that reason we omitted the first 4 iterations. (**B**) shows the difference between the natural logarithm of the likelihood function of two consecutive iterations until convergence was reached. In (**B**), the y-axis was rescaled to the base-ten logarithm to improve graphical resolution.

Conclusions

A novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL from inbred line crosses was proposed and developed. We also proposed a novel method for estimating genome-wide threshold and assessing the significance level of putative QTL effects in the MTMIM model. The method of genome-wide threshold estimation is based on the score-based resampling framework

The MTMIM model provides a comprehensive framework for QTL inference on multiple traits and the score-based threshold serves as an essential and elegant tool for computing significance level of effects of putative QTL in the genome-wide scan. The MTMIM model and score-based threshold were evaluated through simulations. Also, we analyzed data from an experiment with

Results from our simulations showed many interesting features of the MTMIM model and score-based threshold. First, the score-based threshold maintained the type I error at the desired nominal level when no QTL effects were present in the simulated datasets. Second, discovery of spurious QTL (false discovery rate) was almost constant across genome-wide significance levels of 1, 5 and 10%, while power to identify simulated QTL increased substantially as the significance level grew less stringent. Therefore, a more liberal (10%) genome-wide significance level could be used in the genome-wide scan, corroborating the results of C. Laurie, S. Wang, L. A. Carlini-Garcia and Z-B. Zeng as observed in the MIM model (unpublished observations). Third, the MTMIM model could show lower power than the MIM model for QTL with effects on only a small subset of traits. However, as the number of traits affected by a QTL increases, power in the MTMIM model surpasses power in the MIM model even when not all traits under analysis are affected by that QTL. Fourth, on average the estimates of QTL position in the MIM and MTMIM models were very similar, but the MTMIM model delivered estimates with smaller sampling variances. Fifth, the LOD-1.5 support interval produced confidence intervals for QTL position with approximately 95% coverage in both the MIM and MTMIM models. However, the support interval was much wider in the MIM model than in the MTMIM model. Overall, a qualitative comparison of results from the MIM and MTMIM models shows that effect estimates in the latter are less biased than in the former. Lastly, the LRT was shown to keep an adequate type I error level when testing the null hypothesis of pleiotropic QTL against the alternative of closely linked nonpleiotropic QTL in the bivariate analysis, while it delivered reasonable power when data were generated under the alternative.

Throughout this paper, we provided compelling empirical evidence that the score-based threshold maintains a proper type I error rate and tends to give a false discovery rate within an acceptable level, and that the MTMIM model can deliver better parameter estimates and power than the MIM model. In addition, the MTMIM model provides a framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL and of QTL by environment interaction, to estimate the total genotypic variance-covariance matrix between traits, and to decompose it in terms of QTL-specific variance-covariance matrices. An analysis of phenotypic and genotypic data from an experiment with

Methods

In what follows, for any matrix ** A**, its transpose is denoted by

Statistical model

Our statistical model for multiple trait multiple QTL inference for a backcross (BC) population is a linear model, in which the measurement _{ti} of trait _{ir} (_{ir} takes either value
_{tr} is called the main effect of the ^{tℎ} QTL on trait _{T} for each trait, it may include a subset _{trl}) among all pairwise QTL interactions (_{ti} . The linear model is:

For each subject **∑**_{e}, i.e., _{i} ∼ _{T} (**0,∑**_{e}). For each

We collect all effect parameters (** θ** = (

Likelihood function

In order to search the entire genome for significant QTL effects, the genome is partitioned into ** ζ**. The set of positions of

We define an ^{m} matrix _{[b,·]}, corresponds to a column of effect parameters in
_{[·,j]}, represents a coded genotype _{j} . If _{[b,j]} = _{r}, otherwise _{[b,j]} = _{r} ∗_{l}, where _{u} (_{u} in _{j} is

The individual (_{i}) and overall likelihood (^{m} multivariate normal distribution functions with different means (
**∑**_{e}), and mixing probabilities _{ij} (^{m}), i.e.,
_{i} with mean
_{e} . In what follows, _{i} (** θ**|

Parameter estimation

Estimation of parameters in the likelihood function is cumbersome due to mixture of distributions. The expectation-maximization (EM)

Many modifications of the EM algorithm and many hybrids of EM and Gauss-Newton (GN) methods have been proposed

Expectation-conditional maximization algorithm

The EM algorithm

** θ** (see Appendix). The E-step at the (

It is worth mentioning that in the E-step above, the updating equation at step _{ij} instead of

The CM-step consists of maximizing the expected complete logarithm likelihood function with respect to the unknown parameters (see Appendix). ** u**, and

for

The E- and CM-steps are computed iteratively until convergence of the likelihood function. Our choice of initial values for ** u** and

It is worth mentioning that for many combinations of _{ij} are zero or very close to zero. Therefore, one may choose to ignore these negligible probabilities in the computations, which may substantially reduce computation time.
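The idea of skipping negligible mixing probabilities can be sketched for a single subject as follows. This is only an illustration under simplifying assumptions: one trait with a univariate normal residual instead of the multivariate normal used in the MTMIM model, and a hypothetical cutoff `eps`; the names `posterior_genotype_probs`, `means` and `priors` are not from the paper.

```python
import math
from typing import List


def normal_pdf(y: float, mean: float, sd: float) -> float:
    """Density of a univariate normal distribution."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_genotype_probs(y: float, means: List[float], priors: List[float],
                             sd: float, eps: float = 1e-8) -> List[float]:
    """E-step weights for one subject: posterior probability of each multilocus
    genotype, skipping genotypes whose prior (mixing) probability is below eps."""
    weights = []
    for mu, p in zip(means, priors):
        # Genotypes with negligible prior probability are not evaluated at all.
        weights.append(p * normal_pdf(y, mu, sd) if p >= eps else 0.0)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all genotype weights vanished; lower eps")
    return [w / total for w in weights]


# Toy example: four genotypes, one of them essentially impossible given markers.
post = posterior_genotype_probs(
    y=0.4, means=[-0.5, 0.5, 0.0, 1.0],
    priors=[0.45, 0.45, 1e-12, 0.1 - 1e-12], sd=1.0)
print(post)
```

The pruned genotype receives zero posterior weight and the remaining weights are renormalized, so the number of density evaluations per subject drops with the number of skipped genotypes.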

Generalized EM algorithm based on Newton-Raphson methods

The generalized EM-Newton-Raphson (GEM-NR) methods combine the EM algorithm with the NR method for maximizing the complete-data logarithm likelihood function
^{(v)} (0 <^{(v)} ≤ 1) and by having the incomplete-data logarithm likelihood function (_{c}) in the updating NR formula, a modified version of the updating equation

The advantage of using equation (2) is that an appropriate choice of ^{(v)} guarantees that the logarithm likelihood function increases at each iteration. So long as ^{(v)} is chosen to make (3) positive definite, the logarithm likelihood function is guaranteed to increase at every iteration (Appendix).

where ** C** is the Cholesky decomposition of the negative of the matrix of second order derivatives of the complete logarithm likelihood function (see Appendix) and

To guarantee that the logarithm likelihood function is nondecreasing,

As ^{(ξ)} lies in the line segment from ^{(v)} to ^{(v + 1)}, and ** θ** lives in high-dimensional space, the choice of

1. Run the ECM algorithm for a few iterations (say, five);

2. Let ^{(v)} be the parameter estimate in the ^{tℎ} EM iteration;

3. Set ^{(v)} = 1;

4. Estimate ^{(v + 1)} using equation (2) with the first and second order derivatives of _{c} (** θ**|

5. ● If ^{(v + 1)}|** λ**) >

● Otherwise, keep repeating step 4 with smaller and smaller ^{(v)}, until the likelihood function increases or until ^{(v)} gets too small, in which case start again in step 1;
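The step-halving logic of steps 3 to 5 can be sketched as below. This is a simplified illustration, not the GEM-NR implementation: it uses a one-parameter toy log-likelihood with analytic first and second derivatives, a plain Newton-Raphson direction rather than updating equation (2), and it simply stops when no increasing step is found instead of restarting with ECM iterations. All function names, the minimum step size and the starting value are assumptions; the 10^{−4} tolerance mirrors the convergence rule used in our comparison of algorithms.

```python
import math
from typing import Callable


def damped_newton_ascent(loglik: Callable[[float], float],
                         grad: Callable[[float], float],
                         hess: Callable[[float], float],
                         theta: float, tol: float = 1e-4,
                         min_step: float = 1e-6, max_iter: int = 100) -> float:
    """Newton-Raphson ascent with step halving so the log-likelihood never decreases."""
    for _ in range(max_iter):
        current = loglik(theta)
        direction = -grad(theta) / hess(theta)  # full Newton step (hess < 0 here)
        step = 1.0                               # step 3: start with a full step
        candidate = None
        while step >= min_step:                  # steps 4-5: shrink until loglik increases
            trial = theta + step * direction
            if loglik(trial) > current:
                candidate = trial
                break
            step /= 2.0
        if candidate is None:
            return theta                         # no increasing step found; stop here
        gain = loglik(candidate) - current
        theta = candidate
        if gain <= tol:                          # converged
            return theta
    return theta


# Toy concave log-likelihood with maximum at 0; a full Newton step from 2.0 overshoots.
def toy_loglik(t: float) -> float:
    return -math.log(math.cosh(t))


def toy_grad(t: float) -> float:
    return -math.tanh(t)


def toy_hess(t: float) -> float:
    return -1.0 / math.cosh(t) ** 2


theta_hat = damped_newton_ascent(toy_loglik, toy_grad, toy_hess, theta=2.0)
print(theta_hat)
```

In this toy case the first two full steps are rejected and shortened, after which full steps are accepted and the iterates approach the maximum quickly.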

In cases in which the complete-data logarithm likelihood function does not allow for closed form solution of parameter estimators,

Genome-wide significance level and model selection

Score-based threshold

We extend the score statistic

Under some regularity conditions, the score and LRT statistics are asymptotically equivalent in large samples

In multiple trait genome-wide scan, a putative pleiotropic QTL is assumed at every position ** ζ** and the significance level of its effects (main or epistatic effects) is tested against the null of no effects. For instance, assume a model with

The score statistic to test H_{0} vs H_{1} can be written as

where
** η** under H

In order to maintain equal expected variances in the resampled score and score statistic
_{i} from the univariate normal distribution with mean zero and unit variance, i.e. _{i} ∼

1. generate _{i} (

2. for each

3. repeat

4. the score-based threshold for a given significance
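The resampling steps above can be sketched in a deliberately simplified form. The sketch assumes a single effect parameter per genomic position, takes the per-subject score contributions under H_{0} as given, and standardizes each resampled score by the sum of squared contributions; the multivariate statistic of the MTMIM model is not reproduced. The names `resampled_max_statistics`, `score_threshold` and the toy input are hypothetical.

```python
import math
import random
from typing import List


def resampled_max_statistics(scores: List[List[float]], n_resamples: int,
                             seed: int = 0) -> List[float]:
    """scores[i][k] is the score contribution of subject i at position k under H0.
    Each resample multiplies every subject's contribution by an independent
    N(0, 1) variate and records the genome-wide maximum standardized statistic."""
    rng = random.Random(seed)
    n_pos = len(scores[0])
    info = [sum(row[k] ** 2 for row in scores) for k in range(n_pos)]
    maxima = []
    for _ in range(n_resamples):
        z = [rng.gauss(0.0, 1.0) for _ in scores]
        best = 0.0
        for k in range(n_pos):
            if info[k] == 0.0:
                continue  # position carries no information
            u_star = sum(z_i * row[k] for z_i, row in zip(z, scores))
            best = max(best, u_star * u_star / info[k])
        maxima.append(best)
    return maxima


def score_threshold(maxima: List[float], alpha: float) -> float:
    """Empirical (1 - alpha) quantile of the resampled genome-wide maxima."""
    ordered = sorted(maxima)
    idx = min(len(ordered) - 1, max(0, math.ceil((1.0 - alpha) * len(ordered)) - 1))
    return ordered[idx]


# Toy input: random numbers standing in for score contributions.
rng = random.Random(1)
one_position = [[rng.gauss(0.0, 1.0)] for _ in range(200)]
ten_positions = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(100)]
thr_one = score_threshold(resampled_max_statistics(one_position, 2000, seed=2), 0.05)
thr_ten = score_threshold(resampled_max_statistics(ten_positions, 1000, seed=3), 0.05)
print(thr_one, thr_ten)
```

With one position the resampled statistic follows a chi-squared distribution with one degree of freedom, so the 5% threshold should fall near the familiar value of about 3.84; scanning more positions raises the threshold, which is the multiple-testing adjustment that the genome-wide threshold provides.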

If
_{i} in
**I**) The conditional distribution of
**II**) From **I**, it follows that the distributions of
**III**) From **II**, it is possible to approximate the distribution of ^{∗}(

Model selection

The search for QTL effects on phenotypic traits consists of identifying the subset of genomic regions for which statistical tests are significant.

The score-based threshold can be used as a criterion to build and refine models with many QTL. Starting with a model with no QTL effect, we can select putative QTL and refine the model by including effects in, or excluding effects from, the MTMIM model, all based on their statistical significance assessed via the score-based method. We propose an algorithm, analogous to the algorithm described in

Forward selection

Assuming that model (1) starts with no QTL, one QTL is added at each step of the forward selection. In the ^{th} step of the forward selection, we assume a putative pleiotropic QTL at every position **ζ** (one at a time), but avoiding positions within 5 cM neighboring regions of the

Model optimization

In turn, we update the positions of all QTL in the model. We pick a QTL and hold the other QTL fixed at the positions where they were mapped. The effects of the picked QTL are then removed from the model and a new search is done within the region delimited by its two neighboring QTL, avoiding 5 cM from each neighbor (the search is performed until the end of the chromosome if no neighbor QTL is found on either side of the picked QTL). The new position of the picked QTL is set to the position of the maximum LRT statistic within the searched region and all parameters in the model are updated. This procedure is repeated until the positions of all QTL are updated.

Some suitable hypotheses in the MTMIM model

Testing pleiotropic versus closely linked nonpleiotropic QTL

Although testing for pleiotropic versus closely linked nonpleiotropic QTL is a part of model selection, we treat it separately because this test is generally performed at the end of the model selection procedure, when the final model is nearly fitted.

As previously stated, an advantage of multiple trait analysis is the possibility of testing for a single locus affecting multiple traits versus the alternative of two or more closely linked nonpleiotropic loci. For instance, suppose we have measurements of two traits and a total of three nonepistatic QTL at positions _{1}, _{2} and _{3}. The multiple trait multiple QTL pleiotropic model for a subject

The model above assumes that all QTL have the same pattern of pleiotropy, but instead, suppose we want to test whether the last locus in model (5) is indeed two closely linked nonpleiotropic loci. The model with two pleiotropic (positions _{1} and _{2}) and two closely linked nonpleiotropic QTL (positions _{3} and _{4}) for a subject

Or, suppose we want to test whether the last two QTL in the model (6) are both pleiotropic. The model with four pleiotropic QTL for a subject

Many hypotheses can be formulated and tested, for example, the hypotheses of model (5) versus (6) can be stated as _{0} : _{3} = _{4} versus _{1} : _{3} ≠ _{4}, and the hypotheses of model (6) versus (7) can be stated as _{0} : _{14} = _{23} = 0 versus _{1} : _{14} ≠ 0 and _{23} ≠ 0. In general, testing whether QTL _{0} : _{tr} = 0 ∀ _{1} : _{tr} ≠ 0 for some _{0} : _{trl} = 0 ∀ _{1} : _{trl} ≠ 0 for some

When models are nested, the critical value to assess the strength of the LRT is straightforward, in the sense that under regularity conditions the LRT has an asymptotic chi-squared distribution with degrees of freedom equal to the difference between the number of parameters in the full and reduced models. However, the pleiotropic and close linkage models may not be nested (for instance, models (6) and (7)), which then requires some correction for the LRT

When a QTL has epistasis, testing this QTL for pleiotropy versus close linkage is not trivial because the test depends not only on the QTL being tested but also on any other QTL in the model that might interact with it. In general, we suggest searching for QTL main effects first, then testing for pleiotropy versus close linkage, and finally searching for epistasis. Once epistasis has been searched for, pleiotropy is either no longer tested or tested solely for those QTL without epistasis.

QTL by environment interaction

The possibility of testing for QTL by environment interaction arises as another advantage of the multiple trait analysis. There are two situations in which we are able to study the differential expression of QTL. First, when the same set of genotypes is evaluated phenotypically in different environments (design I), and second when the phenotypic evaluations are done in different sets of genotypes in different environments (design II)

Let us reiterate that in design I we regard the expression of a trait in different environments as different trait states
_{0} : _{tr} = _{r} ∀ _{1} : _{tr} ≠ _{r} for some _{0} : _{trl} = 0 ∀ _{1} : _{trl} ≠ 0 for some

The LRT may be used to evaluate the hypotheses above. The cut-off point for the test can be obtained from the chi-squared probability distribution function with degrees of freedom being the difference between the number of parameters in the full (H_{1}) and reduced (H_{0}) models.
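As an illustration of this cut-off, the sketch below computes the LRT statistic, its degrees of freedom and the chi-squared upper-tail probability using only the Python standard library. The function names and the numerical values in the usage line are hypothetical. The tail probability uses the exact finite-sum expressions that hold for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y^j / j!.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 and add one term for each extra pair of df.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def lrt_test(loglik_full, loglik_reduced, n_par_full, n_par_reduced):
    """Return (LRT statistic, degrees of freedom, p-value) for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    df = n_par_full - n_par_reduced
    return stat, df, chi2_sf(stat, df)

# Hypothetical logarithm likelihoods and parameter counts.
stat, df, p_value = lrt_test(-1200.0, -1204.5, 12, 9)
```

This reference distribution applies only when the reduced model is nested in the full model and the usual regularity conditions hold.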

Evaluation of the MTMIM model by simulation

We implemented the MTMIM model and score-based threshold method, and evaluated them with several simulated datasets. More specifically, we evaluated type I error, model fitting, and the efficiency of the test of pleiotropic versus closely linked nonpleiotropic QTL under the MTMIM model.

Genome-wide type I error

We use simulation to evaluate the proportion of falsely discovered QTL (type I error) in the analysis of datasets simulated without QTL effects. The LRT statistic is used for hypothesis testing and the score-based threshold is used as the criterion to assess the significance level of QTL effects in a genome-wide scan. Each replicate has six chromosomes, each with nine markers evenly spaced 10 cM apart, 300 subjects, and three quantitative traits (see Scenario S0 in Table

Simulated genetic architecture of traits T1, T2, and T3, as dictated by QTL Q1, Q2, Q3, Q4, and Q5.

| | | | | **Effects of each QTL**^{d} | | | | | **∑**^{f} | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **Scenario**^{a} | | **h**^{2}^{b} | **u**^{c} | **Q1** | **Q2** | **Q3** | **Q4** | **Q5** | **T1** | **T2** | **T3** |
| S0 | T1 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 1 | 0.2 | 0 |
| | T2 | 0 | 35 | 0 | 0 | 0 | 0 | 0 | 0.2 | 1 | -0.2 |
| | T3 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | -0.2 | 1 |
| SI | T1 | 25 | 30 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 1 | 0.2 | 0 |
| | T2 | 25 | 35 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 0.2 | 1 | -0.2 |
| | T3 | 25 | 30 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 0 | -0.2 | 1 |
| | Chr. | – | – | 1 | 2 | 3 | 5 | 6 | – | – | – |
| | Position^{e} | – | – | 23 | 15 | 45 | 67 | 53 | – | – | – |
| SII | T1 | 25 | 30 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 1 | 0.2 | 0 |
| | T2 | 18 | 35 | 0 | 0.54 | 0.54 | 0.54 | 0 | 0.2 | 1 | -0.2 |
| | T3 | 5 | 30 | 0 | 0 | 0.46 | 0 | 0 | 0 | -0.2 | 1 |
| | Chr. | – | – | 1 | 2 | 3 | 5 | 6 | – | – | – |
| | Position | – | – | 23 | 15 | 45 | 67 | 53 | – | – | – |
| SIII | T1 | 18 | 30 | 0.54 | 0 | 0.54 | 0 | 0.54 | 1 | 0.2 | – |
| | T2 | 18 | 35 | 0 | 0.54 | 0.54 | 0.54 | 0 | 0.2 | 1 | – |
| | Chr. | – | – | 1 | 1 | 3 | 6 | 6 | – | – | – |
| | Position | – | – | 23 | 33 | 45 | 38 | 53 | – | – | – |

^{a} Scenario S0 is for type I error evaluation. Scenarios SI, SII and SIII are for model fitting evaluations.

^{b} Heritability (%) due to all QTL affecting a trait.

^{c} General mean of each trait.

^{d} Main effect of QTL. The percentage of phenotypic variation of each trait due to each QTL is 5%.

^{e} Position, in cM, of the QTL from the leftmost marker in the chromosome (Chr).

^{f} Residual variance-covariance matrix.
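As a consistency check on the effect sizes and heritabilities listed in the table, the sketch below recomputes the heritability of a trait from its QTL effects. It rests on assumptions that the table does not state: a backcross population with 1:1 segregation, unlinked QTL without epistasis, each effect defined as the difference between the two genotype means (so that a QTL with effect a contributes a²/4 to the variance), and unit residual variance. The function name `heritability` is hypothetical.

```python
def heritability(effects, residual_var=1.0):
    """Fraction of phenotypic variance due to QTL (backcross, unlinked QTL)."""
    genetic_var = sum(a * a / 4.0 for a in effects)
    return genetic_var / (genetic_var + residual_var)

h2_five_qtl = heritability([0.52] * 5)    # trait affected by five QTL
h2_three_qtl = heritability([0.54] * 3)   # trait affected by three QTL
h2_one_qtl = heritability([0.46])         # trait affected by one QTL
```

Under these assumptions the three values round to 25%, 18% and 5%, in line with the heritabilities given for traits affected by five, three and one QTL.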

Model fit evaluations

We use simulation to evaluate the overall performance of the MTMIM model and score-based threshold as the criterion to assess the significance level of QTL effects in the genome-wide scan. We examined the performance of the MTMIM in three different scenarios (SI, SII and SIII shown in Table

The general goal of each simulated scenario is as follows: (SI) with a basic and favorable situation, we want to evaluate basic properties of the MTMIM model; (SII) with a mixture of QTL affecting one, two and three traits, we want to evaluate how well the MTMIM model handles the estimation of QTL with effects on only a subset of traits; (SIII) with the presence of closely linked nonpleiotropic QTL and a pleiotropic QTL, we want to evaluate the MTMIM model under a more complex genetic architecture. In SIII, we build an MTMIM model for each replicate using the forward selection without testing for pleiotropic versus closely linked nonpleiotropic QTL. Each MTMIM model built in the forward selection was then refined with a follow-up test of pleiotropic versus closely linked nonpleiotropic QTL. The pleiotropic versus closely linked nonpleiotropic test was carried out for every pleiotropic QTL in the MTMIM model.

We evaluated the MTMIM model under three genome-wide significance levels: 1, 5 and 10%. For each replicate, all QTL selected in the forward selection are defined as **mapped** QTL. We summarize the performance of the MTMIM model with measures that are function of the logarithm of odds ratio (LOD) support interval of mapped QTL. The LOD-_{r}, for **paired** with a mapped QTL if the simulated and mapped QTL are nearby. A mapped QTL is defined as being **matched** to a paired QTL if the LOD-**mismatched** if it is not matched. A simulated QTL _{r} is defined as **identified** if it has a matched QTL. For each simulated _{r} and for each _{r} is identified. We define
_{b}(_{r}, Power(_{r}_{r}, _{r}_{r} is paired with a mapped QTL; (5) _{r}, which is the average length of LOD-_{r} over replicates in
_{r}, which is the average effects of _{r} over replicates in
_{r}, which is the average positions of _{r} over replicates in
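Since several of these measures depend on LOD support intervals, a minimal sketch of how such an interval can be read off a LOD profile is given below. It assumes the conventional definition, namely the contiguous region around the peak in which the LOD score stays within a chosen drop of its maximum. The function names, the grid of positions and the drop value used in the example are hypothetical.

```python
import math

def lrt_to_lod(lrt):
    """Convert a likelihood ratio test statistic to a LOD score."""
    return lrt / (2.0 * math.log(10.0))

def lod_support_interval(positions, lod, drop):
    """Return (left, right) positions of the LOD-drop support interval."""
    peak = max(range(len(lod)), key=lod.__getitem__)
    cutoff = lod[peak] - drop
    left = peak
    while left > 0 and lod[left - 1] >= cutoff:
        left -= 1                         # extend leftwards while above cutoff
    right = peak
    while right < len(lod) - 1 and lod[right + 1] >= cutoff:
        right += 1                        # extend rightwards while above cutoff
    return positions[left], positions[right]

# Toy LOD profile on a 10 cM grid (made-up numbers).
interval = lod_support_interval([0, 10, 20, 30, 40],
                                [0.5, 2.0, 4.0, 3.0, 1.0], drop=1.5)
```

On a finer grid the interval ends would normally be interpolated between grid points; the sketch reports grid positions only.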

Appendix

Parameter estimation

Expectation-conditional maximization algorithm

Let
^{tℎ} subject has genotype _{j} (j = 1,2,· · ·,2^{m} ), otherwise

where
_{i} with mean vector
_{e} . The joint distribution of observed and missing data allows us to obtain the complete-data logarithm likelihood function (_{c}):

The E-step requires computation of the expectation of the complete-data logarithm likelihood function, conditional on the observed data ** θ** (denoted here as

where

The CM-step consists of maximizing the expected complete logarithm likelihood function with respect to the unknown parameters through derivatives (see Section Derivatives).
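The E-step can be illustrated with the usual posterior probability calculation for a normal mixture. The sketch assumes that each multilocus genotype has its own mean vector, that all genotypes share one residual variance-covariance matrix, and that prior genotype probabilities conditional on markers are given. The function name `e_step` and the array layout are assumptions made for illustration and are not taken from the MTMIM software.

```python
import numpy as np

def e_step(y, prior, means, cov):
    """Posterior probability of each genotype for each subject.

    y:     (n, T) phenotypes for n subjects and T traits.
    prior: (n, G) prior probabilities of the G genotypes given markers.
    means: (G, T) mean vector of each genotype.
    cov:   (T, T) residual variance-covariance matrix shared by all genotypes.
    """
    cov_inv = np.linalg.inv(cov)
    diff = y[:, None, :] - means[None, :, :]              # (n, G, T)
    maha = np.einsum("ngt,tu,ngu->ng", diff, cov_inv, diff)
    with np.errstate(divide="ignore"):                    # allow zero priors
        log_w = np.log(prior) - 0.5 * maha                # shared constant cancels
    log_w -= log_w.max(axis=1, keepdims=True)             # numerical stability
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

# Two subjects, two traits, two genotypes (toy numbers).
posterior = e_step(np.array([[0.0, 0.0], [4.0, 4.0]]),
                   np.full((2, 2), 0.5),
                   np.array([[0.0, 0.0], [4.0, 4.0]]),
                   np.eye(2))
```

These posterior probabilities are the weights that enter the expected complete-data logarithm likelihood maximized in the CM-step.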

Newton-Raphson method

The NR updating formula for parameter estimation

The NR method is not very stable for complex functions because, in certain problems, it requires accurate initial values of the parameters in order to converge properly. Moreover, the NR method is almost equally likely to move in the direction of saddle points, local minima or local maxima
** θ**,

Generalized EM-Newton-Raphson method

By introducing a step-size ^{ (v)} (0 <^{ (v)} ≤ 1) and by having the incomplete-data logarithm likelihood function (_{c}) in the updating NR formula (8), a modified version of the updating equation

The advantage of using the modified version of the updating equation is that an appropriate choice of ^{(v)} guarantees that the logarithm likelihood function increases at each iteration. The negative of the matrix of second order derivatives is positive definite under usual conditions. Therefore, its inverse has the Cholesky decomposition (10), where ** c** is an upper triangular matrix.

Let ^{(ξ)} be a point in the line segment from ^{(v)} to ^{(v + 1)}. The Taylor expansion of the complete-data logarithm likelihood function around ^{(v)} is:

Plugging ^{(v)} from (2) into (11), and after some algebra using (10), we obtain:

where

and ** I** is an identity matrix. From (12), we can see that so long as

Derivatives

We provide analytical formulae of the first and second order derivatives of the logarithm of individual and overall likelihood functions of data under the MTMIM model. We borrowed useful ideas from

Auxiliary matrices

We assume ^{m} .

_{uℓ} is a _{[u,ℓ]} and _{[ℓ,u]}, and zero elsewhere ** I** is a

First order derivatives of the logarithm of the individual likelihood function

In the following equations we use a short-hand notation _{i}(** θ**) =

Second order derivatives of the logarithm of the overall likelihood function

In the following equations we use a short-hand notation ** θ**) =

First and second order derivatives of the expected complete-data logarithm likelihood function

Given current estimated values of
^{(v)}, the first and second order derivatives of the expected complete-data logarithm likelihood function are shown below. We assume

Extension to other crosses

The extension of the score statistic to other cross types (for instance, intercross F_{2}, recombinant inbred lines, doubled haploids) is straightforward. In fact, the auxiliary matrices and the expressions for the first and second order derivatives of the logarithm of the individual and overall likelihood functions can be obtained directly from the general expressions derived previously. For a specific cross type, the extension consists basically of building an appropriate design matrix and replacing 2^{m} in the summations by the appropriate value for that cross type (for instance, 3^{m} for intercross F_{2}).
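The change in the number of terms in the summations can be made concrete by enumerating the multilocus genotypes for m QTL. In this sketch the dictionary keys, the integer genotype labels and the function name are hypothetical, and the 2^{m} case is assumed to correspond to two genotype classes per locus.

```python
from itertools import product

# Genotype classes per locus for each cross type (labels are illustrative).
GENOTYPES_PER_LOCUS = {"backcross": 2, "F2": 3, "RIL": 2, "DH": 2}

def multilocus_genotypes(m, cross):
    """List every multilocus genotype for m QTL as a tuple of integer codes."""
    levels = GENOTYPES_PER_LOCUS[cross]
    return list(product(range(levels), repeat=m))

n_backcross = len(multilocus_genotypes(3, "backcross"))   # 2**3 genotypes
n_f2 = len(multilocus_genotypes(3, "F2"))                 # 3**3 genotypes
```

Each tuple would then be mapped to one row of the design matrix for the chosen cross type.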

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LDCES derived the analytical equations, wrote computer code, carried out the simulations and data analysis, summarized and interpreted the results, and wrote the first draft of the manuscript. ZBZ provided some initial results of multiple trait analysis, gave intellectual support by critiquing the derivation and implementation of methods, and helped draft the manuscript. SW implemented the MTMIM method in the Windows QTL Cartographer software. All authors approved this manuscript.

Acknowledgements

The authors wish to thank the editor Rongling Wu and the two anonymous reviewers for their valuable comments that improved the presentation of this paper. This work was carried out while L.D.C. E Silva was a Ph.D. candidate in Statistics at the North Carolina State University, with a joint fellowship from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES - Brazil) and Fulbright.