Abstract
Background
Quantitative traits often underlie risk for complex diseases. For example, weight and body mass index (BMI) underlie the human abdominal obesitymetabolic syndrome. Many attempts have been made to identify quantitative trait loci (QTL) over the past decade, including association studies. However, a single QTL is often capable of affecting multiple traits, a quality known as gene pleiotropy. Gene pleiotropy may therefore cause a loss of power in association studies focused only on a single trait, whether based on single or multiple markers.
Results
We propose using principalcomponentbased multivariate regression (PCBMR) to test for gene pleiotropy with comprehensive evaluation. This method generates one or more independent canonical variables based on the principal components of original traits and conducts a multivariate regression to test for association with these new variables. Systematic simulation studies have shown that PCBMR has great power. PCBMRbased pleiotropic association studies of abdominal obesitymetabolic syndrome and its possible linkage to chromosomal band 3q27 identified 11 susceptibility genes with significant associations. Whereas some of these genes had been previously reported to be associated with metabolic traits, others had never been identified as metabolismassociated genes.
Conclusions
PCBMR is a computationally efficient and powerful test for gene pleiotropy. Application of PCBMR to abdominal obesitymetabolic syndrome indicated the existence of gene pleiotropy affecting this syndrome.
Background
Quantitative traits often underlie increased risk for complex diseases. To understand the genetic basis of such traits, each trait is often separately tested for association with one or more markers. This approach has two disadvantages: 1) independent tests of each trait may lead to issues related to multiple testing; and 2) if a locus affects two or more traits, a singletrait study may lose the power to detect a pleiotropic effect, where a single gene influences multiple phenotypic traits.
In the past decade, simultaneous analysis of multiple traits in the context of linkage mapping of quantitative trait loci (QTL) has attracted much attention. Three approaches to simultaneous analysis have been developed and broadly applied, the first of which is generalization of maximum likelihood (ML) [1,2]. Although this method can be applied to multiple traits, a large number of correlated traits requires the simultaneous estimation of too many parameters, restraining its practical use [3]. The second approach, first proposed by Haley & Knott, is multivariate regression [47]. This approach is computationally faster than maximum likelihood and is available in most statistical software packages. But as with the ML method, the requirement for simultaneous estimates of a large number of parameters may limit its application. The third approach is based on transformation of original traits to a reduced number of canonical variables [3,8]. This approach is often implemented in two steps. First, principal components of original traits are identified to generate canonical variables. Next, a classical single trait method is used as the test of linkage between a candidate locus and a canonical variable. The test is then repeated for each combination of locus and variable and is corrected for multiple testing.
The resolution of QTL linkage mapping is generally low (typically ≥ 10 cM) [9]. Thus, a QTL linked to multiple traits may be a single QTL with pleiotropy or different QTLs within the mapping region that affect different traits. Association studies, in contrast, have much higher resolution, and are more feasible for identifying gene pleiotropy. Lange [10] proposed a familybased association method that constructs an overall phenotype by finding a linear combination of traits to maximize heritability. Klei [11] extended this method to population samples. Both methods use principal components, reducing multiple phenotypes to only a single trait, which can cause loss of power. In addition, maximization of heritability and association testing in the same samples may inflate type I error. To address this issue, Klei [11] proposed to split the sample into training and testing data and apply crossvalidation to control error inflation, but this further increases computational complexity. In contrast to reduction of phenotypes, direct multivariate regression examines pleiotropy by simultaneous analysis of multiple phenotypes [12].
In this study, we propose to integrate two common methods that test for association by analyzing multiple traits simultaneously: principal components and multivariate regression. However, there are no comprehensive evaluations of this principalcomponentbased multivariate regression (PCBMR). In our study, we comprehensively evaluated the power and type I error of PCBMR using simulations that varied pleiotropic effects, linkage disequilibrium (LD), proportion of contributed correlation, and number of traits. We also used PCBMR to examine the pleiotropic effects of multiple traits on human abdominal obesitymetabolic syndrome.
Human abdominal obesitymetabolic syndrome [13], a cluster of syndrome phenotypes, increases the risk of developing both diabetes mellitus [14] and cardiovascular disease [15,16]. The prevalence of metabolic syndrome varies with age and sex [17]. Kissebah [18] performed a genomewide linkage scan with a marker density of 10 cM in 2,209 individuals from 507 Caucasian families. They found one QTL, on chromosome 3q27, that was strongly linked to six phenotypes: body mass index (BMI), waist circumference (WC), hip circumference (HC), weight, insulin, and insulin/glucose (I/G). The results indicated possible pleiotropic effects. Francke replicated this result, finding the same locus on 3q27 through a genomewide linkage scan of 99 families of northeastern Indian origin [19]. Here, we attempted to identify markers on 3q27 that are associated with the six traits above by using PCBMR to analyze data from the Bogalusa Heart Study [20].
Results
Simulation 1, differences in extent of QTL pleiotropic effect
The correlation coefficients between traits Y_{1 }and Y_{2 }varied from 0.35 to 0.37, with means for traits increasing as effect b increases. PCBMR generated two canonical variables for all simulated data. Power and type I error for the PCBMR and singletrait association studies are summarized in Table 1. When b = 0, the QTL had no effect on Y_{1 }and Y_{2 }and the type I errors were 4.5%5.6% for PCBMR, 4.9%6.1% for singletrait association without Bonferroni adjustment (SATN), and 2.8%3.0% for singletrait association with Bonferroni adjustment (SATB) for the different models (GEN, ADD, DOM, and REC). Power depends on the assumption of genetic model, with power in decreasing order for ADD, DOM, GEN, and REC. For each model, the following results were obtained: 1) power generally increased in PCBMR, SATN, and SATB as effect b got larger, and PCBMR generally had more power than SATB and SATN; 2) the binomial exact test showed that PCBMR was significantly more powerful than SATB for all b > 0 (results not shown), and more powerful than SATN for b > 0.2 (marked with star); and 3) for b ≤ 0.2, there was no significant power difference between PCBMR and SATN.
Table 1. Type 1 error and power of data sets of simulation 1
Simulation 2, differences in extent of LD between a marker and pleiotropic QTL
Correlation coefficients between Y_{1 }and Y_{2 }varied from 0.25 to 0.32, and two canonical variables were generated by PCBMR for pleiotropic association studies. Power and type I error for PCBMR, SATN, and SATB are presented in Table 2. Correlation coefficients (r) between tested markers and QTLs ranged from 0 to 1.0. A correlation of r = 0 indicated that the tested marker and the QTL were independent and that there was no association between them. Under the differing assumptions in different genetic models (GEN, ADD, DOM, and REC), type I error was 4.2%5.8% for PCBMR, 5.3%5.8% for SATN, and 2.5%3.1% for SATB. Power depended on the assumptions of the genetic models, and ADD, DOM, GEN, and REC had powers in decreasing order for all methods. For each model, the following results were obtained: 1) the powers of PCBMR, SATN, and SATB increased as r became larger; 2) according to the binomial exact test, PCBMR had significantly greater power than SATB (results not shown) for all r > 0, and significantly greater power than SATN when r > 0.2 in all but the REC model (marked with star); and 3) for r ≤ 0.2, there was no significant power difference between PCBMR and SATN.
Table 2. Type 1 error and power of data sets of simulation 2
Simulation 3, trait correlation between effects of two QTL and an environmental variable
The correlation coefficients between simulated traits Y_{1 }and Y_{2 }were ≥0.98. Based on this, PCBMR generated a single canonical variable for the pleiotropic association test. The tested QTL exerted a simulated effect b from 0 to 4, and based on equation 3, the percentage of trait correlation contributed by the QTL, P_{ρ}(b), ranged from 0 to 20%. The type I error and power related to P_{ρ}(b) for different methods are summarized in Table 3. A result of b = 0 (or P_{ρ}(b) = 0) indicates that the tested QTL had no pleiotropic effect on the simulated traits. The type I error was 3.9%5.6% for PCBMR, 4.2%5.7% for SATN, and 2.4%3.3% for SATB under the four genetic models, GEN, ADD, DOM, and REC. Power depended on the assumptions of the genetic models, with power in decreasing order for ADD, DOM, GEN, and REC. All methods increased in power as P_{ρ}(b) increased. When b = 0.5 (or P_{ρ}(b) = 0.4%), power was small for all three analytical methods; 6.4%8.8% for PCBMR, 6.1%8.4% for SATN, and 3.3%4.7% for SATB. When b equaled 3.5 and 4 (P_{ρ}(b) = 15.7% and 19.5%), all methods had power close to 1 under various genetic models, except recessive ones. For b > 0, the binomial exact test showed that PCBMR was not significantly different in power from SATN, but was significantly more powerful than SATB.
Table 3. Type 1 error and power of data sets of simulation 3
Simulation 4, pleiotropic effects on more than two traits
Under this simulation strategy, the number of traits affected by the QTL ranged from 2 to 10. The correlation coefficients between any pair of simulated traits were all ≥0.97 and the expected percentage of correlation contributed by the tested QTL was 8.7%. For all numbers of traits, PCBMR generated one canonical variable for the association test. Results are presented in Table 4. Power depended on genetic model assumptions, with power decreasing in order among ADD, DOM, GEN, and REC. For different numbers of traits and different genetic model assumptions, the power of PCBMR was consistently close to that of SATN, with no significant difference detected by the binomial exact test. Power was approximately equal for different numbers of traits as well. The power of SATB decreased dramatically as the number of traits increased. Compared with SATB, PCBMR had significantly improved power, especially with larger numbers of traits.
Table 4. Type 1 error and power of data sets of simulation 4
Pleiotropic Association Studies of Traits of Abdominal ObesityMetabolic Syndrome
A total of 1,196 subjects with 5,529 SNPs in the candidate region of chromosome 3 (at 182227cM or 173.4198.8 Mb) made up the study population. Quality control measures included the removal of SNPs with minor allele frequencies of ≤0.01 and HardyWeinberg equilibrium pvalues of ≤1e^{5}, leaving 4,769 SNPs in the study. The characteristics of the study participants are summarized in Table 5 for both males and females, as follows: age (AGE, in years), weight circumference (WEIGHT, in kg), waist circumference (WAIST, in cm), body mass index (BMI, in kg/m^{2}), hip circumference (HIP, in cm), plasma insulin level (INSULIN, in μU/mL) and plasma insulin/glucose ratio (I/G). The pairwise correlation coefficients (r) among adjusted traits are presented in Table 6. The correlations clustered into two groups, with the first group comprised of WEIGHT, BMI, WAIST, and HIP (r ≥ 0.89) and the second group comprised of INSULIN and I/G (r = 0.97).
Table 5. Characteristics of study participants
Table 6. Pairwise correlation coefficients, r, between adjusted traits
The results of the PCBMR pleiotropic association studies based on the GEN model are presented in Figures 1 and 2. Markers with significant pvalues (≤1e^{5}) are summarized in Tables 7 and 8. For these markers, analyses based on recessive, dominant and additive models were conducted, and the best genetic model and its pvalue were documented.
Figure 1. Pleiotropic association study of WEIGHT, HIP, BMI and WAIST based on general model by PCMBA on the candidate region, 182227cM of Chromosome 3. There are 4769 total SNPs. The x axis is the SNP position and y axis is negative logarithm of pvalue, i.e. log (P).
Figure 2. Pleiotropic association study of INSULIN and I/G by PCMBA based on general model on the candidate region, 182227cM of Chromosome 3. There are total 4769 SNPs. The x axis is the SNP position and y axis is negative logarithm of pvalue, i.e. log (P).
Table 7. Significant pleiotropic association with WEIGHT, HIP, BMI and WAIST
Table 8. Significant pleiotropic association with INSULIN and I/G
For the first trait group of WEIGHT, BMI, WAIST, and HIP, PCBMR generated a single canonical variable that explained 94.1% of the variance. With Bonferroni adjustment, PCBMR using the GEN model found four SNPs with significant pleiotropic association (p <1e^{5}) (Figure 1). Among these, SNP rs11721044 at 174.6 Mb and rs11926347 at 185.2 Mb were located in genes NLGN1 (OMIM 600568) and ABCC5 (OMIM 60521), respectively (Table 7).
For the second trait group of INSULIN and I/G, PCBMR also generated a single canonical variable, and this variable explained 98.6% of the variance. Using the GEN model, thirtyfour SNPs passed Bonferroni significance level (Figure 2), of which 17 were found within 11 genes. SNP rs11926347, in an intron of ABCC5 (OMIM 60521), and SNP rs6795506, near the 5' end of AHSG (OMIM 138680), had extremely small pvalues (Table 8). Among the other nine genes, ADIPOQ (OMIM 605441, 612556) has been widely reported to be associated with obesity and diabetes [2124]; FNDC3B (OMIM 611909) is involved in positive regulation of adipogenesis [25]; and DGKG (OMIM 601854) and AHSG (OMIM 138680) have been reported to be associated with obesityrelated metabolic traits [26,27]. The remaining genes have no reported relation to obesityrelated metabolic traits based on our literature review.
SNP rs11926347 in ABCC5 showed significant pleiotropic association in both groups and the pvalue was extremely small in the second group of traits (log(P) = 109.86). To validate these PCBMR results, this SNP was extracted for further study. The SNP's phenotype distribution, divided by genotype, is presented in Table 9. Its alleles are 'A' and 'G' and the frequency of the minor allele 'A' is 0.02. The HardyWeinberg Equilibrium (HWE) exact test[28] yielded a pvalue of 0.37. Homozygotes for the minor allele ('A/A') exhibited only one extreme mean value for all six traits. Heterozygotes ('G/A') had smaller values than 'A/A' homozygotes but much larger values than homozygotes for the major allele ('G/G'). SATN analyses with adjustment for age and sex gave pvalues of ≤1.15*10^{5 }for all traits (results not shown). With allele A as a reference, we conducted an examination of pleiotropic effects for rs11926347 based on additive, dominant, and recessive models. The corresponding log_{10}(P) was 6.46, 5.92, and 2.12 for the first group's traits and 11.65, 4.74, and 110.37 for the second group's traits for additive, dominant, and recessive models, respectively. These results indicate that the additive model best suits the first trait group's association and the recessive model best suits the second trait group's association. After dropping the single homozygote, analyses based on different genetic models generated the same results. The significant association was absent in the second group's traits (log_{10}(p) = 1.14), but still present in the first group's traits (log_{10}(p) = 5.2). These results indicated that allele 'A' may be involved in pleiotropic association with metabolic syndrome and merits attention and inclusion in genetic studies of obesity in the future.
Table 9. Summary of rs11926347 in ABCC5
Discussion
Most current association studies have been based on single traitsingle marker or single traitmultiple marker tests. These kinds of studies lose power in identifying genes with pleiotropic effects. In some cases, genes with pleiotropy may be found by separately testing each trait. However, two major issues make this strategy not always appropriate. First, pleiotropic effects for each trait may be too weak to be identified. Second, multiple testing problems may either lower the power or inflate the type I error. It is therefore important to develop methods that can test for association by analyzing multiple traits simultaneously.
In this paper, we present the use of PCBMR as a method which detects pleiotropic effects by combining principal component methods and multivariate regression. PCBMR generates a set of independent canonical variables based on principal components. Each canonical variable is associated with multiple traits and the sum of all variables explains at least 80% of the variation. Analysis of canonical variables is simultaneously implemented by multivariate regression. The statistic of PCBMR is simply the sum of individual test statistics. PCBMR is computationally efficient and can be easily implemented by most statistical packages. This makes PCBMR fast and feasible not only for candidategene association studies but also for genomewide association studies (GWAS).
Comprehensive studies of simulated data have shown that PCBMR has wellcontrolled type I error, about 5%, when a tested marker has no pleiotropy (simulation 1 and 3) or exhibits linkage equilibrium to the pleiotropic QTL, in the case of pleiotropic tested markers (simulation 2). The power of PCBMR depends on the extent of the pleiotropic effect and on the LD of the QTL. Larger pleiotropic effects and higher LD result in larger power (simulation 1 and 2). When the trait correlation caused by pleiotropy was not strong (simulation 1), the number of canonical variables was the same as the number of traits and the power was reasonably high, even compared with SATN. When there were strong correlations among traits (simulation 4), the reduced number of variables resulted in fewer degrees of freedom for the PCBMR test, and the power of PCBMR was as high as SATN. However, SATN always has much higher type I error than PCBMR due to multiple testing. PCBMR was robust to conflicting effects from environmental factors or other, untested QTLs (simulation 3). In all cases, multitrait association analyses using PCBMR were much more powerful than multiple singletrait association analyses using SATB. For all tests, multiple traits simultaneously studied by PCBMR were compared with the single trait with the best power as determined by SATN and SATB. The present study showed that PCBMR is at least as powerful as SATN and more powerful than SATB under pleiotropy.
PCBMR has great extensibility. For equations (1) and (2), PCBMR can be extended to any distribution in the exponential family and the parameter θ can take any link function (e.g. logistic or log) that relates a mean, Z_{i, }to covariates [29]. The covariate can be a single variable for one marker or multiple variables for different markers. In addition, nongenetic factors with or without interaction terms can also serve as the covariate. The final statistic, approximating the χ^{2 }distribution, is again the simple sum of statistics from separate regressions of a canonical variable on single or multiple covariates.
Comparisons of power estimates among PCBMR, SATB, and SATN in this study were based on analyses of the simulated additive model. To verify these findings, we tried studies on both simulated dominant and recessive models, and the same conclusions were obtained  that pleiotropic association studies by PCBMR are more powerful than singletrait association studies by either SATN or SATB (results not shown here). In addition, influences of model mismatch were also observed. For example, we observed that a pleiotropic study based on an additive model sacrificed its power when the true model was dominant or recessive. In addition, we observed that all studies based on the general model have acceptable power. In contrast to the additive model, which assumes linear trends of genotypic effect, and the dominant and recessive models, which assume equal effects of two genotypes for an SNP, the general model aims to separately estimate the effect of each genotype without any restriction. Therefore, PCBMR based on the general model has the advantage of testing for a pleiotropic effect when a complex trait has no obvious Mendelian inheritance.
As a real example, PCBMR was applied to test association in a study of traitsweight, waist circumference, BMI, hip circumference, plasma insulin, and insulinglucose ratiosof abdominal obesitymetabolic syndrome in the Bogalusa Heart Study cohort. The traits were clustered into two groups based on two previously identified linkage peaks [18] and these two groups exhibited strong correlation. After multipletest adjustment, PCBMR successfully identified several SNPs associated with the traits, especially in the trait group of INSULIN and I/G. Some of the genes had been wellcharacterized in prior studies, e.g. FNDC3B, which is involved in adipogenesis [25]. However, the functions of most of the genes were not yet explicitly clear at the time of the analysis. For example, some genes (e.g. ABCC5) are known to be related to energy metabolism, but are they truly involved in obesitymetabolic syndrome? If they are, what are their functions? The results from the use of PCBMR in this study offer guidance for future researchers in understanding genetic mechanisms and pathways in the pathogenesis of this human disease.
Although this study illustrates many advantages of PCBMR, there are also some challenges to be faced in terms of practical application. In contrast to pleiotropic linkage studies that map a QTL to a large locus [30], PCBMRbased studies can provide a higher resolution QTL position. However, the association may not justify the true pleiotropy of the identified marker or gene. For example, when PCBMR identifies a significant association by studying multiple traits, we may not observe significant association with a particular trait. This may result from either a weak pleiotropic effect or no effect at all. Such differentiation is generally difficult to achieve by statistical analysis. Further experimental studies or repeated studies with larger sample size are therefore necessary to confirm that the association is due to pleiotropy. In addition, the power of PCBMR depends on the assumptions of the genetic model, and misuse of a model will decrease power. The number of canonical variables also depends on the threshold. A value of 0.8 is used in simulation studies to explain at least 80% of the variation. Although this threshold is widely accepted for principal component analysis and has been proven to be suitable in our simulation studies, the ideal threshold may depend on practical data, with the exact value generally not known in advance. Furthermore, pleiotropic association is based on canonical variables, and to get an exact estimate of the effect on an original trait, a reverse transformation needs to be conducted.
Another challenge is to decide which traits should be studied simultaneously by PCBMR. Some strategies may help to address this challenge. Candidate traits could be those related to each other in the same pathway leading to a disease or symptom. For example, greater weight and BMI are correlated with obesity. Candidate traits could also include traits with linkage to the same region, such as two groups of traits with linkage peaks in two separate loci, as found in our studies of abdominal obesitymetabolic syndrome. Nevertheless, it is possible that two traits without much correlation may be strongly affected by a common gene. For example, in our simulation 1, though the effect is strong at b = 1, the correlation coefficient (r) ranges from 0.35 to 0.37 with a mean of only about 0.10. In this case, selection of traits mainly depends on currently established knowledge.
PCA is an important tool for data mining that transforms a larger number of correlated variables into a smaller number of independent variables, i.e., principal components. Factor analysis (FA), another important analytical tool, identifies common factors that capture variancecovariance of multiple variables with random error. PCA, in contrast, identifies principle components, with the restriction that random error must be zero[31]. Therefore, FA could be better suited to the analysis of observed traits with measured errors and to tests of genetic pleiotropy in some cases. The PCAbased multivariate regression proposed in this study can be easily extended to FAbased regression for testing of genetic pleiotropy in these cases. This can be implemented by replacing principal components with common factors. However, without estimation of random error, PCA is more computationally efficient for analyses involving large amounts of genetic data, and has great advantages in terms of practical application[32]. For most cases, PCA and FA procedures typically yield highly similar results[32]. This was also the case in the present study; we conducted an additional FAbased multivariate regression analysis of pleiotropic association with metabolic traits, and the results were the same as those obtained by PCBMR (please see additional file 1). This is consistent with previous findings that PCA and FA behave similarly in tests of genetic pleiotropy[33].
Additional file 1. Factor analysisbased study of pleiotropic association. Table of significant pleiotropic association and figure of pvalues of SNPs in linkage region.
Format: DOC Size: 90KB Download file
This file can be viewed with: Microsoft Word Viewer
In spite of its potential challenges, PCBMR is a powerful and computationally efficient method of studying the huge amounts of genetic data generated by advanced technology, e.g. GWAS. For a large number of markers, we suggest a strategy of traditional singletrait studies on a candidate marker that PCBMR declares significant. This strategy can not only help to explain PCBMR results, but also has great advantages over traditional singletrait studies in alleviating multiple testing problems. Suppose there are N markers and m traits, and the experimental type I error is controlled at α. The significance level for tests of a marker in traditional singletrait studies is α/(N*M). This level is extremely small when both N and M are large. In contrast, for a candidate marker, the significance level for this strategy is α/(N+M). Generally, for most association studies and GWAS, M is much smaller than N, and the significance level will approximate α/N.
Conclusion
In summary, we propose the use of PCBMR, a computationally efficient method for the testing of gene pleiotropy. Although PCBMR is a combination of two established methods principal components and multivariate regressionwe are the first to comprehensively evaluate this technique in its combined form. The simulation studies described here indicate that this method is powerful for different kinds of pleiotropy. In spite of some challenges for its use in practical studies, PCBMR can greatly increase the power of association studies under pleiotropy and can broaden understanding of a gene's functions as well as its pathway and mechanisms. PCBMR is not only a useful method for candidategene based studies; as the generation of highthroughput expression data becomes increasingly efficient, PCBMR can be used to study pleiotropy in analyses of massive amounts of data, such as GWAS.
Methods
Principal Component Based Multivariate Regression (PCBMR)
Given a set of traits, PCBMR uses the method of principal component analysis (PCA) [34,35] to construct one or more independent canonical variables based on a specific threshold (θ). Suppose Y = (Y_{1}, Y_{2},..., Y_{m}) represents variables of m traits. PCA searches for k principal components (k ≤ m), which is a new kdimensional coordinate system. Within each principal component a canonical variable is generated as a linear combination of the original m traits with maximized variance. The search can be simplified by using the decomposition of the covariance of Y. However, different units of trait measures may result in different decompositions. To overcome this issue, PCBMR standardizes original traits with mean 0 and sample variance 1. The standardized variable (Y^{s}) for trait Y is generalized by:
where μ is the mean of Y and V is a diagonal matrix with diagonal items equal to the variances of the corresponding traits. For Y^{S}, Cov(Y^{S}) = (V^{1/2})^{1}Cov(Y)(V^{1/2})^{1 }= ρ, so its covariance and correlation matrices are the same and ρ = ΓΛΓ^{T}, where Γ is the matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues.
PCA finds the weighting vector δ = (δ^{1}, ..., δ^{p})^{T }that maximizes the variance of canonical variable z = δ^{T}Y^{S }[36]. This can be expressed by:
δ is proved to be an eigenvector of ρ [36]. If we use z = [z_{1}, z_{2}, ..., z_{m}]^{T }representing m canonical variables, then z = Γ^{T}Y and Var(z) = Λ. The correlation between z_{i }and Y_{j}^{S }is (Γ_{ij}Λ_{jj})^{1/2}, and the sum of squares of correlations between all m canonical variances and any original trait is equal to 1, i.e. [36]. Therefore, a canonical variable z_{i }can explain a fraction of the variance for each Y_{j}^{S}, and any maker associated with z_{i }will indicate association with the original traits. Canonical variables with very low eigenvalues explain only a minuscule fraction of the variance of the original traits and can be deleted from the analysis [37]. PCBMR chooses the first k principal components to construct canonical variables that explain over 80% of variation.
Suppose z_{1}, z_{2}, ...,z_{k }have normal distributions with mean μ_{i }and variance σ_{i}^{2 }(i = 1,2,...,k). Since all canonical variables are mutually independent, their joint distribution that takes the general form of the exponential family is:
Where θ_{i }= μ_{i}, ϕ_{i }= σ_{i}^{2}, a(ϕ_{i}) = ϕ_{i}, b(θ_{i}) = θ_{i}^{2}/2 and c(z_{i}, ϕ_{i}) = [z_{i}^{2}/ϕ_{i}+log(2πϕ_{i})]/2 [29].
In multivariate regression, PCBMR takes the canonic link. The mean regression model is μ_{i }= Xβ_{i}+Wτ_{i}, where X and W are explanatory variables of tested markers and other controlled variables, respectively, and β_{i }and τ_{i }are their corresponding parameter vectors, which denoting effects on the μ, of the ith canonical variable. The null hypothesis of no association (H_{0}) is:
We define the full model as the one without restriction of H_{0 }and the nested model as the one with restriction of H_{0}. PCBMR uses the likelihood ratio test (LRT) for the goodness of fit between full and nested models. Suppose z_{ij }is the observed canonical variable z_{i }on jth subject (j = 1,2,...,N). The sample likelihood L(θ) based on equation (1) is:
The LRT statistic T is 2[logL()  logL()], where is the maximum likelihood estimate (MLE) of θ for the nested model and the MLE of θ for the full model. When the mean regression model, θ_{i }= μ_{i }= Xβ_{i}+Wτ_{i}, is input into equation (2), the T statistic is simplified to:
The mean estimates, and, are calculated by simple linear regression of z_{i }on [X W] and W respectively. and are deviances of the full and nested models, respectively, and is the estimate of dispersion, all of which can be calculated by almost all statistical packages. T_{i }is the χ^{2 }distributed LRT statistic for testing marker association with canonical trait z_{i }by simple linear regression. The sum of T_{i }also has a χ^{2 }distribution with degrees of freedom equal to the difference of parameter numbers between the full and the nested model. A large T causing rejection of H_{0 }indicates at least one β_{i }≠ 0 and the presence of association attributable to the pleiotropic effects of multiple markers.
Simulation Studies
The power of PCBMR may depend on many factors; some of these are: 1) the extent of the QTL pleiotropic effect; 2) the extent of LD between the tested marker and the pleiotropic QTL; 3) the portion of the trait correlation contributed by the tested QTL relative to the portion contributed by other QTL and environmental factors; and 4) the number of traits in the study. For each simulation, 1,000 datasets were generated. Type I error and power were calculated as percentages of the datasets, with pvalue ≤ 0.05. Without loss of generality, in the following design, the QTL is simulated with additive effects on different traits. Y_{1}, Y_{2}, ...Y_{k }are original QTL traits, U_{1}, U_{2}, ..., U_{k }are the population means of K traits, X is the genotype of pleiotropic QTL denoted by 0, 1 and 2, b_{1}, b_{2},..., b_{k }are additive effects, and E_{1}, E_{2}, ..., E_{k }are random errors.
Simulation 1, different extents of pleiotropic effects in QTL
The minor allele frequency of QTL is 0.2 (p = 0.2), and simple linear regression models, Y_{1 }= U_{1}+X*b_{1}+ E_{1 }and Y_{2 }= U_{2}+X*b_{2}+ E_{2}, are used to simulate traits Y_{1 }and Y_{2}. To simplify the simulation, we set U_{1 }= 0 and U_{2 }= 50, E_{1 }and E_{2 }to a normal distribution of mean 0 and standard deviation 2 (E_{1}~E_{2}~N(0, 2^{2})), and b_{1 }= b_{2 }= b with 11 different effects from 0 to 1.0 with steps of 0.1.
Simulation 2, different extents of LD between a marker and a pleiotropic QTL
In this situation, the QTL (p_{1 }= 0.2) is not known directly. Instead, a marker of minor allele frequency 0.2 (q_{1 }= 0.2) with LD to the QTL is genotyped for the test. Linear regression models, U_{1}, U_{2}, E_{1}, and E_{2 }are set as above. The additive effects of b_{1 }and b_{2 }are fixed at 1. LD was measured using a correlation coefficient (r) set between 0 and 1 with steps of 0.1. For a pair of alleles of a tested marker, denoted A_{1 }and A_{2}, and those of the QTL, denoted B_{1 }and B_{2}, the following equation was used to calculate the joint allele frequencies of the tested marker and QTL. Based on r, D is calculated as , and the joint allele frequencies of the tested marker and QTL are calculated as f(A_{1}B_{1}) = p_{1}q_{1}+D, f(A_{1}B_{2}) = p_{1}(1q_{1})D, f(A_{2}B_{1}) = (1p_{1})q_{1}D and f(A_{2}B_{2}) = (1p_{1})(1q_{1})+D [38]. Assuming HardyWeinberg Equilibrium (HWE) for both QTL and marker, we can infer frequencies of the tested marker genotypes for simulation, given the frequency of QTL genotypes, f(AiAjBi'Bj') (i, i', j, j' = 1,2).
Simulation 3, trait correlation based on the effects of two QTL and an environmental variable
Two linear regression models, Y_{1 }= U_{1}+X*b_{1}+Q*c_{1}+W*d_{1}+E_{1 }and Y_{2 }= U_{2}+X*b_{2}+Q*c_{2}+W*d_{2}+ E_{2}, were used to simulate traits Y_{1 }and Y_{2}, where U_{1 }= 0, U_{2 }= 50, and E_{1}~E_{2}~N(0, 0.5^{2}). The effects of b_{1 }= b_{2 }= b are from 0 to 4 with steps of 0.5. Q is the second QTL with pleiotropic effects c_{1 }= c_{2 }= 4. Both X and Q have minor allele frequencies of 0.2. W is an environmental covariate with a standard normal distribution N(0, 1) and effects d_{1 }= d_{2 }= 4. The correlation, ρ(Y_{1}, Y_{2}), between Y_{1 }and Y_{2}is:
The proportion of the correlation contributed by QTL X, P_{ρ}(b), is
so P_{ρ}(b) increases as b increases.
Simulation 4, pleiotropic effects on more than two traits
Based on the linear regression model, Y_{i }= U_{i}+X*b+Q*c+W*d+E (i = 1,2, ...10), 2 to 10 traits were separately simulated in each dataset. Without loss of generality, X, Q, and W were defined as above with b = 2.5, c = 4, and d = 4 set correspondingly. E is distributed as normal, N(0, 0.5^{2}), and U_{i }= (i1)*50 for i = 1, 2,..., 10.
Power and type I error were estimated for PCBMR under the four simulation conditions. For comparison, we conducted singletrait association studies using classical linear regression with (STAB) and without (SATN) Bonferroni adjustment. For singletrait association studies, only the trait with the largest power or type I error was presented in the paper. Based on different assumptions of the genetic models, there are four possible ways of processing the X variable for genotypes, which take values 0, 1 and 2: 1) X is treated as a factor with three levels for the general model (GEN) without assumption of any genetic inheritance; 2) X is a linear variable in the additive model (ADD); 3) X is 0 for genotypes 0 and 1, and is 1 for genotype 2 in the dominant model (DOM); and 4) X is 0 for genotype 1 and is 1 for genotypes 1 and 2 in the recessive model (REC). All four assumptions were considered separately for association tests by PCBMR and single trait regression.
Power comparison by binomial exact test
Without loss of generality, we created indicator variables M_{1 }and M_{2 }for methods 1 and 2, respectively, where method 1 is PCBMR and method 2 is either SATB or SATN. The value of the variables was 1 for a significant pvalue and 0 otherwise. Matched pairs of M_{1 }and M_{2 }were tested by the binomial exact test [39], based on the fact that Σ_{i}M_{1i}(Σ_{i}M_{1i}+Σ_{i}M_{2i}) = N_{m }has binomial distribution (N_{m}, p), i = 1, 2, ....1000 for the simulated data above. For Σ_{i}M_{1i }> N_{m}/2, the null and alternative hypotheses are p ≤ 0.5 and p > 0.5 respectively. Rejection of the null hypothesis indicates that method 1 is significantly more powerful than method 2. For Σ_{i}M_{1i }< N_{m}/2, the null and alternative hypotheses are p ≥ 0.5 and p < 0.5, respectively. Rejection of the null hypothesis indicates that method 1 is significantly less powerful than method 2. To strictly evaluate the power of PCBMR, we compared it to method 2 for the trait with the largest power.
Pleiotropic Association Studies of Abdominal ObesityMetabolic Syndrome
We applied PCBMR to search for markers associated with multiple traits related to abdominal obesitymetabolic syndrome in the Bogalusa Heart Study, a communitybased investigation of the evolution of cardiovascular disease risk beginning in childhood [20]. Based on previous studies [18], we focused our studies on six traits (body mass index (BMI), waist circumference (WAIST), hip circumference (HIP), weight (WEIGHT), insulin (INSULIN) and insulin/glucose (I/G)) and on chromosome 3 from 182227 cM (173.4198.8 Mb), which contains potential pleiotropic QTL [18,19]. The most recent measures were used for all subjects. SNP genotyping was performed using data from Illumina Human610 BeadChips. Only SNPs passing our quality control measures were included in the study. BMI, WAIST, HIP and WEIGHT traits have a linkage peak at 189190 cM, and insulin and I/G at 202203 cM [18]. Hence, associations with the multiple traits of BMI, WAIST, HIP, and WEIGHT and of INSULIN and I/G were separately studied by PCBMR. These traits may depend on sex and age. Instead of analyzing original traits directly, traits were regressed by sex and age according to the following formula: Y_{i }= U+AGE*b_{1}+AGE^{2}*b_{2}+SEX+E_{i}, where residuals (E_{i}) were used as adjusted traits for association studies by PCBMR. The inheritance model for markers underlying abdominal obesitymetabolic syndrome is generally not known before pleiotropic association tests. We thus applied a general model that estimates the effect of each possible genotype for an SNP association in the sample. After susceptibility markers were identified, different models (additive, dominant and recessive) with the minor allele as reference were also examined in the comparisons. The pvalue was adjusted for multiple tests by the Bonferroni method. The number of SNPs in the pleiotropic study was 4,769, so the significance level for testing an SNP association was 1e^{5}.
Authors' contributions
HM developed and implemented the method. HM and WC performed the simulations, analysis and interpretation of the data. All authors participated in planning and discussion of the study. All authors read and approved the final manuscript.
Acknowledgements
This study was supported by grants 0855082E and 0555168B from American Heart Association, AG16592 from the National Institute on Aging and HL38844 from the National Heart, Lung, Blood Institute.
ElectronicDatabase Information
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/
References

Jiang C, Zeng ZB: Multiple trait analysis of genetic mapping for quantitative trait loci.
Genetics 1995, 140(3):11111127. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Korol AB, Ronin YI, Kirzhner VM: Interval mapping of quantitative trait loci employing correlated trait complexes.
Genetics 1995, 140(3):11371147. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Mangin B, Thoquet P, Grimsley N: Pleiotropic QTL analysis.
Biometrics 1998, 54:8899. Publisher Full Text

Calinski T, Kaczmarek Z, Krajewski P, Frova C, SariGorla M: A multivariate approach to the problem of QTL localization.
Heredity 2000, 84(Pt 3):303310. PubMed Abstract  Publisher Full Text

Hackett CA, Meyer RC, Thomas WT: Multitrait QTL mapping in barley using multivariate regression.
Genet Res 2001, 77(1):95106. PubMed Abstract  Publisher Full Text

Knott SA, Haley CS: Multitrait least squares for quantitative trait loci detection.
Genetics 2000, 156(2):899911. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Korol AB, Ronin YI, Nevo E, Hayes PM: Multiinterval mapping of correlated trait complexes.
Heredity 1998, 80(3):273284. Publisher Full Text

Weller JI, Wiggans GR, Vanraden PM, Ron M: Application of a canonical transformation to detection of quantitative trait loci with the aid of genetic markers in a multitrait experiment.
Theor App Genet 1996, 92:9981002. Publisher Full Text

Mackay TF: The genetic architecture of quantitative traits.
Annu Rev Genet 2001, 35:303339. PubMed Abstract  Publisher Full Text

Lange C, van Steen K, Andrew T, Lyon H, DeMeo DL, Raby B, Murphy A, Silverman EK, MacGregor A, Weiss ST, et al.: A familybased association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects.
Stat Appl Genet Mol Biol 2004., 3
Article17
PubMed Abstract  Publisher Full Text 
Klei L, Luca D, Devlin B, Roeder K: Pleiotropy and principal components of heritability combine to increase power for association analysis.
Genet Epidemiol 2008, 32(1):919. PubMed Abstract  Publisher Full Text

Stich B, Piepho HP, Schulz B, Melchinger AE: Multitrait association mapping in sugar beet (Beta vulgaris L.).
Theor Appl Genet 2008, 117(6):947954. PubMed Abstract  Publisher Full Text

Bjorntorp P: Metabolic implications of body fat distribution.
Diabetes Care 1991, 14(12):11321143. PubMed Abstract  Publisher Full Text

Haffner SM, Valdez RA, Hazuda HP, Mitchell BD, Morales PA, Stern MP: Prospective analysis of the insulinresistance syndrome (syndrome X).
Diabetes 1992, 41(6):715722. PubMed Abstract  Publisher Full Text

Isomaa B, Almgren P, Tuomi T, Forsen B, Lahti K, Nissen M, Taskinen MR, Groop L: Cardiovascular morbidity and mortality associated with the metabolic syndrome.
Diabetes Care 2001, 24(4):683689. PubMed Abstract  Publisher Full Text

Srinivasan SR, Myers L, Berenson GS: Changes in metabolic syndrome variables since childhood in prehypertensive and hypertensive subjects: the Bogalusa Heart Study.
Hypertension 2006, 48(1):3339. PubMed Abstract  Publisher Full Text

Esposito K, Pontillo A, Giugliano F, Giugliano G, Marfella R, Nicoletti G, Giugliano D: Association of low interleukin10 levels with the metabolic syndrome in obese women.
J Clin Endocrinol Metab 2003, 88(3):10551058. PubMed Abstract  Publisher Full Text

Kissebah AH, Sonnenberg GE, Myklebust J, Goldstein M, Broman K, James RG, Marks JA, Krakower GR, Jacob HJ, Weber J, et al.: Quantitative trait loci on chromosomes 3 and 17 influence phenotypes of the metabolic syndrome.
Proc Natl Acad Sci USA 2000, 97(26):1447814483. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Francke S, Manraj M, Lacquemant C, Lecoeur C, Lepretre F, Passa P, Hebe A, Corset L, Yan SL, Lahmidi S, et al.: A genomewide scan for coronary heart disease suggests in IndoMauritians a susceptibility locus on chromosome 16p13 and replicates linkage with the metabolic syndrome on 3q27.
Hum Mol Genet 2001, 10(24):27512765. PubMed Abstract  Publisher Full Text

Pickoff AS, Berenson GS, Schlant RC: Introduction to the symposium celebrating the Bogalusa Heart Study.
Am J Med Sci 1995, 310(Suppl 1):S12. PubMed Abstract

Vasseur F, Helbecque N, Dina C, Lobbens S, Delannoy V, Gaget S, Boutin P, Vaxillaire M, Lepretre F, Dupont S, et al.: Singlenucleotide polymorphism haplotypes in the both proximal promoter and exon 3 of the APM1 gene modulate adipocytesecreted adiponectin hormone levels and contribute to the genetic risk for type 2 diabetes in French Caucasians.
Hum Mol Genet 2002, 11(21):26072614. PubMed Abstract  Publisher Full Text

Filippi E, Sentinelli F, Trischitta V, Romeo S, Arca M, Leonetti F, Di Mario U, Baroni MG: Association of the human adiponectin gene and insulin resistance.
Eur J Hum Genet 2004, 12(3):199205. PubMed Abstract  Publisher Full Text

Menzaghi C, Ercolino T, Di Paola R, Berg AH, Warram JH, Scherer PE, Trischitta V, Doria A: A haplotype at the adiponectin locus is associated with obesity and other features of the insulin resistance syndrome.
Diabetes 2002, 51(7):23062312. PubMed Abstract  Publisher Full Text

Vimaleswaran KS, Radha V, Ramya K, Babu HN, Savitha N, Roopa V, Monalisa D, Deepa R, Ghosh S, Majumder PP, et al.: A novel association of a polymorphism in the first intron of adiponectin gene with type 2 diabetes, obesity and hypoadiponectinemia in Asian Indians.
Hum Genet 2008, 123(6):599605. PubMed Abstract  Publisher Full Text

Tominaga K, Kondo C, Johmura Y, Nishizuka M, Imagawa M: The novel gene fad104, containing a fibronectin type III domain, has a significant role in adipogenesis.
FEBS Lett 2004, 577(12):4954. PubMed Abstract  Publisher Full Text

Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, Helgadottir A, Styrkarsdottir U, Gretarsdottir S, Thorlacius S, Jonsdottir I, et al.: Genomewide association yields new sequence variants at seven loci that associate with measures of obesity.
Nat Genet 2009, 41(1):1824. PubMed Abstract  Publisher Full Text

Andersen G, Burgdorf KS, Sparso T, BorchJohnsen K, Jorgensen T, Hansen T, Pedersen O: AHSG tag single nucleotide polymorphisms associate with type 2 diabetes and dyslipidemia: studies of metabolic traits in 7,683 white Danish subjects.
Diabetes 2008, 57(5):14271432. PubMed Abstract  Publisher Full Text

Emigh TH: A Comparison of Tests for HardyWeinberg Equilibrium.
Biometrics 1980, 36(4):627642. Publisher Full Text

Faraway JJ: Extending Linear Models with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Boca Raton: Chapman & Hall/CRC; 2006.

Gardner KM, Latta RG: Shared quantitative trait loci underlying the genetic correlation between continuous traits.
Mol Ecol 2007, 16(20):41954209. PubMed Abstract  Publisher Full Text

Larose DT: Data mining methods and models. Hoboken, New Jersey: John Wiley & Sons, Inc; 2006.

Velicer WF, Jackson DN: Component Analysis versus Common Factor Analysis: Some Issues in Selecting an Appropriate Procedure.

Wang X, Kammerer CM, Anderson S, Lu J, Feingold E: A comparison of principal component analysis and factor analysis strategies for uncovering pleiotropic factors.
Genet Epidemiol 2009, 33(4):325331. PubMed Abstract  Publisher Full Text

Hotelling H: Analysis of a complex of statistical variables into principal components.
Journal of Educational Psychology 1933, 24:417441. Publisher Full Text

Jolliffe IT: Principal Component Analysis. 2nd edition. New York: Springer; 2002.

Härdle W, Simar L: Applied Multivariate Statistical Analysis. 2nd edition. New York: Springer; 2007.

Mardia KV, Kent JT, Bibby JM: Multivariate Analysis. London: Academic Press; 1979.

Weir BS: Genetic Data Analysis 2: Methods for Discrete Population Genetic Data. 2nd edition. Sinauer Associates, Sunderland, MA; 1996.

Agresti A: Categorical Data Analysis. 2nd edition. New Jersey.: John Wiley & Sons, Inc; 2002.