Obesity has been shown to increase breast cancer risk. FTO is a novel gene which has been identified through genome wide association studies (GWAS) to be related to obesity. Our objective was to evaluate tissue expression of FTO in breast and the role of FTO SNPs in predicting breast cancer risk.
We performed a case-control study of 354 breast cancer cases and 364 controls. This study was conducted at Northwestern University. We examined the role of single nucleotide polymorphisms (SNPs) of intron 1 of FTO in breast cancer risk. We genotyped cases and controls for four SNPs: rs7206790, rs8047395, rs9939609 and rs1477196. We also evaluated tissue expression of FTO in normal and malignant breast tissue.
We found that all SNPs were significantly associated with breast cancer risk with rs1477196 showing the strongest association. We showed that FTO is expressed both in normal and malignant breast tissue. We found that FTO genotypes provided powerful classifiers to predict breast cancer risk and a model with epistatic interactions further improved the prediction accuracy with a receiver operating characteristic (ROC) curves of 0.68.
In conclusion we have shown a significant expression of FTO in malignant and normal breast tissue and that FTO SNPs in intron 1 are significantly associated with breast cancer risk. Furthermore, these FTO SNPs are powerful classifiers in predicting breast cancer risk.
Breast cancer is the most common malignancy in women in developed countries. In 2009 an estimated 194,280 new cases of breast cancer were diagnosed in the US. Several studies have associated obesity and weight gain with breast cancer risk[2,3] and there is evidence that weight loss, as well as, decrease in fat consumption may lead to decreased risk for breast cancer[4,5]. Diabetes mellitus (DM) has also been associated with breast cancer risk . This association between DM, obesity and breast cancer lead us to evaluate the role of genes, which have been found to be associated with diabetes and obesity, in predicting breast cancer risk.
The fat mass and obesity (FTO) associated gene was recently found in several genome wide association studies (GWAS) to be associated with obesity and type II DM[7-11]. More specifically single nucleotide polymorphisms (SNPs) in intron 1 of FTO, such as rs9939609 have been significantly associated with obesity[7-11]. Although its role remains essentially unclear, it seems that FTO encodes a 2-oxoglutarate-dependent nucleic acid demethylase. It is localized in the nucleus and is abundant in the brain. FTO has been shown to be expressed in other tissue as well including the pancreas, skeletal muscle and white adipose tissue although its expression in these tissues is significantly lower than its expression in the hypothalamus and the cerebellum.
Given the association between obesity and breast cancer [2,3] we elected to perform a case control study to evaluate the role of SNPs in intron 1 of FTO in predicting breast cancer risk. We selected SNPs in intron 1 because that was the region that was more strongly associated with obesity [8,13].
Our case-control study included 354 cases of breast cancer and 364 controls from Northwestern University in Chicago, Illinois. Cases and controls were frequency matched for age by 5 years, race, and sex. Race was included as a variable because breast cancer risk varies among various ethnic groups. Consecutive cases with a biopsy-confirmed diagnosis of breast adenocarcinoma were recruited from the medical oncology clinics affiliated with the Northwestern Medical Faculty Foundation from August 1, 2007, through October 28, 2008. Information on the subtype of breast cancer was obtained from the chart review and had been confirmed at Northwestern University. The response rate was 96% and a blood sample was obtained from each recruited patient. All cases signed an informed consent for genetic studies and the protocol was approved by the Institutional Board Review (IRB) of Northwestern University. Controls were chosen randomly from a total of 5578 patients without a diagnosis of cancer at the time of enrollment who were recruited between October 31, 2002, and December 31, 2007. Controls were matched for age within 5 years, sex, and ethnic status. Cases and controls were not matched by chronic disease status. Controls came from the NUgene Project, a biospecimen repository compliant with the Health Insurance Portability and Accountability Act and approved by the IRB, with questionnaire data and longitudinal medical information from participating patients treated at hospitals and outpatient clinics affiliated with Northwestern University. Potential participants are approached either by a genetic counselor or by a physician. The overall response rate has been 25%. The 2 main reasons for refusal to participate were lack of time and concerns about privacy. The repository represents a clinically diverse population with housing samples and data from healthy individuals and patients with the most common health conditions such as diabetes, cancer, and autoimmune and cardiovascular diseases. Participants signed an IRB-approved informed consent to allow distribution and use of de-identified samples and data for a broad range of research. The ability to regularly update participant health status allows NUgene to provide investigators with the most appropriate samples for their research. Controls were selected from NUgene participants without any self-reported history of cancer. Participants with International Classification of Diseases, Ninth Revision codes indicating a diagnosis of cancer were excluded.
DNA from whole blood lymphocytes was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany) and was stored at -20°C until use for genotyping. The samples were stored at -20°C.
Selection of SNPs
We selected four SNPs in FTO to evaluate. These SNPs were selected based on previous data from GWAS. More specifically rs9939609 has been previously identified to be significantly associated with obesity and DM [8-11]. Furthermore, other SNPs in the same linkage disequilibrium (LD) block (rs1121980, rs9939973, rs7193144, rs9940128, rs8050136) were found to be associated with obesity . The SNPs we selected to evaluate are all in intron 1 of FTO and represent LD block 6 (rs7206790 and rs8047395) and LD block 7 (rs9939609 and rs1477196). This region has been suggested to be strongly associated with obesity[8,9,13].
Genotyping for the four SNPs was performed by Taqman SNP allelic discrimination by means of an ABI 7900HT (Applied Biosystems, Forest City, California). Results were ascertained by using the SDS software version 2.3 (Applied Biosystems). All results were automatically called (ie, the device displays the genotypes automatically with a 95% certainty). A total of 5% of samples were genotyped in duplicate and showed 100% concordance.
Tissue was obtained from the tissue core facility at Northwestern University. A total of 100 samples were received. Of them 50 were normal breast tissue and 50 were breast cancer tissue. Of the breast tumor tissue 25 samples were ER negative and 25 were ER positive, 20 were PR negative and 30 were PR positive, 41 samples were Her2 negative and 7 Her2 positive. We received 30-40 mg of tissue per sample.
RNA Extraction and cDNA synthesis
Breast Tissues in RNAlater were flash frozen in liquid N2. Tissues were crushed into fine powder and were subsequently lysed in 1 ml of TRIzol® Reagent and 200 ul of chloroform for phase separation. Nucleic Acid Solution (Applied Biosystems, Foster City, CA) was added to the clear (RNA) phase in 3:1 ratio and then added equal volume of 70% EtOH. Mixture was passed through the red filters from the Gen Elute Mammalian Total RNA Miniprep Kit (Sigma-Aldrich, St. Louis, MO). The filters were washed by wash buffers 1 and 2 then eluted using TE buffer. cDNA synthesis from RNA was performed using the
- TaqMan® Reverse Transcription Reagents
- TaqMan® PreAmp Master Mix
The transcription levels for FTO and GAPDH were quantified using the ABI 7900HT Real Time PCR System (Applied Biosystems, Foster City, CA). Primers and probes were designed by Applied Biosystems. Samples are heated at 50°C for 2 min, 95°C for 10 min, followed by 40 cycles of 95°C for 15 sec, and 60°C for 1 min using Taqman® Gene Expression Master Mix (Applied Biosystems, Foster City, CA). Results were normalized to the levels of GAPDH (Taqman® probes, Applied Biosystems) and relative quantification using the ΔΔCt formula.
Paraffin embedded sections of normal and tumor breast tissue samples are collected at Northwestern Memorial Hospital. The immunohistochemistry (IHC) protocol was validated in the Northwestern Pathology Core laboratory for the FTO (SDIX, Newark, DE) and FABP4 (Epitomics, Burlingame, CA) antibodies. Sections were deparaffinized, heat (digital pressure cooker, 30 sec at 125°C) induced antigen-retrieval in citrate buffer (10 mM, pH 6.0) was used then followed by endogenous peroxidase activity blocking for 10 min at room temperature incubation in 3% hydrogen peroxide. Non-specific binding sites were blocked using Dako Protein Block (Dako, Carpinteria, CA). The FTO and FABP4 antibodies, diluted at 1:200 and 1:50 respectively, were applied on sections and incubated for 30-60 minutes at room temperature followed by a TBST (Dako, Carpinteria, CA) rinse. The secondary antibody, anti-rabbit (Dako, Carpinteria, CA) conjugated to dextran labeled polymer and horse radish peroxidase is applied for 15 min and then washed with TBST. Sections are counterstained with Mayer's Hematoxylin (Sigma, St. Louis, MO) and mounted with cover glass. All tissue samples were read by an independent pathologist (Dr Siziopikou).
We used logistic regression models to analyze our data with two different methods. The first method simultaneously fit all main effects of the four SNPs, referred to as multiple-SNP nonepistatic analysis. We also used the traditional logistic regression analysis based on single SNPs to confirm our findings from the multiple-SNP nonepistatic analysis. The second method was the multiple-SNP epistatic analysis, simultaneously fitting all main effects and epistatic interactions using multiple logistic regression models. The motivation for including epistatic interactions is to understand how the genetic effect(s) of a SNP varies with the genotypes at another SNP, and to aid discovery of additional genetic variants. We first performed our analyses without any covariates, and then adjusted for race, age, and BMI by including them as covariates.
We coded the main-effect predictors using the commonly-used Cockerham genetic model, and constructed all epistatic interactions by multiplying two corresponding main-effect variables. We denoted common homozygote (i.e., the homozygote with higher frequency), heterozygote, and rare homozygote for each SNP by c, h, and r, respectively. The Cockerham model defines two main effects for each SNPs (i.e., additive and dominance effects) and four epistatic interactions for two SNPs (i.e., additive-additive, additive-dominance, dominance-additive, and dominance-additive interactions)[14,15]. The additive contrast is defined as -1, 0, and 1 for c, h, and r, and the dominance contrast is defined as -0.5 for c and r and 0.5 for h, respectively. The additive effect represents the genotypic effect (r - c)/2, and the dominance effect measures h - (c + r)/2 in the logit probability of being cases. A positive additive effect (i.e., OR > 1) indicates that the rare homozygote increases cancer risk compared to the common homozygote, and a positive dominance effect means that the heterozygote increases risk compared to the mean of two homozygotes.
Multiple-SNP analyses can accommodate LD among the SNPs and have the advantages of providing potentially increased power and reduced false positives to detect causal variants, of better separating highly correlated predictors, and of more efficiently detecting epistatic interactions [16-19]. However, traditional multiple logistic regression models and procedures are highly problematic when strong LD exists and/or the number of effects is large. Thus, our multiple-SNP analyses employed Bayesian hierarchical logistic models using prior distributions for genetic effects that constrain the effects to lie in a reasonable range and results in identifiable models . Specifically, we used Student-t prior distributions, i.e., and , with the hyperparameters (ν, s) reasonably preset. For main effects, we set (ν, s) to be (1, 2.5), which results in the routinely-used weakly informative prior . For epistatic effects, we chose (1, 2.5 Kmain/Kepistasis), where Kmain (Kepistasis) is the total number of main (epistatic) effects in the model. This prior applies more stringent restrictions on interactions. We fitted our hierarchical logistic models by incorporating an approximate EM (expectation-maximum) algorithm into the usual iteratively weighted least squares for classical logistic regression .
We used two summary measures, the deviance and the Akaike information criterion (AIC), to compare different models. The deviance, defined as -2 times the log-likelihood, is a statistical summary of model fit; lower deviance means better fit to data. The AIC, defined as deviance + 2· (number of predictors), measures the predictive power; a model is estimated to reduce out-of-sample prediction error if the AIC decreases.
To evaluate the risk prediction ability of the multiple-SNP logistic models, we calculated the true positive rate (TPR, or sensitivity) and the false positive rate (FPR, or 1 - specification) that the model discriminates cases and controls. TPR (or FPR) is defined as probability that an individual is predicted as a case given that this person is truly affected (or unaffected). Because our model fitting procedure yields the predicted probabilities of a case, we could compute different sets of TPR and FPR by changing the classification threshold between 0 and 1, and thus plot the receiver operating characteristic (ROC) curves.
A linear regression model with FTO as the response and the indicator variable (for example, Tumor or Normal) as explanatory predictor was used to test the difference of the FTO tissue expression between the groups. This method is equivalent to a t test. Outliers were removed from the analysis.
Demographics and Genotype frequencies of cases and controls
This study included 354 cases and 364 controls. As shown in Table 1 overall the median age for the cases was 50 and for the controls 47. This difference was insignificant (p-value = 0.6). The majority of cases and controls were Caucasian although other ethnic groups were represented as well. Cases and controls had similar ethnicity. The differences of weight and BMI among cases and controls were small but nonetheless significant. Table 1 also shows category variables for cases. Estrogen receptor (ER) and progesterone receptor (PR) status were categorized as positive or negative; tumor grade as 1, 2 or 3; stage as I, II and III/IV; Her2 status as positive or negative; Presence of lymph node metastases at diagnosis (LN) as yes or no; and family history as positive or negative. Among cases the majority were diagnosed at stage I or II (72.8%), were ER positive (78.9%) and Her2 negative (82.0%). Table 2 shows the distribution of the different alleles between cases and controls for the four FTO SNPs analyzed in this study. We tested for Hardy-Weinberg equilibrium (HWE) for each SNP among controls. No significant deviations from HWE were found.
Table 1. Baseline Characteristics of breast cases and controls
Table 2. Genotype characteristics of cases and controls.
We calculated the LD measures, D' and r2 from the control samples for different races. Due to the small sample sizes of several races (Table 1), we only used Caucasian, Black, and Asian samples. To compare our LD patterns with those from the HapMap project, we extracted LD values from corresponding HapMap samples from U.S. residents with northern and western European ancestry (CEU), Ibadan of Nigeria (YRI), and Tokyo of Japan (JPT), respectively. Additional File 1 shows the D' and r2 values from our study and the HapMap project. The LD values from our study were comparable with those from the HapMap project. We found that four SNPs were in strong LD for Caucasian and Asian samples but had reduced LD values for Black samples.
Additional file 1. LD analysis for SNPs. Odds ratios (ORs) of three genotypes for the four SNPs under the epistatic model adjusted for race, age and BMI. The notation c, h, and r represent common homozygote, heterozygote, and rare homozygote, respectively.
Format: DOC Size: 41KB Download file
This file can be viewed with: Microsoft Word Viewer
We performed multiple-SNP nonepistatic analyses to simultaneously estimate the main effects of all SNPs on breast cancer risk. Table 3 shows the estimated odds ratios along with the 95% confidence intervals for all main effects under three non-epistatic models. From these analyses we found that rs1477196 was significantly associated with the risk of breast cancer through the additive effect. When taking into account race, age and BMI, the odds ratio of the additive effect of rs1477196 was 2.611 (95% CI: 1.56-4.37), indicating that compared to the common homozygote AA the rare homozygote GG significantly increased the risk of breast cancer.
Table 3. Multiple-SNP nonepistatic and single SNP models: odds ratios (OR), and 95% confidence intervals for all main effects of SNPs.
We performed single SNP analyses to confirm the main effects of all SNPs on breast cancer risk. Table 3 shows the estimated odds ratios along with the 95% confidence intervals for all main effects under three models. From these analyses we confirmed that rs1477196 was significantly associated with the risk of breast cancer through the additive effect, indicating that compared to the common homozygote AA the rare homozygote GG significantly increased the risk of breast cancer. The p-values of additive effect in three models were 0.0055, 0.005, and 0.003, respectively. After the Bonferroni Correction for the multiple testing, the p-values were 0.044, 0.04, and 0.024, respectively, which were still significant at the nominal level of 0.05,
When using the multiple-SNP epistatic analysis we were able to simultaneously take into account additive and dominance effects, additive-additive, additive-dominance, dominance-additive, and dominance-dominance interactions between SNPs. Table 4 shows the estimated odds ratios and their 95% confidence intervals for all main effects under three epistatic models. Under the epistatic models, all the four FTO SNPs were significantly associated with breast cancer. Adjusted for the covariates, race, age and BMI, rs9939609 was significantly associated with the risk of breast cancer through both the additive (OR = 3.719; 95% CI: 1.43-9.68) and dominance effects (OR = 0.506; 95% CI: 0.30-0.88). Rs1477196 was also significantly associated with breast cancer through both the additive (OR = 30.111; 95% CI: 9.65-94.80) and dominance effects (OR = 0.458; 95% CI: 0.27-0.77). Finally the additive effects of rs7206790 (OR = 1.888; 95% CI: 1.08-3.29) and rs8047395 (OR = 0.281; 95% CI: 0.12-0.68) were significantly associated with breast cancer.
Table 4. Epistatic models: odds ratios (OR) and 95% confidence intervals (95% CI) for all main effects of SNPs.
Table 5 shows significant epistatic effects with p-value < 0.05 under three models. All the three analyses detected two significant epistatic interactions, additive-additive interaction between rs1477196 and rs9939609 and dominance-dominance interaction between rs1477196 and rs8047395. When taking into account race, age and BMI, the estimated odds ratios of these two interactions were 12.031 (95%CI: 2.68-53.95) and 0.054 (95%CI: 0.01-0.20).
Table 5. Epistatic analyses: odds ratios (OR) and 95% confidence intervals (95% CI) for epistatic effects with p-value < 0.05.
Based on the fitted epistatic models, we calculated the odds ratios of three genotypes for each SNP and the probabilities of being case for two SNP pairs, rs1477196 and rs9939609, rs1477196 and rs8047395, which showed significant interactions. Additional File 2 and 3 display the results from the epistatic models including race, age, and BMI as covariates. Other epistatic models produced similar results. From Additional File 2, we can see that the rare homozygotes of rs9939609, rs1477196, and rs7206790 increased the risk of breast cancer compared to the common homozygotes, while the common homozygote for rs8047395 was associated with an increased risk. Additional File 3 shows that the genetic effect(s) of a SNP varied with the genotypes at another SNP.
Additional file 2. Odds ratios (ORs) of three genotypes for the four SNPs under the epistatic model adjusted for race, age and BMI. The notation c, h, and r represent common homozygote, heterozygote, and rare homozygote, respectively.
Format: DOC Size: 26KB Download file
This file can be viewed with: Microsoft Word Viewer
Additional file 3. Probablities of being case for two-locus genotypes at SNPs that show significant interactions under the epistatic model adjusted for race, age and BMI. The notation c, h, and r represent common homozygote, heterozygote, and rare homozygote, respectively.
Format: DOC Size: 27KB Download file
This file can be viewed with: Microsoft Word Viewer
We estimated deviance and AIC in the multiple-SNP nonepistatic and epistatic analyses under the three models (Additional File 4). The epistatic models had lower deviance and AIC than the non-epistatic models. This indicated that inclusion of the significant epistatic interactions improved the fit of the model to data and reduced out-of-sample prediction error. The epistatic model adjusting for age, race and BMI had the lowest deviance and AIC and thus best fit to data. For this reason we have selected this model for the interpretation of our results.
Additional file 4. Receiver operating characteristic (ROC) curves for risk prediction for 1. epistatic model adjusted for race, age and BMI (dotted black line), 2. nonepistatic model adjusted for race, age and BMI (solid black line), and 3. model with covariates, race, age and BMI, and no SNPs. The areas under the ROC curves (AUC) are 0.68, 0.60, and 0.53, respectively.
Format: DOC Size: 26KB Download file
This file can be viewed with: Microsoft Word Viewer
Breast cancer risk prediction
To evaluate the ability of the multiple-SNP models of the FTO pathway genotypes to predict breast cancer risk we plotted the receiver operating characteristics (ROC) curve (Additional File 4). The ROC curve shows the estimated true positive and the false positive rates for different classification thresholds between 0 and 1. The area under the ROC curve (AUC) was used to measure the discrimination power of our genotypes. The AUC number represents the probability that given two random individuals, one who will develop the disease and the other who will not, the genotypes will assign the former as a positive test and the latter as a negative test. A perfect genotype would have an AUC of 1. It has been suggested that an AUC > 0.5 has some discriminatory ability and for screening high risk individuals an AUC > 0.75 should be used.
We performed analyses based on three models: the epistatic model adjusted for race, age and BMI, the nonepistatic model adjusted for race, age and BMI, and the model including race, age and BMI and no SNPs. The epistatic model achieved higher true positive rate and lower false positive rate than the nonepistatic model did for any thresholds. The areas under the ROC curves (AUC) were 0.68, 0.60, and 0.53, respectively. This indicated that FTO genotypes provided powerful classifiers to predict breast cancer risk and a model with epistatic interactions further improved the prediction accuracy.
We estimated category-specific risks by comparing the genotype distribution of cases and controls within each category for age or between each case category and all controls for case category variables, i.e., tumor grade, stage at diagnosis, ER, PR, Her2, LN, and family history (see Table 1). To investigate the effects of age, subjects were separated into four groups (under 39, 39-49, 50-59 and 60+) according to age at diagnosis (cases) or interview (controls). Since sample sizes were largely reduced, all these subgroup analyses fit only main effects.
Table 6 displays the estimated odd ratios and their 95% confidence intervals for significant effects (i.e., p-value < 0.05). These analyses showed that rs1477196 is associated with breast cancer regardless of ER or PR status, grade of diagnosis, presence or absence of axillary lymph nodes or family history. Rs1477196 was also associated with stage II, III and IV at diagnosis although it was not significantly associated with stage I breast cancer. Other subset analyses showed that rs7206790 was mostly associated with ER and PR+ disease, stage III/IV, grade II, presence of axillary LN and positive family history. Finally rs8047395 was associated with grade III breast cancer.
Table 6. Subgroup analyses: odds ratios (OR) and 95% confidence intervals (95% CI) for main effects with p-value < 0.05.
Tissue FTO expression
When looking at FTO expression between normal and tumor tissue we found that the FTO in tumor tissue was significantly lower than in normal tissue (p-value < 0.001) (Additional File 5, Figure 1). This difference was similar regardless of ER, PR and Her2 expression. However there didn't seem to be any significant difference in tumor FTO expression in ER + vs ER-, PR+ vs PR- or Her2+ vs Her2- cancers (Additional File 6).
Format: DOC Size: 29KB Download file
This file can be viewed with: Microsoft Word Viewer
Figure 1. Tissue expression of FTO in normal and tumor breast tissue. o = outliers, * = extreme cases.
Additional file 6. FTO expression according to ER, PR and her2 status. Tissue expression of FTO. TERneg: tumor ER negative, TERpos: tumor ER positive, TPRneg: tumor PR negative; TPRpos: tumor PR positive, THer2neg: Tumor Her2 negative, THer2pos: tumor Her2 positive. o = outliers, * = extreme cases.
Format: DOC Size: 26KB Download file
This file can be viewed with: Microsoft Word Viewer
To evaluate whether FTO was expressed in the breast tissue or surrounding area we performed IHC staining on 21 breast cancer tumors and 20 samples of normal breast epithelium. We found that all samples both malignant and normal breast epithelium showed positive cytoplasmic staining for FTO (Figure 2). We could not detect any differences between staining intensity between normal and malignant breast tissue. Adipose tissue had little or no staining of FTO. We also performed staining for FABP4, an adipocyte marker, and found that seven of 18 (38.8%) breast tumors stained positive and three of 20 (15%) normal breast epithelium stained positive, whereas all adipose tissue stained positive for FABP4.
Figure 2. Breast cancer (A,B) and normal breast tissue (C,D) staining of FTO. As shown here there is strong cytoplasmic staining for FTO in the breast tissue whereas the surrounding adipose tissue and stroma don't show any staining for FTO.
In this clinic-based case-control study we found that SNPs located in intron 1 of FTO are associated with breast cancer risk. More specifically rs1477196, rs9939609, rs7206790 and rs8047395 were significantly associated with breast cancer risk. Furthermore, with an AUC of 0.68 these FTO genotypes provided powerful classifiers to predict breast cancer risk. This is to our knowledge the first report of an association between FTO and breast cancer risk.
There has also been extensive research on the association of DM and breast cancer. In a recent meta-analysis of 20 case-control studies there was a 20% increased risk for breast cancer in women with DM. In the four cohort studies, which were included in the same meta-analysis, breast cancer risk increased by 24% in patients with DM. We recently showed that adiponectin, an adipokine which is linked to obesity and type II DM, plays an important role in breast cancer risk[22-24]. We therefore sought out to find other links between obesity, DM and breast cancer, which lead us to evaluate the role of FTO in breast cancer risk.
We performed several analyses using epistatic and nonepistatic genetic models. The nonepistatic models take into account all SNPs simultaneously (Multiple-SNP analysis). In the epistatic models the interaction between the SNPs is also taken into account as to how it changes individual SNP effects. In the nonepistatic model rs1477196 was found to be significantly associated with breast cancer risk. Of all models performed the epistatic model, which controls for age, race and BMI had the best fit. That model showed that all four SNPs (rs1477196, rs9939609, rs7206790 and rs8047395) are significantly and jointly associated with breast cancer risk. Furthermore, the interactions between rs1477196 and rs9939609, as well as, rs1477196 and rs8047395 were significantly associated with breast cancer risk.
In subset analyses we observed that rs1477196 was significantly associated with breast cancer regardless of hormone receptor status, tumor grade, stage at diagnosis or presence of family history. However rs7206790 predicted hormone receptor positive breast cancers, in young women with a positive family history and was associated with more advanced stages at diagnosis.
We performed a breast cancer risk prediction model using the ROC curve. This curve utilizes the genotypes measured in all individuals and assesses the combination of sensitivity and specificity of different combinations of genotypes. We observed an AUC of 0.68 for the epistatic model, significantly higher than the models ignoring epistasis and genetic factors. This result shows that there is utility of prediction of breast cancer risk using the FTO genotype. It has been suggested by others that an AUC > 0.5 has some discriminatory ability . In a recent paper by Watcholder et al et al a genetic model, that included 10 SNPs found in genome wide association studies (GWAS) to be associated with breast cancer risk, produced an AUC of 58.9%. In that study the SNP-only model predicted risk slightly better than the Gail model. Another study by Mealiffe et al also confirmed that a SNP-only model produced a higher AUC (58.7%) compared with the Gail model (55.7%). The AUCs in these studies are somewhat lower compared with the AUC obtained in our study. The major reasons for this prediction improvement are: 1) we used a Bayesian hierarchical model to fit the data and to predict the disease risk. As described in Gelman et al. , a Bayesian hierarchical model can improve the prediction accuracy; 2) the previous studies have not included interactions into the predictive models. If genetic interactions are present, adding these interactions to a predictive model can increase the accuracy of prediction[28-30]. However we do recognize that this is a single study and we would need to validate our results before we can be certain of the magnitude of our findings.
We also showed the presence of FTO in breast tissue and found that normal breast tissue had significantly higher FTO RNA expression compared with breast cancer tissue. Furthermore, when comparing subtypes of breast cancer risk by ER, PR and Her2 expression we did not find any significant differences in FTO RNA expression according to these subtypes. We confirmed the presence of FTO expression by performing IHC where we found that both normal breast tissue and breast cancer tissue have a positive cytoplasmic stain for FTO. FTO has a potential role in nucleic acid demethylation. Its expression in breast tissue points toward an important role in the demethylation process. Recent studies have shown that mice overexpressing FTO are obese. Moreover, ubiquitous overexpression of FTO has been shown to lead to a dose-dependent increase in body and fat mass, irrespective of whether mice are fed a standard or a high-fat diet suggesting that increased body mass results primarily from increased food intake.
Our study has several strengths and limitations. Cases and controls were similar in respect to age, race and geographic location but differed significantly in regard to BMI. However we controlled for age, race and BMI. Information on chronic diseases was not recorded for cases and controls with the possibility of discrepancy between the two groups. However FTO has only shown to play a role in obesity and we controlled for BMI. Therefore such discrepancy would likely not affect our findings. The selection of our SNPs was based on previous data from GWAS that correlated them with obesity. Although exposure was assessed in the context of a case control study, it is impossible that breast cancer would have changed SNP classification, which is already determined at birth. Thus this study does fulfill the "time sequence" criterion for causality. This in association with existing epidemiologic evidence and biologic plausibility support a causal association between these SNPs and risk for breast cancer. Strengths of the study include the use of state of the art methods for evaluation of SNPs as well as FTO IHC and that we have controlled for known potential confounders. Confounding from unknown factors remains a possibility, however. Some random misclassification is unavoidable but such misclassification leads to suppression of effect estimates and p-values away from statistical significance. Thus, our statistically significant results of this study could not have been attributed to misclassification.
To our knowledge this is the first study reporting an association between FTO pathway SNPs and breast cancer risk. This is also the first study to show expression of FTO in normal and malignant breast tissue. If confirmed in subsequent studies, our findings suggest that genetic variants of FTO alter breast cancer risk. The fact that FTO is expressed in human breast tissue as well as its role in nucleic acid demethylation point toward a direct effect of FTO in breast cancer. The FTO axis may emerge as an important modifier of breast cancer risk. Our findings point toward an important role of FTO in breast cancer.
FTO: Fat mass and obesity gene; SNP: single nucleotide polymorphism; AIC: Akaike information criterion; GWAS: genome wide association study; ROC: Receiver operating characteristic; AUC: area under the curve; DM: diabetes mellitus; IRB: institutional board review; LD: Linkage disequilibrium; RT-PCR: real time polymerase chain reaction; ER: estrogen receptor; PR: progesterone receptor; LN: lymph node; Her2: human epidermal receptor 2; DNA: Deoxyribonucleic acid; IHC: immunohistochemistry; OR: Odds ratio; CI: confidence interval; TPR: true positive rate; FPR: false positive rate; FABP4: fatty acid binding protein 4.
The authors declare that they have no competing interests.
VK participated in writing the paper and design of the study, NY participated in the statistical analysis of the paper, MS participated in genotyping and expression assays of the study, KS participated in the immunohistochemistry and the interpretation of the study, KZ participated in the statistical analysis of the paper, YX participated in the expression assays of the study, ST participated in the recruitment of cases and controls, SA participated in the recruitment of cases and controls, BP participated in the conception of the study and recruitment of cases and controls, CM participated in writing the paper and conception of the study. All authors read and approved the final manuscript.
This work was supported by the Walter S. Mander Foundation, Chicago, IL, the Lynn Sage Foundation, Chicago, IL, the Lynn Sage Scholar Award, Chicago, IL (Kaklamani); R01 CA112520 (Pasche), R01-GM74913 (Zhang) and 2R01GM069430-06 (Yi). In loving memory of Dolores Knes, who reminds us every day that the goal of our research is to improve the lives of our patients.
Harvie M, Howell A, Vierkant RA, Kumar N, Cerhan JR, Kelemen LE, et al.: Association of gain and loss of weight before and after menopause with risk of postmenopausal breast cancer in the Iowa women's health study.
Prentice RL, Caan B, Chlebowski RT, Patterson R, Kuller LH, Ockene JK, et al.: Low-fat dietary pattern and risk of invasive breast cancer: the Women's Health Initiative Randomized Controlled Dietary Modification Trial.
Cho YM, Kim TH, Lim S, Choi SH, Shin HD, Lee HK, et al.: Type 2 diabetes-associated genetic variants discovered in the recent genome-wide association studies are related to gestational diabetes mellitus in the Korean population.
Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, Helgadottir A, et al.: Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity.
Hinney A, Nguyen TT, Scherag A, Friedel S, Bronner G, Muller TD, et al.: Genome wide association (GWA) study for early onset extreme obesity supports the role of fat mass and obesity associated gene (FTO) variants.
Mantzoros CS, Li T, Manson JE, Meigs JB, Hu FB: Circulating adiponectin levels are associated with better glycemic control, more favorable lipid profile, and reduced inflammation in women with type 2 diabetes.
Annals of Applied Statistics 2008, 2:1360-1383. Publisher Full Text
Clark AG: Limits to prediction of phenotype from knowledge of genotypes. Limits to knowledge in evolutionary genetics. Edited by Clegg M, et al. Kluwer Academic/Penum Publishers, New York; 2000:205-224.
Genetics Research 2011, in press.
The pre-publication history for this paper can be accessed here: