Gene expression signature of estrogen receptor α status in breast cancer1Department of Carcinogenesis, The University of Texas M.D. Anderson Cancer Center, Science Park-Research Division, Smithville, Texas, USA 2Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA 3Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
BMC Genomics 2005, 6:37doi:10.1186/1471-2164-6-37 The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/6/37
©
2005 Abba et al; licensee BioMed Central Ltd. AbstractBackgroundEstrogens are known to regulate the proliferation of breast cancer cells and to modify their phenotypic properties. Identification of estrogen-regulated genes in human breast tumors is an essential step toward understanding the molecular mechanisms of estrogen action in cancer. To this end we generated and compared the Serial Analysis of Gene Expression (SAGE) profiles of 26 human breast carcinomas based on their estrogen receptor α (ER) status. Thus, producing a breast cancer SAGE database of almost 2.5 million tags, representing over 50,000 transcripts. ResultsWe identified 520 transcripts differentially expressed between ERα-positive (+) and ERα-negative (-) primary breast tumors (Fold change ≥ 2; p < 0.05). Furthermore, we identified 220 high-affinity Estrogen Responsive Elements (EREs) distributed on the promoter regions of 163 out of the 473 up-modulated genes in ERα (+) breast tumors. In brief, we observed predominantly up-regulation of cell growth related genes, DNA binding and transcription factor activity related genes based on Gene Ontology (GO) biological functional annotation. GO terms over-representation analysis showed a statistically significant enrichment of various transcript families including: metal ion binding related transcripts (p = 0.011), calcium ion binding related transcripts (p = 0.033) and steroid hormone receptor activity related transcripts (p = 0.031). SAGE data associated with ERα status was compared with reported information from breast cancer DNA microarrays studies. A significant proportion of ERα associated gene expression changes was validated by this cross-platform comparison. However, our SAGE study also identified novel sets of genes as highly expressed in ERα (+) invasive breast tumors not previously reported. These observations were further validated in an independent set of human breast tumors by means of real time RT-PCR. ConclusionThe integration of the breast cancer comparative transcriptome analysis based on ERα status coupled to the genome-wide identification of high-affinity EREs and GO over-representation analysis, provide useful information for validation and discovery of signaling networks related to estrogen response in this malignancy. BackgroundEstrogen plays essential roles in the development, growth control and differentiation of the normal mammary gland. However, it is well documented that endogenous estrogens are powerful mitogens critical for the initiation and progression of human breast and gynecological cancers [1]. This cell proliferation signal is mediated by the estrogen receptors (ER), members of the nuclear receptor family that function both as signal transducers and transcription factors to modulate expression of target genes [2]. There are two main subtypes of estrogen receptors: ERα and ERβ that generally can form homo- and heterodimers before binding to DNA. Although the DNA binding domains of these receptors are very similar, the overall degree of homology is low [3]. Transcriptional regulation of target genes in response to 17β-estradiol (E2) is mediated by two main mechanisms. In one, the E2-ER complex binds to a specific DNA sequence called the estrogen response element (ERE), this receptor-ligand DNA bounded complex interacts with co-regulatory proteins, promoting chromatin remodeling and bridging with the general gene transcription machinery thus resulting in transcription initiation [4]. Alternatively, the ligand-ER complex can interact with other DNA-bound transcription factors that in turn bind DNA sequences (e.g. via AP1, SP1 complexes) [5,6]. ERα and ERβ have different affinities for different response elements and exhibit distinct transcriptional properties. Additionally, E2 also exerts rapid, non-genomic effects attributed to cell membrane-initiated signaling [7]. Approximately two-thirds of all breast cancers are ERα (+) at the time of diagnosis and expression of this receptor is determinant of a tumor phenotype that is associated with hormone-responsiveness. Patients with tumors that express ERα have a longer disease-free interval and overall survival than patients with tumors that lack ERα expression [8]. However, the association between ERα expression and hormonal responsiveness is not perfect: approximately 30% of ERα-positive tumors are not hormone-responsive while 5–15% of ERα-negative tumors respond to hormonal therapy [9]. The molecular basis for the association between ERα expression, hormonal responsiveness and breast cancer prognosis remains unclear. Several studies have been carried out using cDNA and oligonucleotide microarrays identifying breast cancer subclasses possessing distinct biological and clinical properties [10-13]. Among the distinctions made to date, the clearest separation was observed between ERα (+) and ERα (-) tumors [10-15]. It has been suggested that there are sets of genes expressed in association with ERα that could play an important role in determining the hormone-responsive breast cancer phenotype [16]. ERα is obviously likely to be important for the E2 induced proliferative response predominantly via the regulation of estradiol-responsive genes. Nevertheless, the expression of additional subsets of genes not necessarily directly regulated by estrogen may also be fundamental in defining the breast cancer hormone-responsive phenotype. To further elucidate the molecular basis of estrogen-dependent breast carcinogenesis, we here report a comparative transcriptome profiling of invasive breast tumors based on ERα status obtained by SAGE. The SAGE method provides a statistical description of the mRNA population present in a cell without prior selection of the genes to be studied, and this constitutes a major advantage [17]. The breast cancer SAGE comparative analysis was combined with promoter sequence analysis of genes of interest using high-throughput methods of high-affinity ERE identification. In order to have an even more comprehensive picture we also performed a cross-platform comparison between SAGE and DNA microarray studies. Results and discussionBiomarkers of ERα status in breast carcinomasThe primary goal of our study was to identify the most commonly deregulated genes in invasive breast carcinomas related to ERα status. To this end SAGE data was obtained from a set of primary breast carcinomas. Thus, a breast cancer SAGE database of almost 2.5 million tags was analyzed, representing over 50,000 tag species. We performed a comprehensive evaluation and comparison of gene expression profiles using a recently developed supervised method [18], to identify the most representative differentially expressed transcripts between tumors groups, i.e. ERα (+) vs. ERα (-) breast tumors. This statistical analysis revealed 520 genes differentially expressed (Fold change ≥ 2; p < 0.05) between ERα (+) and ERα (-) primary breast carcinomas (see 1). Among the 520 transcripts, 473 were up-modulated and 47 were down-modulated transcripts in ERα (+) tumors. Additional File 1. Differentially expressed genes between ERα (+) vs. ER α (-) breast carcinomas (Fold change ≥2; p < 0.05). Format: XLS Size: 1.1MB Download file This file can be viewed with: Microsoft Excel Viewer The most commonly over-expressed transcripts in ERα (+) tumors were: trefoil factor 1 (TFF1/pS2), synaptotagmin-like 4 (SYTL4), regulating synaptic membrane exocytosis 4 (RIMS4), dual specificity phosphatase 4 (DUSP4), chromosome 1 open reading frame 34 (C1orf34), necdin homolog (NDN), n-acetyltransferase 1 (NAT1) and caspase recruitment domain family 10 (CARD10) (Table 1 and 1). Table 1. Most highly up-modulated transcripts in ERα (+) breast carcinomas identified by SAGE. To validate novel ERα associated genes detected by SAGE not reported in other studies, we performed Real Time RT-PCR analysis of representative transcripts in an independent set of 36 invasive ductal breast carcinomas. In agreement with our SAGE analysis, we detected statistical differences in the over-expression of 8 out of 9 evaluated transcripts in ERα (+) breast tumors including: signal peptide CUB domain EGF-like 2 (SCUBE2) (p = 0.0001), SYTL4 (p = 0.0005), KIAA0882 protein (p = 0.0005), tetraspan 1 (TSPAN1) (p = 0.001), myeloblastosis viral oncogene homolog (C-MYB) (p = 0.002), epidermal growth factor-like 2 (CELSR2) (p = 0.011), nuclear receptor subfamily 4 (NR4A1) (p = 0.029), and enolase 2 (ENO2) (p = 0.033) (Figure 1). A trend of borderline significance was detected for the lectin galactoside-binding protein (LGALS3BP) (p = 0.079) transcript (Figure 1).
SCUBE2 (also known as EGF-like 2 or CEGP1) encodes a secreted and cell-surface protein containing EGF and CUB domains that defines a novel gene family [19]. The epidermal growth factor (EGF) motif is found in many extracellular proteins that play an important role during development, functioning as secreted growth factors, transmembrane receptors, signaling molecules, and important components of the extracellular matrix. The CUB domain is found in several proteins implicated in the regulation of extracellular process such as cell-cell communication and adhesion [20]. Expression of SCUBE2 has been detected in vascular endothelium and may play important roles in development, inflammation and perhaps carcinogenesis [19]. The CELSR2 gene (also known EGFL2) encodes a protein member of the nonclassic-type cadherins (flamingo subfamily). These 7-pass transmembrane proteins have nine cadherin domains, seven-epidermal growth factor-like repeats and two laminin A G-type repeats [21]. It is postulated that these proteins are receptors involved in cell adhesion and receptor-ligand interactions [21] playing a role in developmental processes and cell growth/ maintenance in epithelial and neuronal cells [22,23]. SYTL4 (also known as granuphilin-a or SLP4) contains an N-terminal Slp homology domain (SHD) than can specifically and directly bind the GTP-bound form of Rab27A, a small GTP-binding protein involved in granule exocytosis in cytotoxic T lymphocytes [24]. We determined that over-expression of SYTL4 is associated with ERα (+) tumors (Figure 1b). However, the potential role of this gene in breast carcinogenesis remains unknown. ENO2 (also known as NSE/neuron-specific gamma enolase) encodes one of three enolase isoenzymes found in mammals. This isoenzyme was described to be expressed in cells of neuronal origin. Interestingly, in a recent report Hao et al. (2004) showed high expression of ENO2 transcripts in breast cancer lymph node metastases when compared with primary breast tumors [25]. The TSPAN1 gene (also known as tetraspanin or NET1) encodes a cell-surface protein member of the transmembrane 4 superfamily (TM4SF), involved in the regulation of cell development, activation, growth and motility. A number of tetraspanins were described as tumor-specific antigens, and it was suggested that the function of some TM4SF proteins may be particularly relevant to tumor cell metastasis [26]. Sugiura and Berditchevski (1999) observed that TSPAN1 protein complexes may control the invasive migration of tumor cells and contribute to ECM-induced production of MMP2 in breast cancer cell line [27]. NR4A1, a nuclear receptor subfamily 4, group A gene (also known as steroid receptor TR3 or NUR77) encodes an orphan member of the steroid-thyroid hormone-retinoid receptor superfamily whose members mainly act as transcriptional factors to positively or negatively regulate gene expression and play roles in regulating growth and apoptosis [28,29]. A role for NR4A1 in cell proliferation has been previously reported. It was shown that its expression is rapidly induced by various mitogenic stimuli such as: serum growth factor, epidermal growth factor and fibroblast growth factor [28]. Taken together, the genes that we identified and validated appear to be involved in signaling pathways related to cell proliferation, invasion and metastatic processes, but their exact role in breast carcinogenesis remains to be elucidated. Gene Ontology analysisClassification of genes based on Gene Ontology (GO) terms is a powerful bioinformatics tool suited for the analysis of DNA microarray and SAGE data. Analysis of GO annotation allows one to identify families of genes that may play significant roles related to specific molecular or biological processes in expression profiles [30]. We used the Expression Analysis Systematic Explorer software (EASE) [31] to annotate the 520 deregulated genes according to the information provided by the GO Consortium [30]. The GO database provided annotation for 80% (419 out of 520) of the genes identified by SAGE. Results of this analysis are shown in Figure 2 and in detail in 2.
Additional File 2. Gene Ontology overrepresentation analysis. Format: XLS Size: 1MB Download file This file can be viewed with: Microsoft Excel Viewer We observed that 31% of ERα associated transcripts are involved in biological processes related to cell growth and/or maintenance, 21% are related to cell communication, and 16% are related to regulation of transcription. Approximately 16% of these deregulated genes are related to molecular functions associated with DNA binding and more specifically with transcription factor activity (10%) (Figure 2). Interestingly, using the enrichment GO terms analysis, we identified statistical significant over-representation of specific groups of proteins including: metal ion binding proteins (54 hits out of 419 annotated genes; p = 0.011), calcium ion binding proteins (27 hits out of 419; p = 0.032) and steroid hormone receptor activity related proteins (6 hits out of 419; p = 0.031) (2). The GO cluster related to steroid hormone receptor activity proteins includes: estrogen receptor 1 (ESR1, i.e. ERα), androgen receptor (AR), hydroxysteroid 17-β dehydrogenase 4 (HSD17β4), glucocorticoid receptor (NR3C1), oxysterol binding protein (OSBP), and retinoic acid receptor α (RARA). The observation of functionally related groups of genes identified in the SAGE dataset via GO over representation analysis allows the identification of distinct biological pathways directly or indirectly associated to estrogen response related processes and provides the basis for future mechanistic studies. Identification of high-affinity Estrogen Response ElementsWe used a recently reported genome-wide high-affinity ERE database [32] to identify putative EREs in the promoter regions of the SAGE-identified 473 up-modulated genes in ERα (+) breast tumors. We identified 220 EREs distributed on 163 out of the 473 genes (35%) (see 3). Seventy-two percent of these genes contain one high affinity ERE (117 out of 163) and 28% of them contain two or more EREs in proximity to the transcriptional start sites (TSS) (46 out of 163) (Figure 3a). These EREs can be located in both coding and non-coding sequences such as was described by Bourdeau et al. [32]. Additional File 3. High-affinity EREs identified in ERα (+) up-modulated genes. Format: XLS Size: 567KB Download file This file can be viewed with: Microsoft Excel Viewer
The observed frequency of these elements in our study was 220 EREs in 3260 kb (considering a DNA window of 20 kb for each one of the 163 up-modulated genes with EREs). Compared with the expected frequency from random distribution of high-affinity EREs found in the genome (732 EREs in 3,069334 kb 0.8 ERE in 3260 kb) (see material and methods) [32], the number of individual EREs was 270 fold higher than expected by chance (p < 0.00001). Fifty percent (110 out of 220) of the detected EREs mapped within a 10 kb region 5' of the TSS, while the rest mapped to 3' regions (Figure 3b). Approximately 68% of EREs mapped within the region between -5 to +5 kb from the TSS; in agreement with those observations of Bourdeau et al. [32]. However, it remains to be determined whether distantly located EREs (e.g. -10 kb from the TSS) are functional E2-ER binding sites related to transcriptional activation. Of the validated transcripts previously discussed (Figure 1), we detected high-affinity EREs on the upstream or downstream regions related to the TSS of SYTL4 (-8384 bp from the TSS: tggacatcatgacct), TSPAN1 (+974 bp and +9384 bp from the TSS: tggtctgaatgaccc and aggtcatttccacct respectively), CELSR2 (+173 bp and +3607 bp from the TSS: tgctcagggtgaccc and aggtcaccatgaccg respectively), and NR4A1 (-3478 bp and +4217 bp from the TSS: tgttcactctgacct). It is interesting to note that we were unable to identify high-affinity EREs on the majority of deregulated genes (65%) associated with a positive ER α status. The possibility exists that many of these genes are transcriptionally regulated by non-ERE mediated mechanisms such as those involving ER binding to the AP1 or SP1 transcription factors [33]. The AP1 transcription factor is a heterodimer formed by Jun and Fos family member proteins that binds to the phorbol diester (TPA) response element as well as to the AP1 consensus DNA sequence. In this pathway, ER plays a co-activator role for AP1 [6]. The ER/AP-1 complex can confer estrogen responsiveness to additional subset of genes found in our dataset such as: ovalbumin (Fold change: 3; p = 0.033) and c-fos (Fold change: 2.1; p = 0.033); two transcripts detected as over-expressed in ERα (+) breast tumors by SAGE (1). Similarly the ER/SP1 complex confers estrogen responsiveness to genes such as: retinoic acid receptor α (RARA) (Fold change: 6.7; p = 0.038), vascular endothelial growth factor (VEGFC) (Fold change: 2.6; p = 0.037), insulin-like growth factor binding protein-4 (IGFBP4) (Fold change: 2; p = 0.01) and heat shock protein 27 (HSPB1) (Fold change: 2; p = 0.045); four transcripts detected as over-expressed in ERα (+) tumors in our study (1). An additional pathway of transcription regulation by estrogen involves the ER-related receptors (ERR), nuclear orphan receptors with significant homology to ERs, which do not bind estrogen and have unknown physiological ligands. ERRs are known to bind to the steroidogenic factor 1 response element (SFRE) and also bind to classic EREs, by means of which they exert constitutive transcriptional activity [34]. We detected over-expression of the nuclear orphan receptor NR4A1 by SAGE and subsequently validated this observation by real time RT-PCR (Figure 1g). Interestingly, and as previously mentioned, the genomic region 5' and 3' to the TSS of NR4A1 contain high-affinity EREs. Interaction between ERs and ERRs has been observed in the transcriptional regulation of certain genes such as the human breast cancer related gene TFF1/pS2, the promoter of which is not only activated by ERs but also by ERRs [35]. As described, ERα can mediate estrogenic response through multiple genomic and non-genomic mechanisms, many of which affect proteins and pathways not necessarily directly or exclusively associated with ERα. Thus it is worth stressing that it will the totality of deregulated proteins the ones that ultimately define the phenotype of ERα (+) breast carcinomas regardless of whether a "direct association" with ER transcriptional regulation exists or not. In vivo versus in vitro estrogen induced global gene expression findingsThe SAGE profiles for E2-responsive genes in MCF-7 cell line, previously reported by us [36], was compared with the ER status genes expression profile found in primary breast carcinomas. Briefly, we detected 199 transcripts differentially expressed (p < 0.01) in MCF-7 treated cells, 124 were up-regulated and 75 were down-regulated transcripts. Basically and as reported Charpentier et al, we observed a general up-regulation cell cycle progression-related genes including: CCT2, CCND1, PES1, RAN/TC4, CALM1, CALM2; and tumor-associated genes such as: RFP, D52L1, TFF1/PS2, CAV1, and NDKA among others [36]. These together could contribute to the stimulation of proliferation and the suppression of apoptosis by E2-ER transcriptional regulation. By comparing the in vitro (199 differentially expressed transcripts) and in vivo (520 differentially expressed transcripts) gene expression profiles, to our surprise we detect that only few transcripts: TFF1, CCND1, H19, SREBF1 and WWP1 behaved similarly (i.e. up-regulation) in both studies. This is similar to observations made previously by Meltzer and co-workers whom showed that the majority of genes regulated in cell culture do not predict ER status in breast carcinomas [11,37]. This result suggests that the estrogen-responsive pathways affected in vitro represent only a minor portion of the global gene expression profiles characteristic of ERα (+) breast tumors. This maybe in great part the result of the heterogenous nature of bulk tumor tissue but in addition, the in vitro response of a single cell line to E2, in this particular case the widely used MCF-7 cells, may not faithfully reproduce the physiological effects of ER signaling in vivo. Cross-platform gene expression profiling comparisonIn order to identify and validate the most reliable set of genes able to discriminate breast carcinomas based on their ERα status, we performed a cross-platform comparison between the described SAGE dataset with two previously reported breast cancer studies based on DNA microarray methods [12,13]. van't Veer et al. [12] reported the gene expression profile of 97 primary breast tumors based on oligonucleotide microarrays containing 24,479 elements (Agilent Technologies, Palo Alto, CA, USA). In another study, Sotiriou et al. [13] reported the gene expression profile of 99 primary breast tumors using a cDNA microarray containing 7650 elements. Only files containing differentially expressed genes associated to ERα status tumors from both microarrays studies were obtained for cross-platform comparison (see material and methods). Among the three platforms, a total of 1686 transcripts were identified as over-expressed in ERα (+) breast tumors. One hundred and eighty-three genes were identified by more than one method (Figure 4; 4). Eleven of these 183 genes were identified by all three methods displaying over-expression in ERα (+) breast carcinomas: estrogen receptor 1 (ESR1), GATA-binding protein 3 (GATA3), mucin 1 (MUC1), v-myb-myeloblastosis viral oncogene homolog (C-MYB) , X-box-binding protein 1 (XBP1), hydroxysteroid 17-β dehydrogenase 4 (HSD17B4), BTG family member 2 (BTG2), transforming growth factor β-3 (TGFB3), member RAS oncogene family (RAB31), START domain containing 10 (STARD10), and KIAA0089 (Table 2).
Additional File 4. Cross-platform comparison of the up-modulated transcripts in ERα (+) breast carcinomas. Format: XLS Size: 439KB Download file This file can be viewed with: Microsoft Excel Viewer Table 2. Transcripts identified as over-expressed in ERα (+) breast cancers commonly detected by cross-platforms comparison (SAGE and oligonucleotide microarrays). One hundred and fourteen genes were identified as over-expressed by oligonucleotide microarrays [12] and SAGE in ERα (+) tumors, representing a non-random significant number of overlapping genes based on normal approximation to the binomial distribution (p < 0.001) (Figure 4 and Table 2). Sixty-six genes were identified as over-expressed in ERα (+) tumors by both DNA microarrays platforms (p < 0.01). The set of 25 genes overlapping between cDNA microarrays [13] and SAGE were not statistical significant (p > 0.05). Interestingly, we found a higher number of overlapping genes between the oligonucleotide microarray and SAGE platforms (114 genes), while only 66 genes were observed overlapping when comparing both microarray platforms. It is worth noting that 96% of the 470 genes (Figure 4) identified as overexpressed by the cDNA microarray method [13] were included within the total set of elements in the oligonucleotide microarray platform [12]. In other words, it appears that a better correlation was observed between SAGE and oligonucleotide arrays, than between both DNA microarray methods. ConclusionIn summary, our comprehensive comparison of overlapping genes across different gene expression platforms provides validation for a significant number of transcripts identified as highly expressed in ERα (+) breast tumors. More importantly this analysis identifies the most promising biomarkers for further evaluation as ERα associated genes in breast cancer. Furthermore, the identified proteins may be of value as breast cancer prognostic indicators analyzed either as a group or individually. It is also likely that groups of co-regulated genes in ERα (+) breast cancers may be associated to the hormonal control of mammary epithelial cells growth and differentiation. Finally, a better understanding of the signaling networks controlled or associated with the estrogen response may lead to the identification of novel breast cancer therapeutic targets. MethodsSAGE librariesTo perform the comparative breast cancer SAGE analysis based on ERα status, we analyzed 26 Stage I – Stage II invasive breast carcinomas (8 ERα-negative tumors and 18 ERα-positive tumors). To this end, we generated and sequenced 24 breast cancer SAGE libraries at an approximate resolution of 100,000 tags per library, combined with 2 additional breast cancer libraries (ERα-negative tumors) downloaded from the Cancer Genome Anatomy Project – SAGE Genie database (SAGE_Breast_Carcinoma_B_95-259 and B_IDC_4) http://cgap.nci.nih.gov/SAGE/ webcite. For the generation of our SAGE libraries, snap frozen samples were obtained from the M.D. Anderson breast cancer tumor bank, and SAGE analysis was performed as previously described [36,38]. Data processing and statistical analysis of SAGE librariesSAGE tag extraction from sequencing files was performed by using the SAGE2000 software version 4.0 (a kind gift of Dr. K. Kinzler, John Hopkins University). SAGE data management, tag to gene matching as well as additional gene annotations and links to publicly available resources such as GO, UniGene, LocusLink, were performed using a suite of web-based SAGE library tools developed by us http://spi.mdacc.tmc.edu/bitools/about/sage_lib_tool.html webcite. In our analyses we only considered tags with single tag-to-gene reliable matches. To compare these SAGE libraries, we utilized a modified t-test recently developed by us [18]. This test is based on a beta binomial sampling model that takes into account both, the intra-library and the inter-library variability, thus identifying 'common patterns' of SAGE transcript tag changes systematically occurring across samples [18]. All raw SAGE data reported as Supplementary tables in this manuscript is publicly available at http://sciencepark.mdanderson.org/ggeg/sage_Proj_9.htm webcite. Real Time RT-PCR analysisTemplate cDNAs were synthesized on mRNAs isolated from an independent set of 36 Stage I – Stage II human breast carcinomas (13 ERα-negative tumors and 23 ERα-positive tumors) obtained from our tumor bank. Primers and probes were obtained from the TaqMan Assays-on-Demand™ Gene Expression Products (Applied Biosystems, Foster City, CA, USA). All the PCR reactions were performed using the TaqMan PCR Core Reagents kit and the ABI Prism® 7700 Sequence Detection System (Applied Biosystems, Foster City, CA, USA). Experiments were performed in duplicate for each data point and 18s rRNA was used as control. Results were expressed as mean ± 2 Standard Error based on Log2 transformation of normalized real time RT-PCR values of the assayed genes. We used t-test to compare the gene expression levels of validated genes between ERα (+) and ERα (-) breast tumors (p < 0.05). Immunohistochemical determination of ER statusIHC staining and ER status determination was performed by the Pathology Department, MDACC following routine immunohistochemical procedures. Briefly, five micrometer sections of invasive breast carcinomas paraffin embedded tissues were used. Endogenous peroxidase activity was blocked with 3% H2O2 in methanol for 10 min. After pretreatment with Tris-EDTA buffer, in order to block non-specific antibody binding, the slides were incubated with 10% goat serum in PBS for 30 min. Primary monoclonal ERα antibody (ER-6F11, Novocastra, Newcastle, UK) was used at 1:50 dilution and detected following standard immunohistochemical techniques. DAB was used as chromogen and Mayers hematoxylin is used as counterstain. Scoring was performed by breast pathologist (AS). Cuttoff for positivity was determined at 5% of tumor cells staining positively for ER (i.e. < 5% of cells the tumor was considered negative for ERα). Bioinformatics analysisFor automated functional annotation and classification of genes of interest based on GO terms, we used the EASE [31] available at the Database for Annotation, Visualization and Integrated Discovery (DAVID) at http://david.niaid.nih.gov/david webcite[39]. The EASE software calculates over-representation of specific GO terms with respect to the total number of genes assayed and annotated. Statistical measures of specific enrichment of GO terms are determined by means of an EASE score (p < 0.05). The EASE score is a conservative adjustment of the Fisher exact probability that weights significance in favor of biological themes supported by more genes and is calculated using the Gaussian hypergeometric probability distribution that describes sampling without replacement from a finite population [31]. This allows one to identify biological themes within a specific list of EASE analyzed genes. High-affinity Estrogen Response Elements (ERE) analysisTo identify the occurrence of EREs within the promoter regions of up-modulated genes in ERα (+) breast tumors, we used a human genome-wide high-affinity ERE database http://mapageweb.umontreal.ca/maders/eredatabase/ webcite[32]. This public available database contains 71,119 EREs identified across the human genome (related to 17,353 transcriptional start sites), representing the consensus ERE (5'-Pu-GGTCA-NNN-TGACC-Py-3'), and equivalent sequences with only one or two nucleotide variations from such consensus. Based on these restrictions the expected random frequency was calculated as the total number of base pairs in the human genome divided by the frequency of occurrence of a sequence with specified base pairs at 10 positions and two base pair choices at two positions (3,069334246/411 = 732 high-affinity EREs) [32]. Comparison of gene expression patterns identified by different methodologiesERα status associated genes identified in previous breast cancer studies [12,13] using DNA microarray methods were compared with our SAGE findings. All over-expressed genes in ERα (+) breast tumors obtained from these studies were downloaded from the corresponding web sites (http:/ / www.nature.com/ cgi-taf/ DynaPage.taf?file=/ nature/ journal/ v415/ n6871/ abs/ 415530a_fs.html webcite and http://www.pnas.org/cgi/content/abstract/100/18/10393 webcite) [12,13]. These datasets were annotated by LocusLink ID using the EASE software [26], and then compiled into one Excel spreadsheet pivotTable for comparison of overlapping genes between platforms, i.e. SAGE, Oligonucleotide and cDNA arrays. Anonymous ESTs from the microarrays platforms were excluded due to the inability to cross validate the identities between different gene expression profiles. Any combination of two lists was compared for matching gene-identity. The number and identity of genes commonly affected in two platforms (e.g. SAGE study vs. DNA microarray) was determined. We used the normal approximation to the binomial distribution as previously described [40] to calculate whether the number of matching genes derived from each cross-platform comparison was of statistical significance (p < 0.05). Authors' contributionsM.C.A. conceived the study idea and carried out the real time RT-PCR validations, the biostatistical/ bioinformatics analysis and writing the manuscript. Y.H., H.S. and J.A.D. carried out the breast cancer SAGE libraries and provided practical feedback on aspects of the manuscript. K.B. and S.G. developed the biostatistical and web-page base methodology. A.S. provides the tissue samples and clinical information. C.M.A. is the principal investigator and was involved in the conceptualization, design and writing of the manuscript. All authors read and approved the final manuscript. Competing interestsThe author(s) declare that they have no competing interests. AcknowledgementsThe authors thank Dr. Michael MacLeod for critical reading of this manuscript. This work was supported by NIH-NCI Grant 1U19 CA84978-1A1 (C. M. Aldaz) and center grant ES-07784. References
Have something to say? Post a comment on this article! |




on Google Scholar







author email
corresponding author email
Figure 1.
Figure 2.
Figure 3.
Figure 4.