In a previous genome-wide analysis of FXR binding to hepatic chromatin, we noticed that an extra nuclear receptor (NR) half-site was co-enriched close to the FXR binding IR-1 elements and we provided limited support that the monomeric LRH-1 receptor that binds to NR half-sites might function together with FXR to activate gene expression.
To analyze the global pattern of LRH-1 binding and to determine whether it associates with FXR on a genome-wide scale, we analyzed LRH-1 binding across the hepatic genome using an unbiased ChIP-seq approach. We identified over 10,600 LRH-1 binding sites in hepatic chromatin, and over 20% were located within 2 kb of the 5' end of a known mouse gene. Additionally, the results demonstrate that a significant fraction of the genomic sites occupied by LRH-1 are located close to FXR binding sites revealed in our earlier study. A gene ontology analysis revealed that genes preferentially enriched in the LRH-1/FXR overlapping gene set are related to lipid metabolism. These results suggest that LRH-1 recruits FXR to lipid metabolic genes. A significant fraction of FXR binding peaks also contain a nuclear receptor half-site that does not bind LRH-1, suggesting that additional monomeric nuclear receptors such as ROR and NR4A family members may also target FXR to other pathway-selective genes related to other areas of metabolism, such as glucose metabolism, where FXR has also been shown to play an important role.
These results document an important role for LRH-1 in hepatic metabolism through acting predominantly at proximal promoter sites and working in concert with additional nuclear receptors that bind to neighboring sites.
Keywords: LRH-1; FXR; ChIP-seq; lipid metabolism
Nuclear receptors are signal-regulated transcription factors that control a wide range of biological processes and influence many human diseases. Nuclear receptor activity is controlled by the binding of natural small molecules or ligands including hormones and metabolites, and many synthetic compounds have been designed to mimic these natural regulators. The ability of nuclear receptors to alternate between activation and repression in response to specific ligands is mediated by differential binding of non-DNA binding co-regulators, including co-activators and co-repressors. In general, this switch is mediated through a conformational change in the ligand binding pocket of the nuclear receptor leading to dissociation of co-repressors and interaction with co-activators.
In addition to the non-DNA binding ligand-gated co-regulators, nuclear receptor activity can also be influenced by the binding of other DNA binding partner proteins that can interact with the nuclear receptors to form a cis-regulatory module to enhance or repress the transcription of select target genes.
The liver receptor homolog-1 (LRH-1; NR5A2) is expressed mainly in the liver, intestine, exocrine pancreas, and ovary [4-6] and plays a role in the regulation of bile acid, cholesterol, and steroid hormone homeostasis. It belongs to a nuclear receptor subfamily that includes steroidogenic factor 1 (SF-1; NR5A1). LRH-1 was cloned independently by several groups and it received many names, including pancreas homolog receptor 1 (PHR-1), fetoprotein transcription factor (FTF), CYP7A1 promoter binding factor (CPF), and human B1 binding factor (hB1F).
Unlike nuclear receptors that form heterodimers with RXR to bind to their response element, LRH-1 regulates target genes by binding as a monomer to DNA response elements with the consensus sequence 5'-PyCAAGGPyCPu-3', which is similar to a "half-site" recognized by dimeric receptors. LRH-1 is involved in the regulation of genes that participate in steroid, bile acid and cholesterol homeostasis. Recent structural studies of LRH-1 and SF-1 revealed a phospholipid located in the binding pocket of the protein crystal, suggesting that phospholipids might function as natural ligands [9,10]. Whereas the physiological relevance of the interaction between LRH-1 and putative phospholipid ligands remains to be fully appreciated, a recent study supports a role for specific phospholipids as regulatory agonists for LRH-1 in vivo.
LRH-1 also has a key role early in development, where it activates expression of Oct4, which is required to maintain pluripotency at the earliest stages of both embryonic development and ES cell differentiation. In fact, a recent study showed that LRH-1 could replace Oct4 in the re-programming of mouse somatic cells into pluripotent cells, presumably by activating Oct4.
In our analysis of FXR binding to hepatic chromatin, we showed that LRH-1 could function as a partner transcription factor for FXR on a small set of target genes through binding to a nuclear receptor half-site that was co-enriched with the FXR IR-1 element on a genome-wide scale. To determine how global the association between FXR and LRH-1 might be and to analyze LRH-1 more broadly, we mapped LRH-1 binding across the liver genome by an unbiased genome-wide ChIP-seq analysis, using an LRH-1 antibody to enrich LRH-1 target regions that were subsequently sequenced using Applied Biosystems' SOLiD (Sequencing by Oligonucleotide Ligation and Detection) System. The studies demonstrate that LRH-1 binds to over 10,600 sites in the genome, with a significant fraction located close to FXR binding sites identified in our earlier study. Gene ontology grouping revealed that the genes preferentially bound by both FXR and LRH-1 are involved in lipid metabolism, suggesting that LRH-1 targets FXR for activation of genes of lipid metabolism. These data also suggest that additional monomeric nuclear receptors such as RORs and NR4As may bind to NR half-sites close to FXR elements that are not occupied by LRH-1, which could target FXR to different gene clusters involved in other key areas of metabolism.
Results and Discussion
Identification of the Hepatic Cistrome for LRH-1
In our previous studies of genome-wide binding for FXR, our analysis revealed that an additional nuclear receptor (NR) half-site was present in 71% of the FXR/RXR binding IR-1 sites from our liver FXR ChIP-seq dataset. We also demonstrated that the IR-1 and additional NR half-sites were located relatively close together, with most occurrences containing the two motifs within 50 bases of each other. This finding suggested that FXR regulates gene expression in combination with a co-binding monomeric nuclear receptor.
LRH-1 is a prominent monomeric liver NR that binds to half-site elements and we showed that a few of the FXR target promoters also bound LRH-1. To both analyze the genome-wide binding for LRH-1 and to determine whether it might be associated with FXR binding on a genome-wide scale, we performed a ChIP-seq analysis with hepatic chromatin after enrichment with an LRH-1 antibody. Chromatin prepared from livers of six C57BL6 mice was pooled and processed for ChIP with an antibody to LRH-1 or a control IgG as described in Methods. The quality of the chromatin and specificity of the LRH-1 antibody were confirmed by comparative site-specific ChIP analysis using known FXR binding sites in the promoters of SHP, Pemt, Pcx, and Abca4 (Additional File 1). Chromatin enriched by the LRH-1 antibody produced a significantly increased qPCR signal for LRH-1 binding to these promoters relative to chromatin pulled down with a control IgG fraction (Additional File 1).
Additional file 1. LRH-1 ChIP of selected LRH-1 target genes. This file contains a qPCR analysis of 4 separate promoters after ChIP analysis of liver chromatin. This is essential to show the specificity of the LRH-1 antibody.
Next, DNA from the LRH-1 antibody enriched chromatin was subjected to ChIP-seq using the Applied Biosystems' SOLiD platform. The sequencing libraries were prepared according to the standard SOLiD System 2.0 Fragment Library Preparation protocol and the quality of ChIPed DNA, including DNA fragmentation and library amplification, was evaluated by using Agilent BioAnalyzer before running the sequencing reactions. Most DNA fragments were between 200-600 bp in size for both samples (Additional File 2). The DNA fragments between ~200-300 bp were selected for library preparation and SOLiD sequencing.
The sequencing run generated more than 40 million independent reads (Table 1). The individual 39 bp reads were filtered for high quality, as well as for alignment and unique placement in the mouse reference genome, using SOLiD™ BioScope™ Software (Life Technologies). This resulted in 8.3 million uniquely mapped reads for the IgG sample and 10.6 million for the LRH-1 enriched sample (Table 1). We then applied a more stringent cutoff on mapping quality score (MAPQ > 5) and obtained ~5.5 million reads for the IgG sample and ~8 million reads for the LRH-1 enriched sample, which were used for further analysis (Table 1 and Figure 1A).
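The mapping-quality cutoff described above can be illustrated with a minimal Python sketch that keeps only SAM records with MAPQ > 5. The function name and the toy records are hypothetical, and this is not the BioScope pipeline used in the study; it only shows what such a filter does to tab-delimited SAM lines.

```python
def filter_sam_by_mapq(lines, min_mapq=5):
    """Yield SAM alignment lines whose MAPQ is strictly greater than min_mapq."""
    for line in lines:
        if line.startswith("@"):
            continue  # header lines carry no alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is flagged as unmapped
        if int(fields[4]) > min_mapq:  # column 5 of a SAM record is MAPQ
            yield line


# Toy records (hypothetical): one confident hit, one low-MAPQ hit, one unmapped read.
seq, qual = "A" * 39, "I" * 39
example_sam = [
    "@HD\tVN:1.0",
    f"r1\t0\tchr1\t100\t30\t39M\t*\t0\t0\t{seq}\t{qual}",
    f"r2\t16\tchr1\t200\t3\t39M\t*\t0\t0\t{seq}\t{qual}",
    f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",
]
kept = list(filter_sam_by_mapq(example_sam))
```

Only the first toy read survives the filter; the second fails the MAPQ cutoff and the third is unmapped.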
Table 1. Summary of SOLiD ChIP-seq analysis
Figure 1. MACS analysis for LRH-1 ChIP-seq. (A) Summary of ChIP-seq analysis for LRH-1 binding to DNA in hepatic chromatin by MACS. Given mfold 32 and sonication size (bw) 300 bp, MACS searched 2bw windows across the genome to find genomic peaks with tags more than mfold enriched relative to a random tag genome distribution. The results were obtained using a p-value cutoff of 1 × 10^-10 and a false discovery rate (FDR) of 1%. (B) Peak model built by MACS. MACS estimated the d for LRH-1 ChIP-seq data.
To identify LRH-1 binding peaks, we used Model-based Analysis of ChIP-seq (MACS), which was designed to analyze data generated by short read sequencers such as the SOLiD platform, to first estimate peak size and location, using BED files as input. The distance between the modes of the forward and reverse peaks in the alignments, defined as 'd', was 152 bp for the LRH-1 ChIP-seq data (Figure 1B). Using stringent p-value and false discovery rate (FDR) cutoffs of ≤ 1 × 10^-10 and ≤ 1% respectively, we identified 10,634 genomic sites occupied by LRH-1 protein (Figure 1A).
The aligned sequence reads were displayed as a track onto the mouse reference genome using the University of California at Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/index.html), and visual inspection of several sites confirmed that the LRH-1 peaks identified by MACS correspond to sites of over-represented sequence tags. For the examples shown in Figure 2, sequence reads corresponding to different DNA strands are colored in blue and red respectively for the SHP, Adfp, Gsk3b and Abca4 gene associated binding peaks. The peaks for SHP, Adfp or Gsk3b were distributed in the promoter regions, whereas that for Abca4 was located in an intron. We also inspected LRH-1 binding peaks by using the bedGraph format that allows a display of continuous-valued ChIP-seq data in track format using the UCSC genome browser. This showed LRH-1 binding peaks and extended regions from the entire locus of the respective genes (Figure 3).
Figure 2. Representative view of a LRH-1 ChIP-seq peak. The novel LRH-1 binding sites, mapped onto University of California at Santa Cruz (UCSC) genome browser, were identified in several genes presented here. Shown are chromosomal locations according to the July 2007 Mouse Genome Assembly (mm9). Blue and red tags represent sequence reads from opposite DNA strands showing approximately equal distribution as expected. (A) Nr0b2 (SHP). (B) Adfp (adipose differentiation related protein). (C) Gsk3b (Glycogen Synthase Kinases-3b). (D) Abca4 (ABC transporter 4).
Figure 3. Representative view of putative LRH-1 peaks and the entire locus of respective genes using bedGraph format. The novel LRH-1 binding sites are mapped onto University of California at Santa Cruz (UCSC) genome browser. Shown are chromosomal locations of each peak and its gene according to the July 2007 Mouse Genome Assembly (mm9). (A) Nr0b2 (SHP). (B) Adfp (adipose differentiation related protein). (C) Gsk3b (Glycogen Synthase Kinases-3b). (D) Abca4 (ABC transporter 4).
Mapping of LRH-1 binding peaks
When we evaluated where the LRH-1 binding peaks were located with respect to mRNA encoding genes, we were surprised to find that LRH-1 binding sites were predominantly located in the promoter regions (2 kb 5', 24.1%) and 5'UTR (22%) relative to the transcription start site (TSS) for known genes (Figure 4A). Altogether, this accounts for 46% of the total LRH-1 binding events, suggesting a strong preference for TSS proximal binding by LRH-1. In contrast, when the genomic location for randomly generated peaks of similar size was estimated, the random peaks were predominantly localized within intergenic (56%) and intron (32%) regions, with only 2% positioned within 2 kb of a TSS (Figure 4B). Thus, the localization of 24.1% of LRH-1 binding sites to within 2 kb of a TSS is a highly non-random occurrence. Next, we examined the distance from the summit of each LRH-1 peak to the TSS of the nearest identified gene. The distribution shown in Figure 4C provides a visual demonstration that LRH-1 binding peaks were enriched close to the TSS of known genes.
Figure 4. Mapping of LRH-1 binding regions. (A) Mapping of LRH-1 binding peaks on genome-wide scale relative to RefSeq mouse genes. (B) Mapping for random peaks. The 'promoter' and 'downstream' are defined as 2 kb of 5' or 3' flanking regions. Intergenic region refers to all locations other than 'promoter', '5' UTR', 'exon', 'intron', '3'UTR', or 'downstream'. (C) Distance from the summit of each LRH-1 peak to the TSS of the nearest RefSeq gene. An arbitrarily located site of the same length in each peak showed a non-enriched distribution pattern as reported previously.
Motif analysis for LRH-1 binding by MEME
The motif finding program MEME was used to search for enriched motifs in the peaks from our LRH-1 ChIP-seq data set. We found two motifs that were represented with a very high score. One corresponds to a NR half-site of 5'-CCAAGGTCA-3' (MOTIF 2; sites = 296/1000; E-value = 2.5e-061) (Figure 5A). 30% (296/1000) of all input peaks contained at least one of these half-site elements. This indicates that our genome-wide analysis of in vivo binding sites is consistent with previous studies on the half-site for binding of LRH-1 (5'-CAGGGTCA-3'). Additionally, this result is consistent with the genome-wide binding analyses for an epitope-tagged and over-expressed LRH-1 in cultured embryonic stem cells reported previously. The other top-scoring motif identified by the MEME program was the GC box corresponding to a site for Sp1 binding (E-value = 1.7e-168) (Figure 5B). Sp1 is a transcription factor that is ubiquitously expressed and contains three C2H2-type zinc fingers as its DNA binding domain. The Sp1 site was enriched at both promoter proximal and distal LRH-1 sites. There were no other transcription factor motifs that were significantly enriched in our analyses.
Figure 5. Motif Analysis of LRH-1 peaks by MEME program. Consensus LRH-1-binding motif Weblogo found within the top 1000 peaks identified by LRH-1 ChIP-seq using MEME program. (A) Our LRH-1 motif identified by MEME. (B) SP-1 site, identified by MEME. * indicates a nuclear receptor half-site
A position weight matrix (PWM) for the LRH-1 motif from the MEME analysis was calculated and used to scan all of the LRH-1 peaks again using a more stringent z-score cutoff of 4.29 (p < 10^-6) for motif identification. Using this stringent criterion, a half-site LRH-1 motif was present in 33% (3485/10634, z-score > 4.29) of the LRH-1 peaks from the MACS analysis (Figure 6A). Among the peaks containing the LRH-1 motif, most contain one motif element but there are some peak regions that contain more than one (Figure 6B).
Figure 6. Motif analysis for LRH-1 binding peaks. (A) Summary of LRH-1 motif analysis. (B) Number of LRH-1 motifs in a peak identified by SOLiD ChIP-seq (z > 4.29). (C) Distribution of the distance from the best LRH-1 motif to the summit of each peak with a LRH-1 site. An arbitrarily located site of the same length in each peak showed a non-enriched distribution pattern as reported previously.
Next, we calculated the distance from the best LRH-1 site in each LRH-1 motif-containing peak to the corresponding peak summit. Theoretically, this is the most likely location of the actual site of LRH-1-DNA interaction. By this analysis, the NR half-site elements were preferentially located at the peak-summits relative to randomly placed motifs of a similar size. This observation is consistent with the theoretical prediction that the ChIP-seq peak mapping technique with small sequence reads accurately identifies the actual site of protein-DNA recognition and provides more confidence that the motif containing the half-site is actually the site of recognition for LRH-1 (Figure 6C).
Co-occupancy by peaks for LRH-1 and FXR
To investigate whether LRH-1 binding sites were enriched close to the sites of FXR binding from our previous study, we compared the ChIP-seq dataset for LRH-1 binding sites with our previous dataset for FXR binding peaks. This analysis showed that 23.8% of all FXR binding peaks were located close to LRH-1 peaks (Figure 7A). We also visually inspected the locations of several of the LRH-1 binding sites with respect to neighboring FXR binding peaks, using peak distribution tracks in the UCSC genome browser. This comparison for LRH-1 binding sites at the Pemt and Aifm2 loci is shown in Figure 7B and clearly shows the close apposition of the binding peaks for the two different ChIP-seq data sets.
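The comparison of two peak sets can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study: the function name, the toy coordinates, and the `max_gap` parameter are hypothetical, and the text does not state the distance threshold that defines "close", so the sketch defaults to direct overlap of half-open intervals.

```python
from collections import defaultdict


def fraction_near(query_peaks, target_peaks, max_gap=0):
    """Fraction of query peaks lying within max_gap bp of any target peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates, as in BED.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in target_peaks:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in query_peaks:
        # Two intervals are "near" if they overlap after padding by max_gap.
        if any(start - max_gap < t_end and t_start < end + max_gap
               for t_start, t_end in by_chrom[chrom]):
            hits += 1
    return hits / len(query_peaks)


# Toy peak lists (hypothetical coordinates).
fxr_peaks = [("chr1", 100, 200), ("chr1", 1000, 1100),
             ("chr2", 100, 200), ("chr3", 50, 60)]
lrh1_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]

overlap_fraction = fraction_near(fxr_peaks, lrh1_peaks)              # direct overlap
nearby_fraction = fraction_near(fxr_peaks, lrh1_peaks, max_gap=150)  # padded
```

The nested scan is quadratic in the number of peaks; for data sets of the size reported here, a sorted-interval or interval-tree approach would be a more practical choice.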
Figure 7. Analysis of co-occupancy of LRH-1 ChIP-seq peak with FXR binding sites identified by MACS. (A) Comparison of ChIP-seq analysis for LRH-1 binding in hepatic chromatin with FXR binding peaks. (B) The LRH-1 binding sites for Pemt and Aifm2, mapped onto UCSC genome browser, were inspected for co-occupancy by FXR. Blue and red tags represent sequence reads from opposite DNA strands. Left panel, Pemt (phosphatidylethanolamine N-methyltransferase); Right panel, Aifm2 (apoptosis-inducing factor 2, mitochondrion).
Genes located close to the LRH-1 binding sites in liver
There were 395 overlapping peaks between LRH-1 and FXR binding (Figure 7A) that are located within 10 kb of 367 RefSeq genes. We used the DAVID Gene Ontology (GO) PANTHER 'Biological Process' term (http://david.abcc.ncifcrf.gov/) to provide information on the genes that were co-occupied by LRH-1 and FXR. This analysis showed that there was a strong enrichment for genes in lipid metabolic processes, steroid and cholesterol metabolism (Table 2). The most significantly enriched genes were associated with 'cellular lipid metabolic process' (FDR = 0.0002%) and many of the genes in this category are predicted to regulate cholesterol homeostasis (Sec14l2, Scarb1, Srebp2, Lcat, Fdft1, Prkag2 and Ldlrap1).
Table 2. Summary of DAVID Gene Ontology analysis of genes near LRH-1 binding regions
Correlation between LRH-1 binding and FXR dependent gene regulation
We reasoned that if the co-occurrence of FXR and LRH-1 binding sites was functionally important, then the genes associated with LRH-1 sites should be statistically correlated with a functional data set for FXR dependent gene expression. Thus, we analyzed the gene list from the MACS analysis for LRH-1 binding peaks for overlap with genes that were preferentially activated by an FXR expressing adenovirus, using a gene set enrichment analysis (GSEA) function and the modified Kolmogorov-Smirnov (KS) test. The KS plot places the genes from an expression microarray, rank ordered by fold change, on the X-axis, and the ranked list is then scanned from high to low fold change for the occurrence of genes from the ChIP-seq data set. The presence or absence of a ChIP-seq identified gene is scored on the Y-axis with a running enrichment score. This analysis showed a highly significant running enrichment score because the genes identified by LRH-1 ChIP-seq that overlap with FXR binding peaks were preferentially located toward the top of the differentially expressed gene list ranked by fold change in gene expression (Figure 8, p = 1.06e-07). Thus, it is highly likely that LRH-1 is a global co-regulator for FXR dependent gene expression.
Figure 8. Peak validation using Kolmogorov-Smirnov (KS) plot. The gene list for the LRH-1 ChIP-seq peaks that overlap with FXR ChIP-seq peaks was compared for their correlation to a set of genes that were activated by infection of primary mouse hepatocytes with a recombinant adenovirus expressing the constitutive FXRα2-VP16 hybrid protein as described in the text. Genes in the expression microarray were ranked by absolute fold change (A) or fold change (B) (x-axis) and the graph plots the running enrichment score.
In a previous report, we identified a nuclear receptor half-site that was co-enriched with FXR binding IR-1 sites in liver chromatin. LRH-1 is a liver enriched monomeric nuclear receptor that binds to half-site elements, so we hypothesized that LRH-1 would be a good candidate for binding the adjacent half-site to function as an FXR co-regulatory protein in liver chromatin. In fact, we presented a limited amount of evidence for this on a handful of FXR target genes, but it was important to extend this association to a genome-wide scale. To accomplish this goal, a genome-wide SOLiD ChIP-seq analysis was performed using chromatin enriched with an LRH-1 antibody. The SOLiD ChIP-seq data for LRH-1 binding comprised more than 40 million 39 bp sequence tags. The ultra-high throughput SOLiD DNA sequencing platform is able to produce more than 400 million tags of 35-50 bp per run, and the high read numbers contribute to high sensitivity and signal-to-noise ratios, and to relative comprehensiveness for the genome. In total, 10,634 genomic LRH-1 binding sites were identified with a high degree of confidence (p-value ≤ 1 × 10^-10, FDR ≤ 1%) (Table 1 and Figure 1).
When we used the motif finding program MEME to search for enriched motifs in the peaks from our LRH-1 ChIP-seq dataset, we found a motif (5'-CCAAGGTCA-3') containing a nuclear receptor half-site (MOTIF 2) (Figure 5), and 33% of all input peaks contained at least one LRH-1 motif (Figure 6). Our genome-wide analysis of in vivo binding sites is also consistent with our previous studies of the half-site preference for binding of LRH-1 on the Fasn promoter (5'-CAGGGTCA-3').
On a genome-wide scale, the LRH-1 binding sites were localized mainly in proximal promoter (24%) and 5'UTR (22%) regions, whereas, similar to other nuclear receptors analyzed to date, FXR binding occurs primarily in distal intergenic regions (44%) and introns (32%), with only 10% localizing to proximal promoters.
The ChIP-seq analysis demonstrated that LRH-1 binding sites are located close to ~24% of the FXR-binding sites (Figure 7). This represents a highly significant degree of co-localization, with p < 10^-6 calculated by sampling a control set of peaks with the same size distribution. The FXR/LRH-1 co-association was highly significant for both promoter proximal and non-proximal binding sites. This provides strong support for our hypothesis that LRH-1 is a key hepatic co-regulatory transcription factor for FXR.
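The idea of judging co-localization against sampled control peaks with the same size distribution can be sketched as a simple Monte Carlo test. The Python below is a hedged illustration only: all function names, the per-chromosome shuffling scheme, the toy coordinates and the iteration count are assumptions, and the study's actual sampling procedure is not described in enough detail to reproduce here.

```python
import random


def count_overlaps(query, target):
    """Number of query intervals (chrom, start, end) overlapping any target interval."""
    return sum(
        any(c == tc and s < te and ts < e for tc, ts, te in target)
        for c, s, e in query
    )


def shuffled_like(peaks, chrom_sizes, rng):
    """Re-place each peak at random on its own chromosome, keeping its length."""
    out = []
    for chrom, start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        out.append((chrom, new_start, new_start + length))
    return out


def empirical_p(observed_peaks, reference_peaks, chrom_sizes, n_iter=1000, seed=0):
    """Empirical p-value: how often random control peaks overlap at least as much."""
    rng = random.Random(seed)
    observed = count_overlaps(observed_peaks, reference_peaks)
    as_extreme = sum(
        count_overlaps(shuffled_like(observed_peaks, chrom_sizes, rng),
                       reference_peaks) >= observed
        for _ in range(n_iter)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_iter + 1)


# Toy example (hypothetical): five 200 bp peaks that each overlap a reference peak.
chrom_sizes = {"chr1": 10_000_000}
reference = [("chr1", 1_000_000 * i, 1_000_000 * i + 200) for i in range(1, 6)]
observed = [("chr1", s + 50, e + 50) for _, s, e in reference]
p_value = empirical_p(observed, reference, chrom_sizes, n_iter=200, seed=1)
```

With an add-one estimator, the smallest p-value that can be reported is 1/(n_iter + 1), so supporting a bound such as p < 10^-6 by sampling alone would need on the order of a million control sets.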
We also analyzed the association of genes located close to FXR and LRH-1 binding sites relative to genes activated by FXR using a gene set enrichment analysis. The LRH-1 associated genes were localized within a set of FXR activated genes that were rank-ordered for differential expression after infection of primary hepatocytes with a control or a constitutively active FXR-VP16 fusion protein (Figure 8). The corresponding Kolmogorov-Smirnov (KS) plot showed there was a high degree of correlation of the two data sets, providing additional evidence that LRH-1 regulates genes in conjunction with FXR.
Because 76% of the LRH-1 binding sites were not located close to FXR elements, these results also predict that LRH-1 regulates gene expression independently of FXR. Consistent with this hypothesis, LRH-1 has been shown to play a key role in regulating gene expression along with LXR [17,21,22].
The gene ontology analysis in Table 2 indicated that the genes co-regulated by FXR and LRH-1 are associated with lipid metabolic processes. It is likely that other nuclear receptors, such as RORs, NR4As, ERRs and Reverb, that also bind as monomers to an isolated NR half-site, may target FXR to genes involved in other physiological responses. In fact, the NR4A nuclear receptors are involved in physiological processes including glucose metabolism and DNA repair, and these two GO categories were ranked just behind lipid metabolism as the most significantly associated pathways for FXR binding in our previous study. When we analyzed a list of NR4A responsive genes from microarray studies summarized in a previous report, we noticed that 14/48 of these target genes were found in our FXR target gene list. This is a highly significant correlation (p = 8.8e-8), which provides strong support for this model of FXR pathway targeting.
Another relevant monomeric nuclear receptor for which data from mouse liver are available is the Reverb-α transcriptional regulator. In fact, recent studies suggest it is a repressor of lipogenic gene expression during the light phase of the diurnal cycle. When the overlap between genome-wide binding of Reverb-α at ZT 10 (the light phase) and LRH-1 in our study was evaluated, we found that there was a highly significant overlap (18% of LRH-1 peaks at p < 10^-6), which is consistent with Reverb-α inhibiting lipogenesis during the light phase of the diurnal cycle at least partly through inhibiting genes that are activated by LRH-1.
Our studies contribute to understanding the mechanism by which FXR and LRH-1 cooperatively regulate lipid metabolic process and suggest a generalized model for how FXR may be targeted to additional metabolic processes such as glucose and bile acid metabolism through association with distinct half-site binding monomeric nuclear receptors. The details and molecular mechanism of this cooperation remain to be elucidated. However, it is possible that the ability of FXR to function along with LRH-1 and other co-factors such as chromatin remodeling complexes at the adjacent sites results in synergistic effects on transcription activation. Future studies are necessary to characterize the chromatin context in which FXR and LRH-1 binding occurs, including histone modification profiles such as methylation or acetylation, binding site accessibility, as well as recruitment of other cofactors, by using rapidly advancing genome-wide binding approaches.
Chromatin immunoprecipitation sequencing (ChIP-seq) using the SOLiD platform
Six-week-old C57BL6 male mice were fed a standard chow diet. All animals were sacrificed at the end of the dark cycle and ChIP assays from liver were performed as previously described [14,25]. The liver chromatin from all six animals was pooled for analysis. Chromatin was extracted and subjected to immunoselection using antibodies against LRH-1 (PP-H2325-00; R&D Systems) or mouse IgG (Sigma) as a control. To prepare samples for SOLiD ChIP-seq, after isolating the ChIP-enriched DNA, gene-specific enrichment for some known FXR target genes including SHP, Pemt, Pcx, and Abca4 in the LRH-1 chromatin relative to IgG control chromatin was verified. Approximately 20 ng of ChIP enriched DNA or control DNA was processed by the Sanford-Burnham Medical Research Institute Genomics Core Facility (Orlando, FL) for high throughput DNA sequencing using the SOLiD system. The libraries for the samples were prepared according to the standard SOLiD System 2.0 Fragment Library Preparation protocol. Templated bead generation for each library was then performed according to SOLiD System 2.0 Users Guide standard protocols. Each sample was deposited on a quadrant of the slide at a target bead density of 60-70 k beads/panel.
Quantitative PCR, microarray analysis
Manual ChIP confirmation on randomly selected putative FXR target genes from the lipid metabolism category was performed by quantitative PCR (qPCR). Final ChIPed and control DNA samples were analyzed in triplicate with L32 as internal control. For this assay, we used pre-designed and validated qPCR primers specific to the peak regions containing the LRH-1-DNA interaction site and an additional co-regulatory site, and measured genomic DNA promoter region sequence enrichment within ChIPed samples.
ChIP-seq data analysis
Preprocessing sequence data
The ultra-high read tag numbers of the SOLiD system contribute to high sensitivity and relative comprehensiveness for the mouse genome, and enable the robust statistical power required to map and accurately characterize the protein-DNA interactions of an entire genome. Like other sequencing technologies, it measures fluorescence intensities from dye-labeled molecules to determine the sequence of DNA fragments. The location of the sequence reads from the SOLiD System and their frequency, which measures the degree of enrichment over the control, were determined using currently available SOLiD sequencing analytical tools including SAMtools (http://samtools.sourceforge.net/).
The SOLiD ChIP-seq dataset was analyzed to determine peaks that contain binding sites of LRH-1 in its target genes. Short reads of 39 bp were produced by the Applied Biosystems (ABI) SOLiD (Sequencing by Oligonucleotide Ligation and Detection) System and mapped to a reference genome by Life Technologies using SOLiD™ BioScope™ Software, allowing two mismatches. Short sequence reads that mapped to simple and complex repeats or that were not unique by chance were removed from the analysis. The resulting mapped file was in SAM ("Sequence Alignment/Map") format, and we converted the SAM files to BED files using SAMtools (http://samtools.sourceforge.net/), which provides various utilities for manipulating alignments in the SAM format, including sorting, indexing, merging and generating alignments in a per-position format. The BED files, which contain chromosomal start and stop positions, were used as input to downstream processing, as well as for visualization in the UCSC Genome Browser (http://genome.ucsc.edu/index.html).
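To make the relationship between the two file formats concrete, the Python sketch below converts a single SAM record into a BED-style tuple. It is an illustration under stated assumptions, not the conversion command used in the study: the function name and the example record are hypothetical. The key details are that SAM positions are 1-based while BED intervals are 0-based and half-open, and that the aligned length on the reference comes from the CIGAR string.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def sam_record_to_bed(line):
    """Convert one SAM alignment line to (chrom, start, end, name, score, strand).

    Returns None for unmapped reads. The BED score field reuses the MAPQ string.
    """
    f = line.rstrip("\n").split("\t")
    name, flag, chrom, pos, mapq, cigar = f[0], int(f[1]), f[2], int(f[3]), f[4], f[5]
    if flag & 0x4 or cigar == "*":
        return None
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    start = pos - 1                         # 1-based SAM -> 0-based BED
    strand = "-" if flag & 0x10 else "+"    # 0x10 marks a reverse-strand alignment
    return (chrom, start, start + ref_len, name, mapq, strand)


# Hypothetical 39 bp read aligned to the reverse strand of chr4.
record = "r1\t16\tchr4\t1001\t30\t39M\t*\t0\t0\t" + "A" * 39 + "\t" + "I" * 39
bed_fields = sam_record_to_bed(record)
bed_line = "\t".join(str(x) for x in bed_fields)
```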
Finding peaks using MACS
To determine where LRH-1 bound to the genome, we looked for areas where significantly more reads mapped in the ChIP sample than in the IgG control. This was accomplished using MACS with the parameters of mfold 32, bandwidth 300 bp, p-value 1 × 10^-10, and FDR 1%.
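MACS models tag counts with a Poisson distribution, and the core of the enrichment test can be illustrated with a short Python sketch. This is a deliberate simplification rather than the MACS implementation: the function names, the single expected rate `lam`, and the example counts are hypothetical, and the tag shifting, local background estimation and FDR calculation that MACS performs are omitted.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upward.

    Summing the tail directly keeps precision for very small p-values,
    which 1 - CDF would lose.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)


def is_enriched(chip_tags, expected_tags, p_cutoff=1e-10):
    """True if a window holds significantly more ChIP tags than expected."""
    return poisson_upper_tail(chip_tags, expected_tags) <= p_cutoff


# Hypothetical windows: 40 tags versus 3 tags where 2 are expected by chance.
strong_window = is_enriched(40, 2.0)
weak_window = is_enriched(3, 2.0)
```

The default cutoff mirrors the p-value threshold quoted above; the expected tag count would in practice be derived from the control sample.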
Distance to LRH-1 sites from the summit of each peak
MACS provides a summit for every peak, which can be regarded as the center of the peak. It is the position with the maximum number of overlapping reads, and is the most likely location of the binding site. For each peak with an LRH-1 site, we determined the distance from the best LRH-1 site to this summit. If they overlapped, we scored the distance as zero. To give a sense of the enrichment, we evaluated an arbitrarily located site of the same length in each peak, determined the distance to the summit, and plotted the results on the same histogram.
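The distance rule described above, including the zero score for overlap and the arbitrarily placed control site, can be sketched in Python. The function names, the half-open coordinate convention and the example numbers are assumptions made for illustration.

```python
import random


def distance_to_summit(site_start, site_end, summit):
    """Distance in bp from a motif site [site_start, site_end) to a peak summit.

    Returns 0 when the summit falls inside the site.
    """
    if site_start <= summit < site_end:
        return 0
    if summit < site_start:
        return site_start - summit
    return summit - (site_end - 1)  # site_end - 1 is the last base of the site


def random_site_distance(peak_start, peak_end, site_len, summit, rng):
    """Distance for an arbitrarily placed site of the same length within the peak."""
    start = rng.randint(peak_start, peak_end - site_len)  # inclusive bounds
    return distance_to_summit(start, start + site_len, summit)


# Hypothetical 9 bp motif in a 300 bp peak whose summit is at position 150.
inside = distance_to_summit(146, 155, 150)
upstream = distance_to_summit(100, 109, 90)
downstream = distance_to_summit(100, 109, 120)
control = random_site_distance(0, 300, 9, 150, random.Random(1))
```

Repeating `random_site_distance` over every motif-containing peak gives the background histogram against which the observed distances are compared.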
Distance from peak to TSSs
For each LRH-1 peak, the distance from the peak to the nearest transcription start site was determined, and plotted. The transcription start sites (TSSs) were taken from a RefSeq file obtained from NCBI. The background was determined by placing peaks at random locations on the genome and by determining distances to TSSs.
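A minimal Python sketch of the nearest-TSS lookup follows. It assumes the TSS coordinates for one chromosome have already been extracted from the RefSeq annotation and sorted; the function name and the coordinates are hypothetical, and strand-aware signed distances are left out for brevity.

```python
import bisect


def nearest_tss_distance(position, sorted_tss):
    """Absolute distance from a peak position to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on the same chromosome as the peak.
    """
    i = bisect.bisect_left(sorted_tss, position)
    # Only the TSS just left and just right of the position can be nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - t) for t in candidates)


# Hypothetical TSS coordinates and peak summits on one chromosome.
tss_positions = [1000, 5000, 20000]
distances = [nearest_tss_distance(s, tss_positions)
             for s in (500, 4800, 5000, 30000)]
```

The same function applied to randomly placed positions yields the background distribution described above.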
DNA sequences for LRH-1 binding regions were retrieved using Galaxy (http://main.g2.bx.psu.edu webcite) and used for motif search using MEME . MEME represents motifs as position-dependent letter-probability matrices (PWM). The PWM was used to find a score for the top-scoring LRH-1 sequence; each letter in the sequence has a likelihood given in the PWM, these were summed to find a score for the sequence, with a higher score meaning it is more likely to be the motif in question. We used the PWM to find scores for every position along an entire chromosome (excepting coding and repeat regions), and found the average score and standard deviation. Then when a new sequence was tested, we obtained its score from the PWM, subtracted the average, and divided by the standard deviation. This provided us a z-score for any sequence, which was converted into a p-value via a standard normal curve.
The position weight matrix (PWM) for the LRH-1 motif from the MEME analysis was used to scan all our LRH-1 peaks again using a more stringent z-score cutoff of 4.29 (p < 10⁻⁶).
Annotation of genes and gene ontology (GO) analysis
All LRH-1 binding sites were assigned to the nearest gene based on the Mus musculus NCBI m37 genome assembly (mm9; July 2007). GO analysis of LRH-1 target genes was conducted using the NIH Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/). This analysis was used to classify the nearest-gene list into functionally related gene groups using the 'PANTHER Biological Process' terms.
The LRH-1 ChIP-seq data were compared with an expression microarray data set for FXR dependence by using a Kolmogorov-Smirnov (KS) plot, a modified method of gene set enrichment analysis (GSEA). The KS plot tests the null hypothesis that the ranks of the genes identified by ChIP-seq are uniformly distributed throughout the ranked FXR expression microarray list. The KS plot was obtained by calculating the running sum statistic for our ChIP-seq gene set to observe enrichment in the ranked gene list from the expression microarray data.
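The running sum statistic can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the unweighted step sizes (1/N_hit for a gene in the set, 1/N_miss otherwise) are an assumption corresponding to a classic KS-type walk.

```python
def ks_running_sum(ranked_genes, gene_set):
    """Running sum over a ranked gene list: step up at genes in the set,
    step down at all other genes, so that the walk ends at zero."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one hit and one miss")
    running, total = [], 0.0
    for is_hit in hits:
        total += 1.0 / n_hit if is_hit else -1.0 / n_miss
        running.append(total)
    return running


def enrichment_score(running):
    """Signed maximum deviation of the running sum from zero."""
    return max(running, key=abs)


if __name__ == "__main__":
    # Hypothetical ranked list (most FXR-dependent first) and ChIP-seq set.
    ranked = ["g1", "g2", "g3", "g4"]
    chip_set = {"g1", "g2"}
    walk = ks_running_sum(ranked, chip_set)
    print(walk, enrichment_score(walk))
```

Under the null hypothesis of uniformly distributed ranks the walk stays close to zero, whereas a gene set concentrated at one end of the ranked list produces a large deviation.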
List of Abbreviations used
LRH: liver receptor homologue; FXR: farnesoid X receptor; ChIP: chromatin immunoprecipitation; GO: gene ontology; KS: Kolmogorov-Smirnov; TSS: transcription start site
Competing interests statement
The authors declare that they have no competing interests.
HKC, JB and YKS performed experiments; HKC, JB, XX and TO analyzed data; HKC and TO wrote the manuscript. All authors have read and approved the final manuscript.
We thank the Sanford Burnham Analytical Genomics Core for performing the high-throughput DNA sequencing and Aniello M Infante and Peter Edwards for helpful discussions. This work was supported in part by grants from the NIH to TO (HL48044; DK71021).