Discovering chromatin motifs using FAIRE sequencing and the human diploid genome

Yang, Chia-Chun; Buck, Michael J; Chen, Min-Hsuan; Chen, Yun-Fan; Lan, Hsin-Chi; Chen, Jeremy JW; Cheng, Chao; Liu, Chun-Chi

doi:10.1186/1471-2164-14-310

Methodology article
Open access
Published: 08 May 2013

Discovering chromatin motifs using FAIRE sequencing and the human diploid genome

Chia-Chun Yang^1,2,
Michael J Buck³,
Min-Hsuan Chen²,
Yun-Fan Chen²,
Hsin-Chi Lan¹,
Jeremy JW Chen^1,4,5,
Chao Cheng^6,7 &
…
Chun-Chi Liu^2,4,5

BMC Genomics volume 14, Article number: 310 (2013) Cite this article

4700 Accesses
5 Citations
3 Altmetric
Metrics details

Abstract

Background

Specific chromatin structures are associated with active or inactive gene transcription. The gene regulatory elements are intrinsically dynamic and alternate between inactive and active states through the recruitment of DNA binding proteins, such as chromatin-remodeling proteins.

Results

We developed a unique genome-wide method to discover DNA motifs associated with chromatin accessibility using formaldehyde-assisted isolation of regulatory elements with high-throughput sequencing (FAIRE-seq). We aligned the FAIRE-seq reads to the GM12878 diploid genome and subsequently identified differential chromatin-state regions (DCSRs) using heterozygous SNPs. The DCSR pairs represent the locations of imbalances of chromatin accessibility between alleles and are ideal to reveal chromatin motifs that may directly modulate chromatin accessibility. In this study, we used DNA 6-10mer sequences to interrogate all DCSRs, and subsequently discovered conserved chromatin motifs with significant changes in the occurrence frequency. To investigate their likely roles in biology, we studied the annotated protein associated with each of the top ten chromatin motifs genome-wide, in the intergenic regions and in genes, respectively. As a result, we found that most of these annotated motifs are associated with chromatin remodeling, reflecting their significance in biology.

Conclusions

Our method is the first one using fully phased diploid genome and FAIRE-seq to discover motifs associated with chromatin accessibility. Our results were collected to construct the first chromatin motif database (CMD), providing the potential DNA motifs recognized by chromatin-remodeling proteins and is freely available at http://syslab.nchu.edu.tw/chromatin.

Background

Chromatin is comprised of repeating nucleosome units consisting of ~146 base pairs of DNA coiled around an octamer of four core histone proteins (H2A, H2B, H3 and H4) [1]. The chromatin surrounding the actively transcribed genes is relaxed, and importantly, a nucleosome-depleted region (NDR) is observed immediately upstream the transcriptional start site. The presence of a NDR is characteristic of both CpG-rich [2] and CpG-poor [3] promoters where transcription factors (TFs) can approach to facilitate transcription.

Gene regulatory elements are intrinsically dynamic and alternate between inactive and active states through the recruitment of DNA binding proteins, such as chromatin remodelers, that regulate nucleosome stability [4]. The formation of open chromatin, or nucleosome disassembly, and its association with transcriptional activity are an evolutionarily conserved characteristic [5]. To date, FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) [6], or FAIRE-seq (concerting with massive parallel sequencing), is extensively used to identify cell-specific chromatin states, and to investigate the relationship between chromatin structures and diseases [7–11]. For example, Waki et al. performed computational motif analysis of the adipocyte-specific FAIRE peaks (open chromatin sites) and discovered an enrichment of a binding motif for nuclear family I (NFI) transcription factors [12]. In addition, Song et al. analyzed FAIRE-seq and DNase-seq data in seven cell lines and identified cell-specific regulatory elements [13]. Those studies take advantage of such technology to further reveal the nature of gene regulation. Nevertheless, the effects of allele-specific variations were not considered, and we believe that they may play important roles in chromatin structures.

Recently, Rozowsky et al. integrated RNA-seq, ChIP-seq and the diploid genome sequence to identify allele-specific TF binding sites [14]. Meanwhile, McDaniell et al. integrated DNase-seq, CTCF ChIP-seq and parent–child trios to identify heritable allele-specific chromatin signatures [15]. However, de novo DNA motifs associated with allele-specific chromatin accessibility have not been reported yet. Therefore, we developed the first method for discovering de novo DNA motifs associated with chromatin accessibility using FAIRE-seq and the diploid genome sequence. We mapped the FAIRE-seq reads to the diploid genome and found differential chromatin-state regions (DCSRs) using heterozygous SNPs. The DCSR pairs represent the locations of imbalances of chromatin accessibility between alleles and are ideal to identify motifs that may directly modulate chromatin accessibility [11].

Results and discussion

Identifying DCSRs

In this study, we developed a unique genome-wide method to discover DNA motifs associated with chromatin accessibility. We used a publicly available FAIRE-seq dataset with GM12878 cells from UCSC genome browser [6, 16] and obtained the corresponding diploid genome sequences from AlleleSeq [14]. The diploid genome allows us to identify binding motifs that differ between alleles and that correspond to differences in chromatin accessibility. The Bowtie tool [17] was used to align FAIRE-seq reads to the genome without any mismatch. Using FAIRE-seq reads and heterozygous SNPs, we can distinguish the reads from paternal or maternal alleles (Figure 1A). Therefore, the chromatin state (accessible or inaccessible) can be determined based on the read depth on heterozygous SNPs (Figure 1B). In other words, the genomic regions with high FAIRE-seq read depth indicate accessible chromatin.

Examining all heterozygous SNPs (~2.3 M heterozygous SNPs) across the genome, we selected the locations with a significant difference in chromatin accessibility between the two alleles. The DCSRs were selected with the following conditions: a SNP was taken into consideration when its read depth had a fold change of >2, and its greater read depth was at least 10 by FAIRE-seq reads. Next, a DCSR was defined to be ±20 bases surrounding such SNP, i.e. 41 bases in total. We used 41 base pairs from both paternal and maternal alleles to build a DCSR pair (Figure 1B).

As a result, we identified a total of 7,829 DCSR pairs in the GM12878 genome, among which 103 are in the promoter regions (TSS −2000 to 0), 2,262 in genes, and 5,464 in the intergenic regions (Table 1). For each pair of DCSRs, we identified the DCSR possessing the higher read depth as accessible chromatin, whereas that possessing the lower read depth as inaccessible chromatin. Figure 1A shows the FAIRE-seq read depth in the diploid genome, and Figure 1B shows that we identified DCSRs and established accessible and inaccessible chromatin groups.

Table 1 Chromatin motifs in genome-wide, intergenic, genic, and promoter regions (P value < 0.01)

Full size table

It is noteworthy that a fully phased diploid genome is important to this study because of the adjacent SNP/indel effect as follows: First, if the distance between the adjacent variations is shorter than the read length, it will affect DCSR detection by changing FAIRE read alignment. Second, if the distance between adjacent variations is < 20 bp, it will affect chromatin motif detection by changing motif frequencies between paternal and maternal alleles.

Discovering conserved motifs with significant changes in the occurrence frequency among DCSRs

Most of the DCSRs differ in one base from their pairs, implying the SNP directly affects chromatin accessibility (Figure 1B). When a heterozygous SNP occurs in a binding site of a TF, either a histone modification enzyme (HME) itself or a HME-recruiting protein, it may determine the state of chromatin. Thus, SNPs among the DCSRs are essential since they carry the information related to the chromatin accessibility. We thus defined the binding motifs associated with chromatin accessibility as chromatin motifs.

Using DNA 6-10mer sequences to interrogate all DCSRs, we subsequently discovered conserved chromatin motifs with significant changes in occurrence frequency between accessible and inaccessible DCSRs (Figure 1C). Additional file 1 shows the chromatin motifs with occurrence rates in inaccessible and accessible chromatin groups in genome-wide regions (P values < 0.01 by paired t-test). There are 1,453 motifs, totally, and 561 (38.6%) of them have higher occurrences in accessible regions than in inaccessible regions. To further eliminate the motifs with low occurrences that might be false positives, we selected motifs that have P values < 0.01 and occurrence rates > 1%. It resulted in 245 genome-wide, 166 intergenic, and 156 genic chromatin motifs (Table 1 and Additional file 2). Since promoter regions only have 103 DCSRs, the number of DCSRs is too small to discover significant motifs. Thus, we did not find any chromatin motif with a P value < 0.01 in promoter regions.

Annotating chromatin motifs using TF databases

Grewal and Jia suggested that TFs can recognize specific DNA sequences to nucleate heterochromatin structures [1]. In the same analogy, we proposed that a chromatin motif that can be recognized by a TF may modulate chromatin accessibility (Figure 2). To discover such TFs and to annotate our chromatin motifs, we used two TF databases, the transcription factor encyclopedia (TFe) [18] and the human protein-DNA interactome (hPDI) database constructed using protein microarray assays [19, 20], (Figure 1D). As a result, over 60% of motifs can be annotated, and they are listed in Table 1.

To investigate the chromatin motifs and demonstrate the biological significance of the chromatin motifs, we studied the protein annotations of the top ten chromatin motifs in genome-wide, intergenic and genic regions, respectively (Table 2). Within the top ten chromatin motifs, seven genome-wide motifs, six intergenic motifs, and two genic motifs have TF annotations. Surprisingly, most of these annotated motifs have biological reports associated with chromatin remodeling as follows:

Table 2 Top ten chromatin motifs and TF annotation

Full size table

MAX: Myc and Mad compete with each other to form a heterodimer with Max [21], and the resulting Myc/Max or Mad/Max protein complex binds to the CACGTG motif through its basic helix-loop-helix leucine zipper domains [22]. The Myc/Max heterodimer is a co-activator that recruits multiple histone acetyl transferase to maintain euchromatin status, whereas the Mad/Max a co-repressor that recruits histone deacetylase (HDAC) to repress transcription. Moreover, Lee et al. identified the genomic binding locations of MYC across 11 different human cell lines using ChIP-seq, and the MYC motif in GM12878 is CCACGTG [23]. This finding is consistent with our top motif CACGTG. RXR/RARA: RXR/RARA recruits histone deacetylase and represses transcription [24]. In addition, RXR/RARA functions as a local chromatin modulator [25]. SOX9: Sox9 interacts with chromatin and activates transcription through the regulation of chromatin modification [26]. MEF2A: The interactions between MyoD homodimers and MEF2 proteins may direct HMEs to the chromatin [27]. HCFC2: HCFC1 and HCFC2 are the core components of the MLL1 complex, which is a histone methyltransferase acting as a positive global regulator during gene transcription [28]. USF1/USF2: USF1/USF2 heterodimer recruits HMEs and maintains the chromatin barrier [29].

Validation

To systematically validate our method, we defined non-differential chromatin-state regions (NDCSRs) by the following conditions: the heterozygous SNPs have a read-depth fold change < 1.5 and the greater allele read depth at least 5 by FAIRE-seq reads. We identified a total of 5,047 pairs of NDCSRs in the GM12878 genome, and then we performed the same framework to obtain significant motifs on NDCSR. Additional file 3 shows the top 100 DCSR and NDCSR motifs with TF annotation. To investigate whether the TF annotation has enrichment with chromatin remodeling, we selected three chromatin-specific GO terms as follows: GO:0016585 chromatin remodeling, GO:0016570 histone modification, and GO:0031490 chromatin DNA binding. In Additional file 3, 29% of DCSR motifs have chromatin-specific annotation and only 8% of NDCSR motifs have chromatin-specific annotation (chi-squared test, P value = 0.0001), suggesting the significant enrichment with chromatin-specific annotation.

To investigate the association of MYC allele-specific binding and differential chromatin-state regions, we applied our method to MYC ChIP-Seq data. The GM12878 MYC ChIP-Seq and control data were downloaded from GEO GSE32883 [23]. The Bowtie tool [17] was used to map ChIP-seq reads to the GM12878 diploid genome to select perfect match and unique mapping tags, which later were fed toChIP-seq processing pipeline [30] to discover narrow peaks with FDR < 0.01. To investigate whether MYC allele-specific binding associates to DCSRs, we calculated the number of DCSRs/NDCSRs with allele-specific or non-allele-specific MYC peaks, and then performed Pearson's Chi-squared test (P value = 0.008). Out result suggests that the differential chromatin-state regions and allele-specific MYC bindings have a significant association (Table 3).

Table 3 MYC allele-specific binding associates and DCSR/NDCSR

Full size table

The most significant motif, CACGTG

In Table 2, CACGTG is the most significant motif in both genome-wide (P value = 7.6E-8) and intergenic regions (P value = 1.5E-7). Interestingly, Myc/Max, Mad/Max and USF1/USF2 bind to this motif, and these proteins play the following important roles in chromatin remodeling: the USF1/USF2 heterodimer recruits HMEs to the insulator sites to maintain the chromatin barrier [29]. In mouse studies, Myc and Mad compete with each other to be heterodimerized with Max [21], and the resulting Myc/Max heterodimer or Mad/Max heterodimer binds to the CACGTG motif through its basic helix-loop-helix leucine zipper domains [22]. Myc/Max heterodimer is a co-activator to recruit multiple histone acetyltransferase to maintain euchromatin status, whereas Mad/Max is a co-repressor that recruits histone deacetylase (HDAC) to repress transcription.

We also found that the significant chromatin motifs are not motifs with high occurrence rate. For example, CACGTG has a low occurrence rate of 1.15% in the inaccessible chromatin groups and a low occurrence rate of 0.69% in the accessible chromatin groups (Additional file 2).

Conclusions

Recently, an increasing number of studies have reported that chromatin accessibility is associated with diseases such as Huntington's disease [31], muscular dystrophy [32], breast cancer [33] and pancreatic cancer [34], reflecting its importance in biology. Our method is the first one to use a diploid genome and FAIRE-seq to discover motifs associated with chromatin accessibility, which leads to the first chromatin motif database (CMD). The CMD provides the potential DNA motifs recognized by chromatin-remodeling proteins.

Methods

FAIRE-seq dataset and the diploid genome sequence

FAIRE-seq reads in GM12878 cell line (GSE32883 [23]) were downloaded from the UCSC genome browser [6, 16] while diploid genome sequences of GM12878 cell from AlleleSeq [14]. The Bowtie tool [17] was used to map FAIRE-seq reads to the diploid genome without any mismatch. Accessible and inaccessible chromatin regions were identified based on the read depth on the heterozygous SNPs.

Discovering conserved chromatin motifs

To discover chromatin motifs, we used all combinations of DNA sequences from 6mers to 10mers (~1 M motif candidates) to scan 41-base DCSRs. Since the intergenic and genic regions may have different TFs associated with chromatin remodeling, we performed chromatin motif discovery on the following four type regions: genome-wide, intergenic, genic, and promoter regions (Table 1).

Given a motif candidate, we calculated the occurrence frequency using a sliding window on all pairs of DCSRs, and then performed a paired t-test between frequency vectors of the inaccessible chromatin group and the accessible chromatin group (Figure 1C). A simple example is illustrated in Figure 3. We assume that the motif has three bases (6 ~ 10 bases in real application); there are three pairs of DCSRs (7829 pairs in real application); and a DCSR has seven bases (41 bases in real application). The frequency vectors of the inaccessible and accessible chromatin groups are [2, 2, 0] and [1, 0, 0], respectively. Furthermore, the paired t-test would be performed between these two frequency vectors to select significant motifs.

To select the conserved chromatin motifs associated with differential chromatin states, we applied two conditions as follows: (1) either accessible or inaccessible occurrence rate among DCSRs > 1%; and (2) P values < 0.01 by paired t-test, suggesting a significant frequency change between differential chromatin states.

RNA-seq data analysis

We downloaded RNA-seq dataset SRP007417 [14] for GM12878 cells from the NCBI Sequence Read Archive [35]. We utilized Bowtie and TopHat [17, 36] to map short reads to human genome, and then Cufflinks [37] to estimate the isoform expression level with UCSC KnownGene annotation [38].

Protein annotation for chromatin motifs

To discover the potential TFs for chromatin modeling, we used two TF databases, the transcription factor encyclopedia (TFe) [18] and the human protein-DNA interactome (hPDI) database [19, 20]. We downloaded the TF binding site (TFBS) sequences from the TFe database. Since long TFBS sequences might contain several binding sites, we eliminated any of the TFBS sequences with length >30 bases. We aligned sequences of chromatin motifs to TFBS sequences using BLAST alignment with an E-value threshold of 10. With the TFe database, we annotated a chromatin motif using the best E-value of alignment and homology > 80% between the chromatin motif and TFBS sequences. The best alignment is between a chromatin motif and a short TFBS sequence, which indicates that the TF may bind to the chromatin motif. In addition, we used consensus logos from hDPI to provide TF annotation with consensus motifs.

The hDPI database used a protein microarray-based strategy to build human protein-DNA interactome [20]. Xie et al. selected 460 binding motifs and subsequently constructed double-strand DNA probes with lengths of 6 to 34 bases, in which a binding motif may associate with several TFs. The hDPI database has experimental protein-DNA interaction data for humans identified by the protein microarray assays [20]. The hDPI database provided 460 distinct binding motifs and 201 consensus logos for TFs [20]. With the consensus logos, we may have more precise TF annotation but many chromatin-remodeling TFs are not included in consensus logos such as MAX and HDAC1. Thus, we used both binding motifs and 201 consensus logos to annotate chromatin motifs. With the binding motifs, we annotated a chromatin motif using the best E-value of alignment and homology > 80% between the chromatin motif and binding motif sequences. To annotate chromatin motifs with the hDPI consensus logos, we submitted the chromatin motifs to the hPDI web server (http://bioinfo.wilmer.jhu.edu/PDI/) to obtain match scores, and then annotated the chromatin motifs using a match score > 5 between chromatin motifs and consensus logos (Figure 1D). To provide the confidence level of annotation of hDPI consensus logos, the match score of each annotation is shown in Additional file 2. In addition, to provide the accurate annotation, we filtered out the TFs with fragments per kilobase of transcript per million mapped reads (FPKM) < 1 using the RNA-seq data (Figure 1E).

Abbreviations

HME:: Histone-modification enzyme
FAIRE:: Formaldehyde-assisted isolation of regulatory elements
DCSR:: Differential chromatin-state region
NDCSR:: Non-differential chromatin-state region
CMD:: Chromatin motif database
SNP:: Single nucleotide polymorphism
TF:: Transcription factor
TFe:: Transcription factor encyclopedia
hPDI:: human Protein-DNA Interactome
FPKM:: Fragments Per Kilobase of transcript per Million mapped reads.

References

Grewal SI, Jia S: Heterochromatin revisited. Nat Rev Genet. 2007, 8 (1): 35-46. 10.1038/nrg2008.
Article CAS PubMed Google Scholar
Taberlay PC, Kelly TK, Liu CC, You JS, De Carvalho DD, Miranda TB, Zhou XJ, Liang G, Jones PA: Polycomb-repressed genes have permissive enhancers that initiate reprogramming. Cell. 2011, 147 (6): 1283-1294. 10.1016/j.cell.2011.10.040.
Article PubMed Central CAS PubMed Google Scholar
Han H, Cortez CC, Yang X, Nichols PW, Jones PA, Liang G: DNA methylation directly silences genes with non-CpG island promoters and establishes a nucleosome occupied promoter. Hum Mol Genet. 2011, 20 (22): 4299-4310. 10.1093/hmg/ddr356.
Article PubMed Central CAS PubMed Google Scholar
Henikoff S: Nucleosome destabilization in the epigenetic regulation of gene expression. Nat Rev Genet. 2008, 9 (1): 15-26.
Article CAS PubMed Google Scholar
Wallrath LL, Lu Q, Granok H, Elgin SC: Architectural variations of inducible eukaryotic promoters: preset and remodeling chromatin structures. Bioessays. 1994, 16 (3): 165-170. 10.1002/bies.950160306.
Article CAS PubMed Google Scholar
Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD: FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 2007, 17 (6): 877-885. 10.1101/gr.5533506.
Article PubMed Central CAS PubMed Google Scholar
Waki H, Nakamura M, Yamauchi T, Wakabayashi K, Yu J, Hirose-Yotsuya L, Take K, Sun W, Iwabu M, Okada-Iwabu M: Global mapping of cell type-specific open chromatin by FAIRE-seq reveals the regulatory role of the NFI family in adipocyte differentiation. PLoS Genet. 2011, 7 (10): e1002311-10.1371/journal.pgen.1002311.
Article PubMed Central CAS PubMed Google Scholar
Nammo T, Rodriguez-Segui SA, Ferrer J: Mapping open chromatin with formaldehyde-assisted isolation of regulatory elements. Methods Mol Biol. 2011, 791: 287-296. 10.1007/978-1-61779-316-5_21.
Article CAS PubMed Google Scholar
Song LY, Zhang ZC, Grasfeder LL, Boyle AP, Giresi PG, Lee BK, Sheffield NC, Graf S, Huss M, Keefe D: Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Research. 2011, 21 (10): 1757-1767. 10.1101/gr.121541.111.
Article PubMed Central CAS PubMed Google Scholar
Anguita E, Villegas A, Iborra F, Hernandez A: GFI1B controls its own expression binding to multiple sites. Haematol-Hematol J. 2010, 95 (1): 36-46. 10.3324/haematol.2009.012351.
Article CAS Google Scholar
Gaulton KJ, Nammo T, Pasquali L, Simon JM, Giresi PG, Fogarty MP, Panhuis TM, Mieczkowski P, Secchi A, Bosco D: A map of open chromatin in human pancreatic islets. Nat Genet. 2010, 42 (3): 255-259. 10.1038/ng.530.
Article PubMed Central CAS PubMed Google Scholar
Waki H, Nakamura M, Yamauchi T, Wakabayashi K, Yu J, Hirose-Yotsuya L, Take K, Sun W, Iwabu M, Okada-Iwabu M: Global Mapping of Cell Type-Specific Open Chromatin by FAIRE-seq Reveals the Regulatory Role of the NFI Family in Adipocyte Differentiation. Plos Genetics. 2011, 7:
Google Scholar
Song L, Zhang Z, Grasfeder LL, Boyle AP, Giresi PG, Lee BK, Sheffield NC, Graf S, Huss M, Keefe D: Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. Genome Res. 2011, 21 (10): 1757-1767. 10.1101/gr.121541.111.
Article PubMed Central CAS PubMed Google Scholar
Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N: AlleleSeq: analysis of allele-specific expression and binding in a network framework. Molecular Systems Biology. 2011, 7:
Google Scholar
McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A: Heritable individual-specific and allele-specific chromatin signatures in humans. Science. 2010, 328 (5975): 235-239. 10.1126/science.1184655.
Article PubMed Central CAS PubMed Google Scholar
Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12 (6): 996-1006.
Article PubMed Central CAS PubMed Google Scholar
Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25-10.1186/gb-2009-10-3-r25.
Article PubMed Central PubMed Google Scholar
Yusuf D, Butland SL, Swanson MI, Bolotin E, Ticoll A, Cheung WA, Zhang XY, Dickman CT, Fulton DL, Lim JS: The Transcription Factor Encyclopedia. Genome Biol. 2012, 13 (3): R24-10.1186/gb-2012-13-3-r24.
Article PubMed Central PubMed Google Scholar
Hu SH, Xie Z, Onishi A, Yu XP, Jiang LZ, Lin J, Rho HS, Woodard C, Wang H, Jeong JS: Profiling the Human Protein-DNA Interactome Reveals ERK2 as a Transcriptional Repressor of Interferon Signaling. Cell. 2009, 139 (3): 610-622. 10.1016/j.cell.2009.08.037.
Article PubMed Central CAS PubMed Google Scholar
Xie Z, Hu SH, Blackshaw S, Zhu H, Qian J: hPDI: a database of experimental human protein-DNA interactions. Bioinformatics. 2010, 26 (2): 287-289. 10.1093/bioinformatics/btp631.
Article PubMed Central CAS PubMed Google Scholar
Luscher B: MAD1 and its life as a MYC antagonist: An update. Eur J Cell Biol. 2011
Google Scholar
Nair SK, Burley SK: X-ray structures of Myc-Max and Mad-Max recognizing DNA. Molecular bases of regulation by proto-oncogenic transcription factors. Cell. 2003, 112 (2): 193-205.
CAS PubMed Google Scholar
Lee BK, Bhinge AA, Battenhouse A, McDaniell RM, Liu Z, Song L, Ni Y, Birney E, Lieb JD, Furey TS: Cell-type specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res. 2012, 22 (1): 9-24. 10.1101/gr.127597.111.
Article PubMed Central CAS PubMed Google Scholar
Dilworth FJ, Fromental-Ramain C, Yamamoto K, Chambon P: ATP-driven chromatin remodeling activity and histone acetyltransferases act sequentially during transactivation by RAR/RXR In vitro. Mol Cell. 2000, 6 (5): 1049-1058. 10.1016/S1097-2765(00)00103-9.
Article CAS PubMed Google Scholar
Martens JH, Brinkman AB, Simmer F, Francoijs KJ, Nebbioso A, Ferrara F, Altucci L, Stunnenberg HG: PML-RARalpha/RXR Alters the Epigenetic Landscape in Acute Promyelocytic Leukemia. Cancer Cell. 2010, 17 (2): 173-185. 10.1016/j.ccr.2009.12.042.
Article CAS PubMed Google Scholar
Furumatsu T, Tsuda M, Yoshida K, Taniguchi N, Ito T, Hashimoto M, Asahara H: Sox9 and p300 cooperatively regulate chromatin-mediated transcription. J Biol Chem. 2005, 280 (42): 35203-35208. 10.1074/jbc.M502409200.
Article CAS PubMed Google Scholar
Guasconi V, Puri PL: Chromatin: the interface between extrinsic cues and the epigenetic regulation of muscle regeneration. Trends Cell Biol. 2009, 19 (6): 286-294. 10.1016/j.tcb.2009.03.002.
Article PubMed Central CAS PubMed Google Scholar
Guenther MG, Jenner RG, Chevalier B, Nakamura T, Croce CM, Canaani E, Young RA: Global and Hox-specific roles for the MLL1 methyltransferase. Proc Natl Acad Sci U S A. 2005, 102 (24): 8603-8608. 10.1073/pnas.0503072102.
Article PubMed Central CAS PubMed Google Scholar
Huang S, Li X, Yusufzai TM, Qiu Y, Felsenfeld G: USF1 recruits histone modification complexes and is critical for maintenance of a chromatin barrier. Mol Cell Biol. 2007, 27 (22): 7991-8002. 10.1128/MCB.01326-07.
Article PubMed Central CAS PubMed Google Scholar
Kharchenko PV, Tolstorukov MY, Park PJ: Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol. 2008, 26 (12): 1351-1359. 10.1038/nbt.1508.
Article PubMed Central CAS PubMed Google Scholar
Lee J, Hong YK, Jeon GS, Hwang YJ, Kim KY, Seong KH, Jung MK, Picketts DJ, Kowall NW, Cho KS: ATRX induction by mutant huntingtin via Cdx2 modulates heterochromatin condensation and pathology in Huntington's disease. Cell Death Differ. 2012
Google Scholar
Hahn M, Dambacher S, Schotta G: Heterochromatin dysregulation in human diseases. J Appl Physiol. 2010, 109 (1): 232-242. 10.1152/japplphysiol.00053.2010.
Article CAS PubMed Google Scholar
Zhu Q, Pao GM, Huynh AM, Suh H, Tonnu N, Nederlof PM, Gage FH, Verma IM: BRCA1 tumour suppression occurs via heterochromatin-mediated silencing. Nature. 2011, 477 (7363): 179-184. 10.1038/nature10371.
Article PubMed Central CAS PubMed Google Scholar
Baumgart S, Glesel E, Singh G, Chen NM, Reutlinger K, Zhang J, Billadeau DD, Fernandez-Zapico ME, Gress TM, Singh SK: Restricted heterochromatin formation links NFATc2 repressor activity with growth promotion in pancreatic cancer. Gastroenterology. 2012, 388–398 (2): 381-387.
Google Scholar
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008, 36: 13-21.
Article Google Scholar
Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.
Article PubMed Central CAS PubMed Google Scholar
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28 (5): 511-515. 10.1038/nbt.1621.
Article PubMed Central CAS PubMed Google Scholar
Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D: The UCSC Known Genes. Bioinformatics. 2006, 22 (9): 1036-1046. 10.1093/bioinformatics/btl048.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

This work was supported by the National Science Council grants NSC99-2320-B-005-008-MY3 and NSC101-2627-B-005-002 as well as the Ministry of Education, Taiwan, R.O.C. under the ATU plan.

Author information

Authors and Affiliations

Institute of Molecular Biology, National Chung Hsing University, Taiwan, ROC
Chia-Chun Yang, Hsin-Chi Lan & Jeremy JW Chen
Institute of Genomics and Bioinformatics, National Chung Hsing University, Taiwan, ROC
Chia-Chun Yang, Min-Hsuan Chen, Yun-Fan Chen & Chun-Chi Liu
Department of Biochemistry and the Center of Excellence in Bioinformatics and Life Sciences, State University of New York, Buffalo, NY, USA
Michael J Buck
Institute of Biomedical Sciences, National Chung Hsing University, Taiwan, ROC
Jeremy JW Chen & Chun-Chi Liu
Agricultural Biotechnology Center, National Chung Hsing University, Taiwan, ROC
Jeremy JW Chen & Chun-Chi Liu
Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
Chao Cheng
Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
Chao Cheng

Authors

Chia-Chun Yang
View author publications
You can also search for this author in PubMed Google Scholar
Michael J Buck
View author publications
You can also search for this author in PubMed Google Scholar
Min-Hsuan Chen
View author publications
You can also search for this author in PubMed Google Scholar
Yun-Fan Chen
View author publications
You can also search for this author in PubMed Google Scholar
Hsin-Chi Lan
View author publications
You can also search for this author in PubMed Google Scholar
Jeremy JW Chen
View author publications
You can also search for this author in PubMed Google Scholar
Chao Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Chun-Chi Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chun-Chi Liu.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MJB and CCL designed the algorithm. CCY, MJB, MHC and CCL prepared the manuscript. CCY, HCL, CC and JJWC contributed to the literature study. YFC and CCL constructed the web server. All authors read and approved the final manuscript.

Chia-Chun Yang, Michael J Buck contributed equally to this work.

Electronic supplementary material

12864_2012_5003_MOESM1_ESM.xlsx

Additional file 1: Chromatin motif lists with occurrence rates in inaccessible and accessible chromatin groups in genome-wide regions.(XLSX 85 KB)

12864_2012_5003_MOESM2_ESM.xlsx

Additional file 2: Chromatin motif lists with TF annotations in genome-wide, intergenic and genic regions. We selected motifs from either accessible or inaccessible occurrence rate among DCSRs > 1%. (XLSX 52 KB)

Additional file 3: Top 100 DCSR and NDCSR motifs with TF annotation.(XLSX 22 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Yang, CC., Buck, M.J., Chen, MH. et al. Discovering chromatin motifs using FAIRE sequencing and the human diploid genome. BMC Genomics 14, 310 (2013). https://doi.org/10.1186/1471-2164-14-310

Download citation

Received: 24 May 2012
Accepted: 30 April 2013
Published: 08 May 2013
DOI: https://doi.org/10.1186/1471-2164-14-310

Discovering chromatin motifs using FAIRE sequencing and the human diploid genome

Abstract

Background

Results

Conclusions

Background

Results and discussion

Identifying DCSRs

Discovering conserved motifs with significant changes in the occurrence frequency among DCSRs

Annotating chromatin motifs using TF databases

Validation

The most significant motif, CACGTG

Conclusions

Methods

FAIRE-seq dataset and the diploid genome sequence

Discovering conserved chromatin motifs

RNA-seq data analysis

Protein annotation for chromatin motifs

Abbreviations

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ contributions

Electronic supplementary material

12864_2012_5003_MOESM1_ESM.xlsx

12864_2012_5003_MOESM2_ESM.xlsx

Additional file 3: Top 100 DCSR and NDCSR motifs with TF annotation.(XLSX 22 KB)

Authors’ original submitted files for images

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Genomics

Contact us