<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-10-269</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Definition, conservation and epigenetics of housekeeping and tissue-enriched genes</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>She</snm>
               <fnm>Xinwei</fnm>
               <insr iid="I1"/>
               <email>xinwei_she@merck.com</email>
            </au>
            <au id="A2">
               <snm>Rohl</snm>
               <mi>A</mi>
               <fnm>Carol</fnm>
               <insr iid="I1"/>
               <email>carol_rohl@merck.com</email>
            </au>
            <au id="A3">
               <snm>Castle</snm>
               <mi>C</mi>
               <fnm>John</fnm>
               <insr iid="I1"/>
               <email>john_castle@merck.com</email>
            </au>
            <au id="A4">
               <snm>Kulkarni</snm>
               <mi>V</mi>
               <fnm>Amit</fnm>
               <insr iid="I1"/>
               <email>amit_kulkarni@merck.com</email>
            </au>
            <au id="A5">
               <snm>Johnson</snm>
               <mi>M</mi>
               <fnm>Jason</fnm>
               <insr iid="I1"/>
               <email>jason_johnson@merck.com</email>
            </au>
            <au id="A6" ca="yes">
               <snm>Chen</snm>
               <fnm>Ronghua</fnm>
               <insr iid="I1"/>
               <email>ronghua_chen@merck.com</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Rosetta Inpharmatics LLC, a wholly owned subsidiary of Merck and Co., Inc., 401 Terry Avenue North, Seattle, WA 98109, USA</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>1</issue>
         <fpage>269</fpage>
         <url>http://www.biomedcentral.com/1471-2164/10/269</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19534766</pubid>
               <pubid idtype="doi">10.1186/1471-2164-10-269</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>08</day>
               <month>11</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>17</day>
               <month>6</month>
               <year>2009</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>17</day>
               <month>6</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>She et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Housekeeping genes (HKGs) are constitutively expressed in all tissues, while tissue-enriched genes (TEGs) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here, we generate gene expression profiles of 42 normal human tissues on custom high-density microarrays to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes that are highly expressed in all tissues with lower variance than many commonly used HKGs. Cross-species comparison shows that both the functions and expression patterns of HKGs are conserved. TEGs are enriched with respect to both segmental duplication and copy number variation, while no such enrichment is observed for HKGs, suggesting that the high expression of HKGs is not due to high copy numbers. Analysis of genomic and epigenetic features of HKGs and TEGs reveals that the high expression of HKGs across different tissues is associated with decreased nucleosome occupancy at the transcription start site as indicated by enhanced DNase hypersensitivity. Additionally, we systematically and quantitatively demonstrate the enrichment of CpG islands at the transcription start sites (TSSs) of HKGs and their depletion at the TSSs of TEGs. Histone methylation patterns differ significantly between HKGs and TEGs, suggesting that methylation contributes to the differential expression patterns as well.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We have compiled a set of high quality HKGs that should provide higher and more consistent expression when used as references in laboratory experiments than currently used HKGs. The comparison of genomic features between HKGs and TEGs shows that HKGs are more conserved than TEGs in terms of functions, expression pattern and polymorphisms. In addition, our results identify chromatin structure and epigenetic features of HKGs and TEGs that are likely to play an important role in regulating their strikingly different expression patterns.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The expression of most genes varies between different cell and tissue types and between different developmental and physiological states. Some genes, however, are constitutively expressed in all tissues and their expression levels are comparatively constant across different cell types. These genes have been referred to as housekeeping genes (HKGs) and are hypothesized to constitute a small set of genes required to maintain minimum basic cellular function <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. In contrast to the expression pattern of HKGs, tissue-enriched genes (TEGs) are highly expressed in one particular tissue type and are either not expressed or are expressed at much lower levels in other tissues. TEGs are generally responsible for the specialized functions of the particular tissues or cell types in which they are expressed and can therefore serve as biomarkers of specific biological processes or tissues. Since many diseases involve tissue- or organ-specific processes, TEGs may also be good candidate drug targets. HKGs, in contrast, have been widely used as experimental controls and normalization references for gene transcription and expression experiments, including RT-PCR, qPCR, Western blotting and microarray studies. The expression of many of the genes currently used for such purposes, however, varies across different cell types and conditions, and consequently there is a need for a better set of HKGs that have stable, high expression levels across a large number of tissues.</p>
         <p>The genomic organization of HKGs is comparatively compact: intronic regions, coding regions and the intergenic spaces are shorter for HKGs than for other genes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, and HKGs are strongly clustered in the human genome <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, suggesting selection for economy in transcription and translation <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and genomic co-regulation of broadly expressed genes. HKGs, as a result of their critical role in basic cell maintenance, are subject to stronger purifying selection and therefore evolve more slowly than TEGs in terms of sequence mutation <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. It is less well understood to what extent the functions and expression patterns of HKGs are conserved across species, whether HKGs are conserved at the genomic structure level and how polymorphic HKGs and TEGs are among different individuals within a species. To address these questions, we sought to define a high quality set of HKGs and then analyze the conservation of HKGs in terms of functions and expression patterns. We also analyzed the distribution of genomic components that are closely related to conservation, such as segmental duplications, copy number variation regions and ultraconserved elements.</p>
         <p>The regulatory mechanisms underlying the differential expression patterns of HKGs compared to TEGs are also poorly characterized. Chromatin structure and epigenetic modifications of genomic structure have been documented to regulate gene expression and affect replication, recombination and DNA repair <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp> through various mechanisms including nucleosome positioning and occupancy, histone modification (mainly acetylation and methylation) and DNA cytosine methylation <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. Abnormal changes in chromatin structure have been linked to disease, particularly cancer <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Investigation of the differences in chromatin structure and epigenetic modification between HKGs and TEGs, consequently, may provide insight into epigenetic contributions to transcriptional patterns and the mechanisms of gene regulation and disease.</p>
         <p>Here we use microarray gene expression profiling and analysis to compile a set of 1,522 high quality HKGs that are highly expressed in 42 normal tissues and show minimal fluctuations in expression level across these tissues. Similarly, we describe the identification of 975 TEGs. These genes from both categories are potentially useful laboratory experimental controls. The distinct expression patterns of HKGs and TEGs and the high quality of these sets also provide an opportunity to enhance our understanding of transcriptional and epigenetic regulatory mechanisms. We compare and contrast the genomic and epigenetic properties of HKGs and TEGs, and identify epigenetic factors that may contribute to the underlying mechanisms of expression regulation differences between HKGs and TEGs.</p>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <sec>
            <st>
               <p>HKGs and TEGs</p>
            </st>
            <p>We identified 1,522 HKGs from a total of 18,149 genes in 42 normal human tissues monitored on the microarray (see Methods). This list of HKGs was used for analysis of genomic and epigenetic features (see Additional file <supplr sid="S1">1</supplr>). We also identified 975 TEGs from a subset of 29 representative tissues. These TEGs are expressed at a much higher level in a single tissue than in any other tissue (see Methods and Additional file <supplr sid="S2">2</supplr>). TEGs were found in 26 tissues, while no TEGs meeting our criteria were identified in spleen, colon and CD4+ T-cells.</p>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p><b>Housekeeping gene list</b>. List of 1,522 human housekeeping genes.</p>
               </text>
               <file name="1471-2164-10-269-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p><b>Tissue enriched gene list</b>. List of 975 human tissue enriched genes.</p>
               </text>
               <file name="1471-2164-10-269-S2.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>The 20 HKGs with the highest and most consistent expression (see Methods) were selected from this list as the best candidates to serve as reference HKGs in laboratory experiments (Figure <figr fid="F1">1A</figr>). Three of the 20 highest quality HKGs, <it>GAPDH</it>, <it>ACTB </it>and <it>UBC</it>, are commonly used as experimental controls. Expression data for several other commonly used or commercially available HKGs (<abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and <url>http://invitrogen.com</url>) that are not included in the top 20 HKG list (Figure <figr fid="F1">1B</figr>) illustrate that genes commonly used as controls do not necessarily show high expression with low variance across diverse tissues; in fact, the expression of some of these genes varies by more than an order of magnitude across tissues.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Expression of housekeeping genes across normal tissues</p>
               </caption>
               <text>
                  <p><b>Expression of housekeeping genes across normal tissues</b>. (A) Expression levels of the top 20 housekeeping genes, as ranked by average expression intensity, are shown. Three commonly used housekeeping genes, <it>GAPDH</it>, <it>ACTB </it>and <it>UBC</it>, are among this list. (B) Expression patterns for commonly used or commercially available housekeeping genes that do not meet the criteria defined here for housekeeping genes are shown. The length of the horizontal bars represents the expression intensity of the genes on the microarray.</p>
               </text>
               <graphic file="1471-2164-10-269-1"/>
            </fig>
            <p>There have been other recent efforts to identify HKGs in human tissues to serve as experimental references or internal controls <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, but these studies have significant shortcomings. One study surveyed a much smaller set of 7,012 potential candidate genes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, while another used a much smaller set of 15 tissues <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Others were based on microarray data from heterogeneous sources lacking systematic experimental controls <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> or were based on <it>in silico </it>predictions <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The HKGs identified here are based on high quality microarray expression data systematically gathered from a large and diverse set of tissues. The result is a systematically and experimentally defined set of HKGs that have both high expression and low fluctuation across all major organs and tissues.</p>
            <p>The human and mouse transcriptomes in multiple tissues have been surveyed in microarray studies <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> that built foundations for studies on housekeeping and tissue-specific genes <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. Large collections of EST and SAGE data have also been used to identify HKGs <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and tissue-specific genes <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. When the HKGs of this study are compared with those of other studies based on microarray <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> or EST datasets <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, a significant portion of the genes in the three HKG sets overlap, and the HKG list described here has the fewest genes unique to a single study. This suggests that our fluctuation-controlled microarray approach is more conservative than the other methods, which either depend on sampling or representation in an EST dataset <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> or lack control of variation across different tissues <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> (see Additional file <supplr sid="S3">3</supplr>). We also compared our TEGs in testis, prostate, liver and skin with tissue-specific genes from another study <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> (see Additional file <supplr sid="S4">4</supplr>). A significant portion of our TEGs overlap with this human tissue-specific gene study, particularly for liver, in which 70% of the defined TEGs are identical. Other tissues are more discrepant, as a result of different tissue selection for the surveys and different criteria used to identify these genes.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Comparison of housekeeping genes identified in different studies</b>. Venn diagram of housekeeping genes identified in three different studies.</p>
               </text>
               <file name="1471-2164-10-269-S3.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional file 4</p>
               </title>
               <text>
                  <p><b>Comparison of tissue enriched/specific genes identified in different studies</b>. Venn diagram of tissue enriched genes of four tissues identified in two different studies.</p>
               </text>
               <file name="1471-2164-10-269-S4.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>We next investigated the functions of HKGs by testing for enrichment of Panther Biological Process annotations <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. As expected, HKGs are enriched for biological processes related to basic maintenance of the cell, including protein biosynthesis, pre-mRNA processing, cell cycle and intracellular protein trafficking (Figure <figr fid="F2">2</figr>). In contrast, TEGs were observed to be enriched largely for tissue-specific biological processes, as expected (see Additional file <supplr sid="S5">5</supplr>). For instance, TEGs of bone marrow, brain, kidney, liver and skeletal muscle are enriched in immunity and defense, synaptic transmission, ion transport, lipid metabolism and muscle contraction, respectively. These biological processes are presumably less likely to be essential for cell maintenance and survival.</p>
            <suppl id="S5">
               <title>
                  <p>Additional file 5</p>
               </title>
               <text>
                  <p><b>Biological processes of tissue-enriched genes</b>. A list of enriched biological processes of tissue-enriched genes.</p>
               </text>
               <file name="1471-2164-10-269-S5.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Biological processes enriched in housekeeping genes</p>
               </caption>
               <text>
                  <p><b>Biological processes enriched in housekeeping genes</b>. Enriched biological processes (E value &lt; 0.05) and their ancestors from the Panther Ontology are shown in a hierarchical structure. E values were calculated by hypergeometric distribution with a Bonferroni correction. The intensity of color is proportional to the significance of gene enrichment (-log(E value), ranging from the lightest 2.3 to the darkest 39.1).</p>
               </text>
               <graphic file="1471-2164-10-269-2"/>
            </fig>
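            <p>The enrichment test named in the Figure 2 legend can be sketched as follows. This is a minimal illustration, assuming that the E value is the upper-tail hypergeometric probability multiplied by the number of annotation categories tested (a Bonferroni correction) and that all 18,149 genes on the array form the background. The function names, the annotated-gene counts and the number of categories are hypothetical and are not taken from the study.</p>

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """Probability of at least k annotated genes among n genes drawn from N,
    where K of the N background genes carry the annotation."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def bonferroni_e_value(p_value, n_tests):
    """Assumed definition: the p value scaled by the number of categories tested."""
    return p_value * n_tests

# 18,149 genes on the array and 1,522 HKGs come from the text.
# The annotation counts (400 genes in the category, 120 of them HKGs)
# and the 200 tested categories are hypothetical.
p = hypergeom_upper_tail(k=120, K=400, n=1522, N=18149)
e = bonferroni_e_value(p, n_tests=200)
print(f"p = {p:.3g}, E = {e:.3g}")
```

            <p>Under these assumptions, a category would be reported as enriched when its E value falls below 0.05, the threshold given in the legend.</p>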
         </sec>
         <sec>
            <st>
               <p>Conservation of functions and expression patterns in HKGs across species</p>
            </st>
            <p>The enriched biological processes observed for HKGs suggest that a minimum set of functions is required for cells to survive, but it is not obvious to what extent these functions and genes will be conserved across other mammalian and eukaryotic species. To address this question, we used the number of orthologs of human genes in other eukaryotic species as identified by NCBI HomoloGene <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> as an indication of functional conservation across species (Table <tblr tid="T1">1</tblr>). In general, fewer orthologs can be identified as the evolutionary distance between human and the target species increases. Human HKGs, however, are significantly more likely to have orthologs in other species relative to other genes (p = 0.01), while TEGs are less likely to have orthologs (p = 0.065). This difference is particularly striking in invertebrates, where the ratios between the fraction of HKGs and TEGs with orthologs in fly, worm or yeast are 3.9 (54%:14%), 3.8 (46%:12%) and 11.5 (23%:2%), respectively. This analysis suggests that human HKG functions, mostly involving gene expression, are conserved in model organisms such as worm and yeast, while many human TEG-related functions were acquired after the divergence between humans and lower organisms.</p>
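            <p>The ortholog fractions and HKG-to-TEG ratios can be recomputed from the counts in Table 1, as in the sketch below. The counts are copied from the table; the dictionary layout and function names are illustrative only. The ratios quoted in the text appear to be derived from the rounded percentages (for example 23%:2% = 11.5), so working from the raw counts gives somewhat different values.</p>

```python
# Ortholog counts copied from Table 1; the data layout is illustrative.
TOTALS = {"HKG": 1380, "TEG": 843}  # genes represented in HomoloGene
ORTHOLOGS = {
    "Fly": {"HKG": 752, "TEG": 122},
    "Worm": {"HKG": 639, "TEG": 103},
    "Yeast": {"HKG": 324, "TEG": 20},
}

def ortholog_fraction(species, gene_set):
    """Fraction of human genes in gene_set with an ortholog in species."""
    return ORTHOLOGS[species][gene_set] / TOTALS[gene_set]

def hkg_to_teg_ratio(species):
    """Ratio of the HKG ortholog fraction to the TEG ortholog fraction."""
    return ortholog_fraction(species, "HKG") / ortholog_fraction(species, "TEG")

ratios = {species: hkg_to_teg_ratio(species) for species in ORTHOLOGS}
for species, ratio in ratios.items():
    print(f"{species}: {ratio:.1f}")
```

            <p>With unrounded fractions the ratios come out at roughly 3.8 for fly, 3.8 for worm and 9.9 for yeast, which preserves the conclusion that the gap between HKGs and TEGs is widest in yeast.</p>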
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Cross-species conservation of housekeeping genes and tissue enriched genes</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>HKG orthologs</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>TEG orthologs</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Orthologs of all genes on array</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Count</p>
                     </c>
                     <c ca="center">
                        <p>Ratio</p>
                     </c>
                     <c ca="center">
                        <p>Count</p>
                     </c>
                     <c ca="center">
                        <p>Ratio</p>
                     </c>
                     <c ca="center">
                        <p>Count</p>
                     </c>
                     <c ca="center">
                        <p>Ratio</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>1,380</p>
                     </c>
                     <c ca="center">
                        <p>100%</p>
                     </c>
                     <c ca="center">
                        <p>843</p>
                     </c>
                     <c ca="center">
                        <p>100%</p>
                     </c>
                     <c ca="center">
                        <p>17,641</p>
                     </c>
                     <c ca="center">
                        <p>100%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse</p>
                     </c>
                     <c ca="center">
                        <p>1,352</p>
                     </c>
                     <c ca="center">
                        <p>98%</p>
                     </c>
                     <c ca="center">
                        <p>799</p>
                     </c>
                     <c ca="center">
                        <p>95%</p>
                     </c>
                     <c ca="center">
                        <p>16,614</p>
                     </c>
                     <c ca="center">
                        <p>94%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Rat</p>
                     </c>
                     <c ca="center">
                        <p>1,258</p>
                     </c>
                     <c ca="center">
                        <p>91%</p>
                     </c>
                     <c ca="center">
                        <p>763</p>
                     </c>
                     <c ca="center">
                        <p>91%</p>
                     </c>
                     <c ca="center">
                        <p>15,312</p>
                     </c>
                     <c ca="center">
                        <p>87%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dog</p>
                     </c>
                     <c ca="center">
                        <p>1,287</p>
                     </c>
                     <c ca="center">
                        <p>93%</p>
                     </c>
                     <c ca="center">
                        <p>697</p>
                     </c>
                     <c ca="center">
                        <p>83%</p>
                     </c>
                     <c ca="center">
                        <p>15,127</p>
                     </c>
                     <c ca="center">
                        <p>86%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fly</p>
                     </c>
                     <c ca="center">
                        <p>752</p>
                     </c>
                     <c ca="center">
                        <p>54%</p>
                     </c>
                     <c ca="center">
                        <p>122</p>
                     </c>
                     <c ca="center">
                        <p>14%</p>
                     </c>
                     <c ca="center">
                        <p>5,111</p>
                     </c>
                     <c ca="center">
                        <p>29%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Worm</p>
                     </c>
                     <c ca="center">
                        <p>639</p>
                     </c>
                     <c ca="center">
                        <p>46%</p>
                     </c>
                     <c ca="center">
                        <p>103</p>
                     </c>
                     <c ca="center">
                        <p>12%</p>
                     </c>
                     <c ca="center">
                        <p>3,973</p>
                     </c>
                     <c ca="center">
                        <p>23%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Yeast</p>
                     </c>
                     <c ca="center">
                        <p>324</p>
                     </c>
                     <c ca="center">
                        <p>23%</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>2%</p>
                     </c>
                     <c ca="center">
                        <p>1,591</p>
                     </c>
                     <c ca="center">
                        <p>9%</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P value</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>0.010</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>0.065</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>N/A</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The ratio indicates the fraction of human genes with orthologs in other species. Of the total HKGs, TEGs and all genes on the array, 1,380, 843 and 17,641, respectively, are represented in HomoloGene, and these numbers were used to calculate the ratio of orthologs.</p>
               </tblfn>
            </tbl>
            <p>HKGs were identified in this study on the basis of their observed expression in human tissues. To determine if this expression pattern is conserved across species, we examined the expression of human HKG orthologs in gene expression profiling data sets for mouse, rat and dog (unpublished data) (Table <tblr tid="T2">2</tblr>). Relative to orthologs for all genes on the microarray, HKG orthologs are more likely to remain highly expressed with small variations in other mammalian species. The average intensity level and coefficient of variation (CV) of the HKG orthologs are both comparable to their counterparts in human, suggesting that the expression pattern of HKGs is conserved among mammalian species and that human HKGs are likely to be HKGs in other mammals. This finding is also supported by a recent study in which genes expressed universally across tissues were found to be generally more ancient in origin than specifically expressed genes <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
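            <p>The two summary statistics compared across species, average intensity and CV of intensity, can be computed per gene as in the sketch below. It assumes that the CV is the sample standard deviation of a gene's intensities across tissues divided by their mean; the exact definition is not restated here, and the intensity values are hypothetical.</p>

```python
from statistics import mean, stdev

def coefficient_of_variation(intensities):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(intensities) / mean(intensities)

# Hypothetical intensities of two genes across five tissues.
stable_gene = [5.0, 5.2, 4.9, 5.1, 5.3]    # high and steady, HKG-like profile
variable_gene = [0.2, 6.0, 0.5, 0.1, 3.0]  # strongly tissue-dependent profile

for name, values in [("stable", stable_gene), ("variable", variable_gene)]:
    print(f"{name}: mean = {mean(values):.2f}, CV = {coefficient_of_variation(values):.2f}")
```

            <p>A gene with an HKG-like profile combines a high mean with a low CV, which is the pattern the HKG orthologs show relative to all genes on the array.</p>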
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Expression of human housekeeping genes in mammals</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Average Intensity</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>CV of Intensity</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>All Genes in Array</p>
                     </c>
                     <c ca="center">
                        <p>HKG Orthologs</p>
                     </c>
                     <c ca="left">
                        <p>All Genes in Array</p>
                     </c>
                     <c ca="center">
                        <p>HKG Orthologs</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>1.08</p>
                     </c>
                     <c ca="center">
                        <p>5.18</p>
                     </c>
                     <c ca="center">
                        <p>2.27</p>
                     </c>
                     <c ca="center">
                        <p>0.45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mouse</p>
                     </c>
                     <c ca="center">
                        <p>1.18</p>
                     </c>
                     <c ca="center">
                        <p>4.92</p>
                     </c>
                     <c ca="center">
                        <p>3.90</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Rat</p>
                     </c>
                     <c ca="center">
                        <p>1.33</p>
                     </c>
                     <c ca="center">
                        <p>3.94</p>
                     </c>
                     <c ca="center">
                        <p>3.49</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dog</p>
                     </c>
                     <c ca="center">
                        <p>1.62</p>
                     </c>
                     <c ca="center">
                        <p>5.48</p>
                     </c>
                     <c ca="center">
                        <p>3.16</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
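            <p>The two summary statistics reported in Table 2 can be illustrated with a minimal sketch. It assumes that the CV is computed per gene as the sample standard deviation of its intensities across tissues divided by the mean, and that both statistics are then averaged over genes; the exact procedure is not specified in the text, and the function names and toy profiles below are hypothetical.</p>
            <p>
```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the sample standard deviation divided by the mean."""
    return stdev(intensities) / mean(intensities)


def summarize(profiles):
    """Return (average intensity, average CV) over all genes.

    profiles maps a gene name to its list of per-tissue intensities.
    """
    avg_intensity = mean(mean(values) for values in profiles.values())
    avg_cv = mean(coefficient_of_variation(values) for values in profiles.values())
    return avg_intensity, avg_cv


# Hypothetical toy profiles, not data from this study.
toy_profiles = {
    "stable_gene": [5.0, 5.5, 4.5, 5.0],
    "variable_gene": [0.1, 3.0, 0.2, 0.1],
}
avg_intensity, avg_cv = summarize(toy_profiles)
print(f"average intensity = {avg_intensity:.2f}, average CV = {avg_cv:.2f}")
```
            </p>
            <p>In this toy example the stable gene has a much lower CV than the variable gene, which is the property used to distinguish HKG-like from tissue-enriched expression profiles.</p>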
         </sec>
         <sec>
            <st>
               <p>Distribution of segmental duplication, copy number variation sites and ultraconserved elements in HKGs and TEGs</p>
            </st>
            <p>It is estimated that more than 5% of the human genome is composed of segmental duplications (SDs), duplicated genomic blocks ranging from 1 kb to over 200 kb <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. To test whether the high expression of HKGs is related to higher copy numbers in the genome, we calculated the distribution of SDs <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> and copy number variations (CNVs) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> in HKGs and TEGs (see Methods; Table <tblr tid="T3">3</tblr>). To our surprise, SDs are not enriched in HKGs, but are enriched about two-fold in TEGs relative to RefSeq genes. Generally, the copy numbers of genes are positively correlated with gene expression <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. Our results show that the high expression level of HKGs does not rely on the redundancy of gene copies. CNV sites are only slightly enriched in HKGs, but are strongly enriched in TEGs. The increased polymorphism of TEGs is concordant with their weaker selective constraints relative to HKGs, and probably results from the enriched SDs, which are predisposed to non-allelic homologous recombination, chromosomal rearrangement and copy number variation <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Genes with segmental duplication, copy number variation and ultraconserved elements</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Segmental Duplication</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Copy Number Variation</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Ultraconserved Elements</p>
                     </c>
                     <c ca="center">
                        <p>Total number of genes</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Number of genes</p>
                     </c>
                     <c ca="center">
                        <p>Percent of all genes</p>
                     </c>
                     <c ca="center">
                        <p>P-value</p>
                     </c>
                     <c ca="center">
                        <p>Number of genes</p>
                     </c>
                     <c ca="center">
                        <p>Percent of all genes</p>
                     </c>
                     <c ca="center">
                        <p>P-value</p>
                     </c>
                     <c ca="center">
                        <p>Number of genes</p>
                     </c>
                     <c ca="center">
                        <p>Percent of all genes</p>
                     </c>
                     <c ca="center">
                        <p>P-value</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HKGs</p>
                     </c>
                     <c ca="center">
                        <p>88</p>
                     </c>
                     <c ca="center">
                        <p>5.8%</p>
                     </c>
                     <c ca="center">
                        <p>>0.05</p>
                     </c>
                     <c ca="center">
                        <p>218</p>
                     </c>
                     <c ca="center">
                        <p>14.3%</p>
                     </c>
                     <c ca="center">
                        <p>0.03</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>2.3%</p>
                     </c>
                     <c ca="center">
                        <p>0.03</p>
                     </c>
                     <c ca="center">
                        <p>1,522</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TEGs</p>
                     </c>
                     <c ca="center">
                        <p>119</p>
                     </c>
                     <c ca="center">
                        <p>12.2%</p>
                     </c>
                     <c ca="center">
                        <p>2.9E-11</p>
                     </c>
                     <c ca="center">
                        <p>176</p>
                     </c>
                     <c ca="center">
                        <p>18.1%</p>
                     </c>
                     <c ca="center">
                        <p>7.6E-05</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>0.8%</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>975</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RefSeq genes</p>
                     </c>
                     <c ca="center">
                        <p>1,195</p>
                     </c>
                     <c ca="center">
                        <p>6.6%</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>2,528</p>
                     </c>
                     <c ca="center">
                        <p>13.9%</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>156</p>
                     </c>
                     <c ca="center">
                        <p>0.9%</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>18,149</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>We also calculated the distribution of ultraconserved elements (UCEs), sequences that are absolutely conserved (100% identical) between orthologous regions of the human, rat, and mouse genomes <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Consistent with the slower evolution of HKGs, UCEs are significantly enriched in HKGs and show no significant change in TEGs (Table <tblr tid="T3">3</tblr>). HKGs have been found to evolve more slowly than TEGs at the level of sequence point mutations <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The distribution of SDs, CNVs and UCEs demonstrates that HKGs are also more conserved than TEGs with respect to genomic structural changes.</p>
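            <p>As an illustration of how an enrichment P-value of the kind reported in Table 3 might be obtained, the following minimal sketch computes a one-sided hypergeometric tail probability from the segmental duplication counts in the table. This is an assumption made for illustration only: the statistical test actually used is described in Methods and may differ, and the sketch further assumes that the HKG and TEG sets are drawn from the 18,149 RefSeq genes. The function names are hypothetical.</p>
            <p>
```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(total, total_hits, sample, sample_hits):
    """P(X >= sample_hits) when drawing `sample` genes without replacement
    from `total` genes of which `total_hits` carry the feature."""
    log_denominator = log_comb(total, sample)
    upper = min(sample, total_hits)
    probability = 0.0
    for k in range(sample_hits, upper + 1):
        log_term = (log_comb(total_hits, k)
                    + log_comb(total - total_hits, sample - k)
                    - log_denominator)
        probability += exp(log_term)
    return min(probability, 1.0)


# Segmental duplication counts taken from Table 3.
refseq_total, refseq_sd = 18149, 1195
p_teg = hypergeom_upper_tail(refseq_total, refseq_sd, 975, 119)
p_hkg = hypergeom_upper_tail(refseq_total, refseq_sd, 1522, 88)
print(f"TEG enrichment p = {p_teg:.2e}; HKG enrichment p = {p_hkg:.2f}")
```
            </p>
            <p>Working in log space avoids overflow in the large binomial coefficients. The values produced by this sketch need not match the P-values in Table 3 exactly, since the test and background set used there may differ.</p>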
         </sec>
         <sec>
            <st>
               <p>Enriched CpG islands at HKG transcription start sites</p>
            </st>
            <p>It has long been known that HKGs are associated with CpG islands <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Recently it was found that the lack of CpG islands around the transcription start site is associated with a higher degree of tissue specificity <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. These studies, however, lacked a systematic representation of HKGs or TEGs, either testing only a limited number of experimentally confirmed HKGs or relying on the redundancy of ESTs as an indirect indicator of HKGs. We compared the CpG island distribution (data from <url>http://genome.ucsc.edu</url>) in the 500 bp around transcription start sites (TSS) and end sites in HKGs and TEGs using the systematic HKG and TEG gene lists assembled here (Table <tblr tid="T4">4</tblr>). Both the fraction of genes containing CpG islands and the density of CpG islands at the TSS are correlated with the expression patterns, with TEGs being depleted for CpG islands and HKGs showing enrichment of CpG islands relative to RefSeq genes in general. A recent EST-based study showed that HKGs primarily use CpG-dependent core promoters <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Our report, however, provides a systematic and quantitative demonstration of enrichment of CpG islands at HKG TSS and depletion at TEG TSS. We also observed that at transcription end sites, the occurrence of CpG islands decreases significantly for all groups of genes, and both HKGs and TEGs show a slight depletion of CpG islands (Table <tblr tid="T4">4</tblr>). Although the sequence of HKGs is generally more conserved than that of other genes, another analysis showed that HKGs tend to have reduced upstream sequence conservation, particularly within CpG-rich genes <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Enrichment of CpG islands at the TSS of HKGs may play a role in this reduced upstream conservation.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>CpG islands at transcription start and end sites</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Transcription start sites</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Transcription end sites</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Number of genes with CpG islands (ratio)</p>
                     </c>
                     <c ca="center">
                        <p>CpG density (/bp)</p>
                     </c>
                     <c ca="center">
                        <p>P-value</p>
                     </c>
                     <c ca="center">
                        <p>Number of genes with CpG islands (ratio)</p>
                     </c>
                     <c ca="center">
                        <p>CpG density (/bp)</p>
                     </c>
                     <c ca="center">
                        <p>P-value</p>
                     </c>
                     <c ca="center">
                        <p>Total number of genes in class</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HKGs</p>
                     </c>
                     <c ca="center">
                        <p>1,230 (80.8%)</p>
                     </c>
                     <c ca="center">
                        <p>0.563</p>
                     </c>
                     <c ca="center">
                        <p>3.5E-40</p>
                     </c>
                     <c ca="center">
                        <p>84 (5.5%)</p>
                     </c>
                     <c ca="center">
                        <p>0.02</p>
                     </c>
                     <c ca="center">
                        <p>3.0E-14</p>
                     </c>
                     <c ca="center">
                        <p>1,522</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TEGs</p>
                     </c>
                     <c ca="center">
                        <p>279 (28.6%)</p>
                     </c>
                     <c ca="center">
                        <p>0.155</p>
                     </c>
                     <c ca="center">
                        <p>3.4E-40</p>
                     </c>
                     <c ca="center">
                        <p>49 (5.0%)</p>
                     </c>
                     <c ca="center">
                        <p>0.02</p>
                     </c>
                     <c ca="center">
                        <p>6.6E-13</p>
                     </c>
                     <c ca="center">
                        <p>975</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RefSeq genes</p>
                     </c>
                     <c ca="center">
                        <p>8,881 (48.9%)</p>
                     </c>
                     <c ca="center">
                        <p>0.321</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>2,086 (11.5%)</p>
                     </c>
                     <c ca="center">
                        <p>0.053</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>18,149</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
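            <p>The gene counts in Table 4 correspond to asking, for each gene, whether a CpG island lies near its TSS. A minimal sketch of such a count is given below. It assumes a window of 500 bp on either side of the TSS and simple interval overlap; the exact window definition and the layout of the UCSC annotation are not specified here, so the data structures, function names and coordinates are hypothetical.</p>
            <p>
```python
def overlaps_tss(tss, islands, flank=500):
    """Return True if any (start, end) island overlaps [tss - flank, tss + flank]."""
    return any(tss + flank >= start and end >= tss - flank
               for start, end in islands)


def fraction_with_island(tss_by_gene, islands_by_chrom, flank=500):
    """Fraction of genes whose TSS window overlaps at least one CpG island.

    tss_by_gene maps gene -> (chromosome, TSS position);
    islands_by_chrom maps chromosome -> list of (start, end) intervals.
    """
    hits = sum(
        overlaps_tss(pos, islands_by_chrom.get(chrom, []), flank)
        for chrom, pos in tss_by_gene.values()
    )
    return hits / len(tss_by_gene)


# Hypothetical toy coordinates, not data from this study.
islands = {"chr1": [(900, 1400), (20000, 20600)]}
genes = {"geneA": ("chr1", 1000), "geneB": ("chr1", 5000), "geneC": ("chr2", 100)}
print(fraction_with_island(genes, islands))
```
            </p>
            <p>The same function applied to transcription end positions instead of TSS positions would give the corresponding end-site fraction.</p>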
         </sec>
         <sec>
            <st>
               <p>Chromatin structure and epigenetic modifications in HKGs</p>
            </st>
            <p>We next examined differences in chromatin structure and epigenetic modifications, including nucleosome occupancy, histone modifications, and DNA methylation, between HKGs and TEGs as possible mechanisms contributing to the differential expression patterns of these two groups of genes.</p>
            <p>DNase I hypersensitive (HS) sites, formed by nucleosome-free chromatin regions, have been used to identify the locations of many different types of regulatory regions, including enhancers, silencers, promoters, insulators, and locus control regions <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. Figure <figr fid="F3">3A</figr> shows the distribution of HS sites in CD4+ T cells <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> at the TSS for HKGs, TEGs and RefSeq genes. The density of HS sites peaks at the TSS, as expected, and rapidly drops to background levels beyond 1 kb from the TSS. HKGs show an elevated density of HS sites relative to RefSeq genes, indicating that the TSS of HKGs are less likely to be packaged into nucleosomes and are more exposed to transcription and regulatory factors. HS site density in TEGs is low, consistent with the low levels of expression of these genes in CD4+ T cells, the cell type from which the HS site data were generated.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>DNase I hypersensitive (HS) site enrichment at transcription start sites of housekeeping genes</p>
               </caption>
               <text>
                  <p><b>DNase I hypersensitive (HS) site enrichment at transcription start sites of housekeeping genes</b>. (A) The average density of HS sites detected in CD4+ T cells <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> is shown for each 500 bp sliding window advancing 100 bp each time within 4 kb of the transcription start site for three gene groups: HKGs, all RefSeq genes, and TEGs. (B) RefSeq genes and HKGs are further partitioned into subgroups based on their expression level (probe intensity) in the CD4+ T cell profiling microarray: RefSeq-low (intensity &lt; 1), RefSeq-high (intensity > 1), HKG-low (1 &lt; intensity &lt; 3), HKG-intermediate (3 &lt; intensity &lt; 6), HKG-high (intensity > 6). The surge in HS site density around 1,500 bp in the HKG-high group is likely an artifact of the small sample size in this group.</p>
               </text>
               <graphic file="1471-2164-10-269-3"/>
            </fig>
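            <p>The sliding-window profile described in the legend of Figure 3 (500 bp windows advancing 100 bp within 4 kb of the TSS) can be sketched as follows. This is a minimal illustration under stated assumptions: HS sites are treated as single positions relative to the TSS, pooled over all genes in a group, and density is expressed as sites per gene per bp. The normalization used for the figure is not given in the text, and the function name and toy offsets are hypothetical.</p>
            <p>
```python
from bisect import bisect_left


def sliding_window_density(site_offsets, n_genes, span=4000, window=500, step=100):
    """Return a list of (window_center, density) pairs.

    site_offsets: positions of HS sites relative to the TSS (TSS = 0),
    pooled over all genes in a group. Density is sites per gene per bp.
    """
    offsets = sorted(site_offsets)
    result = []
    for start in range(-span, span - window + 1, step):
        end = start + window
        # Count offsets in the half-open interval [start, end).
        count = bisect_left(offsets, end) - bisect_left(offsets, start)
        result.append((start + window // 2, count / (n_genes * window)))
    return result


# Hypothetical toy offsets, not data from this study.
toy_offsets = [-50, -10, 0, 20, 120, 900, -1500]
profile = sliding_window_density(toy_offsets, n_genes=3)
peak_center, peak_density = max(profile, key=lambda item: item[1])
print(peak_center, peak_density)
```
            </p>
            <p>Because consecutive windows overlap by 400 bp, neighbouring points in such a profile are not independent, which smooths the curve but can also spread a few sites in a small gene group into an apparent local surge.</p>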
            <p>The HKGs identified in this study are highly expressed in CD4+ T cells (and all other tissues), raising the possibility that the differences in HS site density observed between HKGs and TEGs reflect only the overall expression level of these genes in CD4+ T cells, rather than the difference in expression patterns across tissues. To address this question, we partitioned both the HKGs and RefSeq genes into subgroups based on their expression level in CD4+ T cells: low, intermediate and high (Figure <figr fid="F3">3B</figr>). While a correlation between HS site density and expression level is still observed across the subgroups for either HKGs or RefSeq genes, the HKG-low expression subgroup (average expression intensity: 2.55) has a higher HS site density than the RefSeq-high expression subgroup (average intensity: 3.48), demonstrating that HS site density is not simply a function of gene expression level in CD4+ T cells, but also correlates with high levels of expression across different tissues. A recent study showed a positive association of CpG density with the distribution of HS sites across different tissues <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, suggesting that the increase in HS sites in HKGs may be related to high CpG density. Another possible explanation for this observation is that HKGs may contain sequence elements at their TSS that inhibit the formation of nucleosomes, leading to high promoter accessibility and higher expression levels of these genes across different tissues. Further investigation of TSS sequences and more HS site mapping in other tissues would be necessary to test this hypothesis.</p>
            <p>Histone acetylation regulates gene expression by neutralizing the positive charge of histone tails and weakening their contact with negatively charged DNA, thereby allowing transcription factors access to promoters in the chromatin <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Using histone acetylation data for chromosomes 21 and 22 derived from liver Hep G2 cells <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, we show that the acetylation ratio of genes peaks at the TSS (Figure <figr fid="F4">4</figr>). The histone acetylation density observed for HKGs is higher than that for RefSeq genes on average, while the density observed for TEGs, which are not expressed in liver Hep G2 cells, is almost at the background level (Figure <figr fid="F4">4A</figr>). This observation is consistent with recent genome-wide studies in which histone acetylation is positively correlated with transcription factor binding or gene expression <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. To study the correlation of gene expression with histone acetylation, HKGs and RefSeq genes were partitioned into subgroups based on their expression level in Hep G2 cells. Of all RefSeq genes, 74% have low expression (intensity &lt; 1) in Hep G2 cells, and the higher level of histone acetylation for HKGs relative to these genes is maintained. The difference in histone acetylation between RefSeq genes and HKGs with medium (1 &lt; intensity &lt; 3) and high (intensity > 3) expression levels, however, virtually disappears. It appears that histone acetylation density depends only on gene expression level, indicating that histone acetylation at the TSS does not contribute to the expression pattern of HKGs across tissues (Figure <figr fid="F4">4B</figr>). Interestingly, while histone acetylation density is low for genes with low expression levels, it is essentially identical for all subgroups with expression intensity > 1. This threshold-like relationship suggests that histone acetylation at the TSS may be a necessary condition for gene expression and may serve as a "transcriptional switch" that opens the chromatin structure and allows other transcription factors to regulate gene expression. Significant differences between HKGs and TEGs in both nucleosome occupancy and histone acetylation suggest that the regulation of gene expression for these different groups is affected by multiple epigenetic factors.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Histone acetylation enrichment at transcription start sites of housekeeping genes</p>
               </caption>
               <text>
                  <p><b>Histone acetylation enrichment at transcription start sites of housekeeping genes</b>. (A) Each dot represents the percentage of sites with histone acetylation <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> in a 500 bp sliding window advancing 100 bp each time within 5 kb of the transcription start site. (B) HKGs and RefSeq genes from panel A are further partitioned into subgroups based on their expression level (probe intensity) in Hep G2 cells: RefSeq-low (intensity &lt; 1), RefSeq-intermediate (1 &lt; intensity &lt; 3), RefSeq-high (intensity > 3), HKG-low (1 &lt; intensity &lt; 3), HKG-intermediate (3 &lt; intensity &lt; 6), HKG-high (intensity > 6).</p>
               </text>
               <graphic file="1471-2164-10-269-4"/>
            </fig>
            <p>We also compared transcription factor binding and histone methylation density between HKGs and TEGs using data collected in CD4+ T cells <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> (Figure <figr fid="F5">5</figr>). As expected, binding of the transcription factor Pol II and the regulatory factor CTCF peaks at the exact position of the TSS. CTCF is a zinc finger protein with diverse regulatory functions, including a role in mediating chromatin interactions to form the three-dimensional structure of the genome <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Recent studies on HKG clusters around the &#945;-Globin and &#946;-Globin loci suggest that a chromatin loop/hub forms during the transcription of the gene clusters <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. Our results suggest that CTCF plays a role in HKG transcription. Transcription factor binding is highest for HKGs among all gene groups, while transcription factor binding of TEGs, which are not expressed in CD4+ T cells, is near background levels (Figure <figr fid="F5">5A, B</figr>). The patterns observed for different histone methylation sites of HKGs and TEGs are complex, likely reflecting the complex relationship between histone methylation and transcription (Figure <figr fid="F5">5C&#8211;I</figr>). Differences in the shape of the distribution are observed between HKGs, TEGs and RefSeq genes for some histone methylations. When similar distribution shapes are observed, histone methylations may be either positively or negatively correlated with the expression level. These features suggest that histone methylation likely contributes to the differential gene expression patterns of these genes in a complex fashion.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Density of histone methylation and transcription factor binding sites at transcription start sites for HKGs, TEGs, and RefSeq genes</p>
               </caption>
               <text>
                  <p><b>Density of histone methylation and transcription factor binding sites at transcription start sites for HKGs, TEGs, and RefSeq genes</b>. The density of transcription factor binding or histone methylation of different sites in CD4+ T cells <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> is shown for a 500 bp sliding window advancing 100 bp each time near transcription start sites for HKGs, RefSeq genes, TEGs and their subgroups, based on expression levels. Gene subgroups are as defined in Figure 3. The densities of transcription factor binding or histone methylation of genes are displayed as follows: (A) Pol II, (B) CTCF, (C) H3K4me2, (D) H3K4me3, (E) H2BK5me1, (F) H4K20me1, (G) H3K27me2, (H) H3K27me3, (I) H3K27me1.</p>
               </text>
               <graphic file="1471-2164-10-269-5"/>
            </fig>
            <p>DNA methylation in mammals occurs on cytosine residues of CpG dinucleotides and may lead to the formation of heterochromatin, imprinting and transcriptional repression <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. The distribution of genome-wide DNA methylation <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> in HKGs, RefSeq genes and TEGs (see Additional file <supplr sid="S6">6</supplr>) shows that DNA methylation peaks at the TSS for all gene groups and that there is no significant difference in DNA methylation levels among HKGs, TEGs and RefSeq genes in either sperm cells or fibroblast cells. Additionally, comparison of the list of methylated genes from another recent study <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> with our HKGs and TEGs did not yield any significant overlap (data not shown). Based on these two pieces of evidence, HKGs do not appear to be enriched for DNA methylation, despite their enrichment for CpG islands. This observation is consistent with previous reports that CpG islands in normal tissues are protected from methylation and that methylation of CpG islands is one of the mechanisms of tumorigenesis <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr></abbrgrp>.</p>
            <suppl id="S6">
               <title>
                  <p>Additional file 6</p>
               </title>
               <text>
                  <p><b>DNA methylation at transcription start sites</b>. Genome-wide DNA methylation level around transcription start sites in sperm and fibroblast cell lines.</p>
               </text>
               <file name="1471-2164-10-269-S6.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Using high-quality microarray gene expression profiling data, we identified a small subset of housekeeping genes that are highly expressed in 42 diverse normal tissues with small variation in expression level across these tissues. Cross-species studies indicate that the functions and expression patterns of these HKGs are conserved between different species. These features make these genes better candidates for experimental references of transcription and expression levels than the housekeeping genes in common use today: they can be easily detected, are stable across different tissues and are likely to be HKGs in other species. To investigate the mechanisms behind transcriptional regulation of HKGs and TEGs, we compared genomic features, chromatin structure, and epigenetic modifications between a larger set of HKGs and TEGs. We find that CpG islands are enriched near the TSSs of HKGs, in line with previous studies. HKGs have lower nucleosome occupancy, as indicated by strong enrichment of DNase I hypersensitive sites in HKGs that cannot be fully explained by the high expression level of HKGs in a single tissue type (CD4+ T cells). HKGs are enriched for DNase I hypersensitive sites relative to RefSeq genes of comparable or higher expression levels. HKGs and TEGs show significant differences in various histone methylation patterns, suggesting that histone methylation likely plays a role in the differential expression patterns, but the relationship between histone methylation patterns and expression patterns is complex. DNA methylation patterns, in contrast, are similar for both HKGs and TEGs, suggesting that DNA methylation does not play a significant role in the differential expression patterns of these different types of genes. Elevated histone acetylation is not seen for HKGs after the correlation with expression is accounted for. Interestingly, however, histone acetylation appears to be elevated in all genes with moderate to high expression levels, suggesting that histone acetylation may serve as a general transcriptional switch to open chromatin and provide access to other transcription factors, which then regulate the extent of expression.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Tissues</p>
            </st>
            <p>mRNA from human tissues was purchased from commercial vendors, including Clontech, Ambion, and Biochain. Most samples were pooled from multiple donors, typically twelve. Forty-two normal tissues were tested: adipose, adrenal gland, bladder, activated CD4-positive T-lymphocyte, activated CD8-positive T-lymphocyte, bone marrow, brain, fetal brain, cerebellum, cerebral cortex, hippocampus, thalamus, pituitary gland, cervix uteri, colon, epididymis, heart, kidney, fetal kidney, liver, fetal liver, lung, fetal lung, trachea, lymph node, mammary gland, skeletal muscle, ovary, placenta, prostate, retina, salivary gland, skin, duodenum, ileum, jejunum, spinal cord, spleen, stomach, testis, thymus, and thyroid gland. These tissues cover most major organs and normal tissue types. Four fetal tissues (brain, kidney, liver and lung) were included.</p>
         </sec>
         <sec>
            <st>
               <p>Microarray expression profiling</p>
            </st>
            <p>Human tissue microarray expression profiling was performed as described previously <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>. In brief, purchased mRNA pooled from multiple normal individuals was amplified and labeled using a full-length amplification protocol and hybridized in duplicate against a common reference pool in a two-color dye swap experiment <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Each gene is represented by 3 microarray probes placed at exon-exon junctions or in exons. Gene expression was calculated as the median probe intensity, after normalization by the pool of all data. The dataset is available at the National Center for Biotechnology Information's Gene Expression Omnibus database [GEO accession: GSE16546].</p>
         </sec>
         <sec>
            <st>
               <p>Selection of HKGs and TEGs</p>
            </st>
            <p>We used fairly conservative criteria to identify HKGs: the intensity of the gene must be greater than the median intensity of all genes in the microarray in at least 41 out of 42 tissues, and the coefficient of variation (CV, standard deviation/average) of the gene intensity across tissues must be less than 1. The intensity and CV of the 18,149 genes monitored in the microarray are distributed over a wide range, with an average intensity of 1.04 &#177; 1.94 (SD) and an average CV of 0.83 &#177; 0.77 (SD) over all genes. A recent study shows that a gene's breadth of expression across tissues is positively correlated with its expression level <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. It is therefore reasonable to select HKGs from among genes with higher intensity. While the CVs of most genes (76% of all genes) are below 1, the expression of some genes is very volatile across tissues, with CVs as high as 6. Our criteria ensure that the HKGs are highly expressed in the vast majority of tissues, with limited fluctuation in intensity across tissues.</p>
            <p>More stringent criteria were used to identify a reference HKG list for laboratory experimental controls. We required that the intensity of each HKG be greater than the median of all genes in each of the 42 tissues and that its CV of intensity be less than 0.35. A total of 362 HKGs meet these criteria. The top 20 genes ranked by their average intensity across all 42 tissues were selected as the experimental housekeeping gene reference set.</p>
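            <p>The two sets of HKG criteria described above can be sketched in code as follows. This is a minimal illustration, not the code used in the study. The function name select_hkgs and its parameters are hypothetical, and two details the text leaves open are treated as assumptions: the median is taken over all genes within each tissue, and the standard deviation is the sample standard deviation.</p>

```python
import numpy as np


def select_hkgs(intensity, min_tissues, max_cv):
    """Return a boolean mask over genes that pass the HKG criteria.

    intensity: array of shape (genes, tissues) with one value per gene and tissue.
    min_tissues: minimum number of tissues in which the gene must exceed the median.
    max_cv: upper bound (exclusive) on the coefficient of variation across tissues.
    """
    intensity = np.asarray(intensity, dtype=float)
    # Assumption: the median of all genes is computed separately for each tissue.
    tissue_median = np.median(intensity, axis=0)
    # Count the tissues in which each gene is above that tissue's median.
    n_above = (intensity > tissue_median).sum(axis=1)
    # CV = standard deviation / average (sample standard deviation assumed).
    cv = intensity.std(axis=1, ddof=1) / intensity.mean(axis=1)
    return np.logical_and(n_above >= min_tissues, max_cv > cv)
```

            <p>Under these assumptions, the conservative criteria correspond to select_hkgs(intensity, min_tissues=41, max_cv=1.0) and the stringent reference criteria to select_hkgs(intensity, min_tissues=42, max_cv=0.35) for a matrix with 42 tissue columns.</p>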
            <p>To identify TEGs, we selected 29 representative tissue types, removing fetal and redundant tissues from the set of 42 tissues described above. The resulting set was as follows: adipose, adrenal gland, bladder, bone marrow, brain, cervix uteri, colon, heart, kidney, liver, lung, trachea, mammary gland, ovary, skeletal muscle, lymph node, placenta, prostate, retina, salivary gland, skin, spinal cord, spleen, stomach, testis, thymus, thyroid gland, jejunum, and CD4-positive T-lymphocyte. To be identified as a TEG, the intensity of the gene in the relevant tissue was required to meet three criteria: 1) it is within the top 25% of all genes in that particular tissue; 2) it is greater than 50% of the sum of intensities for that gene in all other tissues in the set of 29; and 3) it is greater than three times the intensity of the gene in any other tissue.</p>
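            <p>The three TEG criteria can likewise be written as a short test on one gene in one tissue. The sketch below is illustrative only: the function name is_teg is hypothetical, the top 25% is taken to mean at or above the 75th percentile of all genes in that tissue, and the matrix is assumed to contain only the 29 selected tissues.</p>

```python
import numpy as np


def is_teg(intensity, gene, tissue):
    """Check the three TEG criteria for one gene (row) in one tissue (column)."""
    intensity = np.asarray(intensity, dtype=float)
    value = intensity[gene, tissue]
    # Intensities of the same gene in every other tissue.
    others = np.delete(intensity[gene], tissue)
    # 1) Within the top 25% of all genes in this tissue (assumed: 75th percentile).
    in_top_quarter = value >= np.percentile(intensity[:, tissue], 75)
    # 2) Greater than 50% of the summed intensity in all other tissues.
    beats_half_sum = value > 0.5 * others.sum()
    # 3) Greater than three times the intensity in any other tissue.
    beats_each_tissue = value > 3.0 * others.max()
    return bool(in_top_quarter and beats_half_sum and beats_each_tissue)
```

            <p>Applying is_teg to every gene and tissue pair yields the TEG set under these assumptions. Criterion 3 guarantees that a gene can qualify in at most one tissue.</p>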
         </sec>
         <sec>
            <st>
               <p>Conservation of functions</p>
            </st>
            <p>We used the number of orthologs of human genes in other eukaryotic species, as identified by NCBI HomoloGene <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, as an indication of functional conservation across species. We mapped human HKGs, TEGs and all genes represented in the microarray to orthologs in mouse, rat, dog, fly (<it>D. melanogaster</it>), worm (<it>C. elegans</it>) and budding yeast (<it>S. cerevisiae</it>). The numbers of human genes that map to genes of other species through HomoloGene were counted. Student's t-tests were applied between orthologs of HKGs and all genes and between orthologs of TEGs and all genes.</p>
         </sec>
         <sec>
            <st>
               <p>Distribution of SD and CNV in genes</p>
            </st>
            <p>For a gene to be counted as overlapping an SD or CNV region, we required at least a quarter of the total genomic length of the gene to overlap that region (Table <tblr tid="T3">3</tblr>). The p-values, indicating the statistical significance of the overlap for HKGs and TEGs relative to all RefSeq genes, were calculated according to the hypergeometric distribution with a Bonferroni correction.</p>
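            <p>As an illustration of the statistical test named above, the following sketch computes a one-sided hypergeometric p-value for enrichment and applies a Bonferroni correction. The function and parameter names are hypothetical, only the upper (enrichment) tail is shown, and the number of tests used for the correction is left as an input because the text does not state it.</p>

```python
from math import comb


def hypergeom_enrichment_p(total, total_hits, sample, sample_hits):
    """P(X >= sample_hits) when drawing `sample` genes without replacement.

    total: number of background genes (for example, all RefSeq genes).
    total_hits: background genes that overlap the feature.
    sample: number of genes in the tested group (for example, HKGs).
    sample_hits: genes in the tested group that overlap the feature.
    """
    upper = min(sample, total_hits)
    # Sum the hypergeometric probabilities from the observed count upwards.
    tail = sum(
        comb(total_hits, i) * comb(total - total_hits, sample - i)
        for i in range(sample_hits, upper + 1)
    )
    return tail / comb(total, sample)


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests and cap at 1."""
    return min(1.0, p_value * n_tests)
```

            <p>A test for depletion would sum the lower tail instead, from zero up to the observed count.</p>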
         </sec>
         <sec>
            <st>
               <p>CpG islands</p>
            </st>
            <p>CpG island coordinates were obtained from the human CpG island track of the UCSC Genome Browser <url>http://genome.ucsc.edu</url>. The number and length of CpG islands located within 500 bp upstream and downstream of transcription start sites and end sites were calculated for HKGs, TEGs and RefSeq genes. CpG density is indicated by the fraction of base pairs occupied by CpG islands. The hypergeometric distribution with Bonferroni correction was applied to determine the significance of the enrichment or depletion of CpG islands relative to the density seen for RefSeq genes.</p>
         </sec>
         <sec>
            <st>
               <p>Chromatin structure and epigenetic modifications</p>
            </st>
            <p>Data on DNase I hypersensitive (HS) sites, histone acetylation, histone methylation, transcription factor binding sites and DNA methylation were obtained from recent publications <abbrgrp><abbr bid="B35">35</abbr><abbr bid="B39">39</abbr><abbr bid="B43">43</abbr><abbr bid="B48">48</abbr></abbrgrp>. The density of each feature was calculated near transcription start sites in a 500 bp sliding window advancing 100 bp at a time for HKGs, RefSeq genes and TEGs. The average intensity of all genes in each group is plotted as a function of the distance to the transcription start site.</p>
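            <p>The sliding-window calculation can be sketched for a single gene as follows. This is an assumption-laden illustration rather than the procedure used in the study: the function name window_density is hypothetical, density is taken to be the fraction of bases in the window covered by the feature, the flank size of 2000 bp is an arbitrary default, features are given as half-open genomic intervals, and strand is ignored.</p>

```python
import numpy as np


def window_density(feature_intervals, tss, flank=2000, window=500, step=100):
    """Feature density in sliding windows around one transcription start site.

    Returns (centers, density): window centers relative to the TSS in bp and
    the fraction of each window covered by the feature.
    """
    region_start = tss - flank
    region_len = 2 * flank
    # Per-base coverage flag over the region around the TSS.
    covered = np.zeros(region_len, dtype=bool)
    for begin, end in feature_intervals:
        # Clip each interval to the region and convert to array offsets.
        lo = max(begin, region_start) - region_start
        hi = min(end, region_start + region_len) - region_start
        if hi > lo:
            covered[lo:hi] = True
    # Window start offsets: a `window`-bp window advancing `step` bp at a time.
    offsets = np.arange(0, region_len - window + 1, step)
    centers = offsets + window // 2 - flank
    density = np.array([covered[o:o + window].mean() for o in offsets])
    return centers, density
```

            <p>Averaging the density arrays over all genes in a group (HKGs, TEGs or RefSeq genes) gives one profile per group as a function of distance to the transcription start site.</p>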
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>XS carried out the analysis. XS and RC designed the analysis. CR and JJ participated in discussions and provided valuable suggestions. JC and AK carried out microarray probe design and intensity analysis. JJ conceived of the study. XS, CR and RC wrote the manuscript. All authors helped to draft the manuscript and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Rosetta's Gene Expression Laboratory for performing the microarray experiments.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Further defining housekeeping, or "maintenance," genes. Focus on "A compendium of gene expression in normal human tissues"</p>
            </title>
            <aug>
               <au>
                  <snm>Butte</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Dzau</snm>
                  <fnm>VJ</fnm>
               </au>
               <au>
                  <snm>Glueck</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <issue>2</issue>
            <fpage>95</fpage>
            <lpage>96</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11773595</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Human housekeeping genes are compact</p>
            </title>
            <aug>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Levanon</snm>
                  <fnm>EY</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>7</issue>
            <fpage>362</fpage>
            <lpage>365</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(03)00140-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12850439</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Compactness of human housekeeping genes: selection for economy or genomic design?</p>
            </title>
            <aug>
               <au>
                  <snm>Vinogradov</snm>
                  <fnm>AE</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>5</issue>
            <fpage>248</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2004.03.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15109779</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Clustering of housekeeping genes provides a unified model of gene order in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Lercher</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Urrutia</snm>
                  <fnm>AO</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2002</pubdate>
            <volume>31</volume>
            <issue>2</issue>
            <fpage>180</fpage>
            <lpage>183</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng887</pubid>
                  <pubid idtype="pmpid" link="fulltext">11992122</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Mammalian housekeeping genes evolve more slowly than tissue-specific genes</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>236</fpage>
            <lpage>239</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh010</pubid>
                  <pubid idtype="pmpid" link="fulltext">14595094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Chromatin structure and epigenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Quina</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Buschbeck</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Di Croce</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Biochem Pharmacol</source>
            <pubdate>2006</pubdate>
            <volume>72</volume>
            <issue>11</issue>
            <fpage>1563</fpage>
            <lpage>1569</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.bcp.2006.06.016</pubid>
                  <pubid idtype="pmpid" link="fulltext">16836980</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Gene silencing in cancer in association with promoter hypermethylation</p>
            </title>
            <aug>
               <au>
                  <snm>Herman</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Baylin</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>N Engl J Med</source>
            <pubdate>2003</pubdate>
            <volume>349</volume>
            <issue>21</issue>
            <fpage>2042</fpage>
            <lpage>2054</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1056/NEJMra023075</pubid>
                  <pubid idtype="pmpid" link="fulltext">14627790</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>The role of chromatin during transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Carey</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Workman</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>128</volume>
            <issue>4</issue>
            <fpage>707</fpage>
            <lpage>719</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.01.015</pubid>
                  <pubid idtype="pmpid" link="fulltext">17320508</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Chromatin structure in the genomics era</p>
            </title>
            <aug>
               <au>
                  <snm>Rando</snm>
                  <fnm>OJ</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>2</issue>
            <fpage>67</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2006.12.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">17188397</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Cancer Epigenetics: Modifications, Screening, and Therapy</p>
            </title>
            <aug>
               <au>
                  <snm>Gal-Yam</snm>
                  <fnm>EN</fnm>
               </au>
               <au>
                  <snm>Saito</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Egger</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>PA</fnm>
               </au>
            </aug>
            <source>Annu Rev Med</source>
            <pubdate>2008</pubdate>
            <volume>59</volume>
            <fpage>267</fpage>
            <lpage>280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.med.59.061606.095816</pubid>
                  <pubid idtype="pmpid" link="fulltext">17937590</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes</p>
            </title>
            <aug>
               <au>
                  <snm>Vandesompele</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>De Preter</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Pattyn</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Poppe</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Van Roy</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>De Paepe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Speleman</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>7</issue>
            <fpage>RESEARCH0034</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">126239</pubid>
                  <pubid idtype="pmpid" link="fulltext">12184808</pubid>
                  <pubid idtype="doi">10.1186/gb-2002-3-7-research0034</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A compendium of gene expression in normal human tissues</p>
            </title>
            <aug>
               <au>
                  <snm>Hsiao</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Dangond</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Yoshida</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>RV</fnm>
               </au>
               <au>
                  <snm>Misra</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dillon</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Haverty</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <issue>2</issue>
            <fpage>97</fpage>
            <lpage>104</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11773596</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Further understanding human disease genes by comparing with housekeeping genes and other genes</p>
            </title>
            <aug>
               <au>
                  <snm>Tu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>31</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1397819</pubid>
                  <pubid idtype="pmpid" link="fulltext">16504025</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-31</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Mining housekeeping genes with a Naive Bayes classifier</p>
            </title>
            <aug>
               <au>
                  <snm>De Ferrari</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Aitken</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>277</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1635426</pubid>
                  <pubid idtype="pmpid" link="fulltext">17074078</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Evidence based selection of housekeeping genes</p>
            </title>
            <aug>
               <au>
                  <snm>de Jonge</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Fehrmann</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>de Bont</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Hofstra</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Gerbens</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kamps</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>de Vries</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Zee</snm>
                  <mnm>van der</mnm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>te Meerman</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>ter Elst</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>PLoS ONE</source>
            <pubdate>2007</pubdate>
            <volume>2</volume>
            <issue>9</issue>
            <fpage>e898</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1976390</pubid>
                  <pubid idtype="pmpid" link="fulltext">17878933</pubid>
                  <pubid idtype="doi">10.1371/journal.pone.0000898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Housekeeping and tissue-specific genes in mouse tissues</p>
            </title>
            <aug>
               <au>
                  <snm>Kouadjo</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Cadrin-Girard</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Yoshioka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>St-Amand</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <fpage>127</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1888706</pubid>
                  <pubid idtype="pmpid" link="fulltext">17519037</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-8-127</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>A gene atlas of the mouse and human protein-encoding transcriptomes</p>
            </title>
            <aug>
               <au>
                  <snm>Su</snm>
                  <fnm>AI</fnm>
               </au>
               <au>
                  <snm>Wiltshire</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lapp</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ching</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Soden</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hayakawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kreiman</snm>
                  <fnm>G</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>16</issue>
            <fpage>6062</fpage>
            <lpage>6067</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395923</pubid>
                  <pubid idtype="pmpid" link="fulltext">15075390</pubid>
                  <pubid idtype="doi">10.1073/pnas.0400782101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>The functional landscape of mouse gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>QD</fnm>
               </au>
               <au>
                  <snm>Chang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shai</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bakowski</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Mitsakakis</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mohammad</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Robinson</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Zirngibl</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Somogyi</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Biol</source>
            <pubdate>2004</pubdate>
            <volume>3</volume>
            <issue>5</issue>
            <fpage>21</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">549719</pubid>
                  <pubid idtype="pmpid" link="fulltext">15588312</pubid>
                  <pubid idtype="doi">10.1186/jbiol16</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Detecting and profiling tissue-selective genes</p>
            </title>
            <aug>
               <au>
                  <snm>Liang</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Be</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Howes</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2006</pubdate>
            <volume>26</volume>
            <issue>2</issue>
            <fpage>158</fpage>
            <lpage>162</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1152/physiolgenomics.00313.2005</pubid>
                  <pubid idtype="pmpid" link="fulltext">16684803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Housekeeping genes tend to show reduced upstream sequence conservation</p>
            </title>
            <aug>
               <au>
                  <snm>Farre</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bellora</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Mularoni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Messeguer</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Alba</snm>
                  <fnm>MM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <issue>7</issue>
            <fpage>R140</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2323216</pubid>
                  <pubid idtype="pmpid" link="fulltext">17626644</pubid>
                  <pubid idtype="doi">10.1186/gb-2007-8-7-r140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>How many human genes can be defined as housekeeping with current expression data?</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2008</pubdate>
            <volume>9</volume>
            <fpage>172</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2396180</pubid>
                  <pubid idtype="pmpid" link="fulltext">18416810</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-9-172</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>PANTHER: a library of protein families and subfamilies indexed by function</p>
            </title>
            <aug>
               <au>
                  <snm>Thomas</snm>
                  <fnm>PD</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Kejariwal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Karlak</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Daverman</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Diemer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Muruganujan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Narechania</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>9</issue>
            <fpage>2129</fpage>
            <lpage>2141</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403709</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952881</pubid>
                  <pubid idtype="doi">10.1101/gr.772403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Database resources of the National Center for Biotechnology Information</p>
            </title>
            <aug>
               <au>
                  <snm>Wheeler</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Barrett</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Benson</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Bryant</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Canese</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chetvernin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>DiCuccio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Edgar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Federhen</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Research</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>Database issue</issue>
            <fpage>D5</fpage>
            <lpage>D12</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1781113</pubid>
                  <pubid idtype="pmpid" link="fulltext">17170002</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl1031</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>On the nature of human housekeeping genes</p>
            </title>
            <aug>
               <au>
                  <snm>Zhu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2008</pubdate>
            <volume>24</volume>
            <issue>10</issue>
            <fpage>481</fpage>
            <lpage>484</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2008.08.004</pubid>
                  <pubid idtype="pmpid" link="fulltext">18786740</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Segmental duplications: organization and impact within the current human genome project assembly</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Yavor</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Massa</snm>
                  <fnm>HF</fnm>
               </au>
               <au>
                  <snm>Trask</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Eichler</snm>
                  <fnm>EE</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <issue>6</issue>
            <fpage>1005</fpage>
            <lpage>1017</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">311093</pubid>
                  <pubid idtype="pmpid" link="fulltext">11381028</pubid>
                  <pubid idtype="doi">10.1101/gr.GR-1871R</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>A preliminary comparative analysis of primate segmental duplications shows elevated substitution rates and a great-ape expansion of intrachromosomal duplications</p>
            </title>
            <aug>
               <au>
                  <snm>She</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ventura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Misceo</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Roberto</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cardone</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Rocchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Archidiacono</snm>
                  <fnm>N</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Research</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>5</issue>
            <fpage>576</fpage>
            <lpage>583</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1457043</pubid>
                  <pubid idtype="pmpid" link="fulltext">16606706</pubid>
                  <pubid idtype="doi">10.1101/gr.4949406</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Global variation in copy number in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Redon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Fitch</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Feuk</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Perry</snm>
                  <fnm>GH</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Fiegler</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shapero</snm>
                  <fnm>MH</fnm>
               </au>
               <au>
                  <snm>Carson</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>W</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>444</volume>
            <issue>7118</issue>
            <fpage>444</fpage>
            <lpage>454</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2669898</pubid>
                  <pubid idtype="pmpid" link="fulltext">17122850</pubid>
                  <pubid idtype="doi">10.1038/nature05329</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Genomic copy number and expression variation within the C57BL/6J inbred mouse strain</p>
            </title>
            <aug>
               <au>
                  <snm>Watkins-Chow</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Pavan</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <issue>1</issue>
            <fpage>60</fpage>
            <lpage>66</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2134784</pubid>
                  <pubid idtype="pmpid" link="fulltext">18032724</pubid>
                  <pubid idtype="doi">10.1101/gr.6927808</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Genomic copy number and expression patterns in testicular germ cell tumours</p>
            </title>
            <aug>
               <au>
                  <snm>McIntyre</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Summersgill</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>YJ</fnm>
               </au>
               <au>
                  <snm>Missiaglia</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kitazawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Oosterhuis</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Looijenga</snm>
                  <fnm>LH</fnm>
               </au>
               <au>
                  <snm>Shipley</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Br J Cancer</source>
            <pubdate>2007</pubdate>
            <volume>97</volume>
            <issue>12</issue>
            <fpage>1707</fpage>
            <lpage>1712</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.bjc.6604079</pubid>
                  <pubid idtype="pmpid" link="fulltext">18059402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Genome-wide DNA copy number analysis in pancreatic cancer using high-density single nucleotide polymorphism arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Harada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chelala</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bhakta</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chaplin</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Caulee</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Baril</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Lemoine</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>Oncogene</source>
            <pubdate>2008</pubdate>
            <volume>27</volume>
            <issue>13</issue>
            <fpage>1951</fpage>
            <lpage>1960</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2492386</pubid>
                  <pubid idtype="pmpid" link="fulltext">17952125</pubid>
                  <pubid idtype="doi">10.1038/sj.onc.1210832</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Primate segmental duplications: crucibles of evolution, diversity and disease</p>
            </title>
            <aug>
               <au>
                  <snm>Bailey</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Eichler</snm>
                  <fnm>EE</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <issue>7</issue>
            <fpage>552</fpage>
            <lpage>564</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1895</pubid>
                  <pubid idtype="pmpid" link="fulltext">16770338</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Ultraconserved elements in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Makunin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Stephen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <issue>5675</issue>
            <fpage>1321</fpage>
            <lpage>1325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1098119</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131266</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>CpG islands in vertebrate genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Gardiner-Garden</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Frommer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1987</pubdate>
            <volume>196</volume>
            <issue>2</issue>
            <fpage>261</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(87)90689-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">3656447</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Genome-wide analysis reveals strong correlation between CpG islands with nearby transcription start sites of genes and their tissue specificity</p>
            </title>
            <aug>
               <au>
                  <snm>Yamashita</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sugano</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nakai</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2005</pubdate>
            <volume>350</volume>
            <issue>2</issue>
            <fpage>129</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.gene.2005.01.012</pubid>
                  <pubid idtype="pmpid" link="fulltext">15784181</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS)</p>
            </title>
            <aug>
               <au>
                  <snm>Crawford</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Holt</snm>
                  <fnm>IE</fnm>
               </au>
               <au>
                  <snm>Whittle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Webb</snm>
                  <fnm>BD</fnm>
               </au>
               <au>
                  <snm>Tai</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Margulies</snm>
                  <fnm>EH</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bernat</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Ginsburg</snm>
                  <fnm>D</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Research</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>1</issue>
            <fpage>123</fpage>
            <lpage>131</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356136</pubid>
                  <pubid idtype="pmpid" link="fulltext">16344561</pubid>
                  <pubid idtype="doi">10.1101/gr.4074106</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The formation and function of DNase I hypersensitive sites in the process of gene activation</p>
            </title>
            <aug>
               <au>
                  <snm>Elgin</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1988</pubdate>
            <volume>263</volume>
            <issue>36</issue>
            <fpage>19259</fpage>
            <lpage>19262</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3198625</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Identification and characterization of cell type-specific and ubiquitous chromatin regulatory structures in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Xi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Shulha</snm>
                  <fnm>HP</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Vales</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bodine</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>McKay</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Chenoweth</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Tesar</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Genetics</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>8</issue>
            <fpage>e136</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1950163</pubid>
                  <pubid idtype="pmpid" link="fulltext">17708682</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0030136</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Histone acetylation and deacetylation in yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Kurdistani</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Grunstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>4</issue>
            <fpage>276</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm1075</pubid>
                  <pubid idtype="pmpid" link="fulltext">12671650</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Genomic maps and comparative analysis of histone modifications in human and mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Bernstein</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Kamal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bekiranov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bailey</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Huebert</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>McMahon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Karlsson</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
                  <suf>3rd</suf>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
               <etal/>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>120</volume>
            <issue>2</issue>
            <fpage>169</fpage>
            <lpage>181</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2005.01.001</pubid>
                  <pubid idtype="pmpid" link="fulltext">15680324</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Genomic analyses of transcription factor binding, histone acetylation, and gene expression reveal mechanistically distinct classes of estrogen-regulated promoters</p>
            </title>
            <aug>
               <au>
                  <snm>Kininis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Diehl</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Isaacs</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Clark</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Kraus</snm>
                  <fnm>WL</fnm>
               </au>
            </aug>
            <source>Molecular and Cellular Biology</source>
            <pubdate>2007</pubdate>
            <volume>27</volume>
            <issue>14</issue>
            <fpage>5090</fpage>
            <lpage>5104</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">17515612</pubid>
                  <pubid idtype="doi">10.1128/MCB.00083-07</pubid>
                  <pubid idtype="pmcid">1951957</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders</p>
            </title>
            <aug>
               <au>
                  <snm>Rada-Iglesias</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ameur</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Enroth</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Komorowski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Wadelius</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Genome Research</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <issue>3</issue>
            <fpage>380</fpage>
            <lpage>392</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2259102</pubid>
                  <pubid idtype="pmpid" link="fulltext">18230803</pubid>
                  <pubid idtype="doi">10.1101/gr.6880908</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Histone acetylation and transcriptional regulation in the genome of Saccharomyces cerevisiae</p>
            </title>
            <aug>
               <au>
                  <snm>Guo</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Tatsuoka</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics (Oxford, England)</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>4</issue>
            <fpage>392</fpage>
            <lpage>399</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti823</pubid>
                  <pubid idtype="pmpid" link="fulltext">16339282</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>High-resolution profiling of histone methylations in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Barski</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cuddapah</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cui</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Roh</snm>
                  <fnm>TY</fnm>
               </au>
               <au>
                  <snm>Schones</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chepelev</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Zhao</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>129</volume>
            <issue>4</issue>
            <fpage>823</fpage>
            <lpage>837</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.05.009</pubid>
                  <pubid idtype="pmpid" link="fulltext">17512414</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>The role of CTCF in regulating nuclear organization</p>
            </title>
            <aug>
               <au>
                  <snm>Williams</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Flavell</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>The Journal of Experimental Medicine</source>
            <pubdate>2008</pubdate>
            <volume>205</volume>
            <issue>4</issue>
            <fpage>747</fpage>
            <lpage>750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2292214</pubid>
                  <pubid idtype="pmpid" link="fulltext">18347103</pubid>
                  <pubid idtype="doi">10.1084/jem.20080066</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Active chromatin hub of the mouse alpha-globin locus forms in a transcription factory of clustered housekeeping genes</p>
            </title>
            <aug>
               <au>
                  <snm>Zhou</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Xin</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Song</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Di</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>XS</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Liang</snm>
                  <fnm>CC</fnm>
               </au>
            </aug>
            <source>Molecular and Cellular Biology</source>
            <pubdate>2006</pubdate>
            <volume>26</volume>
            <issue>13</issue>
            <fpage>5096</fpage>
            <lpage>5105</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">16782894</pubid>
                  <pubid idtype="doi">10.1128/MCB.02454-05</pubid>
                  <pubid idtype="pmcid">1489176</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Transcription and chromatin organization of a housekeeping gene cluster containing an integrated beta-globin locus control region</p>
            </title>
            <aug>
               <au>
                  <snm>Noordermeer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Branco</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Splinter</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Klous</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>van IJcken</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Swagemakers</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Koutsourakis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>van der Spek</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pombo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>de Laat</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>PLoS Genetics</source>
            <pubdate>2008</pubdate>
            <volume>4</volume>
            <issue>3</issue>
            <fpage>e1000016</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2265466</pubid>
                  <pubid idtype="pmpid" link="fulltext">18369441</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.1000016</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Epigenetics: a landscape takes shape</p>
            </title>
            <aug>
               <au>
                  <snm>Goldberg</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Allis</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Bernstein</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>128</volume>
            <issue>4</issue>
            <fpage>635</fpage>
            <lpage>638</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.02.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">17320500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Weber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hellmann</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Ramos</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pääbo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rebhan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schübeler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature Genetics</source>
            <pubdate>2007</pubdate>
            <volume>39</volume>
            <issue>4</issue>
            <fpage>457</fpage>
            <lpage>466</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1990</pubid>
                  <pubid idtype="pmpid" link="fulltext">17334365</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Genome-wide profiling of DNA methylation reveals a class of normally methylated CpG island promoters</p>
            </title>
            <aug>
               <au>
                  <snm>Shen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ahmed</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Waterland</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Issa</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>PLoS Genetics</source>
            <pubdate>2007</pubdate>
            <volume>3</volume>
            <issue>10</issue>
            <fpage>2023</fpage>
            <lpage>2036</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2041996</pubid>
                  <pubid idtype="pmpid" link="fulltext">17967063</pubid>
                  <pubid idtype="doi">10.1371/journal.pgen.0030181</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>DNA methylation and chromatin structure: the puzzling CpG islands</p>
            </title>
            <aug>
               <au>
                  <snm>Caiafa</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zampieri</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Journal of Cellular Biochemistry</source>
            <pubdate>2005</pubdate>
            <volume>94</volume>
            <issue>2</issue>
            <fpage>257</fpage>
            <lpage>265</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jcb.20325</pubid>
                  <pubid idtype="pmpid" link="fulltext">15546139</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The fundamental role of epigenetic events in cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Baylin</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>Nature Reviews Genetics</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <issue>6</issue>
            <fpage>415</fpage>
            <lpage>428</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12042769</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>The epigenomics of cancer</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Baylin</snm>
                  <fnm>SB</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2007</pubdate>
            <volume>128</volume>
            <issue>4</issue>
            <fpage>683</fpage>
            <lpage>692</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2007.01.029</pubid>
                  <pubid idtype="pmpid" link="fulltext">17320506</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Castle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Garrett-Engele</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Loerch</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Armour</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Santos</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>302</volume>
            <issue>5653</issue>
            <fpage>2141</fpage>
            <lpage>2144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090100</pubid>
                  <pubid idtype="pmpid" link="fulltext">14684825</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Optimization of oligonucleotide arrays and RNA amplification protocols for analysis of transcript structure and alternative splicing</p>
            </title>
            <aug>
               <au>
                  <snm>Castle</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Garrett-Engele</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Armour</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Duenwald</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Loerch</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Schadt</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Stoughton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Parrish</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>DD</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>10</issue>
            <fpage>R66</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">328455</pubid>
                  <pubid idtype="pmpid" link="fulltext">14519201</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-10-r66</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
