<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-12-359</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>Gene discovery by genome-wide CDS re-prediction and microarray-based transcriptional analysis in phytopathogen <it>Xanthomonas campestris</it>
</p>
</title>
<aug>
<au ce="yes" id="A1"><snm>Zhou</snm><fnm>Lian</fnm><insr iid="I1"/><email>lianzhou@sjtu.edu.cn</email></au>
<au ce="yes" id="A2"><snm>Vorh&#246;lter</snm><fnm>Frank-J&#246;rg</fnm><insr iid="I2"/><email>frank@CeBiTec.Uni-Bielefeld.DE</email></au>
<au id="A3"><snm>He</snm><fnm>Yong-Qiang</fnm><insr iid="I3"/><email>yqhe@gxu.edu.cn</email></au>
<au id="A4"><snm>Jiang</snm><fnm>Bo-Le</fnm><insr iid="I3"/><email>jbl1974@gxu.edu.cn</email></au>
<au id="A5"><snm>Tang</snm><fnm>Ji-Liang</fnm><insr iid="I3"/><email>jltang@gxu.edu.cn</email></au>
<au id="A6"><snm>Xu</snm><fnm>Yuquan</fnm><insr iid="I1"/><email>xuyq@sjtu.edu.cn</email></au>
<au ca="yes" id="A7"><snm>P&#252;hler</snm><fnm>Alfred</fnm><insr iid="I2"/><email>Puehler@CeBiTec.Uni-Bielefeld.DE</email></au>
<au ca="yes" id="A8"><snm>He</snm><fnm>Ya-Wen</fnm><insr iid="I1"/><email>yawenhe@sjtu.edu.cn</email></au>
</aug>
<insg>
<ins id="I1"><p>National Center for Molecular Characterization of GMOs and State Key Laboratory of Microbial Metabolism, School of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China</p></ins>
<ins id="I2"><p>Universit&#228;t Bielefeld, CeBiTec, Universit&#228;tsstr.25, D-33615 Bielefeld, Germany</p></ins>
<ins id="I3"><p>State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Nanning 530004, China</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>359</fpage>
<url>http://www.biomedcentral.com/1471-2164/12/359</url>
<xrefbib><pubidlist><pubid idtype="pmpid">21745409</pubid><pubid idtype="doi">10.1186/1471-2164-12-359</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>28</day><month>1</month><year>2011</year></date></rec><acc><date><day>12</day><month>7</month><year>2011</year></date></acc><pub><date><day>12</day><month>7</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Zhou et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<kwdg>
<kwd>
<it>Xanthomonas campestris</it>
</kwd>
<kwd>CDS re-prediction</kwd>
<kwd>microarray analysis</kwd>
<kwd>new CDS</kwd>
</kwdg>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>One of the major tasks of the post-genomic era is "reading" genomic sequences in order to extract all the biological information contained in them. Although a wide variety of techniques are used to solve the gene-finding problem and a number of prokaryotic gene-finding programs are available, gene recognition in bacteria is still far from straightforward.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>This study reports a thorough search for new CDSs in the two published Xcc genomes. First, putative CDSs encoded in the two genomes were re-predicted using three gene finders, resulting in the identification of 2850 putative new CDSs. Second, similarity searching was conducted and 278 CDSs were found to have homologs in other bacterial species. Third, oligonucleotide microarray and RT-PCR analysis identified 147 CDSs with detectable mRNA transcripts. Finally, in-frame deletion and subsequent phenotype analysis confirmed that Xcc_CDS002, encoding a novel SIR2-like domain protein, is involved in virulence and that Xcc_CDS1553, encoding an ArsR family transcription factor, is involved in arsenate resistance.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Despite sophisticated approaches available for genome annotation, many cellular transcripts have remained unidentified so far in <it>Xcc </it>genomes. Through a combined strategy involving bioinformatic, postgenomic and genetic approaches, a reliable list of 306 new CDSs was identified and a more thorough understanding of some cellular processes was gained.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Over the past two decades, we have witnessed the publication of more than 1,000 complete microbial genome sequences (<url>http://www.ncbi.nlm.nih.gov/genomes/</url>). The trend towards genome sequencing is expected to continue or even accelerate in the near future. The wealth of sequence information has greatly enhanced our understanding of bacterial physiology and biological processes underlying the very organization of life. One of the major tasks of the post-genomic era is "reading" genomic sequences in order to extract all the biological information contained in them. An essential step in this quest is the identification of protein-coding genes, with subsequent functional annotation of the corresponding gene products <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. A number of gene-finding methods have been developed to address this problem from different points of view. Generally, these gene-finding methods are divided into two broad categories <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. "Extrinsic" methods take into account information derived from similarity search procedures <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. "Intrinsic" methods, which deal with DNA sequence only, use statistic or pattern recognition algorithms to find genes in DNA through detection of specific motifs or global statistical patterns. For example, GeneMark employs a hidden Markov model (HMM) to find genes <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
<abbr bid="B6">6</abbr>
</abbrgrp> while GLIMMER employs an interpolated Markov model <abbrgrp>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp>. Although a wide variety of techniques are used to solve the gene-finding problem and a number of prokaryotic gene-finding programs are available, gene recognition in bacteria is still far from straightforward, and the published genomes still contain many wrongly or inaccurately annotated genes as well as missing genes <abbrgrp>
<abbr bid="B1">1</abbr>
<abbr bid="B10">10</abbr>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
<abbr bid="B14">14</abbr>
</abbrgrp>. A major reason for this situation may be that genes can be tightly packed in prokaryotes, resulting in frequent overlap. Thus, detection of translation initiation sites and/or selection of the correct coding regions remain difficult <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. In addition, it is now well known that all microbial genomes contain an abundance of short genes <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B15">15</abbr>
</abbrgrp>. For statistical reasons, the longer the sequences, the easier it is to detect the codon bias. The short length of these genes probably affects both pillars of CDS prediction, namely intrinsic and extrinsic approaches <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B16">16</abbr>
</abbrgrp>.</p>
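<p>To make the "intrinsic" idea concrete, the following toy sketch scores a sequence by the log-likelihood ratio of two first-order Markov chains, one estimated from example coding sequences and one from example non-coding sequences. It is a deliberately simplified stand-in and not the GeneMark or GLIMMER algorithm; the training strings, the pseudocount and the function names <it>train_markov</it> and <it>log_odds</it> are assumptions chosen purely for illustration.</p>

```python
import math
from typing import Dict, Iterable

BASES = "ACGT"
Model = Dict[str, Dict[str, float]]


def train_markov(seqs: Iterable[str], pseudocount: float = 1.0) -> Model:
    """Estimate first-order transition probabilities P(next | prev)."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, row in counts.items():
        total = sum(row.values())
        model[prev] = {nxt: c / total for nxt, c in row.items()}
    return model


def log_odds(seq: str, coding: Model, noncoding: Model) -> float:
    """Sum of per-transition log ratios; positive values favour 'coding'."""
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        score += math.log(coding[prev][nxt] / noncoding[prev][nxt])
    return score


# Hypothetical training data: GC-rich strings stand in for coding DNA,
# AT-rich strings for non-coding DNA. Real gene finders train on genomes.
coding_model = train_markov(["ATGGCCGCGCTGGCC", "ATGCGCGGCCTGGCG"])
noncoding_model = train_markov(["ATATTTAAATTATAA", "TTTAATAATATTTAT"])

short = "GCCGCG"
longer = short * 3
print(log_odds(short, coding_model, noncoding_model))
print(log_odds(longer, coding_model, noncoding_model))
```

<p>Because the score is a sum over positions, a short sequence contributes only a few terms, which is one way to see why short genes are harder to distinguish from background.</p>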
<p>The <it>Xanthomonas </it>genus is one of the most ubiquitous groups of plant-associated bacterial pathogens. Members of this genus have been shown to infect at least 124 monocotyledonous and 268 dicotyledonous plant species <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>. <it>Xanthomonas campestris </it>pv. <it>campestris </it>(Pammel) Dowson (<it>Xcc </it>hereafter) is the causal agent of black rot of crucifers, which is possibly the most important disease of crucifers worldwide <abbrgrp>
<abbr bid="B18">18</abbr>
</abbrgrp>. So far, genomes of the three <it>Xcc </it>strains ATCC 33913, 8004, and B100 have been sequenced <abbrgrp>
<abbr bid="B14">14</abbr>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>. The genome of <it>Xcc </it>strain ATCC33913 comprises a circular chromosome of 5,076,187 bp encoding a total of 4181 predicted CDSs <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>. The genome of <it>Xcc </it>strain 8004 resides on a single circular chromosome of 5,148,708 bp, which encodes 4273 predicted CDSs <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Although the majority of the genes encoded by the two genomes were identical, a total of 108 and 62 CDSs unique to <it>Xcc </it>8004 and <it>Xcc </it>ATCC33913 were respectively identified <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. In particular, analysis of the genome of <it>Xcc </it>strain 8004 identified a total of 87 CDSs that have homologs in <it>Xcc </it>ATCC33913, but were not annotated by da Silva et al. <abbrgrp>
<abbr bid="B19">19</abbr>
</abbrgrp>. Similarly, annotation of the recently sequenced genome of <it>Xcc </it>B100 identified more than 200 additional CDSs that were not annotated in the other two <it>Xcc </it>strains <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. Although these newly identified CDSs need to be further verified, the findings suggest that there is still room for improvement in the state of gene identification of <it>Xcc </it>genomes.</p>
<p>In this study, putative protein coding sequences in the two genomes of the <it>Xcc </it>strains 8004 and ATCC33913 were re-predicted using the latest versions of three gene-prediction programs. A total of 2850 putative new CDSs were identified. Based on the results of similarity searching, transcriptional pattern analysis and functional analysis, a reliable list of 306 new CDSs was obtained from this data set. The function of two newly identified genes was further confirmed by gene deletion and subsequent phenotype analysis.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>CDS re-prediction and identification of putative new CDSs</p>
</st>
<p>In this study, a combined strategy (Figure <figr fid="F1">1</figr>) was used in which the three well-established gene finders GLIMMER (<url>http://cbcb.umd.edu/software/glimmer</url>) <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>, GeneMark <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp>, and ZCURVE <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp> were each applied to predict putative protein coding sequences (CDSs) within the two genomes of <it>Xcc </it>strains 8004 and ATCC33913 <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>. After further sequence analysis, a total of 7164 CDSs were identified (Figure <figr fid="F2">2</figr>). Among them, 4314 CDSs, including 146 <it>Xcc </it>strain 8004-specific CDSs, 60 ATCC33913-specific CDSs and 4108 CDSs shared between the two genomes, had been previously annotated (Figure <figr fid="F2">2A</figr>). The remaining 2850 predicted CDSs had not been identified in the two published genomes and were defined as putative new CDSs (Figure <figr fid="F2">2A</figr>), including 1181 CDSs predicted by GLIMMER, 957 CDSs by GeneMark, and 612 CDSs by ZCURVE (Figure <figr fid="F2">2B</figr>). Intriguingly, only 126 CDSs were predicted by all three gene finders (Figure <figr fid="F2">2B</figr>). The size of these putative CDSs ranged from 90 to 4545 bp, and most of them (2202 of 2850 CDSs) were less than 1 kb long (Figure <figr fid="F2">2A</figr>). In particular, 797 CDSs were less than 180 bp in length. BLASTN analysis revealed that 2410 of the 2850 putative new CDSs were located in intergenic regions of both strands in the chromosome of <it>Xcc </it>strain 8004 (Figure <figr fid="F2">2A</figr>, indicated by "I" and "III"). The remaining 440 CDSs overlapped partially or fully with the annotated genes, but in different reading frames (Figure <figr fid="F2">2A</figr>, indicated by "II" and "IV"). All 648 putative new CDSs &gt;1000 bp in length were either antisense to or overlapping with the annotated genes in the two <it>Xcc </it>genomes (Additional file <supplr sid="S1">1</supplr>).</p>
<fig id="F1"><title><p>Figure 1</p></title><caption><p>Overall strategy for the identification of new CDS in <it>Xcc</it></p></caption><text>
   <p><b>Overall strategy for the identification of new CDS in <it>Xcc</it></b>.</p>
</text><graphic file="1471-2164-12-359-1" hint_layout="single"/></fig>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Total CDSs identified in the genomes of <it>Xcc </it>strains 8004 and ATCC33913</p></caption><text>
   <p><b>Total CDSs identified in the genomes of <it>Xcc </it>strains 8004 and ATCC33913</b>. (A) The location of all putative new CDSs relative to existing neighboring CDSs on the chromosome of <it>Xcc </it>strain 8004 has been classified into four groups, indicated by I, II, III, and IV respectively on the left. Group I indicates intergenic regions on the coding strand; II indicates intragenic regions on the coding strand, where CDSs overlap partially or fully with annotated CDSs but are in different reading frames; III indicates intergenic regions on the complementary strand; and IV indicates intragenic regions on the complementary strand, where CDSs overlap partially or fully with annotated CDSs but are in different reading frames. Below the relative localization, the length distribution of the putative new CDSs is given in base pairs (BP). Numbers inside the brackets indicate the number of CDSs. (B) A Venn diagram showing the overlap of the CDSs predicted by GLIMMER, GeneMark and ZCURVE. Numbers inside the brackets indicate the number of the CDSs that have been confirmed by extrinsic evidence and/or transcriptional analysis.</p>
</text><graphic file="1471-2164-12-359-2" hint_layout="single"/></fig>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Supplementary tables</b>. The putative new CDSs identified by similarity searching. The new CDSs identical to the CDSs annotated in <it>Xcc </it>strain B100. The new CDSs with detectable transcripts by microarray analysis. Oligos used in this study.</p>
</text>
<file name="1471-2164-12-359-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
</sec>
<sec>
<st>
<p>Validation of new CDS by extrinsic evidence</p>
</st>
<p>The set of 2850 putative new CDSs was probably contaminated by pseudogene fragments and false-prediction artifacts because all three gene finders rely entirely on intrinsic evidence. To find true CDSs, the next step in this study was to seek support from extrinsic evidence. All the putative new CDSs were searched against the NCBI non-redundant database using BLASTP. Based on the three criteria described in Materials and Methods, a total of 220 putative new CDSs were found to be significantly similar to other protein sequences in the database (Additional file <supplr sid="S1">1</supplr>).</p>
<p>More recently, the genome sequence of <it>Xcc </it>strain B100 has been published and the genome contained 496 additional CDSs <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. About half of these CDSs that were identified by the combined use of the gene finders GISMO <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp> and REGANOR <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp> were also present in the genomes of <it>Xcc </it>strains 8004 and ATCC33913, but have not been annotated <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. Comparing the 2850 putative new CDSs identified in this study with the 496 additional CDSs in <it>Xcc </it>strain B100, we found 72 overlapping CDSs (Additional file <supplr sid="S1">1</supplr>). Among them, 14 CDSs had more than one homolog in the non-redundant database and were already included in the 220 putative new CDSs identified by similarity searching; the remaining 58 CDSs had no homologs in the non-redundant database except in <it>Xcc </it>strain B100 and were also regarded as new CDSs in this study (Additional file <supplr sid="S1">1</supplr>). Taken together, a total of 278 CDSs (220 + 58) were selected from the 2850 putative new CDSs on the basis of extrinsic evidence.</p>
<p>The majority of these CDSs (240 of 278) encode conserved hypothetical proteins or hypothetical proteins (Figure <figr fid="F3">3</figr>). Eleven CDSs (<it>Xcc</it>_CDS105, <it>Xcc</it>_CDS107, <it>Xcc</it>_CDS411, <it>Xcc</it>_CDS1381, <it>Xcc</it>_CDS1831, <it>Xcc</it>_CDS2249, <it>Xcc</it>_CDS2324, <it>Xcc</it>_CDS2391, <it>Xcc</it>_CDS2668, <it>Xcc</it>_CDS2723, <it>Xcc</it>_CDS2777) encode putative secreted or exported proteins and three CDSs encode regulatory proteins or transcription factors (Figure <figr fid="F3">3</figr>). Xcc_CDS002 encodes a Sir2-like transcriptional silencer protein; Xcc_CDS1553 encodes an ArsR family transcriptional regulator; Xcc_CDS1633 bears similarity to the homeodomain of POU domain proteins or HTH_XRE domain proteins (Additional file <supplr sid="S1">1</supplr>). Two CDSs (<it>Xcc</it>_CDS2171 and <it>Xcc</it>_CDS2691) encode putative phenol hydroxylases and another two CDSs (<it>Xcc</it>_CDS2201 and <it>Xcc</it>_CDS2211) encode putative 50S ribosomal proteins. 
The remaining 20 CDSs respectively encode peptidase-like protein (<it>Xcc</it>_CDS073), hemolysin III (<it>Xcc</it>_CDS095), IS480b transposase (<it>Xcc</it>_CDS177), putative cell wall surface anchor family protein (<it>Xcc</it>_CDS346), endonuclease (<it>Xcc</it>_CDS528), chloramphenicol O-acetyltransferase (<it>Xcc</it>_CDS639), outer protein D (<it>Xcc</it>_CDS900), ABC transporter heme permease (<it>Xcc</it>_CDS1309), putative GTPase (<it>Xcc</it>_CDS1342), putative DNA methylase (<it>Xcc</it>_CDS1416), transmembrane protein (<it>Xcc</it>_CDS1446), putative tryptophan-rich sensory protein (<it>Xcc</it>_CDS1617), dihydroxydipicolinate synthase (<it>Xcc</it>_CDS1689), thermoresistant gluconokinase (<it>Xcc</it>_CDS1836), thiopurine methyltransferase (<it>Xcc</it>_CDS1899), putative tryptophan 2,3-dioxygenase oxidoreductase (<it>Xcc</it>_CDS2015), putative Atu protein (<it>Xcc</it>_CDS2546), WD40-like beta propeller (<it>Xcc</it>_CDS2674), ferric pseudobactins receptor protein (<it>Xcc</it>_CDS2714), and restriction endonuclease (<it>Xcc</it>_CDS2849) (Figure <figr fid="F3">3</figr>; Additional file <supplr sid="S1">1</supplr>).</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Functional classification of the new CDSs based on similarity searching</p></caption><text>
   <p><b>Functional classification of the new CDSs based on similarity searching</b>.</p>
</text><graphic file="1471-2164-12-359-3" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>Transcription analysis for new CDS</p>
</st>
<p>An alternative approach to validate a CDS is to detect the transcribed mRNA. An oligonucleotide microarray chip, which contains 50-mer oligos specific for 4080 annotated CDSs and 8 negative controls, has been successfully used to analyze the DSF regulon, Clp regulon and RavR regulon in <it>Xcc </it>
<abbrgrp>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B27">27</abbr>
</abbrgrp>. In this study, a new microarray chip with the above-mentioned oligos and additional oligos specific for the 1724 putative new CDSs was constructed. This microarray chip was used to detect transcripts of the putative new CDSs. To detect transcripts under different conditions, total RNA was extracted from cell cultures grown under the following conditions: (i) different cell densities: OD<sub>600 </sub>= 1.0, 1.6 and 2.0; (ii) different genetic backgrounds: the &#916;rpfF, &#916;rpfC, &#916;clp and &#916;ravR strains <abbrgrp>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B27">27</abbr>
</abbrgrp>; (iii) different media: rich YEB medium and poor NYG medium. Using the screening procedures described in Materials and Methods, 147 putative new CDSs were found to have detectable transcripts (Figure <figr fid="F4">4A</figr>; Additional file <supplr sid="S1">1</supplr>). Further analysis revealed that 75 CDSs were constitutively expressed during growth and the remaining 72 CDSs were only expressed at high cell density (OD<sub>600 </sub>= 2.0) (Figure <figr fid="F4">4A</figr>). Comparing the global gene expression profiles of <it>Xcc </it>wild type and the <it>rpfF</it>, <it>rpfC</it>, and <it>clp </it>deletion mutants, we found that the transcription of 15 high cell density-dependent CDSs was also positively regulated by the quorum sensing signal DSF <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. The expression levels of these CDSs in an <it>rpfF </it>deletion mutant were 2.7 to 5.2 times lower than those in the wild type XC1 strain (Figure <figr fid="F4">4B</figr>). The transcription of <it>Xcc</it>_CDS2497 was only induced in poor NYG medium at high cell density (OD<sub>600 </sub>= 2.0) (Figure <figr fid="F4">4A</figr>).</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Overview on new CDS identified via microarray and RT-PCR analysis</p></caption><text>
   <p><b>Overview on new CDS identified via microarray and RT-PCR analysis</b>. (A) Transcriptional patterns of the new CDSs based on microarray analysis. Constitutive expression indicates that these CDSs are constitutively expressed at OD<sub>600 </sub>= 1.0, 1.6 and 2.0 during growth. DSF-regulated expression indicates that the expression levels of these CDSs at OD<sub>600 </sub>= 2.0 are significantly lower in &#916;rpfF, &#916;rpfC, or &#916;clp strains than those in wild type. NYG-induced expression indicates that this CDS is only transcribed at OD<sub>600 </sub>= 2.0 when grown in NYG medium. (B) RT-PCR analysis to verify the expression difference between the wild type and the <it>rpfF </it>deletion mutant &#916;rpfF. The 4 new CDSs supported by extrinsic evidence are indicated in bold.</p>
</text><graphic file="1471-2164-12-359-4" hint_layout="single"/></fig>
<p>To further validate our microarray-based method for selecting true CDSs, and because we are particularly interested in DSF signal-regulated CDSs, we chose the 15 DSF signal-regulated CDSs for further transcriptional analysis by reverse transcription PCR (RT-PCR). The products of 11 CDSs could be amplified using total RNA extracted from cell cultures at OD<sub>600 </sub>= 2.0 (Figure <figr fid="F4">4B</figr>). The resultant RT-PCR products were further verified by sequencing analysis (data not shown). RT-PCR analysis also verified the transcriptional difference of the 11 new CDSs between the wild type and the <it>rpfF </it>deletion mutant (Figure <figr fid="F4">4B</figr>).</p>
</sec>
<sec>
<st>
<p>Total new CDSs identified by similarity searching and transcriptional analysis</p>
</st>
<p>Extrinsic evidence supported the presence of 278 new CDSs and transcriptional analysis indicated 147 new CDSs with detectable transcripts. A comparison of the two sets revealed 119 overlapping CDSs that were identified by both approaches (Figure <figr fid="F5">5</figr>). Thus, a total of 306 (278+147-119) CDSs were supported by extrinsic evidence and/or experimental transcriptional analysis, suggesting that they are probably true CDSs. The remaining 2544 putative new CDSs were supported by neither extrinsic evidence nor transcriptional analysis (Figure <figr fid="F5">5</figr>). Two of the 119 overlapping CDSs, Xcc_CDS002 and Xcc_CDS1553, which both encode putative transcription factors, were chosen for further experimental characterization. The results are presented in the following sections.</p>
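<p>The count of 306 follows from inclusion-exclusion on the two evidence sets. The short sketch below reproduces that arithmetic and cross-checks it with explicit sets. Only the four counts (2850, 278, 147, 119) come from the text; the placeholder identifiers and the helper name <it>union_size</it> are hypothetical and do not correspond to real Xcc_CDS labels.</p>

```python
def union_size(n_a: int, n_b: int, n_both: int) -> int:
    """Size of the union of two sets by inclusion-exclusion."""
    return n_a + n_b - n_both


# Counts reported in the text.
N_PREDICTED = 2850    # putative new CDSs from the three gene finders
N_EXTRINSIC = 278     # supported by similarity searching
N_TRANSCRIBED = 147   # detectable transcripts (microarray / RT-PCR)
N_BOTH = 119          # supported by both lines of evidence

n_supported = union_size(N_EXTRINSIC, N_TRANSCRIBED, N_BOTH)
n_unsupported = N_PREDICTED - n_supported

# Cross-check with explicit sets of placeholder IDs (hypothetical labels).
extrinsic = {f"cds{i}" for i in range(N_EXTRINSIC)}
# Shift the second range so that exactly N_BOTH IDs are shared.
start = N_EXTRINSIC - N_BOTH
transcribed = {f"cds{i}" for i in range(start, start + N_TRANSCRIBED)}

assert len(extrinsic.intersection(transcribed)) == N_BOTH
assert len(extrinsic.union(transcribed)) == n_supported

print(n_supported, n_unsupported)  # prints: 306 2544
```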
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Total new CDS identified by similarity searching and transcriptional analysis</p></caption><text>
   <p><b>Total new CDS identified by similarity searching and transcriptional analysis</b>.</p>
</text><graphic file="1471-2164-12-359-5" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>
<it>Xcc</it>_CDS002 encodes a SIR2-like domain protein and is associated with virulence on Chinese cabbage</p>
</st>
<p>
<it>Xcc</it>_CDS002 is a new CDS of 855 bp in length. It encodes a protein with a conserved silent information regulator 2 (SIR-2) or SIR2-like domain (Figure <figr fid="F6">6A</figr>), which has been found to confer NAD-dependent protein deacetylase activity in eukaryotes <abbrgrp>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
</abbrgrp>. For convenience, <it>Xcc</it>_CDS002 was renamed <it>sir2x </it>for <ul>SIR2</ul>-like protein gene in <it>
<ul>X</ul>anthomonas campestris </it>in this study. The DNA sequence of <it>sir2x </it>was found in all three published <it>Xcc </it>genomes and in <it>Xcc </it>strain XC1 (Figure <figr fid="F6">6C</figr>). In the genome of <it>Xcc </it>strain 8004, <it>sir2x </it>is flanked by XC_4281 and XC_4282 (Figure <figr fid="F6">6A</figr>), which respectively encode a phage-related regulatory protein cII and a hypothetical protein. <it>Sir2x </it>and XC_4281 share the same transcriptional orientation and are separated by only one base pair (Figure <figr fid="F6">6B</figr>). Further RT-PCR analysis confirmed that <it>sir2x </it>and XC_4281 are transcribed as an operon (Figure <figr fid="F6">6C</figr>). To further study its role in <it>Xcc</it>, the coding region (33 to 280 aa) of <it>sir2x </it>was deleted in frame in the chromosome of <it>Xcc </it>strain XC1 and the resultant mutant was named &#916;sir2x. Deletion of <it>sir2x </it>did not affect the production of virulence factors, including extracellular protease, extracellular cellulase, and EPS (data not shown), but significantly reduced the virulence of <it>Xcc </it>strain XC1 on Chinese cabbage (Figure <figr fid="F6">6D</figr>). Complementation of the mutant with the <it>sir2x </it>coding region resulted in the complete recovery of virulence to wild-type level (Figure <figr fid="F6">6E</figr>).</p>
<fig id="F6"><title><p>Figure 6</p></title><caption><p>The new CDS <it>sir2x </it>is involved in virulence in <it>Xcc</it></p></caption><text>
   <p><b>The new CDS <it>sir2x </it>is involved in virulence in <it>Xcc</it></b>. (A) Domain organization of Sir2x as predicted by SMART (<url>http://smart.embl-heidelberg.de/</url>). (B) Genomic localization of <it>sir2x </it>and its flanking genes in the chromosome of <it>Xcc </it>strain 8004. (C) RT-PCR analysis of the XC_4281-<it>sir2x </it>operon. The absence of genomic DNA contamination was confirmed by standard PCR amplification using total RNA as template. (D) <it>In vitro </it>virulence assay on Chinese cabbage. &#916;sir2x (<it>sir2x</it>) indicates the <it>sir2x </it>deletion mutant complemented with <it>sir2x</it>.</p>
</text><graphic file="1471-2164-12-359-6" hint_layout="single"/></fig>
</sec>
<sec>
<st>
<p>
<it>Xcc</it>_CDS1553 is associated with arsenate resistance in <it>Xcc </it>strain 8004</p>
</st>
<p>
<it>Xcc</it>_CDS1553 encodes a 122-aa protein with a conserved HTH_ARSR domain (Figure <figr fid="F7">7A</figr>), which occurs in arsenical resistance operon repressors and similar prokaryotic, metal-regulated homodimeric repressors that belong to the ArsR superfamily of bacterial transcription-regulatory proteins <abbrgrp>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
</abbrgrp>. For convenience, this CDS was renamed <it>arsR</it>. Interestingly, <it>arsR </it>was only found in the genome of <it>Xcc </it>strain 8004, not in <it>Xcc </it>strains ATCC33913 and B100. In the <it>Xcc </it>8004 genome, <it>arsR </it>is located upstream of XC_2295 and XC_2294, which respectively encode a putative high-affinity Fe<sup>2+</sup>/Pb<sup>2+ </sup>permease and an arsenite efflux pump AcR3 (Figure <figr fid="F7">7B</figr>). <it>arsR </it>and XC_2295 are separated by 64 bp and the gap between XC_2294 and XC_2295 is 83 bp (Figure <figr fid="F7">7B</figr>; 20). Further RT-PCR analysis showed that <it>arsR</it>, XC_2295, and XC_2294 belong to the same operon (Figure <figr fid="F7">7C</figr>), suggesting that ArsR, XC_2294 and XC_2295 might be functionally related. To test this hypothesis, an <it>arsR </it>in-frame deletion mutant termed &#916;arsR was generated in <it>Xcc </it>strain 8004. The results showed that the &#916;arsR strain was much more sensitive to arsenate than the wild type strain (Figure <figr fid="F7">7D</figr>). On LB plates with 0.5 mM arsenate, the wild type strain <it>Xcc </it>8004 grew well, whereas the deletion mutant did not grow at all on this medium (Figure <figr fid="F7">7E</figr>). The mutant phenotype could be reverted by complementation with a plasmid carrying the coding region of <it>arsR</it>, demonstrating that the observed phenotype was due to <it>arsR</it>.</p>
<fig id="F7"><title><p>Figure 7</p></title><caption><p>The new CDS <it>arsR </it>is involved in arsenate resistance in <it>Xcc </it>strain 8004</p></caption><text>
   <p><b>The new CDS <it>arsR </it>is involved in arsenate resistance in <it>Xcc </it>strain 8004</b>. (A) Domain organization of <it>arsR </it>predicted by SMART (<url>http://smart.embl-heidelberg.de/</url>). (B) Genomic localization of ArsR and its flanking genes in the chromosome of <it>Xcc </it>strain 8004. (C) RT-PCR analysis of the <it>arsR</it>-XC_2295-XC_2294 operon. (D) Survival of <it>Xcc </it>strain 8004 and its derivatives in liquid NYG media with different concentrations of sodium arsenate. (E) Growth of <it>Xcc </it>strain 8004 and its derivatives on an NYG plate with 0.5 mM sodium arsenate. &#916;arsR (<it>arsR</it>) indicates the complemented deletion mutant defective in <it>arsR</it>.</p>
</text><graphic file="1471-2164-12-359-7" hint_layout="single"/></fig>
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>In this study, we used a combined strategy for CDS prediction. GLIMMER is a computational gene-finding system and the technical underpinning of the system is an interpolated Markov model (IMM), a generalization of Markov chain methods <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>. The GeneMark program is an <it>ab initio </it>gene finder, which employs inhomogeneous (three-periodic) Markov chain models describing protein-coding DNA and homogeneous Markov chain models describing non-coding DNA <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. ZCURVE is a system for recognizing protein-coding genes in bacterial genomes, which uses the "Z-transformation" of DNA as the information source for classification <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. The results showed that 99.7% of the CDSs (4168 of 4181) in the existing annotations of strain ATCC33913 and 99.5% of the CDSs (4254 of 4273) of strain 8004 could be predicted by the combined strategy (Figure <figr fid="F2">2A</figr>), suggesting that the combined gene finding strategy works well for finding currently annotated genes in <it>Xcc </it>genomes. In addition to the CDSs in the existing annotations of <it>Xcc </it>genomes, a total of 2850 putative new CDSs were identified in the two <it>Xcc </it>genomes by the combined gene prediction strategy. Among them, 306 reliable new CDSs were further confirmed by subsequent analysis based on extrinsic similarity and/or transcript detection, suggesting that the combined gene finding strategy could be used for finding new CDSs in bacterial genomes. Considering the number of putative CDSs predicted and the number confirmed by extrinsic evidence and/or microarray analysis (Figure <figr fid="F2">2B</figr>), GLIMMER seems more powerful than GeneMark and ZCURVE in new CDS prediction.</p>
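<p>As a rough illustration of the "Z-transformation" mentioned above, the sketch below computes the cumulative Z-curve coordinates of a DNA string using the standard definition, in which the three axes contrast purine with pyrimidine, amino with keto, and weak with strong hydrogen-bonding bases. It is not the ZCURVE gene finder itself, which builds its classification on features derived from such components; the function name <it>z_curve</it> and the example sequence are assumptions made for illustration.</p>

```python
from typing import List, Tuple

# Per-base increments (dx, dy, dz) of the Z-curve:
#   x: purine (A, G) versus pyrimidine (C, T)
#   y: amino (A, C) versus keto (G, T)
#   z: weak (A, T) versus strong (G, C) hydrogen bonding
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Return the cumulative (x, y, z) coordinates after each base of seq."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        if base not in STEP:
            raise ValueError(f"unexpected base: {base!r}")
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Arbitrary example sequence, not taken from an Xcc genome.
    for point in z_curve("ATGC"):
        print(point)
```

<p>For the four-base example the curve returns to the origin, because each base occurs exactly once and the increments cancel.</p>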
<p>Microarrays have traditionally been used to analyze the expression behavior of large numbers of annotated genes in bacteria. In this study, microarray analysis, applied together with CDS prediction, was used to find new genes, which were further validated by RT-PCR analysis. Compared to other transcript detection methods, microarray analysis is more sensitive and suitable for high-throughput analysis. So far, a similar strategy has only been reported for <it>Escherichia coli</it>. Selinger et al. <abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp> introduced a high-density oligonucleotide probe array for <it>E. coli </it>that not only carries strand-specific probes for all mRNA, tRNA, and rRNA regions, but also covers intergenic regions of &gt;40 bp. Using <it>E. coli </it>RNA from cells grown on different media, over 1100 transcripts corresponding to intergenic regions were identified. Further classification revealed 317 novel transcripts with unknown function <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>.</p>
<p>SIR2 proteins are found in organisms ranging from bacteria to humans <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. In eukaryotes, SIR2 proteins regulate transcriptional repression, recombination, the cell division cycle, microtubule organization, cellular responses to DNA-damaging agents and aging <abbrgrp>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
</abbrgrp>. A phylogenetically conserved NAD<sup>+</sup>-dependent protein deacetylase activity has been demonstrated in Sir2 family proteins in eukaryotes <abbrgrp>
<abbr bid="B34">34</abbr>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>. So far, very limited evidence is available regarding the function of SIR2 proteins in bacteria. The only reported case is from <it>Salmonella typhimurium</it>, where the gene <it>cobB </it>is involved in the biosynthesis of cobalamin and the catabolism of propionate <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>. Further analysis revealed that the recombinant SIR2 protein CobB had NAD-dependent ADP-ribosyltransferase activity <it>in vitro </it>
<abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. The demonstration that the ribosyltransferase and NAD<sup>+</sup>-dependent protein deacetylase activities are both dependent on an acetylated substrate confirms the fundamental link between the two activities <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>. The true enzymatic activity of Sir2x and how Sir2x is involved in the regulation of virulence in Chinese cabbage remain to be resolved. The involvement of <it>sir2x </it>in virulence of <it>Xcc </it>strain XC1 is in good agreement with the previous finding that transposon insertion in the promoter region of XC4281, which encodes a phage-related regulatory protein cII, led to a complete loss of virulence of <it>Xcc </it>strain 8004 on radish <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. As shown in Figure <figr fid="F6">6</figr>, XC4281 and the newly identified <it>sir2x </it>are within the same operon and share a common promoter. Transposon insertion in the promoter region probably disrupts the expression not only of XC4281 but also of <it>sir2x</it>. The precise role of Sir2x in <it>Xcc </it>virulence remains to be resolved.</p>
<p>Arsenic, a toxic metalloid, is currently and has always been ranked first on the Superfund List of Hazardous Substances (available on the World Wide Web), in part because of its environmental ubiquity. As a consequence, many bacterial species have genes that confer resistance to arsenic. Environmental arsenic is sensed by members of the ArsR/SmtB family of metalloregulatory transcriptional repressors <abbrgrp>
<abbr bid="B30">30</abbr>
<abbr bid="B39">39</abbr>
</abbrgrp>, which repress the expression of operons involved in the uptake, efflux, sequestration, or detoxification of metal ions <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp>. This study identified an ArsR family repressor and found that the XC2294-XC2295-<it>arsR </it>operon is involved in arsenate resistance in <it>Xcc </it>strain 8004. Since no ArsR homologs were found in <it>Xcc </it>strains ATCC33913, B100 and XC1, we propose that <it>arsR </it>may have been acquired by <it>Xcc </it>strain 8004 in a lateral gene transfer event.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>This study reports a thorough search for new CDSs in the two published <it>Xcc </it>genomes. First, putative CDSs encoded in the two genomes were re-predicted using three gene finders, resulting in the identification of 2850 putative new CDSs. Second, similarity searching was conducted and 278 CDSs were found to have homologs in other bacterial species. Third, oligonucleotide microarray and RT-PCR analysis identified 147 CDSs with detectable mRNA transcripts. Finally, in-frame deletion and subsequent phenotype analysis of two newly identified CDSs confirmed their functionality. Our results show that, despite the sophisticated approaches available for genome annotation, many cellular transcripts have remained unidentified so far in <it>Xcc </it>genomes. Through a combined strategy involving bioinformatic, postgenomic and genetic approaches as demonstrated in this study, a reliable list of 306 new CDSs was identified and a more thorough understanding of some cellular processes was gained.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Bacterial strains and growth conditions</p>
</st>
<p>
<it>Xcc </it>strains XC1 and 8004 were grown at 30&#176;C with shaking (250 rpm) in YEB, LB or NYG medium as described by He et al. <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. <it>E. coli </it>strains were grown at 37&#176;C in LB medium. Antibiotics were added at the following concentrations when required: kanamycin, 100 &#956;g/ml; rifampicin, 25 &#956;g/ml; and tetracycline, 10 &#956;g/ml.</p>
</sec>
<sec>
<st>
<p>Nucleotide sequence source, gene prediction and domain analysis</p>
</st>
<p>Complete genome records of the <it>Xcc </it>strains ATCC33913 and 8004 <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp> were downloaded from the NCBI Microbial genome database (<url>http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi?view=1</url>). Gene prediction was conducted by the gene finders GLIMMER 2.03 <abbrgrp>
<abbr bid="B8">8</abbr>
</abbrgrp>, GeneMark <abbrgrp>
<abbr bid="B21">21</abbr>
</abbrgrp> and ZCURVE <abbrgrp>
<abbr bid="B22">22</abbr>
</abbrgrp>. For the prediction, the minimum CDS length was set to 90 bp. BLASTN (<url>http://blast.ncbi.nlm.nih.gov/Blast.cgi</url>) was used to find the locations of all the putative new CDSs in the genomes of <it>Xcc </it>strains 8004 and ATCC33913. Multiple sequence alignment analysis was performed using CLUSTAL W (1.83) (<url>http://sbcr.bii.a-star.edu.sg/clustalw/</url>). Domain architecture analysis was performed using the SMART database application (<url>http://smart.embl-heidelberg.de/</url>). The nucleic acid sequences of the two regulators studied in detail, <it>sir2x </it>and <it>arsR</it>, have been deposited in the NCBI GenBank database under the accession numbers <ext-link ext-link-id="JF966390" ext-link-type="gen">JF966390</ext-link> and <ext-link ext-link-id="JF966391" ext-link-type="gen">JF966391</ext-link>.</p>
</sec>
<sec>
<st>
<p>Screening new CDS by extrinsic evidence</p>
</st>
<p>The amino acid sequences of all 2850 putative new CDSs were submitted for BLASTP analysis. Homologs in the nr database were selected on the basis of the following three criteria. Firstly, only subjects with E-values lower than 10<sup>-4 </sup>were considered hits. Secondly, the subjects should be similar in size to the queries. Thirdly, each query should have more than one matched subject unless the E-value is very low (less than 10<sup>-30</sup>).</p>
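<p>The three selection criteria can be expressed as a small filter. The Python sketch below is illustrative only and is not the script used in this study: the record layout, the function name and the 0.8 to 1.2 length-ratio window standing in for "similar size" are assumptions. Only the two E-value thresholds (10<sup>-4</sup> and 10<sup>-30</sup>) and the more-than-one-hit rule are taken from the criteria above.</p>
<p><![CDATA[
```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One BLASTP subject matched by a query (hypothetical record layout)."""
    subject_id: str
    evalue: float
    subject_len: int  # subject length in amino acids


# Thresholds stated in the text.
EVALUE_CUTOFF = 1e-4
STRONG_EVALUE = 1e-30
# Assumption: "similar size" is modelled as a subject/query length ratio
# inside this window; the text gives no numeric definition.
MIN_LEN_RATIO = 0.8
MAX_LEN_RATIO = 1.2


def has_reliable_homolog(query_len: int, hits: List[Hit]) -> bool:
    """Apply the three selection criteria to the hits of one query."""
    kept = [
        h for h in hits
        if h.evalue < EVALUE_CUTOFF
        and MIN_LEN_RATIO <= h.subject_len / query_len <= MAX_LEN_RATIO
    ]
    if len(kept) > 1:
        return True
    # A single matched subject is accepted only when its E-value is very low.
    return len(kept) == 1 and kept[0].evalue < STRONG_EVALUE
```
]]></p>
<p>Under these assumptions, a query with a single size-matched hit passes only if that hit has an E-value below 10<sup>-30</sup>; the length window would need to be tuned to whatever definition of size similarity is actually applied.</p>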
</sec>
<sec>
<st>
<p>Design and synthesis of CDS-specific oligonucleotides, and preparation of <it>Xcc </it>oligo microarray chip</p>
</st>
<p>Based on the annotated genome sequences of the <it>Xcc </it>strains ATCC33913 and 8004 <abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
</abbrgrp>, we used a CDS-specific oligonucleotide selection algorithm <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp> to successfully design unique 50-mer oligonucleotides for 1724 putative new CDSs. The majority of these CDSs were more than 300 bp in length. As specificity controls, 50-mer oligonucleotides were also designed based on the sinat5 (NCBI No.: AF480944) and nac1 (NCBI No.: AF198054) genes of <it>Arabidopsis thaliana</it>, the <it>rag1 </it>gene (NCBI No.: NM_131389) of zebrafish and the <it>olf1 </it>gene (NCBI No.: U56420) of <it>Homo sapiens </it>
<abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. Thus, a total of 5770 CDS-specific oligonucleotides representing 4042 annotated CDSs <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>, plus 1724 putative new CDSs, and 4 specificity controls were used for the oligonucleotide microarray chip preparation. Oligonucleotides were synthesized at a 50 nmol scale by Operon Technologies (Alameda, CA, USA). The protocol employed for constructing the oligo-chip has been previously described <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. Briefly, all oligos were dissolved in saline sodium citrate buffer (3 &#215; SSC) to a final concentration of 40 &#956;M. Oligo samples were arrayed with a Pixsys 5500XL Arrayer (Cartesian) onto poly-L-Lysine-coated microscope slides. DNA samples were fixed by rehydration, snap-drying and UV cross-linking. The remaining poly-L-Lysine on the slides was rendered non-reactive by treatment with blocking solution (150 mM succinic anhydride in 1-methyl-2-pyrrolidinone, buffered with 85 mM sodium borate, pH 8.0) for 30 min. After washing with water, the array plates were rinsed with 95% ethanol and dried.</p>
</sec>
<sec>
<st>
<p>Isolation of total RNA and microarray analysis</p>
</st>
<p>Bacterial cells were collected by centrifugation at 4&#176;C for 5 min at 10,000 rpm. Total RNA samples were prepared using RNeasy midi columns following the manufacturer's instructions (Qiagen). RNA integrity was confirmed by electrophoresis on a 1.3% formaldehyde agarose gel. The quality of DNA-free RNA was monitored by PCR and RT-PCR analysis of at least two known genes. Cy3- or Cy5-labeled cDNA was generated by using random hexamers as primers for reverse transcription (Invitrogen). cDNA labeling, purification and hybridization against the microarray were conducted as previously described <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. Slides were scanned for the fluorescent intensity using a ScanArray 5000 laser scanner. The signal intensities were quantified by using the software ImaGene 5 (BioDiscovery). Hybridization signals were normalized using the scale normalization procedure previously described <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. Each treatment was repeated three times and the data presented were the means of two representative replicates. The fold changes were then calculated from the normalized log ratios.</p>
</sec>
<sec>
<st>
<p>Screening new CDS by statistical analysis of microarray hybridization signal intensity</p>
</st>
<p>In this study, oligonucleotide microarray analysis was used to detect transcription, so as to confirm the functionality of the putative new CDSs. The putative CDSs with detectable transcripts were identified using the normalized signal median of the corresponding probe. To calculate the normalized signal median, firstly the average signal median S<sub>0 </sub>of 8 negative control probes representing 4 <it>Arabidopsis </it>and zebrafish genes <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp> was determined by using the following formula: S<sub>0 </sub>= &#8721;(S<sub>AZ</sub>-B<sub>AZ</sub>)/8, where S<sub>AZ </sub>indicates the signal median of a negative control probe and B<sub>AZ </sub>indicates the corresponding background signal median. Secondly, the normalized signal median (S) of each putative new CDS was calculated following the formula: S = S<sub>CDS </sub>- B<sub>CDS </sub>- S<sub>0</sub>, where S<sub>CDS </sub>indicates the signal median of the putative new CDS and B<sub>CDS </sub>indicates its background median. Finally, a putative CDS with S &gt; 0 was regarded as a CDS with a detectable transcript.</p>
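<p>The two formulas above translate directly into a few lines of code. The Python sketch below is a minimal illustration rather than the analysis software used in this study; the function names and the representation of each probe as a (signal median, background median) pair are assumptions made for the example.</p>
<p><![CDATA[
```python
from typing import Sequence, Tuple

# Each probe is represented as (signal median, background median);
# this pair layout is an assumption made for illustration.
Probe = Tuple[float, float]


def control_baseline(controls: Sequence[Probe]) -> float:
    """S0: mean background-subtracted signal of the negative control probes."""
    return sum(signal - background for signal, background in controls) / len(controls)


def normalized_signal(signal: float, background: float, s0: float) -> float:
    """S = S_CDS - B_CDS - S0 for one putative new CDS probe."""
    return signal - background - s0


def has_detectable_transcript(signal: float, background: float, s0: float) -> bool:
    """A putative CDS is called transcribed when its normalized signal S is above 0."""
    return normalized_signal(signal, background, s0) > 0
```
]]></p>
<p>With eight control probes passed to <it>control_baseline</it>, the division by the number of controls reproduces the factor of 8 in the formula for S<sub>0</sub>.</p>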
</sec>
<sec>
<st>
<p>Reverse transcription (RT) PCR analysis</p>
</st>
<p>RT-PCR analysis was conducted using a QIAGEN<sup>&#174;</sup>OneStep RT-PCR Kit following the manufacturer's instructions. The primers used for RT-PCR analysis are listed in Additional file <supplr sid="S1">1</supplr>. Total RNA was extracted from bacterial cultures grown in YEB medium to OD<sub>600 </sub>= 2.0, and 200 ng of total RNA was used for reverse transcription. The cycle number differed in the amplification of different CDS products.</p>
</sec>
<sec>
<st>
<p>Generation of in-frame deletion mutants and complementation analysis</p>
</st>
<p>Spontaneous rifampicin-resistant derivatives of strain XC1 or 8004 were used as parental strains for generation of deletion mutants. In-frame deletion of <it>Xcc</it>_CDS002 (<it>sir2x</it>) and <it>Xcc</it>_CDS1553 (<it>arsR</it>) was conducted using the primers listed in Additional file <supplr sid="S1">1</supplr> following the methods described previously <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. For complementation analysis, the coding regions of <it>sir2x </it>and <it>arsR </it>were each amplified by PCR using the primers listed in Additional file <supplr sid="S1">1</supplr> and cloned under the control of the <it>lac </it>promoter in the expression vector pLAFR3. The resultant constructs were transferred into <it>Xcc </it>strains through triparental mating.</p>
</sec>
<sec>
<st>
<p>Quantitative determination of extracellular enzyme activity, EPS production and virulence test</p>
</st>
<p>The extracellular cellulase and protease activities and EPS production in the culture supernatants of <it>Xcc </it>strains at OD<sub>600 </sub>= 2.3 were measured according to the methods described previously <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. The virulence of <it>Xcc </it>to Chinese cabbage was determined following the scissors-clipping method described previously <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>. Fifteen plants were inoculated for each bacterial strain and the experiment was repeated three times.</p>
</sec>
<sec>
<st>
<p>Arsenate resistance assay</p>
</st>
<p>Sodium arsenate (SIGMA) was added at the following final concentrations (mM): 0.10, 0.25, 0.50, 0.75 and 1.00. Fifty microliters of fresh culture of <it>Xcc </it>strain 8004 were inoculated into 5 ml of NYG liquid medium containing rifampicin (25 &#956;g/ml) and sodium arsenate at the different concentrations and grown overnight at 28&#176;C with shaking (250 rpm). Bacterial growth was monitored by measuring the optical density at 600 nm.</p>
</sec>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>LZ and FV carried out all the gene prediction and similarity searching. LZ conducted the microarray analysis and generated all the mutants. The study was conceived, designed, and coordinated by AP and YWH, who also drafted the manuscript. YQH, BLJ and JLT performed the virulence assays. YX was involved in discussion and draft preparation. All authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>This work was supported by the Research Foundation for Returned Scholars, Shanghai Jiao Tong University (WS3107208008 to YWH). We thank Mr. Jianli Wang at the Institute of Molecular and Cell Biology (IMCB), A*STAR, Singapore for mass BLAST analysis, and Prof. Lian-Hui Zhang and Prof. Byrappa Venkatesh at IMCB for technical support.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes</p></title><aug><au><snm>Bocs</snm><fnm>S</fnm></au><au><snm>Danchin</snm><fnm>A</fnm></au><au><snm>M&#233;digue</snm><fnm>C</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2002</pubdate><volume>3</volume><fpage>5</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-3-5</pubid><pubid idtype="pmcid">77393</pubid><pubid idtype="pmpid" link="fulltext">11879526</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Finding genes by computer: the state of the art</p></title><aug><au><snm>Fickett</snm><fnm>JW</fnm></au></aug><source>Trends in genetics</source><pubdate>1996</pubdate><volume>12</volume><fpage>316</fpage><lpage>320</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0168-9525(96)10038-X</pubid><pubid idtype="pmpid" link="fulltext">8783942</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p></title><aug><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Sch&#228;ffer</snm><fnm>AA</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1997</pubdate><volume>25</volume><issue>17</issue><fpage>3389</fpage><lpage>3402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/25.17.3389</pubid><pubid idtype="pmcid">146917</pubid><pubid idtype="pmpid" link="fulltext">9254694</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>GeneMark.hmm: new solutions for gene finding</p></title><aug><au><snm>Lukashin</snm><fnm>AV</fnm></au><au><snm>Borodovsky</snm><fnm>M</fnm></au></aug><source>Nucleic Acids 
Res</source><pubdate>1998</pubdate><volume>26</volume><issue>4</issue><fpage>1107</fpage><lpage>1115</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/26.4.1107</pubid><pubid idtype="pmcid">147337</pubid><pubid idtype="pmpid" link="fulltext">9461475</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions</p></title><aug><au><snm>Besemer</snm><fnm>J</fnm></au><au><snm>Lomsadze</snm><fnm>A</fnm></au><au><snm>Borodovsky</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2001</pubdate><volume>29</volume><issue>12</issue><fpage>2607</fpage><lpage>2618</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/29.12.2607</pubid><pubid idtype="pmcid">55746</pubid><pubid idtype="pmpid" link="fulltext">11410670</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses</p></title><aug><au><snm>Besemer</snm><fnm>J</fnm></au><au><snm>Borodovsky</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><issue>33 Web Server</issue><fpage>W451</fpage><lpage>454</lpage></bibl><bibl id="B7"><title><p>Microbial gene identification using interpolated Markov models</p></title><aug><au><snm>Salzberg</snm><fnm>SL</fnm></au><au><snm>Delcher</snm><fnm>AL</fnm></au><au><snm>Kasif</snm><fnm>S</fnm></au><au><snm>White</snm><fnm>O</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1998</pubdate><volume>26</volume><fpage>544</fpage><lpage>548</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/26.2.544</pubid><pubid idtype="pmcid">147303</pubid><pubid idtype="pmpid" link="fulltext">9421513</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>Improved microbial gene identification with 
GLIMMER</p></title><aug><au><snm>Delcher</snm><fnm>AL</fnm></au><au><snm>Harmon</snm><fnm>D</fnm></au><au><snm>Kasif</snm><fnm>S</fnm></au><au><snm>White</snm><fnm>O</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1999</pubdate><volume>27</volume><fpage>4636</fpage><lpage>4641</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/27.23.4636</pubid><pubid idtype="pmcid">148753</pubid><pubid idtype="pmpid" link="fulltext">10556321</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Identifying bacterial genes and endosymbiont DNA with Glimmer</p></title><aug><au><snm>Delcher</snm><fnm>AL</fnm></au><au><snm>Bratke</snm><fnm>KA</fnm></au><au><snm>Powers</snm><fnm>EC</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Bioinformatics</source><pubdate>2007</pubdate><volume>23</volume><issue>6</issue><fpage>673</fpage><lpage>679</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btm009</pubid><pubid idtype="pmcid">2387122</pubid><pubid idtype="pmpid" link="fulltext">17237039</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Re-annotation of the genome sequence of <it>Mycobacterium tuberculosis </it>H37Rv</p></title><aug><au><snm>Camus</snm><fnm>JC</fnm></au><au><snm>Pryor</snm><fnm>MJ</fnm></au><au><snm>M&#233;digue</snm><fnm>C</fnm></au><au><snm>Cole</snm><fnm>ST</fnm></au></aug><source>Microbiology</source><pubdate>2002</pubdate><volume>148</volume><fpage>2967</fpage><lpage>73</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">12368430</pubid></xrefbib></bibl><bibl id="B11"><title><p>A "polyORFomic" analysis of prokaryote genomes using disabled-homology filtering reveals conserved but undiscovered short ORFs</p></title><aug><au><snm>Harrison</snm><fnm>PM</fnm></au><au><snm>Carriero</snm><fnm>N</fnm></au><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Gerstein</snm><fnm>M</fnm></au></aug><source>J Mol 
Biol</source><pubdate>2003</pubdate><volume>333</volume><issue>5</issue><fpage>885</fpage><lpage>892</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jmb.2003.09.016</pubid><pubid idtype="pmpid" link="fulltext">14583187</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Large-scale prokaryotic gene prediction and comparison to genome annotation</p></title><aug><au><snm>Nielsen</snm><fnm>P</fnm></au><au><snm>Krogh</snm><fnm>A</fnm></au></aug><source>Bioinformatics</source><pubdate>2005</pubdate><volume>21</volume><issue>24</issue><fpage>4322</fpage><lpage>4329</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/bti701</pubid><pubid idtype="pmpid" link="fulltext">16249266</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Genome re-annotation: a wiki solution?</p></title><aug><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Genome Biol</source><pubdate>2007</pubdate><volume>8</volume><issue>1</issue><fpage>102</fpage><xrefbib><pubidlist><pubid idtype="pmcid">1839116</pubid><pubid idtype="pmpid" link="fulltext">17274839</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>The genome of <it>Xanthomonas campestris </it>pv. 
campestris B100 and its use for the reconstruction of metabolic pathways involved in xanthan biosynthesis</p></title><aug><au><snm>Vorh&#246;lter</snm><fnm>FJ</fnm></au><au><snm>Schneiker</snm><fnm>S</fnm></au><au><snm>Goesmann</snm><fnm>A</fnm></au><au><snm>Krause</snm><fnm>L</fnm></au><au><snm>Bekel</snm><fnm>T</fnm></au><au><snm>Kaiser</snm><fnm>O</fnm></au><au><snm>Linke</snm><fnm>B</fnm></au><au><snm>Patschkowski</snm><fnm>T</fnm></au><au><snm>R&#252;ckert</snm><fnm>C</fnm></au><au><snm>Schmid</snm><fnm>J</fnm></au><au><snm>Sidhu</snm><fnm>VK</fnm></au><au><snm>Sieber</snm><fnm>V</fnm></au><au><snm>Tauch</snm><fnm>A</fnm></au><au><snm>Watt</snm><fnm>SA</fnm></au><au><snm>Weisshaar</snm><fnm>B</fnm></au><au><snm>Becker</snm><fnm>A</fnm></au><au><snm>Niehaus</snm><fnm>K</fnm></au><au><snm>P&#252;hler</snm><fnm>A</fnm></au></aug><source>J Biotechnol</source><pubdate>2008</pubdate><volume>134</volume><issue>1-2</issue><fpage>33</fpage><lpage>45</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jbiotec.2007.12.013</pubid><pubid idtype="pmpid" link="fulltext">18304669</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>A genome-wide survey of short coding sequences in <it>streptococci</it></p></title><aug><au><snm>Ibrahim</snm><fnm>M</fnm></au><au><snm>Nicolas</snm><fnm>P</fnm></au><au><snm>Bessi&#232;res</snm><fnm>P</fnm></au><au><snm>Bolotin</snm><fnm>A</fnm></au><au><snm>Monnet</snm><fnm>V</fnm></au><au><snm>Gardan</snm><fnm>R</fnm></au></aug><source>Microbiology</source><pubdate>2007</pubdate><volume>153</volume><issue>11</issue><fpage>3631</fpage><lpage>3644</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1099/mic.0.2007/006205-0</pubid><pubid idtype="pmpid" link="fulltext">17975071</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>New genes in old sequence: a strategy for finding genes in the bacterial 
genome</p></title><aug><au><snm>Borodovsky</snm><fnm>M</fnm></au><au><snm>Koonin</snm><fnm>EV</fnm></au><au><snm>Rudd</snm><fnm>KE</fnm></au></aug><source>Trends in Biochemical Sciences</source><pubdate>1994</pubdate><volume>19</volume><issue>8</issue><fpage>309</fpage><lpage>313</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0968-0004(94)90067-1</pubid><pubid idtype="pmpid">7940673</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>The host range of the genus <it>Xanthomonas</it></p></title><aug><au><snm>Leyns</snm><fnm>F</fnm></au><au><snm>De Cleene</snm><fnm>M</fnm></au><au><snm>Swings</snm><fnm>J</fnm></au><au><snm>De Ley</snm><fnm>J</fnm></au></aug><source>Bot Rev</source><pubdate>1984</pubdate><volume>50</volume><fpage>308</fpage><lpage>355</lpage><xrefbib><pubid idtype="doi">10.1007/BF02862635</pubid></xrefbib></bibl><bibl id="B18"><title><p>Black rot: a continuing threat to world crucifers</p></title><aug><au><snm>Williams</snm><fnm>PH</fnm></au></aug><source>Plant Dis</source><pubdate>1980</pubdate><volume>64</volume><fpage>736</fpage><lpage>742</lpage><xrefbib><pubid idtype="doi">10.1094/PD-64-736</pubid></xrefbib></bibl><bibl id="B19"><title><p>Comparison of the genomes of two <it>Xanthomonas </it>pathogens with differing host specificities</p></title><aug><au><snm>da Silva</snm><fnm>AC</fnm></au><au><snm>Ferro</snm><fnm>JA</fnm></au><au><snm>Reinach</snm><fnm>FC</fnm></au><etal/></aug><source>Nature</source><pubdate>2002</pubdate><volume>417</volume><fpage>459</fpage><lpage>463</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/417459a</pubid><pubid idtype="pmpid" link="fulltext">12024217</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>Comparative and functional genomic analyses of the pathogenicity of phytopathogen <it>Xanthomonas campestris </it>pv. 
campestris</p></title><aug><au><snm>Qian</snm><fnm>W</fnm></au><au><snm>Jia</snm><fnm>Y</fnm></au><au><snm>Ren</snm><fnm>SX</fnm></au><etal/></aug><source>Genome Res</source><pubdate>2005</pubdate><volume>15</volume><fpage>757</fpage><lpage>767</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.3378705</pubid><pubid idtype="pmcid">1142466</pubid><pubid idtype="pmpid" link="fulltext">15899963</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>GeneMark: parallel gene recognition for both DNA strands</p></title><aug><au><snm>Borodovsky</snm><fnm>M</fnm></au><au><snm>McIninch</snm><fnm>J</fnm></au></aug><source>Computers Chemistry</source><pubdate>1993</pubdate><volume>17</volume><fpage>123</fpage><lpage>133</lpage></bibl><bibl id="B22"><title><p>ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes</p></title><aug><au><snm>Guo</snm><fnm>FB</fnm></au><au><snm>Ou</snm><fnm>HY</fnm></au><au><snm>Zhang</snm><fnm>CT</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><issue>6</issue><fpage>1780</fpage><lpage>1789</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg254</pubid><pubid idtype="pmcid">152858</pubid><pubid idtype="pmpid" link="fulltext">12626720</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>GISMO-gene identification using a support vector machine for ORF classification</p></title><aug><au><snm>Krause</snm><fnm>L</fnm></au><au><snm>McHardy</snm><fnm>AC</fnm></au><au><snm>Nattkemper</snm><fnm>TW</fnm></au><au><snm>P&#252;hler</snm><fnm>A</fnm></au><au><snm>Stoye</snm><fnm>J</fnm></au><au><snm>Meyer</snm><fnm>F</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2007</pubdate><volume>35</volume><issue>2</issue><fpage>540</fpage><lpage>549</lpage><xrefbib><pubidlist><pubid idtype="pmcid">1802617</pubid><pubid idtype="pmpid" link="fulltext">17175534</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>REGANOR: a gene prediction server 
for prokaryotic genomes and a database of high quality gene predictions for prokaryotes</p></title><aug><au><snm>Linke</snm><fnm>B</fnm></au><au><snm>McHardy</snm><fnm>AC</fnm></au><au><snm>Neuweger</snm><fnm>H</fnm></au><au><snm>Krause</snm><fnm>L</fnm></au><au><snm>Meyer</snm><fnm>F</fnm></au></aug><source>Appl Bioinformatics</source><pubdate>2006</pubdate><volume>5</volume><issue>3</issue><fpage>193</fpage><lpage>198</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2165/00822942-200605030-00008</pubid><pubid idtype="pmpid" link="fulltext">16922601</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Genome scale analysis of diffusible signal factor regulon in <it>Xanthomonas campestris </it>pv. campestris: identification of novel cell-cell communication-dependent genes and functions</p></title><aug><au><snm>He</snm><fnm>YW</fnm></au><au><snm>Xu</snm><fnm>M</fnm></au><au><snm>Lin</snm><fnm>K</fnm></au><au><snm>Ng</snm><fnm>YJ</fnm></au><au><snm>Wen</snm><fnm>CM</fnm></au><au><snm>Wang</snm><fnm>LH</fnm></au><au><snm>Liu</snm><fnm>ZD</fnm></au><au><snm>Zhang</snm><fnm>HB</fnm></au><au><snm>Dong</snm><fnm>YH</fnm></au><au><snm>Dow</snm><fnm>JM</fnm></au><au><snm>Zhang</snm><fnm>LH</fnm></au></aug><source>Mol Microbiol</source><pubdate>2006</pubdate><volume>59</volume><fpage>610</fpage><lpage>622</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2005.04961.x</pubid><pubid idtype="pmpid" link="fulltext">16390454</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Co-regulation of <it>Xanthomonas campestris </it>virulence by quorum sensing and a novel two-component regulatory system RavS/RavR</p></title><aug><au><snm>He</snm><fnm>YW</fnm></au><au><snm>Boon</snm><fnm>C</fnm></au><au><snm>Zhou</snm><fnm>L</fnm></au><au><snm>Zhang</snm><fnm>LH</fnm></au></aug><source>Mol Microbiol</source><pubdate>2009</pubdate><volume>71</volume><issue>6</issue><fpage>1464</fpage><lpage>1476</lpage><xrefbib><pubidlist><pubid 
idtype="doi">10.1111/j.1365-2958.2009.06617.x</pubid><pubid idtype="pmpid" link="fulltext">19220743</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p><it>Xanthomonas campestris </it>cell-cell communication involves a putative nucleotide receptor protein Clp and a hierarchical signalling network</p></title><aug><au><snm>He</snm><fnm>YW</fnm></au><au><snm>Ng</snm><fnm>AY</fnm></au><au><snm>Xu</snm><fnm>M</fnm></au><au><snm>Lin</snm><fnm>K</fnm></au><au><snm>Wang</snm><fnm>LH</fnm></au><au><snm>Dong</snm><fnm>YH</fnm></au><au><snm>Zhang</snm><fnm>LH</fnm></au></aug><source>Mol Microbiol</source><pubdate>2007</pubdate><volume>64</volume><fpage>281</fpage><lpage>292</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2007.05670.x</pubid><pubid idtype="pmpid" link="fulltext">17378922</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Phylogenetic Classification of Prokaryotic and Eukaryotic Sir2-like Proteins</p></title><aug><au><snm>Frye</snm><fnm>RA</fnm></au></aug><source>Biochemical and Biophysical Research Communications</source><pubdate>2000</pubdate><volume>273</volume><fpage>793</fpage><lpage>798</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/bbrc.2000.3000</pubid><pubid idtype="pmpid" link="fulltext">10873683</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Sirtuins: Sir2-related NAD-dependent protein deacetylases</p></title><aug><au><snm>North</snm><fnm>BJ</fnm></au><au><snm>Verdin</snm><fnm>E</fnm></au></aug><source>Genome Biology</source><pubdate>2004</pubdate><volume>5</volume><fpage>224</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2004-5-5-224</pubid><pubid idtype="pmcid">416462</pubid><pubid idtype="pmpid" link="fulltext">15128440</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>The SmtB/ArsR family of metalloregulatory transcriptional repressors: structural insights into prokaryotic metal 
resistance</p></title><aug><au><snm>Busenlehner</snm><fnm>LS</fnm></au><au><snm>Pennella</snm><fnm>MA</fnm></au><au><snm>Giedroc</snm><fnm>DP</fnm></au></aug><source>FEMS Microbiology Reviews</source><pubdate>2003</pubdate><volume>27</volume><fpage>131</fpage><lpage>143</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0168-6445(03)00054-8</pubid><pubid idtype="pmpid">12829264</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Mycobacterial cells have dual nickel-cobalt sensors: sequence relationships and metal sites of metal-responsive repressors are not congruent</p></title><aug><au><snm>Campbell</snm><fnm>DR</fnm></au><au><snm>Chapman</snm><fnm>KE</fnm></au><au><snm>Waldron</snm><fnm>KJ</fnm></au><au><snm>Tottey</snm><fnm>S</fnm></au><au><snm>Kendall</snm><fnm>S</fnm></au><au><snm>Cavallaro</snm><fnm>G</fnm></au><au><snm>Andreini</snm><fnm>C</fnm></au><au><snm>Hinds</snm><fnm>J</fnm></au><au><snm>Stoker</snm><fnm>NG</fnm></au><au><snm>Robinson</snm><fnm>NJ</fnm></au><au><snm>Cavet</snm><fnm>JS</fnm></au></aug><source>J Bio Chem</source><pubdate>2007</pubdate><volume>282</volume><issue>44</issue><fpage>32298</fpage><lpage>32310</lpage><xrefbib><pubid idtype="doi">10.1074/jbc.M703451200</pubid></xrefbib></bibl><bibl id="B32"><title><p>RNA expression analysis using a 30 base pair resolution <it>Escherichia coli </it>genome array</p></title><aug><au><snm>Selinger</snm><fnm>DW</fnm></au><au><snm>Cheung</snm><fnm>KJ</fnm></au><au><snm>Mei</snm><fnm>R</fnm></au><au><snm>Johansson</snm><fnm>EM</fnm></au><au><snm>Richmond</snm><fnm>CS</fnm></au><au><snm>Blattner</snm><fnm>FR</fnm></au><au><snm>Lockhart</snm><fnm>DJ</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2000</pubdate><volume>18</volume><fpage>1262</fpage><lpage>1268</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/82367</pubid><pubid idtype="pmpid" link="fulltext">11101804</pubid></pubidlist></xrefbib></bibl><bibl 
id="B33"><title><p>Transcriptome analysis of <it>Escherichia coli </it>using high-density oligonucleotide probe arrays</p></title><aug><au><snm>Tjaden</snm><fnm>B</fnm></au><au><snm>Saxena</snm><fnm>RM</fnm></au><au><snm>Stolyar</snm><fnm>S</fnm></au><au><snm>Haynor</snm><fnm>DR</fnm></au><au><snm>Kolker</snm><fnm>E</fnm></au><au><snm>Rosenow</snm><fnm>C</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2002</pubdate><volume>30</volume><fpage>3732</fpage><lpage>3738</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkf505</pubid><pubid idtype="pmcid">137427</pubid><pubid idtype="pmpid" link="fulltext">12202758</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Transcriptional silencing and longevity protein Sir2 is an NAD-dependent histone deacetylase</p></title><aug><au><snm>Imai</snm><fnm>S</fnm></au><au><snm>Armstrong</snm><fnm>CM</fnm></au><au><snm>Kaeberlein</snm><fnm>M</fnm></au><au><snm>Guarente</snm><fnm>L</fnm></au></aug><source>Nature</source><pubdate>2000</pubdate><volume>403</volume><fpage>795</fpage><lpage>800</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/35001622</pubid><pubid idtype="pmpid" link="fulltext">10693811</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>The silencing protein SIR2 and its homologs are NAD-dependent protein deacetylases</p></title><aug><au><snm>Landry</snm><fnm>J</fnm></au><au><snm>Sutton</snm><fnm>A</fnm></au><au><snm>Tafrov</snm><fnm>ST</fnm></au><au><snm>Heller</snm><fnm>RC</fnm></au><au><snm>Stebbins</snm><fnm>J</fnm></au><au><snm>Pillus</snm><fnm>L</fnm></au><au><snm>Sternglanz</snm><fnm>R</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2000</pubdate><volume>97</volume><fpage>5807</fpage><lpage>5811</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.110148297</pubid><pubid idtype="pmcid">18515</pubid><pubid idtype="pmpid" link="fulltext">10811920</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>A phylogenetically conserved 
NAD+-dependent protein deacetylase activity in the Sir2 protein family</p></title><aug><au><snm>Smith</snm><fnm>JS</fnm></au><au><snm>Brachmann</snm><fnm>CB</fnm></au><au><snm>Celic</snm><fnm>I</fnm></au><au><snm>Kenna</snm><fnm>MA</fnm></au><au><snm>Muhammad</snm><fnm>S</fnm></au><au><snm>Starai</snm><fnm>VJ</fnm></au><au><snm>Avalos</snm><fnm>JL</fnm></au><au><snm>Escalante-Semerena</snm><fnm>JC</fnm></au><au><snm>Grubmeyer</snm><fnm>C</fnm></au><au><snm>Wolberger</snm><fnm>C</fnm></au><au><snm>Boeke</snm><fnm>JD</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2000</pubdate><volume>97</volume><fpage>6658</fpage><lpage>6663</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.97.12.6658</pubid><pubid idtype="pmcid">18692</pubid><pubid idtype="pmpid" link="fulltext">10841563</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p><it>cobB </it>function is required for catabolism of propionate in <it>Salmonella typhimurium </it>LT2: evidence for existence of a substitute function for CobB within the 1,2-propanediol utilization (pdu) operon</p></title><aug><au><snm>Tsang</snm><fnm>AW</fnm></au><au><snm>Escalante-Semerena</snm><fnm>JC</fnm></au></aug><source>J Bacteriol</source><pubdate>1996</pubdate><volume>178</volume><fpage>7016</fpage><lpage>7019</lpage><xrefbib><pubidlist><pubid idtype="pmcid">178609</pubid><pubid idtype="pmpid" link="fulltext">8955330</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Characterization of five human cDNAs with homology to the yeast SIR2 gene: Sir2-like proteins (sirtuins) metabolize NAD and may have protein ADP-ribosyltransferase activity</p></title><aug><au><snm>Frye</snm><fnm>RA</fnm></au></aug><source>Biochem Biophys Res Commun</source><pubdate>1999</pubdate><volume>260</volume><fpage>273</fpage><lpage>279</lpage><xrefbib><pubid idtype="doi">10.1006/bbrc.1999.0897</pubid></xrefbib></bibl><bibl id="B39"><title><p>Metalloregulation of Soft Metal Resistance 
Pumps</p></title><aug><au><snm>Xu</snm><fnm>C</fnm></au><au><snm>Rosen</snm><fnm>BP</fnm></au></aug><source>Metals and Genetics</source><publisher>New York, Plenum Press</publisher><editor>Sarkar B</editor><pubdate>1999</pubdate><fpage>5</fpage><lpage>19</lpage></bibl><bibl id="B40"><title><p>Understanding how cells allocate metals using metal sensors and metallochaperones</p></title><aug><au><snm>Tottey</snm><fnm>S</fnm></au><au><snm>Harvie</snm><fnm>DR</fnm></au><au><snm>Robinson</snm><fnm>NJ</fnm></au></aug><source>Accounts of Chemical Research</source><pubdate>2005</pubdate><volume>38</volume><fpage>775</fpage><lpage>783</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/ar0300118</pubid><pubid idtype="pmpid" link="fulltext">16231873</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>Genome-wide cDNA oligo design and its applications in <it>Schizosaccharomyces pombe</it></p></title><aug><au><snm>Lin</snm><fnm>K</fnm></au><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Miller</snm><fnm>DL</fnm></au><au><snm>Wong</snm><fnm>L</fnm></au></aug><source>The Practical Bioinformatician</source><publisher>Singapore, World Scientific Publishing</publisher><editor>Wong L</editor><pubdate>2004</pubdate><fpage>347</fpage><lpage>358</lpage></bibl></refgrp>
</bm></art>