Open Access Highly Accessed Research article

Complete genome of Phenylobacterium zucineum – a novel facultative intracellular bacterium isolated from human erythroleukemia cell line K562

Yingfeng Luo1,2,3, Xiaoli Xu1, Zonghui Ding1, Zhen Liu1, Bing Zhang2,3, Zhiyu Yan1, Jie Sun1, Songnian Hu2,3* and Xun Hu1*

Author Affiliations

1 Cancer Institute (Key Laboratory for Cancer Intervention and Prevention, National Ministry of Education, PR China; Key Laboratory of Molecular Biology in Medical Sciences, Zhejiang Province, PR China), the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, PR China

2 James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou, PR China

3 Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, PR China

BMC Genomics 2008, 9:386  doi:10.1186/1471-2164-9-386


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/9/386


Received: 21 August 2007
Accepted: 13 August 2008
Published: 13 August 2008

© 2008 Luo et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Phenylobacterium zucineum is a recently identified facultative intracellular species isolated from the human leukemia cell line K562. Unlike the known intracellular pathogens, P. zucineum maintains a stable association with its host cell without affecting the growth and morphology of the latter.

Results

Here, we report the whole genome sequence of the type strain HLK1T. The genome consists of a circular chromosome (3,996,255 bp) and a circular plasmid (382,976 bp). It encodes 3,861 putative proteins, 42 tRNAs, and a 16S-23S-5S rRNA operon. Comparative genomic analysis revealed that it is phylogenetically closest to Caulobacter crescentus, a model species for cell cycle research. Notably, P. zucineum has a gene that is strikingly similar, both structurally and functionally, to the cell cycle master regulator CtrA of C. crescentus, and most of the genes directly regulated by CtrA in the latter have orthologs in the former.

Conclusion

This work presents the first complete bacterial genome in the genus Phenylobacterium. Comparative genomic analysis indicated that the CtrA regulon is well conserved between C. crescentus and P. zucineum.

Background

Phenylobacterium zucineum strain HLK1T is a facultative intracellular microbe that we recently identified [1]. It is a rod-shaped Gram-negative bacterium 0.3–0.5 × 0.5–2 μm in size. It belongs to the genus Phenylobacterium [2], which presently comprises 5 species: P. lituiforme (FaiI3T) [3], P. falsum (AC49T) [4], P. immobile (ET) [2], P. koreense (Slu-01T) [5], and P. zucineum (HLK1T) [1]. They were isolated from a subsurface aquifer, alkaline groundwater, soil, activated sludge from a wastewater treatment plant, and the human leukemia cell line K562, respectively. Except for P. zucineum, they are environmental bacteria, and there is no evidence that they are associated with eukaryotic cells. The HLK1T strain therefore represents the only species so far in the genus Phenylobacterium that can infect and survive in human cells. Since most, if not all, of the known microbes that can invade human cells are pathogenic, we proposed that HLK1T may have pathogenic relevance to humans [1]. Known intracellular pathogens undergo a cycle of invasion, overgrowth, and disruption of the host cells, and then repeat the cycle by invading new cells. In contrast, HLK1T is able to establish a stable parasitic association with its host: the strain does not overgrow intracellularly and kill the host, and the host cells pass the bacteria on to their progeny. One cell line (SW480) infected with P. zucineum has been stably maintained for nearly three years in our lab (data not shown).

In this report, we present the complete genome sequence of P. zucineum.

Results

Genome anatomy

The genome is composed of a circular chromosome (3,996,255 bp) and a circular plasmid (382,976 bp) (Figure 1; Table 1). The G + C contents of the chromosome and plasmid are 71.35% and 68.5%, respectively. There are 3,861 putative protein-coding genes (3,534 in the chromosome and 327 in the plasmid), of which 3,180 have significant matches in the non-redundant protein database. Of the matches, 585 are conserved hypothetical proteins and 2,595 are proteins with known or predicted functions. Forty-two tRNA genes and one 16S-23S-5S rRNA operon were identified in the chromosome.

Table 1. Genome summary of P. zucineum Strain HLK1T

Figure 1. Circular representation of the P. zucineum strain HLK1T chromosome and plasmid (smaller circle). Circles indicate (from the outside): (1) Physical map scaled in megabases from base 1, the start of the putative replication origin. (2) Coding sequences transcribed in the clockwise direction, color-coded according to COG functional category. (3) Coding sequences transcribed in the counterclockwise direction, color-coded according to COG functional category. (4) Proteins involved in establishment of the intracellular niche: TonB-dependent receptors (orange) and pilus genes (sienna). (5) Functional elements responsible for environmental transition: extracytoplasmic function sigma factors (royal blue), transcriptional regulators (violet red), two-component signal transduction proteins (deep sky blue), heat shock molecular chaperones (spring green), type IV secretion systems (plum), chemotaxis systems (green yellow), and flagellum proteins (gray). (6) G + C content (10-kb window with 1-kb incremental shift for the chromosome; 300-bp window with 150-bp incremental shift for the plasmid); values larger than the average (71.35% in the chromosome and 68.5% in the plasmid) are in red and smaller values in medium blue. (7) GC skew (10-kb window with 1-kb incremental shift for the chromosome; 300-bp window with 150-bp incremental shift for the plasmid); values greater than zero are in gold and smaller values in purple. (8) Repeat families; repeats 01–08 are in dark salmon, dark red, wheat, tomato, light green, salmon, dark blue, and gold, respectively.
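Tracks (6) and (7) are sliding-window statistics. As a minimal sketch, assuming the usual definitions (G + C percentage per window, and GC skew = (G - C)/(G + C)) and treating each replicon as circular so that the last windows wrap around to base 1, they could be computed as follows. The function name `sliding_gc` is hypothetical; this is not the software used to draw the figure.

```python
from typing import Iterator, Tuple


def sliding_gc(seq: str, window: int, step: int,
               circular: bool = True) -> Iterator[Tuple[int, float, float]]:
    """Yield (start, G+C percent, GC skew) for each window of `seq`.

    GC skew is (G - C) / (G + C), reported as 0.0 for windows without G or C.
    With circular=True, windows near the end wrap around to the start.
    """
    seq = seq.upper()
    n = len(seq)
    if window > n:
        raise ValueError("window must not exceed sequence length")
    if circular:
        ext = seq + seq[:window - 1]      # append the start to wrap across the origin
        starts = range(0, n, step)
    else:
        ext = seq
        starts = range(0, n - window + 1, step)
    for start in starts:
        win = ext[start:start + window]
        g, c = win.count("G"), win.count("C")
        gc_percent = 100.0 * (g + c) / window
        skew = (g - c) / (g + c) if g + c else 0.0
        yield start, gc_percent, skew


# Figure 1 settings: chromosome 10-kb windows with 1-kb shift,
# plasmid 300-bp windows with 150-bp shift, e.g.
# chrom_track = list(sliding_gc(chromosome_seq, 10_000, 1_000))
# plasmid_track = list(sliding_gc(plasmid_seq, 300, 150))
```

Values above or below the replicon-wide average G + C content (or above or below zero skew) would then be colored as the caption describes.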

There are 7 families of protein-coding repetitive sequences and a family of noncoding repeats in the genome (Table 2). Notably, identical copies of repeats 02–04 were found in both the chromosome and the plasmid, suggesting their potential involvement in homologous recombination.
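The repeats were found with REPuter plus manual alignment (see Methods). As a much simpler illustration of how identical copies shared by the chromosome and the plasmid can be flagged, the sketch below reports every plasmid k-mer that also occurs in the chromosome on either strand. The helper names are hypothetical, the approach is not REPuter's algorithm, and it ignores wrap-around across the origin; for a 4-Mb genome a suffix-tree or suffix-array method such as REPuter's is far more memory-efficient than storing every k-mer in a set.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_exact_kmers(chromosome: str, plasmid: str, k: int) -> Set[str]:
    """Plasmid k-mers that also occur in the chromosome on either strand."""
    chrom = chromosome.upper()
    chrom_kmers = kmers(chrom, k) | kmers(revcomp(chrom), k)
    return kmers(plasmid.upper(), k) & chrom_kmers
```

Overlapping shared k-mers would then be merged into maximal intervals and inspected, which is the step the manual alignment covers.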

Table 2. Repetitive elements in the P. zucineum genome

On the basis of COG (Cluster of Orthologous Groups) classification, the chromosome is enriched in genes for basic metabolism, such as categories E (amino acid transport and metabolism) and I (lipid transport and metabolism), accounting for 8.29% and 6.09% of the total genes in the chromosome, respectively. On the other hand, the plasmid is enriched for genes in categories O (posttranslational modification, protein turnover, chaperones) and T (signal transduction mechanisms), constituting 12.96% and 9.72% of the total genes in the plasmid, respectively.

Among the plasmid genes that cope with environmental stimuli, about half of the genes in category O (17/32) encode molecular chaperones, including 2 dnaJ-like molecular chaperones, 2 clusters of dnaK and its co-chaperone grpE (PHZ_p0053-0054 and PHZ_p0121-0122), a cluster of groEL and its co-chaperonin groES (PHZ_p0095-0096), and 9 Hsp20 heat shock proteins. Among the 23 genes in category T is one cluster, FixLJ (PHZ_p0187-0188); the FixLJ system is essential for the growth of C. crescentus under hypoxic conditions [6].

General metabolism

The enzyme sets of glycolysis and the Entner-Doudoroff pathway are complete in the genome. All genes of the pentose phosphate pathway except gluconate kinase were identified, consistent with our previous experimental finding that the strain cannot utilize gluconate [1]. The genome lacks two enzymes (kdh, alpha-ketoglutarate dehydrogenase; and kgd, alpha-ketoglutarate decarboxylase), so the oxidative and reductive branches of the tricarboxylic acid cycle would operate separately. The genome has all the genes for the synthesis of fatty acids, the 20 amino acids, and the corresponding tRNAs. Although full sets of genes for the de novo biosynthesis of purines and pyrimidines were identified, enzymes for the salvage pathways of purine (apt, adenine phosphoribosyltransferase; ade, adenine deaminase) and pyrimidine (cdd, cytidine deaminase; codA, cytosine deaminase; tdk, thymidine kinase; deoA, thymidine phosphorylase; upp, uracil phosphoribosyltransferase; udk, uridine kinase; and udp, uridine phosphorylase) were absent. The plasmid encodes some metabolic enzymes, such as those participating in glycolysis, the pentose phosphate pathway, and the citric acid cycle. Notably, the plasmid carries the only copy in the genome of the gene encoding 6-phosphogluconate dehydrogenase (PHZ_p0183).

Like most other species in the genus Phenylobacterium, the strain is able to use L-phenylalanine as a sole carbon source under aerobic conditions [1]. A recent study revealed that phenylalanine can be completely degraded through the homogentisate pathway in Pseudomonas putida U [7]. P. zucineum may use the same strategy to utilize phenylalanine, because all the enzymes for the conversion of phenylalanine through intermediate homogentisate to the final products fumarate and acetoacetate are present in the chromosome (Table 3).
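The argument here is a completeness check: every step from phenylalanine through homogentisate to fumarate and acetoacetate must have an annotated enzyme. A minimal sketch of that check follows. The gene symbols are those commonly used for this route (for example in P. putida) and are assumptions for illustration, not the identifiers listed in Table 3.

```python
from typing import Iterable, List, Tuple

# (substrate, product, gene symbol); the symbols are illustrative assumptions.
HOMOGENTISATE_ROUTE: List[Tuple[str, str, str]] = [
    ("L-phenylalanine", "L-tyrosine", "phhA"),                     # phenylalanine 4-hydroxylase
    ("L-tyrosine", "4-hydroxyphenylpyruvate", "tyrB"),             # aromatic aminotransferase
    ("4-hydroxyphenylpyruvate", "homogentisate", "hpd"),           # 4-hydroxyphenylpyruvate dioxygenase
    ("homogentisate", "4-maleylacetoacetate", "hmgA"),             # homogentisate 1,2-dioxygenase
    ("4-maleylacetoacetate", "4-fumarylacetoacetate", "maiA"),     # maleylacetoacetate isomerase
    ("4-fumarylacetoacetate", "fumarate + acetoacetate", "fahA"),  # fumarylacetoacetate hydrolase
]


def missing_steps(route: List[Tuple[str, str, str]],
                  annotated_genes: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return the route steps whose enzyme gene is absent from the annotation."""
    present = set(annotated_genes)
    return [step for step in route if step[2] not in present]


# An empty list means every step of the route has an annotated enzyme.
```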

Table 3. Phenylalanine-degrading enzymes in the P. zucineum genome

Functional elements responding to environmental transition

HLK1T is able to survive both intracellularly and extracellularly. Consistent with this, the genome contains the fundamental elements needed to support a life cycle in different environments. It encodes abundant two-component signal transduction proteins, transcriptional regulators, and heat shock response proteins, which would enable the strain to respond to extra- and intracellular stimuli at the transcriptional and post-translational levels. Among the 102 two-component signal transduction proteins (91 in the chromosome and 11 in the plasmid), there are 36 histidine kinases, 48 response regulators, and 18 hybrid proteins in which a histidine kinase is fused to a response regulator. Sixteen histidine kinase/response regulator pairs (1 in the plasmid) are adjacent in the genome and may act as functional operons; such tightly linked modules may allow the two-component systems to respond efficiently to environmental changes. The genome encodes 170 transcriptional regulators (16 in the plasmid) (Table 4). Notably, when we annotated the proteins of 93 bacteria (see Methods, Comparative genomics) with the same criteria used for P. zucineum, the number of two-component signal transduction proteins and transcriptional regulators per megabase was positively correlated with the capacity for environmental adaptation (Figure 2). The genome contains 17 extracytoplasmic function (ECF) sigma factors (3 in the plasmid) (Table 5). ECF sigma factors have been suggested to play a role in environmental adaptation in Pseudomonas putida KT2440, whose genome contains 19 ECFs [8]. P. zucineum has 3 heat shock sigma factors, RpoH (2 in the plasmid), and 33 heat shock molecular chaperones (17 in the plasmid) (Table 6), which can help cope with a variety of stresses, including cellular energy depletion, extreme concentrations of heavy metals, and various toxic substances [9].
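Figure 2 compares gene counts normalized by genome size. As a worked example, assuming the normalization is simply the gene count divided by the total replicon length in megabases, the 102 two-component proteins over the 4,379,231 bp of chromosome plus plasmid give about 23.29 genes/Mb, close to the 23.30 genes/Mb reported in Figure 2.

```python
CHROMOSOME_BP = 3_996_255
PLASMID_BP = 382_976


def genes_per_mb(gene_count: int, genome_bp: int) -> float:
    """Number of genes per megabase of sequence."""
    return gene_count / (genome_bp / 1_000_000)


total_bp = CHROMOSOME_BP + PLASMID_BP          # 4,379,231 bp
tcs_density = genes_per_mb(102, total_bp)      # two-component proteins
print(f"two-component proteins: {tcs_density:.2f} genes/Mb")
```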

Table 4. Transcriptional regulators in the P. zucineum genome

Table 5. Extracytoplasmic function (ECF) sigma factors in the P. zucineum genome

Table 6. Distribution of heat shock related proteins in P. zucineum and representative alphaproteobacteria with different living habitats

Figure 2. Comparative analysis of transcriptional regulators and two-component signal transduction proteins in 6 groups of bacteria classified according to their habitats. (A) Mean number of transcriptional regulators per megabase of genome. (B) Mean number of two-component signal transduction proteins per megabase of genome. The densities of transcriptional regulators and two-component signal transduction proteins in P. zucineum (solid black circles) are 41.56 genes/Mb and 23.30 genes/Mb, respectively. Error bars represent standard errors. O: obligate (26 species); S: specialized (5 species); AQ: aquatic (4 species); F: facultative (28 species); M: multiple (27 species); T: terrestrial (3 species).

The genes for cell motility include 3 chemotaxis operons, 7 MCP (methyl-accepting chemotaxis protein) genes, 15 other genes related to chemotaxis (Table 7), and 43 genes for the biogenesis of the flagellum (Table 8).

Table 7. Chemotaxis proteins in the P. zucineum genome

Table 8. Flagella genes in the P. zucineum genome

The genome encodes Sec-dependent and Sec-independent secretion pathways, as well as typical type II (Table 9) and type IV secretion systems (Table 10), which are known to play important roles in adapting to diverse conditions [10,11].

Table 9. Distributions of proteins involved in environmental adaptation in P. zucineum and representative alphaproteobacteria with different living habitats

Table 10. Type IV secretion systems in the P. zucineum genome

To better understand the roles of proteins responsible for environmental transition, we computed the distributions of these proteins in 5 representative alphaproteobacteria with typical habitats (see Methods, Comparative genomics). Like the multiple-habitat and facultative bacteria, which can survive in multiple niches, P. zucineum encodes a higher fraction of ECF sigma factors, transcriptional regulators, and two-component signal transduction proteins than obligate bacteria do (Table 9). Notably, P. zucineum has the largest number of heat shock related proteins (Table 6) compared with the 5 representative alphaproteobacteria and the 93 bacteria (data not shown). The plasmid-encoded heat shock related proteins include 2 RpoH (PHZ_p0049 and PHZ_p0288) and 2 DnaK-GrpE clusters (PHZ_p0053-0054 and PHZ_p0121-0122). Further phylogenetic analysis suggested that the plasmid-encoded DnaK-GrpE clusters may have undergone a genus-specific gene duplication event (Figure 3C and 3D).

Figure 3. Neighbor-joining trees of 5 representative alphaproteobacteria and P. zucineum, inferred from (A) 16S rRNA genes, (B) RpoH proteins, (C) DnaK proteins, and (D) GrpE proteins. The node labels are bootstrap values (100 replicates). The plasmid-encoded DnaK and GrpE of P. zucineum may have undergone a genus-specific gene duplication event (C & D).

Adaptation to an intracellular life cycle

To survive intracellularly, P. zucineum must succeed in adhering to and subsequently invading the host cell [12], defending against a hostile intracellular environment [13-16], and capturing iron at very low concentrations [17].

Pili are known to take part in adhesion to and invasion of host cells [12]. We identified one pilus biosynthesis gene (pilA) and 2 operons for pilus biosynthesis (Table 11).

Table 11. Pilus proteins in the P. zucineum genome

The genes involved in defense against oxidative stress include superoxide dismutase (PHZ_c0927, PHZ_c1092), catalase (PHZ_c2899), peroxiredoxin (PHZ_c1548), alkyl hydroperoxide reductase (ahpF, subunit F, PHZ_c2725; ahpC, subunit C, PHZ_c2724), and the glutathione redox cycle system (glutathione reductase [PHZ_c1740, PHZ_c1981], glutathione synthetase [PHZ_c3479], and γ-glutamylcysteine synthetase [PHZ_c0446, PHZ_c0523]).

Because intracellular free iron is insufficient to support bacterial growth, intracellular bacteria must use protein-bound iron, such as heme and transferrin, via transporters and/or siderophore systems. The P. zucineum genome has one ABC-type siderophore transport system (PHZ_c1893-1895), one ABC-type heme transport system (PHZ_c0136, PHZ_c0139, PHZ_c0140), and 60 TonB-dependent receptors that may take up iron-siderophore complexes (Table 12).

Table 12. TonB-dependent receptors in the P. zucineum genome

Comparative genomics between P. zucineum and C. crescentus

Comparative genomic analysis showed that, among sequenced bacteria, C. crescentus [18] is the closest relative of P. zucineum (Figure 4), consistent with the phylogenetic analysis based on 16S rRNA gene sequences (Figure 5).

Figure 4. The top 10 completely sequenced bacteria closest to P. zucineum. All 10 are alphaproteobacteria. Among all sequenced bacterial genomes, C. crescentus shares the greatest number of similar ORFs with P. zucineum.

Figure 5. Neighbor-joining tree of the alphaproteobacteria, inferred from 16S rRNA genes. The node labels are bootstrap values (100 replicates). C. crescentus is phylogenetically the closest to P. zucineum.

Although the genome size and protein number of P. zucineum (4.37 Mb, 3,861 proteins) are similar to those of C. crescentus (4.01 Mb, 3,767 proteins), no large-scale synteny was found between the genomes. The largest syntenic region spans only about 30 kb and encodes 24 proteins. The conserved region with the largest number of proteins is the operon encoding 27 ribosomal proteins. In addition, only 57.8% (2,231/3,861) of P. zucineum proteins have orthologs in C. crescentus. Categories J (translation, ribosomal structure and biogenesis), F (nucleotide transport and metabolism), and L (replication, recombination and repair) are the 3 most conserved COG categories between the species, sharing 88.01%, 81.67%, and 80.65% orthologous genes, respectively.

Comparison of cell cycle genes between P. zucineum and C. crescentus

Since P. zucineum is phylogenetically closest to C. crescentus, and since the latter is a model organism for studies of the prokaryotic cell cycle [19,20], we compared the genes regulating the cell cycle between these species.

The cell cycle of C. crescentus is controlled to a large extent by the master regulator CtrA, which controls the transcription of 95 genes involved in the cycle [19,20]. In turn, CtrA is itself regulated at the levels of transcription, phosphorylation, and proteolytic degradation by the products of its target genes: for example, the DNA methyltransferase CcrM regulates the transcription of ctrA, the histidine kinases CckA, PleC, DivJ, and DivL regulate its activity, and the ClpXP protease degrades it. These regulatory loops enable CtrA to precisely control the progression of the cell cycle.

P. zucineum has orthologs of most of the genes mentioned above (Table 13). Among the 95 CtrA-regulated genes in C. crescentus, 75 have orthologs in the P. zucineum genome (Additional file 1). The fraction of CtrA-regulated genes with orthologs in P. zucineum (76.9%, 73/95) is significantly greater than the genome-wide level (57.8%, 2,231/3,861), indicating that the CtrA regulatory system is highly conserved. With the exception of one response regulator (CC3286), the genes regulating central events of the cell cycle, such as CcrM (CC0378), the Clp protease (CC1963), and 14 regulatory proteins, are present in the P. zucineum genome. The C. crescentus genes without counterparts in P. zucineum mostly encode proteins of unknown function.
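The text does not say which test supports "significantly greater". One plausible choice, sketched here as an assumption, is a one-sided exact binomial test: take the genome-wide ortholog fraction (2,231/3,861) as the null probability and ask how likely it is to see at least 73 of 95 CtrA-regulated genes with orthologs (73/95 is the fraction quoted above). The helper name is hypothetical.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


background = 2231 / 3861                  # genome-wide ortholog fraction, about 0.578
p_value = binom_upper_tail(73, 95, background)
print(f"one-sided p = {p_value:.2e}")
```

A hypergeometric test, or a null fraction taken from the C. crescentus side (2,231/3,767), would be reasonable alternatives; the sketch only shows the shape of such a test.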

Additional file 1. Supplemental Table 1. Comparison of genes directly regulated by CtrA between P. zucineum and C. crescentus.

Format: XLS; Size: 30 KB

Table 13. Comparison of the signal transduction pathways regulating CtrA between P. zucineum and C. crescentus

Notably, CtrA is strikingly similar between P. zucineum and C. crescentus, with 93.07% amino acid identity and 89.88% nucleotide identity. In addition, the two ctrA genes share identical promoters (P1 and P2) [21] and the motif (GAnTC) recognized by the DNA methyltransferase CcrM (Figure 6) [22], suggesting that they probably share a similar CtrA regulatory loop.

Figure 6. Nucleotide sequence alignment of the ctrA promoter regions (-200 to +21) of C. crescentus and P. zucineum. Blue background: identical nucleotides; "-": gaps; red and black boxes: P1 and P2 promoters; black underline: motif recognized by CcrM; red underline: first 21 nucleotides starting with the start codon "ATG".
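The CcrM recognition site is GANTC, where N is any base. A minimal sketch for locating such sites in a promoter sequence follows; the function name and the example sequence are made up for illustration. Because GANTC is its own reverse complement, scanning one strand finds every site.

```python
import re
from typing import List

# CcrM methylates the adenine in 5'-GANTC-3' (N = any base). The zero-width
# lookahead lets finditer report overlapping matches as well.
GANTC = re.compile(r"(?=GA[ACGT]TC)")


def find_gantc(seq: str) -> List[int]:
    """Return 0-based start positions of GANTC sites in `seq`."""
    return [m.start() for m in GANTC.finditer(seq.upper())]


print(find_gantc("TTGAATCAAGACTC"))  # [2, 9]
```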

Consistent with the in silico sequence analysis, the CtrA of P. zucineum can restore the growth of the C. crescentus temperature-sensitive ctrA mutant strain LC2195 [23] at 37°C, indicating that P. zucineum CtrA can functionally complement that of C. crescentus under our experimental conditions (data not shown).

Taken together, the comparative genomics of P. zucineum and C. crescentus suggests that the cell cycle of the former is likely to be regulated similarly to that of the latter.

Presence of ESTs of the strain in human

Since P. zucineum strain HLK1T can invade and persistently live in several human cell lines [1], we asked whether this microbe can infect humans. By BLAST searching the human EST database (dbEST release 041307, 7,974,440 human ESTs) with the whole genome sequence of P. zucineum, we found 9 matching ESTs (Table 14), of which 3 were from a library constructed from tissue adjacent to a breast cancer, and 6 were from a library constructed from a cell line of lymphatic origin. These preliminary data suggest that P. zucineum may invade humans.

Table 14. Human ESTs matching the genome sequences of P. zucineum

Conclusion

This work presents the first complete bacterial genome in the genus Phenylobacterium. Genome analysis reveals the fundamental basis for this strain to invade and persistently survive in human cells. P. zucineum is phylogenetically closest to C. crescentus based on comparative genome analysis.

Methods

Bacterial growth and genomic library construction

P. zucineum strain HLK1T was grown in LB (Luria-Bertani) broth at 37°C and then harvested for the preparation of genomic DNA [1]. Genomic DNA was prepared using a bacterial genomic DNA purification kit (V-Gene Biotech., Hangzhou, China) according to the manufacturer's instructions. Sheared DNA was fractionated to construct three genomic libraries with average insert sizes of 2.0–2.5 kb, 2.5–3.0 kb, and 3.5–4.0 kb. The resulting pUC18-derived library plasmids were extracted by the alkaline lysis method and sequenced directly on automated capillary DNA sequencers (ABI3730 or MegaBACE1000).

Sequencing and finishing

The genome of P. zucineum was sequenced by the whole genome shotgun method, and the reads were processed and assembled with the phred/phrap/consed software packages [24-27]. Sequencing and subsequent gene identification were carried out as described in our earlier publications [28-30]. Briefly, during the shotgun sequencing phase, clones were picked randomly from the three shotgun libraries and sequenced from both ends. In total, 44,667 successful sequence reads (>100 bp at Phred value Q13), accounting for 5.47× sequence coverage of the genome, were assembled into 563 contigs belonging to 60 scaffolds connected by end-pairing information.
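The coverage figure can be sanity-checked with the standard relation c = N·L/G (reads × mean read length / genome length), and Lander-Waterman theory predicts that a fraction of about e^(-c) of bases stays uncovered under random sampling. The sketch below assumes G is the combined chromosome plus plasmid length; the mean read length is not stated in the text, so the value derived here (about 536 bp) is only what the quoted numbers imply. It also converts Phred thresholds into error probabilities (Q = -10 log10 P).

```python
import math

GENOME_BP = 3_996_255 + 382_976   # chromosome + plasmid = 4,379,231 bp
N_READS = 44_667
COVERAGE = 5.47

# c = N * L / G  =>  implied mean read length L = c * G / N
mean_read_len = COVERAGE * GENOME_BP / N_READS

# Lander-Waterman (Poisson) expectation for the fraction of bases never sampled
uncovered_fraction = math.exp(-COVERAGE)


def phred_to_error(q: float) -> float:
    """Base-call error probability for Phred quality q (Q = -10 log10 P)."""
    return 10 ** (-q / 10)


print(f"implied mean read length: {mean_read_len:.0f} bp")
print(f"expected uncovered: {uncovered_fraction:.2%} "
      f"(~{uncovered_fraction * GENOME_BP:,.0f} bp)")
print(f"Q13 -> {phred_to_error(13):.3f}, Q40 -> {phred_to_error(40):.0e}")
```

Real libraries are not perfectly random, so the idealized uncovered fraction is a lower bound on the gap problem rather than a prediction of the 563 contigs.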

The finishing phase involved iterative cycles of laboratory work and computational analysis. To reduce the number of scaffolds, reads were added to the initial contig assembly by resequencing, with universal primers, the clones whose initial reads had failed, and by using plasmid clones that extended outwards from the scaffolds as sequencing templates. To resolve low-quality regions, the first choice was to resequence the reads involved with universal primers or to primer-walk on the corresponding plasmid clones; otherwise, resequencing under alternative temperature conditions resolved the remaining low-quality regions. New reads obtained from this laboratory work were assembled into the existing contigs, yielding new contigs and new scaffolds connected by end-pairing information. The consed interface was then used to plan the next round of laboratory work based on the new contig assembly. After about four iterative cycles of this finishing procedure to close gaps and resolve low-quality regions, the last physical gap was closed by sequencing, from both ends, a PCR product amplified from total genomic DNA. In addition, the overall sequence quality of the genome was further improved using the following criteria: (1) at least two independent high-quality reads covering each base, and (2) a Phred quality value ≥ Q40 for each base. Collectively, 3,542 successful reads were incorporated into the initial assembly during the finishing phase. The final assembly consisted of two circular contigs: the smaller one, which carries a gene cluster related to plasmid replication (including repA, repB, parA, and parB), was assigned as the plasmid, and the larger one as the chromosome.

Annotation

tRNA genes were predicted with tRNAscan-SE [31]. Repetitive sequences were detected by REPuter [32,33], coupled with extensive manual alignment. We identified and annotated the protein profiles of the chromosome and plasmid with the same workflow. For the chromosome, the first set of potential CDSs was established with Glimmer 2.0, trained on a set of ORFs longer than 500 bp from the genomic sequence, at default settings [34]. The resulting 5,029 predicted CDSs were searched by BLAST against the NCBI non-redundant protein database to determine their homology [35]. A total of 1,174 annotated proteins without the word "hypothetical" or "unknown" in their function description, and without frameshifts or in-frame stop codons, were selected as the second training set. The resulting second set of 4,018 predicted CDSs (designated "predicted CDSs") was searched against the NCBI non-redundant protein database. Predicted CDSs meeting the following BLAST criteria were considered "true proteins": (1) at least 80% of the query sequence was aligned and (2) E-value ≤ 1e-10. Next, the ORFs extracted from the chromosomal regions between "true proteins" were searched against the NCBI non-redundant protein database, and those satisfying the same criteria were considered "true ORFs". Overlapping proteins were manually inspected and resolved according to the principle we described previously [30]. The final protein profile comprised three parts: true proteins, true ORFs, and predicted CDSs located in the rest of the genome. The translational start codon of each protein was identified by the widely used RBS script [36] and then refined by comparison with homologous proteins [30].
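The "true protein" filter can be expressed as a small predicate over BLAST tabular output. This sketch assumes BLAST+ run with `-outfmt "6 qseqid sseqid qlen qstart qend evalue"`, treats "80% of the query aligned" as at least 80% covered by a single reported HSP, and uses hypothetical function names; the original pipeline may have computed coverage differently (for example, across several HSPs).

```python
from typing import Iterable, Set


def is_true_protein(qlen: int, qstart: int, qend: int, evalue: float,
                    min_coverage: float = 0.80, max_evalue: float = 1e-10) -> bool:
    """Apply both criteria: >= 80% of the query aligned and E-value <= 1e-10."""
    aligned_fraction = (abs(qend - qstart) + 1) / qlen
    return aligned_fraction >= min_coverage and evalue <= max_evalue


def true_proteins(blast_lines: Iterable[str]) -> Set[str]:
    """Collect query IDs passing the filter from tab-separated BLAST+ output
    produced with: -outfmt "6 qseqid sseqid qlen qstart qend evalue"."""
    passed = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, _sseqid, qlen, qstart, qend, evalue = line.rstrip("\n").split("\t")[:6]
        if is_true_protein(int(qlen), int(qstart), int(qend), float(evalue)):
            passed.add(qseqid)
    return passed
```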

To further investigate the function of each protein, we used InterProScan to search against the InterPro protein family database [37]. The up-to-date KEGG pathway database was used for pathway analysis [38]. All proteins were searched against the COG database which included 66 completed genomes [39,40]. The final annotation was manually inspected by comprehensively integrating the results from searching against the databases of nr, COG, KEGG, and InterPro.

Phylogenetic tree construction

16S rRNA genes were retrieved from 63 alphaproteobacteria, P. zucineum and Escherichia coli O157:H7 EDL933. A neighbor-joining tree with bootstrapping was built using MEGA [41]. The gammaproteobacterium E. coli was used as the outgroup to root the tree. To illustrate the evolutionary history of heat shock related proteins (RpoH, DnaK and GrpE), neighbor-joining trees based on the 16S rRNA genes and the above three proteins of 5 representative alphaproteobacteria (Sinorhizobium meliloti 1021, Brucella suis 1330, C. crescentus CB15, Rickettsia conorii str. Malish 7, Gluconobacter oxydans 621H), P. zucineum and E. coli O157:H7 EDL933 were constructed.
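MEGA was used to build the trees. To show the criterion itself, below is a minimal from-scratch sketch of neighbor joining (the standard Q-criterion with the usual branch-length formulas) on a precomputed distance matrix. It omits bootstrapping, assumes taxon names contain no Newick special characters, and returns an unrooted tree, so rooting on E. coli as the outgroup would be a separate step. The function name and the toy matrix are hypothetical.

```python
from typing import Dict, List, Sequence


def neighbor_joining(names: Sequence[str], dist: Sequence[Sequence[float]]) -> str:
    """Return an unrooted Newick tree built by neighbor joining.

    `dist` is a symmetric matrix (zero diagonal) in the same order as `names`.
    """
    if len(names) < 3:
        raise ValueError("need at least 3 taxa")
    d: Dict[str, Dict[str, float]] = {
        a: {b: float(dist[i][j]) for j, b in enumerate(names)}
        for i, a in enumerate(names)
    }
    nodes: List[str] = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][k] for k in nodes) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda pair: (n - 2) * d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]],
        )
        # Branch lengths from the new internal node u to i and j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: resolve them as a star around one internal node.
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = (d[x][y] + d[y][z] - d[x][z]) / 2
    lz = (d[x][z] + d[y][z] - d[x][y]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Toy additive distances from the tree ((A:2,B:3):1,(C:4,D:5)).
names = ["A", "B", "C", "D"]
dist = [[0, 5, 7, 8],
        [5, 0, 8, 9],
        [7, 8, 0, 9],
        [8, 9, 9, 0]]
print(neighbor_joining(names, dist))  # (C:4,D:5,(A:2,B:3):1);
```

On additive distances like these, neighbor joining recovers the generating topology and branch lengths; on real 16S rRNA or protein distances it gives an approximation whose support is what the bootstrap values measure.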

Comparative genomics

Sequence data for comparative analyses were obtained from the NCBI database (ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/). The database contained 520 completely sequenced bacterial genomes (sequences downloaded on 2007/06/05). All P. zucineum ORFs were searched against the ORFs from all other bacterial genomes with BLASTP, and the number of P. zucineum ORFs with significant matches (E value ≤ 1e-10) in each genome was calculated.

To illustrate the contribution of transcriptional regulators and two-component signal transduction proteins to environmental adaptation, we compared the mean fraction of these two types of proteins in bacteria living in 6 different habitats, as described by Merav Parter [42]. These are: (1) obligate bacteria that are necessarily associated with a host, (2) specialized bacteria that live in specific environments, such as marine thermal vents, (3) aquatic bacteria that live in fresh or seawater, (4) facultative bacteria, free-living bacteria that are often associated with a host, (5) multiple bacteria that live in many different environments, and (6) terrestrial bacteria that live in the soil. For bacteria with more than one sequenced strain, we chose only one strain for the comparative study. The numbers of bacterial species in each group were: 26 obligate, 5 specialized, 4 aquatic, 28 facultative, 27 multiple, and 3 terrestrial. We annotated the proteins of these 93 species with the same workflow used for P. zucineum and calculated the mean fraction of transcriptional regulators and two-component signal transduction proteins.
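The group comparison in Figure 2 reduces to a per-species density followed by a mean and standard error per habitat. A minimal sketch, assuming density = gene count per megabase of total genome length and standard error = sample standard deviation / √n; the function name is hypothetical and any inputs would be placeholders, not the study's data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def habitat_summary(
    species: Iterable[Tuple[str, int, int]]
) -> Dict[str, Tuple[float, float]]:
    """Map habitat -> (mean genes/Mb, standard error) from (habitat, count, genome_bp)."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for habitat, count, genome_bp in species:
        by_group[habitat].append(count / (genome_bp / 1_000_000))
    summary = {}
    for habitat, densities in by_group.items():
        # A single-species group has no spread estimate, so SE is NaN.
        se = stdev(densities) / sqrt(len(densities)) if len(densities) > 1 else float("nan")
        summary[habitat] = (mean(densities), se)
    return summary
```

Groups as small as the terrestrial (3 species) and aquatic (4 species) sets will have wide standard errors, which is worth keeping in mind when reading Figure 2.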

In addition, we annotated the ORFs of 5 representative alphaproteobacteria with different habitats (multiple bacteria S. meliloti 1021 and G. oxydans 621H, facultative bacterium B. suis 1330, aquatic bacterium C. crescentus CB15, and obligate bacterium R. conorii str. Malish 7) using the same workflow and computed the distributions of proteins involved in environmental adaptation.

Ortholog identification

All proteins encoded by one genome were BLASTP searched against a database of proteins encoded by another genome [35], and vice versa. The threshold used in these comparisons was 1e-10. Orthology was identified if two proteins were each other's best BLASTP hit (best reciprocal match).
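Best reciprocal matching is straightforward once the two BLASTP searches have been parsed. In this sketch the hits are hypothetical (query, subject, E-value, bit score) tuples; "best hit" is taken to mean the highest bit score among hits passing the 1e-10 threshold, which is an assumption since the text does not say how hits were ranked, and ties go to the first hit seen.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float, float]  # (query, subject, evalue, bitscore)


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Best subject per query by bit score, among hits with E-value <= max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         max_evalue: float = 1e-10) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```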

Data accessibility

The sequences reported in this paper have been deposited in the GenBank database. The accession numbers for chromosome and plasmid are CP000747 and CP000748, respectively.

Abbreviations

EST: Expressed Sequence Tag; KEGG: Kyoto Encyclopedia of Genes and Genomes.

Authors' contributions

XH and SH designed the project; YL, XX, ZD, ZL, ZY and JS performed the research; SH and BZ contributed new reagents/analytical tools; YL, XX, and ZD analyzed the data; and XH, YL, and SH wrote the paper. All authors read and approved the final manuscript.

Acknowledgements

This work was supported in part by the Cheung Kong Scholars Programme (National Ministry of Education, China, and the Li Ka Shing Foundation, Hong Kong) to XH, a Natural Science Foundation of China grant (30672382) to XH, and a Zhejiang Natural Science Foundation, China, grant (R204204) to XH. We thank Dr. Lucy Shapiro (Department of Developmental Biology, Stanford University) for the gifts of the C. crescentus temperature sensitive strain LC2195 and the plasmid pSAL14. We are grateful to Dr. Iain Bruce (Department of Physiology, Zhejiang University School of Medicine) for English editing.

References

  1. Zhang K, Han W, Zhang R, Xu X, Pan Q, Hu X: Phenylobacterium zucineum sp. nov., a facultative intracellular bacterium isolated from a human erythroleukemia cell line K562.

    Syst Appl Microbiol 2007, 30(3):207-212.

  2. Lingens F, Blecher R, Blecher H, Blobel F, Eberspacher J, Frohner C, Gorisch H, Gorisch H, Layh G: Phenylobacterium immobile gen. nov., sp. nov., a gram-negative bacterium that degrades the herbicide chloridazon.

    Int J Syst Bacteriol 1985, 35:26-39.

  3. Kanso S, Patel BK: Phenylobacterium lituiforme sp. nov., a moderately thermophilic bacterium from a subsurface aquifer, and emended description of the genus Phenylobacterium.

    Int J Syst Evol Microbiol 2004, 54(Pt 6):2141-2146.

  4. Tiago I, Mendes V, Pires C, Morais PV, Veríssimo A: Phenylobacterium falsum sp. nov., an Alphaproteobacterium isolated from a nonsaline alkaline groundwater, and emended description of the genus Phenylobacterium.

    Syst Appl Microbiol 2005, 28(4):295-302.

  5. Aslam Z, Im WT, Ten LN, Lee ST: Phenylobacterium koreense sp. nov., isolated from South Korea.

    Int J Syst Evol Microbiol 2005, 55(Pt 5):2001-2005.

  6. Crosson S, McGrath PT, Stephens C, McAdams HH, Shapiro L: Conserved modular design of an oxygen sensory/signaling network with species-specific output.

    Proc Natl Acad Sci U S A 2005, 102(22):8018-8023.

  7. Arias-Barrau E, Olivera ER, Luengo JM, Fernandez C, Galan B, Garcia JL, Diaz E, Minambres B: The homogentisate pathway: a central catabolic pathway involved in the degradation of L-phenylalanine, L-tyrosine, and 3-hydroxyphenylacetate in Pseudomonas putida.

    J Bacteriol 2004, 186(15):5062-5077.

  8. Martinez-Bueno MA, Tobes R, Rey M, Ramos JL: Detection of multiple extracytoplasmic function (ECF) sigma factors in the genome of Pseudomonas putida KT2440 and their counterparts in Pseudomonas aeruginosa PA01.

    Environ Microbiol 2002, 4(12):842-855.

  9. Missiakas D, Raina S: The extracytoplasmic function sigma factors: role and regulation.

    Mol Microbiol 1998, 28(6):1059-1066.

  10. Pallen MJ, Chaudhuri RR, Henderson IR: Genomic analysis of secretion systems.

    Curr Opin Microbiol 2003, 6(5):519-527.

  11. Wickner W, Schekman R: Protein translocation across biological membranes.

    Science 2005, 310(5753):1452-1456.

  12. Pizarro-Cerda J, Cossart P: Bacterial adhesion and entry into host cells.

    Cell 2006, 124(4):715-727.

  13. Roop RM 2nd, Bellaire BH, Valderas MW, Cardelli JA: Adaptation of the Brucellae to their intracellular niche.

    Mol Microbiol 2004, 52(3):621-630.

  14. Miller RA, Britigan BE: Role of oxidants in microbial pathophysiology.

    Clin Microbiol Rev 1997, 10(1):1-18.

  15. Master SS, Springer B, Sander P, Boettger EC, Deretic V, Timmins GS: Oxidative stress response genes in Mycobacterium tuberculosis: role of ahpC in resistance to peroxynitrite and stage-specific survival in macrophages.

    Microbiology 2002, 148(Pt 10):3139-3144.

  16. Nathan C, Shiloh MU: Reactive oxygen and nitrogen intermediates in the relationship between mammalian hosts and microbial pathogens.

    Proc Natl Acad Sci U S A 2000, 97(16):8841-8848.

  17. Ratledge C, Dover LG: Iron metabolism in pathogenic bacteria.

    Annu Rev Microbiol 2000, 54:881-941.

  18. Nierman WC, Feldblyum TV, Laub MT, Paulsen IT, Nelson KE, Eisen JA, Heidelberg JF, Alley MR, Ohta N, Maddock JR, Potocka I, Nelson WC, Newton A, Stephens C, Phadke ND, Ely B, DeBoy RT, Dodson RJ, Durkin AS, Gwinn ML, Haft DH, Kolonay JF, Smit J, Craven MB, Khouri H, Shetty J, Berry K, Utterback T, Tran K, Wolf A, Vamathevan J, Ermolaeva M, White O, Salzberg SL, Venter JC, Shapiro L, Fraser CM: Complete genome sequence of Caulobacter crescentus.

    Proc Natl Acad Sci U S A 2001, 98(7):4136-4141.

  19. Laub MT, Chen SL, Shapiro L, McAdams HH: Genes directly controlled by CtrA, a master regulator of the Caulobacter cell cycle.

    Proc Natl Acad Sci U S A 2002, 99(7):4632-4637.

  20. Skerker JM, Laub MT: Cell-cycle progression and the generation of asymmetry in Caulobacter crescentus.

    Nat Rev Microbiol 2004, 2(4):325-337.

  21. Domian IJ, Reisenauer A, Shapiro L: Feedback control of a master bacterial cell-cycle regulator.

    Proc Natl Acad Sci U S A 1999, 96(12):6648-6653.

  22. Reisenauer A, Kahng LS, McCollum S, Shapiro L: Bacterial DNA methylation: a cell cycle regulator?

    J Bacteriol 1999, 181(17):5135-5139.

  23. Quon KC, Marczynski GT, Shapiro L: Cell cycle control by an essential bacterial two-component signal transduction protein.

    Cell 1996, 84(1):83-93.

  24. Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities.

    Genome Res 1998, 8(3):186-194.

  25. Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment.

    Genome Res 1998, 8(3):175-185.

  26. Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing.

    Genome Res 1998, 8(3):195-202.

  27. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, McKenney K, Sutton G, Fitzhugh W, Fields C, Gocyne JD, Scott J, Shirley R, Liu L, Glodek A, Kelley JM, Weidman JF, Phillips CA, Spriggs T, Hedblom E, Cotton MD, Utterback TR, Hanna MC, Nguyen DT, Saudek DM, Brandon RC, Fine LD, Fritchman JL, Fuhrmann JL, Geoghagen NSM, Gnehm CL, McDonald LA, Small KV, Fraser CM, Smith HO, Venter JC: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd.

    Science 1995, 269(5223):496-512.

  28. Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H: A draft sequence of the rice genome (Oryza sativa L. ssp. indica).

    Science 2002, 296(5565):79-92.

  29. Chen C, Tang J, Dong W, Wang C, Feng Y, Wang J, Zheng F, Pan X, Liu D, Li M, Song Y, Zhu X, Sun H, Feng T, Guo Z, Ju A, Ge J, Dong Y, Sun W, Jiang Y, Wang J, Yan J, Yang H, Wang X, Gao GF, Yang R, Wang J, Yu J: A glimpse of streptococcal toxic shock syndrome from comparative genomics of S. suis 2 Chinese isolates.

    PLoS ONE 2007, 2(3):e315.

  30. Bao Q, Tian Y, Li W, Xu Z, Xuan Z, Hu S, Dong W, Yang J, Chen Y, Xue Y, Xu Y, Lai X, Huang L, Dong X, Ma Y, Ling L, Tan H, Chen R, Wang J, Yu J, Yang H: A complete sequence of the T. tengcongensis genome.

    Genome Res 2002, 12(5):689-700.

  31. Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

    Nucleic Acids Res 1997, 25(5):955-964.

  32. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of repeat analysis on a genomic scale.

    Nucleic Acids Res 2001, 29(22):4633-4642.

  33. Kurtz S, Schleiermacher C: REPuter: fast computation of maximal repeats in complete genomes.

    Bioinformatics 1999, 15(5):426-427.

  34. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL: Improved microbial gene identification with GLIMMER.

    Nucleic Acids Res 1999, 27(23):4636-4641.

  35. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

    Nucleic Acids Res 1997, 25(17):3389-3402.

  36. Suzek BE, Ermolaeva MD, Schreiber M, Salzberg SL: A probabilistic method for identifying start codons in bacterial genomes.

    Bioinformatics 2001, 17(12):1123-1130.

  37. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM: The InterPro Database, 2003 brings increased coverage and new features.

    Nucleic Acids Res 2003, 31(1):315-318.

  38. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG.

    Nucleic Acids Res 2006, 34(Database issue):D354-7.

  39. Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families.

    Science 1997, 278(5338):631-637.

  40. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes.

    BMC Bioinformatics 2003, 4:41.

  41. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0.

    Mol Biol Evol 2007, 24(8):1596-1599.

  42. Parter M, Kashtan N, Alon U: Environmental variability and modularity of bacterial metabolic networks.

    BMC Evol Biol 2007, 7:169.

  43. Gupta A, Singh VK, Qazi GN, Kumar A: Gluconobacter oxydans: its biotechnological applications.

    J Mol Microbiol Biotechnol 2001, 3(3):445-456.

  44. Camargo AA, Samaia HP, Dias-Neto E, Simao DF, Migotto IA, Briones MR, Costa FF, Nagai MA, Verjovski-Almeida S, Zago MA, Andrade LE, Carrer H, El-Dorry HF, Espreafico EM, Habr-Gama A, Giannella-Neto D, Goldman GH, Gruber A, Hackel C, Kimura ET, Maciel RM, Marie SK, Martins EA, Nobrega MP, Paco-Larson ML, Pardini MI, Pereira GG, Pesquero JB, Rodrigues V, Rogatto SR, da Silva ID, Sogayar MC, Sonati MF, Tajara EH, Valentini SR, Alberto FL, Amaral ME, Aneas I, Arnaldi LA, de Assis AM, Bengtson MH, Bergamo NA, Bombonato V, de Camargo ME, Canevari RA, Carraro DM, Cerutti JM, Correa ML, Correa RF, Costa MC, Curcio C, Hokama PO, Ferreira AJ, Furuzawa GK, Gushiken T, Ho PL, Kimura E, Krieger JE, Leite LC, Majumder P, Marins M, Marques ER, Melo AS, Melo MB, Mestriner CA, Miracca EC, Miranda DC, Nascimento AL, Nobrega FG, Ojopi EP, Pandolfi JR, Pessoa LG, Prevedel AC, Rahal P, Rainho CA, Reis EM, Ribeiro ML, da Ros N, de Sa RG, Sales MM, Sant'anna SC, dos Santos ML, da Silva AM, da Silva NP, Silva WA Jr., da Silveira RA, Sousa JF, Stecconi D, Tsukumo F, Valente V, Soares F, Moreira ES, Nunes DN, Correa RG, Zalcberg H, Carvalho AF, Reis LF, Brentani RR, Simpson AJ, de Souza SJ: The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

    Proc Natl Acad Sci U S A 2001, 98(21):12103-12108.

  45. Dias Neto E, Correa RG, Verjovski-Almeida S, Briones MR, Nagai MA, da Silva W Jr., Zago MA, Bordin S, Costa FF, Goldman GH, Carvalho AF, Matsukuma A, Baia GS, Simpson DH, Brunstein A, de Oliveira PS, Bucher P, Jongeneel CV, O'Hare MJ, Soares F, Brentani RR, Reis LF, de Souza SJ, Simpson AJ: Shotgun sequencing of the human transcriptome with ORF expressed sequence tags.

    Proc Natl Acad Sci U S A 2000, 97(7):3491-3496.

  46. Gerhard DS, Wagner L, Feingold EA, Shenmen CM, Grouse LH, Schuler G, Klein SL, Old S, Rasooly R, Good P, Guyer M, Peck AM, Derge JG, Lipman D, Collins FS, Jang W, Sherry S, Feolo M, Misquitta L, Lee E, Rotmistrovsky K, Greenhut SF, Schaefer CF, Buetow K, Bonner TI, Haussler D, Kent J, Kiekhaus M, Furey T, Brent M, Prange C, Schreiber K, Shapiro N, Bhat NK, Hopkins RF, Hsie F, Driscoll T, Soares MB, Casavant TL, Scheetz TE, Brownstein MJ, Usdin TB, Toshiyuki S, Carninci P, Piao Y, Dudekula DB, Ko MS, Kawakami K, Suzuki Y, Sugano S, Gruber CE, Smith MR, Simmons B, Moore T, Waterman R, Johnson SL, Ruan Y, Wei CL, Mathavan S, Gunaratne PH, Wu J, Garcia AM, Hulyk SW, Fuh E, Yuan Y, Sneed A, Kowis C, Hodgson A, Muzny DM, McPherson J, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madari A, Young AC, Wetherby KD, Granite SJ, Kwong PN, Brinkley CP, Pearson RL, Bouffard GG, Blakesly RW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YS, Griffith M, Griffith OL, Krzywinski MI, Liao N, Morin R, Palmquist D, Petrescu AS, Skalska U, Smailus DE, Stott JM, Schnerch A, Schein JE, Jones SJ, Holt RA, Baross A, Marra MA, Clifton S, Makowski KA, Bosak S, Malek J: The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

    Genome Res 2004, 14(10B):2121-2127.