|
Figure 7.
The TF-specific operator motif identification workflow. 1) First, a particular TF family
was selected and 2) a prominent representative of that family was chosen. 3) The related
sequence was used to search the genome of a particular species for intra-species homologs.
This search was iterated until no new sequences were recovered. A high E-value cut-off
was employed to ensure the recovery of all homologs. The sequences were aligned and
an NJ-tree was generated. Both the alignment and the NJ-tree were used to determine
the family or sub-family boundaries. 4) The procedure was repeated to retrieve all
inter-species homologs and the general features of the intra-species homologs were
used to determine the sequences that were taken into consideration. Orthologous relations
between sequences were established on the basis of clustering in the NJ-tree and sufficient
bootstrap support (in green) for the clustering. In the case of Lp_0172 and Lp_0173,
the orthologous clusters are color-coded in brown and orange, respectively, and the
other TFs of L. plantarum are indicated in red. 5) The genomic context of the various orthologs was inspected
(legend bottom left) and, where clear differences existed, the orthologous groups
were sub-divided into different Groups of orthologous functional equivalents (GOOFEs),
as illustrated. Then, upstream regions of the conserved gene(s) in context were selected
and inspected for potential regulatory sequences (the selected regions are indicated
by colored triangles). The potential regulatory sequences were compared and those
that showed similar features were selected. In fact, only those sequences that showed
the highest conservation were selected to determine a specific operator motif. In
the case of Lp_0172 and Lp_0173, a 'CcpA-like' operator motif was found up to 3 times
in the upstream regions. The sequences that were selected to determine the Lp_0172-
and Lp_0173-specific operator motifs are displayed (Px indicates the relative position
of the selected sequence with respect to other similar sequences and relative to the
translation start). 6) The selected sequences were used to create a GOOFE-specific
operator motif. The specific motifs thus identified for the orthologous groups
containing Lp_0172 and Lp_0173 demonstrate that the division into GOOFEs was essential
to arrive at highly specific operator motifs. Although the motifs within both orthologous
groups are highly similar, they differ distinctly in one position depending on the
GOOFE. In the case of the TFs orthologous to Lp_0172, the motifs are strikingly different
at position +5, with a fully conserved guanine in the GOOFE containing Lp_0172 and
a fully conserved thymine in the other. In the case of the TFs orthologous to
Lp_0173, the motifs are strikingly different at position -5, with a fully conserved
thymine in the GOOFE containing Lp_0173 and a fully conserved adenine in the other.
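
The comparison of GOOFE-specific motifs described above can be illustrated with a minimal sketch. It assumes that the selected operator sequences of each GOOFE are already aligned and of equal length, builds a per-column base-frequency matrix for each group, and reports the columns in which both groups are fully conserved but for different bases. The function names and the example sequences are hypothetical and are not taken from the figure; column indices are 0-based and do not correspond to the motif positions (+5, -5) used in the figure.

```python
from collections import Counter


def position_frequencies(sites):
    """Return one {base: frequency} dict per column of the aligned sites."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to the same length")
    matrix = []
    for column in zip(*sites):
        counts = Counter(base.upper() for base in column)
        matrix.append({base: counts[base] / len(sites) for base in "ACGT"})
    return matrix


def fully_conserved(matrix):
    """Map column index -> base for columns where a single base has frequency 1."""
    return {
        index: base
        for index, column in enumerate(matrix)
        for base, freq in column.items()
        if freq == 1.0
    }


def discriminating_columns(sites_a, sites_b):
    """Columns fully conserved in both groups, but for different bases."""
    conserved_a = fully_conserved(position_frequencies(sites_a))
    conserved_b = fully_conserved(position_frequencies(sites_b))
    shared = sorted(conserved_a.keys() & conserved_b.keys())
    return {
        index: (conserved_a[index], conserved_b[index])
        for index in shared
        if conserved_a[index] != conserved_b[index]
    }


# Hypothetical toy sequences (invented for illustration, not from the figure).
goofe_a = ["TGTAAAGC", "TGAAAAGC", "TGTAAAGC"]
goofe_b = ["TGTAAATC", "TGAAAATC", "TGAAAATC"]

print(discriminating_columns(goofe_a, goofe_b))
```

In this toy input the two groups share every fully conserved column except one, which mirrors the situation described for the Lp_0172 and Lp_0173 orthologous groups: pooling both groups into one motif would blur exactly the column that distinguishes them.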
Remark: The gene/protein identifiers in the figure are derived from the ERGO resource [86].
A conversion to other identifiers can be found in [Additional file 2]. The functional annotations of the depicted genes were taken from the in-house annotation
database of L. plantarum WCFS1 ([38] and C. Francke, unpublished results) and the ERGO resource. See [Additional
file 9] for a detailed description of the functional annotation in L. plantarum.
Francke et al. BMC Genomics 2008 9:145 doi:10.1186/1471-2164-9-145 |