Open Access Research article

The transcriptional landscape of the deep-sea bacterium Photobacterium profundum in both a toxR mutant and its parental strain

Stefano Campanaro1*, Fabio De Pascale1, Andrea Telatin1, Riccardo Schiavon1, Douglas H Bartlett2 and Giorgio Valle1

Author Affiliations

1 Department of Biology and CRIBI Biotechnology Centre, University of Padua, Via Ugo Bassi 58/B, Padova, 35131, Italy

2 Scripps Institution of Oceanography, UCSD, 9500 Gilman Drive, La Jolla, CA, 92093, USA


BMC Genomics 2012, 13:567  doi:10.1186/1471-2164-13-567

Published: 29 October 2012

Additional files

Additional file 1:

Table S1. Gene coverage calculated from SOLiD sequences uniquely aligned on the P. profundum DB110 strain at 28 MPa. Results were clustered according to COG classes and considered separately for chr. 1 (table cells numbered "1" at line 2) and chr. 2 (table cells numbered "2" at line 2). Lines 2–8 report, respectively, the average of the coverage values calculated for genes belonging to each COG class, the minimum value, the first quartile, the median, the third quartile, the maximum value and the p-value (calculated using the Wilcoxon test). Data are represented graphically in Additional file 7: Figure S3.

Format: XLS Size: 38KB

Additional file 2:

Figure S1. Pie charts reporting the coverage of the P. profundum SS9 genes. (A-D) coverage of the chr. 1 genes, (E-H) coverage of the chr. 2 genes. Coverage was calculated at a single base level on the genome considering the uniquely aligned SOLiD reads and then converted to the mean coverage value for each gene.
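The conversion from single-base coverage to a mean coverage value per gene, as described in this legend, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `mean_gene_coverage`, the 1-based inclusive gene coordinates, the locus tags and the toy coverage values are all assumptions made for illustration.

```python
def mean_gene_coverage(per_base_coverage, genes):
    """Return {locus_tag: mean coverage} from a per-base coverage list.

    per_base_coverage[i] holds the coverage at genome position i + 1.
    genes maps a (hypothetical) locus tag to 1-based inclusive (start, end).
    """
    result = {}
    for tag, (start, end) in genes.items():
        # Slice out the bases spanned by the gene and average them.
        window = per_base_coverage[start - 1:end]
        result[tag] = sum(window) / len(window)
    return result


# Toy data: ten genome positions and two hypothetical genes.
coverage = [0, 0, 4, 4, 6, 6, 2, 0, 0, 10]
genes = {"geneA": (3, 6), "geneB": (7, 10)}
gene_cov = mean_gene_coverage(coverage, genes)
```

In this toy example the first gene spans four bases with coverage 4, 4, 6 and 6, and the second spans four bases with coverage 2, 0, 0 and 10; each gene is summarized by the arithmetic mean of those values.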

Format: TIFF Size: 2.7MB

Additional file 3:

Table S2. Number of operons identified in DB110 and TW30 strains at 28 and 0.1 MPa. Cells D4-G7 report the number of operons identified in the strains and at the hydrostatic pressures analyzed; cells I4-L7 report the number of genes belonging to the operons identified. Since chr. 1 and chr. 2 have different numbers of genes, we also report the percentages of operons identified (cells N4-Q7). Operon identification was restricted to genes having coverage higher than 2; for this reason, the percentage values refer only to genes having coverage higher than 2.

Format: XLS Size: 35KB

Additional file 4:

Table S3. Operons identified in chrs. 1 and 2 and in the plasmid of P. profundum. Each worksheet refers to a specific experimental condition and strain. On each worksheet, column (A) reports the number of genes belonging to each transcriptional unit and column (B) reports the number of transcripts composed of that number of genes. Columns (C) and following report the genes belonging to each transcriptional unit.

Format: XLS Size: 158KB

Additional file 5:

Figure S2. Number of genes transcribed in operons of different lengths in DB110 and TW30 strains at low and high pressure. The histogram shows that operons composed of a small number of genes are more abundant. This is a general trend in Bacteria. Data are reported for chr. 1 (azure-blue) and chr. 2 (red-orange) and the results are similar. Chr. 2 has on average a lower percentage of genes organized in operons, but in this graph the result is also due to the absolute gene number, which is lower on chr. 2 than on chr. 1.

Format: TIFF Size: 814KB

Additional file 6:

Table S4. Statistical analysis of the COG classes enriched in polycistronic transcripts. Operons identified in all four transcriptional analyses are considered in columns (C) (number of genes belonging to each COG class) and (D) (p-value), while those identified in only three, two or one experiment are reported in columns (G-H), (I-J) and (K-L), respectively. The graph reported below the table refers to the percentage of genes that are organized in operons in all the experiments considered (column E); results were calculated relative to the total number of genes per class (column C). The description of the COG classes is reported in column (M). Classes having p-values lower than 0.5% are reported in bold in the table and in red in the graph.

Format: XLS Size: 42KB

Additional file 7:

Figure S3. Size distribution of protein coding genes. Histogram showing the size distribution in amino acids (AA) of the protein coding genes identified in this experiment using RNA-seq data (B) (Additional file 6: Table S4) along with the protein coding genes identified in the P. profundum genome sequencing project (A) using bioinformatics. AA size is reported on the y axis and was limited to 2000, although a very small number of protein coding genes have higher values.

Format: TIFF Size: 17KB

Additional file 8:

Table S5. Main characteristics of the putative small ORFs identified from the RNA-seq experiment. Column (B): locus tag; column (C): localization on chr. 1 (1), chr. 2 (2) or plasmid; column (D): position; column (E): predicted protein size in AAs; column (F): strand; column (G): the ribosome binding site predicted using MotifScanner (see Materials and Methods) is reported in red and is separated from the start codon (also in red) by bases colored in black; column (H): RBS score calculated using MotifScanner software; column (I): BLASTp results obtained considering the complete microbial genomes database (NCBI); column (J): gene name based on BLASTp similarity search; column (K): gene description based on BLASTp search versus the complete microbial genomes and NR databases; column (L): software (if any) confirming the ORF identified using RNA-seq; columns (M-N-O): COG prediction performed using COGnitor software (http://www.ncbi.nlm.nih.gov/COG/old/xognitor.html).

Format: XLS Size: 140KB

Additional file 9:

Table S11. Number of differentially expressed genes identified in the four comparisons performed. Rows 2–14 report the data obtained for the comparison between the parental strain grown at low (0.1 MPa) and high (28 MPa) pressure (highlighted in blue, “DB110LP/DB110HP”). Genes expressed two or more times higher at low pressure and having a p-value lower than 0.0001 are reported in cell (B3); genes expressed two or more times higher at low pressure and having a p-value between 0.0001 and 0.001 are reported in cell (B9); genes expressed between 1.6 and two times higher at low pressure and having a p-value lower than 0.0001 are reported in cell (D3); genes expressed between 1.6 and two times higher at low pressure and having a p-value between 0.0001 and 0.001 are reported in cell (D9); genes expressed two or more times higher at high pressure and having a p-value lower than 0.0001 are reported in cell (G3); genes expressed two or more times higher at high pressure and having a p-value between 0.0001 and 0.001 are reported in cell (G9); genes expressed between 1.6 and two times higher at high pressure (28 MPa) and having a p-value lower than 0.0001 are reported in cell (I3); genes expressed between 1.6 and two times higher at high pressure and having a p-value between 0.0001 and 0.001 are reported in cell (I9). Rows 16–28 report the data for the comparison between the mutant strain (TW30) and the parental strain (DB110) at high pressure (28 MPa) (highlighted in violet, “TW30HP vs. DB110HP”). Rows 30–42 report the data for the comparison between the mutant strain (TW30) and the parental strain (DB110) at low pressure (0.1 MPa) (highlighted in pink, “TW30LP vs. DB110LP”). Rows 44–56 report the data for the comparison between the mutant strain (TW30) grown at low (0.1 MPa) and high (28 MPa) pressure (highlighted in azure, “TW30LP vs. TW30HP”).
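The binning by fold change and p-value described in this legend can be sketched in Python as follows. This is only an illustration: the function name `classify`, the sign convention (a positive log2 ratio meaning higher expression in the first condition of a comparison) and the handling of values falling exactly on a boundary are assumptions, not details taken from the table.

```python
def classify(log2_ratio, p_value):
    """Assign a gene to a (direction, fold bin, p-value bin) triple.

    Assumed convention: log2_ratio > 0 means higher expression in the
    first condition of the comparison. Returns None when the gene falls
    outside every bin (fold change below 1.6 or p-value of 0.001 or more).
    """
    fold = 2 ** abs(log2_ratio)
    if fold >= 2:
        fold_bin = ">=2"
    elif fold >= 1.6:
        fold_bin = "1.6-2"
    else:
        return None

    if p_value < 0.0001:
        p_bin = "p<0.0001"
    elif p_value < 0.001:
        p_bin = "0.0001<=p<0.001"
    else:
        return None

    direction = "first" if log2_ratio > 0 else "second"
    return direction, fold_bin, p_bin


# Hypothetical genes: a strong change, a moderate change, and two rejected ones.
strong = classify(1.0, 0.00001)
moderate = classify(-0.8, 0.0005)
too_small = classify(0.5, 0.00001)
not_significant = classify(2.0, 0.01)
```

Counting the genes that fall into each returned triple would give one count per fold-change and p-value combination, separately for each direction of change.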

Format: XLS Size: 36KB

Additional file 10:

Table S6. Gene expression analyses of the P. profundum SS9 genes. Column (A): starting from the origin of replication, genes are numbered according to their position on the genome and on the chromosomes; column (B): locus tag; column (C): Swiss-prot ID; column (D): gene name; column (E): gene description downloaded from NCBI database; column (F): gene description obtained from CMR (Comprehensive Microbial Resource) (http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi); columns (G-H): COG code and COG ID; columns (I, J, K, L): chromosome, strand, start and stop position of the genes; columns (M, N, O, P): number of reads mapped on genes in the four experiments (DB110 at 28 MPa, DB110 at 0.1 MPa, TW30 at 28 MPa and TW30 at 0.1 MPa). All values are normalized considering the total number of reads mapped on the reference sample (DB110 at 28 MPa); columns (Q, R, S, T): reads per kb per 1 million reads mapped (RPKM) values for the same four samples (DB110 at 28 MPa, DB110 at 0.1 MPa, TW30 at 28 MPa, TW30 at 0.1 MPa); columns (U, V): log2 ratios and p-values calculated using DEGseq software for the comparison between DB110 0.1 MPa and DB110 28 MPa; columns (W, X): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 28 MPa and DB110 28 MPa; columns (Y, Z): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 0.1 MPa and DB110 0.1 MPa; columns (AA, AB): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 0.1 MPa and TW30 28 MPa; columns (AC, AD): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 0.1 MPa and DB110 28 MPa; column (AE): gene ID of the orthologous V. cholerae genes belonging to the ToxR regulon; column (AF): gene name; column (AG): gene function; column (AH): gene description; column (AI): gene ID; column (AJ): log2 ratio of gene expression values determined in the V. cholerae toxRS mutant compared to the N16961 strain; column (AK): log2 ratio of gene expression values determined in the V. cholerae tcpPH mutant compared to the N16961 strain; column (AL): log2 ratio of gene expression values determined in the V. cholerae toxT mutant compared to the N16961 strain.
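The quantities named in this legend (read counts scaled to a reference sample, RPKM and log2 ratios) can be illustrated with a short Python sketch. The RPKM formula is the standard definition of reads per kilobase per million mapped reads. The linear scaling to the reference sample is an assumed reading of "normalized considering the total number of reads mapped on the reference sample". The plain log2 ratio shown here is not a reproduction of the DEGseq calculation, and all function names and numbers are hypothetical.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def scale_to_reference(read_count, sample_total, reference_total):
    """Rescale a read count as if the sample had the reference library size.

    Assumption: a simple proportional scaling by total mapped reads.
    """
    return read_count * reference_total / sample_total


def log2_ratio(value_a, value_b):
    """log2 of the expression ratio between two conditions (both values > 0)."""
    return math.log2(value_a / value_b)


# Hypothetical gene: 500 reads on a 1000 bp gene in a 5-million-read library.
example_rpkm = rpkm(500, 1000, 5_000_000)
# Hypothetical sample with 4 million reads scaled to a 5-million-read reference.
example_scaled = scale_to_reference(200, 4_000_000, 5_000_000)
example_ratio = log2_ratio(400, 100)
```

With these toy numbers, the gene has an RPKM of 100, the 200 reads are rescaled to 250, and a fourfold difference gives a log2 ratio of 2.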

Format: XLS Size: 4.1MB

Additional file 11:

Table S7. Analysis of the transcription start sites and corrections introduced. Column (A): locus tag; column (B): position of the start codon identified in our analysis; column (C): position of the TSS identified in the previous gene finding performed using Glimmer and Orpheus software; column (D): modification introduced; column (E): strand; columns (F-G): sequence and score of the ribosome binding site identified using MotifScanner software (the RBS site and the start codon are labeled in red); column (H): gene description; column (I): gene finding software (if any) corroborating the new start codon identified; column (J): description of the alignment performed using BLAST (alignments that agree with our start codon prediction are reported); column (K): agreement with the structure of the transcript identified using RNA-seq (genes not confirmed using this method were modified considering only bioinformatics: RBS position and BLAST with other bacteria); column (M): sequence determined considering the new gene prediction.

Format: XLS Size: 110KB

Additional file 12:

Table S8. sRNAs identified using RNA-seq, classified by their position with respect to protein coding genes. Column (A): locus tag, arbitrarily assigned by taking the locus tag of the overlapped gene or of the closest gene and adding “a”, “b” or “c”; column (B): chromosome; columns (C-D): transcript start and transcript end determined considering the Rfam database and/or RNA-seq data; column (E): strand; column (F): gene name; column (G): comment; column (H): genes localized in repeated regions are labeled; column (I): protein-coding genes overlapping the sRNA; column (J): sRNAs overlapping ribosome binding sites of protein-coding genes are reported.

Format: XLS Size: 130KB

Additional file 13:

Figure S4. Minimum free energy [kcal/mol] determined using RNAfold software (Vienna package). Data refer to the 5′-UTR regions (A) and to different classes of sRNAs: intergenic (B), partially overlapping (C) and completely overlapping (D) the ORFs. Grey points correspond to random sequences having base composition equivalent to that of the RNAs reported in the same analysis. Blue and red lines represent lowess interpolation of random sequences and putative small RNAs.

Format: TIFF Size: 1001KB

Additional file 14:

Table S9. Analyses of the 5′- and 3′-UTR lengths for genes belonging to different COG classes. We report the minimum (columns E, O), the first quartile (columns F, P), the median (columns G, Q), the mean (columns H, R), the third quartile (columns I, S) and the maximum length (columns J, T) of UTRs for genes classified by COG class (columns D, N). Genes having significantly longer (black bold) or shorter (red bold) UTRs are highlighted. We considered only classes having a p-value lower than 5% (Wilcoxon test). Results are shown graphically in Figure 5.

Format: XLS Size: 37KB

Additional file 15:

Table S10. Length of the 5′- and 3′-UTRs determined using RNA-seq. Column (A): reports whether the 5′-UTR or the 3′-UTR length was determined for a specific transcript; columns (B, C): start/end determined for each UTR region; column (D): strand of the gene (or operon) analyzed; columns (E-L): gene(s) belonging to the transcript; columns (M-T): COG categories of the genes belonging to the transcript; column (U): UTR length.

Format: XLS Size: 775KB

Additional file 16:

Figure S5. Histogram reporting 1153 distance values between the transcription termination sites determined using RNA-seq and the closest rho-independent terminator. Positive values indicate that the termination site determined with RNA-seq is upstream of the predicted rho-independent terminator. The size of the 3′-UTRs is slightly underestimated in RNA-seq: the distribution is centered on positive values (28 bp), and the size of the 3′-UTRs determined considering the terminator is generally larger than that determined with RNA-seq.

Format: TIFF Size: 22KB

Additional file 17:

Figure S6. Venn diagrams showing the number of differentially expressed genes identified in the comparisons. Since the number of possible intersections between subgroups is very high, data are subdivided into four diagrams to obtain a better visualization. In (A), the blue oval (genes down-regulated in the DB110 LP vs. DB110 HP comparison) and the yellow oval (genes down-regulated in the TW30 LP vs. TW30 HP comparison) have only limited overlap, which indicates that the genes differentially expressed in the parental and mutant strains at different pressures coincide only partially. The same is true for genes up-regulated in the “DB110 LP vs. DB110 HP” and “TW30 LP vs. TW30 HP” comparisons, reported in (B), blue and yellow ovals. In (A) the arrow indicates the genes that have an expression similar to ompH; this number is higher than the number of genes having an “ompL-like” behaviour (reported in D).

Format: TIFF Size: 1MB

Additional file 18:

Table S13. Analysis of the COG classes enriched in differentially expressed genes, performed using the hypergeometric distribution. In the upper part of the table (rows 1–22) the analysis was performed considering only the most significant differentially expressed genes (p-value <= 0.0001; log2 ratio >= 1 or log2 ratio <= −1); in the lower part of the table (rows 24–45) the analysis was performed considering all differentially expressed genes (p-value <= 0.001; log2 ratio >= 0.7 or log2 ratio <= −0.7). Column (A): COG class; column (B): total number of genes belonging to each COG class in chrs. 1–2 and in the plasmid; columns (C, F, H, J): number of differentially expressed genes identified in each comparison; columns (D, G, I, K): p-values calculated for each comparison; column (E): total number of genes belonging to each COG class in chr. 1 and in chr. 2; column (L): COG description.
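An enrichment test based on the hypergeometric distribution, of the kind named in this legend, can be sketched in Python as follows. The one-tailed form P(X >= k), the function name `hypergeom_enrichment_p` and the toy numbers are assumptions; the legend does not state which tail or software was used.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, class_genes, de_genes, de_in_class):
    """Upper-tail hypergeometric p-value P(X >= de_in_class).

    total_genes: genes in the genome (population size)
    class_genes: genes belonging to the COG class
    de_genes:    differentially expressed genes (sample size)
    de_in_class: differentially expressed genes that belong to the class
    """
    upper = min(class_genes, de_genes)
    favourable = sum(
        comb(class_genes, i) * comb(total_genes - class_genes, de_genes - i)
        for i in range(de_in_class, upper + 1)
    )
    return favourable / comb(total_genes, de_genes)


# Toy example: 10 genes, 4 in the class, 3 differentially expressed, 2 of them in the class.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

For the toy numbers, the favourable outcomes are C(4,2)·C(6,1) + C(4,3)·C(6,0) = 40 out of C(10,3) = 120, so the p-value is 1/3.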

Format: XLS Size: 34KB

Additional file 19:

Table S12. Gene Ontology analysis performed considering differentially expressed genes. Analyses were performed using GoMiner software and considering only the most statistically significant data (p-value <= 0.0001; log2 ratio >= 1 or log2 ratio <= −1). Genes up- and down-regulated in each comparison are considered separately. Column (A) reports the description of the GO classes identified. For each comparison, the table reports the total number of genes belonging to each GO biological process class (columns B, H, N, T, Z, AF, AL, AR), the number of genes up- or down-regulated (columns C, I, O, U, AA, AG, AM, AS), the enrichment value calculated by the software (columns D, J, P, V, AB, AH, AN, AT) and the p-value (columns E, K, Q, W, AC, AI, AO, AU).

Format: XLS Size: 121KB

Additional file 20:

Table S14. Putative ToxR regulated genes. Column (A): locus tag; column (B): Swiss-prot ID; column (C): gene name; column (D): gene description downloaded from NCBI database; column (E): gene description obtained from CMR (Comprehensive Microbial Resource) (http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi); columns (F-G): COG code and COG ID; columns (H, I, J, K): chromosome, strand, start and stop position of the genes; columns (L, M, N, O): number of reads mapped on each gene in the four experiments (DB110 at 28 MPa, DB110 at 0.1 MPa, TW30 at 28 MPa and TW30 at 0.1 MPa). All values are normalized considering the total number of reads mapped on the reference sample (DB110 at 28 MPa); columns (P, Q): log2 ratios and p-values calculated using DEGseq software for the comparison between DB110 at 0.1 MPa and DB110 at 28 MPa; columns (R, S): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 at 28 MPa and DB110 at 28 MPa; columns (T, U): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 at 0.1 MPa and DB110 at 0.1 MPa; columns (V, W): log2 ratios and p-values calculated using DEGseq software for the comparison between TW30 at 0.1 MPa and TW30 at 28 MPa; column (X): distance between transcription profile of each gene and that of ompH, calculated considering the Pearson correlation.
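A distance between two transcription profiles based on the Pearson correlation, as mentioned for column (X), can be sketched in Python as follows. Defining the distance as 1 − r is an assumption (the legend does not give the formula), and the function names and the four-sample profiles are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def pearson_distance(x, y):
    """Assumed definition: 0 for identical trends, 2 for opposite trends."""
    return 1 - pearson_r(x, y)


# Hypothetical expression values across the four samples.
reference_profile = [1.0, 2.0, 3.0, 4.0]
similar_profile = [2.0, 4.0, 6.0, 8.0]
opposite_profile = [8.0, 6.0, 4.0, 2.0]

d_similar = pearson_distance(reference_profile, similar_profile)
d_opposite = pearson_distance(reference_profile, opposite_profile)
```

Under this definition, a gene whose profile rises and falls together with the reference profile has a distance close to 0, whatever its absolute expression level, while a gene with the opposite trend has a distance close to 2.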

Format: XLS Size: 43KB