Comparative analysis of the Hydra vulgaris (Illumina-454 RNAseq), Hydra AEP (454 RNAseq) and Hydra magnipapillata (genome-predicted) transcriptomes. A) Boxplot of the ORF lengths (nucleotides) of the intermediate (white) and final (blue) assemblies of the genome-assisted (Hydra-dn) and “best of” (Hydra-bo) transcriptomes. For comparison, the distributions of ORF lengths from the AEP-454 (white) and predicted (pred-RP, pred-CA, grey) transcriptomes are also shown. Open circles represent outliers. Horizontal bars represent, from bottom to top, the minimum, lower quartile, median, upper quartile, and maximum ORF lengths (excluding outliers). Numbers at the top indicate redundancy indices. B) Comparison of the sizes of the coding sequences between the datasets shown in A and the pred-CA transcriptome. The pred-CA coding sequences were aligned against the sequences of every other dataset using BlastN+ without the low-complexity filter. First hits were retained if the alignment was uninterrupted for more than 100 nt with at least 95% sequence identity. The sizes of the matched and queried sequences were compared, and each hit was assigned to one of three classes according to the size of the tested sequence (hit) relative to the corresponding pred-CA sequence: ≥ 100% if larger than or equal to the pred-CA sequence (grey), between 99% and 75% (blue), or lower than 75% (orange). Top numbers indicate the percentage of pred-CA sequences matched by the transcriptome indicated on the x-axis. C) Characteristics of the Hydra-bo, Hydra-meta and AEP-454 RNAseq transcriptomes. As Hydra-bo and Hydra-meta contain exclusively sequences with at least 150 coding nucleotides, the same criterion was applied to the AEP-454 dataset. The last column indicates the number of full-length ORFs (with both start and stop codons) longer than 100 AAs. D) Number of functionally annotated sequences in the RNAseq and genome-predicted transcriptomes when analyzed with BlastX+ (left), Pfam or Panther (right).
Wenger and Galliot BMC Genomics 2013 14:204 doi:10.1186/1471-2164-14-204
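The hit filtering and size classification described for panel B can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the dictionary keys, and the class labels are hypothetical, and it assumes that one first hit per pred-CA sequence has already been extracted from the BlastN+ output. The caption does not say how sizes between 99% and 100% are handled, so the sketch treats everything from 75% up to (but not including) 100% as the middle class.

```python
from collections import Counter

# Thresholds taken from the caption: alignment longer than 100 nt, identity of at least 95%.
MIN_ALIGNMENT_NT = 100
MIN_IDENTITY_PCT = 95.0


def keep_hit(alignment_length, percent_identity):
    """Return True if a first hit passes the retention criteria of panel B."""
    return alignment_length > MIN_ALIGNMENT_NT and percent_identity >= MIN_IDENTITY_PCT


def size_class(hit_length, reference_length):
    """Classify a hit by its size relative to the matching pred-CA sequence."""
    ratio = hit_length / reference_length
    if ratio >= 1.0:
        return ">=100%"
    if ratio >= 0.75:  # assumption: 75% is included in the middle class
        return "75-99%"
    return "<75%"


def summarize(hits, n_reference):
    """Count retained hits per size class and the percentage of references matched.

    `hits` is a list of dicts (hypothetical layout), one first hit per
    reference sequence; `n_reference` is the total number of pred-CA sequences.
    """
    kept = [h for h in hits if keep_hit(h["alignment_length"], h["percent_identity"])]
    counts = Counter(size_class(h["hit_length"], h["reference_length"]) for h in kept)
    matched_pct = 100.0 * len(kept) / n_reference
    return counts, matched_pct
```

For each transcriptome on the x-axis, `summarize` would yield the three class counts (the stacked bars) and the matched percentage (the number printed above each bar).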