Open Access Research article

De novo assembly and characterization of a maternal and developmental transcriptome for the emerging model crustacean Parhyale hawaiensis

Victor Zeng1, Karina E Villanueva2, Ben S Ewen-Campen1, Frederike Alwes1, William E Browne2* and Cassandra G Extavour1*

Author Affiliations

1 Department of Organismic and Evolutionary Biology, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA

2 Department of Biology, University of Miami, 234 Cox Science Center, 1301 Memorial Drive, Coral Gables, FL 33146, USA


BMC Genomics 2011, 12:581  doi:10.1186/1471-2164-12-581

Published: 25 November 2011

Additional files

Additional file 1:

Embryonic stages pooled for creation of the P. hawaiensis transcriptome. Staging as per [55].

Format: PDF Size: 31KB

This file can be viewed with: Adobe Acrobat Reader

Open Data

Additional file 2:

Comparison of read lengths from Newbler v2.5 de novo assembly of the P. hawaiensis transcriptome. (A) Distribution of read lengths after assembly with Newbler v2.5 (red). (B) Distribution of read lengths of the shortest assembled reads and raw reads. The assembly yielded assembled reads of over ~4000 bp.

Format: PDF Size: 124KB

Additional file 3:

Distribution of average coverage (reads/bp) within contigs produced by Newbler v2.5 de novo assembly of the P. hawaiensis transcriptome. The coverage within contigs is calculated by dividing the total number of base pairs contained in the reads used to construct a contig by the length of that contig.
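The coverage calculation described in this caption can be illustrated with a minimal sketch. The function name and the example read lengths below are hypothetical and are not taken from the assembly itself; the sketch only restates the stated ratio (total base pairs in the reads used to build a contig, divided by the contig length).

```python
def average_coverage(read_lengths, contig_length):
    """Return average coverage (reads/bp) for one contig.

    read_lengths: lengths (bp) of the reads used to construct the contig.
    contig_length: length (bp) of the assembled contig.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    # Total base pairs contained in the reads, divided by the contig length.
    return sum(read_lengths) / contig_length


# Hypothetical example: four reads totalling 1500 bp in a 1000 bp contig.
example_reads = [350, 410, 290, 450]
print(average_coverage(example_reads, 1000))  # 1.5
```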

Format: EPS Size: 1.2MB

Additional file 4:

Analysis of the effect of trans-splicing transcripts on de novo transcriptome assembly. Assembly of all trimmed sequences compared to assembly of sequences lacking the trans-splicing leader sequences [47]. Number of BLAST hits reflects a search against the nr database with an E-value cut-off value of 1e-10.
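As an illustration of applying an E-value cut-off of 1e-10, the sketch below counts query sequences that have at least one hit at or below the threshold. It assumes BLAST results in the standard 12-column tabular format, where the E-value is the eleventh field; the function name and the example sequence identifiers are hypothetical, and counting distinct queries is one possible reading of "number of BLAST hits", not necessarily the procedure used for this file.

```python
def count_queries_with_hits(lines, max_evalue=1e-10):
    """Count distinct query IDs with at least one hit at or below max_evalue.

    lines: iterable of tab-separated BLAST result lines (assumed 12-column
    tabular format: query ID in field 1, E-value in field 11).
    """
    queries = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            queries.add(fields[0])
    return len(queries)


# Hypothetical example records (identifiers and values are made up).
example = [
    "isotig00001\tsubjA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "isotig00001\tsubjB\t70.0\t150\t45\t0\t1\t150\t1\t150\t3e-12\t70",
    "isotig00002\tsubjC\t40.0\t60\t36\t0\t1\t60\t1\t60\t0.002\t35",
]
print(count_queries_with_hits(example))
```

Only the first query passes the cut-off in this example, and it is counted once even though two of its hits qualify.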

Format: PDF Size: 56KB

Additional file 5:

Phylogenetic distribution of species of top unique BLAST hit for Newbler v2.5 assembly of the P. hawaiensis transcriptome. Of the unique BLAST hits to all non-redundant assembly products (isotigs + singletons), 90% were from species belonging to the clades shown. Over 50% of these top BLAST hits are from arthropod species. The large number (12.2%) of top BLAST hits in the complete assembly to sequences from Branchiostoma floridae is due to the high similarity of C2H2 zinc finger domain-containing sequences with a particular linker sequence (TGEKP) that is also highly represented in the genome of the aphid Acyrthosiphon pisum [48]. Red: values after removal of reads and sequences containing this domain. Phylogenetic tree modified from [63,74,75]. Hits from the following most abundant species are represented: D. mojavensis, D. willistoni, D. ananassae, D. grimshawi, D. pseudoobscura pseudoobscura, Aedes aegypti, Anopheles gambiae, Culex quinquefasciatus (Flies & Mosquitoes), Bombyx mori (Moth), Tribolium castaneum (Beetle), Harpegnathos saltator, Camponotus floridanus, Apis mellifera, Nasonia vitripennis (Bee, Wasp & Ants), Acyrthosiphon pisum (Aphid), Pediculus humanus corporis (Louse), Ixodes scapularis (Tick), Penaeus monodon, Lepeophtheirus salmonis, Litopenaeus vannamei (Crustaceans), Caenorhabditis remanei (Nematode), Gallus gallus, Taeniopygia guttata (Birds), Rattus norvegicus, Mus musculus, Monodelphis domestica (Mammals), Xenopus laevis, X. tropicalis (Amphibian), Danio rerio, Tetraodon nigroviridis (Fish), Branchiostoma floridae (Amphioxus), Ciona intestinalis (Sea Squirt), Saccoglossus kowalevskii (Acorn Worm), Strongylocentrotus purpuratus (Sea Urchin), Trichoplax adhaerens (Trichoplax), Hydra magnipapillata, Nematostella vectensis (Cnidarians), Perkinsus marinus (Dinoflagellate).

Format: PDF Size: 136KB

Additional file 6:

Sequences with strong similarity to Daphnia pulex gene sequences identified in the de novo P. hawaiensis transcriptome. Because the D. pulex genome and nr are databases of inevitably different sizes, E-values shown here are for information only and are not strictly comparable. See text for additional details.

Format: PDF Size: 107KB

Additional file 7:

Presence of existing P. hawaiensis GenBank accessions in the de novo transcriptome. Sequences of P. hawaiensis developmental genes from GenBank were used as a query to BLAST the de novo transcriptome. Most genes with hits had several matches in the transcriptome, among both assembled reads and singletons.

Format: PDF Size: 91KB

Additional file 8:

The P. hawaiensis transcriptome adds sequence data to GenBank accession number HM191476, the P. hawaiensis prospero homologue. Extended contig for Ph-prospero, comprising the complete mRNA GenBank accession (top, light grey) and one isotig and one singleton from the Newbler assembly of the transcriptome (dark grey). The isotig provides an additional 445 bp of 3' UTR sequence and 116 bp of 5' UTR sequence (black) to the GenBank sequence. Comparison with the GenBank sequence shows that isotig24415 and singleton GAP9EXG06HFGHB belong to the same contig.

Format: EPS Size: 514KB

Additional file 9:

Selected signaling pathway genes identified in the P. hawaiensis transcriptome. Hit ID indicates whether gene hits were found among assembled reads (A) or singletons (S). Sequence length (range) indicates the shortest and longest A or S hit sequences for each gene. These results are shown graphically in Figure 7. Groups of hits of a given colour indicate transcriptome sequences that mapped to the same overlapping region of the BLAST target; hits of different colours indicate transcriptome sequences that map to different, non-overlapping regions of the BLAST target. Query organisms: Dm = D. melanogaster; Dr = Danio rerio; Xt = Xenopus tropicalis. Query sequence details: 1. Kinase domain was masked. 2. FERM domain used as query. 3. Amino acids 500-833 (Dl/Ser domain) used as query. 4. Amino acids 1-250 (groucho/TLE domain) used as query. 5. Kinase domain masked; amino acids 420-1390 used as query. 6. Kinase domain masked; amino acids 175-372 used as query. 7. Kinase domain masked; amino acids 150-516 used as query. 8. Kinase domain masked; amino acids 1-100 used as query. 9. Kinase domain masked; amino acids 1-890 used as query. Asterisks indicate genes that appear elsewhere in the same table (in a different pathway).

Format: PDF Size: 833KB

Additional file 10:

Selected developmental process genes identified in the P. hawaiensis transcriptome. Hit ID indicates whether gene hits were found among assembled reads (A) or singletons (S). Sequence length (range) indicates the shortest and longest A or S hit sequences for each gene. Groups of hits of a given colour indicate transcriptome sequences that mapped to the same overlapping region of the BLAST target; hits of different colours indicate transcriptome sequences that map to different, non-overlapping regions of the BLAST target. Query organism was D. melanogaster for all cases. Boldface indicates genes also present in other tables (Additional Files 9, 11); asterisks indicate genes that appear elsewhere in the same table (in a different functional category).

Format: PDF Size: 457KB

Additional file 11:

Selected genes involved in gametogenesis identified in the P. hawaiensis transcriptome. Hit ID indicates whether gene hits were found among assembled reads (A) or singletons (S). Sequence length (range) indicates the shortest and longest A or S hit sequences for each gene. Groups of hits of a given colour indicate transcriptome sequences that mapped to the same overlapping region of the BLAST target; hits of different colours indicate transcriptome sequences that map to different, non-overlapping regions of the BLAST target. Query organism was D. melanogaster for all cases. Query sequence details: 1. S/T kinase domain was masked. 2. Dead box/Zn finger domains were masked. 3. HLH domain was masked. 4. Peptidase C14 domain was masked. 5. Kinase domain masked; amino acids 175-372 used as query. 6. BTB domain used as query. 7. Kinase domain masked; amino acids 1-890 used as query. Boldface indicates genes also present in other tables (Additional Files 9, 10); asterisks indicate genes that appear elsewhere in the same table (in a different functional category).

Format: PDF Size: 1.9MB

Additional file 12:

Representative example of consecutively numbered isotigs with highly similar lengths. Two isotigs that both have Cyclin D as their top BLAST hit (see Additional File 9) differ in length by only two nucleotides and have highly similar sequences. Isotig07129 is 4,279 bp long; isotig07130 is 4,277 bp long. Only a portion of the sequence of each isotig is shown. Nucleotide positions differing between the two are indicated in black (likely to be SNPs), white (deletions) or grey (apparent sequence differences that may be due to poor-quality sequence, shown in lower case letters, at this position).

Format: EPS Size: 1.7MB