Research article

Smed454 dataset: unravelling the transcriptome of Schmidtea mediterranea

Josep F Abril1,2, Francesc Cebrià1,2, Gustavo Rodríguez-Esteban1,2, Thomas Horn3, Susanna Fraguas1,2, Beatriz Calvo1,2, Kerstin Bartscherer4 and Emili Saló1,2*

  • * Corresponding author: Emili Saló esalo@ub.edu

  • † Equal contributors

Author Affiliations

1 Departament de Genètica, Facultat de Biología, Universitat de Barcelona (UB), Av. Diagonal 645, edifici annex, planta 1, 08028, Barcelona, Catalunya, Spain

2 Institut de Biomedicina de la Universitat de Barcelona (IBUB), Av. Diagonal 645, edifici annex, planta 1, 08028, Barcelona, Catalunya, Spain

3 Division of Signaling and Functional Genomics German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany

4 Max Planck Research Group Stem Cells and Regeneration Max-Planck-Institute for Molecular Biomedicine, Von-Esmarch-Strasse 54, 48149 Muenster, Germany

BMC Genomics 2010, 11:731  doi:10.1186/1471-2164-11-731

Published: 31 December 2010

Additional files

Additional file 1:

GO annotation for 90e contigs not mapping onto the WUSL 3.1 genome assembly. 8,831 90e contigs were not found in the genome. Of these, 3,480 had a BLASTX hit to a sequence in NCBI NRprot, but only 2,401 had a hit to a protein functionally annotated in the GO database. This file contains the description of the best HSP for 71 of those annotated contigs, after filtering as described in the article. (Header: CONTIG ID = Smed454 sequence identifier, E-VALUE = BLASTX HSP E-value, ALN_SCORE = HSP alignment score, IDENTITIES = number of identical amino acids, POSITIVES = number of similar amino acids, SEQUENCE ID = Protein sequence identifier, ACCESSION NUMBER = Protein sequence full accession number, SEQUENCE DESCRIPTION = Full protein GenBank description).

Format: XLS Size: 30KB

Additional file 2:

Splice sites for a subset of Smed454 sequences mapped onto the Schmidtea mediterranea genome. (Header: GID = Genomic contig IDentifier from the WUSLv3.1 genome assembly, including the start and end nucleotide coordinates for the complete match; CIG = 90e contig IDentifier; INTNUM = Intron number within the 90e contig; EXO = splice signals found by exonerate; ORI = sequence orientation, where -1 means that the match was found on the reverse strand of the genomic contig; CEXO = corrected splice site signals after reverse complementing the genomic sequence when required; ILEN = Intron length in bp; IORI = Intron start, relative to the match coordinates; IEND = Intron end, relative to the match coordinates; STRAND; SSSEQ = Splice site sequences, where a point separates three nucleotides from the 5' and 3' exons, and the three dots in the middle denote intron sequence not shown for clarity).

Format: TBL Size: 1.3MB
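
The relationship between the EXO, ORI and CEXO columns of Additional file 2 can be illustrated with a minimal Python sketch. It assumes that the two splice signals of an intron are given as dinucleotides read along the forward strand of the genomic contig; the function names and the input layout are hypothetical and are not taken from the file itself or from the authors' scripts.

```python
# Hypothetical helpers: names and input layout are assumptions for illustration.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def corrected_splice_signals(first, second, ori):
    """Return the (donor, acceptor) signals in transcript orientation.

    first, second: signals at the two intron boundaries, as read on the
                   forward strand of the genomic contig.
    ori: 1 for a forward-strand match, -1 for a reverse-strand match.
    """
    if ori == -1:
        # On the reverse strand the boundaries swap and each is complemented.
        return reverse_complement(second), reverse_complement(first)
    return first, second


# A canonical GT..AG intron seen from the opposite strand reads CT..AC.
print(corrected_splice_signals("CT", "AC", -1))  # ('GT', 'AG')
```

Under these assumptions, a reverse-strand match reported as CT..AC is restored to the canonical GT..AG pair, while forward-strand matches are returned unchanged.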

Additional file 3:

List of 90e transcripts validated by RT-PCR. (Header: # = Number, CONTIG = 90e contig ID, PRIMER_FORWARD = 5' to 3' sequence of the forward primer used, REVERSE_FORWARD = 5' to 3' sequence of the reverse primer used, AMPLICON SIZE = Size amplified in bp, SET = subset of origin of the 90e contig: no hit genome, hit genome, -blast (no BLASTX hit), +blast (BLASTX hit)).

Format: XLS Size: 37KB

Additional file 4:

Smed454 sequences matching known Schmidtea mediterranea genes. (Header: ACCESSION NUMBER = Known gene sequence identifier as target, NAME = Description for that sequence, LENGTH = Nucleotide length for that sequence, A&T CONTENT = Sequence composition, 454 90e CONTIG/SINGLETON = Smed454 sequence identifier as query, LENGTH = Nucleotide sequence length for this sequence, ALIGNMENT LENGTH = HSP length, START = Start nucleotide of alignment on target, END = Final nucleotide of alignment on target, IDENTITY = Identity score, BITSCORE = Alignment bit score, E-VALUE = HSP BLAST e-value, HIT LENGTH = Un-gapped length of the alignment on the target, %COVERAGE = Sum of co-linear HSPs on target coordinates divided by the total length of the target, #SEQs = Number of co-linear HSPs considered, avg%COV = The coverage divided by the number of co-linear HSPs).

Format: XLS Size: 325KB
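
The %COVERAGE and avg%COV columns of Additional file 4 follow directly from their definitions, as the minimal Python sketch below illustrates. It assumes that each co-linear HSP is given as a 1-based, inclusive (start, end) interval on the target, that the intervals do not overlap, and that both values are expressed as percentages; the function name `coverage_stats` is hypothetical.

```python
# Hypothetical helper: name, input layout and percentage scaling are assumptions.
def coverage_stats(hsps, target_length):
    """Compute (%COVERAGE, avg%COV) for one target sequence.

    hsps: list of (start, end) target coordinates, 1-based and inclusive,
          one pair per co-linear HSP (assumed non-overlapping).
    target_length: total nucleotide length of the target sequence.
    """
    covered = sum(end - start + 1 for start, end in hsps)
    pct_coverage = 100.0 * covered / target_length
    avg_pct_cov = pct_coverage / len(hsps)
    return pct_coverage, avg_pct_cov


# Two HSPs of 300 nt each on a 1,000 nt target.
print(coverage_stats([(1, 300), (401, 700)], 1000))  # (60.0, 30.0)
```

In this example two co-linear HSPs of 300 nt each cover 60% of a 1,000 nt target, giving an average of 30% per HSP.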

Additional file 5:

Gene Ontology for all three Smed454 sets: 90, 98 and 90e. Level one and two GO codes are shown in order to simplify the listings. Although there are small changes in GO frequencies, annotation is consistent throughout all three sets. (Header: GO = Gene Ontology unique identifier, Count = Number of sequences with a given GO annotation, Freq% = Frequencies for every GO annotation. The total shown does not include the un-annotated and over-represented features, that is, the first two rows on each table).

Format: XLS Size: 55KB

Additional file 6:

List of cell cycle, cell division, DNA repair or DNA damage candidates. Short list of candidates annotated as genes involved in cell cycle, cell division, DNA repair or DNA damage. (Header: ID = Smed454 sequence identifier, BLASTX HIT = Description of the best sequence hit, ACCESSION NUMBER = Sequence identifier of the best sequence hit, E-VALUE = BLASTX e-value for that sequence hit).

Format: XLS Size: 55KB

Additional file 7:

Summary report for the consensus set of 4,663 predicted transmembrane proteins including functional annotations. (Header: Sequence_ID = Protein sequence identifier, Sequence_AA = Amino acid sequence, Length[aa] = Length of amino acid sequence, Phobius_TM = Phobius prediction of number of transmembrane domains, Phobius_SP = Phobius prediction of signal peptide, Phobius_Top = Phobius prediction of membrane topology, TMHMM_TM = TMHMM v2.0 prediction of number of transmembrane domains, TMHMM_Top = TMHMM v2.0 prediction of membrane topology, SOSUI_TM = SOSUI prediction of number of transmembrane domains, SOSUI_Top = SOSUI prediction of membrane topology, UFO_PFAM = UFO annotation of Pfam protein families, UFO_GO = UFO annotation of gene ontologies).

Format: XLS Size: 2MB

Additional file 8:

List of neurotransmitter, peptide and hormone receptor sequence candidates. Complete complement of Smed454 dataset contigs and singletons showing homology to neurotransmitter and hormone receptors, totalling 287 sequences. (Header: ID = Smed454 sequence identifier, BLASTX HIT = Description of the best sequence hit, ACCESSION NUMBER = Sequence identifier of the best sequence hit, E-VALUE = BLASTX e-value for that sequence hit).

Format: XLS Size: 58KB

Additional file 9:

List of eye-related gene sequence candidates. Complete complement of Smed454 dataset contigs and singletons showing homology to eye-related genes, totalling 95 sequences. (Header: ID = Smed454 sequence identifier, BLASTX HIT = Description of the best sequence hit, ACCESSION NUMBER = Sequence identifier of the best sequence hit, E-VALUE = BLASTX e-value for that sequence hit).

Format: DOC Size: 80KB