Additional file 4.

Summary of the gene annotation of the EST sequences. In this file we report the gene annotation for three sets of sequences, based on BLAST searches in NCBI and in the Daphnia portal (http://wfleabase.org/, called wfleabase in the remaining text):
1) ESTs generated for this study by exposing animals to three key environmental stressors and using suppressive subtractive hybridization. The results for this set of sequences are summarized in the spreadsheets EST_1070_NCBI and EST_1070_wfleabase_aa. EST_1070_NCBI summarizes the gene annotation results obtained from BLAST searches in the NCBI non-redundant protein database using the program tblastx. EST_1070_wfleabase_aa summarizes the results obtained from BLAST searches in the non-redundant protein database of the Daphnia portal (wfleabase) using the program tblastx.
2) Contigs obtained by assembling the EST sequences produced in this study (see point 1 above) and the sequences of Daphnia magna downloaded from NCBI GenBank at the time of the analysis. The results for this set of sequences are summarized in the spreadsheets Contigs_NCBI_1812, Contigs_wfleabase_aa_1812, and Contigs_wfleabase_na_1812. Contigs_NCBI_1812 summarizes the gene annotation results obtained from BLAST searches in the NCBI non-redundant protein database using the program tblastx. Contigs_wfleabase_aa_1812 and Contigs_wfleabase_na_1812 summarize the results obtained from BLAST searches in the non-redundant protein database and in the nucleotide database of the Daphnia portal (wfleabase) using the programs tblastx and tblastn, respectively.
3) Contigs obtained from clusters of sequences mined for SNP markers. The number of contigs mined for SNPs is lower than the total number of contigs including our sequences and sequences from GenBank (point 2 above) because several stringent criteria were adopted to select them (see Methods). The results for this set of sequences are summarized in the spreadsheets Contigs_NCBI_574, Contigs_wfleabase_aa_574, and Contigs_wfleabase_na_574. Results from BLAST searches were obtained as in point 2 of this table legend.
The column IDs in the described spreadsheets are as follows:
1) SID - sequence identity
2) GOID - Gene ontology term identity
3) PID - Protein identity as from BLAST searches
4) P_desc - Gene description as from BLAST searches and indication of the species where it was identified
5) e-value - significance of the homology between the query sequence and the hit in NCBI
6) Paralog - the paralog group identity (several members may be shown)
7) Start-End: FrameFS - open reading frame predictor results, with the start and end coordinates and the frame
8) DomainID:desc - protein site scan domain identity and description of the protein domain
9) length - length of the EST
10) OG_ID - group identity of the ortholog group of protein sequences. This analysis is based on searches for orthologs in several genomes
11) E-value - significance of the homology to the ortholog group of protein sequences
12) Score - score for the ortholog group of protein sequences analysis
Columns 1 to 12 are found in the spreadsheets EST_1070_NCBI, Contigs_NCBI_1812, and Contigs_NCBI_574.
In the remaining spreadsheets the following column IDs are present:
1) query id - query identity
2) database sequence (subject) id - sequence identity in wfleabase
3) gene id - gene identity in wfleabase
4) percent identity - percentage of identity between the query and the gene in wfleabase
5) alignment length - length of the match in bp between the query and the gene in wfleabase
6) number of mismatches - number of mismatches between the query and the gene in wfleabase
7) number of gap openings - gap openings between the query and the gene in wfleabase
8) query start
9) query end
10) subject start - database sequence (subject) start
11) subject end - database sequence (subject) end
12) Expect value - E-value of the match between the query and the subject
13) HSP bit score - blastp e-value score
14) Gene_ID - gene identity in wfleabase
15) Gname - gene name
16) Gnomon - gene prediction in NCBI
17) Paralog
18) Paralog,# - number of paralogs identified
19) OrthoID - ortholog identity
20) ArpGene - homology to the arthropod genes list
21) ArpDE - arthropod genes description
22) Scaffold - scaffold number where the query was annotated
23) Begin - query start on the scaffold
24) End - query end on the scaffold
25) Or - orphan gene
26) KOG_JGI - ortholog and paralog protein identities provided for a JGI-sequenced organism
27) KOG_EMBL - ortholog and paralog protein identities provided in the EMBL database
28) meNOG_EMBL - evolutionary genealogy of genes
29) Enzyme_JGI - protein identity reported in JGI
30) Enzyme_JGI - protein identity reported in EMBL
31) Description_JGI - protein description based on the JGI database
32) GeneOntology_JGI - Gene ontology as described in the JGI database
33) Tandem_ID - identity of tandem gene arrangements
The column IDs are listed in the column_IDs spreadsheet.
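
As an illustration of how columns 1 to 13 of the wfleabase spreadsheets might be read programmatically, the following Python sketch parses one row into a typed record and filters hits by E-value. It assumes the spreadsheet has first been exported as tab-delimited text with the columns in the order listed above. The names WfleabaseHit, parse_hit, and keep_significant, the 1e-5 threshold, and the example row are all hypothetical and do not come from the file itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WfleabaseHit:
    # Attribute names are hypothetical renderings of columns 1-13
    # described in the legend; they are not the spreadsheet headers.
    query_id: str
    subject_id: str
    gene_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    e_value: float
    bit_score: float


def parse_hit(line: str) -> WfleabaseHit:
    """Parse one tab-delimited row (assumed export format) into a record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 13:
        raise ValueError(f"expected at least 13 columns, got {len(fields)}")
    return WfleabaseHit(
        query_id=fields[0],
        subject_id=fields[1],
        gene_id=fields[2],
        percent_identity=float(fields[3]),
        alignment_length=int(fields[4]),
        mismatches=int(fields[5]),
        gap_openings=int(fields[6]),
        query_start=int(fields[7]),
        query_end=int(fields[8]),
        subject_start=int(fields[9]),
        subject_end=int(fields[10]),
        e_value=float(fields[11]),
        bit_score=float(fields[12]),
    )


def keep_significant(hits: Iterable[WfleabaseHit],
                     max_e_value: float = 1e-5) -> List[WfleabaseHit]:
    """Keep hits whose E-value is at or below an (assumed) threshold."""
    return [h for h in hits if h.e_value <= max_e_value]


# Invented example row, for illustration only; not taken from the file.
EXAMPLE_ROW = "\t".join([
    "contig_0001", "subject_A", "gene_A", "98.5", "250", "3", "1",
    "1", "750", "100", "850", "1e-30", "420.0",
])

if __name__ == "__main__":
    hit = parse_hit(EXAMPLE_ROW)
    print(hit.query_id, hit.gene_id, hit.e_value)
```

Columns 14 to 33 are left out of the record because their value types are not specified in the legend; any extra fields in a row are simply ignored by parse_hit.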

Format: XLS Size: 3.4MB

This file can be viewed with: Microsoft Excel Viewer

Orsini et al. BMC Genomics 2011 12:309   doi:10.1186/1471-2164-12-309