Open Access Research article

Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

Hiroyuki Wakaguri1, Yutaka Suzuki1, Masahide Sasaki1, Sumio Sugano1 and Junichi Watanabe2*

1 Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha, Kashiwa, Chiba, Japan

2 Department of Parasitology, Institute of Medical Science, The University of Tokyo, Shirokanedai, Minato-ku, Tokyo, Japan


BMC Genomics 2009, 10:312 doi:10.1186/1471-2164-10-312

Published: 15 July 2009

Additional files

Additional file 1:

Statistics for the oligo-capped cDNA clones. Statistics for the oligo-capped cDNA clones are shown. Pf, Pv, Py, Pb, Cp and Tg indicate Plasmodium falciparum, Plasmodium vivax, Plasmodium yoelii, Plasmodium berghei, Cryptosporidium parvum and Toxoplasma gondii, respectively.

Format: XLS; Size: 20 KB

Additional file 2:

Details of 5'-end-one-pass cDNA analysis results. Information about mapped positions of the cDNAs, corresponding annotated gene models, predicted protein motifs and GO categories is shown. For cDNAs that did not correspond to any annotated gene, the annotated-gene columns are blank. The genome versions used in this study are pf_rel5.4, pv_rel5.4, py_rel5.4, pb_rel5.4, cp_rel3.7 and tg_rel4.3 for Pf, Pv, Py, Pb, Cp and Tg, respectively. Column A: Type of hit for annotated gene. One asterisk (*) indicates no hit to an annotated gene; two asterisks (**) indicate a hit to more than one annotated gene. Column C: Cluster ID [internal use]. Column E: Nucleotide length of the one-pass read sequence. Columns F-I: Mapped position of the cDNA. Start and End are simply given in ascending genomic coordinate order; therefore the TSS is at the Start when the genomic strand is "+" and at the End when it is "-". Columns J-O: Information on the annotated gene models corresponding to the cDNA. "Putative UTR length" is the distance between the cDNA start (TSS) and the CDS start of the annotated gene model.
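Because Start and End are stored in ascending genomic order, recovering the TSS and the putative UTR length requires the strand. A minimal Python sketch of that bookkeeping follows. The function names and coordinates are hypothetical; `cds_start` is assumed to be the coordinate of the first base of the annotated start codon (the larger coordinate on the "-" strand), and the distance is assumed to be a plain coordinate difference without +1 inclusive counting, which the caption does not specify.

```python
def tss_position(start: int, end: int, strand: str) -> int:
    """Return the TSS coordinate of a mapped cDNA.

    Start/End are in ascending genomic order regardless of strand, so the
    5' end (TSS) is Start on the "+" strand and End on the "-" strand.
    """
    if start > end:
        raise ValueError("Start must not exceed End (ascending order expected)")
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"unknown strand: {strand!r}")


def putative_utr_length(tss: int, cds_start: int, strand: str) -> int:
    """Distance from the TSS to the annotated CDS start, measured 5' to 3'.

    ASSUMPTION: plain coordinate difference (no +1); a negative value means
    the TSS lies downstream of the annotated CDS start.
    """
    if strand == "+":
        return cds_start - tss
    if strand == "-":
        return tss - cds_start
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical coordinates, for illustration only
plus_tss = tss_position(1000, 1650, "+")
minus_tss = tss_position(1000, 1650, "-")
print(plus_tss, putative_utr_length(plus_tss, 1120, "+"))    # 1000 120
print(minus_tss, putative_utr_length(minus_tss, 1580, "-"))  # 1650 70
```

A negative result is kept rather than clipped, since it can flag gene models whose annotated start lies upstream of the observed TSS.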

Format: XLS; Size: 10 MB

Additional file 3:

Statistics for apicomplexan genomes and annotated gene models. ND: not done. NA: not applicable.

Format: XLS; Size: 21 KB

Additional file 4:

Example of erroneous annotation. PF13_0024 (an annotated Pf gene) corresponds to two separate annotated genes in Pb, PB000757.01.0 and PB000975.00.0, possibly because they lie on different contigs (PB000757.01.0 and PB_RP1621, respectively).

Format: PNG; Size: 7 KB

Additional file 5:

Results of the Wilcoxon rank-sum test. The result for "corresponding to ribosome" is listed for each species.
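The caption does not state which quantity was compared for the "corresponding to ribosome" category, nor which software or exact/approximate variant was used. Purely to show how a Wilcoxon rank-sum statistic is obtained, here is a minimal standard-library Python sketch using the large-sample normal approximation with a tie correction; the group names and values are hypothetical. In practice a statistics library (for example SciPy's `scipy.stats.ranksums`) would normally be used instead.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W, z, p), where W is the rank sum of the first sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    # tie correction: sum of (t^3 - t) over each group of t tied values
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("variance is zero (are all values tied?)")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical measurements for two groups of clones (illustrative only)
ribosome_related = [1.2, 3.4, 2.2, 5.0, 4.1]
other_clones = [0.5, 0.9, 1.1, 2.0]
w, z, p = rank_sum_test(ribosome_related, other_clones)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

For small groups like these, an exact permutation distribution is more accurate than the normal approximation; the sketch uses the approximation only because it is short.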

Format: XLS; Size: 21 KB

Additional file 6:

Affected Pfam motifs. Affected Pfam motifs are listed. InterProScan was used to search the Pfam database for protein motifs.

Format: XLS; Size: 24 KB

Additional file 7:

Details of Tg full-length sequence analysis results. Information about mapped positions, corresponding annotated genes, predicted protein motifs, GO categories, BlastP hits and subcellular localization is shown. Column A: Type of hit for annotated gene. One asterisk (*) indicates no hit to an annotated gene; two asterisks (**) indicate a hit to more than one annotated gene. Column D: Amino acid sequence deduced from the nucleotide sequence of the longest open reading frame (ORF). Columns G-J: Mapped position of the cDNA. Columns K-L: Position of the CDS identified as the longest ORF. Columns M-R: Functional annotation of the amino acid sequence deduced from our cDNA data. Columns T-AB: Information about the annotated gene model(s) corresponding to the clone, including functional annotation of the amino acid sequence deduced from the annotated CDS. Columns M and V: Results of transmembrane domain search using SOSUI [44]. Columns N and W: Results of subcellular localization prediction using PSORT [38]. Columns Q, R, AA and AB: Results of protein domain search using InterProScan [43]. Column X: Marked when the subcellular localization predicted by PSORT differs between Column N (cDNA) and Column W (annotated gene).
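Column D is described as the protein deduced from the longest ORF, with its position in Columns K-L. The caption does not define an ORF precisely, so the following Python sketch assumes an ATG-initiated ORF ending at an in-frame stop codon, searched on the sense strand of the cDNA only, translated with the standard genetic code, and reported in 0-based, end-exclusive coordinates. ORFs that run off the 3' end without a stop are ignored here. These are illustrative choices, not the authors' stated procedure.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order
# with the third base varying fastest, matching the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def longest_orf(seq):
    """Find the longest ATG-to-stop ORF on the sense strand of a cDNA.

    Returns (start, end, protein) with 0-based, end-exclusive coordinates
    (the stop codon is inside the span but not in the protein), or None
    when no ORF with an in-frame stop codon exists.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
            elif CODON_TABLE.get(codon) == "*":
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    if best is None:
        return None
    s, e = best
    # translate up to (not including) the stop codon; unknown codons -> "X"
    protein = "".join(CODON_TABLE.get(seq[k:k + 3], "X") for k in range(s, e - 3, 3))
    return s, e, protein


print(longest_orf("CCATGAAATTTTAGGGATGCCCTGA"))  # (2, 14, 'MKF')
```

Restricting the search to the sense strand relies on the cDNAs being oligo-capped and therefore oriented 5' to 3'.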

Format: XLS; Size: 1.8 MB

Additional file 8:

Results of BlastP and Pfam motif searches. List of affected BlastP hits and Pfam motifs for the Tg full-length cDNAs that do not overlap any annotated gene.
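The supplementary tables classify each cDNA by how many annotated genes it overlaps ("*" for none, "**" for more than one). The overlap criterion is not given, so this Python sketch assumes that a cDNA and a gene model overlap when they share at least one base on the same contig and strand, using closed intervals. The `Interval` type, the "1" label for a single hit, and the example coordinates are hypothetical. Under these assumptions, cDNAs of the kind listed here are those classified as "*".

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    contig: str
    start: int   # smaller coordinate (inclusive)
    end: int     # larger coordinate (inclusive)
    strand: str  # "+" or "-"
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals on the same contig and strand share a base."""
    return (
        a.contig == b.contig
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def hit_type(cdna: Interval, genes: List[Interval]) -> Tuple[str, List[str]]:
    """Classify a mapped cDNA by the annotated gene models it overlaps.

    "*"  -> no annotated gene overlapped
    "**" -> more than one annotated gene overlapped
    "1"  -> exactly one gene (hypothetical label, not defined in the captions)
    """
    hits = [g.name for g in genes if overlaps(cdna, g)]
    if not hits:
        return "*", hits
    return ("**" if len(hits) > 1 else "1"), hits


# Hypothetical gene models and cDNAs, for illustration only
genes = [
    Interval("contig_1", 100, 500, "+", "geneA"),
    Interval("contig_1", 450, 900, "+", "geneB"),
    Interval("contig_1", 2000, 2500, "-", "geneC"),
]
print(hit_type(Interval("contig_1", 460, 480, "+"), genes))    # ('**', ['geneA', 'geneB'])
print(hit_type(Interval("contig_1", 2100, 2200, "+"), genes))  # ('*', [])
```

The linear scan is enough for a sketch; genome-scale data would normally use a sorted or interval-tree index per contig.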

Format: XLS; Size: 20 KB

Additional file 9:

Real-time RT-PCR evidence for transcription. Examples of real-time RT-PCR results. IDs of the cDNAs are shown in the margin. (+): with reverse transcriptase; (-): without reverse transcriptase. Gray box: negative (for more details, see Additional file 10).

Format: JPEG; Size: 499 KB

Additional file 10:

RT-PCR result details. UD: undetermined. ND: not done. NA: not applicable. UD, ND and NA cells are all shaded, as are Ct (threshold cycle) values greater than 35.0.
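The shading rule above can be written as a small predicate. The following Python sketch encodes it directly (codes UD, ND and NA, or a Ct strictly greater than 35.0). The second function is a hypothetical call rule that is not stated in the source: it treats a cDNA as transcribed when the reaction with reverse transcriptase (+) is unshaded and the control without it (-) is shaded.

```python
NON_NUMERIC = {"UD", "ND", "NA"}  # undetermined, not done, not applicable
CT_CUTOFF = 35.0                  # Ct values strictly above this are shaded


def is_shaded(cell):
    """Apply the table's shading rule to one cell (a Ct value or a code)."""
    value = str(cell).strip().upper()
    if value in NON_NUMERIC:
        return True
    return float(value) > CT_CUTOFF


def supports_transcription(plus_rt, minus_rt):
    """Hypothetical call rule (NOT stated in the source): treat a cDNA as
    transcribed when the +RT reaction is unshaded and the -RT control is shaded."""
    return not is_shaded(plus_rt) and is_shaded(minus_rt)


print(is_shaded("UD"), is_shaded("36.2"), is_shaded(24.7))  # True True False
print(supports_transcription("24.7", "UD"))                  # True
```

A Ct of exactly 35.0 is left unshaded, following the "greater than 35.0" wording.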

Format: XLS; Size: 42 KB

Additional file 11:

Culture and development stage information. Information about strains, development stages and cultures is shown.

Format: XLS; Size: 21 KB

Additional file 12:

Database search for cDNA-genome alignment.

Format: PNG; Size: 230 KB