The Babesia bovis gene and promoter model: an update from full-length EST analysis
1 Tohoku Medical Megabank Organization, Tohoku University, 6-3-09, aza Aoba, Sendai, Miyagi 980-8579, Japan
2 National Research Center for Protozoan Diseases, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho west 2-13, Obihiro, Hokkaido 080-8555, Japan
3 Department of Medical Genome Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan
BMC Genomics 2014, 15:678 doi:10.1186/1471-2164-15-678Published: 13 August 2014
Babesia bovis is an apicomplexan parasite that causes babesiosis in infected cattle. Genomes of pathogens contain promising information that can facilitate the development of methods for controlling infections. Although the genome of B. bovis is publically available, annotated gene models are not highly reliable prior to experimental validation. Therefore, we validated a preproposed gene model of B. bovis and extended the associated annotations on the basis of experimentally obtained full-length expressed sequence tags (ESTs).
From in vitro cultured merozoites, 12,286 clones harboring full-length cDNAs were sequenced from both ends using the Sanger method, and 6,787 full-length cDNAs were assembled. These were then clustered, and a nonredundant referential data set of 2,115 full-length cDNA sequences was constructed. The comparison of the preproposed gene model with our data set identified 310 identical genes, 342 almost identical genes, 1,054 genes with potential structural inconsistencies, and 409 novel genes. The median length of 5' untranslated regions (UTRs) was 152 nt. Subsequently, we identified 4,086 transcription start sites (TSSs) and 2,023 transcriptionally active regions (TARs) by examining 5' ESTs. We identified ATGGGG and CCCCAT sites as consensus motifs in TARs that were distributed around -50 bp from TSSs. In addition, we found ACACA, TGTGT, and TATAT sites, which were distributed periodically around TSSs in cycles of approximately 150 bp. Moreover, related periodical distributions were not observed in mammalian promoter regions.
The observations in this study indicate the utility of integrated bioinformatics and experimental data for improving genome annotations. In particular, full-length cDNAs with one-base resolution for TSSs enabled the identification of consensus motifs in promoter sequences and demonstrated clear distributions of identified motifs. These observations allowed the illustration of a model promoter composition, which supports the differences in transcriptional regulation frameworks between apicomplexan parasites and mammals.