Email updates

Keep up to date with the latest news and content from BMC Genomics and BioMed Central.

Open Access Research article

The Babesia bovis gene and promoter model: an update from full-length EST analysis

Junya Yamagishi12, Hiroyuki Wakaguri3, Naoaki Yokoyama2, Riu Yamashita1, Yutaka Suzuki3, Xuenan Xuan2 and Ikuo Igarashi2*

Author Affiliations

1 Tohoku Medical Megabank Organization, Tohoku University, 6-3-09, aza Aoba, Sendai, Miyagi 980-8579, Japan

2 National Research Center for Protozoan Diseases, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho west 2-13, Obihiro, Hokkaido 080-8555, Japan

3 Department of Medical Genome Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8562, Japan

For all author emails, please log on.

BMC Genomics 2014, 15:678  doi:10.1186/1471-2164-15-678

Published: 13 August 2014

Abstract

Background

Babesia bovis is an apicomplexan parasite that causes babesiosis in infected cattle. Genomes of pathogens contain promising information that can facilitate the development of methods for controlling infections. Although the genome of B. bovis is publically available, annotated gene models are not highly reliable prior to experimental validation. Therefore, we validated a preproposed gene model of B. bovis and extended the associated annotations on the basis of experimentally obtained full-length expressed sequence tags (ESTs).

Results

From in vitro cultured merozoites, 12,286 clones harboring full-length cDNAs were sequenced from both ends using the Sanger method, and 6,787 full-length cDNAs were assembled. These were then clustered, and a nonredundant referential data set of 2,115 full-length cDNA sequences was constructed. The comparison of the preproposed gene model with our data set identified 310 identical genes, 342 almost identical genes, 1,054 genes with potential structural inconsistencies, and 409 novel genes. The median length of 5' untranslated regions (UTRs) was 152 nt. Subsequently, we identified 4,086 transcription start sites (TSSs) and 2,023 transcriptionally active regions (TARs) by examining 5' ESTs. We identified ATGGGG and CCCCAT sites as consensus motifs in TARs that were distributed around -50 bp from TSSs. In addition, we found ACACA, TGTGT, and TATAT sites, which were distributed periodically around TSSs in cycles of approximately 150 bp. Moreover, related periodical distributions were not observed in mammalian promoter regions.

Conclusions

The observations in this study indicate the utility of integrated bioinformatics and experimental data for improving genome annotations. In particular, full-length cDNAs with one-base resolution for TSSs enabled the identification of consensus motifs in promoter sequences and demonstrated clear distributions of identified motifs. These observations allowed the illustration of a model promoter composition, which supports the differences in transcriptional regulation frameworks between apicomplexan parasites and mammals.

Keywords:
Babesia bovis; Expressed sequence tags; Full-length cDNA; Transcription start sites; Cis-elements