Improving the prediction of mRNA extremities in the parasitic protozoan Leishmania
1 Research Centre in Infectious Diseases, CHUL Research Centre, 2705 Laurier Blvd., Quebec, QC G1V 4G2, Canada
2 McGill Center for Bioinformatics, 3775 University St., Montreal, QC H3A 2B4, Canada
BMC Bioinformatics 2008, 9:158 doi:10.1186/1471-2105-9-158Published: 20 March 2008
Leishmania and other members of the Trypanosomatidae family diverged early on in eukaryotic evolution and consequently display unique cellular properties. Their apparent lack of transcriptional regulation is compensated by complex post-transcriptional control mechanisms, including the processing of polycistronic transcripts by means of coupled trans-splicing and polyadenylation. Trans-splicing signals are often U-rich polypyrimidine (poly(Y)) tracts, which precede AG splice acceptor sites. However, as opposed to higher eukaryotes there is no consensus polyadenylation signal in trypanosomatid mRNAs.
We refined a previously reported method to target 5' splice junctions by incorporating the pyrimidine content of query sequences into a scoring function. We also investigated a novel approach for predicting polyadenylation (poly(A)) sites in-silico, by comparing query sequences to polyadenylated expressed sequence tags (ESTs) using position-specific scanning matrices (PSSMs). An additional analysis of the distribution of putative splice junction to poly(A) distances helped to increase prediction rates by limiting the scanning range. These methods were able to simplify splice junction prediction without loss of precision and to increase polyadenylation site prediction from 22% to 47% within 100 nucleotides.
We propose a simplified trans-splicing prediction tool and a novel poly(A) prediction tool based on comparative sequence analysis. We discuss the impact of certain regions surrounding the poly(A) sites on prediction rates and contemplate correlating biological mechanisms. This work aims to sharpen the identification of potentially functional untranslated regions (UTRs) in a large-scale, comparative genomics framework.