Open Access Research article

Impact of RNA structure on the prediction of donor and acceptor splice sites

Sayed-Amir Marashi1, Changiz Eslahchi2,5, Hamid Pezeshk3,5 and Mehdi Sadeghi4,5*

Author Affiliations

1 Department of Biotechnology, University College of Science, University of Tehran, Tehran, Iran

2 Faculty of Mathematics, Shahid-Beheshti University, Tehran, Iran

3 Center of Excellence in Biomathematics, School of Mathematics, Statistics and Computer Sciences, University College of Science, University of Tehran, Tehran, Iran

4 National Institute for Genetic Engineering and Biotechnology, Tehran-Karaj Highway, Tehran, Iran

5 Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

BMC Bioinformatics 2006, 7:297 doi:10.1186/1471-2105-7-297

Published: 13 June 2006

Abstract

Background

Gene identification in genomic DNA sequences by computational methods has become an important task in bioinformatics, and computational gene prediction tools are now essential components of every genome sequencing project. Prediction of splice sites is a key step in all gene structure prediction algorithms.

Results

We investigated the role of mRNA secondary structures and their information content in five vertebrate and plant splice site datasets. We selected 900-nucleotide sequences centered at each (real or decoy) donor and acceptor site and predicted their corresponding RNA structures with the Vienna software. Then, depending on whether each nucleotide is in a stem or not, the conventional four-letter nucleotide alphabet was translated into an eight-letter alphabet. Zero-, first- and second-order Markov models were used as the signal detection methods. Compared with the four-letter alphabet, the eight-letter alphabet considerably increases the accuracy of both donor and acceptor site predictions for the higher-order Markov models.
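The alphabet translation and Markov-model scoring described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the structure is taken to be a dot-bracket string (the notation produced by the Vienna tools), paired nucleotides are written in uppercase and unpaired ones in lowercase, the Markov model is position-specific with add-one smoothing, and a window is scored by the log-likelihood ratio between a model trained on real sites and one trained on decoys. The function names (`to_eight_letter`, `train_markov`, `log_prob`, `score`) are hypothetical.

```python
import math
from collections import defaultdict

# Assumed eight-letter alphabet: uppercase = in a stem, lowercase = not in a stem.
ALPHABET = "ACGUacgu"


def to_eight_letter(seq, dot_bracket):
    """Combine a sequence and its dot-bracket structure into one string."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    # '(' and ')' mark paired positions; anything else is treated as unpaired.
    return "".join(
        base if sym in "()" else base.lower()
        for base, sym in zip(seq, dot_bracket)
    )


def train_markov(windows, order, alphabet=ALPHABET, pseudocount=1.0):
    """Count symbols per position, conditioned on up to `order` preceding symbols.

    All windows are assumed to be aligned and of equal length.
    """
    length = len(windows[0])
    counts = [defaultdict(lambda: defaultdict(float)) for _ in range(length)]
    for window in windows:
        for i in range(length):
            context = window[max(0, i - order):i]
            counts[i][context][window[i]] += 1.0
    return {
        "order": order,
        "counts": counts,
        "alphabet": alphabet,
        "pseudocount": pseudocount,
    }


def log_prob(model, window):
    """Log-probability of a window under the model, with additive smoothing."""
    order = model["order"]
    pseudo = model["pseudocount"]
    size = len(model["alphabet"])
    total = 0.0
    for i, symbol in enumerate(window):
        context = window[max(0, i - order):i]
        context_counts = model["counts"][i][context]
        denominator = sum(context_counts.values()) + pseudo * size
        total += math.log((context_counts[symbol] + pseudo) / denominator)
    return total


def score(real_model, decoy_model, window):
    """Log-likelihood ratio; positive values favor a real splice site."""
    return log_prob(real_model, window) - log_prob(decoy_model, window)
```

For example, `to_eight_letter("GGGAAACCC", "(((...)))")` gives `"GGGaaaCCC"`. Setting `order=0` reduces the model to independent per-position frequencies, while `order=1` or `order=2` conditions each symbol on its predecessors, which is where the larger alphabet can contribute structural context.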

Conclusion

Our results imply that RNA structure carries important information and that future gene prediction programs can take advantage of it.