Open Access Methodology article

Computing expectation values for RNA motifs using discrete convolutions

André Lambert¹, Matthieu Legendre², Jean-Fred Fontaine²,³ and Daniel Gautheret²*

Author Affiliations

1 CNRS UMR 6207, Université de la Méditerranée, Luminy Case 907, 13288 Marseille cedex 9, France

2 INSERM ERM 206, Université de la Méditerranée, Luminy Case 928, 13288 Marseille Cedex 9, France

3 INSERM EMI U 00.18, CHU d'Angers, 49033 Angers, France

BMC Bioinformatics 2005, 6:118  doi:10.1186/1471-2105-6-118

Published: 13 May 2005

Abstract

Background

Computational biologists use expectation values (E-values) to estimate the number of solutions that can be expected by chance during a database scan. Here we focus on computing E-values for RNA motifs defined by single-strand and helix lod-score profiles with variable helix spans. Such E-values cannot be computed assuming a normal score distribution, and their estimation previously required lengthy simulations.
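
In the usual formulation (the notation here is ours, not the article's), the E-value of a score threshold s is the number of scanned positions N times the probability that a random position scores at least s. A normal approximation would instead derive that tail probability from the mean μ and standard deviation σ of the score, with Φ the standard normal cumulative distribution function:

```latex
E(s) = N \cdot P(S \ge s)

% Normal approximation (the assumption that does not hold for these profiles):
E_{\mathrm{normal}}(s) \approx N \left[ 1 - \Phi\!\left( \frac{s - \mu}{\sigma} \right) \right]
```

The point made above is that the second form is unreliable for these profiles, so P(S ≥ s) has to be estimated directly.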

Results

We introduce discrete convolutions as an accurate and fast means to estimate the score distributions of lod-score profiles. This method provides excellent estimates of the score distribution for all single-strand and helical elements tested, and it also applies to combinations of elements into larger, complex motifs. Further, the estimated distributions remain accurate even when pseudocounts are introduced into the lod-score profiles. Estimated score distributions are then easily converted into E-values.
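
A minimal sketch of the idea in Python, under assumptions that are ours rather than the article's: nucleotides are drawn independently from a background distribution, each single-strand column holds one lod-score per base, each helix column pair holds one lod-score per base pair (16 entries), and scores are rounded to bins of width `step` so that sums can be tabulated exactly. All function names are hypothetical; this is not the ERPIN implementation, and variable helix spans are not modeled.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGU"


def column_distribution(lod, background, step=0.1):
    """Score distribution of one single-strand column.

    lod: base -> lod-score; background: base -> probability.
    Scores are rounded to integer bins of width `step`.
    """
    dist = defaultdict(float)
    for b in BASES:
        dist[round(lod[b] / step)] += background[b]
    return dict(dist)


def helix_pair_distribution(lod_pair, background, step=0.1):
    """Score distribution of one helix column pair (16 base pairs, e.g. "GC").

    Assumes both paired positions are drawn independently from the background,
    so P(x, y) = q(x) * q(y).
    """
    dist = defaultdict(float)
    for x, y in product(BASES, repeat=2):
        dist[round(lod_pair[x + y] / step)] += background[x] * background[y]
    return dict(dist)


def convolve(d1, d2):
    """Discrete convolution: distribution of the sum of two independent scores."""
    out = defaultdict(float)
    for s1, p1 in d1.items():
        for s2, p2 in d2.items():
            out[s1 + s2] += p1 * p2
    return dict(out)


def profile_distribution(dists):
    """Convolve per-column (or per-element) distributions into a total."""
    total = {0: 1.0}  # score 0 with probability 1 before any column is added
    for d in dists:
        total = convolve(total, d)
    return total


def evalue(dist, threshold, n_positions, step=0.1):
    """E-value = number of scanned positions * P(score >= threshold)."""
    t = round(threshold / step)
    tail = sum(p for s, p in dist.items() if s >= t)
    return n_positions * tail


# Usage example (toy values, not taken from the article).
uniform = {b: 0.25 for b in BASES}
ss_col = {"A": 1.0, "C": -1.0, "G": -1.0, "U": -1.0}
dist = profile_distribution([column_distribution(ss_col, uniform)] * 2)
print(evalue(dist, 2.0, 1000))  # 62.5

# A helix column pair: Watson-Crick +1.5, G-U wobble +0.5, all others -1.0.
pair_lod = {x + y: -1.0 for x, y in product(BASES, repeat=2)}
for p in ("AU", "UA", "GC", "CG"):
    pair_lod[p] = 1.5
for p in ("GU", "UG"):
    pair_lod[p] = 0.5
helix = helix_pair_distribution(pair_lod, uniform)
motif = profile_distribution([dist, helix])  # combine elements by convolution
```

In the toy example each column scores 2.0 in total only if both positions are A, so P(S ≥ 2.0) = 0.25 × 0.25 = 0.0625 and 1000 scanned positions give an E-value of 62.5. Combining a single-strand element with a helix element uses the same `convolve`, which is valid only as long as the elements cover disjoint, independently drawn positions.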

Conclusion

Good agreement was observed between computed E-values and simulations for a number of complete RNA motifs. This method is now implemented in the ERPIN software, but it can also be applied to any search procedure based on ungapped profiles with statistically independent columns.
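
The comparison above is between computed E-values and simulations. A minimal sketch of such a simulation, assuming an i.i.d. background sequence and a fixed-width ungapped profile (the function names, sequence length and seed are hypothetical choices, not the article's protocol), scans random sequence and rescales the observed hit rate to a given number of positions:

```python
import random

BASES = "ACGU"


def random_sequence(length, background, seed=0):
    """Random sequence drawn i.i.d. from a background model (assumed null model)."""
    rng = random.Random(seed)
    weights = [background[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))


def count_hits(seq, profile, threshold):
    """Count windows whose ungapped profile score reaches the threshold."""
    width = len(profile)
    hits = 0
    for start in range(len(seq) - width + 1):
        score = sum(lod[seq[start + i]] for i, lod in enumerate(profile))
        if score >= threshold - 1e-9:  # small tolerance for float round-off
            hits += 1
    return hits


def simulated_evalue(profile, background, threshold, n_positions,
                     length=200_000, seed=0):
    """Empirical E-value: hit rate on random sequence times n_positions."""
    seq = random_sequence(length, background, seed)
    windows = len(seq) - len(profile) + 1
    return n_positions * count_hits(seq, profile, threshold) / windows


# Usage example (toy profile, not taken from the article).
profile = [{"A": 1.0, "C": -1.0, "G": -1.0, "U": -1.0}] * 2
uniform = {b: 0.25 for b in BASES}
print(simulated_evalue(profile, uniform, 2.0, 1000))
```

For this toy profile the exact tail probability is 0.0625, so the simulated value should land close to 62.5; the simulation needs one pass over a long random sequence for every threshold and profile, which is the cost a convolution-based estimate avoids.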