Multiple alignment and structure prediction of non-coding RNA sequences

Lindgreen, Stinus; Gardner, Paul P; Krogh, Anders

doi:10.1186/1471-2105-8-S8-P8

Volume 8 Supplement 8

Highlights from the Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB)

Poster presentation
Open access
Published: 20 November 2007

Stinus Lindgreen¹,
Paul P Gardner² &
Anders Krogh¹

BMC Bioinformatics volume 8, Article number: P8 (2007)

Background

As the importance of non-coding RNAs becomes more evident, the need for computational methods for ncRNAs grows. Predicting the secondary structure is of great importance, and combining this with multiple alignment yields a useful tool for researchers. The exact solution to the problem of simultaneous multiple alignment and structure prediction for RNA sequences was described by Sankoff [1], but to date only pairwise implementations (e.g. Foldalign [2], Dynalign [3]) or heuristics for multiple sequences (e.g. FoldalignM [4], LocARNA [5], RNA Sampler [6]) exist.

Methods

We present a novel approach to the problem: using Markov chain Monte Carlo in a simulated annealing framework, we sample multiple alignments and secondary structures. The sampling is based on a scoring system that combines a sequence measure with a structure measure: the sequence alignment is scored using the log-likelihood, and the structure is scored using basepair probabilities and a covariation term. The sampling procedure itself uses simple local moves to optimize the solution. These moves act on either the sequence alignment or the predicted structure. The input to the program can be unaligned sequences or an alignment obtained using, e.g., Clustal. The structure can be constrained by indicating, e.g., basepairs or unpaired positions in one of the sequences. The program, MASTR (Multiple Alignment of STructural RNAs), is implemented in C++.
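To make the sampling idea concrete, the following is a minimal sketch of simulated annealing with a Metropolis acceptance rule and a single local move on a gapped alignment. It is not the MASTR implementation: the toy score (`sum_of_pairs`), the move (`shift_gap`), the geometric cooling schedule and all parameter values are assumptions chosen for illustration, and the structure part of the score is left out entirely.

```python
import math
import random

GAP = "-"


def sum_of_pairs(alignment):
    """Toy sequence score (an assumption, not MASTR's log-likelihood):
    count identical non-gap residue pairs in every column."""
    score = 0
    for column in zip(*alignment):
        for i in range(len(column)):
            for j in range(i + 1, len(column)):
                if column[i] != GAP and column[i] == column[j]:
                    score += 1
    return score


def shift_gap(alignment, rng):
    """Hypothetical local move: swap one gap with a neighbouring character
    in one randomly chosen row. The ungapped sequence is left unchanged."""
    rows = [list(row) for row in alignment]
    row = rng.choice(rows)
    gaps = [k for k, ch in enumerate(row) if ch == GAP]
    if not gaps:
        return alignment
    k = rng.choice(gaps)
    n = k + rng.choice((-1, 1))
    if 0 <= n < len(row):
        row[k], row[n] = row[n], row[k]
    return ["".join(r) for r in rows]


def anneal(alignment, score, propose, steps=2000,
           t_start=2.0, t_end=0.05, seed=0):
    """Maximise `score` by simulated annealing; return (best state, best score)."""
    rng = random.Random(seed)
    current, current_score = alignment, score(alignment)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis criterion: always accept improvements, accept a
        # worse state with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    start = ["ACGU--", "--ACGU"]
    result, result_score = anneal(start, sum_of_pairs, shift_gap)
    print(result, result_score)
```

Tracking the best state separately from the current state means the returned score can never be lower than that of the starting alignment, even though individual steps are allowed to make the current state worse. A fuller sketch would add moves that act on the predicted structure and a score that combines both measures.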

Results

MASTR is compared to LocARNA, FoldalignM, RNA Sampler and Clustal+RNAalifold on various RNA families. The datasets are unaligned, with average pairwise identities ranging from 30% to 100%. The sequence alignments produced by MASTR are consistently better than or comparable to those of all other methods, MASTR runs significantly faster than FoldalignM and RNA Sampler, and it can handle larger datasets than either of these programs. RNA Sampler is best on datasets with identities between 30% and 45%, but MASTR is better than all other programs tested from 45% ID up to 80% ID, beyond which the structure predictions deteriorate due to the poor covariation signal.
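Since the results are grouped by average pairwise identity, a short sketch of how that quantity can be computed from a reference alignment may help. Definitions of pairwise identity vary; the version below, which divides the number of identical residue pairs by the number of columns in which at least one sequence has a residue, is an assumption and not necessarily the definition used for the datasets above. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where at least one
    of the two aligned sequences has a residue (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # columns that are gaps in both rows are ignored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def average_pairwise_identity(alignment):
    """Mean of pairwise_identity over all unordered pairs of rows."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    reference = ["ACGU", "ACGA", "ACGU"]
    print(f"{100 * average_pairwise_identity(reference):.1f}% ID")
```

Under this definition a column that pairs a residue with a gap counts as a mismatch, so heavily gapped alignments yield lower identities than a definition that skips such columns.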

References

[1] Sankoff D: Simultaneous solution of the RNA folding, alignment, and protosequence problems. SIAM J Appl Math 1985, 45:810–825. 10.1137/0145048
[2] Havgaard JH, Lyngsø RB, Stormo GD, Gorodkin J: Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%. Bioinformatics 2005, 21(9):1815–1824. 10.1093/bioinformatics/bti279
[3] Mathews DH, Turner DH: Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol 2002, 317(2):191–203. 10.1006/jmbi.2001.5351
[4] Torarinsson E, Havgaard JH, Gorodkin J: Multiple structural alignment and clustering of RNA sequences. Bioinformatics 2007, 23(8):926–932. 10.1093/bioinformatics/btm049
[5] Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R: Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput Biol 2007, 3(4):e65. 10.1371/journal.pcbi.0030065
[6] Xu X, Ji Y, Stormo GD: RNA Sampler: A new sampling based algorithm for common RNA secondary structure prediction and structural alignment. Bioinformatics 2007, in press.

Author information

Authors and Affiliations

Bioinformatics Centre, Institute of Molecular Biology, University of Copenhagen, Denmark
Stinus Lindgreen & Anders Krogh
Wellcome Trust Sanger Institute, Cambridge, UK
Paul P Gardner

Corresponding author

Correspondence to Stinus Lindgreen.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Lindgreen, S., Gardner, P.P. & Krogh, A. Multiple alignment and structure prediction of non-coding RNA sequences. BMC Bioinformatics 8 (Suppl 8), P8 (2007). https://doi.org/10.1186/1471-2105-8-S8-P8
