This article is part of the supplement: Highlights from the Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB).

Multiple alignment and structure prediction of non-coding RNA sequences

1 Bioinformatics Centre, Institute of Molecular Biology, University of Copenhagen, Denmark
2 Wellcome Trust Sanger Institute, Cambridge, UK

BMC Bioinformatics 2007, 8(Suppl 8):P8 doi:10.1186/1471-2105-8-S8-P8

The electronic version of this abstract is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/8/S8/P8
© 2007 Lindgreen et al; licensee BioMed Central Ltd.

Background

As the importance of non-coding RNAs (ncRNAs) becomes more evident, the need for computational methods for analysing them grows. Predicting the secondary structure is of great importance, and combining this with multiple alignment yields a useful tool for researchers. The exact solution to the problem of simultaneous multiple alignment and structure prediction for RNA sequences was described by Sankoff [1], but to date only pairwise implementations (e.g. Foldalign [2], Dynalign [3]) or heuristics for multiple sequences (e.g. FoldalignM [4], LocARNA [5], RNA Sampler [6]) exist.

Methods

We present a novel approach to the problem: using Markov chain Monte Carlo in a simulated annealing framework, we sample multiple alignments and secondary structures. The sampling is based on a scoring system that combines a sequence measure with a structure measure: the sequence alignment is scored using the log-likelihood, and the structure is scored using basepair probabilities and a covariation term. The sampling procedure itself uses simple local moves to optimize the solution. Each move acts on either the sequence alignment or the predicted structure. The input to the program can be unaligned sequences or an alignment obtained using e.g. Clustal. The structure can be constrained by indicating e.g. basepairs or unpaired positions in one of the sequences. The program, MASTR (Multiple Alignment of STructural RNAs), is implemented in C++.

Results

MASTR is compared to LocARNA, FoldalignM, RNA Sampler and Clustal+RNAalifold on various RNA families. The datasets are unaligned, with average pairwise identities ranging from 30% to 100%. The sequence alignments produced by MASTR are consistently better than or comparable to those of all other methods, MASTR runs significantly faster than FoldalignM and RNA Sampler, and it can handle larger datasets than both of these programs.

RNA Sampler is best on datasets with identities between 30% and 45%, but MASTR is better than all other programs tested from 45% up to 80% identity, at which point the structure predictions deteriorate because of the poor covariation signal.
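
The sampling scheme described under Methods can be illustrated with a minimal Python sketch. This is not the MASTR implementation (which is written in C++) and does not reproduce its scoring: `sequence_score` and `structure_score` are toy stand-ins for the log-likelihood and the basepair-probability/covariation terms, and every name, weight and cooling parameter below is an assumption made for illustration. The sketch only shows the general pattern: local moves on either the alignment or the structure, accepted by a Metropolis rule while the temperature is lowered.

```python
import math
import random

# Canonical and wobble pairs used by the toy structure term (assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pad(seqs):
    """Turn unaligned sequences into a trivial alignment by adding trailing gaps."""
    width = max(len(s) for s in seqs)
    return [s + "-" * (width - len(s)) for s in seqs]

def sequence_score(aln):
    """Toy sequence term: number of identical residue pairs in each column."""
    total = 0
    for col in zip(*aln):
        res = [c for c in col if c != "-"]
        total += sum(a == b for i, a in enumerate(res) for b in res[i + 1:])
    return total

def structure_score(aln, pairs):
    """Toy structure term: +1 per sequence that can pair at (i, j), else -1."""
    return sum(1 if (s[i], s[j]) in PAIRS else -1 for i, j in pairs for s in aln)

def score(state, weight=1.0):
    aln, pairs = state
    return sequence_score(aln) + weight * structure_score(aln, pairs)

def move_gap(aln, rng):
    """Alignment move: swap one gap with a neighbouring column in one sequence."""
    aln = list(aln)
    k = rng.randrange(len(aln))
    row = list(aln[k])
    gaps = [p for p, c in enumerate(row) if c == "-"]
    if gaps:
        p = rng.choice(gaps)
        q = p + rng.choice((-1, 1))
        if 0 <= q < len(row):
            row[p], row[q] = row[q], row[p]
    aln[k] = "".join(row)
    return aln

def move_pair(pairs, length, rng):
    """Structure move: toggle one basepair, keeping the structure nested."""
    pairs = set(pairs)
    i, j = sorted(rng.sample(range(length), 2))
    if (i, j) in pairs:
        pairs.remove((i, j))
    elif j - i > 3 and all(
        i not in (a, b) and j not in (a, b)           # each column pairs at most once
        and not (a < i < b < j or i < a < j < b)      # no crossing pairs
        for a, b in pairs
    ):
        pairs.add((i, j))
    return pairs

def anneal(aln, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    state = (list(aln), set())
    current = score(state)
    best, best_score = state, current
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        cur_aln, cur_pairs = state
        if rng.random() < 0.5:
            candidate = (move_gap(cur_aln, rng), cur_pairs)
        else:
            candidate = (cur_aln, move_pair(cur_pairs, len(cur_aln[0]), rng))
        cand_score = score(candidate)
        delta = cand_score - current
        # Always accept improvements; accept worse states with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            state, current = candidate, cand_score
            if current > best_score:
                best, best_score = state, current
    return best, best_score
```

Calling `anneal(pad(["GGGAAAUCCC", "GGGAAACCC", "GGCAAAGCC"]))` returns the best-scoring alignment and set of paired columns visited during the run, together with its score. A real scoring system would replace the two toy terms, and structural constraints could be imposed by rejecting moves that violate them.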