Potential pitfalls of modelling ribosomal RNA data in phylogenetic tree reconstruction: Evidence from case studies in the Metazoa
1 Zoologisches Forschungsmuseum Alexander Koenig, Zentrum für molekulare Biodiversitätsforschung, Adenauerallee 160, 53113 Bonn, Germany
2 Rutgers University, Department of Ecology Evolution and Natural Resources Faculty, 14 College Farm Rd., New Brunswick, NJ 08901, USA
BMC Evolutionary Biology 2011, 11:146 doi:10.1186/1471-2148-11-146Published: 27 May 2011
Failure to account for covariation patterns in helical regions of ribosomal RNA (rRNA) genes has the potential to misdirect the estimation of the phylogenetic signal of the data. Furthermore, the extremes of length variation among taxa, combined with regional substitution rate variation can mislead the alignment of rRNA sequences and thus distort subsequent tree reconstructions. However, recent developments in phylogenetic methodology now allow a comprehensive integration of secondary structures in alignment and tree reconstruction analyses based on rRNA sequences, which has been shown to correct some of these problems. Here, we explore the potentials of RNA substitution models and the interactions of specific model setups with the inherent pattern of covariation in rRNA stems and substitution rate variation among loop regions.
We found an explicit impact of RNA substitution models on tree reconstruction analyses. The application of specific RNA models in tree reconstructions is hampered by interaction between the appropriate modelling of covarying sites in stem regions, and excessive homoplasy in some loop regions. RNA models often failed to recover reasonable trees when single-stranded regions are excessively homoplastic, because these regions contribute a greater proportion of the data when covarying sites are essentially downweighted. In this context, the RNA6A model outperformed all other models, including the more parametrized RNA7 and RNA16 models.
Our results depict a trade-off between increased accuracy in estimation of interdependencies in helical regions with the risk of magnifying positions lacking phylogenetic signal. We can therefore conclude that caution is warranted when applying rRNA covariation models, and suggest that loop regions be independently screened for phylogenetic signal, and eliminated when they are indistinguishable from random noise. In addition to covariation and homoplasy, other factors, like non-stationarity of substitution rates and base compositional heterogeneity, can disrupt the signal of ribosomal RNA data. All these factors dictate sophisticated estimation of evolutionary pattern in rRNA data, just as other molecular data require similarly complicated (but different) corrections.