Mobilomics in Saccharomyces cerevisiae strains
1 Istituto Nazionale di Alta Matematica, Città Universitaria, Roma, Italia
2 Dipartimento di Informatica, Università di Pisa, Pisa, Italia
3 Dipartimento di Biologia, Università di Pisa, Pisa, Italia
4 CNR–Istituto di Biofisica, Italia
5 LIACS – Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands
BMC Bioinformatics 2013, 14:102 doi:10.1186/1471-2105-14-102
Published: 20 March 2013
Mobile Genetic Elements (MGEs) are selfish DNA sequences integrated into genomes. Their detection relies mainly on consensus-like searches, in which the genome under investigation is scanned against the sequence of an already identified MGE. Mobilomics aims at discovering all the MGEs in a genome and understanding their dynamic behavior; the data for this kind of investigation can be provided by comparative genomics of closely related organisms. The amount of data involved demands a substantial computational effort, which should be reduced.
Our approach exploits the high similarity among homologous chromosomes of different strains of the same species, following a progressive comparative genomics philosophy. We introduce a software tool based on our new fast algorithm, called REGENDER, which identifies the conserved regions between chromosomes. Our case study is a unique, recently available dataset of 39 different strains of S. cerevisiae, which REGENDER is able to compare in a few minutes. We explore the non-conserved regions, where MGEs are mainly retrotransposons called Tys, and mark candidate Tys based on their length. In this way we are able to locate, a priori and automatically, all the already known Tys and to map all the putative Tys in all the strains. The remaining putative mobile elements (PMEs) emerging from this intra-specific comparison are sharp markers of inter-specific evolution: indeed, many events of non-conservation among different yeast strains correspond to PMEs. A clustering based on the presence/absence of the candidate Tys in the strains suggests an evolutionary interconnection very similar to classic phylogenetic trees based on SNP analysis, even though it is computed without using phylogenetic information.
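As an illustration only, the length-based marking of candidates and the presence/absence comparison of strains might be sketched as follows. This is not the REGENDER algorithm: the function names, the length window of 5000 to 7000 bp, and the use of the Jaccard distance are all assumptions made for the example, and the conserved intervals are taken as given input.

```python
from itertools import combinations

# Hypothetical length window (in bp) for marking a gap as a Ty candidate.
# These thresholds are an assumption for illustration, not values from the paper.
TY_MIN_LEN, TY_MAX_LEN = 5000, 7000


def non_conserved_gaps(conserved, chrom_len):
    """Return the (start, end) gaps left between conserved intervals."""
    gaps, pos = [], 0
    for start, end in sorted(conserved):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < chrom_len:
        gaps.append((pos, chrom_len))
    return gaps


def candidate_tys(gaps, min_len=TY_MIN_LEN, max_len=TY_MAX_LEN):
    """Keep only the gaps whose length falls inside the candidate window."""
    return [(s, e) for s, e in gaps if min_len <= e - s <= max_len]


def jaccard_distance(a, b):
    """Distance between two strains given their sets of candidate identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(presence):
    """Pairwise distances between strains from a strain -> candidate-set mapping."""
    return {
        (x, y): jaccard_distance(presence[x], presence[y])
        for x, y in combinations(sorted(presence), 2)
    }


# Toy data: two conserved blocks on a 10 kb chromosome leave one 6 kb gap.
gaps = non_conserved_gaps([(0, 1000), (7000, 9000)], 10000)
candidates = candidate_tys(gaps)

# Toy presence/absence profiles for three hypothetical strains.
matrix = distance_matrix({"S1": {"t1", "t2"}, "S2": {"t1", "t2"}, "S3": {"t3"}})
```

The resulting pairwise distances could then be passed to any standard hierarchical clustering routine to obtain a tree-like grouping of the strains.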
The case study indicates that the proposed methodology brings two major advantages: (a) it does not require any template sequence for the MGEs being sought and (b) it can also be applied to infer MGEs in low-coverage genomes with unresolved bases, where traditional approaches are largely ineffective.