Email updates

Keep up to date with the latest news and content from BMC Biology and BioMed Central.

Journal App

google play app store
Open Access Open Badges Research article

Multiple, non-allelic, intein-coding sequences in eukaryotic RNA polymerase genes

Timothy JD Goodwin, Margaret I Butler* and Russell TM Poulter

Author Affiliations

Department of Biochemistry, University of Otago, P.O. Box 56, Dunedin, New Zealand

For all author emails, please log on.

BMC Biology 2006, 4:38  doi:10.1186/1741-7007-4-38

Published: 27 October 2006



Inteins are self-splicing protein elements. They are translated as inserts within host proteins that excise themselves and ligate the flanking portions of the host protein (exteins) with a peptide bond. They are encoded as in-frame insertions within the genes for the host proteins. Inteins are found in all three domains of life and in viruses, but have a very sporadic distribution. Only a small number of intein coding sequences have been identified in eukaryotic nuclear genes, and all of these are from ascomycete or basidiomycete fungi.


We identified seven intein coding sequences within nuclear genes coding for the second largest subunits of RNA polymerase. These sequences were found in diverse eukaryotes: one is in the second largest subunit of RNA polymerase I (RPA2) from the ascomycete fungus Phaeosphaeria nodorum, one is in the RNA polymerase III (RPC2) of the slime mould Dictyostelium discoideum and four intein coding sequences are in RNA polymerase II genes (RPB2), one each from the green alga Chlamydomonas reinhardtii, the zygomycete fungus Spiromyces aspiralis and the chytrid fungi Batrachochytrium dendrobatidis and Coelomomyces stegomyiae. The remaining intein coding sequence is in a viral relic embedded within the genome of the oomycete Phytophthora ramorum. The Chlamydomonas and Dictyostelium inteins are the first nuclear-encoded inteins found outside of the fungi.

These new inteins represent a unique dataset: they are found in homologous proteins that form a paralogous group. Although these paralogues diverged early in eukaryotic evolution, their sequences can be aligned over most of their length. The inteins are inserted at multiple distinct sites, each of which corresponds to a highly conserved region of RNA polymerase. This dataset supports earlier work suggesting that inteins preferentially occur in highly conserved regions of their host proteins.


The identification of these new inteins increases the known host range of intein sequences in eukaryotes, and provides fresh insights into their origins and evolution. We conclude that inteins are ancient eukaryote elements once found widely among microbial eukaryotes. They persist as rarities in the genomes of a sporadic array of microorganisms, occupying highly conserved sites in diverse proteins.