Open Access Research article

The metazoan history of the COE transcription factors. Selection of a variant HLH motif by mandatory inclusion of a duplicated exon in vertebrates

Virginie Daburon1, Sébastien Mella1,2, Jean-Louis Plouhinec3,4, Sylvie Mazan3, Michèle Crozatier1 and Alain Vincent1*

  • * Corresponding author: Alain Vincent

  • † Equal contributors

Author Affiliations

1 Centre de Biologie du Développement, UMR 5547 and IFR 109 CNRS/UPS, 118 route de Narbonne 31062 Toulouse cedex 4, France

2 MRC Human Genetics Unit, Western General Hospital, Edinburgh EH4 2XU, UK

3 Développement et Evolution des vertébrés, UMR 6218, 3b rue de la Ferollerie 45071 ORLEANS cedex 2, France

4 Howard Hughes Medical Institute and Department of Biological Chemistry, University of California, Los Angeles, CA 90095-1662, USA

BMC Evolutionary Biology 2008, 8:131  doi:10.1186/1471-2148-8-131

Published: 2 May 2008



The increasing number of available genomic sequences now makes it possible to study the evolutionary history of specific genes or gene families. Transcription factors (TFs) involved in the regulation of gene-specific expression are key players in the evolution of metazoan development. The low-complexity COE (Collier/Olfactory-1/Early B-Cell Factor) family of transcription factors is a well-suited paradigm for studying the evolution of TF structure and function, including the specific question of protein modularity. Here, we compare the structure of coe genes within the metazoan kingdom and report on the mechanism behind a vertebrate-specific exon duplication.


COE proteins display a modular organisation, with three highly conserved domains: a COE-specific DNA-binding domain (DBD), an Immunoglobulin/Plexin/Transcription (IPT) domain and an atypical Helix-Loop-Helix (HLH) motif. Comparison of the splice structure of coe genes between cnidarians and bilaterians shows that the ancestral COE DBD was built from 7 separate exons, with no evidence for exon shuffling with other metazoan gene families. It also confirms the presence of an ancestral H1LH2 motif in all COE proteins, which partly overlaps the repeated H2d-H2a motif first identified in rodent EBF. Electrophoretic Mobility Shift Assays show that formation of COE dimers is mediated by this ancestral motif. The H2d-H2a α-helical repetition appears to be a vertebrate characteristic that originated from a tandem exon duplication that took place before the split between gnathostomes and cyclostomes. We put forward a two-step model for the inclusion of this exon in the vertebrate transcripts.


Three main features in the history of the coe gene family can be inferred from these analyses: (i) Each conserved domain of the ancestral coe gene was built from multiple exons, and the same scattered structure has been maintained throughout metazoan evolution. (ii) There is a single coe gene copy per metazoan genome, except in vertebrates. The H2d-H2a duplication that is specific to vertebrate proteins provides an example of a novel vertebrate characteristic, which may have been fixed early in the gnathostome lineage. (iii) This duplication provides an interesting example of counter-selection of alternative splicing.