
Inventory of the cichlid olfactory receptor gene repertoires: identification of olfactory genes with more than one coding exon

Abstract

Background

To help understand the molecular mechanisms underlying the remarkable phenotypic diversity displayed by cichlids, the genome sequences of O. niloticus, P. nyererei, H. burtoni, N. brichardi and M. zebra were recently determined. Here, we present the contents of the olfactory receptor (OR) repertoires in the genomes of these five fishes.

Results

We performed an exhaustive TBLASTN search of the five cichlid genomes to identify their OR repertoires as completely as possible, using as bait a set of ORs described in the literature. The cichlid repertoires thereby extracted contained large numbers of complete genes (O. niloticus 158; H. burtoni 90; M. zebra 102; N. brichardi 69; P. nyererei 88), small numbers of pseudogenes and many “edge genes”, i.e. incomplete genes located at the ends of contigs. A phylogenetic tree showed that these repertoires include a large number of families and subfamilies. It also allowed the identification of a large number of OR analogues between cichlids with very high amino-acid identity (≥99%). Nearly 9% of the full-length cichlid OR genes are composed of several coding exons, which is very unusual for vertebrate OR genes. Nevertheless, the evidence is strong: it includes the donor and acceptor splice junction sequences, and the positions of these genes in the phylogenetic tree indicate that they constitute subfamilies well apart from non-OR G protein-coupled receptor families.

Conclusions

Cichlid OR repertoires are made up of more genes and fewer pseudogenes than those of other teleosts, except zebrafish. These ORs share all the identified properties common to fish ORs; however, the large number of families and subfamilies, each containing few ORs, implies that they have evolved more rapidly. This high level of OR diversity is consistent with the substantial phenotypic diversity that characterizes cichlids.

Background

With more than 2,000 species, the cichlid family is one of the largest fish families. Its members occupy a wide range of ecological niches, with a remarkable concentration of species in the great African lakes [1, 2]. Consequently, they constitute a good model for studying evolution and adaptation. In addition, the Nile tilapia, O. niloticus, is the second most economically important fish in aquaculture [3]. The complete nucleotide sequences of five cichlid genomes have recently been determined: O. niloticus, P. nyererei, H. burtoni, N. brichardi and M. zebra [4].

All animal species, whatever their ecological niches, have sophisticated systems to sense the outside world for diverse purposes: to avoid attack by predators, to find food and to select appropriate partners to mate and reproduce. Several of these biological systems are based on the detection of volatile and soluble odorant molecules and involve olfactory receptors (ORs), the first components of these systems to be identified [5]. ORs are G protein-coupled receptors (GPCRs) [6, 7]. They are found in the ciliary membrane of olfactory sensory neurons (OSNs) [8–10], which are embedded in the olfactory epithelium. The family of genes encoding ORs is the largest known gene family, with approximately 100 members identified in the genomes of insects and up to around 1,000 in mammals [11–13].

Given the importance of the olfactory system in behaviour, it is believed to play an important role in shaping species evolution [14–16]. We therefore set out to identify the complete OR gene repertoires of five members of the cichlid family: O. niloticus, P. nyererei, H. burtoni, N. brichardi and M. zebra. These species are potentially good models for evolutionary studies, and their genomes were recently sequenced [4].

Results and discussion

Cichlid OR repertoires

A comprehensive search of the genome sequences of five fishes belonging to the cichlid family (Oreochromis niloticus, Pundamilia nyererei, Haplochromis (Astatotilapia) burtoni, Neolamprologus brichardi, Metriaclima zebra) was undertaken to identify their OR gene repertoires. First, we retrieved 183 fish OR sequences from the literature [17, 18] to construct a query set for TBLASTN searches of each cichlid genome sequence determined by the BROAD Institute. This search, performed with an e-value cut-off of 1e−50, identified 820 candidate OR genes distributed over 733 contigs. These candidate genes were checked by TBLASTN against a set of 247 non-OR GPCRs (Additional file 1) to eliminate false positives. The remaining candidate genes were checked with BLASTX against the fish protein database (NCBI, taxID: 7898).

Table 1 shows the number of genes identified in each of the five cichlid genomes, as well as those of five model fish retrieved from the literature [17, 18] and from the GenBank and ENSEMBL databases, after manual curation as part of this study. Their nucleotide and amino-acid (AA) sequences and positions in the genome are provided as supplementary material (Additional files 2 and 3). In addition to complete and potentially functional genes, we identified a number of pseudogenes, edge genes and gene fragments. Pseudogenes are common to all olfactory repertoires [12, 13, 17–19]. OR pseudogenes, which are not retrogenes, arose by gene duplication, and their prevalence in vertebrate genomes is thought to be a consequence of both gene duplication and nucleotide misincorporation during DNA replication. They appear to be less numerous in the cichlid olfactory repertoires than in those of other fish, except zebrafish [17, 18]. Most (33/54) of the pseudogenes we found in the cichlid genomes are due to frameshift mutations, whereas the others are due to a nucleotide misincorporation changing a sense codon into a stop codon (Table 2). The distribution of pseudogenes appears to be largely random: they are found in many different subfamilies (22 out of 57, see Table 3), whatever their size. Gene fragments are sequences with substantial similarity to a restricted part of a functional gene. They have been identified in many complete genome sequences [20, 21]. Their significance, if any, is unknown. They may correspond to “dead” genes or, more prosaically, be artefacts generated by sequencing problems. We also identified another type of gene fragment, located at the edges of contigs and corresponding to either the 5’ or the 3’ end of an OR. As such, these potentially correspond to actual OR genes whose sequences were interrupted by the fragmentation of the genome assembly into many contigs. If these genome sequences were completed, or if the mean contig size (currently around 10 kb) were much longer, we suspect that these edge genes would become complete genes, pseudogenes or gene fragments.
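A back-of-the-envelope estimate makes this expectation concrete. It relies on an idealized model that was not part of the analysis: contig ends are treated as random breakpoints with a mean spacing equal to the mean contig size $L$, and an OR locus occupies $g$ bases (about 1 kb for a receptor of 300–330 AA). The probability that at least one breakpoint falls inside a locus is then

```latex
P(\text{locus split across contigs}) = 1 - e^{-g/L} \approx \frac{g}{L},
\qquad
g \approx 1\ \text{kb},\; L \approx 10\ \text{kb}
\;\Rightarrow\;
P \approx 1 - e^{-0.1} \approx 0.095 .
```

Under these assumptions, roughly one OR locus in ten would be expected to straddle a contig boundary and appear as one or two edge genes, and this fraction falls in proportion as contigs become longer.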

Table 1 OR genes identified in the five cichlid and five fish model genomes
Table 2 Distribution of pseudogenes in the five cichlids
Table 3 Distribution of OR into families and subfamilies

We used MAFFT [22] and PHYML [23] to align the OR AA sequences and construct a phylogenetic tree with the OR repertoires of the five cichlids and the 376 known OR AA sequences identified in the genomes of zebrafish (Danio rerio), medaka (Oryzias latipes), stickleback (Gasterosteus aculeatus), takifugu (Takifugu rubripes) and tetraodon (Tetraodon nigroviridis) (Table 1 and Additional files 2 and 4). OR repertoires are usually classified into classes, families and subfamilies according to the percentage of AA identity shared by the different ORs. In this study, we used the 40% and 60% AA identity thresholds proposed by Glusman et al. [19]. Each cichlid repertoire contains ORs from 24 to 26 families and 47 to 51 subfamilies. There are, however, only 56 subfamilies in total, indicating that most subfamilies are common to the five cichlids (Figure 1, Additional files 2 and 4, Table 3). Compared with the cichlid OR repertoires, four of the model fish repertoires (stickleback, medaka, fugu and tetraodon) have far fewer subfamilies (25 to 33), grouped into 12 to 18 families. However, the zebrafish OR repertoire appears more similar to the cichlid repertoires, with 37 subfamilies and 22 families. Thus, among the 507 complete cichlid sequences and the 376 complete model fish sequences, there are 111 subfamilies in all; only 37 of these subfamilies contain both cichlid (n = 347) and model fish (n = 182) sequences (Figure 1, Additional file 2 and Table 3). This limited subfamily overlap between the cichlid and the model fish sequences suggests substantial divergence between the cichlid and other teleost repertoires. Of interest regarding the evolution of the Nile tilapia and the lake cichlids is the existence of a number of pairs, triplets and even quadruplets of genes sharing 99% or more amino-acid sequence identity (Table 4 and Additional file 5). The large number of OR genes sharing 99% AA identity between H. burtoni, P. nyererei and M. zebra (roughly 50% of each repertoire) is in agreement with their close phylogenetic positions [24]. By contrast, we found fewer paralogous OR pairs, except in tilapia, for which we identified 7 pairs, 1 triplet and 2 quadruplets of genes with an AA identity above 99%. This last finding is consistent with the larger number of ORs present in this species. This observation is in agreement with Nikaido et al. [25] who, by analyzing the expansion of vomeronasal type 2 receptor-like (OlfC) genes in cichlids, noted that recently duplicated paralogs are more variable than orthologs.
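Both the family/subfamily thresholds and the ≥99% gene groups rest on pairwise AA identity. The sketch below illustrates these ideas under stated assumptions and is not the procedure used here, where families were delineated on the PHYML tree. Identity is computed over aligned columns where neither sequence has a gap (one of several conventions), and groups are formed by single linkage, so any two sequences at or above the threshold end up in the same group. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences taken from the same alignment.
    Columns where either sequence has a gap ('-') are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def single_linkage_groups(aligned: dict, threshold: float) -> list:
    """Union-find over all pairs: sequences linked by >= threshold % identity
    (directly or through intermediates) end up in the same group."""
    parent = {name: name for name in aligned}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def near_identical_pairs(aligned: dict, threshold: float = 99.0) -> list:
    """Pairs sharing at least `threshold` % identity (cf. Table 4)."""
    return [(a, b) for a, b in combinations(sorted(aligned), 2)
            if percent_identity(aligned[a], aligned[b]) >= threshold]


# Toy, already-aligned sequences (hypothetical, not real cichlid ORs).
toy = {
    "OR1": "MAYDRYVAIC",
    "OR2": "MAYDRYVAIC",
    "OR3": "MAYERYVAIS",
    "OR4": "LSTFKWGGNP",
}
families = single_linkage_groups(toy, 40.0)     # family-level threshold
subfamilies = single_linkage_groups(toy, 60.0)  # subfamily-level threshold
```

Single linkage can chain moderately similar sequences into one group, which is one reason to check any such grouping against the phylogenetic tree.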

Figure 1

Phylogenetic tree for the cichlid and model fish ORs (see also Table 1 and Additional file 2). Family names A to AN are alternately coloured in red and blue, and subfamilies, designated by Arabic numerals, are similarly coloured in green and purple.

Table 4 Distribution of OR gene pairs, triplets and quadruplets sharing high levels of nucleotide and AA identity

Evolution of the dN/dS ratio

The dN/dS ratio (also called Ka/Ks) is commonly used to measure the selective pressure exerted on genes during evolution. We used the Nei-Gojobori method modified by Zhang et al. [26] to calculate this ratio for each pair of OR genes from the 14 cichlid OR families containing four or more genes. The mean dN/dS values for these families range from 0.28 for family G, which includes only one subfamily, to 0.50 for family L, which comprises two subfamilies (Table 5a and Additional file 6). These values are clearly above the mean value of 0.11 calculated for 1,880 human-rodent orthologous gene pairs [27] and similar to the values obtained for medaka and stickleback ORs [18]. Although below 1, the theoretical boundary between purifying and positive selection, the values obtained indicate a tendency towards positive selection favouring OR repertoire diversification, as previously noted for other fishes [17, 28] and mammals [29, 30]. However, it is important to note that the different OR pairs behaved very differently. As detailed in Additional file 6, we identified a number of OR gene pairs with only synonymous mutations, as in families A, H and W, and OR pairs with only non-synonymous mutations, as in families A, I and K. Table 5b displays the number of OR pairs with dN/dS ratios above 1. Interestingly, intra-species dN/dS ratios (paralogous comparisons) are similar to inter-species ratios (orthologous comparisons), as indicated by a ratio between the two close to 1, suggesting a similar evolution of the five cichlid OR repertoires (Table 5c).
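To make the measure concrete, the following minimal sketch implements the original Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. It is not the modified version of Zhang et al. [26] used in this study, and its conventions are assumptions stated in the comments: standard genetic code, changes to stop codons counted as non-synonymous sites, mutational pathways through stop codons skipped, and gapped, ambiguous or stop codons ignored.

```python
import math
from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_sites(codon):
    """Synonymous and non-synonymous sites of one codon. Each synonymous
    single-nucleotide neighbour adds 1/3 of a synonymous site; changes to a
    stop codon are counted as non-synonymous (an assumed convention)."""
    syn = 0.0
    for pos in range(3):
        for nt in "ACGT":
            if nt != codon[pos]:
                mutant = codon[:pos] + nt + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3 - syn


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences between two codons, averaged
    over all mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                valid = False  # pathway passes through a stop codon
                break
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; infinite when p >= 0.75 (saturation)."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def ng86_dn_ds(seq1, seq2):
    """Return (dN, dS, omega) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame and equal in length")
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gapped/ambiguous codons and stop codons
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2
        sites_n += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        diff_s += sd
        diff_n += nd
    d_s = jukes_cantor(diff_s / sites_s) if sites_s else 0.0
    d_n = jukes_cantor(diff_n / sites_n) if sites_n else 0.0
    omega = d_n / d_s if d_s > 0 else math.nan  # undefined without synonymous change
    return d_n, d_s, omega
```

Applied to every pair of codon-aligned OR coding sequences within a family, the ω values could then be averaged per family. Pairs with only non-synonymous differences return NaN (dS = 0) and pairs with only synonymous differences return 0, so such cases need to be reported separately rather than silently averaged.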

Table 5 dN/dS ratios for the various OR gene pairs identified in 14 families

We also calculated the dN/dS ratio of the different OR protein domains (TM regions, internal and external loops) for five families (D, E, H, L and N) selected for their high number of genes. As shown in Table 6 and Additional file 7a to f, the dN/dS values are highly variable along the different parts of the molecules with the TM regions having a tendency to be higher, although this is not always the case (see TM 6 and TM7 of family E). On the other hand, no clear tendency can be drawn for the dN/dS ratio of the internal and external loops, although one should note that the standard deviations are very high in all cases, indicating that the various OR pairs behaved differently.
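Computing such domain-wise ratios requires cutting each codon alignment at the domain boundaries. The sketch below is a hypothetical helper, not the authors' scripts. It assumes that each pair is already codon-aligned with gaps written as '---', that domain boundaries are 1-based residue positions predicted on the first sequence (for example with TMHMM), and that each sub-alignment is then passed to whatever dN/dS routine is in use. The summary function drops undefined (NaN) or infinite ratios before computing the mean and standard deviation, which is only one possible way of handling pairs lacking synonymous or non-synonymous differences.

```python
import math
import statistics


def domain_bounds(aligned_nt: str, start: int, end: int) -> tuple:
    """Map 1-based, inclusive residue positions of the ungapped protein onto
    half-open nucleotide bounds [lo, hi) in its codon alignment ('---' = gap)."""
    residue, lo, hi = 0, None, None
    for col in range(0, len(aligned_nt), 3):
        if aligned_nt[col:col + 3] == "---":
            continue
        residue += 1
        if residue == start:
            lo = col
        if residue == end:
            hi = col + 3
            break
    if lo is None or hi is None:
        raise ValueError("domain lies outside the sequence")
    return lo, hi


def split_by_domain(seq_a: str, seq_b: str, domains: dict) -> dict:
    """Cut two codon-aligned sequences into per-domain sub-alignments, using
    domain coordinates predicted on seq_a (e.g. TM1, IC1, EC1, ...)."""
    parts = {}
    for name, (start, end) in domains.items():
        lo, hi = domain_bounds(seq_a, start, end)
        parts[name] = (seq_a[lo:hi], seq_b[lo:hi])
    return parts


def summarize(ratios) -> tuple:
    """Mean and sample standard deviation of the finite dN/dS values;
    NaN (no synonymous change) and infinite values are dropped."""
    finite = [r for r in ratios if math.isfinite(r)]
    if len(finite) < 2:
        raise ValueError("need at least two finite ratios")
    return statistics.mean(finite), statistics.stdev(finite)


# Hypothetical example: two short codon-aligned sequences and two "domains".
a = "ATGGCT---TACGAT"
b = "ATGGCAGGGTACGAC"
parts = split_by_domain(a, b, {"Nterm": (1, 2), "loop": (3, 4)})
```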

Table 6 dN/dS ratios for various OR protein domains

Conserved amino-acid motifs

ORs are GPCRs that belong to the rhodopsin-like (class A) family. They are characterized by a number of AA patterns, of which the MAYDRY motif in internal loop 2 is the most characteristic. These patterns have often been used for mining whole genome sequences for OR identification [12]. We used the MEME program [31] to search for the five best motifs in each of the five cichlids and D. rerio. The first four motifs identified for each fish are very similar in both sequence and position between fishes (Figure 2). They are also not very different from those identified in mammalian ORs, despite the substantial evolutionary distance from any common ancestor [13, 14, 32, 33]. However, no clear fifth pattern was identified for the fish ORs. ORs are transmembrane proteins. Although no signal peptide has been identified in their sequences using PolyPhobius [34], an N-glycosylation site, Asn-X-Ser/Thr, has been detected in all ORs identified to date [18]. The OR protein sequences identified in the five cichlid genomes were inspected for the presence of such putative glycosylation sites using the NetNGlyc server [35]. For each of them, one to several putative sites were proposed (Additional file 8). Interestingly, in all cases, a site with a very high score was present close to the extracellular N-terminus, in agreement with the importance of glycosylation in this region for correct OR expression and membrane trafficking [36]. OR proteins are also characterized by cysteine residues located at fixed positions, in particular in extracellular loops 1 and 2 (EC1 and EC2), as observed for all D. rerio [18] and mouse OR genes [32, 37]. We observed a similar situation for all complete cichlid receptors identified in this study. However, we noted the existence of two subgroups of ORs: one with only one cysteine residue in EC2 and one with three cysteine residues. Interestingly, these two groups have slightly different MAYDRY motifs, with an E replacing D in the subgroup with one cysteine residue (Figure 3 and Additional file 9). The two groups also differ in motif 2, located in the cytoplasmic C-terminal extremity. Whether these differences affect the recognition and binding of the G alpha subunit and signal transduction remains an open question [38].
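The two sequence features discussed here can be screened for with simple pattern matching. The following sketch is only an illustration under stated assumptions: it lists every Asn-X-Ser/Thr sequon (X not Pro) but does not reproduce NetNGlyc scoring, and it matches the MAYDRY motif and its E variant strictly, whereas real OR sequences show further substitutions at these positions. The function names are hypothetical.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# A zero-width lookahead lets overlapping sequons (e.g. "NNTT") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")
# Strict MAYDRY / MAYERY match only (assumption; real ORs vary more).
MAYDRY = re.compile(r"MAY([DE])RY")


def sequon_positions(protein: str) -> list:
    """1-based positions of the Asn in every N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def maydry_variant(protein: str):
    """('D' or 'E', 1-based start) for the first MAY[DE]RY match, else None."""
    m = MAYDRY.search(protein.upper())
    return (m.group(1), m.start() + 1) if m else None
```

For example, `sequon_positions("MENGSTNPSANNTT")` reports positions 3, 11 and 12; the NPS triplet at position 7 is skipped because of the proline.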

Figure 2

WebLogo graphical representation of the 5 most significant motifs identified by MEME in cichlid and D. rerio OR repertoires and located at particular positions: motif 1: internal loop 2, motif 2: internal loop 1, motif 3: TM7-intracellular extension and motif 4: internal loop 3. Motif 5 is not well conserved and its position differs between fish species.

Figure 3

WebLogo representation of the AA conservation around the MAYDRY motif. Multiple alignment with MAFFT followed by PHYML clustering revealed two subgroups of cichlid OR: one with a classical MAYDRY motif followed by 3 cysteine residues indicated by a blue star; and a second with an altered MAYDRY motif in which the aspartate residue (D) is replaced by a glutamate residue (E). See Additional file 9 for the genes of each of these two groups and a complete alignment of their sequences.

Moreover, inspection of the AA alignments (Additional file 9) made it possible to identify a number of positions occupied by the same or nearly the same AA, pointing to positions subject to strong purifying selection during evolution, for example another cysteine residue in the N-terminal extension, as previously noted [39].

Spliced OR genes

Our searches for OR genes unexpectedly identified a number of spliced ORs. Among the 507 complete OR genes identified in the five cichlids, 46 (9%) consist of two to four coding exons (Table 1). Proof that these spliced cichlid OR genes are functional will require transcriptional and functional analyses. Nevertheless, four types of evidence indicate that they are genuine OR genes: (i) the splice junction sequences, (ii) the intron positions, (iii) the BLASTX analysis and (iv) their positions within the phylogenetic tree.

  (i)

    Exon-intron boundaries. Nucleotide sequences overlapping the exon-intron boundaries were identified by the alignment of the spliced OR sequences with their cognate contig sequences. With the MEME suite, we identified two nucleotide motifs (Figure 4) defining exon boundaries while maintaining the reading frames open through the junction of the adjacent exons. Interestingly, these two motifs are similar to those found at mammalian gene exon/intron boundaries [40]. Most of these donor/acceptor sites were also predicted by the FSPLICE program [41] with the FISH model weight matrix (data not shown).

Figure 4

Sequence logo representation of donor and acceptor splice sites identified in cichlid OR genes aligned onto their cognate contigs and manually corrected using both MAFFT multiple alignment and the FSPLICE tool (Softberry, Fish model).

  (ii)

    Intron positions. Introns were found at nearly the same positions in all the ORs (Table 7). By comparing the gene nucleotide sequences with the amino-acid sequences, we inferred the position of each intron relative to the 2D OR structure (Additional file 10) and found it to be nearly fixed. In 27 of the 31 OR genes with a single intron interrupting the coding sequence, the intron is in phase 0, and it is in phase 2 in the remaining four; in 26 of these genes, the intron lies within the sequence encoding the MAYDRY motif in internal loop 2. Similarly, of the 11 OR genes with two introns within the coding sequence, the first intron is in the sequence encoding the extracellular part in five, and in the sequence encoding internal loop 2 in a further five. All 11 OR genes with two introns have their distal intron in the sequence encoding external loop 2.

Table 7 Intron positions within OR genes
  (iii)

    BLASTX analysis. All proteins identified by the TBLASTN search were subjected to a BLASTX search against the non-redundant NCBI protein database. Only proteins giving a strong hit with OR proteins, and no hit or only a weak, non-significant hit with other GPCRs, were kept as true ORs.

  (iv)

    Phylogenetic analysis. The AA sequences of the 507 cichlid ORs and of 247 non-OR class A GPCRs (Additional file 1) identified in GenBank were aligned with MAFFT, and a tree was constructed with PHYML and drawn with FigTree [42] (Figure 5). All ORs, with one or several coding exons, clearly form a branch separate from the non-OR class A GPCRs; this argues for them being true OR genes and not another type of GPCR. Most of the spliced cichlid ORs (39/45) cluster in families or subfamilies not shared by ORs with only one coding exon. Moreover, the spliced OR genes are grouped according to the number of their coding exons. For example, family W is made up of 24 ORs, all with two coding exons, whereas families I and U contain six and five ORs, respectively, all with three coding exons (Table 8). The clustering of ORs with two or more coding exons into particular families indicates that spliced OR genes arose before cichlid speciation and have evolved independently from single coding-exon ORs. However, the possibility of horizontal transfer cannot be formally excluded.
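The intron phases discussed in (ii) follow directly from the length of coding sequence upstream of each intron. A minimal sketch, assuming the coding-exon lengths of a gene model are known (the function names are hypothetical): the phase is the cumulative coding length modulo 3, and the codon index locates the intron on the protein so that it can be compared with a predicted 2D topology. The splice-site check tests only the canonical GT...AG dinucleotides, a simplification of the MEME motifs and FSPLICE predictions used in this study.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """GT at the 5' (donor) end and AG at the 3' (acceptor) end of the intron."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def intron_phases(coding_exon_lengths: list) -> list:
    """(phase, codon) for each intron between consecutive coding exons.
    Phase 0: between two codons; 1: after the 1st base of a codon; 2: after the 2nd.
    For phase 0, `codon` is the last codon before the intron; otherwise it is the
    (1-based) codon interrupted by the intron."""
    result, cumulative = [], 0
    for length in coding_exon_lengths[:-1]:  # the last exon has no intron after it
        cumulative += length
        phase = cumulative % 3
        codon = cumulative // 3 if phase == 0 else cumulative // 3 + 1
        result.append((phase, codon))
    return result
```

For instance, a two-exon gene whose first coding exon is 390 bases long has a phase 0 intron after codon 130.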

Figure 5

Phylogenetic tree constructed with the cichlid OR repertoires (in blue) and 247 non-OR class A GPCRs (Additional file 1) (in red). The tree shows that the cichlid ORs are clearly distinct from the non-OR class A GPCRs.

Table 8 Distribution of OR genes with more than 1 coding exon among the families of OR

Conclusions

The olfactory system enables animals to sense the outside world and contributes to searching for food and sexual partners. It may also help prevent or avoid attack by enemies and predators. As such, olfaction is a vital function. Olfactory receptors (OR) are transmembrane proteins found on the surface of olfactory neurons, and are the first component of the olfactory transduction cascade. They bind odorant molecules inducing a cascade of protein interactions that transform a primary chemical signal into an electrical signal that is conveyed to the brain; there, it is decoded and stored, leading to an appropriate response [14, 43]. Here, we report the identification of repertoires of OR genes in the genomes of five cichlids, recently sequenced by a consortium led by the BROAD Institute [4].

To identify these five olfactory repertoires as completely as possible, we applied the strategy used by Alioto and Ngai [18] for the analysis of the zebrafish OR repertoire. We performed an exhaustive TBLASTN search for sequences corresponding to a set of fish olfactory receptors retrieved from the literature [17, 18]. Although all OR genes previously found in vertebrates consist of an uninterrupted open reading frame (ORF) [11–14], we did not limit our search to hits longer than 700 nucleotides, as Alioto and Ngai did [18]. Rather, we retrieved all hits with an e-value cut-off of 1e−50. We then checked each of these candidate genes or gene fragments against a set of non-OR, non-TAAR class A GPCR sequences. All genes retained as true ORs shared a number of predicted properties characterizing this class of molecule [5]: an extracellular N-terminal extremity, seven hydrophobic transmembrane stretches of 21 to 26 amino acids each, and an intracellular C-terminal extremity (Additional file 10). They also have an N-glycosylation site, Asn-X-Ser/Thr, near the N-terminus, which is involved in addressing these proteins to the cell membrane [36].

Characteristic amino-acid motifs have been identified in all ORs. These AA patterns, as well as their locations within the molecules, have often been used to retrieve the corresponding genes from newly determined genome sequences [13]. In the present study, we took a different perspective: we used the motifs to characterize these proteins as actual ORs rather than to mine for them. With MEME, we identified four AA motifs shared by the five cichlids as well as D. rerio. Although minor variations can be observed when comparing these motifs between fishes, they look very much the same. Interestingly, some striking similarities can be observed with the AA motifs characterizing rat and dog ORs [13]. Finally, we noted the presence of cysteine residues at positions shared by all ORs (Additional file 9). These residues are of prime importance for the correct folding of proteins, and mutations changing these cysteine residues into another AA have been shown to often impair their function [44, 45]. Moreover, these alignments enable the identification of strongly conserved regions or amino-acid positions and of others that are highly variable, such as those in the ligand-binding pocket [46, 47]. The birth-and-death model described previously [48, 49], with relaxed purifying selection favouring multiple amino-acid changes, explains the large OR repertoires found in numerous species [11–14] as well as their partition into many families and subfamilies. Their multiplicity, which enables the detection of a large number of odorant molecules, favours the search for food or sexual partners and, as such, makes OR repertoires good indicators of evolution.
Considering the size of the OR repertoires, even in the absence of definitive numbers owing to the incompleteness of the genome sequences, the number of potentially active OR genes found in these five cichlids appears to be higher than in other teleosts [17, 18], except zebrafish. In addition, fewer pseudogenes were identified in the cichlid genomes, although some of the edge genes could in fact correspond to pseudogenes. Perhaps more importantly from an evolutionary point of view, the cichlid OR repertoires are dispersed among many more subfamilies than most teleost repertoires, except that of zebrafish (Table 3). Nevertheless, despite this substantial variability of OR sequences, numerous ORs share more than 99% AA sequence identity between species (Table 4).

The discovery in the cichlid genomes of a subset of OR genes made up of more than one coding exon came as a surprise. OR genes belong to class A of the GPCR superfamily, within which numerous genes are made up of several coding exons. However, based on their original discovery, vertebrate OR genes were assumed to consist of two exons: a 5’ non-coding exon and a second exon encompassing an ORF coding for a protein of 300–330 AA [5]. Because of this assumption, several authors mining genome sequences restricted their search to long ORFs or eliminated short ORFs afterwards [18].

Functional studies are required for a conclusive demonstration that these sequences are indeed active OR genes. This will probably involve analysis of RNA transcription in the olfactory epithelium and identification of the ligands of at least some of them. Such studies are beyond the scope of the present paper. Nevertheless, our work provides several arguments, in particular the results of the BLAST analysis and the phylogenetic positions of the sequences ((iii) and (iv) in the Results section), strongly indicating that these multi-coding exon genes indeed encode true ORs.

We performed a TBLASTN search to determine whether these multi-coding exon genes are specific to the cichlids or whether they had been overlooked during the mining of other fish genomes. We searched the OR fish repertoires in NCBI and ENSEMBL databases with a set of cichlid multi-exon OR gene sequences. We also inspected, one by one, the AA and gene sequences of the medaka, stickleback and zebrafish OR genes in the ENSEMBL database. A number of OR genes made up of two or more coding exons were found in various fish species (Additional file 11). These preliminary findings strongly suggest that ORs in many fishes, and not only cichlids, can be encoded by multi-coding exon genes.

Given that invertebrate ORs [50, 51] and some fish ORs can have more than one coding exon, a more general question arises: why do mammalian ORs have only one coding exon, making them the only subgroup of GPCRs with this characteristic? Could the peptides corresponding to one or a subset of the exons of multi-coding exon OR genes have an additional function that was lost during mammalian evolution, leading to the loss of these OR genes? Could some RNA transcripts corresponding to a subset of exons, with no real coding capacity, regulate the expression of their corresponding OR mRNA? These questions remain speculative.

Methods

The sequences of the five cichlid genomes were determined by the BROAD Institute (http://www.broadinstitute.org) using DNA samples prepared from a single double-haploid individual of each species, except for M. zebra, which was caught in the wild. A dataset of 143 zebrafish ORs and 40 takifugu ORs [17, 18] was used as bait for exhaustive TBLASTN searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Candidate genes were then compared with a negative dataset of 247 non-OR and non-TAAR animal GPCRs retrieved from the NCBI and ENSEMBL databases (Additional file 1).

TBLASTN results were filtered with a custom Python script so that candidate OR sequences conformed to the following rules: (1) one or more matches with the positive dataset and (2) no match with the negative dataset, using an e-value cut-off of 1e−50. Selected candidates were re-checked using both BLASTX and BLASTP against the fish protein database (NCBI, taxID: 7898) using default parameters with a cut-off of 1e−100.
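The two filtering rules can be expressed compactly. This is a minimal sketch, not the authors' script. It assumes both searches were written in BLAST+ tabular format (-outfmt 6, where the e-value is the 11th column) and that candidate identifiers appear as the subject (second column), as they would when OR and non-OR GPCR proteins are used as TBLASTN queries against the candidate sequences. Treating the cut-off as inclusive is also an assumption.

```python
import csv
from collections.abc import Iterable

EVALUE_CUTOFF = 1e-50


def significant_ids(blast_lines: Iterable, cutoff: float = EVALUE_CUTOFF,
                    id_column: int = 1) -> set:
    """IDs (taken from id_column) with at least one hit at e-value <= cutoff
    in BLAST tabular output (-outfmt 6: the e-value is the 11th column)."""
    ids = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[10]) <= cutoff:
            ids.add(row[id_column])
    return ids


def filter_candidates(positive_hits, negative_hits, cutoff=EVALUE_CUTOFF) -> set:
    """Rule (1): at least one significant hit against the OR (positive) set;
    rule (2): no significant hit against the non-OR GPCR (negative) set."""
    return significant_ids(positive_hits, cutoff) - significant_ids(negative_hits, cutoff)
```

The inputs can be open file handles or lists of lines; only the candidate identifiers surviving both rules are returned.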

All genes were manually collected, biocurated and translated into protein sequences using Geneious software 6.1 [52]. Incomplete OR genes found at the ends of contigs were annotated as “edges” whereas incomplete OR genes found inside contigs were considered to be “fragments”. Genes with disruptive frameshifts or stop codons were annotated as pseudogenes. For spliced OR genes, predicted sequences and splice sites were manually corrected on the basis of multiple alignment using MAFFT 7 [22] and also by using FSPLICE [41]. The list and sequences of complete cichlid OR genes (spliced or not spliced), pseudogenes, edges and fragments are available as supplementary information (Additional file 2).
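The annotation rules above can be written as a small decision function. It is a sketch only: the ORModel record and its fields are hypothetical summaries of a manually curated gene model, edge_margin is an assumed tolerance for how close to a contig end a truncated model must lie, and giving disrupting mutations precedence over truncation is an assumption of this sketch, since the text does not specify how truncated models carrying stop codons were labelled.

```python
from dataclasses import dataclass


@dataclass
class ORModel:
    """Hypothetical summary of one curated OR gene model on a contig."""
    name: str
    contig_length: int
    start: int            # 1-based start of the model on the contig
    end: int              # 1-based inclusive end of the model on the contig
    full_length: bool     # ORF recovered from start codon to stop codon
    has_stop: bool        # premature in-frame stop codon
    has_frameshift: bool  # disruptive frameshift


def classify(model: ORModel, edge_margin: int = 0) -> str:
    """Return 'pseudogene', 'complete', 'edge' or 'fragment'."""
    if model.has_stop or model.has_frameshift:
        return "pseudogene"  # assumed to take precedence over truncation
    if model.full_length:
        return "complete"
    touches_end = (model.start <= 1 + edge_margin
                   or model.end >= model.contig_length - edge_margin)
    return "edge" if touches_end else "fragment"
```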

Positions of transmembrane domains in selected OR predicted proteins were determined using both TMHMM [53] and PolyPhobius [34].
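TMHMM and PolyPhobius are dedicated hidden Markov model predictors. As a far cruder illustration of what they look for, the sketch below scans a protein with a Kyte-Doolittle hydropathy window. The window of 19 residues and threshold of 1.6 are commonly quoted defaults for spotting membrane-spanning segments; this approach is not a substitute for the tools used in this study, and the function name is hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Return (start, end) spans, 0-based and end-exclusive, covered by runs of
    consecutive windows whose mean Kyte-Doolittle hydropathy exceeds threshold.
    Residues missing from the scale (e.g. X) score 0."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    segments, start, end = [], None, None
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            if start is None:
                start = i  # first window of a new hydrophobic run
            end = i + window
        elif start is not None:
            segments.append((start, end))
            start = None
    if start is not None:
        segments.append((start, end))
    return segments
```

For a typical OR, about seven segments would be expected, but adjacent helices can merge into one span and weakly hydrophobic helices can be missed.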

The deduced AA sequences of all cichlid, zebrafish, stickleback, tetraodon, takifugu and medaka ORs (Table 1) were aligned using MAFFT 7 with the E-INS-i strategy (suited to sequences with conserved motifs and multiple domains) and default parameters. A classification was proposed based on the relatedness estimated from a bootstrapped maximum-likelihood unrooted tree generated by PHYML (1,000 bootstrap replicates) and drawn using FigTree 1.3.1. Thresholds of 40% and 60% AA identity were used to distinguish between families and subfamilies, respectively, as described by Glusman et al. [19]. The cichlid OR sequences were named according to their phylogenetic positions as follows: the species symbol (Bri, Bur, Nye, Til or Zeb for N. brichardi, H. burtoni, P. nyererei, O. niloticus and M. zebra, respectively), then “OR”, then p for pseudogene, e for edge or f for fragment, followed by a letter for the family and three digits designating the gene itself. For example, BRIORe.E041 designates the edge OR gene 041 belonging to family E.
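The naming scheme can be decoded mechanically. This parser is a hypothetical helper. It assumes that complete genes simply omit the status letter (the text gives an example only for an edge gene), that the three-letter species code may be written in any case, and that family codes are one or two capital letters (A to AN).

```python
import re

SPECIES = {"BRI": "N. brichardi", "BUR": "H. burtoni", "NYE": "P. nyererei",
           "TIL": "O. niloticus", "ZEB": "M. zebra"}
STATUS = {"": "complete", "p": "pseudogene", "e": "edge", "f": "fragment"}
# Assumed pattern: species code, "OR", optional status letter, ".",
# family (one or two capitals), then a three-digit gene number.
NAME = re.compile(
    r"^(?P<sp>[A-Za-z]{3})OR(?P<st>[pef]?)\.(?P<fam>[A-Z]{1,2})(?P<num>\d{3})$")


def parse_or_name(name: str) -> dict:
    """Split an OR name such as 'BRIORe.E041' into its components."""
    m = NAME.match(name)
    if m is None or m["sp"].upper() not in SPECIES:
        raise ValueError(f"unrecognised OR name: {name!r}")
    return {"species": SPECIES[m["sp"].upper()], "status": STATUS[m["st"]],
            "family": m["fam"], "number": int(m["num"])}
```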

Ratios of non-synonymous to synonymous nucleotide substitutions (ω = dN/dS) were calculated with the method of Nei-Gojobori as modified by Zhang et al. [26], using Perl and Python scripts to automate the whole process. These ratios were calculated both for the entire proteins and for different subregions (i.e. individual transmembrane domains or loop regions).

Conserved motifs were identified in predicted OR protein sequences with the online program Multiple EM for Motif Elicitation (MEME) v.4.9.0 [31]. Potential N-glycosylation sites were detected with the NetNGlyc server [35]. Only N-glycosylation sites with a “potential” score > 0.5 and a jury agreement of “++” or higher were considered positive in our analyses.
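The selection criterion for glycosylation sites amounts to a simple filter. The sketch assumes that NetNGlyc predictions have already been transcribed into records (the GlycoSite fields and the example protein names are hypothetical; no parser for the server's output is implied) and counts the '+' symbols of each prediction result.

```python
from typing import NamedTuple


class GlycoSite(NamedTuple):
    """One predicted site (hypothetical record transcribed from NetNGlyc output)."""
    protein: str
    position: int     # 1-based position of the Asn
    potential: float  # "potential" score
    result: str       # prediction result, e.g. "+", "++", "+++" or "-"


def is_positive(site: GlycoSite, min_potential: float = 0.5, min_plus: int = 2) -> bool:
    """Potential strictly above 0.5 and at least two '+' symbols."""
    return site.potential > min_potential and site.result.count("+") >= min_plus


def positive_sites(sites) -> dict:
    """Group the positions of retained sites by protein."""
    kept = {}
    for site in sites:
        if is_positive(site):
            kept.setdefault(site.protein, []).append(site.position)
    return kept
```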

References

  1. Turner GF, Seehausen O, Knight ME, Allender CJ, Robinson RL: How many species of cichlid fishes are there in African lakes? Mol Ecol. 2001, 10: 793-806.

  2. Kocher TD: Adaptive evolution and explosive speciation: the cichlid fish model. Nat Rev Genet. 2004, 5: 288-298. 10.1038/nrg1316.

  3. FAO: The State of World Fisheries and Aquaculture. 2010, http://www.fao.org/docrep/013/i1820e/i1820e00.htm.

  4. Brawand D, Wagner C, Li YI, Malinsky M, Keller I, Fan S, Simakov O, Ng AY, Wei Lim Z, Bezault E, Turner-Maier J, Johnson J, Alcazar R, Russell P, Aken B, Alföldi J, Amemiya C, Azzouzi N, Barollier JF, Barloy-Hubler F, Berlin A, Bloomquist R, Carleton KL, Conte MA, D'Cotta H, Eshel O, Gaffney L, Galibert F, Gante HF, Gnerre S, et al: The genomic substrate for adaptive radiation: genomes of five African cichlid fish. Nature. in press

  5. Buck L, Axel R: A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell. 1991, 65 (1): 175-187. 10.1016/0092-8674(91)90418-X.

  6. Raming K, Krieger J, Strotmann J, Boekhoff I, Kubick S, Baumstark C, Breer H: Cloning and expression of odorant receptors. Nature. 1993, 361 (6410): 353-356. 10.1038/361353a0.

  7. Pace U, Hanski E, Salomon Y, Lancet D: Odorant-sensitive adenylate cyclase may mediate olfactory reception. Nature. 1985, 316 (6025): 255-258. 10.1038/316255a0.

  8. Bakalyar HA, Reed RR: Identification of a specialized adenylyl cyclase that may mediate odorant detection. Science. 1990, 250 (4986): 1403-1406. 10.1126/science.2255909.

  9. Asanuma N, Nomura H: Cytochemical localization of adenylate cyclase activity in rat olfactory cells. Histochem J. 1991, 23 (2): 83-90. 10.1007/BF01047112.

  10. Laberge F, Hara TJ: Neurobiology of fish olfaction: a review. Brain Res Rev. 2001, 36: 46-59. 10.1016/S0165-0173(01)00064-9.

  11. Godfrey PA, Malnic B, Buck LB: The mouse olfactory receptor gene family. Proc Natl Acad Sci USA. 2004, 101: 2156-2161. 10.1073/pnas.0308051100.

  12. Malnic B, Godfrey PA, Buck LB: The human olfactory receptor gene family. Proc Natl Acad Sci USA. 2004, 101 (8): 2584-2589. 10.1073/pnas.0307882100.

  13. Quignon P, Giraud M, Rimbault M, Lavigne P, Tacher S, Morin E, Retout E, Valin AS, Lindblad-Toh K, Nicolas J, Galibert F: The dog and rat olfactory receptor repertoires. Genome Biol. 2005, 6 (10): R83. 10.1186/gb-2005-6-10-r83.

  14. Mombaerts P: Genes and ligands for odorant, vomeronasal and taste receptors. Nat Rev Neurosci. 2004, 5: 263-278. 10.1038/nrn1365.

  15. Keller A, Vosshall LB: Better smelling through genetics: mammalian odor perception. Curr Opin Neurobiol. 2008, 18: 364-369. 10.1016/j.conb.2008.09.020.

  16. Hayden S, Bekaert M, Crider TA, Mariani S, Murphy WJ, Teeling EC: Ecological adaptation determines functional mammalian olfactory subgenomes. Genome Res. 2010, 20: 1-9. 10.1101/gr.099416.109.

  17. Niimura Y, Nei M: Evolutionary dynamics of olfactory receptor genes in fishes and tetrapods. Proc Natl Acad Sci USA. 2005, 102: 6039-6044. 10.1073/pnas.0501922102.

  18. Alioto TS, Ngai J: The odorant receptor repertoire of teleost fish. BMC Genomics. 2005, 6: 173. 10.1186/1471-2164-6-173.

  19. Glusman G, Bahar A, Sharon D, Pilpel Y, White J, Lancet D: The olfactory receptor gene superfamily: data mining, classification, and nomenclature. Mamm Genome. 2000, 11 (11): 1016-1023. 10.1007/s003350010196.

  20. Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, et al: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428 (6982): 493-521.

  21. Zhang X, Rodriguez I, Mombaerts P, Firestein S: Odorant and vomeronasal receptor genes in two mouse genome assemblies. Genomics. 2004, 83 (5): 802-811. 10.1016/j.ygeno.2003.10.009.

  22. Katoh K, Standley DM: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013, 30: 772-780. 10.1093/molbev/mst010. http://mafft.cbrc.jp/alignment/server/

  23. PHYML. http://www.trex.uqam.ca/

  24. Wagner CE, Harmon LJ, Seehausen O: Ecological opportunity and sexual selection together predict adaptive radiation. Nature. 2012, 487: 366-370. 10.1038/nature11144.

  25. Nikaido M, Suzuki H, Toyoda A, Fujiyama A, Hagino-Yamagishi K, Kocher TD, Carleton K, Okada N: Lineage-specific expansion of vomeronasal type 2 receptor-like (OlfC) genes in cichlids may contribute to diversification of amino acid detection systems. Genome Biol Evol. 2013, 5 (4): 711-722. 10.1093/gbe/evt041.

  26. Zhang Z, Li J, Zhao XQ, Wang J, Wong GK, Yu J: KaKs Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics. 2006, 4: 259-263. 10.1016/S1672-0229(07)60007-2.

  27. Makalowski W, Boguski MS: Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc Natl Acad Sci USA. 1998, 95 (16): 9407-9412. 10.1073/pnas.95.16.9407.

  28. Kondo R, Kaneko S, Sun H, Sakaizumi M, Chigusa SI: Diversification of olfactory receptor genes in the Japanese medaka fish, Oryzias latipes. Gene. 2002, 282: 113-120. 10.1016/S0378-1119(01)00843-5.

  29. Gilad Y, Bustamante CD, Lancet D, Paabo S: Natural selection on the olfactory receptor gene family in humans and chimpanzees. Am J Hum Genet. 2003, 73 (3): 489-501. 10.1086/378132.

  30. Robin S, Tacher S, Rimbault M, Vaysse A, Dréano S, André C, Hitte C, Galibert F: Genetic diversity of canine olfactory receptors. BMC Genomics. 2009, 10: 21. 10.1186/1471-2164-10-21.

  31. MEME. http://meme.nbcr.net/meme/cgi-bin/meme.cgi

  32. Liu AH, Zhang X, Stolovitzky GA, Califano A, Firestein SJ: Motif-based construction of a functional map for mammalian olfactory receptors. Genomics. 2003, 81: 443-456. 10.1016/S0888-7543(03)00022-3.

  33. Zozulya S, Echeverri F, Nguyen T: The human olfactory receptor repertoire. Genome Biol. 2001, 2 (6): RESEARCH0018. Epub 2001 Jun 1.

  34. PolyPhobius. http://phobius.sbc.su.se/

  35. NetNGlyc server. http://www.cbs.dtu.dk/services/NetNGlyc/

  36. Katada S, Tanaka M, Touhara K: Structural determinants for membrane trafficking and G protein selectivity of a mouse olfactory receptor. J Neurochem. 2004, 90 (6): 1453-1463. 10.1111/j.1471-4159.2004.02619.x.

  37. Young JM, Friedman C, Williams EM, Ross JA, Tonnes-Priddy L, Trask BJ: Different evolutionary processes shaped the mouse and human olfactory receptor gene families. Hum Mol Genet. 2002, 11 (5): 535-546. 10.1093/hmg/11.5.535.

  38. Masters SB, Stroud RM, Bourne HR: Family of G protein alpha chains: amphipathic analysis and predicted structure of functional domains. Protein Eng. 1986, 1 (1): 47-54. 10.1093/protein/1.1.47.

  39. Zhao H, Firestein S: Vertebrate odorant receptors. Cell Mol Life Sci. 1999, 56 (7–8): 647-659.

  40. Wahl MC, Will CL, Lührmann R: The spliceosome: design principles of a dynamic RNP machine. Cell. 2009, 136 (4): 701-718. 10.1016/j.cell.2009.02.009.

  41. FSPLICE. http://linux1.softberry.com/berry.phtml?topic=fsplice&group=programs&subgroup=gfind

  42. FigTree 1.3.1. http://tree.bio.ed.ac.uk/software/figtree/

  43. Barber RD, Ronnett GV: Reconstructing smell. Mol Neurobiol. 2000, 21: 161-173. 10.1385/MN:21:3:161.

  44. Tiedge M, Richter T, Richter T: Importance of cysteine residues for the stability and catalytic activity of human pancreatic beta cell glucokinase. Arch Biochem Biophys. 2000, 375 (2): 251-260. 10.1006/abbi.1999.1666.

  45. Chen ZW, Jiang CY, She Q, Liu SJ, Zhou PJ: Key role of cysteine residues in catalysis and subcellular localization of sulfur oxygenase-reductase of Acidianus tengchongensis. Appl Environ Microbiol. 2005, 71 (2): 621-628. 10.1128/AEM.71.2.621-628.2005.

  46. Abaffy T, Malhotra A, Luetje CW: The molecular basis for ligand specificity in a mouse olfactory receptor. J Biol Chem. 2007, 282: 1216-1224. 10.1074/jbc.M609355200.

  47. Doszczak L, Kraft P, Weber HP, Bertermann R, Triller A, Hatt H, Tacke R: Prediction of perception: probing the hOR17-4 olfactory receptor model with silicon analogues of bourgeonal and lilial. Angew Chem Int Ed Engl. 2007, 46: 3367-3371. 10.1002/anie.200605002.

  48. Niimura Y, Nei M: Extensive gains and losses of olfactory receptor genes in mammalian evolution. PLoS One. 2007, 2: e708. 10.1371/journal.pone.0000708.

  49. Dong D, He G, Zhang S, Zhang Z: Evolution of olfactory receptor genes in primates dominated by birth-and-death process. Genome Biol Evol. 2009, 1: 258-264.

  50. Vosshall LB, Amrein H, Morozov PS, Rzhetsky A, Axel R: A spatial map of olfactory receptor expression in the Drosophila antenna. Cell. 1999, 96 (5): 725-736. 10.1016/S0092-8674(00)80582-6.

  51. Robertson HM: The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses. Genome Res. 2000, 10 (2): 192-203. 10.1101/gr.10.2.192.

  52. GENEIOUS. http://www.geneious.com

  53. TMHMM. http://www.cbs.dtu.dk/services/TMHMM/

Acknowledgements

This work was supported by the CNRS and the University of Rennes.

Author information

Corresponding author

Correspondence to Francis Galibert.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

NA and FBH performed the computing analysis and analyzed the data. FG designed the experiments and wrote the manuscript. All authors have read and approved the manuscript.

Naoual Azzouzi and Frederique Barloy-Hubler contributed equally to this work.

Electronic supplementary material

Additional file 1: Negative data set composed of 247 non-OR GPCRs retrieved from the NCBI database. (PDF 27 KB)

Additional file 2: Nucleotide and AA sequences of the cichlid, tetraodon, medaka and stickleback ORs present in the phylogenetic tree shown in Figure 1. Cichlid ORs are designated by the name of the contig in which they were identified, followed by an acronym indicating the fish species, a capital letter identifying the family and an Arabic number indicating a particular OR. “s” denotes genes with more than one coding exon, “p” pseudogenes, “e” edge sequences and “f” fragments; a sequence can carry a combination of these symbols (for example, ep). A shorter version of the gene names, omitting the contig number, is used in all the following tables, figures and supplementary materials. Tetraodon, medaka and stickleback sequences correspond to a subset of OR sequences retrieved from the NCBI and ENSEMBL databases and validated as true ORs through AA multiple alignments and BLAST analysis. (PDF 9 MB)

Additional file 3: Contigs and scaffolds harbouring ORs. (PDF 13 MB)

Additional file 4: Phylogenetic tree constructed from the AA sequences of the cichlid ORs identified in Table 1 and Additional file 2 and 143 zebrafish, 73 medaka, 78 stickleback, 40 fugu and 42 tetraodon OR AA sequences (Additional file 2). Fish species are colour coded: O. niloticus in red, M. zebra in pink, N. brichardi in blue, H. burtoni in green, P. nyererei in orange and fish models in black. (PDF 4 MB)

Additional file 5: List of pairs, triplets and quadruplets of genes with 99% identity or more. (PDF 328 KB)

Additional file 6: Details of dN/dS ratios for families A, D, E, G, H, I, K, L, N, O, P, R, S and W. (PDF 5 MB)

Additional file 7: a to f. Details of dN/dS ratios of TM regions, external and internal loops for families D, E, H, L and N. (ZIP 2 KB)

Additional file 8: N-glycosylation sites as predicted by the NetNGlyc server for each cichlid OR. (PDF 9 MB)

Additional file 9: 2C or 3C groups of OR AA sequences: MAFFT multiple alignments and LOGO presentation. (PDF 13 MB)

Additional file 10: 2D structure prediction of the cichlid ORs made by PolyPhobius. (PDF 4 MB)

Additional file 11: DNA sequences of 6 fish model ORs with more than 1 coding exon. Exons are indicated by bold letters. (PDF 328 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( https://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Azzouzi, N., Barloy-Hubler, F. & Galibert, F. Inventory of the cichlid olfactory receptor gene repertoires: identification of olfactory genes with more than one coding exon. BMC Genomics 15, 586 (2014). https://doi.org/10.1186/1471-2164-15-586

  • DOI: https://doi.org/10.1186/1471-2164-15-586

Keywords