Open Access Research article

The Caenorhabditis chemoreceptor gene families

James H Thomas1* and Hugh M Robertson2

Author Affiliations

1 Department of Genome Sciences, University of Washington, Seattle, WA, USA

2 Department of Entomology, University of Illinois, Urbana-Champaign, IL, USA

BMC Biology 2008, 6:42  doi:10.1186/1741-7007-6-42

Published: 6 October 2008

Additional files

Additional file 1:

All predicted Caenorhabditis elegans chemoreceptor proteins, including those encoded by putative defective genes. The '~' character in the last field of the fasta name indicates that the protein is likely to be defective; it is followed by a code for the nature of the probable defect: '#' indicates a deletion, '*' indicates a stop codon, and '!' indicates some other defect (usually a splice-site defect). Each putative defect is listed separately, so names with more than one defect code have multiple defects and are likely to represent fixed pseudogenes. We have worked closely with WormBase in updating gene models; other than defective genes, the vast majority of these predictions should correspond exactly to the current WormBase model (WS170 frozen release).

Format: XLS Size: 21KB
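
The defect codes described in the legend for Additional file 1 can be decoded mechanically. The following is a minimal Python sketch, not the authors' tooling. It assumes that fields in the fasta name are separated by whitespace (the legend does not state the delimiter), and the example names in the usage note are invented for illustration.

```python
from collections import Counter

# Defect codes as described in the legend for Additional file 1.
DEFECT_CODES = {"#": "deletion", "*": "stop codon", "!": "other defect"}


def parse_defects(fasta_name):
    """Return (flagged, counts) for one fasta name.

    Assumption: fields are separated by whitespace; the real delimiter
    is not stated in the legend.
    """
    fields = fasta_name.lstrip(">").split()
    # No '~' in the last field means the protein is not flagged as defective.
    if not fields or "~" not in fields[-1]:
        return False, Counter()
    # Everything after the first '~' is read as a run of defect codes.
    codes = fields[-1].split("~", 1)[1]
    return True, Counter(DEFECT_CODES[c] for c in codes if c in DEFECT_CODES)


def defect_class(fasta_name):
    """Classify a name by the number of defect codes it carries."""
    flagged, counts = parse_defects(fasta_name)
    if not flagged:
        return "not flagged"
    n = sum(counts.values())
    if n == 0:
        return "flagged, no code"
    return "multiple defects" if n > 1 else "single defect"
```

With the hypothetical name `>geneA famX ~#*`, `defect_class` reports multiple defects (one deletion and one stop codon), which the legend associates with likely fixed pseudogenes.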

Additional file 2:

Alignments of Sra superfamily proteins. Each panel shows 20 randomly sampled full-length members of one family within the Sra superfamily. Background shading is proportional to the sum-of-pairs alignment score of each residue relative to the aligned column. Approximate positions of predicted transmembrane domains are marked with bars below the alignment. Domains predicted to be extracellular are marked 'OUT' below the alignment. Probable extracellular disulfide bonds are marked above the alignments. In cases where there is more than one potential disulfide bond, the pairs were inferred by covariance in presence among family members (including other proteins not shown).

Format: XLS Size: 69KB

Additional file 3:

Alignments of Srg superfamily proteins. See Additional file 2 for the legend.

Format: JPEG Size: 5.8MB

Additional file 4:

Alignments of Str superfamily proteins. See Additional file 2 for the legend.

Format: JPEG Size: 8.5MB

Additional file 5:

Alignments of srbc family proteins. See Additional file 2 for the legend.

Format: JPEG Size: 7.2MB

Additional file 6:

Alignments of srsx family proteins. See Additional file 2 for the legend.

Format: JPEG Size: 1.5MB

Additional file 7:

Alignments of srw family proteins. See Additional file 2 for the legend.

Format: JPEG Size: 1.4MB

Additional file 8:

Alignments of srz family proteins. See Additional file 2 for the legend.

Format: JPEG Size: 1.4MB

Additional file 9:

Table presenting the summary of chemoreceptor gene families in Caenorhabditis elegans. Good genes refers to the number of genes we predict will encode functional receptors in the reference N2 genome. Defective genes are all other genes that encode at least half of the family-typical protein; they are about equally divided between those with a single defect (flatliners, potentially defective alleles in N2 [28]) and those with multiple defects (presumed fixed pseudogenes).

Format: JPEG Size: 1.5MB

Additional file 10:

Table presenting the summary of evolutionary properties of chemoreceptor genes from Caenorhabditis elegans, C. briggsae, and C. remanei. Metabotropic neurotransmitter and FMRF-amide receptor related genes are also shown for comparison. Fraction strict orthologs is the fraction of C. elegans genes with single orthologs in both C. briggsae and C. remanei, as determined from the protein tree. Fraction clustered in the genome is the fraction of C. elegans genes that have another family member located within five genes in the genome. Tree gene number indicates the number of genes used for protein tree analysis (see Methods for specifics). For the C. elegans tree gene number column, the number in parentheses is the number of genes predicted to encode functional receptors in the reference N2 genome. Naively, we expect that a similar fraction of genes from the other two species will be functional in their respective reference genomes. For example, in the srh family there will be (218/294) × 214 functional genes in C. remanei and (218/294) × 165 functional genes in C. briggsae.

Format: JPEG Size: 1.2MB
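
Two of the quantities in the legend for Additional file 10 are simple enough to sketch in code. The Python below is an illustration under stated assumptions, not the authors' method. The function names and the input layout (a dictionary of chromosomes, each an ordered list of family labels) are hypothetical, and "within five genes" is read here as a difference of at most five positions in gene order, which the legend does not spell out.

```python
def expected_functional(functional_elegans, tree_genes_elegans, tree_genes_other):
    """Naive estimate of functional genes in another species.

    Scales the other species' tree gene number by the fraction of
    C. elegans tree genes predicted to be functional in N2.
    """
    return functional_elegans / tree_genes_elegans * tree_genes_other


def fraction_clustered(chromosomes, family, window=5):
    """Fraction of family members with another member within `window` genes.

    chromosomes: dict mapping a chromosome name to a list of family labels
    (None for genes outside any family), in genome order.
    Assumption: distance is measured as the difference in list position.
    """
    total = 0
    clustered = 0
    for genes in chromosomes.values():
        positions = [i for i, label in enumerate(genes) if label == family]
        for k, pos in enumerate(positions):
            total += 1
            # Only the nearest neighbors on each side need to be checked.
            near_prev = k > 0 and pos - positions[k - 1] <= window
            near_next = k + 1 < len(positions) and positions[k + 1] - pos <= window
            if near_prev or near_next:
                clustered += 1
    return clustered / total if total else 0.0
```

For the srh example in the legend, `expected_functional(218, 294, 214)` gives about 159 genes for C. remanei and `expected_functional(218, 294, 165)` gives about 122 genes for C. briggsae.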

Additional file 14:

Maximum-likelihood tree of SRA proteins. Caenorhabditis elegans names are green, C. briggsae names are blue, and C. remanei names are red. Species-specific clades are emphasized by having their branch lines match the species color. The smaller inset is the same tree with names removed, which shows the tree structure more clearly. Open circles on branches indicate a branch support value of 0.9 or higher, as computed by phyml-alrt. Strict ortholog trios (1-1-1) are marked with a filled black square. The tree was rooted by inclusion of a sampling of SRAB proteins (not shown). The scale bar indicates number of amino acid changes per site in the large tree. Each name includes a species identifier, gene identifiers, and genome start and end coordinates for the corresponding gene model. The C. elegans gene names include both a standard genome project name (for example, F28C12.3) and a genetic gene name (for example, sra-19). The C. briggsae gene name is the brigpep WormBase name when applicable (for example, CBG04324) or an arbitrarily numbered GeneWise prediction number (for example, gw15). The C. remanei names are either the WormBase wum gene prediction (for example, wum.4.1), the WormBase genefinder prediction (for example, gf170), or an arbitrarily numbered GeneWise prediction number (for example, gw16). The wum or gf names combined with the supercontig number uniquely identify the prediction in the current C. remanei prediction set on WormBase. The sequences analyzed are given in Additional files 11 to 13. All trees are available in Newick format upon request.

Format: JPEG Size: 2.4MB

Additional file 15:

Maximum-likelihood tree of SRAB proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRA proteins (not shown).

Format: JPEG Size: 2.3MB

Additional file 16:

Maximum-likelihood tree of SRB proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRAB proteins (not shown).

Format: JPEG Size: 3.3MB

Additional file 17:

Maximum-likelihood tree of SRBC proteins. See Additional file 14 for the legend. The tree is unrooted.

Format: JPEG Size: 8.8MB

Additional file 18:

Maximum-likelihood tree of SRD proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of STR proteins (not shown).

Format: JPEG Size: 3.2MB

Additional file 19:

Maximum-likelihood tree of SRE proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRA proteins (not shown).

Format: JPEG Size: 1.7MB

Additional file 20:

Maximum-likelihood tree of SRG proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRU proteins (not shown).

Format: JPEG Size: 1.4MB

Additional file 21:

Maximum-likelihood tree of SRH proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRI proteins (not shown).

Format: JPEG Size: 3MB

Additional file 22:

Maximum-likelihood tree of SRI proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRH proteins (not shown).

Format: JPEG Size: 2.4MB

Additional file 23:

Maximum-likelihood tree of SRJ proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of STR proteins (not shown).

Format: JPEG Size: 1.7MB

Additional file 24:

Maximum-likelihood tree of SRSX proteins. See Additional file 14 for the legend. The tree is unrooted.

Format: JPEG Size: 4.7MB

Additional file 25:

Maximum-likelihood tree of SRT proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRX proteins (not shown).

Format: JPEG Size: 5.1MB

Additional file 26:

Maximum-likelihood tree of SRU proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRV proteins (not shown).

Format: JPEG Size: 927KB

Additional file 27:

Maximum-likelihood tree of SRV proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRU proteins (not shown).

Format: JPEG Size: 3.1MB

Additional file 28:

Maximum-likelihood tree of SRW proteins. See Additional file 14 for the legend. The tree is unrooted.

Format: JPEG Size: 5.8MB

Additional file 29:

Maximum-likelihood tree of SRX proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRT proteins (not shown).

Format: JPEG Size: 1.5MB

Additional file 30:

Maximum-likelihood tree of SRXA proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRV proteins (not shown).

Format: JPEG Size: 5.1MB

Additional file 31:

Maximum-likelihood tree of SRZ proteins. See Additional file 14 for the legend. The tree is unrooted.

Format: JPEG Size: 6.9MB

Additional file 32:

Maximum-likelihood tree of STR proteins. See Additional file 14 for the legend. The tree was rooted by inclusion of a sampling of SRJ proteins (not shown).

Format: JPEG Size: 4.5MB

Additional file 33:

Color-coded positions of all Caenorhabditis elegans genes from 19 chemoreceptor families. The center position of every gene is shown as a grey circle and the chemoreceptor genes are filled and colored. The vertical position of a gene has no significance: it is used merely to space the genes for presentation. Sequence coordinates are shown at the top of each chromosome. Sra superfamily members are shown in shades of green, Srg superfamily members are shown in shades of blue, Str superfamily members are shown in shades of red, and each solo family is shown in a distinct color. Notable concentrations of chemoreceptor genes are apparent on both arms of chromosome II, on the left arm of chromosome IV, and on much of chromosome V, especially both arms.

Format: FAST Size: 617KB

Additional file 34:

Table summarizing the systematic analysis of positive selection. Analysis included all usable clades of Caenorhabditis elegans paralogs from all gene families (see Methods). The specific family sequence file is given for internal reference. The number of sequences in each clade and their mean length in codons are given. Key results from codeml analysis are shown for each clade, including the value of the added 11th dN/dS class from model 8 and the delta maximum-likelihood value used for statistical testing. The Bonferroni-corrected P-value was computed using a χ-square test with two degrees of freedom. Highly significant evidence of positive selection was found for two clades of srz genes (a more detailed analysis of this family is published elsewhere [6]). Among other families, only one clade of str genes had marginal significance (set N).

Format: FAST Size: 608KB
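
The statistical test named in the legend for Additional file 34 can be sketched briefly. The Python below is a hedged illustration, not the authors' script. It assumes that the reported delta value is the difference in log-likelihood between the two nested codeml models and that the test statistic is twice that difference, which is the usual likelihood ratio convention; the legend does not state this explicitly. For two degrees of freedom the χ-square upper tail has the closed form exp(-x/2), so no statistics library is needed. The number of tests used for the correction is not given here and is left as an input.

```python
import math


def lrt_pvalue_df2(delta_lnl):
    """P-value of a likelihood ratio test with two degrees of freedom.

    Assumption: delta_lnl is lnL(alternative) - lnL(null), and the test
    statistic is 2 * delta_lnl. For a chi-square distribution with 2
    degrees of freedom the survival function is exp(-x / 2).
    """
    statistic = 2.0 * max(delta_lnl, 0.0)
    return math.exp(-statistic / 2.0)


def bonferroni(p_values):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]
```

A usage pattern would be to compute one P-value per clade with `lrt_pvalue_df2` and pass the full list to `bonferroni`, so that the correction reflects every clade tested.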

Additional file 11:

All Caenorhabditis elegans proteins used in protein tree analysis. The genome start and end position and family are given as part of the fasta name. Coding strand is implied by the order of the two genome coordinates. The list includes some possibly defective proteins if they met our criteria for inclusion in tree analysis (see Methods).

Format: JPEG Size: 992KB
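
The legend for Additional file 11 notes that coding strand is implied by the order of the two genome coordinates. A minimal Python sketch of that rule follows. It assumes the common convention that a start coordinate smaller than the end coordinate means the forward strand; the legend does not say which order corresponds to which strand, and the function name is hypothetical.

```python
def strand_and_span(start, end):
    """Infer strand and genomic span from an ordered coordinate pair.

    Assumption: start < end means the forward ('+') strand and
    start > end means the reverse ('-') strand.
    """
    strand = "+" if start <= end else "-"
    low, high = min(start, end), max(start, end)
    # Span is counted inclusively, as for 1-based closed coordinates.
    return strand, high - low + 1
```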

Additional file 12:

All Caenorhabditis briggsae proteins used in protein tree analysis. The second field indicates the WormBase gene identifier or an arbitrary GeneWise number (see Methods). See also Additional file 11.

Format: JPEG Size: 850KB

Additional file 13:

All Caenorhabditis remanei proteins used in protein tree analysis. The second field indicates the wum gene identifier, the genefinder identifier, or an arbitrary GeneWise number (see Methods). See also Additional file 11.

Format: JPEG Size: 2.4MB

Additional file 35:

Maximum-likelihood tree of Sra superfamily proteins. Family members are shown in the same color. Open circles on branches indicate a branch support value of 0.9 or higher, as computed by phyml-alrt.

Format: FAST Size: 405KB

Additional file 36:

Maximum-likelihood tree of Srg superfamily proteins. See Additional file 35 for the legend.

Format: FAST Size: 798KB

Additional file 37:

Maximum-likelihood tree of Str superfamily proteins. See Additional file 35 for the legend.

Format: XLS Size: 31KB