Genome-wide modeling of complex phenotypes in Caenorhabditis elegans and Drosophila melanogaster

De, Supriyo; Zhang, Yongqing; Wolkow, Catherine A; Zou, Sige; Goldberg, Ilya; Becker, Kevin G

doi:10.1186/1471-2164-14-580

Methodology article
Open access
Published: 28 August 2013

BMC Genomics volume 14, Article number: 580 (2013)

Abstract

Background

The genetic and molecular basis for many intermediate and end stage phenotypes in model systems such as C. elegans and D. melanogaster has long been known to involve pleiotropic effects and complex multigenic interactions. Gene sets are groups of genes that contribute to multiple biological or molecular phenomena. They have been used in the analysis of large molecular datasets such as microarray data, next-generation sequencing, and other genomic datasets to reveal pleiotropic and multigenic contributions to phenotypic outcomes. Many model systems lack organized, species-specific, phenotype-based gene sets to enable high-throughput analysis of large molecular datasets.

Results and discussion

Here, we describe two novel collections of gene sets in C. elegans and D. melanogaster that are based exclusively on genetically determined phenotypes and use a controlled phenotypic ontology. We use these collections to build genome-wide models of thousands of defined phenotypes in both model species. In addition, we demonstrate the utility of these gene sets in systems analysis and in analysis of gene expression-based molecular datasets and show how they are useful in analysis of genomic datasets connecting multigenic gene inputs to complex phenotypes.

Conclusions

Phenotype-based gene sets in both C. elegans and D. melanogaster are developed, characterized, and shown to be useful in the analysis of large-scale species-specific genomic datasets. These phenotypic gene set collections will contribute to the understanding of complex phenotypic outcomes in these model systems.

Background

Traditional experimentation in animal model systems such as the worm Caenorhabditis elegans and the fly Drosophila melanogaster often results in complex molecular and phenotypic outcomes. Frequently a targeted deletion or ectopic expression of a single gene product results in pleiotropic phenotypes. Similarly, broad high-throughput multiplex experimental strategies such as microarray based gene expression, RNA interference (RNAi) screens, or next-generation DNA and RNA sequencing, analyzing phenomena such as development, behavior, mating, diet, and life span, typically produce large datasets requiring complex analytical approaches.

Gene sets are collections of keyword terms with annotated genes derived from multiple sources of a priori information. They have been used in computational analysis of gene expression data [1–3] with the goal of identifying higher order relationships beyond simple gene list results, as well as in analysis of population-based GWAS in humans [4, 5]. The most commonly used gene sets include those derived from GO annotations [6], biological pathways from KEGG [7] or BioCarta, expression modules, DNA binding sites, or other sources of molecular information [1, 3, 8]. Each collection of gene sets has its own unique qualities and features which are useful in different ways. For instance, KEGG emphasizes metabolic and biochemical pathways; GO annotations, while having some phenotypic content, emphasize molecular function, cellular component, and biological processes; and MSigDB [8] emphasizes gene expression signatures. This information is often closely related, or “proximal”, to gene and molecular function, rather than more “distal” information regarding phenotypic outcomes and disease susceptibility. Recently, phenotype-based gene sets have been derived exclusively from genetically determined phenotypic associations for mouse phenotypes and common human disease [9, 10], resulting in gene sets for specific phenotypes, organized by a structured systematic ontology.

Here, we present gene sets for worm and fly, which use the structured ontology found in the Worm Phenotype Ontology from the C. elegans database - WormBase [11] and phenotypic descriptions for D. melanogaster found in FlyBase [12]. These gene sets are derived from information on gene-phenotype relationships based on genetically determined phenotypes. We use these collections in large scale phenotypic modeling in worms and flies and demonstrate their utility in complex analysis in multiple ways, including analysis of gene expression datasets representing complex phenotypic and biological phenomena in both C. elegans and D. melanogaster. In this way, we integrate large scale genome analysis with large scale phenotypic analysis in these two model systems.

Results

Derivation of worm gene sets

The worm gene sets presented here are derived from two lists of genes and assigned phenotypes provided by Gary Schindelman and Paul Sternberg as a component of the Worm Phenotype Ontology [13]. These two lists originated from information curated from RNAi experiments and genetic variations (VAR) as archived in WormBase [14].

Two worm gene set files (CE-RNAi-GS and CE-VAR-GS) were produced by parsing each gene list separately into non-redundant lists of unique phenotypic terms with all genes assigned to their corresponding phenotypic terms. This produced two non-redundant gene set files containing 850 and 1,109 gene sets for RNAi and VAR, respectively. In addition, we developed a master worm file by combining the original RNAi and VAR gene lists into a combined file (CE-Combined-GS) containing 1,385 non-redundant phenotypes and their associated gene sets.

Derivation of fly gene sets

The Drosophila gene sets described here are derived from phenotypic data provided in FlyBase (see Methods). A file containing 259,162 phenotypic descriptions with assigned Drosophila genes was collapsed and parsed, resulting in a non-redundant gene set file of 11,999 unique phenotypic terms with annotated genes. This file, named DM-narrow-GS, was used for systems biology and gene expression analysis.

Table 1 shows representative examples of individual gene sets from the C. elegans and D. melanogaster gene set files. Official gene symbols are shown where available; locus tags (C. elegans) are shown where gene symbols are not available. As in other gene set collections, as the number of genes in any given gene set decreases, the phenotypes progress from broad categories to more specific phenotypic descriptors. The full gene set lists consist of a wide range of developmental, structural, metabolic and behavioral phenotypes, representing a large majority of the experimentally determined phenotypes found in worms and flies. They range from broad phenotype categories such as “sterile”, “slow_growth”, or “larval_arrest” in worms and “viable”, “lethal” and “fertile” in flies, to narrow phenotypic descriptors such as “flaccid”, “DNA_synthesis_variant” or “no_posterior_pharynx” in worms and “ejaculatory_bulb”, “dorsal_vessel_primordium”, or “dense_body” in flies. In addition, there is often overlap of the genes found in related gene sets in both species, emphasizing the contributions of the same genes to multiple phenotypic traits. The complete C. elegans (Additional file 1: Table S1; Additional file 2: Table S2; Additional file 3: Table S3) and D. melanogaster (Additional file 4: Table S4) gene set files are available at http://www.grc.nia.nih.gov/branches/rrb/dna/index/Worm-fly_gene_sets_5-9-12.html.

Table 1 Selected Phenotype gene sets

General uses of phenotype based gene sets in both worm and fly

As described here, a single gene set is essentially a single phenotypic term followed by a single row of genes that have been associated with that phenotype. A collection of gene sets consists of a list of phenotypic terms with their corresponding gene sets. Gene sets can be used individually, as a collection, or compared across collections in a number of ways including network analysis, genome-wide model representations, hierarchical clustering, gene set analysis (GSA) of microarray data, and principal component analysis (PCA) of gene set values; among others. A property of this collection of gene sets is that they describe complex intermediate and end stage phenotypes as opposed to molecular function or lists of coordinately regulated genes. They can be used in a variety of bioinformatics applications to reveal higher order or emergent biological and phenotypic relationships and to provide insight into the biological relevance of complex molecular datasets.
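The structure described above (one phenotypic term followed by one row of genes) can be sketched in a few lines of Python. This is only an illustration: the tab delimiter, the function names `parse_gene_set_line` and `load_gene_sets`, and the gene names are assumptions made for the example and are not taken from the distributed files.

```python
def parse_gene_set_line(line, sep="\t"):
    """Split one line into (phenotype term, set of gene names).

    Assumes the first field is the phenotype and the rest are genes;
    the delimiter is an assumption of this sketch.
    """
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    phenotype = fields[0]
    genes = {gene for gene in fields[1:] if gene}
    return phenotype, genes


def load_gene_sets(lines, sep="\t"):
    """Build a collection {phenotype: set of genes} from an iterable of lines."""
    collection = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        phenotype, genes = parse_gene_set_line(line, sep)
        # Merge repeated phenotype terms so the collection stays non-redundant.
        collection.setdefault(phenotype, set()).update(genes)
    return collection


# Made-up lines for illustration only (hypothetical gene names).
example_lines = [
    "long_lived\tgeneA\tgeneB\tgeneC\n",
    "sterile\tgeneB\tgeneD\n",
]
collection = load_gene_sets(example_lines)
shared = collection["long_lived"] & collection["sterile"]
```

Holding each gene set as a Python `set` makes the gene sharing between phenotypes, which the later analyses rely on, a single intersection.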

Network analysis

Each individual gene set can be used to build networks to determine transcriptional regulation or protein-protein interactions. Figure 1 shows one representative network, out of six, of regulatory relationships identified by Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, http://www.ingenuity.com) from a single 169-gene C. elegans gene set, “life span variant”, found in the worm CE-Combined-GS 7-28-2011 file. This analysis identifies members of the gene set (shaded) as well as regulatory or transcriptional partners not found (unshaded) in the original gene set. This network highlights the central role of insulin, ERK family members, and PI3 Kinase as important contributors to longevity in worms.

An example of a network showing regulatory relationships from a single 82-gene “long_lived” gene set, found in the fly gene set file (DM-narrow-GS 9-7-2011), is shown in Figure 2. As in the worm, insulin is central in this fly network, as are ERKs, AKT, and histones, demonstrating significant overlap in age-related biochemical pathways between worms and flies. Each individual gene set (one phenotype with one row of annotated genes) produces multiple network diagrams showing the transcriptional neighbors and protein-protein partners of the core genes, while the entire collection of thousands of gene sets would produce many thousands of individual networks relative to phenotypic descriptions.

Genome-wide phenotypic modeling in worms and flies

In addition to analysis of a single gene set, a collection of phenotypic gene sets can be compared to itself to reveal biological relationships between all members of the collection. Figure 3 shows a dendrogram of the combined C. elegans file (CE-Combined-GS), using gene sets, having three or more genes, compared to each other based on the degree of gene sharing between individual gene sets. The overall worm tree (Figure 3) is composed of eleven large branches enriched for related biological functions. Moreover, local relationships within a specific branch suggest functional relationships between closely spaced individual gene lists. For instance, in branch 2 (Additional file 5: Figure S1) cell cycle phenotypes such as “cell cycle timing”, “cell cycle delayed” and “cell cycle variant” are closely positioned in space and close to spindle assembly phenotypes. Likewise, in branch 6 (Additional file 6: Figure S2) Dauer phenotypes are closely aligned with multiple lifespan phenotypes based on individual gene sharing within their respective gene sets. Close apposition of related phenotypes as determined by gene sharing between gene sets is a pervasive feature of these dendrogram displays and represents overlap of related phenotypes being influenced by shared genes.

The Drosophila gene set collection also produced a similar complex dendrogram of phenotypic functional groups based on gene sharing between gene sets (Figure 4). Like the worm dendrogram, individual branches of the fly dendrogram display a functional relatedness within subregions in each branch. For example, chromosome related phenotypes are grouped in branch 2 (Additional file 7: Figure S3) with mitotic and meiotic phenotypes, including meiotic telophase phenotypes, being closely aligned to each other, as well as spermatid and spermatocyte phenotypes. Behavioral, neuronal, and sensory response phenotypes are shown closely aligned in branch 11 of Figure 4 (Additional file 8: Figure S4), demonstrating overlapping genetic control of related complex phenotypes.

Phenotype Gene Set Analysis (GSA) of microarray data and Principal Components Analysis (PCA) of gene sets

C. elegans: In addition to comparisons of gene sets either individually or collectively to themselves, these phenotype gene sets are useful in analysis of microarray-based gene expression datasets in worm and fly. Figure 5a illustrates statistically significant gene sets resulting from Gene Set Analysis (GSA) of a single 4-day-old larva versus 15-day-old whole genome gene expression comparison in a C. elegans aging microarray dataset [15]. This dataset (GEO # GSE21784) represents a 15-day time course with incremental stages of infection with P. aeruginosa. Statistically significant up-regulated gene sets include germ cell gene groups, as well as meiosis and cell division gene sets, among others. Down-regulated gene sets include gene groups involved in body vacuoles, as well as alae and cuticle formation. Figure 5b is a heat map of the significant changes across the entire time course.

Figure 6 shows changes in selected gene sets from a different aging time course in C. elegans over 24 days [16] (GEO # GSE12290). Aging related increases (Figure 6a) or decreases (Figure 6b) in gene groups related to locomotion, energy metabolism, and life span are highlighted.

In addition to GSA of microarray data, the gene set values derived from gene expression data can be further analyzed by principal components analysis (PCA) using the Z-score values of the original gene set data output. This is in contrast to the more commonly described PCA of individual gene expression values. Figure 7 shows tight grouping of individual biological samples within three groups (larvae, adult day 6, and adult day 15) and dramatic separation of time points within the experiment, based solely on PCA of the gene set values from the previous gene set analysis. This demonstrates that there is useful biological information content in the aggregate gene set results, in addition to that found in any individual gene set, which can discriminate between discrete biological states.

D. melanogaster: In a similar fashion to the worm (above), microarray data from young versus aging flies were analyzed with the Drosophila gene set file DM-narrow-GS containing 11,999 gene sets. Gene set analysis was performed using the WEB-PAGE gene set analysis tool [10] on a dataset of gene expression values from young versus old flies [17] (GEO# GSE22437). The top 100 statistically significant enriched gene sets using Z ratios of the expression values from day 10 versus day 40 fly heads are shown in Figure 8. Over-enriched gene sets include minute phenotypes, life span, as well as developmental growth rate phenotypes, among others. The discriminative ability of PCA using gene set Z-scores (as opposed to individual gene values) is illustrated using the individual samples of day 10 versus day 40 fly heads in Figure 8.

Conclusion

Here we describe genome-wide phenotypic modeling using gene sets based on gene-phenotypic assignments in C. elegans and D. melanogaster. Unlike previous gene set collections such as KEGG, GO, and MSigDB, in these and other species, every gene in every gene set described here is based on genetic evidence contributing to each specific phenotype. Although very useful, these gene sets should be considered a first generation. They may not be complete. Some may describe certain phenotypes in different developmental contexts, or in particular applications and not in others. In addition, many subtleties and details were not included in deriving these gene sets, including penetrance of different alleles, strain differences, and environmental modifiers. Moreover, these gene sets may produce different results depending on the statistical algorithms used in complex analysis.

However, we have demonstrated these gene sets can be used to identify complex higher order biological and genetic relationships through network analysis, whole genome phenotypic modeling, and analysis of complex molecular datasets. They will help elucidate complex multigenic relationships between genes and phenotypes in worms and flies in many experimental and biological contexts and will provide a bridge for phenotypic comparisons between model and intermediate species.

Methods

Derivation of phenotypic gene sets

Worm

Phenotype-gene lists obtained from WormBase on 4/24/11 were titled RNAi and VAR. RNAi consisted of 34,433 gene-phenotype pairs having 7,289 unique genes and 850 unique phenotypes. These phenotypes were the results of observations of phenotypes from knockdown of the gene products (RNAi experiments). The list VAR contained 8,440 records, having 2,165 unique genes and 1,109 unique phenotypes, and was the result of observations of phenotypes from genetic mutations as deposited in WormBase. The overlap between the two files consists of 1,410 genes and 237 phenotypes.

Phenotype gene set files were created by parsing the original gene lists into non-redundant phenotype lists with annotated genes using a custom Perl script as previously described [9]. This was done for RNAi and VAR independently, as well as combined, to create the gene set files CE-RNAi-GS 7-26-11, CE-VAR-GS 7-26-11, and CE-Combined-GS 7-28-11. The resultant individual Phenotype Gene set names are identical to the Phenotype descriptors found in the original WormBase Phenotype file. These files can be downloaded here: http://www.grc.nia.nih.gov/branches/rrb/dna/index/Worm-fly_gene_sets_5-9-12.html.
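The parsing step can be illustrated with a short Python sketch. It is not the custom Perl script referred to above; the function names `build_gene_sets` and `combine`, the (gene, phenotype) tuple layout, and the records themselves are assumptions made for the example.

```python
from collections import defaultdict


def build_gene_sets(pairs):
    """Collapse (gene, phenotype) pairs into {phenotype: set of genes}."""
    gene_sets = defaultdict(set)
    for gene, phenotype in pairs:
        gene_sets[phenotype].add(gene)  # a set silently drops repeated pairs
    return dict(gene_sets)


def combine(*collections):
    """Union several collections, merging the genes of shared phenotypes."""
    merged = defaultdict(set)
    for collection in collections:
        for phenotype, genes in collection.items():
            merged[phenotype] |= genes
    return dict(merged)


# Made-up records for illustration only (not WormBase data).
rnai_pairs = [
    ("geneA", "sterile"),
    ("geneB", "sterile"),
    ("geneA", "slow_growth"),
    ("geneA", "sterile"),  # redundant pair, collapsed by the set
]
var_pairs = [("geneC", "sterile"), ("geneD", "flaccid")]

rnai_gs = build_gene_sets(rnai_pairs)
var_gs = build_gene_sets(var_pairs)
combined_gs = combine(rnai_gs, var_gs)
```

Taking the union of the two parsed collections should give the same result as concatenating the two pair lists before parsing, which is how the combined file is described above.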

Fly

Phenotypes and gene assignments were obtained from FlyBase on 9-11-11 at this web address: http://FlyBase.org/static_pages/downloads/FB2011_07/alleles/allele_phenotypic_data_fb_2011_07.tsv.gz. This file began with 259,162 phenotypic descriptions with assigned Drosophila genes. Redundant phenotype-gene combinations were removed, resulting in a list of 154,428 unique phenotype-single gene pairs. Parsing of this file resulted in a non-redundant gene set file of 11,999 unique phenotypic terms with annotated genes. The resultant individual Phenotype Gene set names are identical to the Phenotype descriptors found in the original FlyBase Phenotype file. This Phenotype gene set file, named DM-narrow-GS 9-7-2011, can be downloaded here: http://www.grc.nia.nih.gov/branches/rrb/dna/data/worm-fly/DM-narrow-GS_9-7-2011.txt.

Gene set nomenclature

It should be noted that the names of many phenotype gene sets in both worm and fly have a directionality which may or may not be relevant to any given microarray or other analysis. Please see Additional file 9: S7 for an explanation of directionality in gene set nomenclature and its interpretation in use.

Network analysis

Networks for C. elegans and D. melanogaster were produced using Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, http://www.ingenuity.com), with the “life_span_variant” gene set in C. elegans generated on 7-26-2011 and the “long_lived” gene set in D. melanogaster generated on 12-07-2011 as input. The input and output files are provided for C. elegans (Additional file 10: Table S5) and D. melanogaster (Additional file 11: Table S6).

Genome-wide phenotypic modeling

Genome-wide dendrograms were produced by a method similar to phylogenetic classification, as previously described [9]. Briefly, every pair of phenotypic gene sets was compared by counting the genes common to both sets and dividing that count by the number of genes in the smaller set of the pair, giving a similarity value between 0 and 1. This value was then subtracted from 1, so that two identical lists (100% match) have a distance of 0. Repeating this for all pairs of gene sets produced a distance matrix. The distance is represented as:

$$d_{i,j} = 1 - \frac{N(C_{i} \cap C_{j})}{\min[N(C_{i}), N(C_{j})]}$$

when i ≠ j; if i = j, then d = 0.

Where: C_k: genes in gene set k (where k = i, j); N(C_k): number of genes in gene set k (where k = i, j); d_{i,j}: the pairwise distance; i, j: indices of the gene sets being compared, where i = 1, 2, 3, …, n and j = 1, 2, 3, …, m.
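As a minimal sketch of the distance defined above, the following Python computes d for every pair of gene sets and assembles the distance matrix. The function names, the gene identifiers, and the toy gene set contents are hypothetical; the `min_genes=3` default mirrors the "three or more genes" filter mentioned in the Results.

```python
def overlap_distance(set_i, set_j):
    """d = 1 - N(Ci ∩ Cj) / min(N(Ci), N(Cj)); identical sets give 0."""
    if not set_i or not set_j:
        raise ValueError("gene sets must be non-empty")
    shared = len(set_i & set_j)
    return 1.0 - shared / min(len(set_i), len(set_j))


def distance_matrix(gene_sets, min_genes=3):
    """Return (sorted names, square matrix of pairwise distances)."""
    names = sorted(
        name for name, genes in gene_sets.items() if len(genes) >= min_genes
    )
    matrix = [
        [0.0 if a == b else overlap_distance(gene_sets[a], gene_sets[b])
         for b in names]
        for a in names
    ]
    return names, matrix


# Toy gene sets with made-up gene identifiers, for illustration only.
toy_sets = {
    "cell_cycle_delayed": {"g1", "g2", "g3", "g4"},
    "cell_cycle_variant": {"g2", "g3", "g5"},
    "flaccid": {"g7", "g8", "g9"},
    "too_small": {"g1"},  # dropped by the min_genes filter
}
names, dist = distance_matrix(toy_sets)
```

Because the denominator is the size of the smaller set, a small gene set fully contained in a larger one has distance 0 to it, which is what places narrow phenotypes next to their broader parents in the dendrogram.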

The gene set relationships were calculated from the distance matrix using the Fitch program [18]. It calculates the relationships based on the Fitch and Margoliash method of constructing the phylogenetic trees using the following formula (from the Phylip manual):

$$\text{Sum of squares} = \sum_{i} \sum_{j} \frac{n_{ij} (D_{ij} - d_{ij})^{2}}{D_{ij}^{P}}$$

where D is the observed distance between gene sets i and j and d is the expected distance, computed as the sum of the lengths of the segments of the tree from gene set i to gene set j. The quantity n is the number of times each distance has been replicated; in simple cases n is taken to be 1, and if n is greater than 1 the distance is assumed to be the mean of those replicates. The power P is what distinguishes the least-squares methods from one another: for the Fitch-Margoliash method P is 2.0, whereas P = 0.0 gives the unweighted least-squares method of Cavalli-Sforza and Edwards. The resulting coefficient matrix file was displayed using the Phylodraw graphics program [19].
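To make the criterion concrete, the sketch below evaluates the weighted sum of squares for a given pair of observed and tree-derived distance matrices. It only scores a candidate tree; it does not search over trees as the Fitch program does. The function name, the toy matrices, and the choice to skip pairs with an observed distance of 0 when P > 0 (to avoid dividing by zero) are assumptions of this example.

```python
def weighted_sum_of_squares(observed, expected, power=2.0, replicates=None):
    """Sum over i != j of n_ij * (D_ij - d_ij)**2 / D_ij**power.

    observed  : square matrix of observed distances D
    expected  : square matrix of path-length distances d read off a tree
    power     : P in the formula (2.0 weights by 1 / D**2; 0.0 is unweighted)
    replicates: optional matrix of n_ij; defaults to 1 for every pair
    """
    size = len(observed)
    total = 0.0
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            big_d = observed[i][j]
            if big_d == 0 and power > 0:
                continue  # assumption of this sketch: skip undefined terms
            n_ij = 1 if replicates is None else replicates[i][j]
            total += n_ij * (big_d - expected[i][j]) ** 2 / big_d ** power
    return total


# Toy matrices for three gene sets (illustrative values only).
observed = [[0.0, 0.5, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]
expected = [[0.0, 0.4, 1.0], [0.4, 0.0, 1.0], [1.0, 1.0, 0.0]]

weighted = weighted_sum_of_squares(observed, expected, power=2.0)
unweighted = weighted_sum_of_squares(observed, expected, power=0.0)
```

With P = 2.0 a given absolute error counts for more on short distances than on long ones, so the fit is pulled toward getting closely related gene sets right.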

Gene set analysis

This analysis used the Disease/Phenotype WEB-PAGE GSA web tool [10], which applies the PAGE algorithm [2], with the CE-Combined-GS gene set file, excluding gene sets containing more than 500 or fewer than 3 genes. Briefly, for each gene set a Z score was computed as

$$Z_{phenotype}(i) = \frac{\sqrt{n_{i} - 1} \cdot diff_{i}}{\sigma_{a}}$$

in which the phenotype index i = 1, 2, …, K, where K is the total number of phenotypes included in the data set; $n_{i}$ is the number of genes in the subgroup of phenotype i in the current sample array; and $\sigma_{a}$ is the standard deviation of the gene expression changes of the current sample. $diff_{i}$ is the difference between the mean value of the gene expression changes in the subgroup of phenotype i ($\bar{GC}_{i}$) and the mean value of the gene expression changes in the whole sample ($\bar{GC}_{a}$), i.e.

$$diff_{i} = \bar{GC}_{i} - \bar{GC}_{a}$$

The empirical p-value for the change in phenotype i is given by

$$p(i) = 2 \left[1 - \Phi\left(\frac{diff_{i}}{\sigma(diff_{i})}\right)\right]$$

in which $\Phi(x)$ is the standard normal distribution function evaluated at $x = diff_{i} / \sigma(diff_{i})$. $\sigma(diff_{i})$ is the standard deviation of the difference in gene expression changes between the phenotype subgroup i and the whole array:

$$\sigma(diff_{i}) = \sqrt{\frac{\sigma_{i}^{2}}{n_{i}} + \frac{\sigma_{a}^{2}}{n_{a}}}$$

where $\sigma_{i}$ is the standard deviation of the gene expression changes in phenotype i and $n_{a}$ is the total number of genes in the whole sample set. The plots were drawn with the R statistical programming language (R Development Core Team 2005) using either calculated or absolute Z-score values.
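The Z score and p-value above can be sketched in plain Python as follows. This is an illustration of the formulas, not the WEB-PAGE implementation. Several choices are assumptions of the example: sample (n - 1) standard deviations are used because the text does not say which estimator applies, the absolute value of the ratio is taken inside Φ so that the two-sided p-value stays between 0 and 1, and all function names and expression values are made up.

```python
import math
from statistics import mean, stdev, variance


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def page_statistics(expression_changes, gene_set):
    """Return (Z score, two-sided p-value) for one phenotype gene set.

    expression_changes: {gene: expression change} for the whole array
    gene_set          : set of genes annotated to the phenotype
    """
    all_values = list(expression_changes.values())
    set_values = [expression_changes[g] for g in gene_set
                  if g in expression_changes]  # genes present on the array
    n_i, n_a = len(set_values), len(all_values)
    diff = mean(set_values) - mean(all_values)
    sigma_a = stdev(all_values)  # assumption: sample standard deviation
    z_score = math.sqrt(n_i - 1) * diff / sigma_a
    sigma_diff = math.sqrt(variance(set_values) / n_i
                           + variance(all_values) / n_a)
    # Assumption: abs() keeps the two-sided p-value within [0, 1].
    p_value = 2.0 * (1.0 - normal_cdf(abs(diff) / sigma_diff))
    return z_score, p_value


def eligible(gene_set, smallest=3, largest=500):
    """Size filter described in the text: keep sets of 3 to 500 genes."""
    return smallest <= len(gene_set) <= largest


# Made-up expression changes for eight genes, for illustration only.
changes = {"g1": 2.0, "g2": 1.5, "g3": 1.8, "g4": -0.2,
           "g5": 0.1, "g6": -0.3, "g7": 0.0, "g8": 0.2}
toy_set = {"g1", "g2", "g3"}
z, p = page_statistics(changes, toy_set)
```

A positive Z indicates that the genes of the phenotype change, on average, more than the array as a whole; a negative Z indicates the opposite.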

Principal components analysis

Principal components analysis was performed on the gene set Z values using DIANE 8.0, a JMP-based software package (http://www.grc.nia.nih.gov/branches/rrb/dna/diane_software.pdf) that relies on the Singular Value Decomposition (SVD) function in JMP 9.0. In short, the data were organized as a matrix with the m samples as columns and the n gene set Z-values as rows; the mean of each row was subtracted and the SVD was calculated using JMP’s built-in SVD function, as illustrated in this document: http://www.cs.princeton.edu/picasso/mats/PCA-Tutorial-Intuition_jp.pdf and also used in this script: http://abs.cit.nih.gov/MSCLtoolbox.
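The centering and decomposition steps can be illustrated without JMP. The sketch below is a stand-in, not the DIANE procedure: instead of a full SVD it uses power iteration on the sample-by-sample Gram matrix of the row-centered data, which recovers only the first principal component (up to sign). The function names, the iteration count, and the Z values are assumptions of the example.

```python
import math


def center_rows(matrix):
    """Subtract each row's mean (rows = gene sets, columns = samples)."""
    centered = []
    for row in matrix:
        row_mean = sum(row) / len(row)
        centered.append([value - row_mean for value in row])
    return centered


def first_component_scores(matrix, iterations=200):
    """Coordinates of each sample on the first principal component.

    Power iteration on G = X^T X, where X is the row-centered matrix;
    the top eigenvector of G is the first right singular vector of X.
    """
    x = center_rows(matrix)
    m = len(x[0])
    gram = [[sum(row[a] * row[b] for row in x) for b in range(m)]
            for a in range(m)]
    vector = [float(k + 1) for k in range(m)]  # arbitrary fixed start
    for _ in range(iterations):
        product = [sum(gram[a][b] * vector[b] for b in range(m))
                   for a in range(m)]
        norm = math.sqrt(sum(c * c for c in product))
        if norm == 0.0:
            return [0.0] * m  # no variance left after centering
        vector = [c / norm for c in product]
    eigenvalue = sum(
        vector[a] * sum(gram[a][b] * vector[b] for b in range(m))
        for a in range(m)
    )
    singular_value = math.sqrt(max(eigenvalue, 0.0))
    return [singular_value * c for c in vector]


# Made-up gene set Z-values: 3 gene sets (rows) by 4 samples (columns);
# the first two columns stand for young samples, the last two for old ones.
z_values = [
    [2.0, 2.2, -1.9, -2.1],
    [-1.5, -1.4, 1.6, 1.3],
    [0.5, 0.4, 0.6, 0.5],
]
scores = first_component_scores(z_values)
```

In this toy matrix the two young samples fall on one side of the first component and the two old samples on the other, which is the kind of separation by biological state described for the gene set Z-scores in the Results.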

Data access

The complete C. elegans and D. melanogaster gene set files are available at this address: http://www.grc.nia.nih.gov/branches/rrb/dna/index/Worm-fly_gene_sets_5-9-12.html.

References

1. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003, 34 (3): 267-273. 10.1038/ng1180.
2. Kim SY, Volsky DJ: PAGE: parametric analysis of gene set enrichment. BMC Bioinforma. 2005, 6: 144. 10.1186/1471-2105-6-144.
3. Nam D, Kim SY: Gene-set approach for expression pattern analysis. Brief Bioinform. 2008, 9 (3): 189-197. 10.1093/bib/bbn001.
4. Nam D, Kim J, Kim SY, Kim S: GSA-SNP: a general approach for gene set analysis of polymorphisms. Nucleic Acids Res. 2010, 38 (Web Server issue): 749-754. 10.1093/nar/gkq428.
5. Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z: Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics. 2011, 98 (1): 1-8.
6. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29.
7. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M: KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012, 40 (Database issue): 109-114.
8. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102 (43): 15545-15550. 10.1073/pnas.0506580102.
9. Zhang Y, De S, Garner JR, Smith K, Wang SA, Becker KG: Systematic analysis, comparison, and integration of disease based human genetic association data and mouse genetic phenotypic information. BMC Med Genomics. 2010, 3: 1. 10.1186/1755-8794-3-1.
10. De S, Zhang Y, Garner JR, Wang SA, Becker KG: Disease and phenotype gene set analysis of disease-based gene expression in mouse and human. Physiol Genomics. 2010, 42A (2): 162-167. 10.1152/physiolgenomics.00008.2010.
11. Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, Dela Cruz N, Duong A, Fang R: WormBase 2012: more genomes, more data, new website. Nucleic Acids Res. 2012, 40 (Database issue): 735-741. 10.1093/nar/gkr954.
12. McQuilton P, St Pierre SE, Thurmond J: FlyBase 101--the basics of navigating FlyBase. Nucleic Acids Res. 2012, 40 (Database issue): 706-714. 10.1093/nar/gkr1030.
13. Schindelman G, Fernandes JS, Bastiani CA, Yook K, Sternberg PW: Worm Phenotype Ontology: integrating phenotype data within and beyond the C. elegans community. BMC Bioinforma. 2011, 12: 32. 10.1186/1471-2105-12-32.
14. Harris TW, Antoshechkin I, Bieri T, Blasiar D, Chan J, Chen WJ, De la Cruz N, Davis P, Duesbury M, Fang R: WormBase: a comprehensive resource for nematode research. Nucleic Acids Res. 2010, 38 (Database issue): 463-467.
15. Youngman MJ, Rogers ZN, Kim DH: A decline in p38 MAPK signaling underlies immunosenescence in Caenorhabditis elegans. PLoS Genet. 2011, 7 (5): e1002082. 10.1371/journal.pgen.1002082.
16. Golden TR, Hubbard A, Dando C, Herren MA, Melov S: Age-related behaviors have distinct transcriptional profiles in Caenorhabditis elegans. Aging Cell. 2008, 7 (6): 850-865. 10.1111/j.1474-9726.2008.00433.x.
17. Wood JG, Hillenmeyer S, Lawrence C, Chang C, Hosier S, Lightfoot W, Mukherjee E, Jiang N, Schorl C, Brodsky AS: Chromatin remodeling in the aging genome of Drosophila. Aging Cell. 2010, 9 (6): 971-978. 10.1111/j.1474-9726.2010.00624.x.
18. Felsenstein J: An alternating least squares approach to inferring phylogenies from pairwise distances. Syst Biol. 1997, 46 (1): 101-111. 10.1093/sysbio/46.1.101.
19. Fitch WM, Margoliash E: Construction of phylogenetic trees. Science. 1967, 155 (3760): 279-284. 10.1126/science.155.3760.279.

Acknowledgements

The authors would like to thank Gary Schindelman and Paul Sternberg from WormBase for providing gene-phenotype files and Dr Elin Lehrmann for critical reading of the manuscript. This research was supported entirely by the Intramural Research Program of the NIH, National Institute on Aging.

Author information

Authors and Affiliations

Gene Expression and Genomics Unit, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Biomedical Research Center, 251 Bayview Boulevard, Baltimore, MD, 21224, USA
Supriyo De, Yongqing Zhang & Kevin G Becker
Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
Catherine A Wolkow
Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Biomedical Research Center, 251 Bayview Boulevard, Baltimore, MD, 21224, USA
Sige Zou
Image Informatics and Computational Biology Unit, Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Biomedical Research Center, 251 Bayview Boulevard, Baltimore, MD, 21224, USA
Ilya Goldberg

Corresponding author

Correspondence to Kevin G Becker.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SD participated in study design and implemented the graphing algorithm. YZ participated in study design and developed the primary gene set files for both species. CW, SZ, and IG provided biological insights into the relevance and applicability in both C. elegans and D. melanogaster. KGB conceived the study design, participated in gene set development, ran analysis, and wrote the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

Additional file 1: Table S1: The complete phenotype gene sets for C. elegans. (TXT 416 KB)

Additional file 2: Table S2: The RNAi phenotype gene sets for C. elegans. (TXT 329 KB)

Additional file 3: Table S3: The VAR phenotype gene sets for C. elegans. (TXT 120 KB)

Additional file 4: Table S4: The complete phenotype gene sets for D. melanogaster. (TXT 2 MB)

Additional file 5: Figure S1: Branch 2 of the gene set dendrogram of C. elegans. (PDF 440 KB)

Additional file 6: Figure S2: Branch 6 of the gene set dendrogram of C. elegans. (PDF 511 KB)

Additional file 7: Figure S3: Branch 2 of the gene set dendrogram of D. melanogaster. (PDF 628 KB)

Additional file 8: Figure S4: Branch 11 of the gene set dendrogram of D. melanogaster. (PDF 499 KB)

Additional file 9: S7: Directionality in gene set nomenclature and interpretation in their use. (TXT 2 KB)

Additional file 10: Table S5: The gene set for C. elegans for “life_span_variant”. (TXT 2 KB)

Additional file 11: Table S6: The gene set for D. melanogaster for “long lived”. (TXT 1 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

De, S., Zhang, Y., Wolkow, C.A. et al. Genome-wide modeling of complex phenotypes in Caenorhabditis elegans and Drosophila melanogaster. BMC Genomics 14, 580 (2013). https://doi.org/10.1186/1471-2164-14-580

Received: 14 February 2013
Accepted: 23 May 2013
Published: 28 August 2013
DOI: https://doi.org/10.1186/1471-2164-14-580
