Annotating novel genes by integrating synthetic lethals and genomic information
1 Institute of Plant Science, ETH Zurich, Universitaetsstr. 2, 8092 Zurich, Switzerland
2 Seminar for Statistics, ETH Zurich, Leonhardstr. 27, 8092 Zurich, Switzerland
3 Institute of Biochemistry, ETH Zurich, Schafmattstr. 18, 8093 Zurich, Switzerland
4 Friedrich Miescher Institute, Maulbeerstrasse 66, Basel, Switzerland
5 Competence Center for Systems Physiology and Metabolic Diseases (CC-SPMD), Zurich, Switzerland
BMC Systems Biology 2008, 2:3 doi:10.1186/1752-0509-2-3
Published: 14 January 2008
Large-scale screening for synthetic lethality is a common tool in yeast genetics for systematically searching for genes that play a role in specific biological processes. The amount of data resulting from a single large-scale screen often far exceeds the capacity to experimentally characterize every identified target. Thus, there is a need for computational tools that select promising candidate genes in order to reduce the number of follow-up experiments to a manageable size.
We analyze synthetic lethality data for arp1 and jnm1, two spindle migration genes, in order to identify novel genes involved in this process. To this end, we use an unsupervised statistical method that integrates additional information from biological data sources, such as gene expression, phenotypic profiling, RNA degradation and sequence similarity. Unlike existing methods, which require large amounts of synthetic lethal data, our method relies only on synthetic lethality information from two single screens. Using a Multivariate Gaussian Mixture Model, we determine the best subset of features for assigning the target genes to two groups. The approach identifies a small group of genes as candidates involved in spindle migration. Experimental testing confirms the majority of our candidates, and we present she1 (YBL031W) as a novel gene involved in spindle migration. As a second example, we also applied the statistical methodology to TOR2 signaling.
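The idea of choosing a feature subset under which a two-component Gaussian mixture separates the target genes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the procedure used in the paper: the gene labels and feature values are toy numbers, the mixture uses diagonal covariances fitted by plain EM, and feature subsets are ranked by a hypothetical score, the BIC of a single Gaussian minus the BIC of the two-component mixture.

```python
import math
from itertools import combinations

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def weighted_fit(data, weights, min_var=1e-3):
    """Weighted ML estimate of total weight, mean and (floored) variances."""
    d = len(data[0])
    w = max(sum(weights), 1e-9)
    mean = [sum(wi * row[j] for wi, row in zip(weights, data)) / w
            for j in range(d)]
    var = [max(sum(wi * (row[j] - mean[j]) ** 2
                   for wi, row in zip(weights, data)) / w, min_var)
           for j in range(d)]
    return w, mean, var

def fit_two_groups(data, n_iter=100):
    """EM for a two-component mixture; returns (log-likelihood, resp)."""
    n = len(data)
    # Deterministic start: upper half of the first feature goes to component 0.
    order = sorted(range(n), key=lambda i: data[i][0])
    resp = [0.0] * n
    for i in order[n // 2:]:
        resp[i] = 1.0
    loglik = 0.0
    for _ in range(n_iter):
        # M-step: refit both components from the current responsibilities.
        other = [1.0 - r for r in resp]
        comps = [weighted_fit(data, resp), weighted_fit(data, other)]
        # E-step: update responsibilities and accumulate the log-likelihood.
        loglik = 0.0
        for i, row in enumerate(data):
            logs = [math.log(w / n) + log_gauss_diag(row, mean, var)
                    for w, mean, var in comps]
            top = max(logs)
            total = top + math.log(sum(math.exp(v - top) for v in logs))
            resp[i] = math.exp(logs[0] - total)
            loglik += total
    return loglik, resp

def two_group_score(data):
    """Illustrative score: BIC(one Gaussian) - BIC(two-component mixture)."""
    n, d = len(data), len(data[0])
    _, mean, var = weighted_fit(data, [1.0] * n)
    ll1 = sum(log_gauss_diag(row, mean, var) for row in data)
    ll2, resp = fit_two_groups(data)
    bic1 = -2 * ll1 + 2 * d * math.log(n)
    bic2 = -2 * ll2 + (4 * d + 1) * math.log(n)
    return bic1 - bic2, resp

# Toy data (hypothetical): rows are genes, columns are three made-up features.
genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
features = [
    [2.1, 1.6, 0.3], [1.9, 1.4, -0.4], [2.2, 1.7, 0.1], [0.1, -0.1, 0.2],
    [-0.2, 0.2, -0.3], [0.0, 0.1, 0.4], [0.2, -0.2, -0.1], [-0.1, 0.0, 0.0],
]

best_score, best_subset, best_resp = -math.inf, None, None
for size in range(1, 4):
    for subset in combinations(range(3), size):
        data = [[row[j] for j in subset] for row in features]
        score, resp = two_group_score(data)
        if score > best_score:
            best_score, best_subset, best_resp = score, subset, list(resp)

# Treat the smaller of the two groups as the candidate set.
group = [g for g, r in zip(genes, best_resp) if r > 0.5]
rest = [g for g in genes if g not in group]
candidates = min(group, rest, key=len)
print(best_subset, candidates)
```

In this toy setting the third feature carries no group structure, so including it costs more in the BIC penalty than it gains in likelihood. A real analysis would need a more careful initialization and a selection criterion suited to the data at hand.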
We demonstrate the general use of Multivariate Gaussian Mixture Modeling for selecting candidate genes for experimental characterization from synthetic lethality data sets. For the given example, integration of different data sources contributes to the identification of genetic interaction partners of arp1 and jnm1 that play a role in the same biological process.