Open Access Research article

Mining for genotype-phenotype relations in Saccharomyces using partial least squares

Tahir Mehmood1*, Harald Martens2,3, Solve Sæbø1, Jonas Warringer2,4 and Lars Snipen1

Author Affiliations

1 Biostatistics, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Norway

2 Centre of Integrative Genetics (CIGENE), Norway

3 Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway

4 Department of Cell and Molecular Biology, University of Gothenburg, Sweden

BMC Bioinformatics 2011, 12:318  doi:10.1186/1471-2105-12-318

Published: 3 August 2011

Abstract

Background

Multivariate approaches are versatile and are applied in many fields, where they provide decisive advantages over univariate analysis. Genome-wide association studies are rapidly emerging, but existing approaches pay little attention to the multivariate relation between genotype and phenotype. We introduce a methodology that uses a BLAST-based approach to extract information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) to map genotype-phenotype relations.

Results

Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the relationship between genotype and phenotype involves surprisingly few genes: an overwhelmingly large fraction of the phenotypic variation can be explained by variation in fewer than 1% of the full gene reference set of 5791 genes. These phenotype-influencing genes were evolving 20% faster than non-influential genes and were unevenly distributed over cellular functions, with strong enrichment in functions such as cellular respiration and transposition. These genes were also enriched for known paralogs, stop codon variations and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on the recent adaptation of Saccharomyces yeasts to environmental changes in their ecological niche.

Conclusions

The BLAST- and PLS-based multivariate approach produced results that agree with the known yeast phylogeny and gene ontology, which supports the conclusion that the methodology extracts a set of fast-evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future work should improve the computation of genotype signals as well as the variable selection procedure within the PLS framework.