Open Access Methodology article

PhenoLink - a web-tool for linking phenotype to ~omics data for bacteria: application to gene-trait matching for Lactobacillus plantarum strains

Jumamurat R Bayjanov1,2, Douwe Molenaar4,5, Vesela Tzeneva3,4, Roland J Siezen1,2,3,4 and Sacha A F T van Hijum1,2,3,4*

Author affiliations

1 Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, P.O. Box 9101, Nijmegen, The Netherlands

2 Netherlands Bioinformatics Centre, 260 NBIC, P.O. Box 9101, Nijmegen 6500 HB, The Netherlands

3 TI Food and Nutrition, P.O. Box 557, Wageningen 6700 AN, The Netherlands

4 Kluyver Centre for Genomics of Industrial Fermentation, NIZO food research, P.O. Box 20, Ede 6710 BA, The Netherlands

5 Systems Bioinformatics IBIVU, Free University of Amsterdam, Amsterdam 1081HV, The Netherlands


Citation and License

BMC Genomics 2012, 13:170  doi:10.1186/1471-2164-13-170

Published: 4 May 2012

Abstract

Background

Linking phenotypes to high-throughput molecular biology information generated by ~omics technologies can reveal the cellular mechanisms underlying an organism's phenotype. ~Omics datasets are often very large and noisy, with many features (e.g., genes, metabolite abundances). Associating phenotypes with ~omics data therefore requires an approach that is robust to noise and can handle large and diverse datasets.

Results

We developed a web-tool, PhenoLink (http://bamics2.cmbi.ru.nl/websoftware/phenolink/), that links phenotype to ~omics datasets using well-established as well as new techniques. PhenoLink imputes missing values and preprocesses input data (i) to decrease inherent noise in the data and (ii) to counterbalance pitfalls of the Random Forest algorithm, on which feature (e.g., gene) selection is based. The preprocessed data are then used in feature selection to identify relations to phenotypes. We applied PhenoLink to identify gene-phenotype relations based on the presence/absence of 2847 genes in 42 Lactobacillus plantarum strains and phenotypic measurements of these strains in several experimental conditions, including growth on sugars and nitrogen-dioxide production. Genes were ranked by their importance (predictive value) for correctly predicting the phenotype of a given strain. In addition to known gene-phenotype relations, we also found novel ones.
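
For illustration only, the workflow outlined above (impute missing values, then rank genes by Random Forest importance) can be sketched in plain Python. This is not PhenoLink's implementation: the mode imputation, the depth-limited trees, the Gini-based importance measure, the toy data, and all function names (impute_mode, best_split, grow, rank_features) are assumptions made for this sketch.

```python
import random
from collections import Counter

def impute_mode(column):
    """Replace None entries with the most frequent observed value (assumed strategy)."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (gain, feature) for the binary feature with the largest Gini decrease."""
    parent = gini(labels)
    best = (0.0, None)
    for f in features:
        left = [y for x, y in zip(rows, labels) if x[f] == 0]
        right = [y for x, y in zip(rows, labels) if x[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if parent - child > best[0]:
            best = (parent - child, f)
    return best

def grow(rows, labels, n_try, rng, importance, weight, depth=0, max_depth=3):
    """Grow one tree recursively, adding weighted Gini decreases to importance."""
    if depth == max_depth or len(set(labels)) < 2:
        return
    candidates = rng.sample(range(len(rows[0])), n_try)  # random feature subset
    gain, f = best_split(rows, labels, candidates)
    if f is None:
        return
    importance[f] += weight * gain
    for value in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[f] == value]
        grow([rows[i] for i in idx], [labels[i] for i in idx], n_try, rng,
             importance, weight * len(idx) / len(rows), depth + 1, max_depth)

def rank_features(rows, labels, n_trees=200, seed=0):
    """Return [(normalized importance, feature index), ...], most important first."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    n_try = max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of strains
        grow([rows[i] for i in boot], [labels[i] for i in boot],
             n_try, rng, importance, 1.0)
    total = sum(importance) or 1.0
    return sorted(((imp / total, j) for j, imp in enumerate(importance)),
                  reverse=True)

# Hypothetical toy data: 16 strains x 4 genes (presence = 1, absence = 0).
rows = [[(i >> g) & 1 for g in range(4)] for i in range(16)]
labels = [row[2] for row in rows]  # phenotype follows gene 2 by construction
rows[3][0] = None                  # simulate two missing measurements
rows[10][2] = None

# Impute column by column, then rebuild the strain-by-gene matrix.
columns = [impute_mode([row[g] for row in rows]) for g in range(4)]
rows = [list(strain) for strain in zip(*columns)]

ranking = rank_features(rows, labels)
for score, gene in ranking:
    print(f"gene_{gene}: {score:.3f}")
```

In this toy input the phenotype is constructed to follow gene 2, so that gene should receive the largest share of the importance even though imputation introduces one mismatched strain. A real analysis at the scale reported here (2847 genes, 42 strains) would depend on the preprocessing steps mentioned above, which this sketch does not attempt to reproduce.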

Conclusions

PhenoLink is an easily accessible web-tool that facilitates identifying relations in large and often noisy phenotype and ~omics datasets. The visualization of links to phenotypes offered in PhenoLink allows prioritizing links, finding relations between features, finding relations between phenotypes, and identifying outliers in phenotype data. PhenoLink can be used to uncover phenotype links to a multitude of ~omics data, e.g., gene presence/absence (determined by, e.g., CGH or next-generation sequencing), gene expression (determined by, e.g., microarrays or RNA-seq), or metabolite abundance (determined by, e.g., GC-MS).