Open Access Research article

Exploration of multivariate analysis in microbial coding sequence modeling

Tahir Mehmood1*, Jon Bohlin2, Anja Bråthen Kristoffersen3,4, Solve Sæbø1, Jonas Warringer5,6 and Lars Snipen1

Author Affiliations

1 Biostatistics, Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, Aas, Norway

2 EpiCenter, Department of Food Safety and Infection Biology, Oslo, Norway

3 Section for Epidemiology, Norwegian Veterinary Institute, Oslo, Norway

4 Department of Informatics, University of Oslo, Oslo, Norway

5 Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden

6 Center of Integrative Genetics (CIGENE) and Department of animal and aquaculture, Norwegian University of Life Sciences, Aas, Norway

BMC Bioinformatics 2012, 13:97 doi:10.1186/1471-2105-13-97

Published: 14 May 2012

Additional files

Additional file 1:

Figure S1. The number of positives against different thresholds. The number of Positive genes obtained at different thresholds t, for all species. A threshold of t = 0.3 means that members of a gene cluster differ by no more than roughly 30%, and the ‘center’ gene (medoid) of each cluster is used as a Positive. If a species has more than 400 sequences, a sample of 400 sequences is taken as Positives. A small threshold (close to 0) gives fewer, but tighter, clusters.

Format: EPS Size: 14KB
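
The selection rule described in the caption of Figure S1 can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy grouping rule, the function names, the toy distance matrix and the decision to discard single-gene clusters are all assumptions made for illustration. Discarding single-gene clusters is one way in which a small threshold could yield fewer clusters, as the caption states. Only the threshold t, the use of the medoid as the Positive and the cap of 400 sequences come from the caption.

```python
import random


def cluster_by_threshold(dist, t):
    """Greedy grouping (an assumption, not the paper's method): a gene joins
    the first cluster in which its distance to every member is <= t;
    otherwise it starts a new cluster."""
    clusters = []
    for i in range(len(dist)):
        for members in clusters:
            if all(dist[i][j] <= t for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def medoid(members, dist):
    """Return the member with the smallest total distance to its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def select_positives(dist, t, max_positives=400, min_size=2, seed=0):
    """Pick one medoid per cluster and cap the result at max_positives.

    min_size=2 drops single-gene clusters; this is a hypothetical choice
    made for this sketch, not something stated in the caption.
    """
    clusters = [c for c in cluster_by_threshold(dist, t) if len(c) >= min_size]
    medoids = [medoid(c, dist) for c in clusters]
    if len(medoids) > max_positives:
        # Random subsample when a species yields more than the cap.
        medoids = random.Random(seed).sample(medoids, max_positives)
    return sorted(medoids)


# Hypothetical pairwise distances (fraction of differing positions)
# between five genes; genes 0-2 are similar, as are genes 3-4.
toy = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.1, 0.9, 0.9],
    [0.2, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.9, 0.2, 0.0],
]

print(select_positives(toy, t=0.3))   # [1, 3]
print(select_positives(toy, t=0.15))  # [0]
```

In this toy example, lowering t from 0.3 to 0.15 reduces the number of retained clusters from two to one, which mirrors the caption's remark that a small threshold gives fewer, but tighter, clusters.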