Open Access research article

Comparative analysis of microbiome measurement platforms using latent variable structural equation modeling

Xiao Wu1*, Kathryn Berkow1, Daniel N Frank2, Ellen Li3,4, Ajay S Gulati5 and Wei Zhu1

Author Affiliations

1 Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, USA

2 Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO, USA

3 Department of Medicine, Stony Brook University, Stony Brook, NY, USA

4 Department of Medicine, Washington University, St. Louis, MO, USA

5 Department of Pediatrics, University of North Carolina, Chapel Hill, NC, USA

BMC Bioinformatics 2013, 14:79  doi:10.1186/1471-2105-14-79

Published: 5 March 2013

Abstract

Background

Culture-independent phylogenetic analysis of 16S ribosomal RNA (rRNA) gene sequences has emerged as an incisive method of profiling the bacteria present in a specimen. Multiple techniques are currently available to enumerate the abundance of bacterial taxa in specimens, including Sanger sequencing, ‘next generation’ pyrosequencing, microarrays, quantitative PCR, and the rapidly emerging third- and fourth-generation sequencing methods. An efficient statistical tool is urgently needed for the following tasks: (1) to compare the agreement between these measurement platforms, (2) to select the most reliable platform(s), and (3) to combine platforms with complementary strengths for a unified analysis.

Results

We present latent variable structural equation modeling (SEM) as a novel statistical application for the comparative analysis of measurement platforms. The latent variable SEM treats the true (unknown) relative frequency of a given bacterial taxon in a specimen as the latent (unobserved) variable. It estimates the reliability of each measurement platform and the similarities between platforms, and then weights the measurements optimally for a unified analysis of microbiome composition. The latent variable SEM contains repeated measures ANOVA (both the univariate and the multivariate models) as special cases. As a more general and realistic modeling approach, it yields superior goodness-of-fit and more reliable analysis results, as demonstrated by a microbiome study of human inflammatory bowel diseases.
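To make the measurement-model idea concrete, here is a minimal Python sketch. It is not the authors' estimation procedure. It assumes a one-factor model with exactly three platforms, y_j = mu_j + lam_j * eta + e_j, where eta is the latent true abundance (variance fixed to 1), lam_j is platform j's loading and theta_j = Var(e_j) its error variance. With three platforms the model is just-identified, so the loadings have a closed form. Reliability is the share of a platform's variance explained by eta. As one way to "weight measurements optimally", the sketch uses Bartlett factor-score weights. The function name `fit_one_factor_3` and the simulated loadings are hypothetical illustrations.

```python
import numpy as np


def fit_one_factor_3(Y):
    """Closed-form fit of a one-factor model for exactly three platforms
    (columns of Y): y_j = mu_j + lam_j * eta + e_j, Var(eta) = 1.

    Just-identified case: Cov(y_i, y_j) = lam_i * lam_j for i != j.
    Assumes all off-diagonal covariances are positive (hypothetical helper).
    """
    S = np.cov(Y, rowvar=False)                # 3x3 sample covariance
    s12, s13, s23 = S[0, 1], S[0, 2], S[1, 2]
    lam = np.sqrt(np.array([s12 * s13 / s23,   # lam_1^2
                            s12 * s23 / s13,   # lam_2^2
                            s13 * s23 / s12]))  # lam_3^2
    # Error variances; these can turn negative in small samples (Heywood case).
    theta = np.diag(S) - lam**2
    # Reliability: fraction of each platform's variance due to the latent eta.
    reliability = lam**2 / np.diag(S)
    # Bartlett weights w = (lam' Theta^-1 lam)^-1 Theta^-1 lam: platforms with
    # a large loading and a small error variance count more, and lam @ w == 1.
    w = (lam / theta) / np.sum(lam**2 / theta)
    return lam, theta, reliability, w


# Usage on simulated data (hypothetical platforms, not the paper's data).
rng = np.random.default_rng(0)
n = 50_000
lam_true = np.array([1.0, 0.8, 0.5])
theta_true = np.array([0.1, 0.3, 0.6])
eta = rng.normal(size=n)                       # latent true abundance
Y = eta[:, None] * lam_true + rng.normal(size=(n, 3)) * np.sqrt(theta_true)

lam_hat, theta_hat, rel, w = fit_one_factor_3(Y)
eta_hat = (Y - Y.mean(axis=0)) @ w             # unified (combined) measurement
corr_single = [np.corrcoef(Y[:, j], eta)[0, 1] for j in range(3)]
corr_combined = np.corrcoef(eta_hat, eta)[0, 1]
print("loadings:", lam_hat.round(3))
print("reliabilities:", rel.round(3))
print("weights:", w.round(3))
```

In this sketch the first platform comes out as the most reliable. The weighted combination tracks eta more closely than any single platform does. If every loading were forced to be equal and every error variance forced to be equal, the implied covariance would become compound symmetric. That is the structure assumed by univariate repeated measures ANOVA, which gives one intuition for why it sits inside the SEM as a special case. With more than three platforms the model is overidentified and would normally be fitted iteratively, for example by maximum likelihood, rather than by this closed form.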

Conclusions

Given the rapid evolution of modern biotechnologies, the tasks of comparing, selecting and combining measurement platforms are here to stay and will only grow in importance. Beyond the microbiome study presented here, the latent variable SEM method is readily applicable to other biological settings.

Keywords:
Bioinformatics; Latent variable structural equation modeling; Measurement model; Reliability; Repeated measures ANOVA