Open Access Methodology article

Multivariate Analysis and Visualization of Splicing Correlations in Single-Gene Transcriptomes

Mark C Emerick1*, Giovanni Parmigiani2 and William S Agnew3

Author Affiliations

1 Department of Physiology, Johns Hopkins Medical School, Baltimore, MD 21205 USA

2 Departments of Oncology, Zoology, Johns Hopkins Medical School, and Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205 USA

3 Departments of Physiology and Neuroscience, Johns Hopkins Medical School, Baltimore, MD 21205 USA

BMC Bioinformatics 2007, 8:16  doi:10.1186/1471-2105-8-16

Published: 18 January 2007

Abstract

Background

RNA metabolism, through 'combinatorial splicing', can generate enormous structural diversity in the proteome. Alternative domains may interact, however, with unpredictable phenotypic consequences, necessitating integrated RNA-level regulation of molecular composition. Splicing correlations within transcripts of single genes provide valuable clues to functional relationships among molecular domains as well as genomic targets for higher-order splicing regulation.

Results

We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in 'clock plots' and linkage grids. Higher-order correlations are assessed statistically through Monte Carlo analysis of a log-linear model with an empirical-Bayes estimate of the true probabilities of observed and unobserved splice forms. Log-linear coefficients are visualized in a 'spliceprint', a signature of splice correlations in the transcriptome. We present two novel metrics: the linkage change index, which measures the directional change in pair-wise correlation with tissue differentiation, and the accuracy index, a very simple goodness-of-fit metric that is more sensitive than the integrated squared error when applied to sparsely populated tables and that, unlike chi-square, does not diverge at low variance. Considerable attention is given to sparse contingency tables, which are inherent to single-gene libraries.
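
As a rough illustration of the pair-wise case only, the sketch below tabulates two alternative splice sites across a library of full-length transcripts and computes two standard association measures: the phi coefficient and the log odds ratio, which is proportional to the two-way interaction term of a saturated log-linear model on a 2x2 table. This is not the authors' implementation. The function names, the 0/1 encoding of each transcript, the toy library, and the fixed pseudocount (a crude stand-in for the empirical-Bayes estimate mentioned above, used here only to keep sparse tables finite) are all assumptions made for the example.

```python
import math
from collections import Counter


def pair_table(transcripts, i, j):
    """Return the 2x2 count table for splice sites i and j.

    Each transcript is a tuple of 0/1 flags (hypothetical encoding):
    1 means the alternative exon at that site is included.
    table[a][b] counts transcripts with site i == a and site j == b.
    """
    counts = Counter((t[i], t[j]) for t in transcripts)
    return [[counts[(a, b)] for b in (0, 1)] for a in (0, 1)]


def phi_coefficient(table):
    """Pearson correlation of two binary variables from a 2x2 table."""
    (n00, n01), (n10, n11) = table
    row0, row1 = n00 + n01, n10 + n11
    col0, col1 = n00 + n10, n01 + n11
    denom = math.sqrt(row0 * row1 * col0 * col1)
    # An empty margin makes the correlation undefined; report 0.0 here.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def log_odds_ratio(table, pseudocount=0.5):
    """Log odds ratio with an additive pseudocount on every cell.

    The pseudocount is an assumption of this sketch, not the paper's
    empirical-Bayes estimator; it only prevents division by zero when
    a cell of a sparse table is empty.
    """
    (n00, n01), (n10, n11) = [[c + pseudocount for c in row] for row in table]
    return math.log((n00 * n11) / (n01 * n10))


if __name__ == "__main__":
    # Toy library: two sites that are usually spliced in together.
    library = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0)] + [(0, 1)]
    table = pair_table(library, 0, 1)
    print(table, phi_coefficient(table), log_odds_ratio(table))
```

Higher-order correlations among three or more sites would need the full log-linear model and a Monte Carlo reference distribution, which this sketch does not attempt.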

Conclusion

The analysis reveals patterns of splicing correlations that span a broad range of interaction order and that change during development. The methods apply well beyond the single gene, including, for example, to multiple-gene interactions in the complete transcriptome.