Open Access Research article

dsPIG: a tool to predict imprinted genes from the deep sequencing of whole transcriptomes

Hua Li1,2, Xiao Su3, Juan Gallegos4, Yue Lu5, Yuan Ji6, Jeffrey J Molldrem2 and Shoudan Liang7*

Author Affiliations

1 Shanghai Center for Systems Biomedicine, Key Laboratory of Systems Biomedicine (Ministry of Education), Shanghai Jiao Tong University, Shanghai, 200240, China

2 Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA

3 Division of Biostatistics, The University of Texas School of Public Health at Houston, Houston, TX, 77030, USA

4 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA

5 Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA

6 Center for Clinical and Research Informatics, NorthShore University HealthSystem, Chicago, IL, 60201, USA

7 Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA

BMC Bioinformatics 2012, 13:271  doi:10.1186/1471-2105-13-271

Published: 19 October 2012

Abstract

Background

Dysregulation of imprinted genes, which are expressed in a parent-of-origin-specific manner, plays an important role in various human diseases, such as cancer and behavioral disorders. To date, however, fewer than 100 imprinted genes have been identified in the human genome. The recent availability of high-throughput sequencing technology makes large-scale prediction of imprinted genes possible. Here we propose a Bayesian model (dsPIG) to predict imprinted genes on the basis of allelic expression observed in mRNA-Seq data from independent human tissues.
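To make the idea of a Bayesian prediction from allelic expression concrete, the following is a minimal sketch only, and not the dsPIG model itself, whose details are not given in this abstract. It assumes a single biallelic SNP, Hardy-Weinberg genotype frequencies, independent samples, a fixed per-read error rate, and exactly 50/50 allelic expression in non-imprinted heterozygotes. The function names, the default error rate and the default prior are all hypothetical choices made for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log probability of k successes in n Bernoulli(p) trials, 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_imprinted(counts, maf, error=0.01, prior=0.01):
    """Posterior probability that a gene is imprinted (toy model).

    counts: list of (reads_major_allele, reads_minor_allele), one per sample.
    maf:    minor allele frequency of the SNP, strictly between 0 and 1.
    error:  assumed per-read probability of observing the wrong allele.
    prior:  assumed prior probability that the gene is imprinted.
    """
    if not (0.0 < maf < 1.0 and 0.0 < error < 1.0 and 0.0 < prior < 1.0):
        raise ValueError("maf, error and prior must lie strictly in (0, 1)")
    q = maf
    p = 1.0 - q
    log_imp = 0.0  # log-likelihood of all samples if the gene is imprinted
    log_non = 0.0  # log-likelihood of all samples if it is not
    for a, b in counts:
        n = a + b
        only_major = log_binom_pmf(a, n, 1.0 - error)  # major allele expressed
        only_minor = log_binom_pmf(a, n, error)        # minor allele expressed
        both = log_binom_pmf(a, n, 0.5)                # balanced heterozygote
        # Imprinted: only one parental allele is expressed, so every sample
        # looks monoallelic, whatever its genotype.
        log_imp += log_sum_exp([math.log(p) + only_major,
                                math.log(q) + only_minor])
        # Not imprinted: homozygotes look monoallelic, heterozygotes biallelic.
        log_non += log_sum_exp([math.log(p * p) + only_major,
                                math.log(2.0 * p * q) + both,
                                math.log(q * q) + only_minor])
    log_num = math.log(prior) + log_imp
    log_den = log_sum_exp([log_num, math.log(1.0 - prior) + log_non])
    return math.exp(log_num - log_den)


# Hypothetical read counts, invented purely for illustration.
monoallelic = [(30, 0), (0, 25), (28, 1), (0, 40),
               (35, 0), (1, 30), (22, 0), (0, 27)]
biallelic = [(15, 14), (12, 18), (20, 0), (0, 22), (16, 15)]

print(posterior_imprinted(monoallelic, maf=0.5))
print(posterior_imprinted(biallelic, maf=0.5))
```

In this toy model a single monoallelic sample is only weak evidence for imprinting, because a homozygous individual produces the same pattern. The evidence accumulates across samples, while one clearly biallelic sample is strong evidence against imprinting.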

Results

In simulations, dsPIG identified imprinted genes with high sensitivity and specificity and a low false discovery rate when the number of sequenced tissue samples was fairly large. By applying dsPIG to mRNA-Seq data, we predicted 94 imprinted genes in 20 cerebellum samples and 57 imprinted genes in 9 diverse tissue samples, with expected low false discovery rates. We also assessed dsPIG using previously validated imprinted and non-imprinted genes. With simulations, we further analyzed how imbalanced allelic expression of non-imprinted genes and different minor allele frequencies affected the predictions of dsPIG. Interestingly, we found that, among biallelically expressed genes, at least 18 genes expressed significantly more transcripts from one allele than from the other across different individuals and tissues.

Conclusion

With the prevalence of the mRNA-Seq technology, dsPIG has become a useful tool for analysis of allelic expression and large-scale prediction of imprinted genes. For ease of use, we have set up a web service and also provided an R package for dsPIG at http://www.shoudanliang.com/dsPIG/.

Keywords:
Prediction of imprinted genes; Transcriptome deep sequencing; mRNA-Seq; Bayesian model; Analysis of allelic expression