Research article

Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls

Daniela Witten1, Robert Tibshirani1,2*, Sam Guoping Gu3, Andrew Fire3,4* and Weng-Onn Lui3,5*

Author affiliations

1 Department of Statistics, Stanford University, Stanford, California 94305-4065, USA

2 Department of Health Research and Policy, Stanford University, Stanford, California 94305-5405, USA

3 Department of Pathology, Stanford University School of Medicine, Stanford, California 94305-5324, USA

4 Department of Genetics, Stanford University School of Medicine, Stanford, California 94305-5324, USA

5 Department of Molecular Medicine and Surgery, Karolinska University Hospital-Solna, Stockholm 17176, Sweden


Citation and License

BMC Biology 2010, 8:58  doi:10.1186/1741-7007-8-58

Published: 11 May 2010



Ultra-high throughput sequencing technologies provide opportunities both for discovery of novel molecular species and for detailed comparisons of gene expression patterns. Small RNA populations are particularly well suited to this analysis, as many different small RNAs can be completely sequenced in a single instrument run.


We prepared small RNA libraries from 29 tumour/normal pairs of human cervical tissue samples. Analysis of the resulting sequences (42 million in total) defined 64 new human microRNA (miRNA) genes. Both arms of the hairpin precursor were observed for 23 of the newly identified miRNA candidates. We tested several computational approaches for analysing class differences between high throughput sequencing datasets, and we describe a novel application of a log-linear model that provided the most effective analysis of these data. This method identified 67 miRNAs that were differentially expressed between the tumour and normal samples at a false discovery rate below 0.001.
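
As a rough illustration of how a log-linear model can score class differences in sequence counts, the sketch below fits the simplest such model: the expected count for each sample and small RNA is the product of the sample's sequencing depth and the RNA's overall abundance. Each RNA is then scored by how far the observed class totals depart from that fit. The function name `loglinear_scores`, the toy counts, and the exact form of the statistic are assumptions made for illustration; they are not taken from the article and may differ from the authors' formulation.

```python
import numpy as np

def loglinear_scores(counts, labels):
    """Score each small RNA for a class difference (illustrative only).

    counts: (n_samples, n_rnas) array of non-negative read counts.
    labels: length n_samples array of class labels (e.g. 0 = normal, 1 = tumour).
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)

    depth = counts.sum(axis=1, keepdims=True)      # total reads per sample
    abundance = counts.sum(axis=0, keepdims=True)  # total reads per small RNA

    # Null fit of the log-linear model:
    # log E[count] = log(depth) + log(abundance) - log(grand total)
    expected = depth * abundance / counts.sum()

    scores = np.zeros(counts.shape[1])
    for k in np.unique(labels):
        in_class = labels == k
        observed_k = counts[in_class].sum(axis=0)
        expected_k = expected[in_class].sum(axis=0)
        diff_sq = (observed_k - expected_k) ** 2
        # RNAs with no reads at all have expected_k == 0 and keep a score of 0.
        scores += np.divide(diff_sq, expected_k,
                            out=np.zeros_like(expected_k),
                            where=expected_k > 0)
    return scores

# Toy data (hypothetical): 2 normal and 2 tumour samples, 2 small RNAs.
toy_counts = [[100, 10],
              [100, 10],
              [100, 50],
              [100, 50]]
toy_labels = [0, 0, 1, 1]
toy_scores = loglinear_scores(toy_counts, toy_labels)
```

In practice the scores would be compared against a null distribution, for example by permuting the class labels, to estimate a false discovery rate; that step is omitted here.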


This approach can potentially be applied to any kind of RNA sequencing data to analyse differential sequence representation between sets of biological samples.