Open Access Research article

Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study

Lei Sun1,2,3, Zhihua Zhang2, Timothy L Bailey3, Andrew C Perkins4, Michael R Tallack4, Zhao Xu1 and Hui Liu1*

Author Affiliations

1 School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221008, JiangSu, PR China

2 Center for Computational Biology, and Laboratory of Disease Genomics and Personalized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, No.7 Beitucheng West Road, Chaoyang District, Beijing, 100029, PR China

3 Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Queensland, Australia

4 Mater Medical Research Institute, Mater Hospital, Brisbane, 4101, Queensland, Australia

BMC Bioinformatics 2012, 13:331  doi:10.1186/1471-2105-13-331

Published: 13 December 2012

Abstract

Background

The study of long non-coding RNAs (lncRNAs) has been advanced by high-throughput RNA sequencing (RNA-Seq). However, identifying lncRNAs from RNA-Seq data is still not trivial, and uncovering their functions remains a challenge.

Results

We present a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate an initial set of assembled transcripts. Possible partial transcripts and artefacts are then filtered out according to their quantified expression levels. Next, novel lncRNAs are detected by further filtering out known transcripts and transcripts with high protein-coding potential, using a newly developed program called lncRScan. We applied our pipeline to a mouse Klf1 knockout dataset and, using differential expression analysis, discussed the plausible functions of the novel lncRNAs we detected. We identified 308 novel lncRNA candidates, which have shorter transcript lengths, fewer exons and shorter putative open reading frames than known protein-coding transcripts. Of these lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than protein-coding transcripts, and 13 lncRNAs are significantly differentially expressed between the wild-type and Klf1 knockout conditions.
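The filtering stages of the pipeline can be illustrated with a small sketch. This is not lncRScan and not the authors' code; it is a minimal example under stated assumptions. Expression is assumed to be reported as FPKM, and the cutoff of 1.0 FPKM is hypothetical. The 200-nt minimum follows the conventional length definition of lncRNAs. The 300-nt (about 100-codon) maximum ORF length is also hypothetical, and a simple longest-ORF check stands in for a proper coding-potential model. The `Transcript` record, `longest_orf_nt` and `candidate_lncrnas` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Iterable, List

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    sequence: str         # spliced transcript sequence (exons joined)
    fpkm: float           # quantified expression level (FPKM assumed)
    overlaps_known: bool  # True if it overlaps an annotated transcript


def longest_orf_nt(seq: str) -> int:
    """Longest ATG...stop ORF (stop codon included), in nucleotides.

    Both strands are scanned because the transcript strand is assumed to be
    unknown. ORFs that run off the end without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, seq.translate(_COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None  # first ATG since the last stop codon in this frame
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


def candidate_lncrnas(
    transcripts: Iterable[Transcript],
    min_fpkm: float = 1.0,   # hypothetical expression cutoff
    min_length: int = 200,   # conventional lncRNA length threshold
    max_orf_nt: int = 300,   # hypothetical ORF cutoff (~100 codons)
) -> List[Transcript]:
    """Apply the filters in order and return the surviving transcripts."""
    kept = []
    for t in transcripts:
        if t.fpkm < min_fpkm:
            continue  # low expression: possible partial transcript or artefact
        if t.overlaps_known:
            continue  # already annotated, so not novel
        if len(t.sequence) < min_length:
            continue  # too short to count as a long ncRNA
        if longest_orf_nt(t.sequence) >= max_orf_nt:
            continue  # long ORF: treated here as likely protein-coding
        kept.append(t)
    return kept


if __name__ == "__main__":
    demo = [
        Transcript("tx1", "ATG" + "GCT" * 150 + "TAA", 5.0, False),  # 456-nt ORF
        Transcript("tx2", "C" * 400, 3.0, False),                     # no ORF
        Transcript("tx3", "C" * 400, 0.2, False),                     # low FPKM
        Transcript("tx4", "C" * 400, 8.0, True),                      # known
    ]
    print([t.transcript_id for t in candidate_lncrnas(demo)])  # prints ['tx2']
```

The longest-ORF test is deliberately crude. A dedicated coding-potential tool would normally combine several sequence features, so this sketch only shows the order and intent of the filtering steps.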

Conclusions

Our method can predict a set of novel lncRNAs from RNA-Seq data. Some of these lncRNAs are differentially expressed between the wild-type and Klf1 knockout strains, suggesting that these novel lncRNAs can be given high priority in further functional studies.