Prediction of novel long non-coding RNAs based on RNA-Seq data of mouse Klf1 knockout study
1 School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221008, JiangSu, PR China
2 Center for Computational Biology, and Laboratory of Disease Genomics and Personalized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, No.7 Beitucheng West Road, Chaoyang District, Beijing, 100029, PR China
3 Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, Queensland, Australia
4 Mater Medical Research Institute, Mater Hospital, Brisbane, 4101, Queensland, Australia
Citation and License
BMC Bioinformatics 2012, 13:331 doi:10.1186/1471-2105-13-331Published: 13 December 2012
Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions.
We present a computational pipeline for detecting novel lncRNAs from the RNA-Seq data. First, the genome-guided transcriptome reconstruction is used to generate initially assembled transcripts. The possible partial transcripts and artefacts are filtered according to the quantified expression level. After that, novel lncRNAs are detected by further filtering known transcripts and those with high protein coding potential, using a newly developed program called lncRScan. We applied our pipeline to a mouse Klf1 knockout dataset, and discussed the plausible functions of the novel lncRNAs we detected by differential expression analysis. We identified 308 novel lncRNA candidates, which have shorter transcript length, fewer exons, shorter putative open reading frame, compared with known protein-coding transcripts. Of the lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression level than the protein-coding ones and 13 lncRNAs represent significant differential expression between the wild-type and Klf1 knockout conditions.
Our method can predict a set of novel lncRNAs from the RNA-Seq data. Some of the lncRNAs are showed differentially expressed between the wild-type and Klf1 knockout strains, suggested that those novel lncRNAs can be given high priority in further functional studies.