Research article

The majority of total nuclear-encoded non-ribosomal RNA in a human cell is 'dark matter' un-annotated RNA

Philipp Kapranov1*, Georges St Laurent2,8, Tal Raz1, Fatih Ozsolak1, C Patrick Reynolds3, Poul HB Sorensen4, Gregory Reaman5, Patrice Milos1, Robert J Arceci6, John F Thompson1* and Timothy J Triche7*

Author affiliations

1 Helicos BioSciences Corporation, One Kendall Square, Building 700, Cambridge, MA 02139, USA

2 Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, SFH Life Sciences Building, 185 Meeting St, Providence, RI 02912, USA

3 Cancer Center, Departments of Cell Biology & Biochemistry, Pediatrics, and Internal Medicine, School of Medicine, Texas Tech University Health Sciences Center, 3601 4th Street STOP 9445, Lubbock, TX 79430-6450, USA

4 British Columbia Cancer Research Centre, 675 West 10th Avenue, Room 4112, Vancouver, BC, Canada V5Z 1L3

5 Department of Pediatrics, The George Washington University School of Medicine and Health Sciences, Division of Oncology, Children's National Medical Center, 11 Michigan Ave, NW, Washington, DC, 20422, USA

6 Kimmel Comprehensive Cancer Center at Johns Hopkins, Department of Oncology/Pediatric Oncology, The Bunting Blaustein Cancer Research Building, 1650 Orleans Street, Suite 207, Baltimore, MD, 21287, USA

7 Department of Pathology, University of Southern California, 1975 Zonal Avenue, Los Angeles, CA 90089-9034, USA

8 Grupo de Inmunovirologia, SIU, Universidad de Antioquia, Calle 67 Número 53 - 108, Medellin, Antioquia, Colombia

Citation and License

BMC Biology 2010, 8:149  doi:10.1186/1741-7007-8-149

Published: 21 December 2010



The discovery that the transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations, and that most RNAs produced do not appear to encode proteins, has transformed our understanding of genome complexity and suggests new paradigms of genome regulation. However, the fraction of all cellular RNA whose function we do not understand, and the fraction of the genome that is used to produce that RNA, remain controversial. This is not simply a bookkeeping issue: the extent of this un-annotated transcription has important implications for its biological function and for the general architecture of genome regulation. For example, efforts to elucidate how non-coding RNAs (ncRNAs) regulate genome function will be compromised if that class of RNAs is dismissed as simply 'transcriptional noise'.


We show that the relative mass of RNA whose function and/or structure we do not understand (the so-called 'dark matter' RNAs), as a proportion of all non-ribosomal, non-mitochondrial human RNA (mt-RNA), can be greater than that of protein-encoding transcripts. This observation is obscured in studies that analyze only polyA-selected RNA, because polyA selection enriches for protein-coding RNAs while discarding the vast majority of RNA prior to analysis. We further show that a large number of very long, abundantly transcribed regions (hundreds of kb) are present in intergenic space and that expression of these regions is associated with neoplastic transformation. These regions overlap some regions found previously in normal human embryonic tissues, which raises an interesting hypothesis as to the function of these ncRNAs in both early development and neoplastic transformation.


We conclude that 'dark matter' RNA can constitute the majority of non-ribosomal, non-mitochondrial RNA and that a significant fraction arises from numerous very long, intergenic transcribed regions that could be involved in neoplastic transformation.