Computational prediction of associations between long non-coding RNAs and proteins
1 Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
2 MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
3 Institute of Systems Biomedicine, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
4 National Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, China
5 Current address: Department of Biostatistics, Yale University, New Haven, CT 06511, USA
BMC Genomics 2013, 14:651 doi:10.1186/1471-2164-14-651Published: 24 September 2013
Though most of the transcripts are long non-coding RNAs (lncRNAs), little is known about their functions. lncRNAs usually function through interactions with proteins, which implies the importance of identifying the binding proteins of lncRNAs in understanding the molecular mechanisms underlying the functions of lncRNAs. Only a few approaches are available for predicting interactions between lncRNAs and proteins. In this study, we introduce a new method lncPro.
By encoding RNA and protein sequences into numeric vectors, we used matrix multiplication to score each RNA–protein pair. This score can be used to measure the interactions between an RNA–protein pair. This method effectively discriminates interacting and non-interacting RNA–protein pairs and predicts RNA–protein interactions within a given complex. Applying this method on all human proteins, we found that the long non-coding RNAs we collected tend to interact with nuclear proteins and RNA-binding proteins.
Compared with the existing approaches, our method shortens the time for training matrix and obtains optimal results based on the model being used. The ability of predicting the associations between lncRNAs and proteins has also been enhanced. Our method provides an idea on how to integrate different information into the prediction process.