This article is part of the supplement: The International Conference on Intelligent Biology and Medicine (ICIBM) – Genomics

Open Access Research

Automatically clustering large-scale miRNA sequences: methods and experiments

Linxia Wan1, Jiandong Ding1, Ting Jin1, Jihong Guan2* and Shuigeng Zhou1,3*

Author Affiliations

1 School of Computer Science, Fudan University, Shanghai 200433, China

2 Department of Computer Science and Technology, Tongji University, Shanghai 201804, China

3 Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China

BMC Genomics 2012, 13(Suppl 8):S15  doi:10.1186/1471-2164-13-S8-S15

Published: 17 December 2012

Abstract

Background

Since the initial annotation of microRNAs (miRNAs) in 2001, many studies have sought to identify additional miRNAs experimentally or computationally in various species. MiRNAs act with the Argonaute family of proteins to regulate target messenger RNAs (mRNAs) post-transcriptionally. Current research focuses mainly on the functions of individual miRNAs. Because members of the same miRNA family may participate in the same pathway or regulate the same target(s), and thus share similar biological functions, a high-quality miRNA family classification can yield useful knowledge.

Results

In this article, we present miRCluster, an unsupervised clustering-based method that automatically groups miRNAs. To evaluate the method, we constructed several data sets from the online database miRBase. The results show that miRCluster can arrange miRNAs efficiently: for example, it identifies 354 families in miRBase 16 with an accuracy of 92.08%, and recognizes 9 of the 10 families newly added in miRBase 17. To date, ~30% of the mature miRNAs registered in miRBase are unclassified. With miRCluster, over 85% of the unclassified miRNAs can be assigned to families, while ~44% of these miRNAs are distributed among ~300 novel families.

Conclusions

In short, miRCluster is an automatic and efficient miRNA family identification method that requires no prior knowledge. It can be helpful in practice, especially when exploring the functions of novel miRNAs. All relevant materials can be freely accessed online (http://admis.fudan.edu.cn/projects/miRCluster).