Open Access Highly Accessed Software

iMir: An integrated pipeline for high-throughput analysis of small non-coding RNA data obtained by smallRNA-Seq

Giorgio Giurato1, Maria Rosaria De Filippo2, Antonio Rinaldi1, Adnan Hashim1, Giovanni Nassa1, Maria Ravo1, Francesca Rizzo1, Roberta Tarallo1 and Alessandro Weisz1,3*

  • * Corresponding author: Alessandro Weisz aweisz@unisa.it

  • † Equal contributors

Author Affiliations

1 Laboratory of Molecular Medicine and Genomics, Department of Medicine and Surgery, University of Salerno, via Allende, 1, Salerno, Baronissi, Italy

2 Fondazione IRCCS SDN, Napoli, Italy

3 Division of Molecular Pathology and Medical Genomics, “SS. Giovanni di Dio e Ruggi d’Aragona – Schola Medica Salernitana” University of Salerno Hospital, Salerno, Italy

BMC Bioinformatics 2013, 14:362  doi:10.1186/1471-2105-14-362

Published: 13 December 2013

Abstract

Background

Qualitative and quantitative analysis of small non-coding RNAs by next generation sequencing (smallRNA-Seq) is a novel technology increasingly used to investigate, with high sensitivity and specificity, RNA populations comprising microRNAs and other regulatory small transcripts. Analysis of smallRNA-Seq data to gather biologically relevant information (e.g. detection and differential expression analysis of known and novel non-coding RNAs, target prediction) requires the use of multiple statistical and bioinformatics tools from different sources, each focusing on a specific step of the analysis pipeline. As a consequence, the analytical workflow is slowed down by the need for continuous intervention by the operator, a critical factor when large numbers of datasets need to be analyzed at once.

Results

We designed a novel modular pipeline (iMir) for comprehensive analysis of smallRNA-Seq data, comprising specific tools for adapter trimming, quality filtering, differential expression analysis, biological target prediction and other useful options, by integrating multiple open source modules and resources in an automated workflow. As statistics is crucial in deep-sequencing data analysis, we devised and integrated in iMir tools based on different statistical approaches to allow the operator to analyze data rigorously. The pipeline proved to be more efficient and time-saving than currently available methods and, in addition, flexible enough to allow the user to select the preferred combination of analytical steps. We present here the results obtained by applying this pipeline to simultaneously analyze 6 smallRNA-Seq datasets from either exponentially growing or growth-arrested human breast cancer MCF-7 cells, which led to the rapid and accurate identification, quantitation and differential expression analysis of ~450 miRNAs, including several novel miRNAs and isomiRs, as well as identification of the putative mRNA targets of differentially expressed miRNAs. In addition, iMir also allowed the identification of ~70 piRNAs (piwi-interacting RNAs), some of which were differentially expressed in proliferating vs growth-arrested cells.

Conclusion

The integrated data analysis pipeline described here is based on a reliable, flexible and fully automated workflow, useful to rapidly and efficiently analyze high-throughput smallRNA-Seq data, such as those produced by the most recent high-performance next generation sequencers. iMir is available at http://www.labmedmolge.unisa.it/inglese/research/imir.

Keywords:
Next generation sequencing; SmallRNA-Seq; Data analysis pipeline; Breast cancer; Small non-coding RNA; microRNA; Piwi-interacting RNA