This article is part of the supplement: The International Conference on Intelligent Biology and Medicine (ICIBM) Genomics
BM-Map: an efficient software package for accurately allocating multireads of RNA-sequencing data
1 Graduate Program in Structural & Computational Biology & Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
2 Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
3 Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
4 Department of Statistics, Rice University, Houston, TX 77005, USA
5 Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
6 Department of Statistics, University of Wisconsin - Madison, Madison, WI 53792, USA
7 Current address, NorthShore University HealthSystem, Evanston, IL 60201, USA
BMC Genomics 2012, 13(Suppl 8):S9 doi:10.1186/1471-2164-13-S8-S9
Published: 17 December 2012
RNA sequencing (RNA-seq) has become a major tool for biomedical research. A key step in analyzing RNA-seq data is to infer the origin of short reads in the source genome, and many read alignment/mapping programs have been developed for this purpose. Usually, the majority of mappable reads can be mapped to a single, unambiguous genomic location; these are called unique reads. However, a considerable proportion of mappable reads can be aligned to more than one genomic location with the same or similar fidelity; these are called "multireads". Allocating multireads is challenging but critical for interpreting RNA-seq data. We recently developed a Bayesian stochastic model that allocates multireads more accurately than alternative methods (Ji et al. Biometrics 2011).
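To make the distinction between unique reads and multireads concrete, the following is a minimal Python sketch that separates the two classes in a SAM file. It is an illustration only and is not part of BM-Map. It assumes single-end reads and an aligner that writes each candidate location of a read as its own alignment record; the function name `split_unique_and_multireads` is hypothetical.

```python
from collections import defaultdict


def split_unique_and_multireads(sam_lines):
    """Split reads into unique reads and multireads.

    Assumption (not from the article): single-end reads, with every
    candidate location reported as a separate SAM alignment record.
    """
    locations = defaultdict(set)
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        rname, pos = fields[2], int(fields[3])
        # FLAG bit 0x4 marks an unmapped read in the SAM format.
        if flag & 0x4:
            continue
        locations[qname].add((rname, pos))

    # One distinct location: unique read. More than one: multiread.
    unique = {q: next(iter(locs)) for q, locs in locations.items() if len(locs) == 1}
    multi = {q: sorted(locs) for q, locs in locations.items() if len(locs) > 1}
    return unique, multi
```

Calling the function on the lines of a SAM file returns two dictionaries keyed by read name: one holding the single location of each unique read, the other holding the sorted list of competing locations of each multiread.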
To serve the broader biological community, we have implemented this method in a stand-alone, efficient, and user-friendly software package, BM-Map. BM-Map takes SAM (Sequence Alignment/Map), the most popular read alignment format, as its standard input. Based on the Bayesian model, it calculates the mapping probabilities of multireads for competing genomic loci. It then generates the output by adding these mapping probabilities to the original SAM file, so that users can easily perform downstream analyses. The program is available for three common operating systems: Linux, Mac, and PC. Moreover, we have built a dedicated website, http://bioinformatics.mdanderson.org/main/BM-Map, which includes free downloads, detailed tutorials, and illustrative examples.
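The idea of returning a SAM file with per-alignment mapping probabilities can be sketched as follows. This is a hedged illustration, not BM-Map's implementation: the tag name `ZP` is a hypothetical user-defined optional field, and the uniform probability 1/n is only a placeholder for the probabilities that the Bayesian model would supply. The sketch also assumes that every non-header record is a mapped alignment.

```python
def annotate_with_probabilities(sam_lines, tag="ZP"):
    """Append a mapping probability to each alignment record.

    Assumptions (not from the article): the tag name "ZP" is
    hypothetical, and the uniform value 1/n is a placeholder that
    stands in for model-based probabilities.
    """
    records = [line.rstrip("\n") for line in sam_lines]

    # Count how many alignment records each read name has.
    counts = {}
    for rec in records:
        if rec and not rec.startswith("@"):
            qname = rec.split("\t")[0]
            counts[qname] = counts.get(qname, 0) + 1

    annotated = []
    for rec in records:
        # Header and blank lines pass through unchanged.
        if not rec or rec.startswith("@"):
            annotated.append(rec)
            continue
        qname = rec.split("\t")[0]
        prob = 1.0 / counts[qname]
        # SAM optional fields use the TAG:TYPE:VALUE layout; "f" is a float.
        annotated.append(f"{rec}\t{tag}:f:{prob:.4f}")
    return annotated
```

Because the probability is stored as an extra optional field, the eleven mandatory SAM columns are left untouched, so tools that read standard SAM files should still be able to parse the annotated output.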
We have developed a stand-alone, efficient, and user-friendly software package for accurately allocating multireads, which is an important addition to our previous methodology paper. We believe that this bioinformatics tool will greatly help RNA-seq and related applications reach their full potential in life science research.