Open Access Highly Accessed Software

MethyQA: a pipeline for bisulfite-treated methylation sequencing quality assessment

Shuying Sun1,2*, Aaron Noviski3 and Xiaoqing Yu1

Author Affiliations

1 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland 44106, Ohio, USA

2 Department of Mathematics, Texas State University, San Marcos 78666, Texas, USA

3 Department of Electrical Engineering and Computer Sciences, Case Western Reserve University, Cleveland 44106, Ohio, USA

BMC Bioinformatics 2013, 14:259  doi:10.1186/1471-2105-14-259

Published: 23 August 2013

Abstract

Background

DNA methylation is an epigenetic event that adds a methyl group to the 5-carbon position of cytosine. This epigenetic modification can significantly affect gene expression in both normal and diseased cells. Hence, it is important to study methylation signals at the single-cytosine level, which is now possible using the bisulfite conversion technique (i.e., converting unmethylated Cs to Us, which are then read as Ts after PCR amplification) combined with next generation sequencing (NGS) technologies. Despite the advances of NGS technologies, certain quality issues remain. The more prevalent ones include low per-base sequencing quality at the 3’ end, PCR amplification bias, and low bisulfite conversion rates. Therefore, it is important to conduct quality assessment before downstream analysis. To the best of our knowledge, no existing software package can assess the quality of methylation sequencing data generated with different bisulfite-treatment protocols.
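The bisulfite conversion described above can be illustrated with a small in-silico sketch. This is not part of MethyQA; the function name `bisulfite_convert` and the toy sequence are hypothetical, and the sketch assumes an idealized, 100% efficient conversion on a single strand.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate ideal bisulfite treatment followed by PCR on one strand.

    Unmethylated C is read as T; methylated C (0-based positions given in
    `methylated_positions`) is protected and stays C. All other bases are
    left unchanged.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


# Toy example: only the C at index 1 is methylated.
converted = bisulfite_convert("ACGTCCGA", methylated_positions={1})
print(converted)  # ACGTTTGA
```

Comparing the converted read with the reference shows why a remaining C is interpreted as a methylated site, and why incomplete conversion would be mistaken for methylation.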

Results

To conduct the quality assessment of bisulfite methylation sequencing data, we have developed a pipeline named MethyQA. MethyQA combines currently available open-source software packages with our own custom programs written in Perl and R. The pipeline can provide quality assessment results for tens of millions of reads in under an hour. The novelty of our pipeline lies in its examination of bisulfite conversion rates and of the DNA sequence structure of regions that have different conversion rates or coverage.
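To make the idea of a bisulfite conversion rate concrete, the following is a minimal sketch of one common way to estimate it, not necessarily the calculation MethyQA performs. It assumes that cytosines outside a CpG context are largely unmethylated (a typical assumption for mammalian somatic tissue), that the read is aligned to the same strand of the reference without gaps, and the function name `non_cpg_conversion_rate` is hypothetical.

```python
def non_cpg_conversion_rate(reference, read):
    """Estimate the conversion rate from one ungapped, same-strand alignment.

    Counts reference cytosines that are not followed by G (non-CpG sites)
    and returns the fraction of them that appear as T in the read.
    Returns None when no informative site is covered.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")

    converted = 0
    total = 0
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        # Skip CpG sites, and the last base, whose context is unknown.
        if i + 1 >= len(reference) or reference[i + 1] == "G":
            continue
        if read_base == "T":
            converted += 1
            total += 1
        elif read_base == "C":
            total += 1
        # Any other read base is treated as a sequencing error and ignored.
    return converted / total if total else None


# Three non-CpG cytosines in the reference; two of them are read as T.
rate = non_cpg_conversion_rate("ACATCCGACT", "ATATTCGACT")
print(rate)
```

In practice the counts would be accumulated over all aligned reads rather than a single read, and a rate well below 1 would flag a sample for closer inspection.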

Conclusions

MethyQA is a new software package that gives users insight into the methylation sequencing data they are studying. It allows users to determine the quality of their data and better prepares them to address the research questions that lie ahead. Because of its speed and efficiency, MethyQA can serve as an important tool for studies dealing with bisulfite methylation sequencing data.

Keywords:
DNA methylation; Next generation sequencing; Alignment; BRAT; Quality assessment