Analysis of quality raw data of second generation sequencers with Quality Assessment Software
1 Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém-PA, Brazil
2 Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte-MG, Brazil
3 Institute for Genome Research and Systems Biology, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
BMC Research Notes 2011, 4:130. doi:10.1186/1756-0500-4-130. Published: 18 April 2011
Second-generation sequencing technologies have advantages over Sanger sequencing; however, they have created new challenges for the genome construction process, chiefly because the reads are short, despite the high degree of coverage. Regardless of the program chosen for the construction process, DNA sequences are superimposed on the basis of identity to extend the reads and generate contigs; mismatches indicate a lack of homology and are not included. This process improves confidence in the sequences that are generated.
We developed Quality Assessment Software, which displays graphs of the distribution of quality values across the sequencing reads. The software allows us to adopt more stringent quality standards for sequence data, based on analysis of the quality graphs and on the coverage estimated after the quality filter is applied, while still providing acceptable sequence coverage for genome construction from short reads.
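The idea of choosing a cutoff from the quality distribution and then checking the coverage that remains can be illustrated with a minimal Python sketch. It is not the Quality Assessment Software itself: the function names, the use of mean read quality as the filter criterion, and the Phred+33 quality encoding are all assumptions made for illustration, and the tool's actual input format and filter rule may differ.

```python
from statistics import mean


def phred_scores(quality_string, offset=33):
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def quality_histogram(reads):
    """Count how often each Phred score occurs across all reads.

    These counts are the data behind a quality-distribution graph.
    """
    counts = {}
    for _, quality in reads:
        for score in phred_scores(quality):
            counts[score] = counts.get(score, 0) + 1
    return dict(sorted(counts.items()))


def filter_reads(reads, min_mean_quality):
    """Keep reads whose mean Phred score reaches the cutoff (assumed rule)."""
    return [
        (seq, quality)
        for seq, quality in reads
        if mean(phred_scores(quality)) >= min_mean_quality
    ]


def estimated_coverage(reads, genome_size):
    """Estimate coverage as total retained bases divided by genome size."""
    return sum(len(seq) for seq, _ in reads) / genome_size


# Toy data: three 10-base reads with mean qualities of 40, 20 and 25.
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),
    ("ACGTACGTAC", "5555555555"),
    ("ACGTACGTAC", "IIIII+++++"),
]
kept = filter_reads(reads, min_mean_quality=22)
coverage_before = estimated_coverage(reads, genome_size=10)
coverage_after = estimated_coverage(kept, genome_size=10)
```

Repeating the last three lines for several cutoff values shows how much coverage each candidate threshold would cost, which is the trade-off the paragraph above describes.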
Quality filtering is a fundamental step in genome construction: it reduces the frequency of incorrect alignments caused by measurement errors, which, given the short length of the reads, can arise during the construction process and lead to misassemblies. Applying quality filters to sequence data with Quality Assessment Software, together with analysis of the quality graphs, allowed cutoff parameters to be defined more precisely, which increased the accuracy of genome construction.