Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes
1 CLC bio Japan Inc., a QIAGEN Company, 204 Daikanyama Park Side Village, 9-8 Sarugakucho, Shibuya-ku, Tokyo 150-0033, Japan
2 Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
3 Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan
4 Laboratory of DNA Data Analysis, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
5 Institute for Advanced Biosciences, Keio University, 5322 Endo, Fujisawa, Kanagawa 252-0882, Japan
6 Department of Bacteriology, Okayama University Graduate School of Medicine, 2-5-1 Kita-ku Shikata-cho, Okayama 700-8558, Japan
BMC Genomics 2014, 15:699 doi:10.1186/1471-2164-15-699
Published: 21 August 2014
The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated three second-generation sequencers, the Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq), and one third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the 5-Mb genome of Vibrio parahaemolyticus, which comprises two circular chromosomes.
We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length (418 bp) among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp. The optimized result for MiSeq contained 34 contigs assembled from reads of 58× coverage, and the longest contig was 733 kbp. These results suggest that greater coverage depth is not required for a better assembly. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers. In contrast, PacBio, with 73× coverage and a mean read length of 3,119 bp, generated two exceptionally long contigs of 3,288,561 and 1,875,537 bp, each of which was from a single chromosome, allowing us to determine the absolute positions of all rRNA operons.
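The coverage figures above can be related to one another with simple arithmetic. The following Python sketch is illustrative only and is not the authors' pipeline. It assumes that coverage is total sequenced bases divided by genome size, that the genome size can be approximated by the sum of the two PacBio contigs, and that a lower-coverage read set is obtained by uniform random subsampling. The function names are hypothetical.

```python
# Illustrative sketch; function names and the subsampling model are assumptions,
# not taken from the paper's methods.

# Lengths of the two PacBio contigs reported above (bp), one per chromosome.
PACBIO_CONTIGS = (3_288_561, 1_875_537)

# Assumption: approximate the genome size by the total PacBio assembly length.
GENOME_SIZE = sum(PACBIO_CONTIGS)


def coverage_depth(total_bases: int, genome_size: int) -> float:
    """Mean coverage depth: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def subsample_fraction(target_cov: float, full_cov: float) -> float:
    """Fraction of reads to keep so a run at full_cov yields about target_cov."""
    if not 0 < target_cov <= full_cov:
        raise ValueError("target_cov must be in (0, full_cov]")
    return target_cov / full_cov


def reads_needed(target_cov: float, mean_read_len: float, genome_size: int) -> int:
    """Approximate number of reads of a given mean length for target_cov."""
    return round(target_cov * genome_size / mean_read_len)


if __name__ == "__main__":
    print(f"Approximate genome size: {GENOME_SIZE:,} bp")
    # Share of each single run corresponding to the optimized coverage.
    print(f"Ion PGM fraction (77x of 279x): {subsample_fraction(77, 279):.3f}")
    print(f"MiSeq fraction (58x of 1,927x): {subsample_fraction(58, 1927):.3f}")
    # Rough PacBio read count implied by 73x coverage at 3,119 bp mean length.
    print(f"PacBio reads for 73x: {reads_needed(73, 3119, GENOME_SIZE):,}")
```

Under these assumptions, the optimized MiSeq assembly corresponds to only a few percent of the reads from a single run, which illustrates why the full coverage depth was not needed.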
PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of “finished grade” because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.