Table 1

Summary statistics for individual and merged assemblies
| Kmer | Assembly | No. transcripts >100 bp | N50 (bp) | Mean length (bp) | Max length (bp) | Total no. bases |
|---|---|---|---|---|---|---|
| 21 | Initial | 33024 | 844 | 525 | 5689 | 17,354,832 |
| 21 | Representative | 29082 | 786 | 501 | 5659 | 14,561,997 |
| 25 | Initial | 28723 | 746 | 491 | 5689 | 14,105,603 |
| 25 | Representative | 26715 | 706 | 474 | 5659 | 12,660,658 |
| 29 | Initial | 26236 | 615 | 431 | 5689 | 11,307,053 |
| 29 | Representative | 25016 | 590 | 419 | 5659 | 10,488,297 |
| 33 | Initial | 23648 | 488 | 363 | 5584 | 8,591,562 |
| 33 | Representative | 22972 | 469 | 355 | 5584 | 8,148,996 |
| 37 | Initial | 19180 | 369 | 311 | 5111 | 5,898,486 |
| 37 | Representative | 18821 | 357 | 301 | 5111 | 5,664,511 |
| 41 | Initial | 12230 | 281 | 263 | 5750 | 3,218,609 |
| 41 | Representative | 12090 | 273 | 258 | 5750 | 3,122,927 |
| | Merged | 35680 | 747 | 479 | 5750 | 17,086,468 |
| | Final | 32911 | 675 | 451 | 5659 | 14,828,283 |
| | Annotated | 15965 | 927 | 586 | 5659 | 9,357,209 |

For each kmer, data are shown for both the initial Velvet/Oases assembly (Initial) and the assembly containing only one representative transcript from each locus (Representative). The “Merged” assembly is the result of merging the representative assemblies from the different kmers using CD-HIT-EST; the “Final” assembly is the merged assembly after potentially misassembled transcripts were removed; and the “Annotated” set contains only transcripts with a significant BLAST match. Kmer = required length of the overlap match between two reads in Velvet; N50 = length-weighted median contig length.
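As an illustration of the N50 definition above, the following minimal Python sketch computes N50 from a list of transcript lengths. The function name `n50` and the example lengths are hypothetical and are not taken from the study's pipeline; the sketch assumes the common convention that N50 is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total bases.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Assumed convention: sort lengths from longest to shortest and return
    the length at which the running total first reaches at least half of
    the sum of all lengths.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, not data from Table 1.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Because the statistic is weighted by length, a few long transcripts can hold N50 well above the mean length, which is consistent with N50 exceeding the mean in most rows of the table.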

Miller et al. BMC Genomics 2012 13:439   doi:10.1186/1471-2164-13-439
