Table 1

Contig Alignment Statistics.

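| Dataset | k | Total Contigs | Accurate Contigs | % Accurate Contigs | N50 | Total Bases | Bases in Accurate Contigs | % Bases in Accurate Contigs |
|---|---|---|---|---|---|---|---|---|
| simLC-36m | C | 81451 | 74790 | 91.82 | 466 | 24892639 | 24163037 | 97.07 |
| simLC-36m | C-21 | 67448 | 66144 | 98.07 | 447 | 21509228 | 21300001 | 99.03 |
| simLC-36m | 21 | 125340 | 119834 | 95.61 | 191 | 19014129 | 18475501 | 97.17 |
| simLC-36m | 23 | 74630 | 73850 | 98.95 | 325 | 18475496 | 18350279 | 99.32 |
| simLC-36m | 25 | 69028 | 68731 | 99.57 | 279 | 16751244 | 16697244 | 99.68 |
| simLC-36m | 27 | 68245 | 68010 | 99.66 | 206 | 14123037 | 14087137 | 99.75 |
| simLC-36m | 29 | 52885 | 52765 | 99.77 | 302 | 10562147 | 10545749 | 99.84 |
| simLC-36m | 31 | 26382 | 26339 | 99.84 | 2363 | 7276731 | 7272359 | 99.94 |
| simLC-36m | 33 | 27332 | 27306 | 99.90 | 340 | 6214325 | 6211238 | 99.95 |
| simMC-36m | C | 119667 | 112827 | 94.28 | 493 | 34920211 | 34050685 | 97.51 |
| simMC-36m | C-21 | 100852 | 98658 | 97.82 | 566 | 31992027 | 31595445 | 98.76 |
| simMC-36m | 21 | 183383 | 178586 | 97.38 | 161 | 24663986 | 24173013 | 98.01 |
| simMC-36m | 23 | 106981 | 106133 | 99.21 | 324 | 24661676 | 24510881 | 99.39 |
| simMC-36m | 25 | 88466 | 88122 | 99.61 | 419 | 23280192 | 23216974 | 99.73 |
| simMC-36m | 27 | 78074 | 77825 | 99.68 | 569 | 21119299 | 21075047 | 99.79 |
| simMC-36m | 29 | 75800 | 75571 | 99.70 | 400 | 18839097 | 18777496 | 99.67 |
| simMC-36m | 31 | 114336 | 113948 | 99.66 | 168 | 17128489 | 17050171 | 99.54 |
| simMC-36m | 33 | 156709 | 156272 | 99.72 | 78 | 12688930 | 12646726 | 99.67 |
| simHC-36m | C | 73480 | 55508 | 75.54 | 138 | 10007373 | 7649152 | 76.44 |
| simHC-36m | C-21 | 39196 | 35369 | 90.24 | 131 | 5366108 | 4823423 | 89.89 |
| simHC-36m | 21 | 51371 | 36693 | 71.43 | 142 | 6923506 | 5037993 | 72.77 |
| simHC-36m | 23 | 28614 | 25707 | 89.84 | 137 | 4132863 | 3703686 | 89.62 |
| simHC-36m | 25 | 17418 | 16557 | 95.06 | 122 | 2289332 | 2179104 | 95.19 |
| simHC-36m | 27 | 9822 | 9524 | 96.97 | 109 | 1184664 | 1149541 | 97.04 |
| simHC-36m | 29 | 5309 | 5211 | 98.15 | 102 | 603680 | 593152 | 98.26 |
| simHC-36m | 31 | 3047 | 3005 | 98.62 | 93 | 315736 | 311501 | 98.66 |
| simHC-36m | 33 | 1895 | 1885 | 99.47 | 77 | 162625 | 161704 | 99.43 |
| EcoliStrains-10m | C | 25742 | 25359 | 98.51 | 1223 | 9985001 | 9913743 | 99.29 |
| EcoliStrains-10m | 21 | 24883 | 24709 | 99.30 | 544 | 6660066 | 6627437 | 99.51 |
| EcoliStrains-10m | 23 | 20550 | 20459 | 99.56 | 847 | 6560491 | 6545844 | 99.78 |
| EcoliStrains-10m | 25 | 19570 | 19506 | 99.67 | 933 | 6370414 | 6356780 | 99.79 |
| EcoliStrains-10m | 27 | 17474 | 17422 | 99.70 | 1195 | 5995915 | 5986494 | 99.84 |
| EcoliStrains-10m | 29 | 17338 | 17278 | 99.65 | 925 | 5560578 | 5550393 | 99.82 |
| EcoliStrains-10m | 31 | 25468 | 25436 | 99.87 | 317 | 5237879 | 5233758 | 99.92 |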

simLC-36m, simMC-36m and simHC-36m are the results for the low-, medium- and high-complexity datasets with 36 million reads, respectively. EcoliStrains-10m gives the results for the co-assembly strain dataset with 10 million reads. C shows the clustering results after pooling the contigs obtained by running ABySS with k ranging from 21 to 33. C-21 shows the clustering results after excluding the contigs obtained by running ABySS with k = 21. Rows labelled with a single k value give the contigs from the individual ABySS run at that k.
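The two percentage columns appear to be plain ratios of counts in the same row (accurate contigs over total contigs, and bases in accurate contigs over total bases), rounded to two decimals. The Python sketch below recomputes them for a few rows copied from the table as a consistency check. It also includes a helper for the conventional N50 statistic (the largest contig length L such that contigs of length at least L together hold at least half of all assembled bases). The table does not say how its N50 values were computed, so `n50` is an assumed standard definition, and `pct`, `ROWS`, `mismatches` and `n50` are hypothetical names introduced only for illustration.

```python
# Consistency check for the derived columns of Table 1 (illustrative sketch).
# Assumption: both percentage columns are 100 * part / whole, rounded to 2 decimals.


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# A few rows copied from the table:
# (dataset, contig set, total contigs, accurate contigs, printed % accurate contigs,
#  total bases, bases in accurate contigs, printed % bases in accurate contigs)
ROWS = [
    ("simLC-36m", "C", 81451, 74790, 91.82, 24892639, 24163037, 97.07),
    ("simLC-36m", "C-21", 67448, 66144, 98.07, 21509228, 21300001, 99.03),
    ("simMC-36m", "C", 119667, 112827, 94.28, 34920211, 34050685, 97.51),
    ("simHC-36m", "C", 73480, 55508, 75.54, 10007373, 7649152, 76.44),
    ("simHC-36m", "C-21", 39196, 35369, 90.24, 5366108, 4823423, 89.89),
    ("EcoliStrains-10m", "C", 25742, 25359, 98.51, 9985001, 9913743, 99.29),
]


def mismatches(rows):
    """Return (dataset, contig set) pairs whose printed percentages differ
    from the recomputed ones; an empty list means every row is consistent."""
    bad = []
    for dataset, label, contigs, acc, acc_pct, bases, acc_bases, bases_pct in rows:
        if pct(acc, contigs) != acc_pct or pct(acc_bases, bases) != bases_pct:
            bad.append((dataset, label))
    return bad


def n50(lengths):
    """Conventional N50 (assumed definition): walk contigs from longest to
    shortest and return the length at which the running total first reaches
    half of all bases. Returns 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    print("rows with inconsistent percentages:", mismatches(ROWS))
    print("N50 of toy contigs:", n50([100, 200, 300, 400, 500]))
```

For the toy contig set, the two longest contigs (500 + 400 = 900 bases) already exceed half of the 1500 total, so the sketch reports an N50 of 400.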

Charuvaka and Rangwala, BMC Genomics 2011, 12(Suppl 2):S8. doi:10.1186/1471-2164-12-S2-S8

Open Data