Table 2

Crumble results for different-sized simulated datasets and underlying alignment methods.

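| Method | Core size | 60 kb time (min) | 60 kb agreement | 150 kb time (min) | 150 kb agreement | 500 kb time (min) | 500 kb agreement | 1000 kb time (min) | 1000 kb agreement |
|---|---|---|---|---|---|---|---|---|---|
| Pecan [1] | - | 3.43 | 0.896 | 10.6 | 0.905 | 46.9 | 0.906 | 100 | 0.906 |
| Crumble w/Pecan | 60% | 3.29 | 0.894 | 7.18 | 0.904 | 21.5 | 0.905 | 51.9 | 0.906 |
| Crumble w/Pecan | 30% | 2.56 | 0.889 | 4.66 | 0.903 | 11.9 | 0.905 | 23.5 | 0.905 |
| Crumble w/Pecan | 15% | 2.39 | 0.859 | 3.77 | 0.893 | 8.29 | 0.903 | 13.9 | 0.905 |
| FSA [2] | - | 37.4 | 0.886 | [a] | [a] | [a] | [a] | [a] | [a] |
| Crumble w/FSA | 60% | 25.8 | 0.881 | 69.8 | 0.903 | [a] | [a] | [a] | [a] |
| Crumble w/FSA | 30% | 21.0 | 0.873 | 39.2 | 0.898 | [a] | [a] | [a] | [a] |
| Crumble w/FSA | 15% | 17.7 | 0.849 | 25.5 | 0.893 | 104. | 0.811 | [a] | [a] |
| MUSCLE [3] | - | [a] | [a] | [a] | [a] | [a] | [a] | [a] | [a] |
| Crumble w/MUSCLE | 60% | [a] | [a] | [a] | [a] | [a] | [a] | [a] | [a] |
| Crumble w/MUSCLE | 30% | 128 | 0.707 | [a] | [a] | [a] | [a] | [a] | [a] |
| Crumble w/MUSCLE | 15% | 63.1 | 0.679 | 251. | 0.705 | [a] | [a] | [a] | [a] |
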

1 Pecan was run with default parameters.

2 FSA was run with the --exonerate, --anchored, and --softmasked flags.

3 MUSCLE was run with default parameters.

a Most of these problems could not be aligned because the alignment ran out of memory.

Run time and average agreement score of Crumble alignments for datasets of different sizes. Four sets of simulated alignment problems were generated using root sequences of 60, 150, 500, and 1000 kilobases. The neutral evolution of each root sequence was simulated over a nine-species tree. Fifty problems were generated per root size, for a total of two hundred test alignment problems. The agreement and run time (in minutes) reported for each problem size are averages over the fifty simulated alignments. Crumble was used to break the problems down into sub-problems that were 60%, 30%, and 15% of the length of the original problem: the approximate core size was set to each of these fractions, and the block was allowed to be at most 4 kb larger, as measured in any of the sequences. Pecan, FSA, and MUSCLE were used as the underlying alignment methods, and PrePecan was used to generate the constraints. FSA could not be applied directly (without Crumble) to problems of 150 kb or larger because it required more than the 4 GB of memory available per cluster node. With Crumble, FSA could be run on problems as large as half a megabase. MUSCLE had more severe memory issues, but with Crumble it could be used on problems as large as 150 kb. For Pecan, Crumble achieved a more than sevenfold speedup with almost no loss of accuracy on the largest problem size (100 versus 13.9 minutes at a 15% core size, with agreement of 0.906 versus 0.905).
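
The speedup figures can be rechecked from the Pecan rows of the table. The following is a minimal sketch in Python, not part of Crumble; the names `PECAN_DIRECT`, `CRUMBLE_PECAN`, `speedups`, and `speedup_table` are hypothetical and chosen only for illustration. The numbers are the mean run times in minutes as reported in Table 2.

```python
# Mean run times in minutes, copied from the Pecan rows of Table 2.
# All names below are illustrative and do not come from Crumble itself.
SIZES_KB = [60, 150, 500, 1000]
PECAN_DIRECT = [3.43, 10.6, 46.9, 100.0]
CRUMBLE_PECAN = {
    "60%": [3.29, 7.18, 21.5, 51.9],
    "30%": [2.56, 4.66, 11.9, 23.5],
    "15%": [2.39, 3.77, 8.29, 13.9],
}


def speedups(direct, crumble):
    """Return the direct/Crumble run-time ratio for each problem size."""
    return [d / c for d, c in zip(direct, crumble)]


def speedup_table():
    """Map each core size to its list of speedups, ordered as SIZES_KB."""
    return {
        core: speedups(PECAN_DIRECT, times)
        for core, times in CRUMBLE_PECAN.items()
    }


if __name__ == "__main__":
    for core, ratios in speedup_table().items():
        cells = "  ".join(
            f"{kb} kb: {ratio:.2f}x" for kb, ratio in zip(SIZES_KB, ratios)
        )
        print(f"core {core}: {cells}")
```

For the 1000 kb problems with a 15% core size, the ratio is 100 / 13.9, or roughly 7.2, which matches the sevenfold speedup stated in the caption. The ratios for the smaller problem sizes are lower.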

Roskin et al. BMC Bioinformatics 2011 12:144   doi:10.1186/1471-2105-12-144
