Table 3

Prune results for different sized datasets and underlying alignment methods.

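| Method | Max. sub-tree size (% of nodes) | 50 leaves: Time (min) | 50 leaves: Agreement | 100 leaves: Time (min) | 100 leaves: Agreement | 500 leaves: Time (min) | 500 leaves: Agreement | 1000 leaves: Time (min) | 1000 leaves: Agreement |
|---|---|---|---|---|---|---|---|---|---|
| Pecan (1) | | 21.9 | 0.914 | 297. | 0.879 | (a) | (a) | (a) | (a) |
| Prune w/Pecan | 60% | 7.26 | 0.880 | 39.2 | 0.862 | (a) | (a) | (a) | (a) |
| | 30% | 3.13 | 0.909 | 19.6 | 0.839 | (a) | (a) | (a) | (a) |
| | 15% | 7.26 | 0.912 | 13.3 | 0.878 | 125. | 0.844 | (a) | (a) |
| | 7% | 4.24 | 0.909 | 13.5 | 0.849 | 29.1 | 0.907 | 122. | 0.877 |
| FSA (2) | | 63.1 | 0.933 | 266. | 0.856 | (a) | (a) | (a) | (a) |
| Prune w/FSA | 60% | 33.8 | 0.912 | 78.9 | 0.838 | 589. | 0.871 | (a) | (a) |
| | 30% | 10.5 | 0.893 | 23.8 | 0.838 | 142. | 0.879 | (a) | (a) |
| | 15% | 4.25 | 0.885 | 17.1 | 0.857 | 40.8 | 0.877 | 150. | 0.861 |
| | 7% | 3.00 | 0.866 | 4.23 | 0.842 | 12.7 | 0.903 | 34.8 | 0.887 |
| MUSCLE (3) | | 55.6 | 0.905 | 138. | 0.799 | (b) | (b) | (b) | (b) |
| Prune w/MUSCLE | 60% | 40.7 | 0.899 | 77.9 | 0.777 | 886. | 0.862 | (b) | (b) |
| | 30% | 24.7 | 0.896 | 42.8 | 0.777 | 368. | 0.883 | (b) | (b) |
| | 15% | 15.1 | 0.905 | 29.1 | 0.828 | 185. | 0.899 | 440. | 0.900 |
| | 7% | 24.7 | 0.905 | 18.8 | 0.841 | 114. | 0.924 | 228 | 0.928 |
| MAFFT (4) | | 3.17 | 0.897 | 5.39 | 0.806 | 20.1 | 0.886 | 25.2 | 0.912 |
| SATé (5) | | 101. | 0.915 | 301. | 0.840 | (b) | (b) | (b) | (b) |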

1 Pecan was run with default parameters.

2 FSA was run with the --exonerate, --anchored, --softmasked, and --fast flags.

3 MUSCLE was run with default parameters.

4 MAFFT was run with the --treein option.

5 SATé was run with the -t option but limited to two iterations. We found that more iterations did almost nothing for accuracy.

a Most of these problems could not be aligned because the aligner ran out of memory.

b The majority of these problems took longer than 3 days and were aborted.

Run-time and average agreement score of Prune alignments for datasets of different sizes. Several sets of simulated alignment problems were generated, each from a root sequence of 10 kilobases. The neutral evolution of each root sequence was simulated over species trees with 50, 100, 500, and 1000 leaves. Fifty problems were generated per tree size, for a total of two hundred test alignment problems. The agreement and run-time (in minutes) reported for each problem size are averages over the fifty simulated alignments. Each underlying alignment method (Pecan, FSA, MUSCLE) was first run on the dataset directly. Prune was then used, with the same three methods as the underlying aligner, to break the problems down into sub-trees that contained at most 60%, 30%, 15%, and 7% of the nodes in the entire tree. The largest number of stages was six, but most problems had no more than three. For comparison, we also aligned the problems with MAFFT and SATé. To ensure a fair comparison, the true tree topology was passed to SATé (using the -t option) and to MAFFT (using the poorly documented --treein option). Some alignment algorithms could not be applied to the large problems because of very long run-times or memory exhaustion. With Prune, Pecan, FSA, and MUSCLE could solve alignment problems much deeper than they could solve on their own. Prune achieved a very large speedup (for example, from 297 to 13.5 minutes for Pecan on 100 leaves at the 7% limit) with little loss of accuracy, and sometimes with an increase in accuracy.
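The speedup and accuracy trade-off can be read off the table with simple arithmetic. The following is a minimal Python sketch, not part of Prune itself: it copies the 50- and 100-leaf entries for the unaided aligners and for Prune at the 7% limit, then computes the ratio of run-times and the change in agreement. The dictionary names and the `compare` helper are illustrative assumptions.

```python
# Values copied from Table 3: (time in minutes, agreement score).
# BASELINE is each aligner run on its own; PRUNE_7 is Prune with that
# aligner and sub-trees limited to 7% of the nodes.
BASELINE = {
    ("Pecan", 50): (21.9, 0.914),
    ("Pecan", 100): (297.0, 0.879),
    ("FSA", 50): (63.1, 0.933),
    ("FSA", 100): (266.0, 0.856),
    ("MUSCLE", 50): (55.6, 0.905),
    ("MUSCLE", 100): (138.0, 0.799),
}
PRUNE_7 = {
    ("Pecan", 50): (4.24, 0.909),
    ("Pecan", 100): (13.5, 0.849),
    ("FSA", 50): (3.00, 0.866),
    ("FSA", 100): (4.23, 0.842),
    ("MUSCLE", 50): (24.7, 0.905),
    ("MUSCLE", 100): (18.8, 0.841),
}


def compare(baseline, pruned):
    """Return {(method, leaves): (speedup, agreement change)}.

    speedup is baseline time divided by Prune time; the agreement
    change is Prune agreement minus baseline agreement.
    """
    result = {}
    for key, (base_time, base_agree) in baseline.items():
        prune_time, prune_agree = pruned[key]
        result[key] = (base_time / prune_time, prune_agree - base_agree)
    return result


if __name__ == "__main__":
    for (method, leaves), (speedup, delta) in compare(BASELINE, PRUNE_7).items():
        print(f"{method:7s} {leaves:4d} leaves: "
              f"{speedup:5.1f}x faster, agreement change {delta:+.3f}")
```

Only the 50- and 100-leaf columns are compared because the unaided aligners have no entries for 500 and 1000 leaves. MUSCLE on 100 leaves is a case where agreement rises under Prune, from 0.799 to 0.841.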

Roskin et al. BMC Bioinformatics 2011 12:144   doi:10.1186/1471-2105-12-144
