Table 1

Results on synthetic yeast-like data. Performance of various multiple sequence alignment programs on synthetic data generated with dinucleotide correlations that mimic actual yeast genomic data. q is the "proximity" of the species to their common ancestor, i.e. the probability that a given base is conserved from the common ancestor. This means that q^2 is the conservation rate of bases in any pair of descendants. N+ is the number of bases correctly aligned and N- is the number of bases incorrectly aligned. Each data set consisted of 10 sets, each containing Ns sequences of 1000 bases, so the total number of bases is 10000 Ns. Sen is the sensitivity, i.e. the ratio of the number of bases correctly aligned to the total number of bases, N+/(10000 Ns). Er is the error rate, N-/(N+ + N-). sigma+ indicates sigma with a background model incorporating dinucleotide correlations; sigma- indicates sigma with an uncorrelated background model.

             | q = 0.35                 | q = 0.45                 | q = 0.55                 | q = 0.65

No embedded WM's

Ns  Prog     |    N+     N-   Sen    Er |    N+     N-   Sen    Er |    N+     N-   Sen    Er |    N+     N-   Sen    Er

3   sigma+   |     0      0  0.00   N/A |   166      0  0.01  0.00 |  3210      0  0.11  0.00 | 28755      0  0.96  0.00
    sigma-   |     0      0  0.00   N/A |   761      0  0.03  0.00 | 10737      0  0.36  0.00 | 29606      0  0.99  0.00
    dialign  |     0      0  0.00   N/A |   266      0  0.03  0.00 |   854      0  0.04  0.00 |  9115      0  0.30  0.00
    alignm   |   320    136  0.01  0.30 |  6009    846  0.20  0.12 | 22710   1083  0.76  0.05 | 28258    434  0.94  0.02
    clustalw | 15244  14756  0.51  0.49 | 25659   4341  0.86  0.14 | 28959   1041  0.97  0.03 | 29779    221  0.99  0.01
    mlagan   | 13691  16309  0.46  0.54 | 24766   5234  0.83  0.17 | 29596    404  0.99  0.01 | 30000      0  1.00  0.00
    tcoffee  |  3781  26219  0.13  0.87 | 15253  14747  0.51  0.49 | 26542   3458  0.88  0.12 | 29759    241  0.99  0.01

6   sigma+   |    74      0  0.00  0.00 |   334      0  0.01  0.00 | 27765     24  0.46  0.00 | 57820      0  0.96  0.00
    sigma-   |    74    112  0.00  0.60 |   590      0  0.01  0.00 | 42882     40  0.71  0.00 | 58948      0  0.98  0.00
    dialign  |    66    158  0.00  0.71 |   604    176  0.01  0.23 |  7364    114  0.12  0.02 | 30871      0  0.51  0.00
    alignm   |     0      0  0.00   N/A |  7192    123  0.12  0.02 | 53534    978  0.89  0.02 | 59326    222  0.99  0.00
    clustalw | 29878  30122  0.50  0.50 | 52295   7705  0.87  0.13 | 57712   2288  0.96  0.04 | 59580    420  0.99  0.01
    mlagan   | 17411  42589  0.29  0.71 | 48105  11895  0.80  0.20 | 58736   1264  0.98  0.02 | 60000      0  1.00  0.00
    tcoffee  | 13215  46785  0.22  0.78 | 41965  18035  0.70  0.30 | 58084   1916  0.97  0.03 | 59925     75  1.00  0.00

9   sigma+   |     0      0  0.00   N/A |   597      0  0.01  0.00 | 41873     40  0.47  0.00 | 87769      0  0.98  0.00
    sigma-   |   160     64  0.00  0.29 |  2577    162  0.03  0.06 | 63579    228  0.71  0.00 | 88764      0  0.99  0.00
    dialign  |    78    264  0.00  0.77 |  1045    228  0.01  0.18 | 12162    176  0.14  0.01 | 54753      0  0.61  0.00
    alignm   |    44    159  0.00  0.78 | 29033    460  0.32  0.02 | 83545    960  0.93  0.01 | 89261    330  0.99  0.00
    clustalw | 52761  37239  0.59  0.41 | 79733  10267  0.89  0.11 | 86758   3242  0.96  0.04 | 89429    571  0.99  0.01
    mlagan   | 16445  73555  0.18  0.82 | 68421  21579  0.76  0.24 | 88828   1172  0.99  0.01 | 90000      0  1.00  0.00
    tcoffee  | 27005  62995  0.30  0.70 | 67009  22991  0.74  0.26 | 88534   1466  0.98  0.02 | 89955     45  1.00  0.00
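
The Sen and Er columns follow directly from the definitions in the caption, and the arithmetic can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the paper: the function names `sensitivity`, `error_rate` and `pair_conservation` and the two constants are assumptions introduced here. The table shows N/A for Er where no bases were aligned, and the sketch returns `None` in that case.

```python
from typing import Optional

# Values taken from the caption; the constant names are illustrative.
SETS_PER_DATASET = 10       # each data set consists of 10 sets
BASES_PER_SEQUENCE = 1000   # each sequence is 1000 bases long


def sensitivity(n_correct: int, n_seqs: int) -> float:
    """Sen = N+ / (10000 * Ns)."""
    total_bases = SETS_PER_DATASET * BASES_PER_SEQUENCE * n_seqs
    return n_correct / total_bases


def error_rate(n_correct: int, n_wrong: int) -> Optional[float]:
    """Er = N- / (N+ + N-), or None when nothing was aligned."""
    aligned = n_correct + n_wrong
    if aligned == 0:
        return None  # corresponds to an N/A entry in the table
    return n_wrong / aligned


def pair_conservation(q: float) -> float:
    """Conservation rate q^2 for a pair of descendants."""
    return q * q


# Check against the clustalw row for Ns = 3, q = 0.35 (N+ = 15244, N- = 14756).
print(round(sensitivity(15244, 3), 2))      # 0.51
print(round(error_rate(15244, 14756), 2))   # 0.49
```

Both printed values match the Sen and Er entries for that row. Entries with N+ = N- = 0, such as sigma+ at Ns = 3 and q = 0.35, give `None`.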


Siddharthan BMC Bioinformatics 2006 7:143   doi:10.1186/1471-2105-7-143
