Table 2

Performance of MS Cleaner version 2.0 on a large test set.

A1                 A2    A3  A4     A5    A6  A7     A8     A9   A10  A11    A12    A13   A14  A15    A16
alphaAmyl_col1  10108   633  24  11.30   667  24  31.65  60.07   667   24  15.13  51.09   667   24  18.07
alphaAmyl_col2  10184   698  35   9.82   780  35  34.20  50.22   780   35  19.05  20.25   780   35  22.76
AmylGlu_col1    10030   736  28  13.26   761  28  28.40  79.24   761   28   8.66  73.58   761   28  10.63
AmylGlu_col2     9870   801  36  13.31   860  37  29.50  72.62   860   37  11.70  63.95   860   37  14.29
apo_col1        10032  2606  63  11.72  2814  63  30.76  63.10  2814   63  13.93  54.49  2814   63  16.78
apo_col2        10090  2571  60  12.13  2761  60  32.95  53.12  2761   60  17.53  44.32  2761   60  21.03
betaGal_col1    10324  1459  56   7.17  1567  57  34.98  48.06  1567   57  22.05  40.53  1567   57  24.60
betaGal_col2    10368  1309  51   8.12  1508  56  36.71  42.90  1454   55  24.76  33.10  1454   55  28.61
CarAnly_col1     9946   586  49  12.35   616  49  26.35  90.31   573   49   3.65  84.94   607   49   5.48
CarAnly_col2     9534   582  52  13.40   616  52  26.27  86.07   616   52   5.08  78.44   616   52   7.66
Cat_col1        10098  1798  61  11.13  1886  61  30.88  67.26  1879   61  13.13  57.89  1879   61  16.50
Cat_col2        10034  1567  65  11.78  1693  65  31.90  59.50  1693   65  15.91  48.55  1693   65  19.56
phosB_col1      10118  2780  59  10.30  3079  61  35.13  63.49  3014   60  14.26  54.46  3047   61  17.25
phosB_col2      10096  2655  61  10.52  3116  65  32.58  53.96  3084   65  17.58  44.31  3116   65  21.16
GluDey_col1     10006   892  36  11.29   986  36  27.30  79.55   986   36   7.75  73.42   986   36   9.71
GluDey_col2      9886   850  34  11.81   962  34  28.73  72.51   962   34  10.13  62.25   962   34  13.51
GluTra_col1     10022   351  25  10.36   389  25  28.61  71.64   348   25  10.25  62.78   389   25  14.30
GluTra_col2     10156   341  33   9.18   384  33  31.31  61.15   384   33  14.25  49.59   384   33  28.11
Immo_col1       10330   506  35   9.27   565  35  36.20  42.30   565   35  24.95  34.44   565   35  27.66
Immo_col2       10334   356  66   8.61   500  66  38.05  37.06   500   66  27.31  28.47   500   66  30.31
LacDe_col1      10286  1549  58  10.36  1694  58  35.36  53.20  1694   58  20.03  44.86  1694   58  23.15
LacDe_col2      10250  1346  54   9.07  1483  54  36.48  40.16  1483   54  25.60  31.67  1483   54  28.31
LactoPee_col1   10242  1613  45  13.16  1764  45  34.78  62.12  1756   45  15.91  52.37  1764   45  19.53
LactoPee_col2   10402  1679  43   9.09  1890  44  35.18  51.70  1890   44  20.31  41.76  1890   44  23.85
Myo_col1         9958   561  66  11.67   594  66  27.26  85.42   594   66   5.46  79.25   594   66   7.45
Myo_col2         9744   530  66  12.15   584  66  28.01  80.83   584   66   6.95  70.92   584   66  10.35


A1 name of test set (.mgf file; see Methods),
A2 total number of spectra (.dta files),
A3 MASCOT score of the top protein hit with the original .mgf file (without application of MS Cleaner),
A4 sequence coverage (in %) without application of MS Cleaner,
A5 fraction of non-interpretable ("bad") spectra found with sequence ladder length n = 4 among all peaks (intensity threshold s = 100%),
A6 MASCOT score of the top protein hit for this search,
A7 sequence coverage (in % of the whole protein length) for this search,
A8 MS Cleaner processing time (in min) on a PC with a single Pentium IV processor (to obtain exact timing values, we did not use the cluster version and switched off the "soft frequency recognition" option),
A9 fraction of non-interpretable ("bad") spectra found with sequence ladder length n = 4 among the s = 20% most intense peaks,
A10 MASCOT score of the top protein hit for this search,
A11 sequence coverage (in % of the whole protein length) for this search,
A12 MS Cleaner processing time (in min),
A13 fraction of non-interpretable ("bad") spectra found with sequence ladder length n = 4 among the s = 25% most intense peaks (in % of A2, i.e., of all spectra),
A14 MASCOT score of the top protein hit for this search,
A15 sequence coverage (in % of the whole protein length) for this MASCOT search,
A16 MS Cleaner processing time (in min) on the same machine as described in the legend of Table 1.
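Because the legend defines A13 as a percentage of A2, absolute counts and relative score changes can be derived directly from the table. The minimal Python sketch below does this arithmetic for two rows copied from the table (alphaAmyl_col1 and Immo_col2); the `Row` class and the helper names are hypothetical illustrations and are not part of MS Cleaner.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One Table 2 row, reduced to the columns used below (hypothetical helper)."""
    name: str            # A1, test set name
    spectra: int         # A2, total number of spectra
    score_orig: int      # A3, MASCOT score without MS Cleaner
    score_s25: int       # A14, MASCOT score after cleaning at s = 25%
    bad_pct_s25: float   # A13, "bad" spectra in % of A2 at s = 25%


def flagged_spectra(row: Row) -> int:
    """Approximate number of spectra flagged as bad: A13 % of A2, rounded."""
    return round(row.spectra * row.bad_pct_s25 / 100)


def score_change_pct(row: Row) -> float:
    """Relative change of the top-hit MASCOT score, (A14 - A3) / A3, in %."""
    return 100.0 * (row.score_s25 - row.score_orig) / row.score_orig


# Values copied from Table 2 (A1, A2, A3, A14, A13).
rows = [
    Row("alphaAmyl_col1", 10108, 633, 667, 51.09),
    Row("Immo_col2", 10334, 356, 500, 28.47),
]

for r in rows:
    print(f"{r.name}: ~{flagged_spectra(r)} spectra flagged, "
          f"score change {score_change_pct(r):+.1f}%")
```

The same two helpers can be pointed at the s = 20% columns (A9, A10) or the s = 100% columns (A5, A6) by swapping the fields, keeping in mind that only A13 is explicitly stated to be in % of A2.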

The sequence ladder criterion (minimal ladder length 4 with varying peak intensity thresholds) and the noise suppression algorithms of MS Cleaner 2.0 were applied to a large set of tandem MS results. For each test protein, two independent sample preparations and dataset recordings were carried out (marked with the suffixes _col1 and _col2 in the dataset name): α-amylase, amyloglucosidase, apo-transferrin, β-galactosidase, carbonic anhydrase, catalase, phosphorylase B, glutamic dehydrogenase, glutathione transferase, immunoglobulin γ, lactic dehydrogenase, lactoperoxidase, and myoglobin. For these datasets, the MASCOT interpretation was carried out on a cluster in parallel with other jobs; therefore, no MASCOT computation time is provided.
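The details of the sequence ladder criterion are not given in this table. As a rough illustration only, the Python sketch below assumes that a spectrum counts as interpretable when, among the fraction s of its most intense peaks, some chain of peaks is separated by n consecutive amino-acid residue masses. It further assumes singly charged fragments, counts n as residue steps, and uses a hypothetical 0.5 Da tolerance; the function names and these choices are assumptions, not MS Cleaner's actual implementation.

```python
import math

# Monoisotopic amino-acid residue masses in Da (Leu and Ile share one value).
RESIDUE_MASSES = [
    57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768,
    103.00919, 113.08406, 114.04293, 115.02694, 128.05858, 128.09496,
    129.04259, 131.04049, 137.05891, 147.06841, 156.10111, 163.06333,
    186.07931,
]


def top_fraction(peaks, s):
    """Return the sorted m/z values of the s (0 < s <= 1) most intense peaks."""
    k = max(1, math.ceil(len(peaks) * s))
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    return sorted(mz for mz, _ in kept)


def longest_ladder(mzs, tol):
    """Length (in residue steps) of the longest ladder among sorted m/z values."""
    best = [0] * len(mzs)  # best[j] = longest ladder ending at peak j
    for j in range(len(mzs)):
        for i in range(j):
            diff = mzs[j] - mzs[i]
            if any(abs(diff - m) <= tol for m in RESIDUE_MASSES):
                best[j] = max(best[j], best[i] + 1)
    return max(best, default=0)


def is_interpretable(peaks, n=4, s=1.0, tol=0.5):
    """Hypothetical test: a spectrum is kept if a ladder of >= n steps exists."""
    return longest_ladder(top_fraction(peaks, s), tol) >= n


# Synthetic spectrum: a 4-step ladder (G, A, S, P) plus three weak noise peaks.
ladder = [100.0]
for m in (57.02146, 71.03711, 87.03203, 97.05276):
    ladder.append(ladder[-1] + m)
peaks = list(zip(ladder, [1000.0, 900.0, 800.0, 700.0, 600.0]))
peaks += [(250.3, 50.0), (333.7, 40.0), (520.9, 30.0)]

print(is_interpretable(peaks, n=4, s=1.0))  # all peaks considered
print(is_interpretable(peaks, n=4, s=0.5))  # weakest ladder peak is dropped
```

In this toy example, lowering s removes the weakest ladder peak and turns an interpretable spectrum into a "bad" one. This mirrors the trend in the table, where the fractions at s = 20% (A9) and s = 25% (A13) are higher than at s = 100% (A5).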

Mujezinovic et al. BMC Genomics 2010 11(Suppl 1):S13   doi:10.1186/1471-2164-11-S1-S13
