
Tandem mass spectrometry data quality assessment by self-convolution

Abstract

Background

Many algorithms have been developed for deciphering tandem mass spectrometry (MS) data sets. They can be broadly grouped into two classes. The first searches a database of theoretical mass spectra, while the second relies on de novo sequencing from raw mass spectrometry data. The quality of the mass spectra significantly affects protein identification in both cases. This prompted the authors to explore ways to measure the quality of MS data sets before subjecting them to protein identification algorithms, thus allowing for more meaningful searches and increased confidence in the proteins identified.

Results

The proposed method measures the quality of MS data sets based on the symmetric property of the b- and y-ion peaks present in an MS spectrum. Self-convolution of the MS data with its time-reversed copy was employed. Owing to the symmetric nature of the b-ion and y-ion peaks, the self-convolution of a good spectrum produces its highest intensity peak at the mid-point. To reduce processing time, self-convolution was achieved using the Fast Fourier Transform and its inverse transform, followed by the removal of the "DC" (Direct Current) component and the normalisation of the data set. The quality score was defined as the ratio of the intensity at the mid-point to the average intensity of the highest remaining peaks of the convolution result. The method was validated using both theoretical mass spectra, with various permutations, and several real MS data sets. The results were encouraging, revealing a high percentage of positive prediction rates for spectra with good quality scores.

Conclusion

We have demonstrated in this work a method for determining the quality of tandem MS data sets. By determining the quality of tandem MS data before subjecting them to protein identification algorithms, spurious protein predictions due to poor tandem MS data can be avoided, giving scientists greater confidence in the predicted results. We conclude that the algorithm performs well and could potentially be used as a pre-processing step for all mass spectrometry based protein identification tools.

Background

Mass spectrometry

Mass spectrometry (MS) is a common analytical technique used to identify unknown compounds, quantify known materials, and elucidate the molecular structure and chemical composition of organic and inorganic substances. A mass spectrometer is an instrument used to measure the mass-to-charge ratio of individual molecules that have been converted into electrically charged molecules, or ions [1]. These ions are filtered and ordered from lower to higher mass-to-charge ratio (m/z) before passing through an ion detector in the instrument [2]. In proteomic analysis, matrix-assisted laser desorption ionisation (MALDI) and electrospray ionisation (ESI) are the two ionisation techniques generally used. The use of mass spectrometry is growing rapidly in biomarker discovery and clinical proteomics, where hundreds of proteins can be sequenced quickly. As a consequence, large amounts of proteomics data are produced and made available to the public [3–5].

Although the generation of raw MS spectra has become easier, the analysis and identification of the data still pose many challenges. Many protein identification tools have been developed, such as PEAKS [6], MASCOT [7, 8], Phenyx [9], SEQUEST [10] and OMSSA [11]. High-throughput proteomics involves the analysis of hundreds of thousands of peptide spectra derived from biological samples. Four general types of algorithms can identify these spectra:

1. De novo calling of the sequence directly from the spectrum [6, 12, 13].

2. Use of unambiguous "peptide sequence tags" derived from spectra that are used to search known sequences [14–16].

3. Cross-correlation methods that correlate experimental spectra with theoretical spectra [17, 18].

4. Probability-based matching that calculates a score based on the statistical significance of a match between an observed peptide fragment and those calculated from a sequence search library [7, 19–22].

Cross-correlation methods and probability-based matching are two well-received methods for protein identification. In these methods, a theoretical mass spectra database is first generated from known protein sequences. To search this database with experimental spectra, the correlation of the experimental and theoretical spectra is calculated. Based on the statistical properties of the protein database and the correlation values (actual implementation is more complex), a score is given for the matched spectra.

Most of these tools have attained a certain degree of success thus far; nevertheless, reliable protein identification using these methods is still a time-consuming and program-dependent task. A considerable frequency of false positive protein identifications has been reported in independent studies [23, 24]. Because the quality of mass spectra is crucial in protein identification, several attempts to address the issue have been made using information obtained from mass spectra generated by fragmented peptides [25–28]. In particular, Purvine et al. [27] used a prefilter with three features for tandem MS spectra classification: one feature addressed the uncertainty in charge state assignments, the second was based on total signal intensity and the third on a signal-to-noise estimate. They obtained good results by adjusting these features. Although these approaches have been useful, we introduce an additional prefilter feature based on the symmetry property of the b- and y-ions, to complement and improve the pre-filter process.

Convolution

Convolution is a mathematical operation commonly used in digital signal processing (DSP). For discrete time series, the convolution is given as:

$$ h_i = \sum_{j=0}^{m} f_j \, g_{i-j} $$

where $f_j$ and $g_j$ are two time series data sets. Self-convolution refers to convolution applied to the same data series, where $g_{i-j}$ is the time-reversed copy of the data series $f_j$.
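As a minimal illustration of the formula above, the following Python sketch computes the full discrete convolution directly and applies it to a toy series. The function name `convolve` and the toy values are assumptions made for this example only; they are not taken from the authors' implementation.

```python
def convolve(f, g):
    """Full discrete convolution: h[i] = sum over j of f[j] * g[i - j]."""
    h = [0.0] * (len(f) + len(g) - 1)
    for j, fj in enumerate(f):
        for k, gk in enumerate(g):
            # f[j] * g[k] contributes to output index i = j + k, i.e. k = i - j.
            h[j + k] += fj * gk
    return h


# Toy series whose peaks are mirror images about its centre
# (positions 1 and 5, 2 and 4), loosely mimicking complementary ion pairs.
toy_series = [0, 1, 1, 0, 1, 1, 0]
self_conv = convolve(toy_series, toy_series)
mid = len(self_conv) // 2  # centre of the full convolution result
print(self_conv)
print("mid-point index:", mid, "value:", self_conv[mid])
```

Because every mirrored pair of peaks lands on the same output index, the largest value of this toy self-convolution sits at the centre of the result.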

Self-convolution has been used in many applications where symmetry is a key feature of the signal, such as those found in digital communication [29] and image processing [30]. We will show in this work that tandem mass spectra possess this property, inherited naturally from the fragmentation process, and hence that the same approach can be used to extract information from the spectra. The success of this method depends on the availability of the complementary b- and y-ions, which are the two most commonly found ion types in conventional tandem mass spectrometry.

Peptide fragmentation

Peptide fragmentation is a process in which peptide fragment ions are generated by dissociation in the ion trap of a mass spectrometer. In this process, breakage can occur at any bond in the peptide, but it commonly occurs at the peptide bond. When a peptide is fragmented at a single peptide bond between the carbonyl and the nitrogen, two fragments are formed. A fragment that retains the positive charge at the C-terminus of the peptide ion is called a y-ion. If the fragment retains the positive charge at the N-terminus, it is known as a b-ion. When a singly charged peptide is fragmented, the charge is retained at only one terminus, and only the fragment carrying the charge is detected, while the other is lost as a neutral fragment. Doubly charged peptides tend to produce two singly charged ions, though sometimes doubly charged ions can also be formed.

The types of fragment ions observed in a tandem MS spectrum depend on many factors, including the primary peptide sequence, the amount of internal energy and how the energy was introduced, and the charge state. The accepted nomenclature for fragment ions was first proposed by Roepstorff and Fohlman [31], and subsequently modified by Johnson et al. [32] and Biemann [33, 34]. Several dissociation methods are available, including the commonly used gas phase collision-induced dissociation (CID) [33], surface-induced dissociation [35], photodissociation [36], electron-capture dissociation [37], and electron transfer dissociation [38]. The b-ions and y-ions are usually formed when fragmentation occurs under low-energy conditions. Fig. 1 shows all possible breakage points along the peptide backbone.

Figure 1

Peptide fragmentation. This figure shows the various breakage points along the peptide backbone and the complementary ions formed towards the N-terminal and C-terminal ends.

Other ions such as a-ions and x-ions, which form one complementary pair, and c-ions and z-ions, which form another, are also produced. The a-ions and x-ions are formed when the peptide fragments between the alpha carbon (which carries the amino acid side chain) and the carbonyl carbon. The c-ions and z-ions are formed when the peptide fragments between the nitrogen and the alpha carbon. These ions are formed when fragmentation occurs under high-energy conditions, since more energy is required to break these bonds. Fig. 2 shows a typical tandem MS spectrum.

Figure 2

Tandem mass spectrum. This figure shows the possible fragmentation of the short peptide AVAGCAGAR and the corresponding plot of intensity versus m/z.

The development of a chemical theory of peptide fragmentation [39, 40] has enabled the de novo prediction of fragmentation spectra from peptide sequences. Using a kinetic model, Zhang made the first successful attempt at predicting the low-energy CID spectra of singly and doubly charged peptides [41]. Elias et al. [42] were the first to successfully use a set of well-annotated fragmentation spectra acquired from an electrospray ion-trap mass spectrometer to infer the probabilistic rules of fragmentation. More recently, Randy et al. used a machine-learning algorithm to predict various fragment-ion types of doubly and triply charged precursor ions by learning peptide fragmentation rules in the form of posterior probabilities [43]. Yu et al. proposed a novel method to automatically learn the factors influencing fragmentation from a training set of tandem MS spectra [44]. Despite the availability of these prediction models, it is unclear how they could be used for predicting fragment ions in different types of mass spectrometers.

Results

To validate the proposed method of tandem MS spectra assessment, we conducted a series of tests on theoretical MS spectra as well as experimental MS spectra. The results of the tests on theoretical MS spectra are tabulated in Table 1. We then used another 60 sets of experimental tandem MS spectra to test its effectiveness and robustness.

Table 1 Scoring of theoretical mass spectrum under different conditions

Quantitative measurement of theoretical tandem MS spectra

We first compute the quality score (QS) on theoretical MS spectra based on our derivation shown in Eq. 1. The peptide sequence [MTDQEAIQDLWQWR] was chosen arbitrarily to form the theoretical spectra for our work. The theoretical spectra are subjected to different degradation processes, including the introduction of white Gaussian noise, reduction in ion peak intensities, and removal of ion peaks, as described in the Methods section. The test results are tabulated in Table 1.

In the first test, we included all the theoretical b- and y-ion peaks in the spectrum, with white Gaussian noise (noise with a normal distribution) of different amplitudes added. The scores are captured in Section A of Table 1. We observed that the QS remains stable for noise amplitudes between 0 and 10% of the peak intensity.

In the second test, we added random peaks with the same amplitude as the b- and y-ions, in addition to the white Gaussian noise. The random peaks represent spurious ion peaks intended to degrade the quality of the spectrum. With 10 and 20 random peaks added, the QS is 4.6511 and 4.6442 respectively, which shows that the score is largely unaffected by random peaks as long as the b- and y-ions are intact.

In the next two test scenarios, we reduced the intensity of the b- and y-ions to simulate a lack of fragmented b- and y-ions in the spectrum. As the b-ion intensity is reduced by 10% to 70%, the QS drops from 4.5330 to 2.2654, as shown in Section C of Table 1. Reducing the y-ion intensity has a similar effect: the QS drops from 4.6106 to 0.5468 for a 10% to 70% reduction in intensity, as shown in Section D of Table 1. The results are shown in Fig. 3. As the intensity is reduced further, no peak is detected at the mid-point window of the self-convolution result.

Figure 3

Plot of QS versus ion intensity reduction. This figure shows the effect of reduction in ion intensity on the QS score.

Lastly, we randomly removed some of the b- or y-ion peaks to simulate the loss of certain ion fragments. The number of ions removed varies from 2 to 8, and we observed that the QS drops from 4.7692 to 2.9114 for b-ion loss and from 3.9813 to 2.2562 for y-ion loss, as shown in Section E and Section F of Table 1. As the number of ion peaks is reduced further, the mid-point peak is no longer detectable. These tests show the relation between the quality of the spectrum and the QS that we established to assess it.
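The trend in these degradation tests can be illustrated with a small Python sketch. It computes only the part of the mid-point value contributed by complementary pairs, so it does not reproduce the QS values in Table 1, which also depend on the side peaks, the noise and the DC removal. The function name `midpoint_contribution`, the peak intensity of 100 and the count of 12 complementary pairs (b2 to b13 paired with y12 to y1 for the example peptide) are assumptions made for illustration.

```python
def midpoint_contribution(b_intensities, y_intensities):
    """Mid-point value of the self-convolution due to complementary pairs.

    Each aligned pair (b_i, y_(n-i)) is counted twice, because the
    self-convolution sums the product in both orders.
    """
    return 2.0 * sum(b * y for b, y in zip(b_intensities, y_intensities))


full = midpoint_contribution([100.0] * 12, [100.0] * 12)
# b-ion intensity reduced by 70%, y-ions untouched
reduced = midpoint_contribution([30.0] * 12, [100.0] * 12)
# 8 of the 12 b-ions removed, y-ions untouched
removed = midpoint_contribution([100.0] * 4 + [0.0] * 8, [100.0] * 12)
print("relative mid-point value, reduced b-ions:", reduced / full)
print("relative mid-point value, removed b-ions:", removed / full)
```

In this simplified view the mid-point value falls in proportion to the lost b-ion signal, which is consistent with the direction of the QS changes reported above.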

Qualitative measurement of experimental tandem MS spectra

We started the quality assessment by simply performing a self-convolution on some of the experimental MS spectra. Fig. 4 shows the result of the self-convolution of one of the raw tandem MS spectra. Although the plot shows a high peak at the mid-point window, we found that it arose because the product of two high intensity peaks happened to fall at the mid-point. It would therefore be erroneous to take this result as an indication of a good quality spectrum. We have thus improved the approach by considering side peaks and a normalisation process.

Figure 4

Plot of self-convolution of experimental mass spectrum. This figure shows the actual mass spectrum (left) and its respective self-convolution result (right). A high mid-point intensity might not indicate a good quality spectrum as a product of two high intensity peaks could generate it by chance.

The proposed method was subsequently tested on 60 sets of real tandem MS spectra (unpublished), which were subjected to the QS scoring function described in Eq. 1. We considered the 15 highest intensity peaks to the left and to the right of the mid-point window of each spectrum. The self-convolution result is shown in Fig. 5. The DC-shifted self-convolution plot of the original tandem MS spectrum is contrasted with that of the newly generated spectrum in Fig. 6. We have assumed that 30 peaks are sufficient in our calculation, but this number can be increased where more ion fragments are expected. All tandem mass spectra having high scores have been identified successfully using MASCOT [8] with high confidence (> 45).

Figure 5

Pre-processing of ion peaks intensities. This figure shows a plot of the experimental tandem MS (left) and the newly generated mass spectrum after being pre-processed (right).

Figure 6

DC-shifted self-convolution plot of experimental tandem MS. This figure shows the difference between the DC-shifted self-convolution results obtained from the original mass spectrum (left) and the pre-processed mass spectrum (right).

Discussion

The fragmentation of a peptide sequence in a conventional mass spectrometer produces spectra consisting mostly of b- and y-ion peaks. The quality of the mass spectra therefore depends mainly on the presence of the b- and y-ions in the spectra. Current state-of-the-art database search tools depend heavily on these ion peaks, and the lack of such peaks leads to no protein match or, in the worst case, to erroneous matching of proteins in the database. Some database search algorithms allow the inclusion of a- and/or z-ions; such inclusion makes the search more complex and computationally intensive, and hence significantly slows down the protein identification process.

We propose a novel method in which the quality of the mass spectrum is determined from the self-convolution of the mass spectrum. This approach complements existing methods for selecting good quality tandem MS spectra to be processed by database search and/or de novo sequencing. The method is unique in that it does not depend on the charge of the fragmented ion, nor on its length. Random peaks, such as those produced by machine noise or contaminants (e.g. keratin), will not affect the process regardless of their intensity, as the method requires a complementary pair to work.

Because the presence of a fair number of complementary b- and y-ions is what constitutes a good quality mass spectrum, selecting spectra with high QS values ensures that only good quality tandem MS spectra are passed on for protein identification.

We note that tandem MS spectra having non-complementary b- and y-ions might score poorly using this approach. Examples of such spectra are those having a large number of y-ions but only very few complementary b-ions, and vice versa.

Conclusion

We conclude that the new approach is effective and useful in assessing the quality of a tandem mass spectrum by analysing the self-convolution result of the spectrum. This method relies mainly on the symmetry property inherited from the formation of complementary b- and y-ions found in tandem MS spectra. The proposed assessment scheme can be used to complement existing pre-filter/assessment processes to ensure that only good quality spectra are sent for protein identification, reducing false positive protein detection by database search and de novo sequencing tools. The method can be further improved by taking into consideration other complementary ions, such as a-ions and x-ions.

Methods

We propose a method that exploits the naturally inherited symmetry property of the tandem mass spectrum. The symmetry of the spectrum formed by the combination of b- and y-ions can be observed easily in the spectrum shown in Fig. 2. The m/z difference between b1 and b2 is equivalent to that between y8 and y7, as they represent the same amino acid 'Alanine', at 71.04 Dalton. Likewise, the m/z difference between b2 and b3 is equivalent to that between y7 and y6, as they represent the same amino acid 'Glycine', at 57.02 Dalton, and so on. This observed symmetry is a very useful feature, as it can be used to determine the quality of the spectrum generated by the mass spectrometer. If a given spectrum contains all the b-ions and y-ions of a peptide, the self-convolution of the mass spectrum produces its highest peak when all the corresponding b-ion and y-ion peaks are aligned. For example, for the spectrum shown in Fig. 2, the highest peak occurs when y7, y6, y5, y4, y3, y2 are aligned with b2, b3, b4, b5, b6, b7 on the m/z axis. This peak occurs theoretically at the mid-point of the self-convolution result.
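The position of this peak can be made explicit with a short derivation. It is a sketch that assumes singly charged fragments and monoisotopic masses. The symbols $r_k$ (mass of residue $k$ in a peptide of $n$ residues), $m_{\mathrm{H_2O}}$ (mass of water), $m_p$ (proton mass, about 1.007 Da), $s(m)$ (spectrum intensity at mass $m$) and $c(k)$ (self-convolution value at position $k$) are introduced here for illustration only.

$$
\begin{aligned}
b_i &= \sum_{k=1}^{i} r_k + m_p, \qquad
y_{n-i} = \sum_{k=i+1}^{n} r_k + m_{\mathrm{H_2O}} + m_p \\
b_i + y_{n-i} &= \sum_{k=1}^{n} r_k + m_{\mathrm{H_2O}} + 2\,m_p = MH^{+} + m_p \\
c\!\left(MH^{+} + m_p\right) &= \sum_{m} s(m)\, s\!\left(MH^{+} + m_p - m\right) \;\ge\; 2 \sum_{i} s(b_i)\, s(y_{n-i})
\end{aligned}
$$

Every complementary pair therefore contributes to the same position of the self-convolution, roughly one proton mass above the precursor mass $MH^{+}$. For the example peptide used later in this section, 233.10 + 1587.76 = 1820.86, which is about 1 Da above $MH^{+}$ = 1819.84, the mid-point of the self-convolution of a series running from 0 to $MH^{+}$.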

To verify the observation, the molecular weights of the theoretical b- and y-ions were generated for peptide sequence [MTDQEAIQDLWQWR], using MS-Digest [45].

The b-ions thus obtained are:

b = [233.10, 348.12, 476.18, 605.22, 676.26, 789.35, 917.40, 1032.43, 1145.51, 1331.59, 1459.65, 1645.73];

The y-ions generated are:

y = [1688.80, 1587.76, 1472.73, 1344.67, 1215.63, 1144.59, 1031.51, 903.45, 788.42, 675.34, 489.26, 361.20, 175.12];
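The two lists above can be cross-checked with a short Python sketch that builds the singly charged b- and y-ion ladders from monoisotopic residue masses. This is not the MS-Digest calculation itself: the function name `fragment_ions` and the mass constants are assumptions chosen for illustration, so values may differ from the lists in the last decimal place.

```python
# Monoisotopic residue masses (Da) for the amino acids in the example peptide.
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "I": 113.08406,
    "L": 113.08406, "M": 131.04049, "Q": 128.05858, "R": 156.10111,
    "T": 101.04768, "W": 186.07931,
}
WATER = 18.01056   # H2O, added to every y-ion and to the precursor
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_ions(peptide):
    """Return (b_ions, y_ions, mh_plus) for singly charged fragments.

    b_ions runs from b2 to b(n-1) in increasing mass; y_ions runs from
    y(n-1) down to y1, matching the ordering of the lists in the text.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(2, n)]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, n)]
    mh_plus = sum(masses) + WATER + PROTON
    return b_ions, y_ions, mh_plus


b_calc, y_calc, mh_calc = fragment_ions("MTDQEAIQDLWQWR")
print([round(v, 2) for v in b_calc])
print([round(v, 2) for v in y_calc])
print(round(mh_calc, 2))
```

With these constants the computed ladders agree with the listed values to within about 0.01 Da, and the precursor mass rounds to the 1819.84 Da used below.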

A time series is then created such that the starting mass is 0 Dalton and the ending mass is 1819.84 Dalton, which is the mono-isotopic peptide precursor mass (MH+), with an interval of 0.01 Da. The following conditions are used to set the intensity of the time series:

$$
\mathrm{data}(n) =
\begin{cases}
100 & \text{if } m/z = b(n) \text{ or } m/z = y(n) \\
\text{random noise level } n,\; 0 \le n \le 1 & \text{otherwise}
\end{cases}
$$
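A minimal Python sketch of this construction is given below. It assumes uniformly distributed noise between 0 and the chosen noise level and a fixed random seed for reproducibility; the function name `build_time_series` and its parameters are hypothetical and are not part of the authors' MATLAB implementation.

```python
import random

B_IONS = [233.10, 348.12, 476.18, 605.22, 676.26, 789.35,
          917.40, 1032.43, 1145.51, 1331.59, 1459.65, 1645.73]
Y_IONS = [1688.80, 1587.76, 1472.73, 1344.67, 1215.63, 1144.59, 1031.51,
          903.45, 788.42, 675.34, 489.26, 361.20, 175.12]
MH_PLUS = 1819.84  # mono-isotopic precursor mass (MH+), in Da


def build_time_series(peaks, max_mass, noise_level,
                      bin_width=0.01, peak_intensity=100.0, seed=0):
    """Return intensities on a regular mass grid from 0 to max_mass.

    Every bin starts with uniform random noise in [0, noise_level]
    (an assumption); bins holding a b- or y-ion are then set to
    peak_intensity.
    """
    rng = random.Random(seed)
    n_bins = int(round(max_mass / bin_width)) + 1
    data = [rng.uniform(0.0, noise_level) for _ in range(n_bins)]
    for mass in peaks:
        data[int(round(mass / bin_width))] = peak_intensity
    return data


data = build_time_series(B_IONS + Y_IONS, MH_PLUS, noise_level=1.0)
print("number of points:", len(data))
print("number of ion peaks:", sum(1 for v in data if v == 100.0))
```

Calling the same function with `noise_level=10.0` gives the noisier series discussed next.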

A plot of these b-ions and y-ions and the self-convolution values is shown in Fig. 7. From this figure, we observe that a high peak occurs at the mid-point of the self-convolution, where the b-ions (bn, bn-1, bn-2, ... b2) align with the corresponding y-ions (y2, y3, y4, ... yn). However, the cumulative sum of the products of all the points increases steadily from 0 to the mid-point and decreases thereafter, forming a triangle below the peaks. This can hamper the detection of the peaks, especially when significant noise levels are present, compounded by low-intensity or missing b-ion and/or y-ion peaks, as we will demonstrate later. To determine the effect of increasing noise levels, we raise the noise level to 10, as shown below.

Figure 7

Self-convolution plot for noise amplitude = 1. This figure shows the result of self-convolution when noise peaks of amplitude 1 are added to the theoretical tandem MS spectrum.

$$
\mathrm{data}(n) =
\begin{cases}
100 & \text{if } m/z = b(n) \text{ or } m/z = y(n) \\
\text{random noise level } n,\; 0 \le n \le 10 & \text{otherwise}
\end{cases}
$$

We observe that, although the noise level is only 10% of the ion intensity, as shown in Fig. 8, the distinctive mid-point peak is significantly reduced relative to the increased overall overlapping convolution values. The other observable peaks in Fig. 7 are also lost, because the overlapping convolution values increase greatly with the higher noise level. This problem can be resolved by applying the convolution theorem and removing the DC component of the product of the Fourier transforms before performing the inverse Fourier transform. According to the convolution theorem, convolution is achieved by first applying the Discrete Fourier Transform (DFT) to the data sets, multiplying the two transforms, and then performing the inverse DFT. The key point is that the near-DC components are removed by setting the first 10 points of the DFT product to 0. Finally, the data are normalised against their largest magnitude. The pseudo-code is shown below:

Figure 8

Self-convolution plot for noise amplitude = 10. This figure shows the result of self-convolution when noise peaks of amplitude 10 are added to the theoretical tandem MS spectrum.

D = DFT(data); // compute the Discrete Fourier Transform of the spectrum

DD = D .* D; // element-wise product of the DFT with itself

DD(1:10) = 0; // remove the near-DC components (first 10 points of the product)

IDD = abs(iDFT(DD)); // amplitude of the inverse Discrete Fourier Transform

NIDD = IDD/max(IDD); // normalised self-convolution values
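The pseudo-code can be sketched in Python using only the standard library. This is an illustrative version under stated assumptions, not the authors' MATLAB code: the spectrum is zero-padded to a power of two of at least twice its length so that the circular convolution equals the linear one (padding is not specified in the pseudo-code), and the helper names `fft` and `dc_shifted_self_convolution` as well as the toy spectrum are hypothetical.

```python
import cmath
import random


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dc_shifted_self_convolution(data, n_dc=10):
    """Normalised self-convolution with the first n_dc DFT points zeroed."""
    n = len(data)
    size = 1
    while size < 2 * n - 1:          # assumption: pad to avoid wrap-around
        size *= 2
    padded = list(data) + [0.0] * (size - n)
    transform = fft(padded)
    product = [v * v for v in transform]      # DFT multiplied by itself
    for k in range(min(n_dc, size)):
        product[k] = 0j                       # remove near-DC components
    conv = [abs(v) / size for v in fft(product, inverse=True)]
    conv = conv[: 2 * n - 1]                  # keep the linear-convolution span
    largest = max(conv)
    return [v / largest for v in conv]


# Toy spectrum: noise in [0, 1] plus four mirrored peak pairs (sum = 1000).
rng = random.Random(0)
toy = [rng.uniform(0.0, 1.0) for _ in range(1001)]
for position in (30, 100, 230, 410, 590, 770, 900, 970):
    toy[position] = 100.0
result = dc_shifted_self_convolution(toy)
best = max(range(len(result)), key=result.__getitem__)
print("length:", len(result), "highest peak at index:", best)
```

For this toy input the highest normalised value falls at the centre of the result, where all four mirrored pairs align. Because only the first points of the DFT product are zeroed, the inverse transform is complex, which is why its amplitude is taken.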

As depicted in Fig. 9, the detrimental effects of noise are largely suppressed: compared with Fig. 8, the maximum peak at the mid-point and the other observable peaks are preserved. The removal of the near-DC components and the additional normalisation step have improved our ability to determine the quality of the spectrum.

Figure 9

DC-shifted self-convolution plot for noise amplitude = 1 and 10. This figure shows the DC-shifted self-convolution results of theoretical tandem MS with noise amplitude = 1 (left) and noise amplitude = 10 (right).

Quantitative measurement

We further propose a quantitative method to determine the quality of a given tandem MS spectrum from the self-convolution values, as follows:

  1. Determine the maximum peak value at the mid-point of the normalised self-convolution values (Pmax(mid-point)), within the +/- 2 Dalton error window of the MS fragment ion mass values.

  2. Find the N highest peaks to the left (P L) and the N highest peaks to the right (P R) of the mid-point peak. N ranges from 10 to 30, depending on the mono-isotopic peptide precursor mass.

  3. Calculate the ratio of the maximum mid-point peak to the average of the highest peaks to the left and right of the mid-point peak.

We term this ratio the Quality Score (QS) of the tandem MS spectrum, as shown in the following equation:

$$
QS = \frac{P_{\max(\text{mid-point})}}{\dfrac{1}{2N}\left(\sum_{n=1}^{N} P_{L_n} + \sum_{n=1}^{N} P_{R_n}\right)} \qquad (1)
$$
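The three steps and Eq. 1 can be sketched in Python as follows. The sketch makes several assumptions that the text does not specify: side peaks are taken to be local maxima, the mid-point window is expressed in grid points rather than Daltons (200 points would correspond to 2 Da at a 0.01 Da interval), and if fewer than N peaks exist on a side the average is taken over the peaks actually found. The names `local_maxima` and `quality_score` are hypothetical.

```python
def local_maxima(values, start, stop):
    """Values at indices in [start, stop) that are local maxima."""
    peaks = []
    for i in range(max(start, 1), min(stop, len(values) - 1)):
        if values[i] > values[i - 1] and values[i] >= values[i + 1]:
            peaks.append(values[i])
    return peaks


def quality_score(conv, n_peaks=15, half_window=2):
    """Ratio of the mid-point peak to the mean of the highest side peaks.

    conv        -- normalised self-convolution values
    n_peaks     -- N in Eq. 1 (highest peaks kept on each side)
    half_window -- half-width of the mid-point window, in grid points
    """
    mid = len(conv) // 2
    lo, hi = mid - half_window, mid + half_window + 1
    p_mid = max(conv[lo:hi])                                   # step 1
    left = sorted(local_maxima(conv, 0, lo), reverse=True)[:n_peaks]
    right = sorted(local_maxima(conv, hi, len(conv)), reverse=True)[:n_peaks]
    side = left + right                                        # step 2
    if not side:
        raise ValueError("no side peaks found outside the mid-point window")
    return p_mid / (sum(side) / len(side))                     # step 3


example = [0, 1, 0, 2, 0, 0, 0, 8, 0, 0, 0, 2, 0, 1, 0]
print(quality_score(example, n_peaks=2, half_window=1))
```

In the toy input the mid-point peak is 8 and the two highest peaks on each side are 2 and 1, so the score is 8 divided by their mean of 1.5.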

Fig. 10 shows the actual components considered in our quantitative method described above. Fig. 11 shows the normalised self-convolution plot of a good tandem mass spectrum. We can see clearly that the score is higher (QS = 3.0833) in this case as compared to those shown in Fig. 4 (QS = 1.9907) and Fig. 6 (QS = 1.8030). We performed MASCOT database search to confirm the quality of these spectra.

Figure 10

Qualitative measurement of spectrum quality.

Figure 11

DC-shifted self-convolution of good quality mass spectrum.

Availability and requirements

Project name: MS Quality Assessment

Operating system(s): UNIX or Windows

Programming language: MATLAB version 5.3, no special toolbox needed.

Licence: Email request to author.

Any restrictions to use by non-academics: Licence needed.

References

  1. What is mass spectrometry?. [http://www.asms.org/whatisms]

  2. Herbert CG, Johnstone RAW: Mass spectrometry basics. 2003, CRC Press LLC, Boca Raton, FL

    Google Scholar 

  3. Puymbrouck JV, Angulo D, Drew K, Hollenbeck LA, Battre D, Schilling A, Jabon D, Laszewski GV: A batch import module for an empirically derived mass spectral database. DePaul CTI Technical report. 2006

    Google Scholar 

  4. Desiere F, Deutsch EW, King NL, Nesvizhskii AI, Mallick P, Eng J, Chen S, Eddes J, Loevenich SN, Aebersold R: The peptideatlas project. Nucleic Acids Res. 2006, 34: D655-

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  5. Kinter M, Sherman NE: Protein sequencing and identification using mass spectrometry. Wiley-Interscience. 2000, New York

    Google Scholar 

  6. Ma Bin, Zhang Kaizhong, Hendrie Christopher, Liang Chengzhi, Li Ming, Doherty-Kirby Amanda, Lajoie Gilles: PEAKS: Powerful software for peptide de novo sequencing by ms/ms. Rapid Communications in Mass Spectrometry. 2003, 17 (20): 2337-2342.

    Article  CAS  PubMed  Google Scholar 

  7. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis I. 1994, 20: 3551-3567.

    Article  Google Scholar 

  8. MASCOT by Matrixscience. [http://www.matrixscience.com/home.html]

  9. Phenyx by Genebio. [http://www.phenyx-ms.com/]

  10. Eng Jimmy, McCormack Ashley, Yates John: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1999, 5: 976-989.

    Article  Google Scholar 

  11. Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH: Open mass spectrometry search algorithm. J Proteome Res. 2004, 3: 958-964.

    Article  CAS  PubMed  Google Scholar 

  12. Johnson RS, Taylor JA: Searching sequence databases via de novo peptide sequencing by tandem mass spectrometry. Mol Biotechnol. 2002, 146: 41-61.

    Google Scholar 

  13. Shevchenko A, Sunyaev S, Loboda A, Shevchenko A, Bork P, Ens W, Standing KG: Charting the proteomes of organisms with unsequenced genomes by MALDI-quadrupole time-of-flight mass spectrometry and BLAST homology searching. Anal Chem. 2001, 73 (9): 1917-1926.

    Article  CAS  PubMed  Google Scholar 

  14. Mann M, Wilm M: Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem. 1994, 66 (24): 4390-4399.

    Article  CAS  PubMed  Google Scholar 

  15. Sunyaev S, Liska AJ, Golod A, Shevchenko A, Shevchenko A: Multitag: Multiple error-tolerant sequence tag search for the sequence-similarity identification of proteins by mass spectrometry. Anal Chem. 2003, 75 (6): 1307-1315.

    Article  CAS  PubMed  Google Scholar 

  16. Tabb DL, Saraf S, Yates JR: Gutentag: High-throughput sequence tagging via an empirically derived fragmentation model. Anal Chem. 2003, 75 (23): 6415-6421.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  17. Eng JK, McCormack AL, Yates JR: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994, 5: 976-989.

    Article  CAS  PubMed  Google Scholar 

  18. Pevzner PA, Dancik V, Tang CL: Mutation-tolerant protein identification by mass spectrometry. J Comput Biol. 2000, 7: 777-787.

    Article  CAS  PubMed  Google Scholar 

  19. Field HI, Fenyo D, Beavis RC: A bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database. Proteomics. 2002, 2: 36-47.

    Article  CAS  PubMed  Google Scholar 

  20. Clauser KR, Baker P, Burlingame AL: Role of accurate mass measurement (+/- 10 ppm) in protein identification strategies employing ms or ms/ms and database searching. Anal Chem. 1999, 71: 2871-2882.

    Article  CAS  PubMed  Google Scholar 

  21. Fenyo D, Qin J, Chait BT: Protein identification using mass spectrometric information. Electrophoresis. 1998, 19: 998-1005.

  22. Zhang N, Aebersold R, Schwikowski B: ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data. Proteomics. 2002, 2: 1406-

  23. Cargile BJ, Bundy JL, Stephenson JL: Potential for false positive identifications from large databases through tandem mass spectrometry. J Proteome Res. 2004, 3: 1082-1085.

  24. Keller A, Purvine S, Nesvizhskii A, Stolyar S, Goodlett D, Kolker E: Experimental protein mixture for validating tandem mass spectral analysis. OMICS: A Journal of Integrative Biology. 2002, 6: 207-212.

  25. Salmi J, Moulder R, Filen JJ, Nevalainen OS, Nyman TA, Lahesmaa R, Aittokallio T: Quality classification of tandem mass spectrometry. Bioinformatics. 2006, 22 (4): 400-406.

  26. Wu FX, Gagné P, Droit A, Poirier GG: Quality assessment of peptide tandem mass spectra. First International Multi-Symposiums on Computer and Computational Sciences. 2006, 1: 243-250.

  27. Purvine S, Kolker N, Kolker E: Spectral quality assessment for high-throughput tandem mass spectrometry proteomics. OMICS: A Journal of Integrative Biology. 2004, 8 (3): 255-256.

  28. Bern M, Goldberg D, McDonald WH, Yates J: Automatic quality assessment of peptide tandem mass spectra. Bioinformatics. 2004, 20 (Suppl 1): i49-i54.

  29. Wu YC, Ng TS: Symbol timing recovery for GMSK modulation based on square algorithm. IEEE Comm Lett. 2001, 5 (5): 221-223.

  30. Bharath AA: A tiling of phase-space through self convolution. IEEE Transactions on Signal Processing. 2000, 48: 3581-3585.

  31. Roepstorff P, Fohlman J: Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom. 1984, 11 (11): 601-

  32. Johnson R, Martin S, Biemann K, Stults J, Watson JT: Novel fragmentation process of peptides by collision-induced decomposition in a tandem mass spectrometer: Differentiation of leucine and isoleucine. Anal Chem. 1987, 59 (21): 2621-2625.

  33. Biemann K: Contributions of mass spectrometry to peptide and protein structure. Biomed Environ Mass Spectrom. 1988, 16 (1–12): 99-111.

  34. Biemann K: Mass spectrometry. Methods in Enzymology. Edited by: McCloskey JA. 1990, San Diego: Academic Press, 193: 886-887.

  35. McCormack AL, Jones JL, Wysocki VH: Surface-induced dissociation of multiply-protonated peptides. J Am Soc Mass Spectrom. 1992, 3: 859-862.

  36. Barbacci DC, Russell DH: Sequence and side-chain specific photofragment (193 nm) ions from protonated substance P by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. J Am Soc Mass Spectrom. 1999, 10: 1038-1040.

  37. Zubarev RA, Kelleher NL, McLafferty FW: Electron capture dissociation of multiply charged protein cations. A nonergodic process. J Am Chem Soc. 1998, 120 (13): 3265-3266.

  38. Syka JE, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF: Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc Natl Acad Sci USA. 2004, 101: 9528-9533.

  39. McCormack AL, Somogyi A, Dongre AR, Wysocki VH: Surface-induced dissociation in conjunction with a quantum mechanical approach. Anal Chem. 1993, 65: 2859-2872.

  40. Wysocki VH, Tsaprailis G, Smith LL, Breci LA: Mobile and localized protons: a framework for understanding peptide dissociation. J Mass Spectrom. 2000, 35: 1399-1406.

  41. Zhang Z: Prediction of low-energy collision-induced dissociation spectra of peptides. Anal Chem. 2004, 76: 3908-3922.

  42. Elias JE, Gibbons FD, King OD, Roth FP, Gygi SP: Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat Biotechnol. 2004, 22 (2): 214-219.

  43. Arnold R, Jayasankar N, Aggarwal D, Tang H, Radivojac P: A machine learning approach to predicting peptide fragmentation spectra. Pacific Symposium on Biocomputing. 2006, 11: 219-230.

  44. Yu C, Lin Y, Sun S, Cai J, Zhang J, Bu D, Zhang Z, Chen R: An iterative algorithm to quantify factors influencing peptide fragmentation during tandem mass spectrometry. J Bioinform Comput Biol. 2007, 5 (2): 297-311.

  45. MS-Digest. [http://prospector.ucsf.edu/prospector/4.27.1/cgibin/msForm.cgi?form=msdigest]

Acknowledgements

We thank Prof Kon Oi Lian, National Cancer Centre of Singapore, for providing the experimental tandem mass spectra that made this work possible, and Nanyang Polytechnic for providing financial and equipment support.

Author information

Corresponding author

Correspondence to Keng Wah Choo.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

CKW proposed the initial implementation of the algorithm, tested the functionality of the code, and was involved in drafting the manuscript. LT investigated the symmetry property, helped improve the final quantitative measurement of the mass spectra, and revised the manuscript. All authors read and approved the final manuscript.

Keng Wah Choo and Wai Mun Tham contributed equally to this work.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Choo, K.W., Tham, W.M. Tandem mass spectrometry data quality assessment by self-convolution. BMC Bioinformatics 8, 352 (2007). https://doi.org/10.1186/1471-2105-8-352

  • DOI: https://doi.org/10.1186/1471-2105-8-352

Keywords