Precise protein quantification based on peptide quantification using iTRAQ™

Boehm, Andreas M; Pütz, Stephanie; Altenhöfer, Daniela; Sickmann, Albert; Falk, Michael

doi:10.1186/1471-2105-8-214

Software
Open access
Published: 21 June 2007

BMC Bioinformatics volume 8, Article number: 214 (2007)

Abstract

Background

Peptides can be quantified by mass spectrometry using the iTRAQ™ reagent. This technology yields information about the relative abundance of single peptides. A method for the calculation of reliable quantification information is required in order to obtain biologically relevant data at the protein expression level.

Results

A method comprising sound error estimation and statistical methods is presented that allows precise abundance analysis plus error calculation at the peptide as well as at the protein level. This yields the relevant information that is required for quantitative proteomics. Compared with existing approaches, the error estimation of our method, named Quant, is reliable and offers information for precise bioinformatic models. Quant is shown to generate results that are consistent with those produced by ProQuant™, thus validating both systems. Moreover, the results are consistent with those of Mascot™ 2.2. The MATLAB® scripts of Quant are freely available via http://www.protein-ms.de and http://sourceforge.net/projects/protms/, each under the GNU Lesser General Public License.

Conclusion

The software Quant demonstrates improvements in protein quantification using iTRAQ™. Precise quantification data can be obtained at the protein level when using error propagation and adequate visualization. Quant integrates both and additionally makes it possible to obtain more reliable results by calculating suitable quality measures. Peak area integration has been replaced by the sum of intensities, yielding more reliable quantification results. Additionally, Quant allows the combination of quantitative information obtained by iTRAQ™ with peptide and protein identifications from popular tandem MS identification tools. Hence Quant is a useful tool for the proteomics community and may help to improve the analysis of proteomic experimental data. In addition, we have shown that a lognormal distribution fits the data of mass spectrometry based relative peptide quantification.

Background

Mass spectrometry is a common technique employed for protein identification in proteomics. In tandem mass spectrometry, proteins are identified by matching the measured fragment ion spectra of peptides with theoretical spectra calculated from known DNA or protein sequences [1], for example the NCBI sequence database [2] or Swiss-Prot [3].

Instead of studying a single protein in detail, as was common in the early days of protein science, the analysis of all proteins of a cell – the proteome – has become important [4]. The proteome comprises all the proteins present in an organism, tissue or cell at a particular time. In contrast to the genome, the proteome is not static but highly dynamic.

To understand the biological and biochemical processes in a cell or an organism, for example responses to different environmental influences or the difference between healthy and diseased tissue, analysis of all differences at genomic or proteomic level needs to be performed. The protein abundance changes over time are needed for understanding cellular processes [5].

Differences in protein expression are not accessible at genomic level but often are accessible at the proteome level [6]. Some proteins are up- or down-regulated in the different stages of a cell. Therefore, quantitative information of the expressed proteins is needed and constitutes a key-step to fully understand functions of organelles, cells, organisms as well as processes of diseases. Furthermore, the quantitative information of the protein expression can be used for bioinformatic modelling of cellular processes such as pathways, cell maturing and metabolisms [7].

The advantages of mass spectrometry-based peptide quantification are precision, sensitivity, throughput and convenient automation [8, 9]. During the last decade, several techniques have been established [10], e.g. the isobaric tag for relative and absolute quantitation (iTRAQ™), which is currently the only technique capable of multiplexing up to four different samples for relative quantification. Four chemically identical iTRAQ™ reagents are available, named 114, 115, 116 and 117, which have the same overall mass. Each label is composed of a peptide reactive group (NHS ester) and an isobaric tag of 145 Da that consists of a balancer group (carbonyl) and a reporter group (based on N-methylpiperazine) [11], as shown in figure 1. Between the balancer and the reporter group is a fragmentation site. The peptide reactive group attaches specifically to free primary amino groups – N-termini and ε-amino groups of lysine residues. Side reactions on tyrosine have also been reported [11]. No labelling occurs if the primary amino groups are modified, for example if N-terminal glutamine or glutamic acid forms a ring (pyroglutamic acid) or if an acetylation occurs. Therefore, iTRAQ™ labels those peptides within the sample that possess at least one free primary amino group.

In fragment ion spectra of iTRAQ™ labelled peptides, additional peaks appear in the m/z range of 114 to 117, originating from the singly charged reporter group fragment of each iTRAQ™ label. Peptide quantification can be performed by interpretation of these peaks. To judge the results calculated from the reporter peaks, a reliable quality measure is needed [12], and not only at the peptide level.

The development of precise and transparent methods for analysis of proteomic data is one of the crucial challenges in protein sciences [8]. Software support for data evaluation is needed for quantification, because proteomics yields huge amounts of data [13]. These computer programs must be capable of providing results at the protein level. Some software is already available for analyzing iTRAQ™ data, such as i-Tracker [14], MassTRAQ [15], ProQuant™ (Applied Biosystems (ABI), Darmstadt, Germany), ProteinPilot™ (ABI) or Mascot™ 2.2 (Matrix Science, London, UK). Some of these are not freely available, such as ProQuant™, ProteinPilot™ and Mascot™. MassTRAQ and i-Tracker only provide data at the peptide level. These tools have in common that they are not capable of calculating reliable quantification information at the protein level or do not provide precise error estimation or a reliable quality measure. Some of them assume an inappropriate distribution for their peptide and signal statistics. We thus decided to develop our tool named Quant for quantification at the peptide level as well as at the protein level. We focus on the protein level, as only this allows meaningful interpretations of the experimental data including a reliable transfer into bioinformatic modelling. Moreover, this software is freely available.

Methods

Experiments

The functionality of Quant has been proven by application to a standard protein mix provided by Applied Biosystems within the iTRAQ™ kit.

Sample preparation

A six-protein mix delivered with the iTRAQ™ kit was used for the analysis. The protein mix consisted of bovine serum albumin (Accession Number P02769), β-galactosidase (P00722), α-lactalbumin (P00711), β-lactoglobulin (P02754), lysozyme (P00698), apo-transferrin (P02787).

The proteins were dissolved according to the iTRAQ™ reagent protocol [16] in 100 mM triethylammonium bicarbonate buffer at pH 8.5. The cysteine residues were blocked and alkylated with MMTS as described in the iTRAQ™ protocol and the proteins were digested overnight using trypsin. The obtained peptides were labelled with the iTRAQ™ reagent in 70% ethanol.

The sample was divided into two halves; one half was labelled with the iTRAQ™ reagent 114 and the other with 117. These differently labelled samples were mixed 1:1 and 1:3. The samples were separated by using multidimensional liquid chromatography. In the first dimension, the mixture was separated by strong cation exchange chromatography (PL-SCX; 2.1-mm inner diameter (ID), 150-mm length, 1000-Å pore size, 8-μm particle size, Polymer Laboratories, Darmstadt, Germany) using a linear binary gradient (solvent A: 50 mM KH₂PO₄, pH 3.5; solvent B: 50 mM KH₂PO₄, 0.25 M NaCl, 25% ACN, pH 3.5). The separation of the peptides was performed with a gradient of 2% per minute increasing amount of solvent B. SCX fractions were taken every minute and the organic solvent was removed under vacuum. The fractions were then separated in a second dimension and analyzed using nano LC-MS/MS.

The nano MS/MS analysis was conducted with a Qstar XL (ABI). Samples were preconcentrated using a C18 Pep-Map trapping column (300 μm ID, 1 mm length, 100 Å pore size, 5 μm particle size; Dionex, Idstein, Germany) and afterwards separated on a C18 PepMap main column (75 μm ID, 150 mm length, 100 Å pore size, 3 μm particle size; Dionex) using a linear binary gradient (solvent A: 0.1% FA; solvent B: 0.1% FA, 84% ACN). Full MS scans from 400 to 1500 m/z were recorded, and the two most intensive peptide ions were subjected to further fragmentation. The MS/MS scans were recorded from 100 to 1500 m/z.

Protein identification

MS/MS data were exported using wiff2dta [13], version 1.1.10. Protein identification was performed using Mascot™, Version 2.0 (Matrix Science, London, UK) and the database SwissProt (26-01-2006). Identification data as well as fragment ion spectra were extracted using mres2x [17]. MS/MS peptide identifications were verified using theospec [1] and the visualization tools of resDB [18]. Protein identifications were verified using seqDB [19] as used in former studies [18].

The quantification by ProQuant™ was performed using the Analyst QS™ Software, version 1.1. Proteins were implicitly identified by ProID™ 1.1 using the SwissProt database (26-01-2006). An interrogator database was generated based on the database using the enzyme trypsin and allowing one missed cleavage site. The parameters for ProQuant™ (version 1.1) and Pro Group Report (version 1.0.2) were 1.30 for the protein score threshold, and competitor proteins were shown within a protein score of 2.00. The mass tolerance was set to 0.4 amu for precursor ions and 0.4 amu for fragment ions.

Additionally, Mascot™ 2.2 was used for iTRAQ™ analysis. The protein ratio type was set to median, the normalization method was median ratio, no outlier removal was chosen and the peptide threshold was set to at least homology.

Error estimation and error propagation

We introduce precise error propagation in quantification software. A common method of error estimation uses the mean value μ and the standard deviation σ together with the kσ-rule and the Chebyshev inequality, and has been proposed for quantification [12]. However, this method assumes that the measured values are independent and at the same time requires them to be normally distributed. If one or both of these conditions cannot be assured, means other than this statistical approach to error estimation have to be applied. This is the case if, for example, each measurement is made only once and uncertainty arises from the precision of the instruments used. Moreover, the peptide count in quantitative proteomics is not large enough for reliable calculation of a mean and a standard deviation. In these cases, errors have to be estimated by intervals, for which the minimum and maximum values are calculated.

Usually in error treatment, observations are denoted with their errors. Let a and b be two measurements of the true values a₀ and b₀ with the relative errors |f_a| and |f_b|, respectively. The corresponding absolute errors are denoted as |e_a| and |e_b|. Then the equations a = a₀ (1 ± |f_a|) = a₀ ± |e_a| and b = b₀ (1 ± |f_b|) = b₀ ± |e_b| are valid.

Error propagation can be calculated dependent on the mathematical operations as follows. Sum and difference can be estimated as

a ± b ∈ {a₀ ± b₀ - (|e_a| + |e_b|), a₀ ± b₀ + (|e_a| + |e_b|)} and |e_{a ± b}| = |e_a| + |e_b|,

and product as well as quotient as

a·b ∈ {a₀·b₀·(1 - (|f_a| + |f_b|)), a₀·b₀·(1 + (|f_a| + |f_b|))} and |f_{a·b}| = |f_a| + |f_b|,

a/b ∈ {(a₀/b₀)·(1 - (|f_a| + |f_b|)), (a₀/b₀)·(1 + (|f_a| + |f_b|))} and |f_{a/b}| = |f_a| + |f_b|.

This can be applied to the calculation of the determinant of any n × n matrix M. If any two columns are exchanged, the propagated relative error is not affected. This is especially valid when determinants are calculated by using submatrices.
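The propagation rules above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions and not the Quant implementation, which is written in MATLAB®: the class name Measured, its fields and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """A value with its maximum absolute error (hypothetical helper, not part of Quant)."""
    value: float
    abs_err: float

    @property
    def rel_err(self) -> float:
        # |f| = |e| / |value|
        return self.abs_err / abs(self.value)

    def __add__(self, other):
        # Sum: absolute errors add.
        return Measured(self.value + other.value, self.abs_err + other.abs_err)

    def __sub__(self, other):
        # Difference: absolute errors add as well.
        return Measured(self.value - other.value, self.abs_err + other.abs_err)

    def __mul__(self, other):
        # Product: relative errors add.
        v = self.value * other.value
        return Measured(v, abs(v) * (self.rel_err + other.rel_err))

    def __truediv__(self, other):
        # Quotient: relative errors add.
        v = self.value / other.value
        return Measured(v, abs(v) * (self.rel_err + other.rel_err))


# Made-up example values: both measurements carry a relative error of 5%.
a = Measured(10.0, 0.5)
b = Measured(4.0, 0.2)
total = a + b       # absolute error |e_a| + |e_b|
quotient = a / b    # relative error |f_a| + |f_b|
print(total, quotient, quotient.rel_err)
```

Because the errors are maximum errors rather than standard deviations, they add linearly instead of in quadrature.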

The absolute error |e_i| of the peak intensity I_i is 0.5 in case of integer values. In all other cases, this error depends on the precision of the mass spectrometer and must be estimated individually during calibration. An MS/MS spectrum can be defined as a set M of 2-tuples M = {(x_i, I_i) | i ∈ {1,..., n}} and the intensities I_i can be regarded as error-prone, I_i = y₀ ± |e_i| = y₀ (1 ± f_i), but derived from the true signal y₀.

Purity correction of iTRAQ™ labels and error estimation

The iTRAQ™ reagent batches supplied by ABI are provided with sixteen purity values. These indicate the percentages of each reporter ion that have masses differing by -2, -1, +1 and +2 Da from the nominal reporter ion mass due to isotopic variants. Following the method proposed formerly [14], we use this information to correct the values of each reporter ion to account for the losses to and gains from other reporter ions. This results in simultaneous equations that can be framed such that they can be solved by applying Cramer's rule. This is where we extend the published method by means of error propagation. The relative error of the true reporter intensity W_i is $| f_{W_{i}} | = | f_{\det (C_{i})} | = \frac{| e_{\det (C_{i})} |}{| \det (C_{i}) |}$, with i ∈ {114, 115, 116, 117}.
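A purity correction by Cramer's rule with a propagated error might be sketched as follows in Python. Several points are assumptions and not taken from Quant: the 4 × 4 matrix P is invented for illustration (real values come from the reagent batch certificate), its layout (rows are observed channels, columns are true reporters) is one possible framing of the simultaneous equations, the purity values are treated as exact, and the error of det(C_i) is bounded by expanding the determinant along the replaced column.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total


def replace_column(m, j, col):
    """Return a copy of m whose column j is replaced by col."""
    return [row[:j] + [c] + row[j + 1:] for row, c in zip(m, col)]


def purity_correct(P, observed, abs_errs):
    """Solve P * W = observed by Cramer's rule.

    Returns a list of (W_i, relative error of W_i). Only the observed
    intensities are assumed to carry an error (assumption of this sketch).
    """
    n = len(P)
    d = det(P)
    results = []
    for i in range(n):
        det_ci = det(replace_column(P, i, observed))
        # det(C_i) is linear in column i, so its maximum absolute error is
        # sum_k |e_k| * |cofactor_(k,i)|; a cofactor is the determinant with
        # a unit vector placed in column i.
        e_det = 0.0
        for k in range(n):
            unit = [1.0 if r == k else 0.0 for r in range(n)]
            e_det += abs_errs[k] * abs(det(replace_column(P, i, unit)))
        # Undefined if det(C_i) is zero, i.e. if the corrected intensity is zero.
        results.append((det_ci / d, e_det / abs(det_ci)))
    return results


# Invented purity matrix and invented true intensities for 114, 115, 116, 117.
P = [
    [0.929, 0.020, 0.000, 0.000],
    [0.059, 0.923, 0.030, 0.000],
    [0.002, 0.056, 0.924, 0.040],
    [0.000, 0.001, 0.045, 0.923],
]
true_w = [100.0, 200.0, 300.0, 400.0]
observed = [sum(p * w for p, w in zip(row, true_w)) for row in P]
corrected = purity_correct(P, observed, abs_errs=[0.5] * 4)
print(corrected)
```

The example builds the observed intensities from known true values, so the correction can be checked by recovering them.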

In addition, we introduce an initial experiment error that is taken into consideration during calculation of peptide and especially for protein quantification. In former publications [14], a rough intensity error estimation has been proposed. We improve this by a more reliable estimation. Moreover, our method is not fixed to integer intensity values in the fragment ion spectra.

Quantification of proteins

When performing protein quantification, only unique peptides are taken into consideration, whereas peptides belonging to more than one protein sequence are used only to support the identification of the corresponding proteins. The ratios of the unique peptides are lognormally distributed if their count n is large enough, see figure 2. This has been previously reported for difference gel electrophoresis (DIGE) protein data [20, 21]. The Shapiro-Wilk test, a powerful test of departure from normality, performed with Statistica™ (version 7.1, StatSoft Europe GmbH, Hamburg, Germany) yields W = 0.9629 and a p-value of 0.2095 for the data of the 1:3 mix. Therefore, the null hypothesis that the log-transformed data are normally distributed cannot be rejected. The median of the ratios is calculated, too. In case of a lognormal distribution, the logarithm of the median equals the mean value μ of the log-transformed and thus normally distributed peptide ratios. For large n, the median should be preferred to the mean value of the non-transformed data, because it represents the middle observation and is thus the more meaningful choice of the two. The median represents the protein ratio. Additionally, the protein ratio R_P is calculated using the method of least-squares estimation (LSE) by minimizing the square root $\sqrt{\sum_{i = 1}^{n} {(R_{P} - R_{i})}^{2}}$. This yields the value with the minimal sum of squared distances from the data points R_i; it can be shown that this value is the average of the points R_i. Both LSE and median represent the protein ratio derived from the peptide ratios. The choice of the median as protein ratio is based on the lognormal distribution of the peptide ratios and is a good choice for large enough data sets. The LSE is appropriate for smaller data sets and does not depend on an underlying distribution. Both values should be nearly equal and their difference can be regarded as an additional quality measure. Moreover, if the peptide ratio count is large enough, the mean value μ and standard deviation σ of the log-transformed peptide ratios can be used as quality indicators, too.
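The two protein ratio estimates and the log-space quality indicators can be sketched in Python with the standard library. The function name protein_ratio and the peptide ratios are made up for illustration; this is not the Quant code, and no normality test is included.

```python
import math
import statistics


def protein_ratio(peptide_ratios):
    """Summarise unique peptide ratios of one protein (hypothetical helper)."""
    logs = [math.log(r) for r in peptide_ratios]
    return {
        # Median of the ratios, the preferred estimate for large n.
        "median": statistics.median(peptide_ratios),
        # Least-squares estimate: the arithmetic mean minimises sum (R_P - R_i)^2.
        "lse": statistics.fmean(peptide_ratios),
        # Mean and standard deviation of the log-transformed ratios.
        "mu_log": statistics.fmean(logs),
        "sigma_log": statistics.stdev(logs),
    }


def squared_distance(r_p, peptide_ratios):
    """Sum of squared distances that the LSE minimises."""
    return sum((r_p - r) ** 2 for r in peptide_ratios)


ratios = [0.8, 0.9, 1.0, 1.1, 1.25, 1.6]  # made-up peptide ratios
summary = protein_ratio(ratios)
# Difference between the two estimates, usable as an additional quality measure.
quality_gap = abs(summary["median"] - summary["lse"])
print(summary, quality_gap)
```

For a right-skewed sample such as this one, the mean lies above the median, so the gap grows with the skew of the data.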

Implementation

The implementation was done in MATLAB® (The Mathworks, Ismaning, Germany), version 6.1. The program files are contained in additional file 1, the detailed documentation in additional file 2. We provide example data in additional file 3.

The quantification values are calculated by the script startquantitraq. It executes quantitraq, which performs the iTRAQ™ quantification. The integration is done by calling sumquantitraq (sum of intensities) or flquantitraq (area calculation by trapezoids), depending on the user's choice. The function pcquantitraq implements the purity correction and is called by quantitraq. The peptide ratios are calculated by raquantitraq. The list of files being processed in batch is provided in the file names01.txt. These files contain the uncentroided MS/MS spectra in DTA format. We recommend not using centroided MS/MS spectra. Mascot™ results can be exported by using mres2x [17], for instance. The script startexperror calculates the experiment error by executing the function experror, which calls logtrans, qplot and killzero. By running startplotitraq, the errors are plotted and the boxplots are created by iteratively calling plotitraq. The result files listed in the file names02.txt are processed.

Results

Peptide quantification based on fragment ion spectra

In contrast to other quantification software such as i-Tracker [14] or RelEx [22], Quant is able to cope with just one signal per iTRAQ™ reporter ion. We allow the choice between two methods of integration: trapezoid integration as implemented in existing software tools and the sum of intensities (see below). We introduce a constant minimal peak width b that is applied if only one peak is found in order to allow calculation of a peak area A when trapezoid integration has been chosen. If only one peak is found, the error estimation is as follows: |f_A| = |f_i| ⇒ |e_A| = |e|·b. Otherwise, the absolute error of trapezoid integration of peaks {(x_i, y_i)} belonging to the mass spectrum S = {(x₁, y₁),...,(x_n, y_n)} is |e_A| = |e|·(x_n - x₁). The absolute error when summing up the intensities is |e_S| = n·|e|.

Relative quantification is performed by calculation of peptide ratios. Each pair of ratios is calculated by building quotients R_{i, j} of the true reporter intensities W_i and W_j, based on area or sum, for example $R_{114, 115} = \frac{W_{114}}{W_{115}}$. Consequently, the implied relative error of the quotient is $| f_{R_{i, j}} | = | f_{W_{i}} | + | f_{W_{j}} |$, the absolute error $| e_{R_{i, j}} | = | f_{R_{i, j}} | R_{i, j}$.
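A peptide ratio with its propagated error can be sketched in Python for the sum-of-intensities case. The sketch assumes integer intensities with |e| = 0.5 and, for brevity, omits the purity correction, so the summed intensities stand in for the true reporter intensities W_i; the function names and intensity values are hypothetical.

```python
def reporter_sum(intensities, abs_err=0.5):
    """Sum of intensities S with |e_S| = n * |e|.

    Returns (S, |e_S|, |f_S|). abs_err = 0.5 corresponds to integer intensities.
    """
    s = sum(intensities)
    e_s = len(intensities) * abs_err
    return s, e_s, e_s / s


def peptide_ratio(numerator, denominator):
    """Ratio R_ij with relative error f_i + f_j and absolute error f_R * R."""
    s_i, _, f_i = numerator
    s_j, _, f_j = denominator
    r = s_i / s_j
    f_r = f_i + f_j
    return r, f_r, f_r * r


# Made-up reporter peaks around m/z 114 and m/z 115.
w114 = reporter_sum([120, 340, 140])   # S = 600, |e_S| = 1.5
w115 = reporter_sum([100, 200])        # S = 300, |e_S| = 1.0
r, f_r, e_r = peptide_ratio(w114, w115)
print(r, f_r, e_r)
```

With purity correction, the relative errors of the corrected intensities would replace the relative errors of the sums used here.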

The effects of the chosen integration method are as follows. The quadratic effect of the integration process that comes from the area calculation does not disappear by applying quotients when ratios are calculated. Consider the example of two labels with two peaks each: P_A = {(114.0000, 6.0000), (114.2000, 9.0000)} (label A) and P_B = {(115.0000, 4.0000), (115.2000, 16.0000)} (label B), see figure 3. The summed intensities are 15.0000 and 20.0000, respectively. The trapezoid integrals amount to 1.5000 (A) and 2.0000 (B). The corresponding ratios of B to A are 1.3333 (summed) and 1.3333 (area). If an additional peak had been acquired at, for example, 115.0600 m/z with an intensity of 7.6000, the area of B would not change, but the summed intensity would change to 27.6000, yielding a ratio of 1.8400. This is a difference in relative quantification of about 38%. Therefore, we recommend using the sum of intensities instead of calculating an underlying area.
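The numbers of this example can be reproduced with a few lines of Python. The helper names are hypothetical and the sketch is not taken from sumquantitraq or flquantitraq; it only recomputes the worked example.

```python
def trapezoid_area(peaks):
    """Area under the peaks (x, y) by the trapezoid rule."""
    pts = sorted(peaks)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def intensity_sum(peaks):
    """Sum of the peak intensities."""
    return sum(y for _, y in peaks)


label_a = [(114.0, 6.0), (114.2, 9.0)]
label_b = [(115.0, 4.0), (115.2, 16.0)]
# The additional point lies on the line between the two peaks of label B.
label_b_extra = label_b + [(115.06, 7.6)]

ratio_sum = intensity_sum(label_b) / intensity_sum(label_a)
ratio_area = trapezoid_area(label_b) / trapezoid_area(label_a)
ratio_sum_extra = intensity_sum(label_b_extra) / intensity_sum(label_a)
ratio_area_extra = trapezoid_area(label_b_extra) / trapezoid_area(label_a)
print(ratio_sum, ratio_area, ratio_sum_extra, ratio_area_extra)
```

The area ratio is unchanged by the extra data point because the point lies exactly on the straight line between the two existing peaks of label B, whereas the summed ratio rises from 20/15 to 27.6/15.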

These distorting effects of the integration method are independent of the peak-picking method (centroid, gaussian peak detection etc.) that is applied by the data extraction software processing the raw data of the mass spectrometer. Quant itself uses MS/MS data extracted by other means and therefore is independent of any peak-picking method. Moreover it is independent of the mass spectrometer manufacturer and of the controlling software.

Quant integrates an "experiment error" for protein quantification, i.e. a shift of peptide ratios that indicates the overall protein quantification. Previous studies have shown by plotting the ratio distribution of the proteins that most proteins of a sample are not regulated [23, 24]. Therefore, the distribution of peptide ratios obtained by a quantification experiment should scatter around a value of one. If this is not the case, this shift indicates an error that happened during the sample preparation in the laboratory. Consider the example of mixing two samples 1:1. The protein concentration has to be known. This can be determined by a BCA [25, 26] or Bradford assay [27], but neither is precise, which also holds for other colorimetric protein assays [28]. Thus no exact 1:1 mix can be guaranteed during sample preparation.

Moreover, errors could occur during pipetting, particularly when handling small amounts of protein sample. In order to quantify this shift, the distribution of the peptide ratios must be analyzed in detail.

Firstly, the type of distribution must be determined. We found all peptide ratios to be lognormally distributed, as reported previously for DIGE protein data [20, 21]. The median was chosen as parameter, because the logarithm of the median equals the mean of the log-transformed, normally distributed data. Besides the observation that biological data are mostly lognormally distributed, peptide quantification shows a distribution that rises steeply on the left and is skewed to the right. This can be explained by the fact that in peptide quantification the ratios are greater than zero but very seldom take large values. Usually, they vary around 1. The lognormal distribution can be checked with a normal probability plot, as shown in figure 2.

The definition of the median in conjunction with the multiplicative characteristic of the lognormal distribution implies that the shift in question is multiplicative, too. This factor is the reciprocal of the median. All peptide ratios are multiplied with this value. Consequently, the median of the shifted peptide ratios is then near one.

Multiplying the ratios by the reciprocal of the median m affects the error estimation. The absolute error changes from e_R to $e_{R^{'}} = \frac{1}{m} e_{R}$. The relative error f_m of m implies a relative error of f = f_R + f_m when calculating the quotients.
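The correction for the experiment error can be sketched in Python as follows. The function name normalise, the default f_m = 0 (the median is treated as exact unless a value is supplied) and the example data are assumptions of this sketch and are not taken from startexperror.

```python
import statistics


def normalise(ratios, abs_errors, f_m=0.0):
    """Shift peptide ratios so that their median is near one.

    Returns (m, shifted ratios, shifted absolute errors, relative errors),
    with e_R' = e_R / m and f = f_R + f_m.
    """
    m = statistics.median(ratios)                 # experiment error
    shifted = [r / m for r in ratios]             # multiply by 1 / m
    shifted_err = [e / m for e in abs_errors]
    rel_err = [e / r + f_m for r, e in zip(ratios, abs_errors)]
    return m, shifted, shifted_err, rel_err


# Made-up peptide ratios of a nominal 1:3 mix with made-up absolute errors.
ratios = [2.7, 3.0, 3.3, 3.6, 2.4]
errors = [0.03] * 5
m, shifted, shifted_err, rel_err = normalise(ratios, errors)
print(m, statistics.median(shifted), shifted_err[0])
```

Dividing by the median rescales every ratio by the same factor, so the shape of the distribution in log space is unchanged and only its location moves.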

Multiple labelling of peptides has no effects on the quantification results, because the peptides being compared have identical sequences, and thus are equally labelled.

Protein quantification and visualization

The in-house implementation of a pipeline that integrates Quant accepts peptide identifications from either Mascot™ [29] or Sequest™ [30] and integrates the tool mres2x [17] in order to preserve the linkage between the peptide identification and the corresponding MS/MS spectra.

Usually, only unique peptides are taken into consideration, whereas peptides pointing to more than one protein sequence are only used for improving protein identification as well as for verification and confirmation of identifications (see figure 4).

Visualization of protein quantification is done by providing a boxplot of the peptide ratios, as depicted in figures 5, 6, 7, 8, 9. This includes the first and third quartile of the data, i.e. the 25% and 75% quantile. The median is depicted by a horizontal red line. The whiskers mark the data range and are limited to 150% of the inter-quartile-range (IQR). Outliers are marked in red. The IQR represents a quality measure as it quantifies the scatter of the data independent of the underlying distribution.
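The numbers behind such a boxplot can be sketched in Python. The quartile convention used here (the "inclusive" method of the standard library) is an assumption and may differ slightly from the convention of the MATLAB® plotting routine; the function name box_stats and the peptide ratios are made up.

```python
import statistics


def box_stats(ratios):
    """Quartiles, IQR, whisker ends and outliers of a list of peptide ratios."""
    q1, median, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [r for r in ratios if low_fence <= r <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme data points within 1.5 * IQR.
        "whiskers": (min(inside), max(inside)),
        "outliers": [r for r in ratios if r < low_fence or r > high_fence],
    }


ratios = [0.8, 0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 1.2, 2.5]  # made-up ratios
stats = box_stats(ratios)
print(stats)
```

A ratio flagged as an outlier in this way is not discarded; it marks a spectrum that may deserve a closer look.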

As a measure of quality, the confidence interval [μ-k·σ, μ + k·σ] can be used for the log-transformed data. Additionally, the standard deviation σ of this data can be used as an indicator of the quantification quality in case of a large peptide count per protein. As a numeric tool for measuring the overall quality of the data used, the root-mean-square value (RMS) can be applied to the relative errors of peptide quantification: $R M S (\vec{f}) = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} f_{i}^{2}} = \frac{| \vec{f} |}{\sqrt{n}}$ .

The smaller the RMS value, the lower the level of uncertainty. This method must be preferred to the norm of an error vector, because the dimensions of error vectors are not identical. Moreover, the RMS is appropriate for small data sets.
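The RMS of the relative peptide errors is straightforward to compute; the following Python sketch uses made-up relative errors and a hypothetical function name.

```python
import math


def rms(rel_errors):
    """Root-mean-square of relative errors: sqrt(1/n * sum f_i^2)."""
    return math.sqrt(sum(f * f for f in rel_errors) / len(rel_errors))


few = rms([0.03, 0.04])     # protein with two peptide ratios
many = rms([0.01] * 5)      # protein with five equally precise peptide ratios
print(few, many)
```

Dividing by n is what makes proteins with different peptide counts comparable, which the plain norm of the error vector would not be.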

Experimental results

The standard protein mix supplied with the iTRAQ™ kit was used for testing our software tool Quant. The contents and amount of proteins are known and this protein mix is generally used to establish the iTRAQ™ workflow in laboratories.

Furthermore, we always test new software with a generalized and known sample. By doing this, the functionality and applicability can be easily shown.

The standard protein mix of iTRAQ™ consists of bovine serum albumin (Bos taurus), β-galactosidase (E. coli), α-lactalbumin (Bos taurus), β-lactoglobulin (Bos taurus), lysozyme (Gallus gallus), apo-transferrin (Homo sapiens). The results acquired by following our standardised protein identification procedure that comprises LC-MS/MS and the database search algorithm Mascot™ 2.0 are shown in tables 1 and 2. These data prove that all expected proteins have been identified. However, several homologous proteins are detected, because the complete database SwissProt was used for identification. In the tables 1 and 2, all peptides belonging to more than one protein are marked in red. To visualize the unique and non-unique peptides of a protein, an example is shown in figure 4.

Table 1 Identified proteins of the 1:1 sample mix

Table 2 Identified proteins of the 1:3 sample mix.

The list of identifications was then submitted to quantification by Quant. As only unique peptides can be used for reliable quantification, the software Quant implements a filter that removes all non-unique peptides. In a real non-standard sample this is necessary, as otherwise protein isoforms can neither be distinguished correctly nor quantified in a reliable manner (see tables 3 and 4 as well as tables 5 and 6, respectively).

Table 3 Quantification results of the sample 1:1 mix.

Table 4 Quantification results of the sample 1:1 mix.

Table 5 Quantification results of the sample 1:3 mix.

Table 6 Quantification results of the sample 1:3 mix.

When Quant was run with this filter, quantification results were calculated only for the proteins BGAL_ECOLI, TRFE_HUMAN and ALBU_BOVIN, because only non-unique peptides were detected for the other proteins. These quantification results are presented in tables 4 and 6. In the case of using a known standard protein mix with proteins from different organisms, the non-unique peptides are accessible by deactivation of that filter. This can be avoided in a real sample because the organism is usually known and the database search can be accomplished with a database only containing the proteins of this organism or by using a taxonomy filter as supported by Mascot™. The quantification results obtained with the filter (unique peptides only) and without it (unique and non-unique peptides) are summarized in tables 4, 6 and 3, 5, respectively. In these tables not only the results from our software Quant are listed, but additionally the output of the software ProQuant™, which implements no restriction to only unique peptides. Data obtained from Mascot™ 2.2 are presented, too. The absolute protein quantification ratios yielded by Quant, Mascot™ 2.2, and ProQuant™ are comparable. As shown in tables 3 and 5, including non-unique peptides distorts the quantification results. The experiment error of Quant (bias of ProQuant™) indicates the overall protein mixing ratio. The protein ratio results are normalized by this factor. The visualization of the protein results for BGAL_ECOLI, TRFE_HUMAN and ALBU_BOVIN is shown in figures 5, 6, 7, 8, 9. No peptides were detected that underwent N-terminal cyclization.

Discussion

Comparison with other software tools

In contrast to other software used for peptide quantification that applies trapezoid or other methods of integration for area calculation, we decided to introduce the sum of intensities in MS-based quantification. We have shown that integration implies changes in relative quantification of peptides and proteins, see figure 3. This yields similar changes when absolute quantification is performed. The effect depends on the precision, resolution and calibration of the mass spectrometer, but is not zero. Quant is also able to cope with just one signal per iTRAQ™ reporter ion. For the integration of peak areas, we introduced a minimum peak width in order to provide this feature in that context. The sum of the signal intensities reflects the ion count recorded by the mass spectrometer more precisely than an integrated peak area, as shown in figure 3. Moreover, when summing up intensities the problem of just one reporter signal does not arise. The peaks are filtered by applying a threshold for the peak intensity. This is an option for the user, as the noise in mass spectra depends on the mass spectrometer that is used.

We improved the error estimation of other approaches [14] by adding precise error indication. Instead of taking only the maximum peak intensity as a basis of error estimation that has been formerly proposed [14], we use all peaks belonging to an iTRAQ™ reporter for precise error calculation. Additionally, we propagate the implications of the purity correction on the error estimation. When relative quantification is calculated, we propagate the estimated errors and use them for calculation of a quantification error. This is the maximum possible error and can be used as a quality indicator. If reporter peaks are missing for a label, the relative quantification cannot be performed. Thus, no zero values appear in the peptide ratio lists of the proteins and the log-transformation can be performed in all cases.

Multiple MS/MS spectra belonging to the same peptide sequence are not merged to one quantification value. We regard them as single measurements that are analyzed separately. Thus by using Quant, modified and unmodified peptides can be distinguished. Moreover, modified peptides might appear as outliers of the boxplot and can be analyzed separately. Some examples of this are included in the figures 5, 6, 7, 8, 9. If outliers are detected, the amino acid sequence should be analyzed in detail, and in some cases a new database search should be performed in order to confirm these sequences and to seek out further post-translational modifications, e.g. peptides not labelled with iTRAQ™ because of N-terminal cyclization or acetylation of primary amino groups.

Quant uses MS/MS data extracted by other means and therefore is independent of any peak-picking method. Moreover it is independent of the mass spectrometer controlling software.

In comparison with Peakardt.FindPairs [31] that uses the mean value of peptide ratios for protein quantification, we use the median. This is statistically sound and correct, as peptide ratios are lognormal distributed (see figure 2) and therefore the mean value does not equal the median. Moreover, the median is robust against outliers that would have effects on the mean value. Therefore, there is no need to eliminate or to reject outliers. Moreover, Quant is able to point the user to outliers that should be analyzed further.

As a numeric tool for estimating the quantification quality, we introduce the root-mean-square value (RMS) into protein quantification. This value is calculated from the relative errors of the peptide ratios. In contrast to quality estimation by applying the standard deviation to the non-transformed data, the RMS is independent of the number of data points. Calculating the standard deviation requires sufficient data points to make a precise assumption about the underlying distribution of the data. Other tools, such as Peakardt.FindPairs [31], use the standard deviation σ of the non-transformed data as a quality measure. That approach uses σ and the Chebyshev inequality as the basis for identifying outliers. This is needed for Peakardt.FindPairs because the mean, which is sensitive to outliers, is used as the parameter for protein quantification. Had the median been chosen, this problem would not occur.
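A root-mean-square value of relative errors can be computed as in the following Python sketch. The exact definition used by Quant is given in the Methods, so the function below, its name and its input format of (ratio, absolute error) pairs are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple


def rms_relative_error(peptides: Sequence[Tuple[float, float]]) -> float:
    """RMS of the relative errors of peptide ratios.

    Each entry is a hypothetical (ratio, absolute_error) pair.
    """
    rel = [err / ratio for ratio, err in peptides]
    return math.sqrt(sum(e * e for e in rel) / len(rel))


# Two peptides with relative errors of 3% and 4% (made-up values).
rms_two = rms_relative_error([(2.0, 0.06), (1.0, 0.04)])
# The value is already defined for a single peptide.
rms_one = rms_relative_error([(2.0, 0.06)])
```

Unlike a sample standard deviation, this quantity is defined even for a single peptide ratio.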

Mascot™ 2.2 provides an analysis of iTRAQ™ data that is described online [32] and employs the lognormal distribution. We could show that peptide ratios follow a lognormal distribution; consequently, applying the Shapiro-Wilk test to the log-transformed ratios is the appropriate choice. We suggest not relying on data with fewer than 5 observations when using this test; an upper limit for this procedure does not exist [33]. Mascot™ does not provide an experiment error or a bias within the result display. We could show that Mascot™ 2.2 is based on statistically correct and appropriate assumptions concerning the iTRAQ™ evaluation.

ProteinPilot™ itself uses the same statistical approach as ProQuant™, but restricts peptides to unique ones. According to the information available with the trial version, the software estimates the experiment error (bias correction) with at least 20 protein ratios, although the median is applied. In contrast to Quant, which makes use of the median and the LSE, ProteinPilot™ calculates the protein ratio as a weighted average. Similar to our approach, ProteinPilot™ yields a quality measure, the error factor, that is derived from the 95% confidence interval, which in turn is calculated from the standard deviation in log space.

In contrast to other quantification software tools, which are often restricted to a single protein identification algorithm, such as Mascot™ (Mascot™ 2.2), Sequest™ (RelEx, which has no iTRAQ™ capability), ProID™ (ProQuant™) or Paragon™ (ProteinPilot™), our method Quant is independent of the identification algorithm. Moreover, Quant implements the purity correction including error propagation and precise error estimation. Additionally, we give reasons for an appropriate way of calculating intensities as preprocessing for the peptide ratio analysis.

The data presented in tables 3 and 4 suggest that including non-unique peptides in the calculation of the protein quantification impairs the results. The results generated by Quant are consistent with those produced by ProQuant™ as well as with those of Mascot™ 2.2. Because of the precise error propagation and the adequate visualization, the data obtained by using Quant are reliable.

Conclusion

We have shown that relative quantification can be performed on data generated by tandem MS and iTRAQ™. We presented an analysis method named Quant that is capable of calculating precise data, as shown by its application to the protein standard mix supplied with the iTRAQ™ kit. The protein ratios of this standard have been calculated precisely from the MS/MS spectra of the identification results.

We showed that restriction of the data evaluation to unique peptides is the only way of obtaining reliable quantification results at the protein level. Identification of unique peptides can be easily automated. Moreover, Quant is independent of the underlying protein identification software.
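Restricting the evaluation to unique peptides can be automated along the following lines. The Python sketch assumes that the identification results are available as (peptide sequence, protein identifier) pairs; this input format, the function name and the identifiers in the example are hypothetical and independent of any particular identification software.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def unique_peptides(hits: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each protein to the peptides that were assigned to it alone."""
    proteins_of: Dict[str, Set[str]] = defaultdict(set)
    for peptide, protein in hits:
        proteins_of[peptide].add(protein)

    result: Dict[str, List[str]] = defaultdict(list)
    for peptide, proteins in proteins_of.items():
        if len(proteins) == 1:  # the peptide occurs in exactly one protein
            result[next(iter(proteins))].append(peptide)
    return dict(result)


# Hypothetical identification results; "P1" and "P2" are placeholder ids.
hits = [
    ("LVNELTEFAK", "P1"),
    ("LVNELTEFAK", "P2"),  # shared peptide, excluded from quantification
    ("YLYEIAR", "P1"),
    ("AEFVEVTK", "P2"),
]
quantifiable = unique_peptides(hits)
```

Only the peptides kept in `quantifiable` would then contribute ratios to the median of their protein.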

We have shown that a lognormal distribution fits the data of relative peptide quantification by applying the Shapiro-Wilk-test on the log-transformed data. Outliers can be identified by applying proper means of statistical tools, i.e. distribution analysis, boxplot, median, LSE and RMS. These are helpful as quality measures. We replaced peak area integration by sum of intensities, yielding reliable quantification results.

The methods presented here scale well with the protein and peptide ratios. The quality of the results yielded by Quant does not depend on the peptide or protein ratios, but rather on the quality of the MS/MS experiment, the protein identification and the MS/MS spectra; the scale of the signal intensities is especially important. Therefore, and as supported by the statistically sound system, the dynamic range of Quant is not limited by its inherent methods, in contrast to the instrumental methods. Moreover, Quant provides a precise quality measure of the protein quantification by means of the RMS value.

The presented method is expandable to the 8-plex iTRAQ™ [34] as it is independent of the number of different labels.

Our data analysis method is more robust than other published software tools. Quant demonstrates improvements in peptide and protein quantification using iTRAQ™. Precise quantification data can be obtained when using error propagation and adequate visualization in conjunction with consideration of an experiment error. Quant is shown to generate results that are consistent with those produced by ProQuant™ and Mascot™ 2.2, thus validating these systems.

Availability and requirements

The MATLAB® program scripts are freely available upon request from the authors, as well as via http://www.protein-ms.de and http://sourceforge.net/projects/protms/, under the GNU Lesser General Public License. A MATLAB® installation is required for executing the scripts.

Abbreviations

Å: Ångström
ABI: Applied Biosystems/MDS-Sciex
ACN: acetonitrile
amu: atomic mass unit
BCA: bicinchoninic acid
Da: Dalton
DIGE: difference gel electrophoresis
DNA: deoxyribonucleic acid
DTA: file format for MS/MS spectrum data
EE: experiment error
EF: error factor of ProQuant™
FA: formic acid
ID: inner diameter
Id: database identifier of a protein
IQR: inter-quartile range
iTRAQ™: isobaric tag for relative and absolute quantitation
LSE: least squares estimator
μm: micrometre
mm: millimetre
mM: millimolar
MMTS: methyl methanethiosulfonate
MS: mass spectrometry
MS/MS: tandem mass spectrometry
NHS: N-hydroxy-succinimide
NN: no value available
pVal: p-value
RMS: root-mean-square value
SCX: strong cation exchange

References

1. Boehm AM, Grosse-Coosmann F, Sickmann A: Command Line Tool for Calculating Theoretical MS Spectra for Given Sequences. Bioinformatics. 2004, 20: 2889-2891. 10.1093/bioinformatics/bth328.
2. NCBI: National Center for Biotechnology Information. [http://www.ncbi.nih.gov/]
3. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Mischoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT Protein Knowledgebase and Its Supplement TrEMBL in 2003. Nucleic Acids Research. 2003, 31: 365-370. 10.1093/nar/gkg095.
4. Wilkins MR, Sanchez JC, Gooley AA, Appel RD, Humphery-Smith I, Hochstrasser DF, Williams KL: Progress with Proteome Projects: Why All Proteins Expressed by a Genome Should Be Identified and How to Do It. Biotechnology and Genetic Engineering Reviews. 1996, 19-50.
5. Krijgsveld J, Heck AJR: Quantitative Proteomics by Metabolic Labelling with Stable Isotopes. Drug Discovery Today. 2004, 3: S11-S15. 10.1016/S1741-8372(04)02420-X.
6. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R: Quantitative Analysis of Complex Protein Mixtures Using Isotope-Coded Affinity Tags. Nature Biotechnology. 1999, 17: 994-999. 10.1038/13690.
7. Cavalieri D, Filippo CD: Bioinformatic Methods for Integrating Whole-Genome Expression Results into Cellular Networks. Drug Discovery Today. 2005, 10: 727-734. 10.1016/S1359-6446(05)03433-1.
8. Aebersold R, Mann M: Mass Spectrometry-Based Proteomics. Nature. 2003, 422: 198-207. 10.1038/nature01511.
9. Aebersold R, Goodlett DR: Mass Spectrometry in Proteomics. Chemical Reviews. 2001, 101: 269-295. 10.1021/cr990076h.
10. Pütz S, Reinders J, Reinders Y, Sickmann A: Mass Spectrometry-Based Peptide Quantification: Applications and Limitations. Expert Review of Proteomics. 2005, 2: 381-392. 10.1586/14789450.2.3.381.
11. Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, Purkayastha S, Juhasz P, Martin S, Bartlet-Jones M, He F, Jacobson A, Pappin DJ: Multiplexed Protein Quantitation in Saccharomyces cerevisiae Using Amine-reactive Isobaric Tagging Reagents. Molecular & Cellular Proteomics. 2004, 3: 1154-1169. 10.1074/mcp.M400129-MCP200.
12. Ong SE, Mann M: Mass Spectrometry-Based Proteomics Turns Quantitative. Nature Chemical Biology. 2005, 1: 252-262. 10.1038/nchembio736.
13. Boehm AM, Galvin RP, Sickmann A: Extractor for ESI Quadrupole TOF Tandem MS Data Enabled for High Throughput Batch Processing. BMC Bioinformatics. 2004, 5:
14. Shadforth IP, Dunkley TPJ, Lilley KS, Bessant C: i-Tracker: For quantitative proteomics using iTRAQ(TM). BMC Genomics. 2005, 6:
15. Wu KP, Lin WT, Hung WN, Yian YH, Chen YR, Chen YJ, Sung TY, Hsu WL: MassTRAQ: A Fully Automated Tool for iTRAQ-labeled Protein Quantification. Stanford, USA. Edited by: Martin DC. 2005, IEEE Computer Society, 157-158.
16. Applied Biosystems: Applied Biosystems iTRAQ™ Reagents Amine-Modifying Labeling Reagents for Multiplexed Relative and Absolute Protein Quantitation - Protocol. [http://docs.appliedbiosystems.com/pebiodocs/04350831.pdf]
17. Grosse-Coosmann F, Boehm AM, Sickmann A: Efficient Analysis and Extraction of MS/MS Result Data from Mascot™ Result Files. BMC Bioinformatics. 2005, 6:
18. Zahedi RP, Sickmann A, Boehm AM, Winkler C, Zufall N, Schönfisch B, Guiard B, Pfanner N, Meisinger C: Proteomic Analysis of the Yeast Mitochondrial Outer Membrane Reveals Accumulation of a Subclass of Preproteins. Molecular Biology of the Cell. 2006, 17: 1436-1450. 10.1091/mbc.E05-08-0740.
19. Boehm AM, Sickmann A: A Comprehensive Dictionary of Protein Accession Codes for Complete Protein Accession Identifier Alias Resolving. Proteomics. 2006, 6: 4223-4226. 10.1002/pmic.200600018.
20. Jung K, Gannoun A, Sitek B, Apostolov O, Schramm A, Meyer HE, Stühler K, Urfer W: Statistical Evaluation Of Methods For The Analysis Of Dynamic Protein Expression Data From A Tumor Study. REVSTAT – Statistical Journal. 2006, 4: 67-80.
21. Jung K, Gannoun A, Sitek B, Meyer HE, Stühler K, Urfer W: Analysis of Dynamic Protein Expression Data. REVSTAT Statistical Journal. 2005, 3: 99-111.
22. MacCoss MJ, Wu CC, Liu H, Sadygov R, Yates JR: A Correlation Algorithm for the Automated Quantitative Analysis of Shotgun Proteomics Data. Analytical Chemistry. 2003, 75: 6912-6921. 10.1021/ac034790h.
23. Patwardhan AJ, Strittmatter EF, Camp DG, Smith RD, Pallavicini MG: Quantitative Proteome Analysis of Breast Cancer Cell Lines Using 18O-Labeling and an Accurate Mass and Time Tag Strategy. Proteomics. 2006, 6: 2903-2915. 10.1002/pmic.200500582.
24. Kolkman A, Daran-Lapujade P, Fullaondo A, Olsthoorn MMA, Pronk JT, Slijper M, Heck AJR: Proteome Analysis of Yeast Response to Various Nutrient Limitations. Molecular Systems Biology. 2006, 2:
25. Smith PK: Measurement of Protein Using Bicinchoninic Acid. 1987, US Patent 4839295, Pierce Chemical Company
26. Smith PK, Krohn RI, Hermanson GT, Mallia AK, Gartner FH, Provenzano MD, Fujimoto EK, Goeke NM, Olson BJ, Klenk DC: Measurement of Protein Using Bicinchoninic Acid. Analytical Biochemistry. 1985, 150 (1): 76-85. 10.1016/0003-2697(85)90442-7.
27. Bradford MM: A Rapid and Sensitive Method for the Quantitation of Microgram Quantities of Protein Utilizing the Principle of Protein-Dye Binding. Analytical Biochemistry. 1976, 72: 248-254. 10.1016/0003-2697(76)90527-3.
28. Sapan CV, Lundblad RL, Price NC: Colorimetric Protein Assay Techniques. Biotechnol Appl Biochem. 1999, 29: 99-108.
29. Perkins DN, Pappin DJC, Creasy DM, Cottrell JS: Probability-Based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data. Electrophoresis. 1999, 20: 3551-3567. 10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.
30. Eng JK, McCormack AL, Yates JR: An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. Journal of the American Society for Mass Spectrometry. 1994, 5: 976-989. 10.1016/1044-0305(94)80016-2.
31. Reidegeld KA, Franke G, Hebeler R, Wiese S, Oeljeklaus S, Lakhal B, Meyer HE, Warscheid B: Peakardt.FindPairs - A Universal Software for Protein Quantitation via Stable Isotope-Labeling through Mass Spectrometry. München. 2005, S30-
32. Matrix Science Ltd.: Quantitation: Statistical procedures. [http://www.matrixscience.com/help/quant_statistics_help.html]
33. Royston P: Approximating the Shapiro-Wilk W-Test for Non-Normality. Statistics and Computing. 1992, 2: 117-119. 10.1007/BF01891203.
34. Applied Biosystems: Multiplex Protein Quantitation using iTRAQ™ Reagents - 8plex - Publication 114PB15-01. [http://docs.appliedbiosystems.com/search-dodnum.taf?dodnum=116320]
35. Applied Biosystems: Using Pro Group Reports. [http://docs.appliedbiosystems.com/pebiodocs/00113913.pdf]

Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (FZT 82).

Author information

Authors and Affiliations

Rudolf Virchow Center, DFG Research Center for Experimental Biomedicine, University of Wurzburg, (Protein Mass Spectrometry and Functional Proteomics), Wurzburg, D-97078, Germany
Andreas M Boehm, Stephanie Pütz & Albert Sickmann
Institute of Mathematics, University of Wuerzburg, Am Hubland, D-97074, Wuerzburg, Germany
Daniela Altenhöfer & Michael Falk

Corresponding authors

Correspondence to Andreas M Boehm, Albert Sickmann or Michael Falk.

Additional information

Authors' contributions

AB initiated the project and implemented the program in the laboratory. DA implemented the MATLAB^® scripts. MF and DA introduced precise error estimations and statistics to the project. AS and SP conducted the experiments and contributed with ideas and discussions. AB, SP and DA contributed equally to the manuscript. All authors have read and approved the final manuscript.

Electronic supplementary material

12859_2007_1586_MOESM1_ESM.zip

Additional File 1: Archive containing the MATLAB^® scripts. This file contains the MATLAB^® scripts of Quant that can be executed with MATLAB^®. (ZIP 17 KB)

12859_2007_1586_MOESM2_ESM.pdf

Additional File 2: Documentation of the MATLAB^® scripts. This file contains the full documentation of the Quant scripts and explains the usage of the MATLAB^® scripts. (PDF 66 KB)

12859_2007_1586_MOESM3_ESM.zip

Additional File 3: Archive containing the example data. This file contains the example data that can be processed with the Quant scripts. (ZIP 416 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Boehm, A.M., Pütz, S., Altenhöfer, D. et al. Precise protein quantification based on peptide quantification using iTRAQ™. BMC Bioinformatics 8, 214 (2007). https://doi.org/10.1186/1471-2105-8-214

Received: 05 February 2007
Accepted: 21 June 2007
Published: 21 June 2007
DOI: https://doi.org/10.1186/1471-2105-8-214
