Department of Computer Science, Saarland University, Saarbrücken, Germany

Center for Bioinformatics, Saarland University, Saarbrücken, Germany

Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg

Division for Systematic Proteome Research, Institute for Experimental Medicine, Kiel, Germany

Institut für Informatik, Johannes-Gutenberg-Universität, Mainz, Germany

Abstract

Background

The robust identification of isotope patterns originating from peptides being analyzed through mass spectrometry (MS) is often significantly hampered by noise artifacts and the interference of overlapping patterns arising e.g. from post-translational modifications. As the classification of the recorded data points into either ‘noise’ or ‘signal’ lies at the very root of essentially every proteomic application, the quality of the automated processing of mass spectra can significantly influence the way the data might be interpreted within a given biological context.

Results

We propose non-negative least squares/non-negative least absolute deviation regression to fit a raw spectrum by templates imitating isotope patterns. In a carefully designed validation scheme, we show that the method exhibits excellent performance in pattern picking. It is demonstrated that the method is able to disentangle complicated overlaps of patterns.

Conclusions

We find that regularization is not necessary to prevent overfitting and that thresholding is an effective and user-friendly way to perform feature selection. The proposed method avoids problems inherent in regularization-based approaches, comes with a set of well-interpretable parameters whose default configuration is shown to generalize well without the need for fine-tuning, and is applicable to spectra of different platforms. The

Background

Mass spectrometry (MS), often in conjunction with high performance liquid chromatography (HPLC), is the de facto standard analytical tool to derive important biological knowledge about the protein content of whole cells, organelles, or biomedical samples like tumour or blood plasma. Within a typical experimental setup, purified proteins of the sample under study are digested by an enzyme. Before entering the mass spectrometer, peptides are separated chromatographically according to their physico-chemical properties in order to avoid a massive overlapping of peptide signals within a single scan. Nevertheless, due to the sheer number of peptides present in a sample, interfering patterns still occur frequently, not least because of post-translational modifications such as the deamidation of asparagine or glutamine residues. In order to obtain an unambiguous assignment of the signals, and in particular their isotope patterns, which is a prerequisite for a proper identification and quantification, every data point in ℓ_{1}-regularized non-negative least squares. Without non-negativity constraints, this procedure is known as the lasso ℓ_{1}-regularization is the method of choice. In the present paper, we argue for a deviation from that paradigm mainly in view of the following two aspects. First, a major benefit of our fitting+thresholding approach is that parameter choice is more user-friendly, since the threshold can be interpreted in terms of a signal-to-noise ratio. This is unlike the regularization parameter of the lasso, which cannot in general be related directly to the signal. In the presence of heterogeneous noise and model misspecifications, the ‘right amount’ of regularization is notoriously difficult to choose. Second, there is a substantial body of work showing that non-negativity constraints alone may suffice to recover a sparse target. Non-negative least squares + thresholding is analyzed in ℓ_{1}-approach with respect to sparse recovery.
See Section “Sparse recovery with non-negativity constraints: non-negative least squares + thresholding vs. the non-negative lasso” for a detailed discussion.

Methods

A spectrum is understood as a sequence of pairs {(x_{i}, y_{i})}, i = 1,…,n, where x_{i} = m_{i}/z_{i} is a mass (m_{i}, measured in Dalton) per charge (z_{i}), and y_{i} is the intensity, i.e. the abundance of a particular mass (modulo charge state), observed at x_{i}.

Template model

The

where **Φ** is a non-negative matrix of templates and **β**

where the _{c,j,k} are functions representing a single peak within an isotope pattern, depending on a location _{c,j} and a parameter vector **θ**

In (2), the nonnegative weights _{
c,j,k
}
equal the height of the isotopic peak _{
c,j,k
} are calculated from _{
c,j
}
as _{
c,j,0}=max_{
k
}
_{
c,j,k
}) is taken as characteristic location of the template instead of using the finally reported monoisotopic position: we set _{
c,j,0}=_{
c,j
} so that the remaining _{
c,j,k
}, _{
c,j
} in both directions along the _{
x
}
_{
c,j
}(**
β
**

**Template model.** Illustration of the template construction (charge state

Parameter estimation

The parameters _{
c,j
}=(_{
c,j
}
_{
c,j
}
_{
c,j
})^{⊤}
of the peaks (3) are unknown in practice. Following a central paradigm of our framework, which is to relieve the user of laborious fine-tuning of parameters, we have developed a systematic procedure that automatically provides estimates of these parameters and is considerably more efficient and flexible than a grid search. For instance, the parameters may additionally depend on the

In a first step, we apply a simple peak detection algorithm to the spectrum to identify disjoint regions

yielding an estimate _{l} of **θ** as a linear combination of known functions

for which a linear trend i.e. _{
l
}(_{
l,1} + _{
l,2}

**Parameter estimation.** Illustration of peak parameter estimation. The figure displays a well-resolved peak in the region _{i},_{i})}
that enter a nonlinear least squares problem of the form (4). Under the assumption of an EMG model, the resulting fit is indicated by a solid line.
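As a rough illustration of the kind of fit shown in the figure, the following sketch estimates the parameters of a single exponentially modified Gaussian (EMG) peak by nonlinear least squares. The parameterization through `scipy.stats.exponnorm`, the names `emg_peak` and `fit_emg`, and the choice of starting values are assumptions made for this example and are not taken from the implementation described in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import exponnorm


def emg_peak(x, area, mu, sigma, tau):
    """EMG peak: Gaussian (mu, sigma) convolved with an exponential of mean tau."""
    # scipy's exponnorm uses the shape parameter K = tau / sigma.
    return area * exponnorm.pdf(x, tau / sigma, loc=mu, scale=sigma)


def fit_emg(x, y):
    """Nonlinear least squares fit of one EMG peak to the points (x_i, y_i)."""
    # Crude starting values (an assumption): width from the half-maximum region.
    above = np.flatnonzero(y >= 0.5 * y.max())
    width = max(x[above[-1]] - x[above[0]], x[1] - x[0]) / 2.355
    area = np.sum(y) * np.mean(np.diff(x))
    start = [area, x[np.argmax(y)], width, width]
    lower = [0.0, x[0], 1e-6, 1e-6]
    upper = [np.inf, x[-1], np.inf, np.inf]
    params, _ = curve_fit(emg_peak, x, y, p0=start, bounds=(lower, upper))
    return params  # (area, mu, sigma, tau)
```

In practice, one such fit would be carried out for each well-resolved peak region, and the resulting estimates would then be related to the m/z position as described in the text.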

We refrain from using least squares regression to determine the parameters in (5) due to its sensitivity to possible outliers, which arise from poorly resolved, wiggly or overlapping isotope patterns, which may affect the quality of the estimates _{
l,m
}},

Template fitting

The computation of the design matrix **
Φ
**
requires a set of _{1},_{
n
}]. We instead restrict ourselves to a suitable subset of the set _{
i
}
falling into a sliding window of fixed width around a specific position. For _{1},_{
n
}], we define the local noise level based on sliding window width
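The sliding-window idea can be sketched as follows. The passage above does not preserve the exact definition of the local noise level, so the use of the median as window statistic is an assumption of this example, as are the names `local_noise_level`, `candidate_positions`, `half_width` and `factor`.

```python
import numpy as np


def local_noise_level(x, y, half_width):
    """LNL at every x_i: median intensity within [x_i - h, x_i + h] (assumed statistic)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # x is sorted in increasing order, so every window is a contiguous index range.
    lo = np.searchsorted(x, x - half_width, side="left")
    hi = np.searchsorted(x, x + half_width, side="right")
    return np.array([np.median(y[a:b]) for a, b in zip(lo, hi)])


def candidate_positions(x, y, half_width, factor):
    """Indices i whose intensity y_i exceeds factor * LNL(x_i)."""
    lnl = local_noise_level(x, y, half_width)
    return np.flatnonzero(np.asarray(y, dtype=float) > factor * lnl)
```

Restricting template placement to these candidate positions keeps the number of columns of the design matrix manageable.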

Given the LNL, we place templates at position _{
i
}
(one for each charge state) if the corresponding _{
i
}
exceeds LNL(_{
i
})
by a factor **
Φ
**
according to Eqs. (1) and (2). In the fitting step, we compute a non-negative least squares (

The optimization problem (7) is a quadratic (
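To make the fitting step concrete, here is a minimal sketch of template fitting by non-negative least squares followed by thresholding. It is a simplification under stated assumptions: peaks are Gaussian rather than EMG, the isotope heights in `HEIGHTS` are placeholders rather than values derived from an averagine-type model, `ISOTOPE_SPACING` is an approximate constant, and the generic solver `scipy.optimize.nnls` replaces the log-barrier method described in the Appendix.

```python
import numpy as np
from scipy.optimize import nnls

ISOTOPE_SPACING = 1.003  # approximate spacing of isotopic peaks in Da (assumption)
HEIGHTS = np.array([1.0, 0.6, 0.25, 0.08])  # placeholder isotope heights (assumption)


def template(x, position, charge, sigma):
    """One template: Gaussian isotopic peaks spaced by ISOTOPE_SPACING / charge."""
    centers = position + np.arange(len(HEIGHTS)) * ISOTOPE_SPACING / charge
    peaks = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)
    return peaks @ HEIGHTS


def fit_and_threshold(x, y, positions, charges, sigma, threshold):
    """Non-negative least squares fit, then hard thresholding of the coefficients."""
    labels = [(c, m) for c in charges for m in positions]
    Phi = np.column_stack([template(x, m, c, sigma) for c, m in labels])
    beta, _ = nnls(Phi, y)
    return [(c, m, b) for (c, m), b in zip(labels, beta) if b > threshold]
```

For a noise-free mixture of a singly charged pattern at 1000.0 and a doubly charged pattern at 1000.5, the fit assigns essentially all weight to these two templates, although their peaks overlap.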

Comparison with pepex

In prior work **
Φ
** is not constructed from the convolution of isotope distributions and peak shapes as described in Section “Template model”. Instead, peak detection is applied first to reduce the raw intensity data to peak clusters, a step that is usually referred to as centroiding. At the second stage, called de-isotoping, peak clusters are fitted by a design matrix containing the isotope distributions themselves, not convolved versions. While this approach is computationally more attractive and avoids estimation of peak shape parameters (cf. Section “Parameter estimation”), the division into centroiding and de-isotoping may lead to poor performance for low-resolution and noisy data, or in the presence of overlapping patterns. In these cases, peak detection is unreliable. In our template-based approach, there is no separation of centroiding and de-isotoping. It performs much better in the aforementioned cases, since it operates directly on the data and is hence less affected if single peaks of a pattern are difficult to detect. This reasoning is supported by our evaluation in Section “Results and discussion” as well as that in **
Φ
** directly represent isotope distributions instead of isotopic patterns.

Postprocessing and thresholding

While indeed a considerable fraction of the entries of _{
l
}
_{
u
}
of maximum signal, non-negative least squares fitting attributes weights

**Peak splitting.** Lower panel: Non-negative least squares fit of the sampled signal with and without postprocessing. Upper panel: Solution path of the non-negative lasso for the same data.

To a large extent, ‘peak splitting’ can be corrected by means of the following merging procedure, which we regard as postprocessing of the fitting step (7) and which we apply prior to thresholding. Given an estimate

1. Separately for each

2. With the notation of Eq. (2), for each _{
c
}, we solve the following optimization problem.

with the aim to find a location _{
c,g
} approximating the fit of the most intense peaks

3. One ends up with sets

The additional benefit of solving (8) in step two as compared to the selection of the template with the largest coefficient within each group as proposed in _{
c,g
} and _{
c,g
}.

All candidate positions

where _{
i
},

The idea underlying this procedure is that in noise regions, the fit to the data will be poor, and consequently, the size of the residuals is expected to be large relative to the signal, hence leading to a low goodness-of-fit statistic. The truncation at 0.5 limits the influence of this correction. A final list is generated by checking whether the signal-to-noise ratios (9) exceed a ‘significance threshold’

Finding a set of default parameters

Apart from the signal-to-noise threshold

Sparse recovery with non-negativity constraints: non-negative least squares + thresholding vs. the non-negative lasso

We believe that our preference for the first alternative is a major methodological contribution that has the potential to impact related problems where non-negativity constraints come into play. In the present section, we provide, at a high level, a series of arguments rooted in the statistics and signal processing literature that clarify our contribution and support our preference.

Linear models and usual paradigms in statistics

The fact that we favour non-negative least squares + thresholding may seem implausible since it questions or partially even contradicts paradigms about high-dimensional statistical inference. Consider the linear model

which corresponds to model (1), where ‘≈’ is used instead of ‘=’ to account for stochastic noise or model misspecifications. Linear models of the form (10) have been and continue to be objects of central interest in statistical modelling.

• Classical work in statistics shows that under mild conditions if the number of sample

• Since many contemporary datasets, like the MS datasets of the present paper, are characterized by a large **
β
**

The second bullet provides quite some justification for

The power of non-negativity constraints

• It turns out that the non-negativity constraint **
β
**≥

• There are several recent papers **
β
**

One should bear in mind that the non-negativity constraints are essential for our approach. Thresholding the unconstrained ordinary least squares estimator

Shortcomings of ℓ_{1}-regularization in theory

In **
β
**

Shortcomings of ℓ_{1}-regularization in practice

The study in

Moreover, when using ℓ_{1}-regularization, data fitting and model selection are coupled. While this is often regarded as an advantage, since model selection is performed automatically, we think that it is preferable to have a clear separation between data fitting and model selection, which is a feature of our approach. Prior to thresholding, the output of our fitting approach gives rise to a ranking which we obtain without the necessity to specify any parameter. Selection is based entirely on a single fit, simply by letting the threshold vary. By contrast, if one wants to reduce the number of features selected by the lasso, one resets the regularization parameter and solves a new optimization problem. Note that it is in general not possible to compute the entire solution path of the lasso **
Φ
** is in the ten thousands so that the active set algorithm of

Results and discussion

For the assessment of the pattern picking performance, in total eight spectra generated by two different ionization methods, matrix assisted laser desorption/ionization (MALDI) and electrospray ionization (ESI), respectively, form the basis of the evaluation. While MALDI has been coupled to a time-of-flight (TOF) mass analyzer, ESI MS spectra have been recorded on both a linear ion trap (LTQ) and an Orbitrap mass analyzer. In addition, a series of spectra were prepared with the aim of investigating in detail the method’s performance in the presence of overlapping peptides.

Datasets

For MALDI mass spectra (Additional file

To demonstrate explicitly the method’s ability to separate strongly overlapping patterns even in the case of badly resolved signals, 22 additional spectra have been generated in positive ion mode on a Bruker Daltonics HCT Ultra Ion Trap MS with an electrospray ion source. Three synthetic peptides (cf. Section “Unmixing of overlaps” for details) with sequences corresponding to tryptic peptides from bovine serum albumin (BSA) were used as analytes. In each measurement two out of three peptides were mixed in different ratios to get overlapping peptide signals, also with different charge states. Two different concentrations (500 fmol/

**MALDI-TOF spectra.**

**ESI spectra.**

Validation strategy

Validation of pattern picking is notoriously difficult, because a gold standard which is satisfactory from both statistical and biological points of view is missing. In this context, a major problem one has to account for is that spectra frequently contain patterns whose shape is not distinguishable from those of peptides, but which are in fact various artifacts resulting e.g. from impurities during sample preparation and measurement. These artifacts do not constitute biologically relevant information and are, in this sense, ‘false positives’. An important instance are signals derived from the matrix (or from matrix-clusters) frequently observed in MALDI MS. The pattern of these signals is similar to that of peptides; nevertheless, due to their molecular composition, which differs significantly from that of an average peptide, the exact masses can be used to exclude these signals from the data analysis. On the other hand, from a statistical perspective which judges a method according to how well it is able to detect specific patterns in a given dataset, a qualification as ‘true positive’ is justified. With the aim to unify these aspects, we have worked out a dual validation scheme. In order to reduce the number of artifacts, all automatically generated lists of candidates for peptide masses as well as the lists of a human expert (see below) are postprocessed by a peptide mass filter ^{a} are used for subsequent evaluation.

Comparison with manual annotation

The first part investigates how well a method is able to support a human expert who annotates the spectra manually. More specifically, the automatically generated lists are matched to the manual annotation such that an entry of the list (potential peptide mass) is declared ‘true positive’ whenever there is a corresponding mass in the manual annotation deviating by no more than _{
m/z
} between neighboring data points in _{
m/z
}, we can derive the following tolerance values: ^{b} and

As the performance of our method as well as that of all competing methods depends on a threshold-like parameter governing, crudely speaking, the trade-off between precision and recall, we explore the performance for a range of reasonable parameter values instead of fixing an (arbitrary) value, which we believe would be of little use. The results are then visualized as a ROC curve, in which each point in the (Recall, Precision)-plane corresponds to a specific choice of the parameter. Formally, we introduce binary variables {_{
i
}(_{
i
}(
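The matching of an automatically generated list against the manual annotation, and the resulting point in the (Recall, Precision)-plane, can be sketched as follows. The one-to-one matching rule (each annotated mass may confirm at most one picked mass) and the expression of the tolerance in ppm are assumptions of this example; `precision_recall` is a hypothetical helper, not part of the software described here.

```python
def precision_recall(picked, annotated, tol_ppm):
    """Match picked masses to annotated masses within a relative tolerance (ppm)."""
    unmatched = sorted(annotated)
    true_pos = 0
    for mass in sorted(picked):
        # Annotated masses that are still unmatched and lie within the tolerance.
        hits = [a for a in unmatched if abs(mass - a) <= tol_ppm * 1e-6 * a]
        if hits:
            # Consume the closest one so that it cannot be matched twice.
            unmatched.remove(min(hits, key=lambda a: abs(mass - a)))
            true_pos += 1
    precision = true_pos / len(picked) if picked else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    return precision, recall
```

Evaluating this function on the lists obtained for a grid of threshold values yields one curve per method.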

Database query

The second part evaluates the lists in terms of a query to the Mascot search engine

**Evaluation and results.**

Competing methods

We compare our method in its two variants depending on the choice of the fitting criterion (cf. Eq. (7)), labelled _{1}
(_{2}
(

Lasso

The ‘lasso’ method in this paper serves as a surrogate for NITPICK. Since the ‘lasso’ is embedded into our framework while implementing a methodology that closely resembles NITPICK, we use the ‘lasso’ for the sake of convenience, to avoid an involved parameter optimization for NITPICK. Our lasso implementation benefits from the improved merging procedure of Section “Postprocessing and thresholding”. To accommodate a heterogeneous noise level,

where **
W
** is a diagonal matrix with entries

Pepex

As discussed in Section “Template fitting”, pepex performs centroiding and de-isotoping separately. De-isotoping is based on non-negative least squares. Since pepex is limited to detecting patterns of charge state one, its performance is only assessed for MALDI-TOF spectra. Accordingly, when comparing the output of pepex with the manual annotation, the few patterns of charge state two are excluded. The parameters

Isotope wavelet

As opposed to our method, this approach is not able to handle overlaps. On the other hand, it typically shows strong performance in noisy and low intensity regions or on datasets with extremely low concentrations

Vendor

The parameter setting for the ABI MALDI-TOF/TOF MS software was as follows: Local Noise Width (

Results

Manual annotation vs. database query

When inspecting Figures _{1} and _{2}) yield a significant improvement, which does not become apparent from the database query. This is because only a fraction of the manual annotation is actually confirmed by the database query. The part which is not matched likely consists of artifacts due to contamination or chemical noise as well as of specific modifications not captured by the database query. In light of this, our dual validation scheme indeed makes sense.

**Results for pattern picking, MALDI-TOF.** Pattern picking performance for the MALDI-TOF spectra as described in Section “Datasets”. The points in the (Recall,Precision)-plane correspond to different choices of a method-specific threshold(-like) parameter.

**Results for pattern picking, ESI.** Pattern picking performance for the ESI spectra as described in Section “Datasets”. The points in the (Recall,Precision)-plane correspond to different choices of a method-specific threshold(-like) parameter.

Corresponding Mascot results for the data shown in Figures

| **MALDI Myo 500 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | 211.0 | **0.85** | 0.94 | 96.8 | **0.96** | 0.04 |
| ℓ_{2} | 211.0 | **0.85** | 0.94 | 49.6 | **0.96** | 0.04 |
| lasso | 207.0 | **0.85** | **1.00** | 142.0 | 0.91 | **0.37** |
| pepex | **223.0** | **0.85** | **1.00** | 142.0 | 0.90 | 0.17 |
| vendor | **223.0** | **0.85** | 0.94 | **174.0** | 0.90 | 0.29 |
| wavelet | 207.0 | **0.85** | **1.00** | 156.0 | 0.90 | 0.14 |

| **MALDI Lys 500 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | 167.0 | **0.81** | 0.57 | 133.0 | **0.83** | **0.37** |
| ℓ_{2} | 168.0 | 0.80 | 0.64 | **144.0** | **0.83** | 0.34 |
| lasso | 151.0 | 0.64 | 0.77 | 112.0 | **0.83** | **0.37** |
| pepex | **172.0** | 0.80 | 0.63 | 135.0 | **0.83** | 0.25 |
| vendor | 146.0 | 0.64 | **0.75** | 91.4 | **0.83** | 0.20 |
| wavelet | 127.0 | 0.58 | **0.75** | 113.0 | 0.81 | 0.20 |

| **MALDI Myo 10 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | **211.0** | **0.85** | 0.94 | 82.2 | **0.95** | 0.04 |
| ℓ_{2} | 207.0 | 0.74 | **1.00** | 109.0 | 0.90 | 0.14 |
| lasso | 195.0 | 0.77 | 0.87 | **146.0** | 0.85 | 0.46 |
| pepex | 97.8 | 0.80 | 0.22 | 97.8 | 0.80 | 0.22 |
| vendor | 123.0 | 0.62 | 0.62 | 123.0 | 0.62 | **0.62** |
| wavelet | 131.0 | **0.85** | 0.13 | 131.0 | 0.85 | 0.13 |

| **MALDI Lys 10 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | **89.0** | 0.35 | **1.00** | **73.7** | 0.54 | **0.23** |
| ℓ_{2} | **89.0** | 0.35 | **1.00** | 35.4 | 0.72 | 0.09 |
| lasso | 81.9 | **0.46** | 0.70 | 46.0 | 0.74 | 0.10 |
| pepex | 47.1 | 0.17 | **1.00** | 31.2 | 0.53 | 0.12 |
| vendor | 62.7 | 0.23 | **1.00** | 43.2 | 0.34 | 0.16 |
| wavelet | 55.4 | 0.23 | 0.45 | 43.8 | **0.82** | 0.10 |

| **Orbi Lys 1000 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | 149.0 | 0.70 | 0.78 | 138.0 | 0.80 | **0.53** |
| ℓ_{2} | 139.0 | **0.80** | 0.50 | **139.0** | 0.80 | 0.50 |
| lasso | **159.0** | 0.63 | **0.87** | 120.0 | **0.81** | 0.29 |
| wavelet | 105.0 | 0.69 | 0.44 | 95.1 | 0.80 | 0.23 |

| **IT Lys 1000 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | 78.7 | 0.63 | 0.28 | 70.9 | 0.74 | 0.17 |
| ℓ_{2} | 82.1 | 0.72 | 0.36 | 35.4 | 0.85 | 0.13 |
| lasso | 103.0 | **0.84** | 0.33 | **76.8** | **0.99** | **0.21** |
| wavelet | **107.0** | 0.79 | **0.63** | 69.8 | **0.99** | 0.11 |

| **Orbi Lys 250 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | 107.0 | 0.63 | 0.50 | 100.0 | 0.80 | 0.31 |
| ℓ_{2} | 103.0 | 0.63 | 0.52 | 66.9 | **0.81** | 0.14 |
| lasso | **108.0** | 0.63 | **0.77** | **107.0** | 0.80 | **0.27** |
| wavelet | 80.6 | **0.70** | 0.22 | 80.6 | 0.70 | 0.22 |

| **IT Lys 250 fmol** | **score** | **cvrg** | **hits** | **score** | **cvrg** | **hits** |
|---|---|---|---|---|---|---|
| ℓ_{1} | 59.4 | 0.46 | 0.16 | 59.4 | 0.46 | 0.16 |
| ℓ_{2} | 37.0 | 0.59 | 0.14 | 37.0 | 0.59 | 0.14 |
| lasso | **66.3** | **0.84** | 0.20 | **66.3** | **0.84** | **0.20** |
| wavelet | 56.3 | 0.59 | **0.36** | 21.3 | 0.75 | 0.12 |

Comparison

Figure _{1} and _{2}) throughout all MALDI-TOF spectra under consideration. For the myoglobin spectra, high sequence coverages are attained that clearly stand above those of competing methods. For the spectra at 10 fmol, only the performance of the lasso is competitive with that of our methods in terms of the Mascot score; all other competitors, including the vendor software which has been tailored to process these spectra, are significantly weaker. In particular, the strikingly high proportion of ‘hits’ (≥94%) indicates that even at moderate concentration levels, our methods still distinguish well between signal and noise. This observation is strongly supported by the ROC curves in Figure

For MALDI-TOF spectra at high concentration levels, pepex achieves the best scores and is competitive with respect to sequence coverage. However, the performance of pepex degrades dramatically at lower concentration levels, as is unambiguously shown by both parts of the evaluation. In particular, the database scores are the worst among all methods compared. This provides some support for our reasoning at the end of Section “Template fitting”.

For the ESI spectra, our methods in total fall a bit short of the lasso (particularly for the ion trap spectra), but perform convincingly as well, thereby demonstrating that they can deal well with multiple charge states. This is an important finding, since the presence of multiple charges makes the sparse recovery problem as formulated in model (1) much more challenging, because the number of parameters to be estimated as well as the correlations across templates are increased. In spite of these difficulties, Figure

Additional remarks

• In Figure

• The fact that some of the ROCs start in the lower left corner results from outputs containing only false positives.

Unmixing of overlaps

Motivation

One of the main advantages of our method over more simplistic pattern picking methods is the ability to disentangle isotope patterns of overlapping peptide signals, whose presence may lead to a significantly more challenging pattern picking problem as e.g. discussed in

Results

The peptides analyzed here in order to assess the performance of our approach were synthesized by means of Fmoc-solid phase peptide synthesis; sequences corresponding to tryptic peptides from bovine serum albumin (BSA) with the sequences listed in Table

| **sequence** | **sequence residue no. in BSA** | **monoisotopic mass (protonated) / charge** |
|---|---|---|
| GACLLPK | 198-204 | 351.20437 / +2 |
| CCTKPESER | 460-468 | 351.48816 / +3 |
| VLASSAR | 212-218 | 352.20850 / +2 |

In each measurement two out of the three listed peptides were mixed together in different ratios (Additional file

**Overlapping peptide signals.**

**Unmixing of overlap.** Graphical representation of selected overlap problems as tabulated in Table

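Results of the analyses of the series of spectra containing two overlapping target peptides. The first column contains the

| **peptides** | | **351.2(2)/352.2(2)** | | | **351.4(3)/352.2(2)** | | | | **351.2(2)/351.4(3)** | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **proportion** | fmol | **1:1** | **1:5** | **5:1** | **1:1** | **1:5** | **5:1** | **10:1** | **1:1** | **1:5** | **5:1** | **1:10** |
| all | 500 | x | x | − | − | x | − | x | − | − | − | x |
| all | 1000 | x | x | x | x | x | x | x | x | x | − | − |
| default | 500 | − | x | − | − | x | − | − | − | − | − | − |
| default | 1000 | x | x | − | x | − | x | x | − | x | − | − |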

Conclusion

We have proposed a template matching approach for feature extraction in proteomic mass spectra. The main methodological innovation is a framework for sparse recovery in which sparsity is not promoted explicitly by a regularization term, as is usually done and was done in previous work. We fully exploit the strength of non-negativity constraints, which permits us to circumvent the delicate choice of a ‘proper’ amount of regularization, an everlasting problem in statistics, and to work with thresholding instead. The latter is not only computationally attractive, because one does not have to repeatedly solve the same optimization problem for different choices of the regularization parameter, but also increases user-friendliness, since the threshold is directly related to the signal-to-noise ratio, the quantity domain experts are interested in. The replacement of a regularization parameter by a threshold is a cornerstone in our conceptual design, guided by the principle of relieving the user of laborious fine-tuning of parameters. We believe that a small set of well-interpretable parameters with suitable defaults additionally improves robustness and reproducibility of results. In this context, we would like to emphasize again that apart from the threshold, the user does not have to specify any parameters before running our software.

In a comprehensive experimental study involving instruments of varying resolution and spectra of varying concentration levels, where we comparatively assess the performance of our approach on the basis of an elaborate dual validation scheme, it is demonstrated that the performance for pattern picking is excellent for MALDI-TOF spectra and stands out for its specificity, selecting signal and only little noise. A major strength of the method is its ability to unmix overlapping peptide signals, as shown for a series of ESI spectra. In total, we demonstrate that our approach is broadly applicable to a variety of spectra. While our approach is guided by a concrete application in proteomics, the framework is general enough to be of use for related deconvolution problems emerging in other fields; only the templates have to be adjusted according to the specific application.

While in this paper, we have focused on single spectra, the approach can be extended to process whole LC-MS runs, as it has already been implemented in our

Concerning future directions of research, a question we have not yet answered in a satisfactory way is the choice of the fitting criterion. While both criteria (least squares and least absolute deviation) employed in this paper perform well, their implicit assumption of additive noise might be questionable

Endnotes

^{a}Monoisotopic peptide mass centers are modelled by 1.000485·m_{n} + 0.029, where m_{n} denotes the nominal mass.

^{b}For the MALDI-TOF lysozyme datasets, an extended search tolerance of 100 ppm was applied due to experimental miscalibration of the MS.
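The linear model of endnote a lends itself to a simple peptide mass filter, sketched below. Only the coefficients 1.000485 and 0.029 are taken from the endnote; the admissible deviation `max_deviation` and the function names are hypothetical, since the width of the filter is not stated here.

```python
def distance_to_peptide_center(mass):
    """Distance (Da) between `mass` and the nearest modelled peptide mass center."""
    # Nominal mass whose modelled center 1.000485 * nominal + 0.029 is closest.
    nominal = round((mass - 0.029) / 1.000485)
    return abs(mass - (1.000485 * nominal + 0.029))


def peptide_mass_filter(masses, max_deviation):
    """Keep the masses lying within `max_deviation` Da of a modelled center."""
    return [m for m in masses if distance_to_peptide_center(m) <= max_deviation]
```

For example, a candidate at 1000.514 Da coincides with the modelled center for nominal mass 1000 and is kept, whereas a candidate at 1000.0 Da lies roughly 0.49 Da from the nearest center and is discarded for any reasonably narrow filter.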

Appendix

Fitting with non-negativity constraints

In the following, we provide the details concerning optimization problem (7). In view of the special structure of **
Φ
**, (7) is computationally tractable even if **
Φ
**
and the Gram matrix **
Φ
**
^{⊤}
**
Φ
**, which is crucial in the computation, can conveniently be handled by using software for sparse matrices. For

Non-negative least squares

Consider the quadratic program

In order to solve (12), we use the so-called log-barrier method, which amounts to solving a sequence of unconstrained nonlinear convex problems in which the constraints _{
j
}≥0), _{
j
})/_{
j
}<0
and zero otherwise. Beginning with a moderately sized starting value for

using Newton’s method. The gradient and Hessian with respect to **β**, respectively, are given by

The Newton descent direction **d**

Solution of linear systems of this structure constitutes the main computational effort to be made. Fast solutions are obtained by using
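The log-barrier scheme outlined above can be sketched in a few lines. The version below is a dense toy implementation under several assumptions: the objective is scaled as 0.5·‖y − Φβ‖² − (1/γ)·Σ log β_j, the barrier parameter γ is multiplied by a fixed factor after each centering step, a backtracking line search is used, and the Newton system is solved with `numpy.linalg.solve` instead of a sparse or banded solver. None of these choices is claimed to match the authors' implementation.

```python
import numpy as np


def nnls_log_barrier(Phi, y, gamma=1.0, growth=10.0, gamma_max=1e8, max_newton=100):
    """Minimise 0.5 * ||y - Phi b||^2 subject to b >= 0 by a log-barrier method."""
    gram, corr = Phi.T @ Phi, Phi.T @ y
    beta = np.ones(Phi.shape[1])  # strictly feasible starting point

    def objective(b, g):
        return 0.5 * np.sum((y - Phi @ b) ** 2) - np.sum(np.log(b)) / g

    while gamma <= gamma_max:
        for _ in range(max_newton):
            # Gradient and Hessian of the barrier objective at the current iterate.
            grad = gram @ beta - corr - 1.0 / (gamma * beta)
            hess = gram + np.diag(1.0 / (gamma * beta ** 2))
            step = np.linalg.solve(hess, -grad)
            decrement = -(grad @ step)
            if decrement < 1e-12:
                break
            # Backtracking: stay strictly positive and decrease the objective.
            t = 1.0
            while t > 1e-12 and (
                np.any(beta + t * step <= 0)
                or objective(beta + t * step, gamma)
                > objective(beta, gamma) - 0.25 * t * decrement
            ):
                t *= 0.5
            if t <= 1e-12:
                break
            beta = beta + t * step
        gamma *= growth
    return beta
```

The returned coefficients are strictly positive by construction; entries that are zero in the exact solution come out as very small positive numbers, which the subsequent thresholding step removes anyway.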

Complexity analysis of non-negative least squares

We here provide the order of magnitude of floating point operations (flops) required per update (i.e. per Newton step) for the specific non-negative least squares problems considered in this paper. In our implementation, we exploit that the templates contained in the matrix **
Φ
**
are highly localized. As a result, after a suitable column permutation, the matrix **
Φ
**
^{⊤}
**
Φ
** is roughly a band matrix with bandwidth ^{2})
flops (e.g.

Non-negative least absolute deviation

Consider the optimization problem

Problem (15) can be recast as the following linear program.
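One standard way of writing the non-negative least absolute deviation problem as a linear program introduces auxiliary variables r_i that bound the absolute residuals: minimise Σ r_i subject to −r ≤ y − Φβ ≤ r and β ≥ 0. The sketch below solves this formulation with the general-purpose solver `scipy.optimize.linprog`; it is meant only to illustrate the reformulation and is an assumption of this example, not the log-barrier implementation discussed in this appendix.

```python
import numpy as np
from scipy.optimize import linprog


def nn_lad(Phi, y):
    """Non-negative least absolute deviation fit via a linear program."""
    n, p = Phi.shape
    # Variables: (beta_1..beta_p, r_1..r_n); minimise the sum of the r_i.
    cost = np.concatenate([np.zeros(p), np.ones(n)])
    eye = np.eye(n)
    # Phi b - r <= y and -Phi b - r <= -y together encode |y - Phi b| <= r.
    A_ub = np.block([[Phi, -eye], [-Phi, -eye]])
    b_ub = np.concatenate([y, -y])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]
```

For the problem sizes considered in the paper, a dense constraint matrix as used here would be wasteful; the sketch is intended for small illustrative examples only.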

For its solution, we use the log-barrier method sketched in the previous paragraph. After incorporating log-barrier terms for all constraints, the objectives of the unconstrained convex problems are of the form

where we have used the notational shortcuts

The gradients w.r.t. **
r
**
and

Introducing **
R
**=diag(

The linear system for the Newton descent directions reads

Note that **
d
**

Plugging this into the second block of the linear system, one obtains

which is equivalent to

In order to solve the linear system, we proceed as for non-negative least squares. The computational cost of this operation is roughly the same, since the sparse structure of **Φ**^{⊤}**Φ** can still be exploited. For non-negative least squares, re-computation of the Hessian **Φ**^{⊤}([**Ξ**

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

MS and MH devised the methodology as presented in Section “Methods”. MS implemented the Bioconductor package, with contributions by RH and MH. The comparative data analysis was performed by RH, MS, MH and AH; RH and AH performed the MASCOT queries. AT developed the experimental design and provided an interpretation of the MS data. TJ and BG conducted the MS experiments and produced the results of the vendor software. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank Markus Martin for setting up the Bruker Daltonics HCT Ultra Ion Trap MS and Bart van den Berg for measuring the LC-MS datasets used in the vignette of the

Funding

Clusters of Excellence ‘Multimodal Computing and Interaction’ (to M.S., R.H. and B.G.), ‘Inflammation@Interfaces’ (to A.T. and T.J.) within the Excellence Initiative of the German Federal Government; DFG (grants BIZ4:1-4 to R.H. and A.H.).