Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Methodology article

Baseline Correction for NMR Spectroscopic Metabolomics Data Analysis

Yuanxin Xi1 and David M Rocke12*

Author Affiliations

1 Department of Applied Science, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA

2 Division of Biostatistics, School of Medicine, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA

For all author emails, please log on.

BMC Bioinformatics 2008, 9:324  doi:10.1186/1471-2105-9-324

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/9/324


Received:21 January 2008
Accepted:29 July 2008
Published:29 July 2008

© 2008 Xi and Rocke; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

We propose a statistically principled baseline correction method, derived from a parametric smoothing model. It uses a score function to describe the key features of baseline distortion and constructs an optimal baseline curve to maximize it. The parameters are determined automatically by using LOWESS (locally weighted scatterplot smoothing) regression to estimate the noise variance.

Results

We tested this method on 1D NMR spectra with different forms of baseline distortions, and demonstrated that it is effective for both regular 1D NMR spectra and metabolomics spectra with over-crowded peaks.

Conclusion

Compared with the automatic baseline correction function in XWINNMR 3.5, the penalized smoothing method provides more accurate baseline correction for high-signal density metabolomics spectra.

Background

Baseline distortions in 1D NMR spectra are mainly caused by the corruption of the first few data points in FID (free induction decay). These corrupted data points add low frequency modulations in the Fourier-transformed spectrum, and thus formed the distorted baseline. Correction of these distortions is a necessary step in NMR spectra data processing because they offset the intensity values and result in inaccuracy in peak assignment and quantification. These errors could be critical in the study of metabolomics, which involves many small but statistically significant peaks that are sensitive to baseline distortions. Incorrect quantification of these peaks may result in failures in detection of important metabolites or identifying potential biomarkers.

Existing automatic baseline distortion correction methods fall into two categories: time domain correction and frequency domain correction [1-9]. Time domain correction methods reconstruct the corrupted data points in FID to reduce the low frequency modulation [6-8]. Frequency domain correction methods construct baseline curves in the spectra directly, and subtract these baseline curves to remove the distortion [1-5]. These methods have been implemented in commercial software and hand-written programs for NMR data processing. Usually a combination of both time and frequency domain methods are applied to achieve better correction. In comparison, time domain methods provide general correction for FID and frequency domain methods provide more detailed correction on the spectrum itself. For processing a specific kind of NMR spectra, such as metabolomics spectra, frequency domain methods are easier to tailor for different needs.

In this article, we present a novel frequency domain baseline correction method for processing 1D spectra for metabolomics profiling. 1D proton NMR spectroscopy has been widely applied in metabolomics profiling because it can simultaneously measure many metabolites that lie above the detection limit. These metabolomics spectra usually have many crowded peaks such that noise regions are broken into smaller pieces and are more difficult to identify accurately. The term noise regions refer to the regions in the spectrum that only contain noise. In some part of the spectra, severely overlapping peaks form long gaps between noise regions and are a cause of numerical sensitivity in baseline construction. Traditional frequency domain methods detect noise regions and construct the baseline curve by interpolating among identified noise regions. These methods rely heavily on robust noise region identification, and therefore may not achieve optimal baseline correction in metabolomics spectra. For example, Golotvin et al [5] proposed identifying noise points by comparing the intensity range of a small neighborhood with the standard deviation of noise regions, which is estimated by dividing the spectrum into 32 sections and taking the minimum value of standard deviations of these sections. We observed that this method occasionally identifies the low signal points in metabolomics spectra as noise because they may overlap with each other and have reduced standard deviation, and as a result these signal points will be offset to zero after baseline correction. Noise standard deviation estimation is also theoretically biased to be smaller than the true value in a statistical view, and leads to additional inaccuracy in detection of noise data points.

As an alternative to the existing noise detection and interpolation approaches, we developed a new baseline correction method based on a penalized parametric smoothing model. This method fits a curve following the bottom envelope of the spectrum and doesn't need explicit identification of the noise data points. The primary motivation is that we model the baseline as a smooth curve of arbitrary form that goes through the noise region instead of linked pieces of selected noise points. We describe key features of this model by a score function and construct the optimal baseline curve corresponding to the function maximum. In addition, we present a more accurate estimation of noise variance by LOWESS (locally weighted scatterplot smoothing) regression and use it to determine the model parameters.

Methods

Suppose a 1D NMR spectrum is represented by the set of ordered pairs {xi, yi}, where xi is the abscissa in ppm units and yi is the ordinate, representing the height of the spectrum as a Fourier-transformed RF decay curve. The fundamental model behind our method is that the spectrum can be represented as

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M1">View MathML</a>

(1)

where bi is the baseline, μi is the true signal, and ηi and εi are random errors normally distributed with mean 0 and variance 1, generally autocorrelated, a type of model that fits a wide variety of measurement data (Rocke and Lorenzato 1995 [10]; Rocke and Durbin 2001 [11]). An estimated baseline should be 1) smooth, but not necessarily flat; and 2) run through the middle of the data in segments where there is no signal. Based the on these features, we construct the following score function:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M2">View MathML</a>

(2)

And <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M3">View MathML</a> is the Heaviside step function.

b = {bi} is a set of points that represents a certain baseline. The optimal baseline curve b0 should maximize the score function F(b).

b0 = arg max F(b)(3)

F(b) has three components. The first term <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M4">View MathML</a> is the sum of all baseline points. We want to maximize it subject to the smoothness penalty <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M5">View MathML</a> and the negativity penalty <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M6">View MathML</a>. The smoothness penalty is a discrete form of integral of squares of second-order derivatives, which is small for linear segments and large for small curvature radius. The negativity penalty is designed to be nonzero only when the baseline point is above the data point, by using the Heaviside step function g(bi - yi). It counteracts the uptrend of the first term and force the baseline to run through the middle of the data. By maximizing this function the baseline is pushed up to the spectrum but not exceeding the zero-signal level, and forced to be as smooth as possible to link peak regions.

The negative penalty parameter B is determined by the condition that the baseline should run through the center of the noise region. Take the simplest case of a spectrum with only normally distributed noise with variance σ2 and mean 0. The baseline should also be a horizontal line at y = b, so the summation term in the score function <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M7">View MathML</a>. The smoothing term <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M5">View MathML</a> = 0 because the baseline should be horizontal and has no curvature. The expectation value of negativity term could be calculated based on the probability density function (PDF) of the noise P(y).

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M8">View MathML</a>

(4)

Where <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M9">View MathML</a> is the PDF of normal distribution with variance σ2 and mean 0. The boundary of the integral (-∞, b) in equation (4) is determined by explicitly plug in the step function <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M10">View MathML</a>.

Hence the expected value of the score function <F(b) > becomes:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M11">View MathML</a>

(5)

The estimated baseline should be at the zero intensity level, which means the score function reaches its maximum at b = 0.

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M12">View MathML</a>

This gives us the theoretical value of B

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M13">View MathML</a>

(6)

So the negativity penalty parameter B is determined by the noise standard deviation σ. We define the constant <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M14">View MathML</a> so that B is in the form of <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M15">View MathML</a>. By dividing by σ, the negativity penalty <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M6">View MathML</a> will have the same order of the intensity as in the first linear summation term <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M4">View MathML</a> in the score function, which guarantees that the maximal point of the score function remain invariant if the spectrum is multiplied by a scalar, so that the corresponding baseline curve will not be affected. For the same reason, the smoothing penalty parameter, denoted by A in the score function (2), should also take the form of <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M16">View MathML</a> to guarantee invariance in spectrum scaling, where A* is a constant and C is a coefficient related with the resolution of the spectra and will be discussed later.

For example, if we multiply the spectrum {xi, yi} by a scalar k, we get a new spectrum {xi, kyi}. The noise standard deviation of the scaled spectrum, denoted by σ', is also k times the original noise standard deviation σ: σ' = .

The score function for the original spectrum is:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M17">View MathML</a>

(7)

And the score function for the scaled spectrum is:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M18">View MathML</a>

(8)

The estimated baseline for scaled spectrum should also be also k times of the original spectrum baseline, thus b' = kb, or equivalently, bi' = kbi. Substitute into equation (8), we have

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M19">View MathML</a>

So F(b') = F(kb) = kF(b), which means the score function is in an invariant form for scaling. Multiplying the spectrum by a constant does not affect finding the optimal baseline by maximizing this score function.

In addition, the smoothness penalty should be robust to the abscissa resolution. For example, if we take half the data points (with odd indices) of the original spectrum so that the chemical shift interval is doubled, the baseline curve should not be affected. This suggests the coefficient C in <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M16">View MathML</a> is related with the abscissa resolution dx.

Recall the smoothing term is the sum of squared second order derivatives of {bi}, rewrite it in the generic form of discrete representation of second order derivative:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M20">View MathML</a>

Therefore C has an inverse quadruple relation with the resolution dx of the abscissa. For a given spectrum, dx is inverse proportional to the number of data points n: <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M21','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M21">View MathML</a>, which means C could take the value of n4.

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M22">View MathML</a>

(9)

where A* is a constant that is independent of the spectrum. By comparing different spectra, we choose an empirically reasonable value of A* to be A* = 5 × 10-9.

Based on the above analysis, the score function takes the follow form

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M23">View MathML</a>

(10)

where A* = 5 × 10-9, B* ≈ 1.25, σ is the standard deviation of noise and n is the total number of data points. The baseline curve is insensitive to small changes of A and B, unless the orders of magnitude are changed.

The estimation of the noise standard deviation σ is based on the model in equation (1), The variance of a certain part of spectrum is derived as the following

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M24','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M24">View MathML</a>

(11)

It indicates that the signal variance Var(y) increases with the mean value of signal intensity μ. We can estimate the noise variance <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M25">View MathML</a> by fitting equation (11) on the signal variances and mean intensities sampled from the spectrum. We divide the spectrum into small regions and compute the variance and mean intensity within each regions. Figure 1 plots the variances versus mean values with region size of 32 data points, corresponding to 0.012 ppm in chemical shifts. We use LOWESS (locally weighted scatterplot smoothing) regression to fit equation (11). The red line in Figure 1 represents the fitted regression line. It has a quadratic form as expressed in equation (11).

thumbnailFigure 1. LOWESS smoothing for variances estimation. Variances versus mean intensities sampled bins of 1D NMR metabolomics spectra. The bin size was set to 32 data points, corresponding to 0.012 ppm in chemical shift. The fitted LOWESS regression curve was plotted in red, and the flat region of the LOWESS curve represents the estimated noise variance.

Since <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M26">View MathML</a> when μ ≈ 0, the noise variance <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M25">View MathML</a> is approximately equal to the signal variance Var(y) for small mean values. We take the predicted value of Var(y) at zero mean intensity in the LOWESS regression to be our estimate of <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M25">View MathML</a>, and the standard deviation of the noise σ is the square root of the noise variance

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M27">View MathML</a>

After determining the parameters, we maximize the function F(b) to find the baseline b0, according to equation (3). Mathematically, we solve <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M28">View MathML</a> to find the maximum of F(b). This partial derivative equation expands as a linear system with the solution to be b0. The numerical implementation of solving this linear system is attached in the appendix.

Results and discussion

Based on this penalized smoothing model, we test the baseline correction method on simple 1D NMR spectra and complex metabolomics spectra.

Figure 2 demonstrates this method corrects the baseline distortion of a simple 1D 1H NMR reference spectrum of DSS (2,2-Dimethyl-2-silapentane-5-sulfonic acid) with n = 65536 data points. The estimated noise standard deviation is σ = 8335.9. According to previous discussion of parameter determination, the parameters are set to <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M29">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M30">View MathML</a>. Figure 2A shows the original spectrum with apparent baseline distortions. This distorted baseline is detected by the penalized smoothing method in Figure 2B. In Figure 2C, this baseline curve is subtracted from the spectrum and the distortion is corrected. The optimal baseline found by our baseline model fits well with the distortion curve. The small peak at 2ppm in the spectrum is correctly presented after baseline subtraction.

thumbnailFigure 2. Baseline correction by penalized parametric smoothing method. (A) Original 1D proton NMR spectrum of DSS reference with distorted baseline. (B) Detected baseline curve by penalized parametric smoothing method. (C) Corrected spectrum after baseline subtraction.

We test this method in more complicated metabolomics spectra collected from tissue samples of red abalone. The data are from a study of environmental stresses on the development of a bacterial infection among red abalones (Haliotis rufescens) [12,13]. The dataset include 65 1D proton NMR spectra with 65536 data points in each spectrum. In our test the penalized smoothing method correctly detected and removed the distorted baseline for all 65 spectra. Figure 3 shows the baseline correction result on one example of testing spectra using the penalized smoothing method. In Figure 3A, the peaks of metabolites aggregate together and form continuous peak regions. Lack of noise points in these regions generates big gaps in baseline construction. As demonstrated in Figure 3B and Figure 3C, the baseline distortion is correctly detected and removed. In these gaps, the baseline curve is constructed following the smoothing constraints in the penalized smoothing model. The penalized smoothing method does not require explicit identification of noise region identification and therefore avoids constructing the baseline by interpolation, which is sensitive to the identification of noise points, especially in the region with high peak densities where noise points is difficult to detect accurately.

thumbnailFigure 3. Baseline correction for metabolomics spectrum by penalized parametric smoothing method. (A) Original spectrum with distorted baseline. (B) Detected baseline curve by penalized parametric smoothing method. (C) Corrected spectrum after baseline subtraction.

We compared the penalized smoothing method with commercial software in Figure 4 and Figure 5. We applied the penalized smoothing method and the automatic baseline correction function in XWIN-NMR software (version 3.5) carried by the Bruker AVANCE 600 NMR facility to the same spectrum with baseline distortion, and plot the baseline corrected spectrum in Figure 4. The red lines represent the ideal horizontal baseline at zero intensity level. Both methods are capable of removing large distortions of the baseline and setting the corrected baseline to near the zero intensity level. The corrected baseline by the penalized smoothing method (Figure 4A) fits well with the ideal horizontal baseline in red. In Figure 4B, the corrected baseline by XWINNMR has apparently larger deviation from the zero intensity level.

thumbnailFigure 4. Comparison of penalized smoothing method and XWIN-NMR 3.5. (A) Spectrum corrected by penalized parametric smoothing method. (B) Spectrum corrected by automatic baseline correction function in XWIN-NMR 3.5. The red lines represent the zero intensity level.

thumbnailFigure 5. Comparison of baseline distortion detection by penalized smoothing methods and XWIN-NMR 3.5. This is an expanded region of the spectrum in Figure 4 before baseline correction. Baseline curves are detected by penalized parametric method (red) and XWIN 3.5 automatic baseline correction function (black).

In Figure 5 we show a region from 6.5 ppm to 9.5 ppm of original spectrum to illustrate the difference of baselines detected by penalized smoothing methods (in red) and XWINNMR 3.5 (in black). The penalized smoothing baseline is closer to the center of noise regions and therefore detects the offsets of signals more accurately. This leads to a more accurate identification and quantification of signals in the corrected spectrum, especially for the small peaks at 7.08 ppm, 8.35 ppm and 8.94 ppm in Figure 5. For all 65 testing spectra, the penalized smoothing method shows significant improvement in presenting low intensity peaks in 47 spectra, corresponding to a percentage of 69%. To achieve comparable baseline correction quality in these spectra, XWINNMR 3.5 needs manually marking of the noise points or regions for baseline interpolation, which is time consuming for batch processing and is dependent on the experience of the software users. The penalized smoothing method is fully automatic and user independent.

Conclusion

We propose an alternative baseline correction method based on a penalized smoothing model. This method constructs the baseline by maximizing a score function (eq 2) that evaluates how well the baseline fits the spectrum. The model parameters A and B in score function are determined by the noise variance of the spectrum, σ2, which can be automatically estimated by LOWESS regression. This method does not require explicit identification of noise data points for baseline interpolation, or assumption of fixed forms of baseline curves, i.e. polynomials etc. Instead, it uses a general smoothing term to fit flexible forms of baseline distortion.

We applied this method to 1D NMR spectra with baseline distortion, and demonstrate it is effective for both regular 1D NMR spectra and metabolomics spectra with over-crowded peaks. The numerical implementation is fast and stable on common personal computers. This method provides an alternative to the existing noise detection and interpolation approaches in baseline correction, especially for spectra with many crowded peaks, such as in NMR metabolomics profiling, where noise points are more difficult to identify accurately. Compared with the widely used XWINNMR software, the method provided more accurate baseline correction on 47 out of 65 of our testing metabolomics spectra. With a few modifications, this penalized smoothing baseline correction approach is also applicable to 2D NMR spectra. The numerical implementation and optimization for 2D baseline correction could be one topic in further study.

Appendix

We solve <a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M31">View MathML</a> to find the maximum of score function F(b), which leads to linear system of order n. (n is the number of data points in b).

Db = m

D is the Hessian matrix of F(b), i.e.<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M32">View MathML</a>, and m is a vector. In detail, the entries of D and m have the following entries:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M33">View MathML</a>

The boundary points may lack the neighbor points in the above formulas, and need to be treated separately. The non-existent terms in these equations are omitted:

<a onClick="popup('http://www.biomedcentral.com/1471-2105/9/324/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2105/9/324/mathml/M34">View MathML</a>

We use an iterated procedure to solve Db = m because the entries of D and m are dependent on b, and need to be updated in each iteration until it converges. This procedure is described in the following steps:

1) Set the initial value of baseline points b to be zeros.

2) determine D and m based on current values of b

3) solve Db = m for b

4) stop if the relative change of b is smaller than a threshold, otherwise go to step 2)

The Hessian matrix D is highly sparse, with all the non-zero elements in a symmetric band along the main diagonal. This kind of matrix structure can be solved very efficiently. In addition, the matrix is positive semi-definite, which guarantees the convergence of the iteration, so it is not sensitive to the initial value of b. We use MATLAB to implement the above algorithm and the code is available upon request.

Acknowledgements

The research reported in this paper was supported by grants from National Institute of Environmental Health Sciences, National Institutes of Health (P42-ES04699), from the Ecotoxicology Lead Campus Program (ELCP) of Toxic Substances Research and Teaching Program of the University of California., from the National Cancer Institute (P30-CA093373-04), from the UC Davis Health System, from the National Human Genome Research Institute (R01-HG003352), from the Air Force Office of Scientific Research (FA9550-07-1-0146) and from the Department of Energy (DE-FG02-07ER64341).

References

  1. Bartels C, Güntert P, Wüthrich K: A new automatic baseline correction method for multidimensional NMR spectra with strong solvent signals.

    J Magn Reson Ser A 1995, 117:330-333. Publisher Full Text OpenURL

  2. Brown D: Improved Baseline Recognition and Modeling of FT NMR Spectra.

    J Magn Reson Ser A 1995, 114:268-270. Publisher Full Text OpenURL

  3. Saffrich R, Beneicke W, Neidig K, Kalbitzer H: Baseline correction in n-dimensional NMR spectra by sectionally linear interpolation.

    J Magn Reson Ser B 1993, 101:304-308. Publisher Full Text OpenURL

  4. Rouh A, Delsuc M, Bertrand G, Lallemand J: Baseline correction of FT NMR Spectra: An Approach in terms of classification.

    J Magn Reson Ser A 1993, 102:357-359. Publisher Full Text OpenURL

  5. Golotvin S, Williams A: Improved Baseline Recognition and Modeling of FT NMR Spectra.

    J Magn Reson 2000, 146:122-125. PubMed Abstract | Publisher Full Text OpenURL

  6. Halamek J, Vondra V, Kasal M: The elimination of baseline distortions induced by audio filters.

    J Magn Reson Ser A 1994, 110:194-197. Publisher Full Text OpenURL

  7. Marion D, Bax A: Baseline distortion in real-fourier-transform NMR-spectra.

    J Magn Reson 1988, 79:352-356. OpenURL

  8. Heuer A, Haeberlen U: A new method for suppressing baseline distortions in FT NMR.

    J Magn Reson 1989, 85:79-94. OpenURL

  9. Lunga G, Pogni R, Basosi R: A Simple Method for Baseline Correction in EPR Spectroscopy.

    J Magn Reson Ser A 1994, 108:65-70. Publisher Full Text OpenURL

  10. Rocke D, Lorenzato S: A Two-Component Model for Measurement Error in Analytical Chemistry.

    Technometrics 1995, 37:176-184. Publisher Full Text OpenURL

  11. Durbin B, Hardin J, Hawkins D, Rocke D: A Variance-Stabilizing Transformation for Gene Expression Microarray Data.

    Bioinformatics 2002, 18 Suppl 1:S105-110. PubMed Abstract | Publisher Full Text OpenURL

  12. Rosenblum E, Tjeerdema R, Viant M: Effects of Temperature on Host-Pathogen-Drug Interactions in Red Abalone, Haliotis rufescens, Determined by 1H NMR Metabolomics.

    Env Sci Technol 2006, 40:7077-7084. Publisher Full Text OpenURL

  13. Rosenblum E, Viant M, Braid B, Moore J, Friedman C, Tjeerdema R: Characterizing the Metabolic Actions of Natural Stresses in the California Red Abalone, Haliotis rufescens using 1H NMR Metabolomics.

    Metabolomics 2005, 1:199-209. Publisher Full Text OpenURL