School of Computing, Engineering and Mathematics, University of Western Sydney, Sydney, New South Wales, Australia

Division of Mathematics, Informatics and Statistics, CSIRO, Brisbane, Queensland, Australia

Adelaide Proteomics Centre, School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, South Australia, Australia

Abstract

Background

Imaging Mass Spectrometry (IMS) provides a means to measure the spatial distribution of biochemical features on the surface of a sectioned tissue sample. IMS datasets are typically huge, which makes visualisation and subsequent analysis challenging. Principal component analysis (PCA) is one popular data reduction technique that has been used for such data; we propose another, the minimum noise fraction (MNF) transform, which is popular in remote sensing.

Findings

The MNF transform is able to extract spatially coherent information from IMS data. It is implemented in an R package, which is available together with example data from

Conclusions

In our example, the MNF transform was able to find additional images of interest. The extracted information forms a useful basis for subsequent analyses.

Background

Imaging Mass Spectrometry (IMS) provides a means to measure the spatial distribution of drug metabolite, lipid, peptide and protein features on the surface of a sectioned tissue sample (see

The data can be thought of in two ways, firstly a set of mass spectra acquired at a spatial array of spots, and secondly as a


**Image (A) of a coronal murine midbrain section, and ion intensity maps at m/z 6719 (B), 9980 (C) and 14136 (D).** The m/z values are approximate, from flexImaging V2.1. Scale bar is 1 mm.

Current methods typically use spectral features, not spatial information, to guide analysis; hence the predominance of PCA and hierarchical clustering (HC) type approaches. We propose the use of the minimum noise fraction (MNF) transform

Findings

Principal Components Analysis

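Principal Components Analysis (PCA) treats the IMS data as a collection of spectra. Therefore, in PCA, the spatial structure of the spots is not relevant, and the data can be represented as a matrix with one row per spot and one column per mass charge ratio. Let $Z = (Z_1, \ldots, Z_K)^T$ be a typical mass spectrum, where $Z_k$ is the ion intensity at the $k$-th of $K$ mass charge ratios. PCA seeks linear combinations of intensities over the mass charge ratios that maximise variance. That is, the first principal component is defined by a vector $a = (a_1, \ldots, a_K)^T$ with $a^T a = 1$ that maximises $\operatorname{Var}(a^T Z)$.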

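If $\Sigma_Z$ is the covariance matrix of the mass spectra, that is, the $(k_1, k_2)$ entry is the covariance of the ion intensity measured at the $k_1$-th m/z ratio and the ion intensity measured at the $k_2$-th m/z ratio, then the first principal component maximises $a^T \Sigma_Z a$ subject to $a^T a = 1$. In practice $\Sigma_Z$ is unknown, so it is estimated using the sample covariance matrix $\hat{\Sigma}_Z$ given by

$$\hat{\Sigma}_Z = \frac{1}{n-1} \sum_{i=1}^{n} (Z_i - \bar{Z})(Z_i - \bar{Z})^T,$$

where $Z_1, \ldots, Z_n$ are the mass spectra observed at the $n$ spots and $\bar{Z} = \frac{1}{n} \sum_{i=1}^{n} Z_i$ is the mean spectrum.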
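A minimal sketch of this estimate in Python, assuming the spectra are stored as a list of equal-length lists of floats (one list per spot) and using the conventional $n-1$ divisor. The function name `sample_covariance` is hypothetical and is not part of the authors' R package. Any constant divisor rescales the eigenvalues but leaves the principal component directions unchanged.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(spectra: Matrix) -> Matrix:
    """Sample covariance of n spectra (rows) over K mass charge ratios (columns).

    Uses the n - 1 divisor; the divisor only rescales eigenvalues and does
    not change the principal component directions.
    """
    n = len(spectra)
    if n < 2:
        raise ValueError("need at least two spectra")
    k = len(spectra[0])
    # Mean spectrum: average intensity at each mass charge ratio.
    mean = [sum(row[j] for row in spectra) / n for j in range(k)]
    # Centre every spectrum by subtracting the mean spectrum.
    centred = [[row[j] - mean[j] for j in range(k)] for row in spectra]
    # (j1, j2) entry: covariance between intensities at ratios j1 and j2.
    return [
        [sum(c[j1] * c[j2] for c in centred) / (n - 1) for j2 in range(k)]
        for j1 in range(k)
    ]
```

For example, `sample_covariance([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])` returns `[[4.0, 8.0], [8.0, 16.0]]`. For real IMS data this would only be applied after the pre-filtering of mass charge ratios, since the matrix has $K^2$ entries.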

Note that the mass spectra are unlikely to form a set of independent observations, since spectra from spatially close spots are likely to be correlated.

The MNF transform

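PCA makes no use of the spatial structure of the observed mass spectra. The Minimum Noise Fraction transform uses a simple model that allows the spatial structure to influence the analysis. Here we modify the notation to emphasise the spatial aspect; let

$$Z(x) = M(x) + N(x),$$

where $Z(x)$ is the mass spectrum observed at spot $x$, $M(x)$ is the signal and $N(x)$ is the noise.

This is to be interpreted as “the mass spectrum at spot $x$ is the sum of a signal and a noise component”. The signal and noise are assumed to be uncorrelated, with covariance matrices $\Sigma_M$ and $\Sigma_N$ respectively.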

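The MNF transform seeks linear combinations of intensities over the mass charge ratios that maximise the signal-to-noise ratio (SNR). That is, the first MNF band is defined by a vector $a = (a_1, \ldots, a_K)^T$ that maximises

$$\frac{a^T \Sigma_M a}{a^T \Sigma_N a}.$$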

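In PCA we need an estimate of $\Sigma_Z$, whereas for the MNF transform we need estimates of $\Sigma_M$ and $\Sigma_N$. These are not as straightforward to obtain as in PCA, since the signal and noise are not observed separately. However, since $\Sigma_Z = \Sigma_M + \Sigma_N$, we see that the SNR is maximised when the following ratio is maximised,

$$\frac{a^T \Sigma_Z a}{a^T \Sigma_N a} = \frac{a^T \Sigma_M a}{a^T \Sigma_N a} + 1.$$

Thus, because $\Sigma_Z$ can be estimated as in PCA, only an estimate of $\Sigma_N$ is required. In reality, only an estimate of either $\Sigma_N$ or $\Sigma_M$ is required, and we find it easiest to estimate the former.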
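The identity $a^T \Sigma_Z a / a^T \Sigma_N a = a^T \Sigma_M a / a^T \Sigma_N a + 1$, which follows from $\Sigma_Z = \Sigma_M + \Sigma_N$, can be checked numerically. The following sketch uses small hand-picked covariance matrices rather than IMS data, and its helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def quad_form(a: Vector, s: Matrix) -> float:
    """Return the quadratic form a^T S a."""
    n = len(a)
    return sum(a[i] * s[i][j] * a[j] for i in range(n) for j in range(n))


def snr(a: Vector, sigma_m: Matrix, sigma_n: Matrix) -> float:
    """Signal-to-noise ratio of the linear combination a^T Z."""
    return quad_form(a, sigma_m) / quad_form(a, sigma_n)


def total_to_noise(a: Vector, sigma_z: Matrix, sigma_n: Matrix) -> float:
    """The ratio that needs only Sigma_Z (estimable as in PCA) and Sigma_N."""
    return quad_form(a, sigma_z) / quad_form(a, sigma_n)


# Hand-picked illustrative covariances (not IMS data).
sigma_m = [[3.0, 1.0], [1.0, 2.0]]
sigma_n = [[1.0, 0.0], [0.0, 0.5]]
# Signal and noise are uncorrelated, so their covariance matrices add.
sigma_z = [[m + e for m, e in zip(row_m, row_n)]
           for row_m, row_n in zip(sigma_m, sigma_n)]

for a in ([0.6, 0.8], [1.0, -1.0], [0.0, 2.0]):
    # The second ratio exceeds the SNR by one (up to rounding).
    print(snr(a, sigma_m, sigma_n), total_to_noise(a, sigma_z, sigma_n))
```

Because the two ratios differ by a constant, any vector that maximises one maximises the other, which is why $\Sigma_M$ never has to be estimated directly.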

Green et al. $\Sigma_N$ and Berman et al.

Each spot (except edge spots) has two horizontal and two vertical neighbours. Because the spots lie on a regular grid, the average of these four values is the prediction at the central spot of a local linear fit to the four neighbouring spots, and a pseudo-residual is derived from the difference between the observed intensity and this prediction. This procedure produces a set of pseudo-residuals (one for each spot at each mass charge ratio, subject to a simple modification at edge spots), from which the sample noise covariance matrix $\hat{\Sigma}_N$ is formed. We use this as the estimate of $\Sigma_N$.
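A minimal sketch of the pseudo-residual step in Python, assuming the spectra are arranged as a complete rectangular grid `grid[row][col]` of equal-length intensity lists. The edge-spot modification is not specified in detail, so this sketch simply skips edge spots, and it takes the pseudo-residual to be the plain difference between the observed intensity and the neighbour average. Both are simplifying assumptions, and the function names are hypothetical. A constant rescaling of the residuals would multiply $\hat{\Sigma}_N$ by a constant and would not change the MNF directions.

```python
from typing import List

Spectrum = List[float]
Grid = List[List[Spectrum]]  # grid[row][col] is the spectrum at that spot
Matrix = List[List[float]]


def pseudo_residuals(grid: Grid) -> List[Spectrum]:
    """Observed spectrum minus the average of its four grid neighbours.

    Simplification: only interior spots are used (edge spots are skipped).
    """
    rows, cols = len(grid), len(grid[0])
    residuals = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            up, down = grid[r - 1][c], grid[r + 1][c]
            left, right = grid[r][c - 1], grid[r][c + 1]
            centre = grid[r][c]
            # The neighbour average is the local linear prediction at the centre.
            residuals.append([
                centre[k] - (up[k] + down[k] + left[k] + right[k]) / 4.0
                for k in range(len(centre))
            ])
    return residuals


def noise_covariance(residuals: List[Spectrum]) -> Matrix:
    """Sample covariance (divisor n - 1) of the pseudo-residuals."""
    n, k = len(residuals), len(residuals[0])
    mean = [sum(res[j] for res in residuals) / n for j in range(k)]
    centred = [[res[j] - mean[j] for j in range(k)] for res in residuals]
    return [[sum(c[j1] * c[j2] for c in centred) / (n - 1) for j2 in range(k)]
            for j1 in range(k)]
```

An intensity pattern that varies linearly across the grid gives a zero pseudo-residual, so spatially smooth signal contributes little to $\hat{\Sigma}_N$, while spot-to-spot fluctuations are retained.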

Implementation

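PCA corresponds to the maximisation of $a^T \hat{\Sigma}_Z a$ subject to $a^T a = 1$, which is solved by the eigenvector of $\hat{\Sigma}_Z$ corresponding to the largest eigenvalue. Subsequent principal components are defined by eigenvectors corresponding to subsequent eigenvalues. A similar argument shows that the vector $a$ maximising

$$\frac{a^T \hat{\Sigma}_Z a}{a^T \hat{\Sigma}_N a}$$

satisfies

$$\hat{\Sigma}_Z a = \lambda \hat{\Sigma}_N a.$$

This is a generalised eigenvalue problem; the first MNF band corresponds to the largest generalised eigenvalue $\lambda$, and subsequent bands to subsequent eigenvalues.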
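To make the generalised eigenvalue problem $\hat{\Sigma}_Z a = \lambda \hat{\Sigma}_N a$ concrete, the following pure-Python sketch uses the textbook reduction for symmetric-definite problems (the same strategy as LAPACK's `dsygv`): factor $\hat{\Sigma}_N = L L^T$ by Cholesky, solve the ordinary symmetric eigenproblem for $L^{-1} \hat{\Sigma}_Z L^{-T}$ (here with the cyclic Jacobi method), and back-transform with $a = L^{-T} y$. It is an illustration on small matrices, not the authors' implementation; all function names are hypothetical, and it assumes $\hat{\Sigma}_N$ is positive definite.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cholesky(b: Matrix) -> Matrix:
    """Lower-triangular L with B = L L^T, for symmetric positive definite B."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = b[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("noise covariance is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def inv_lower(low: Matrix) -> Matrix:
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(low[i][p] * inv[p][col] for p in range(i))
            inv[i][col] = rhs / low[i][i]
    return inv


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][p] * y[p][j] for p in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(row) for row in zip(*x)]


def jacobi_eigen(c: Matrix, sweeps: int = 100, tol: float = 1e-20) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix, cyclic Jacobi."""
    n = len(c)
    a = [row[:] for row in c]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                cs = 1.0 / math.sqrt(t * t + 1.0)
                sn = t * cs
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = cs * akp - sn * akq, sn * akp + cs * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = cs * apk - sn * aqk, sn * apk + cs * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = cs * vkp - sn * vkq, sn * vkp + cs * vkq
    return [a[i][i] for i in range(n)], v


def generalised_eigen(sigma_z: Matrix, sigma_n: Matrix) -> Tuple[List[float], List[List[float]]]:
    """Solve Sigma_Z a = lambda Sigma_N a; pairs sorted by decreasing lambda."""
    li = inv_lower(cholesky(sigma_n))
    c = matmul(matmul(li, sigma_z), transpose(li))  # L^{-1} Sigma_Z L^{-T}
    n = len(c)
    c = [[(c[i][j] + c[j][i]) / 2.0 for j in range(n)] for i in range(n)]  # symmetrise
    vals, y = jacobi_eigen(c)
    a_cols = matmul(transpose(li), y)  # back-transform a = L^{-T} y
    order = sorted(range(n), key=lambda i: vals[i], reverse=True)
    return [vals[i] for i in order], [[a_cols[r][i] for r in range(n)] for i in order]
```

Each returned vector satisfies $a^T \hat{\Sigma}_N a = 1$, and its eigenvalue equals the ratio $a^T \hat{\Sigma}_Z a / a^T \hat{\Sigma}_N a$ it attains. Passing the identity matrix as the noise covariance reduces the problem to ordinary PCA.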

For both PCA and MNF, using all mass charge ratios would produce extremely large sample covariance matrices, so some pre-filtering is applied first. In PCA, this is often a peak identification method, or the selection of all mass charge ratios whose intensity exceeds some threshold (in some or all spots). For the MNF transform, we use only those mass charge ratios whose SNR exceeds a threshold. This SNR corresponds to the ratio of the diagonal entries of $\hat{\Sigma}_Z$ and $\hat{\Sigma}_N$. The threshold is chosen so that the matrices are of a manageable size.
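Because this selection uses only the diagonal entries, the full covariance matrices never need to be formed before filtering: each diagonal entry is simply the variance at one mass charge ratio. A minimal sketch, assuming the spectra and the pseudo-residuals are available as lists of equal-length lists, and choosing a fixed number of ratios to keep rather than a threshold value (the helper name `select_by_snr` is hypothetical). Ranking by the ratio of diagonals gives the same order as ranking by SNR, since the two differ by one.

```python
import math
from statistics import variance
from typing import List


def select_by_snr(spectra: List[List[float]], residuals: List[List[float]],
                  keep: int) -> List[int]:
    """Indices of the `keep` mass charge ratios with the largest ratio of
    intensity variance to pseudo-residual variance, in increasing order."""
    n_ratios = len(spectra[0])
    ratios = []
    for k in range(n_ratios):
        total = variance([row[k] for row in spectra])    # k-th diagonal of Sigma_Z hat
        noise = variance([row[k] for row in residuals])  # k-th diagonal of Sigma_N hat
        ratios.append(total / noise if noise > 0.0 else math.inf)
    ranked = sorted(range(n_ratios), key=lambda k: ratios[k], reverse=True)
    return sorted(ranked[:keep])
```

Setting `keep` to a fixed count such as 650 plays the same role as choosing the threshold so that the retained matrices are of a manageable size.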

Our implementation uses the LAPACK

Results

We demonstrate the method using a section of 10

The processed data consist of intensities at 11280 mass charge ratios, recorded at each of 2012 spots on a grid over the tissue slice. The data were first log-transformed and then background corrected, using a 5-knot robust spline fit to estimate the baseline. Mass charge ratios were pre-filtered by thresholding intensities (for PCA) or SNRs (for MNF) so that 650 were retained, and the PCA and MNF transforms were then computed. This means that PCA operated on the 650 mass charge ratios with the highest intensity, whereas MNF used the 650 mass charge ratios with the highest estimated signal to noise ratio. The choice of 650 mass charge ratios was made by trial and error, guided by a pragmatic desire to keep the covariance matrices manageable.

Figure


**The first six principal component images of a coronal murine midbrain section.**

Figure


**The first six MNF transformed images of a coronal murine midbrain section.**

Subsequent Analysis

Deininger et al.

Taking the first four MNF band images and applying hierarchical clustering, we can determine seven clusters. Figure
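The clustering step can be sketched as follows. The linkage and distance are not stated, so this sketch assumes average linkage with Euclidean distance and uses a naive agglomeration that is only practical for small examples; the function names are hypothetical. Each band image is obtained by projecting every spot's spectrum onto an MNF vector.

```python
import math
from typing import List


def band_scores(spectra: List[List[float]], vectors: List[List[float]]) -> List[List[float]]:
    """Value of each band image at each spot: a^T Z(x) for every MNF vector a."""
    return [[sum(a_k * z_k for a_k, z_k in zip(a, z)) for a in vectors] for z in spectra]


def average_linkage(points: List[List[float]], n_clusters: int) -> List[int]:
    """Naive agglomerative clustering with average linkage and Euclidean distance.

    Returns a label in 0..n_clusters-1 for every point.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[x] for j in clusters[y])
                d /= len(clusters[x]) * len(clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])  # merge the closest pair
        del clusters[y]                  # y > x, so index x stays valid
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

With the settings described here this would be called as `average_linkage(band_scores(spectra, vectors[:4]), 7)`. For thousands of spots, R's `hclust` and `cutree` (package stats) are a more practical choice than this cubic-time loop.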


**The results of clustering using the first four MNF bands.** **(A)** The clustering of spots (spots in the same cluster have the same colour); **(B)** the average (background-corrected) mass spectrum for each cluster.

As with PCA, the choice of the number of MNF bands to use in subsequent analysis (such as hierarchical clustering) is somewhat ad hoc and depends on the form of that analysis. For clustering and classification there are many methods for choosing the number of features, but we regard this as a topic for further research.

In this instance, the number of components (six) was chosen primarily for convenience and on subjective grounds: the 5th and 6th PCA images still show some faint internal structure, whereas subsequent ones do not (not shown). We therefore use six components for both PCA and MNF, for consistency.

More generally the number of PCA components

Conclusion

We have shown that the minimum noise fraction transform is a potent addition to the suite of tools available for the analysis of Imaging Mass Spectrometry data. We have further demonstrated that, like principal components, the MNF bands can be used as summaries of the mass spectra to analyse the spatial characteristics of a tissue slice. We regard the MNF transform as a useful alternative to PCA in Imaging Mass Spectrometry. Its defining feature is that it uses estimates of the spatial signal-to-noise ratio to define bands sequentially, whereas PCA uses only total variation (signal plus noise).

Both PCA and MNF are computationally efficient compared with the data acquisition and preprocessing steps involved. In our implementation, all code was written in R and C and is therefore platform independent. However, flexImaging provided the data in a proprietary format that required a Windows-only proprietary tool (CompassXport). We have successfully used emulation software to run this tool on Linux and Mac OS X based systems.

Availability and requirements

**Project Name:** Computing Minimum Noise Fraction Transforms of Imaging Mass Spectrometry Data; **Project Home:**; **Operating Systems:** MNF code is in R and C and is compatible with Windows, Mac, and Linux; **Programming Language:** R; **Other Requirements:** caMassClass; **License:** GPL-2; **Restrictions to use by non-academics:** none.

Availability of supporting data

The software and supporting data are available for download from the project home at

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

GS and DC conceived the statistical approach. DC implemented the analysis. GS drafted the manuscript. JORG, SRM and PH developed the protocols, and obtained, prepared and processed the samples. All authors read and approved the final manuscript.

Funding

GS was employed by CSIRO when much of this work was carried out. The Adelaide Proteomics Centre was partially funded by Bioplatforms Australia and an NHMRC equipment grant.

Acknowledgements

The authors would like to thank Mike Buckley for providing critical feedback on an early draft of this manuscript.