A flexible statistical model for alignment of label-free proteomics data – incorporating ion mobility and product ion information
1 Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, USA
2 Waters Corporation, Milford, Massachusetts, USA
3 Department of Medicine, Duke University Medical Center, Durham, North Carolina, USA
4 Department of Statistics, Duke University, Durham, North Carolina, USA
5 Quintiles, Durham, North Carolina, USA
BMC Bioinformatics 2013, 14:364 doi:10.1186/1471-2105-14-364
Published: 16 December 2013
The goal of many proteomics experiments is to determine the abundance of proteins in biological samples and how that abundance varies across physiological conditions. High-throughput quantitative proteomics, specifically label-free LC-MS/MS, allows rapid measurement of thousands of proteins, enabling large-scale studies of various biological systems. Before these information-rich datasets can be analyzed, the raw data must undergo several computational processing steps. We present a method that addresses one of the essential steps in proteomics data processing: the matching of peptide measurements across samples.
We describe a novel method for label-free proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion information. We compare the results of our alignment method to PEPPeR and OpenMS, and compare alignment accuracy achieved by different versions of our method utilizing various data characteristics. Our method results in increased match recall rates and similar or improved mismatch rates compared to PEPPeR and OpenMS feature-based alignment. We also show that the inclusion of drift time and product ion information results in higher recall rates and more confident matches, without increases in error rates.
Based on the results presented here, we argue that incorporating ion mobility drift time and product ion information is a worthy pursuit. Alignment methods should be flexible enough to utilize all available data, particularly given recent advancements in experimental separation methods.