This article is part of the supplement: Selected articles from the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2011
A systematic model of the LC-MS proteomics pipeline
1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
2 Current affiliation: Department of Pathology, University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
3 Computational Biology Division, Translational Genomics Research Institution, Phoenix, AZ, USA
4 Department of Bioinformatics and Computational Biology, University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
BMC Genomics 2012, 13(Suppl 6):S2. doi:10.1186/1471-2164-13-S6-S2. Published: 26 October 2012
Mass spectrometry is a complex technique used for large-scale protein profiling, with clinical and pharmaceutical applications. While the individual components of the system have been studied extensively, little work has been done to integrate the various modules and evaluate them from a systems point of view.
In this work, we investigate this problem by assembling the modules of a typical proteomics workflow into a single model. This model captures and analyzes the key factors that affect the number of identified peptides and quantified proteins, protein quantification error, differential expression results, and classification performance. The proposed proteomics pipeline model can be used to optimize the workflow and to pinpoint the critical bottlenecks where investing time and resources would most improve performance. With this model-based approach, one can systematically study the critical problem of proteomic biomarker discovery through simulation with ground-truthed synthetic MS data.