Research article
Identifying technical aliases in SELDI mass spectra of complex mixtures of proteins
1 Department of Pediatrics, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA USA
2 Department of Electrical Engineering, Stanford University, Stanford, CA USA
BMC Research Notes 2013, 6:358 doi:10.1186/1756-0500-6-358
Published: 8 September 2013
Abstract
Background
Biomarker discovery datasets created by mass spectrometry protein profiling of complex protein mixtures contain many peaks that represent the same protein in different charge states. Correlated variables such as these can confound the statistical analysis of proteomic data. Previously, we developed an algorithm that clustered mass spectrum peaks that were either biologically or technically correlated. Here we demonstrate an algorithm that clusters only correlated technical aliases.
Results
In this paper, we propose a preprocessing algorithm for grouping technical aliases in mass spectrometry protein profiling data. The stringency of the variance allowed for clustering is customizable, which determines how many peaks are clustered. Analyzing the clusters, instead of individual peaks, helps reduce the difficulties associated with technically correlated data and can make biomarker identification more efficient.
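The abstract does not spell out the clustering rule, so the following is only an illustrative sketch of the general idea and should not be read as the authors' algorithm. It assumes that technical aliases are multiply charged ions of one protein, so that a peak observed at m/z value x with charge z implies a neutral mass of z(x - m_p), where m_p is the proton mass. Peaks whose implied masses agree within a relative tolerance are grouped. The names `group_charge_aliases`, `max_charge` and `rel_tol`, and the default tolerance of 0.3%, are hypothetical choices made for this example; `rel_tol` plays the role of the customizable stringency.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral mass implied by an m/z value at a given positive charge."""
    return charge * (mz - PROTON_MASS)


def group_charge_aliases(peaks_mz, max_charge=3, rel_tol=0.003):
    """Group peaks that could be charge-state aliases of one protein.

    Peaks are visited from largest to smallest m/z. Each unassigned peak
    is treated as a singly charged parent; any other unassigned peak whose
    implied neutral mass at charge 2..max_charge lies within rel_tol
    (relative) of the parent mass joins the parent's cluster.

    Returns a list of clusters, each a list of (m/z, assumed charge) pairs.
    Illustrative only: rel_tol and max_charge are assumed parameters.
    """
    order = sorted(range(len(peaks_mz)), key=lambda i: peaks_mz[i], reverse=True)
    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        parent_mass = neutral_mass(peaks_mz[i], 1)
        cluster = [(peaks_mz[i], 1)]
        assigned.add(i)
        for j in order:
            if j in assigned:
                continue
            for z in range(2, max_charge + 1):
                candidate = neutral_mass(peaks_mz[j], z)
                if abs(candidate - parent_mass) <= rel_tol * parent_mass:
                    cluster.append((peaks_mz[j], z))
                    assigned.add(j)
                    break  # accept the first charge state that fits
        clusters.append(cluster)
    return clusters


# Hypothetical peak list: three charge states of one protein plus one unrelated peak.
example_peaks = [66431.0, 33216.0, 22144.3, 12000.0]
example_clusters = group_charge_aliases(example_peaks)
```

Tightening `rel_tol` splits more peaks into their own clusters, while loosening it merges more of them, which mirrors the trade-off described above. A practical implementation would probably also require the grouped peaks to have correlated intensities across samples, which this sketch omits.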
Conclusions
This software can be used to pre-process and thereby decrease the complexity of protein profiling proteomics data, thus simplifying the subsequent analysis of biomarkers by decreasing the number of tests. The software is also a practical tool for identifying which features to investigate further by purification, identification and confirmation.



