Open Access Research article

Improved machine learning method for analysis of gas phase chemistry of peptides

Allison Gehrke1, Shaojun Sun1, Lukasz Kurgan2, Natalie Ahn4,5, Katheryn Resing4, Karen Kafadar3,6 and Krzysztof Cios7,8

1Department of Computer Science and Engineering, University of Colorado at Denver, USA

2Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada

3Department of Statistics, Indiana University, Bloomington, IN, USA

4Department of Chemistry and Biochemistry, University of Colorado, Boulder, CO, USA

5Howard Hughes Medical Institute, University of Colorado, Boulder, CO, USA

6Department of Preventive Medicine and Biometrics, School of Medicine, University of Colorado, Denver, CO, USA

7Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA

8IITiS, Polish Academy of Sciences, Poland


BMC Bioinformatics 2008, 9:515 doi:10.1186/1471-2105-9-515

Published: 3 December 2008

Abstract

Background

Accurate peptide identification is important to high-throughput proteomics analyses that use mass spectrometry. Search programs compare fragmentation spectra (MS/MS) of peptides from complex digests with theoretically derived spectra from a database of protein sequences. Improved discrimination is achieved with theoretical spectra that are based on simulating gas phase chemistry of the peptides, but the limited understanding of those processes affects the accuracy of predictions from theoretical spectra.

Results

We employed a robust data mining strategy using new feature annotation functions of the MAE software, which revealed that the frequency of fragmentation at the second peptide bond is under-predicted. We applied exploratory data analysis methods to pre-process the information in the MS/MS spectra, including data normalization and attribute selection, reducing the attributes to a smaller, less correlated set for machine learning studies. We then compared our rule-building machine learning program, DataSqueezer, with commonly used association rule and decision tree algorithms. All of the machine learning algorithms produced similar results, which were consistent with the expected properties of a second gas phase mechanism at the second peptide bond.
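The pre-processing described above (normalization followed by reduction to a less correlated attribute set) can be illustrated with a minimal sketch. This is not the procedure used in the study, whose details the abstract does not give. It assumes min-max scaling and a greedy Pearson-correlation filter, and the attribute names, toy values, and the 0.9 threshold are all hypothetical.

```python
from math import sqrt

def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_uncorrelated(columns, threshold=0.9):
    """Greedy filter: keep an attribute only if its absolute correlation
    with every attribute already kept is below the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy attributes, one value per spectrum.
raw = {
    "intensity_b2": [10.0, 20.0, 30.0, 40.0],
    "intensity_b2_rescaled": [1.0, 2.0, 3.0, 4.0],  # redundant with intensity_b2
    "peptide_length": [12.0, 7.0, 15.0, 9.0],
}
normalized = {name: min_max_normalize(vals) for name, vals in raw.items()}
selected = select_uncorrelated(normalized)
print(selected)
```

In this toy input the rescaled intensity attribute is dropped because it is perfectly correlated with the first one, while the weakly correlated length attribute is retained.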

Conclusion

The results provide compelling evidence that we have identified underlying chemical properties in the data that suggest the existence of an additional gas phase mechanism for the second peptide bond. Thus, the methods described in this study provide a valuable approach for analyses of this kind in the future.

