This article is part of the supplement: Proceedings of the Great Lakes Bioinformatics Conference 2013
A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer
1 Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, USA
2 Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, USA
3 School of Informatics, Indiana University, Indianapolis, IN 46202, USA
4 Department of Computer and Information Science, School of Science, Purdue University, Indianapolis, IN 46202, USA
5 Indiana Center for Systems Biology and Personalized Medicine, Indianapolis, IN 46202, USA
6 Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, USA
7 Institute of Biopharmaceutical Informatics and Technology, Wenzhou Medical College, Wenzhou, Zhejiang, China
BMC Proceedings 2013, 7(Suppl 7):S10, doi:10.1186/1753-6561-7-S7-S10
Published: 20 December 2013
In the past several years, there has been growing interest in molecular biomarkers as tools for the early detection of cancer. Plasma proteomics profiling based on liquid chromatography tandem mass spectrometry (LC/MS/MS) is a promising platform for studying candidate protein biomarkers for early cancer detection. However, inherent variability, limits on protein detectability, and peptide discovery biases among LC/MS/MS platforms make the classification and prediction of proteomics profiles challenging. Neural network-based methods for identifying multi-protein biomarker panels for breast cancer diagnosis offer a way to improve both the sensitivity and the specificity of candidate cancer biomarkers for early detection.
In our previous work, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for breast cancer plasma samples and then applied the classifier to predict a blind breast cancer dataset. However, in that method the optimal marker combination C* was selected by applying the trained FFNN to the testing set itself, so the same testing set was used both to choose the combination and to report its performance. In this paper, we therefore apply a three-way data split (training, validation, and testing) to the FFNN. The FFNN model built with the three-way split outperforms our previous method on the testing set, improving prediction performance from AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, and specificity = 82.5% to AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, and specificity = 87.5%.
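The three-way split protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the FFNN so the example needs no external libraries, and the 50/25/25 stratified split, the toy data generator, and the helper names (`make_toy_data`, `stratified_three_way_split`, `fit_nearest_centroid`, `select_panel`) are all hypothetical. Panels are two markers rather than five to keep the combinatorial search small. What the sketch is meant to show is the ordering: every candidate combination is fit on the training set and scored on the validation set, C* is fixed there, and the testing set is evaluated only once.

```python
import itertools
import random
from statistics import mean


def make_toy_data(n_per_class=60, n_markers=6, shift=3.0, seed=1):
    """Hypothetical toy data: markers 0 and 2 are shifted upward in class 1 ('cancer')."""
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]
            if label == 1:
                x[0] += shift
                x[2] += shift
            data.append((x, label))
    return data


def stratified_three_way_split(data, train_frac=0.5, val_frac=0.25, seed=0):
    """Split each class separately so training/validation/testing keep the class balance."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in (0, 1):
        group = [d for d in data if d[1] == label]
        rng.shuffle(group)
        n_train = int(len(group) * train_frac)
        n_val = int(len(group) * val_frac)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test


def fit_nearest_centroid(train, markers):
    """Stand-in classifier (NOT an FFNN): assign a sample to the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(x[m] for x in rows) for m in markers]

    def predict(x):
        dist = {
            label: sum((x[m] - c) ** 2 for m, c in zip(markers, cent))
            for label, cent in centroids.items()
        }
        return 1 if dist[1] < dist[0] else 0

    return predict


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def select_panel(data, n_markers, panel_size, seed=0):
    train, val, test = stratified_three_way_split(data, seed=seed)
    best_combo, best_val = None, -1.0
    for combo in itertools.combinations(range(n_markers), panel_size):
        # Each candidate is fit on the training set and compared on the validation set only.
        score = accuracy(fit_nearest_centroid(train, combo), val)
        if score > best_val:  # strict '>' keeps the earlier combination on ties
            best_combo, best_val = combo, score
    # The testing set is used exactly once, after C* has been fixed.
    test_acc = accuracy(fit_nearest_centroid(train, best_combo), test)
    return best_combo, best_val, test_acc


if __name__ == "__main__":
    combo, val_acc, test_acc = select_panel(make_toy_data(), n_markers=6, panel_size=2)
    print("C* =", combo, "validation acc =", round(val_acc, 3), "test acc =", round(test_acc, 3))
```

Because the testing set plays no part in choosing C*, the reported test accuracy is not inflated by the selection step, which is the weakness the abstract attributes to the previous method.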
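The reported rates are linked through the confusion matrix, and a quick check shows they are mutually consistent. The abstract does not state the size of the testing set, so the counts below are hypothetical: they assume 40 cancer and 40 control samples, which is one confusion matrix that reproduces every reported percentage for both methods. AUC is not included because it depends on per-sample scores, not on counts at a single threshold.

```python
def rates(tp, fp, tn, fn):
    """Standard binary-classification rates from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, assuming 40 cancer (positive) and 40 control (negative) test samples.
previous = rates(tp=33, fp=7, tn=33, fn=7)
three_way = rates(tp=33, fp=5, tn=35, fn=7)

for name, r in (("previous method", previous), ("three-way split", three_way)):
    print(name, {k: f"{100 * v:.2f}%" for k, v in r.items()})
```

Under this hypothetical matrix, sensitivity is unchanged and the whole improvement comes from two fewer false positives, which raises specificity, precision, and accuracy together.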
Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and that it can be applied to other clinical proteomics studies.