This article is part of the supplement: Proceedings of the 2011 International Conference on Bioinformatics and Computational Biology (BIOCOMP'11)
Recursive SVM biomarker selection for early detection of breast cancer in peripheral blood
1 Department of Academic and Institutional Resources and Technology, University of North Texas Health Science Center, Fort Worth, TX, USA
2 Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
3 Rush University Cancer Center, Rush University Medical Center, Chicago, IL 60612, USA
4 Department of General Surgery and Immunology and Microbiology, Rush University Medical Center, Chicago, IL 60612, USA
5 Department of Internal Medicine and Biochemistry, Rush University Medical Center, Chicago, IL 60612, USA
BMC Medical Genomics 2013, 6(Suppl 1):S4 doi:10.1186/1755-8794-6-S1-S4
Published: 23 January 2013
Breast cancer is the second most common type of cancer worldwide, after lung cancer. Traditional mammography and tissue microarrays have been studied for early cancer detection and cancer prediction. However, more reliable diagnostic tools for early detection of breast cancer are still needed, and developing them is challenging for several reasons. First, obtaining tissue biopsies can be difficult. Second, mammography may not detect small tumors and is often unsatisfactory for younger women, who typically have dense breast tissue. Lastly, breast cancer is not a single homogeneous disease but consists of multiple disease states, each arising from a distinct molecular mechanism and following a distinct clinical progression path, which makes the disease difficult to detect and predict in its early stages.
In this paper, we present a Support Vector Machine algorithm based on Recursive Feature Elimination and Cross Validation (SVM-RFE-CV) and show how to use it to model the classification and prediction problem of early detection of breast cancer in peripheral blood.
The training set, consisting of 32 healthy and 33 cancer samples, and the testing set, consisting of 31 healthy and 34 cancer samples, were randomly drawn from a peripheral blood breast cancer dataset downloaded from the Gene Expression Omnibus. First, we identified 42 biomarkers that are differentially expressed between "normal" and "cancer" samples. Then, with SVM-RFE-CV, we extracted 15 biomarkers that yield a zero cross-validation score. Lastly, we compared the classification and prediction performance of SVM-RFE-CV with that of SVM and SVM Recursive Feature Elimination (SVM-RFE).
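The general idea of recursive feature elimination with a linear SVM, with the number of retained features chosen by cross validation, can be sketched as follows. This is a minimal illustration using scikit-learn's `RFECV` on synthetic data, not the authors' implementation: the synthetic matrix, the 5-fold scheme, `C=1.0`, and the accuracy scoring are all assumptions made for the example, and the exact SVM-RFE-CV procedure used in the paper may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the expression data (assumption): 130 samples and
# 42 candidate features, mirroring only the sizes mentioned in the text.
X, y = make_classification(
    n_samples=130, n_features=42, n_informative=10, n_redundant=5, random_state=0
)

# Random, class-stratified split into equally sized training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# A linear SVM exposes one weight per feature; RFE repeatedly drops the
# feature with the smallest squared weight. RFECV uses cross validation on
# the training set to decide how many features to keep.
selector = RFECV(
    estimator=SVC(kernel="linear", C=1.0),
    step=1,  # eliminate one feature per iteration
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X_train, y_train)

selected = np.flatnonzero(selector.support_)  # column indices of kept features
n_selected = selector.n_features_
test_accuracy = selector.score(X_test, y_test)  # SVM on kept features only

print("features kept:", n_selected)
print("selected columns:", selected.tolist())
print("held-out accuracy:", round(test_accuracy, 3))
```

Feature selection is fitted on the training split only, so the held-out score is not influenced by the testing samples.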
We found that 1) SVM-RFE-CV is suitable for analyzing noisy high-throughput microarray data, 2) it outperforms SVM-RFE in robustness to noise and in the ability to recover informative features, and 3) it can improve the prediction performance (Area Under Curve) on the testing data set from 0.5826 to 0.7879. Further pathway analysis showed that the biomarkers are associated with Signaling, Hemostasis, Hormones, and Immune System, which is consistent with previous findings. Our prediction model can serve as a general model for biomarker discovery in the early detection of other cancers. In future work, we plan to use Polymerase Chain Reaction (PCR) to validate the ability of these potential biomarkers to detect breast cancer early.
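For readers unfamiliar with the Area Under Curve metric, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen cancer sample receives a higher classifier score than a randomly chosen healthy sample, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0 (healthy) or 1 (cancer)
    scores: classifier decision values, higher meaning "more likely cancer"
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 of the 4 cancer/healthy pairs
# are ordered correctly, so the AUC is 0.75.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values.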