SVM classifier to predict genes important for self-renewal and pluripotency of mouse embryonic stem cells
1 Department of Pharmacology and System Therapeutics, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, New York, 10029, USA
2 Systems Biology Center New York (SBCNY), Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, New York, 10029, USA
3 Department of Gene and Cell Medicine, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, New York, 10029, USA
4 Black Family Stem Cell Institute, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, New York, 10029, USA
BMC Systems Biology 2010, 4:173 doi:10.1186/1752-0509-4-173
Published: 21 December 2010
Mouse embryonic stem cells (mESCs) are derived from the inner cell mass of a developing blastocyst and can be cultured indefinitely in vitro. Their distinguishing features are the ability to self-renew and the ability to differentiate into all adult cell types. Genes that maintain mESC self-renewal and pluripotency are of interest to stem cell biologists. Although significant steps have been made toward the identification and characterization of such genes, the list is still incomplete and controversial. For example, the overlap among candidate self-renewal and pluripotency genes across different RNAi screens is surprisingly small. Meanwhile, machine learning approaches have been used to analyze multi-dimensional experimental data and to integrate results from many studies, yet they have not been applied specifically to the task of predicting and classifying self-renewal and pluripotency gene membership.
In this study we developed a supervised machine learning classifier, based on support vector machines (SVMs), for predicting mESC stemness membership genes (MSMG), that is, genes involved in self-renewal and pluripotency. The data used to train the classifier were derived from mESC-related mRNA microarray studies that measured gene expression at various stages of early differentiation, and from ChIP-seq studies in mESCs that profiled genome-wide binding of key transcription factors, such as Nanog, Oct4, and Sox2, to the regulatory regions of other genes. Leave-one-out cross-validation was used to evaluate the accuracy and generality of the classifier and to compare it with other classification methods. Finally, two sets of candidate genes from genome-wide RNA interference screens were used to test the generality and potential application of the classifier.
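To make the evaluation scheme concrete, the following is a minimal sketch of leave-one-out cross-validation around a linear SVM. It is not the authors' implementation: the two features per gene, their values, the labels, and all function names and hyperparameters are hypothetical and chosen only for illustration. The SVM is a simplified linear one trained by subgradient descent on the regularized hinge loss, with no bias term because the toy features are assumed to be centered. A real analysis would more likely rely on an established SVM library and possibly a non-linear kernel.

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Fit a linear SVM by subgradient descent on hinge loss + L2 penalty.

    Simplification (assumption): no bias term, features are pre-centered.
    lam, eta and epochs are illustrative values, not tuned settings.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A sample contributes a hinge subgradient when its margin is < 1.
            violated = yi * dot(w, xi) < 1.0
            # Shrink the weights (gradient step on the L2 regularizer).
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Return +1 (predicted stemness gene) or -1 (predicted non-member)."""
    return 1 if dot(w, x) >= 0.0 else -1


def leave_one_out_accuracy(X, y):
    """Hold out each gene once, train on the rest, and score the held-out gene."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        w = train_linear_svm(X_train, y_train)
        correct += predict(w, X[i]) == y[i]
    return correct / len(X)


# Hypothetical, synthetic, pre-centered features for eight genes:
# [expression-change score, transcription-factor binding score]
X = [[1.2, 0.8], [0.9, 1.1], [1.5, 0.6], [0.7, 1.3],
     [-1.0, -0.9], [-1.3, -0.7], [-0.8, -1.2], [-1.1, -1.0]]
# +1 = known stemness gene, -1 = non-member (made-up labels)
y = [1, 1, 1, 1, -1, -1, -1, -1]

loo_accuracy = leave_one_out_accuracy(X, y)
w_full = train_linear_svm(X, y)
print("Leave-one-out accuracy:", loo_accuracy)
```

Because the toy classes are cleanly separated, every held-out gene is classified correctly here. That says nothing about performance on real microarray and ChIP-seq features. The same loop can wrap any other classifier, which is how different methods can be compared on equal footing.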
Our results show that an SVM approach can be useful for prioritizing genes for functional validation experiments and can complement the analysis of high-throughput profiling data in stem cell research.