This article is part of the supplement: Selected articles from the Eighth Asia-Pacific Bioinformatics Conference (APBC 2010)
SFSSClass: an integrated approach for miRNA based tumor classification
1 Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
2 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
3 Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
4 MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Tsinghua University, Beijing 100084, China
BMC Bioinformatics 2010, 11(Suppl 1):S22 doi:10.1186/1471-2105-11-S1-S22
Published: 18 January 2010
MicroRNA (miRNA) expression profiling data have recently been found to be particularly important in cancer research and can be used as a diagnostic and prognostic tool. Current approaches to tumor classification using miRNA expression data do not integrate the experimental knowledge available in the literature. A judicious integration of such knowledge with effective miRNA and sample selection through a biclustering approach could be an important step in improving the accuracy of tumor classification.
In this article, a novel classification technique called SFSSClass is developed that integrates three components: the biclustering technique SAMBA for simultaneous feature (miRNA) and sample (tissue) selection (SFSS), a cancer-miRNA network that we have developed by mining the literature for experimentally verified cancer-miRNA relationships, and the uncorrelated shrunken centroid (USC) classifier. SFSSClass is used for classifying multiple classes of tumors and cancer cell lines. In one part of the investigation, poorly differentiated tumors (PDT) having a non-diagnostic histological appearance are classified after training on more differentiated tumor (MDT) samples. The proposed method is found to exceed the best accuracy reported in the literature on the experimental data sets. For example, while the best accuracy reported in the literature for classifying PDT samples is ~76.5%, the accuracy of SFSSClass is found to be ~82.3%. The advantage of incorporating biclustering integrated with the cancer-miRNA network is evident from the consistently better performance of SFSSClass (integration of SAMBA, cancer-miRNA network and USC) over USC alone (e.g., ~70.5% for SFSSClass versus ~58.8% for USC in classifying a set of 17 MDT samples from 9 tumor types, ~91.7% versus ~75% in classifying 12 cell lines from 6 tumor types, and ~82.3% versus ~41.2% in classifying 17 PDT samples from 11 tumor types).
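Because the test sets are small (17 or 12 samples), each reported percentage corresponds to a whole number of correctly classified samples. The short Python sketch below recovers those counts from the percentages quoted above. The counts are inferred here by arithmetic and are not stated in the abstract; the function names `implied_correct` and `accuracy_pct` are hypothetical helpers introduced only for this illustration.

```python
def implied_correct(reported_pct, n_samples):
    """Whole number of correct predictions closest to a reported accuracy."""
    return round(reported_pct / 100 * n_samples)


def accuracy_pct(correct, n_samples):
    """Accuracy in percent for a given number of correct predictions."""
    return 100 * correct / n_samples


# (description, reported accuracy in %, number of test samples),
# taken from the comparison of SFSSClass and USC quoted in the text.
reported = [
    ("MDT, SFSSClass", 70.5, 17),
    ("MDT, USC", 58.8, 17),
    ("Cell lines, SFSSClass", 91.7, 12),
    ("Cell lines, USC", 75.0, 12),
    ("PDT, SFSSClass", 82.3, 17),
    ("PDT, USC", 41.2, 17),
]

for name, pct, n in reported:
    k = implied_correct(pct, n)  # inferred count, an assumption of this sketch
    print(f"{name}: {k}/{n} correct = {accuracy_pct(k, n):.2f}% (reported ~{pct}%)")
```

Read this way, the gain on the PDT set would amount to 14 versus 7 of 17 samples classified correctly, which gives a sense of the absolute size of the differences behind the percentages.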
In this article, we develop the SFSSClass algorithm, which integrates a biclustering technique for simultaneous feature (miRNA) and sample (tissue) selection, the cancer-miRNA network and a classifier. The novel integration of experimental knowledge with computational tools efficiently selects relevant features that have high intra-class and low inter-class similarity. The performance of SFSSClass is found to be significantly better than that of other existing approaches.