Combining dissimilarity based classifiers for cancer prediction using gene expression profiles

Blanco, Ángela; Martín-Merino, Manuel; De Las Rivas, Javier

doi:10.1186/1471-2105-8-S8-S3

Volume 8 Supplement 8

Highlights from the Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB)

Oral presentation
Open access
Published: 20 November 2007

BMC Bioinformatics volume 8, Article number: S3 (2007)

Background

DNA microarrays allow us to monitor the expression levels of thousands of genes simultaneously across a collection of related samples. This technology has been applied to cancer prediction by comparing the gene expression profiles of normal and cancer samples.

Support Vector Machines (SVM) have been applied to identify cancer samples from gene expression levels, with encouraging results. Techniques of this kind can handle high-dimensional, noisy data, which is an important requirement in this problem.

However, common SVM algorithms rely on the Euclidean distance, which does not accurately reflect the proximities among the sample profiles [1].

This limitation favors the misclassification of cancer samples (false negative errors), which is a serious drawback in this application. The SVM has been extended to incorporate non-Euclidean dissimilarities [2].

Nevertheless, no dissimilarity can be considered superior to the others, because each one reflects different features of the data and misclassifies a different set of patterns.
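
To illustrate how dissimilarities can disagree about which samples are close, the sketch below computes four candidate measures on a toy matrix. The particular set (Euclidean, cosine, correlation, χ²) and the function names are assumptions made for illustration; the abstract names only the Euclidean and χ² measures. The χ² form used here (the correspondence-analysis distance between row profiles) is one common definition and requires non-negative data.

```python
import numpy as np


def euclidean(X):
    """Pairwise Euclidean distances between the rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def cosine(X):
    """Pairwise cosine dissimilarity, 1 - cos(angle between profiles)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - np.clip(Xn @ Xn.T, -1.0, 1.0)


def correlation(X):
    """Pairwise correlation dissimilarity, 1 - Pearson r between profiles."""
    return 1.0 - np.corrcoef(X)


def chi_squared(X):
    """Chi-squared distance between row profiles (assumes X >= 0)."""
    P = X / X.sum()
    row_mass = P.sum(axis=1, keepdims=True)
    col_mass = P.sum(axis=0)
    profiles = P / row_mass
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / col_mass).sum(axis=2))


# Toy data: sample 1 has the same shape as sample 0 at twice the magnitude;
# sample 2 is close to sample 0 in magnitude but has a different shape.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [2.0, 2.0, 3.0, 3.0]])

for name, fn in [("euclidean", euclidean), ("cosine", cosine),
                 ("correlation", correlation), ("chi-squared", chi_squared)]:
    D = fn(X)
    nearest = 1 + int(np.argmin(D[0, 1:]))
    print(name, "-> nearest neighbour of sample 0 is sample", nearest)
```

On this toy input the Euclidean distance selects sample 2 as the nearest neighbour of sample 0, while the three shape-based measures select sample 1, so classifiers built on them would see different neighbourhoods.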

The false negative errors of individual classifiers can be reduced by combining non-optimal classifiers [3]. To this end, different versions of the classifier are usually built by bootstrap sampling of the patterns or the features.

However, resampling techniques reduce the size of the training set, which increases the bias of the individual classifiers and consequently the error of the resulting combination [4].
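
As a rough illustration of how much bootstrap sampling shrinks the effective training set, consider a sample of size n drawn with replacement from n patterns. This setting is an assumption; the abstract does not give the sampling scheme or this figure. The probability that a given pattern appears in the sample is:

```latex
\Pr(\text{pattern } i \text{ is drawn at least once})
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow{\; n \to \infty \;}\; 1 - e^{-1} \approx 0.632
```

Under that assumption, each bootstrap classifier sees on average only about 63% of the distinct training patterns, which is a substantial loss when samples are scarce.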

Our approach

To avoid the bias introduced by resampling techniques, we propose a combination strategy that obtains classifier diversity from a set of dissimilarities, each reflecting different features of the data. To incorporate the dissimilarities into the SVM, they are first embedded in a Euclidean space such that the inter-pattern distances reflect the original dissimilarity matrix. Next, a C-SVM is trained for each dissimilarity. Finally, the resulting classifiers are combined using a voting strategy. The method can work directly from a dissimilarity matrix.
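
The three steps above (embed, train one C-SVM per dissimilarity, vote) can be sketched as follows. This is a minimal sketch, not the authors' implementation. The abstract does not specify the embedding, the set of dissimilarities, or the voting rule, so everything below is an assumption: classical scaling with negative eigenvalues clipped to zero, the resulting Gram matrix passed to scikit-learn's `SVC` as a precomputed kernel, three illustrative dissimilarities, and an unweighted majority vote. For brevity the embedding is computed on training and test samples together (labels are not used); the data are synthetic.

```python
import numpy as np
from sklearn.svm import SVC


def euclidean(X):
    """Pairwise Euclidean distances between the rows (samples) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def cosine(X):
    """Pairwise cosine dissimilarity, 1 - cos(angle between profiles)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - np.clip(Xn @ Xn.T, -1.0, 1.0)


def correlation(X):
    """Pairwise correlation dissimilarity, 1 - Pearson r between profiles."""
    return 1.0 - np.corrcoef(X)


def kernel_from_dissimilarity(D):
    """Gram matrix of the classical-scaling embedding of D.

    Negative eigenvalues (the non-Euclidean part of D) are clipped to zero;
    this is one common choice and an assumption here.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared dissimilarities
    w, V = np.linalg.eigh((B + B.T) / 2.0)    # symmetrise against round-off
    w = np.clip(w, 0.0, None)
    return (V * w) @ V.T


def majority_vote(predictions):
    """Combine +1/-1 predictions (one row per classifier) by majority."""
    return np.where(np.sum(predictions, axis=0) >= 0, 1, -1)


def combine_dissimilarity_svms(X, y, train, test, C=1.0):
    """Train one C-SVM per dissimilarity and vote on the test samples."""
    predictions = []
    for dissimilarity in (euclidean, cosine, correlation):
        K = kernel_from_dissimilarity(dissimilarity(X))
        clf = SVC(C=C, kernel="precomputed")
        clf.fit(K[np.ix_(train, train)], y[train])
        predictions.append(clf.predict(K[np.ix_(test, train)]))
    return majority_vote(np.array(predictions))


# Synthetic stand-in for expression data: 40 samples x 20 genes.
rng = np.random.default_rng(0)
y = np.repeat([1, -1], 20)                    # +1 = cancer, -1 = normal (assumed coding)
X = 5.0 + rng.normal(size=(40, 20))
X[y == 1, :5] += 3.0                          # five genes over-expressed in class +1
idx = rng.permutation(40)
train, test = idx[:30], idx[30:]

y_pred = combine_dissimilarity_svms(X, y, train, test)
print("test error:", np.mean(y_pred != y[test]))
```

An odd number of classifiers avoids ties in the vote. Because every classifier is trained on the full training set, diversity comes only from the choice of dissimilarity, which is the point of the strategy described above.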

Experimental results

The proposed algorithm has been tested on two benchmark datasets, Leukemia [6] and Breast Cancer [5].

Table 1 shows that the combination of dissimilarities significantly improves on the Euclidean distance, which is the measure used by most SVM algorithms. The combination also improves on the best single dissimilarity, which is χ². For the breast cancer dataset, false negative errors are significantly reduced. Experimental results are similar for the k-NN classifier.
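
For reference, the two error measures discussed here can be computed as in the sketch below. The label coding (+1 for cancer, -1 for normal) and the definition of the false negative rate as the fraction of cancer samples predicted normal are assumptions; the abstract does not state how the figures in Table 1 were normalised. The labels in the example are made up.

```python
def error_rates(y_true, y_pred, positive=1):
    """Return (misclassification rate, false negative rate).

    The false negative rate is the fraction of positive (cancer) samples
    that were predicted as not positive.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    errors = sum(t != p for t, p in pairs)
    positives = sum(t == positive for t in y_true)
    false_negatives = sum(t == positive and p != positive for t, p in pairs)
    fn_rate = false_negatives / positives if positives else 0.0
    return errors / len(pairs), fn_rate


# Made-up labels: 4 cancer samples (+1) and 4 normal samples (-1).
y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, -1, -1, -1, -1, -1, 1]
error, fn_rate = error_rates(y_true, y_pred)
print(error, fn_rate)  # 0.375 0.5
```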

Table 1

Conclusion

In this paper, we have proposed an ensemble of classifiers based on a diversity of dissimilarities. Experimental results suggest that the proposed method reduces both the misclassification errors and the false negative errors of classifiers based on a single dissimilarity.

References

1. Jiang D, Tang C, Zhang A: Cluster Analysis for Gene Expression Data: A Survey. IEEE Transactions on Knowledge and Data Engineering 2004, 16(11):1370–1386. 10.1109/TKDE.2004.68
2. Pekalska E, Paclík P, Duin R: A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research 2001, 2:175–211. 10.1162/15324430260185592
3. Kittler J, Hatef M, Duin PW, Matas J: On Combining Classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998, 20(3):226–239. 10.1109/34.667881
4. Valentini G, Dietterich T: Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods. Journal of Machine Learning Research 2004, 5:725–775.
5. West M, et al.: Predicting the Clinical Status of Human Breast Cancer by using Gene Expression Profiles. PNAS 2001, 98(20):11462–11467. 10.1073/pnas.201162998
6. Golub TR, et al.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 1999, 286(5439):531–537. 10.1126/science.286.5439.531

Author information

Authors and Affiliations

Computer Science Department, Universidad Pontificia de Salamanca, C/Compañía, 5, 37002, Salamanca, Spain
Ángela Blanco & Manuel Martín-Merino
Cancer Research Center (CIC-IBMCC, CSIC/USAL), Salamanca, Spain
Javier De Las Rivas

Corresponding author

Correspondence to Ángela Blanco.

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Blanco, Á., Martín-Merino, M. & De Las Rivas, J. Combining dissimilarity based classifiers for cancer prediction using gene expression profiles. BMC Bioinformatics 8 (Suppl 8), S3 (2007). https://doi.org/10.1186/1471-2105-8-S8-S3
