Methodology article (Open Access)

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Osama Mahmoud1,3*, Andrew Harrison1, Aris Perperoglou1, Asma Gul1, Zardad Khan1, Metodi V Metodiev2 and Berthold Lausen1

Author Affiliations

1 Department of Mathematical Sciences, University of Essex, Wivenhoe Park, CO4 3SQ Colchester, UK

2 School of Biological Sciences/Proteomics Unit, University of Essex, Wivenhoe Park, CO4 3SQ Colchester, UK

3 Department of Applied Statistics, Helwan University, Cairo, Egypt

BMC Bioinformatics 2014, 15:274  doi:10.1186/1471-2105-15-274

Published: 11 August 2014

Abstract

Background

Microarray technology, like other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier can be enhanced by basing the classification only on selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of how expression data overlap across classes. This method yields a novel measure of a feature's relevance to a classification task, called the proportional overlapping score (POS).

Results

We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. Classification error rates computed using the Random Forest, k Nearest Neighbor and Support Vector Machine classifiers show that POS achieves better performance than the compared methods.

Conclusions

A novel gene selection method, POS, is proposed. POS analyzes the overlap of expression values across classes, taking into account the proportions of overlapping samples. For each gene, it robustly defines a mask that minimizes the effect of expression outliers. The constructed masks, along with a novel gene score, are used to produce the selected subset of genes.
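
The idea of scoring a gene by the proportion of samples that fall in the overlap between classes can be illustrated with a minimal sketch. This is not the POS definition from the article: the function names, the use of Tukey fences (1.5 times the interquartile range) to trim outliers, and the restriction to two classes are all assumptions made for illustration only.

```python
import numpy as np


def class_interval(values, whisker=1.5):
    """Outlier-trimmed expression range of one class.

    Assumption: outliers are values beyond Tukey fences
    (Q1 - whisker*IQR, Q3 + whisker*IQR); the article may define
    its robust interval differently.
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    kept = values[(values >= lo) & (values <= hi)]
    return kept.min(), kept.max()


def overlap_score_and_mask(expr, labels):
    """Illustrative overlap score and mask for one gene, two classes.

    score: fraction of all samples whose expression lies inside the
           region where the two trimmed class intervals intersect
           (lower means the gene separates the classes better).
    mask:  boolean array, True for samples outside that region.
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    (a_lo, a_hi), (b_lo, b_hi) = (
        class_interval(expr[labels == c]) for c in classes
    )
    lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)

    if lo > hi:
        # The trimmed intervals do not intersect: no sample overlaps.
        in_overlap = np.zeros(expr.shape, dtype=bool)
    else:
        in_overlap = (expr >= lo) & (expr <= hi)

    return float(in_overlap.mean()), ~in_overlap


# Hypothetical expression values of one gene in eight samples.
expr = [1.0, 2.0, 3.0, 4.0, 3.5, 5.0, 6.0, 7.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
score, mask = overlap_score_and_mask(expr, labels)
```

In this sketch, genes would be ranked by ascending score, and the masks could be compared across genes to find a small subset that together covers as many samples as possible. How the article combines masks and scores is not specified in the abstract, so that step is left out here.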

Keywords:
Feature selection; Gene ranking; Microarray classification; Proportional overlap score; Gene mask; Minimum subset of genes