Open Access Methodology article

A feature selection method for classification within functional genomics experiments based on the proportional overlapping score

Osama Mahmoud1,3*, Andrew Harrison1, Aris Perperoglou1, Asma Gul1, Zardad Khan1, Metodi V Metodiev2 and Berthold Lausen1

Author Affiliations

1 Department of Mathematical Sciences, University of Essex, Wivenhoe Park, CO4 3SQ Colchester, UK

2 School of Biological Sciences/Proteomics Unit, University of Essex, Wivenhoe Park, CO4 3SQ Colchester, UK

3 Department of Applied Statistics, Helwan University, Cairo, Egypt


BMC Bioinformatics 2014, 15:274  doi:10.1186/1471-2105-15-274

Published: 11 August 2014



Microarray technology, as well as other functional genomics experiments, allows simultaneous measurement of thousands of genes within each sample. Both the prediction accuracy and the interpretability of a classifier could be enhanced by performing the classification using only selected discriminative genes. We propose a statistical method for selecting genes based on an analysis of the overlap of expression data across classes. This method yields a novel measure of a feature’s relevance to a classification task, called the proportional overlapping score (POS).
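To make the idea of "overlap of expression data across classes" concrete, the sketch below computes a simplified per-gene overlap proportion for a two-class problem. It is not the exact POS formula. It assumes, for illustration only, that each class's robust "core" range is given by Tukey's fences (Q1 - 1.5 IQR, Q3 + 1.5 IQR) clipped to the observed values, and it scores a gene by the fraction of samples lying inside the intersection of the two core ranges. The function names `core_interval` and `overlap_proportion` are hypothetical.

```python
from statistics import quantiles


def core_interval(values):
    """Assumed robust core range of one class: Tukey's fences clipped to the observed min/max."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lo = max(min(values), q1 - 1.5 * iqr)
    hi = min(max(values), q3 + 1.5 * iqr)
    return lo, hi


def overlap_proportion(class_a, class_b):
    """Fraction of all samples whose expression falls inside BOTH classes' core ranges.

    Lower values suggest a more discriminative gene. This is a simplified
    illustration, not the paper's proportional overlapping score.
    """
    a_lo, a_hi = core_interval(class_a)
    b_lo, b_hi = core_interval(class_b)
    lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
    if lo > hi:  # core ranges are disjoint, so there is no overlap region
        return 0.0
    samples = list(class_a) + list(class_b)
    inside = sum(lo <= x <= hi for x in samples)
    return inside / len(samples)
```

For example, `overlap_proportion([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])` returns 0.6, because six of the ten samples lie in the shared range [3, 5]. Using quartile-based core ranges rather than raw min/max ranges means a single extreme value in one class does not by itself create an apparent overlap.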


We apply POS, along with four widely used gene selection methods, to several benchmark gene expression datasets. The experimental results, in terms of classification error rates computed using the Random Forest, k-Nearest Neighbor and Support Vector Machine classifiers, show that POS achieves better performance than the compared methods.
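As a rough illustration of how a classification error rate can be computed on a selected gene subset, the sketch below estimates the leave-one-out error of a k-nearest-neighbour classifier restricted to chosen gene columns. The abstract does not state the resampling scheme used in the experiments, so leave-one-out is an assumption here, and only k-nearest neighbours is shown (not Random Forest or Support Vector Machine). The names `knn_predict` and `loo_error_rate` are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    dists = sorted(
        ((math.dist(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_error_rate(X, y, genes, k=3):
    """Leave-one-out error rate of kNN using only the gene columns listed in `genes`."""
    Xs = [[row[g] for g in genes] for row in X]
    errors = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, Xs[i], k) != y[i]:
            errors += 1
    return errors / len(Xs)
```

Comparing `loo_error_rate` across gene subsets produced by different selection methods mirrors, in miniature, the kind of comparison described above; an odd `k` reduces the chance of tied votes in two-class problems.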


A novel gene selection method, POS, is proposed. POS analyzes the overlap of gene expression across classes, taking into account the proportions of overlapping samples. It defines a robust mask for each gene that minimizes the effect of expression outliers. The constructed masks, along with a novel gene score, are exploited to produce the selected subset of genes.
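The abstract does not give the exact mask definition or selection procedure, so the following is only a sketch under stated assumptions. It assumes a gene's mask marks with 1 each sample lying outside that gene's overlap region (a sample the gene places unambiguously) and with 0 each sample inside it, and it then uses a greedy set-cover heuristic, a standard approach to approximating a minimum cover, to pick genes until every sample is covered by some mask, breaking ties by the lower (better) score. The functions `build_mask` and `greedy_min_subset` are hypothetical and may differ from the paper's procedure.

```python
def build_mask(values, overlap_lo, overlap_hi):
    """Assumed mask: 1 for samples outside the overlap region [overlap_lo, overlap_hi], else 0."""
    return [0 if overlap_lo <= x <= overlap_hi else 1 for x in values]


def greedy_min_subset(masks, scores):
    """Greedy set cover over gene masks.

    masks:  dict gene -> list of 0/1 flags, one per sample
    scores: dict gene -> score (lower is assumed better, used to break ties)
    Returns genes in the order they were selected.
    """
    if not masks:
        return []
    n = len(next(iter(masks.values())))
    uncovered = set(range(n))
    selected = []
    while uncovered:
        # Gene covering the most still-uncovered samples; ties go to the lower score.
        best = max(
            masks,
            key=lambda g: (sum(1 for i in uncovered if masks[g][i]), -scores[g]),
        )
        newly_covered = {i for i in uncovered if masks[best][i]}
        if not newly_covered:  # remaining samples fall in every gene's overlap region
            break
        selected.append(best)
        uncovered -= newly_covered
    return selected
```

For instance, with masks `{"g1": [1, 1, 0, 0], "g2": [0, 0, 1, 1], "g3": [1, 0, 1, 0]}` and scores `{"g1": 0.1, "g2": 0.2, "g3": 0.3}`, the greedy pass selects `["g1", "g2"]`. Greedy set cover is not guaranteed to find the smallest possible subset, but it is simple and fast on thousands of genes.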

Keywords: Feature selection; Gene ranking; Microarray classification; Proportional overlap score; Gene mask; Minimum subset of genes