Open Access Methodology article

Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data

Florent Baty1*, Michel P Bihl1, Guy Perrière2, Aedín C Culhane3 and Martin H Brutsche1

Author Affiliations

1 Pulmonary Gene Research, University Hospital Basel, CH-4031 Basel, Switzerland

2 Laboratoire de Biométrie et de Biologie Évolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 blvd du 11 Novembre, 1918, 69622 Villeurbanne Cedex, France

3 Bioinformatics Conway Institute, University College Dublin, Ireland

BMC Bioinformatics 2005, 6:239  doi:10.1186/1471-2105-6-239

Published: 28 September 2005

Abstract

Background

A recent publication described a supervised classification method for microarray data: Between Group Analysis (BGA). This method, which is based on multivariate ordination of groups, proved to be very efficient for both classifying samples into pre-defined groups and predicting the disease class of new, unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes, and no variable selection is required. We hypothesize that an optimized selection of highly discriminating genes might improve the predictive power of BGA.

Results

We propose an optimized between-group classification (OBC), which uses a jackknife-based gene selection procedure. OBC emphasizes classification accuracy rather than feature selection. It is a backward optimization procedure that maximizes the percentage of between-group inertia by removing the least influential genes one by one from the analysis. This selects a subset of highly discriminative genes that optimizes disease class prediction. We applied OBC to four datasets and compared it to other classification methods.
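The backward elimination described above can be illustrated with a minimal sketch. It is not the authors' R implementation (Additional File 1), and it rests on several assumptions: the data are only column-centred with uniform sample weights (a PCA-like setting), so the between-group inertia ratio is the between-group sum of squares divided by the total sum of squares over the retained genes; the "least influential" gene is taken to be the one whose leave-one-out removal yields the highest ratio; and the names `inertia_parts`, `between_ratio`, `backward_select` and the `min_genes` stopping rule are hypothetical.

```python
def inertia_parts(data, labels):
    """Per-gene between-group and total sums of squares (uniform weights)."""
    n_genes = len(data[0])
    groups = sorted(set(labels))
    between = [0.0] * n_genes
    total = [0.0] * n_genes
    for j in range(n_genes):
        col = [row[j] for row in data]
        mean = sum(col) / len(col)
        total[j] = sum((x - mean) ** 2 for x in col)
        for g in groups:
            vals = [x for x, lab in zip(col, labels) if lab == g]
            gmean = sum(vals) / len(vals)
            between[j] += len(vals) * (gmean - mean) ** 2
    return between, total


def between_ratio(between, total, genes):
    """Share of total inertia explained by the groups, for a gene subset."""
    t = sum(total[j] for j in genes)
    return sum(between[j] for j in genes) / t if t > 0 else 0.0


def backward_select(data, labels, min_genes=1):
    """Drop one gene per step; return the best subset seen and its ratio."""
    between, total = inertia_parts(data, labels)
    kept = list(range(len(data[0])))
    best_subset = list(kept)
    best_ratio = between_ratio(between, total, kept)
    while len(kept) > min_genes:
        # Jackknife over genes: recompute the ratio with each gene left out.
        candidates = [
            (between_ratio(between, total, [k for k in kept if k != j]), j)
            for j in kept
        ]
        ratio, drop = max(candidates)  # removal that helps the ratio most
        kept.remove(drop)
        if ratio > best_ratio:
            best_ratio, best_subset = ratio, list(kept)
    return best_subset, best_ratio


# Toy data (invented for illustration): 4 samples x 3 genes, two groups.
data = [
    [1.0, 5.0, 2.0],
    [1.2, 1.0, 2.1],
    [3.0, 4.0, 2.0],
    [3.2, 2.0, 1.9],
]
labels = ["a", "a", "b", "b"]

b, t = inertia_parts(data, labels)
full_ratio = between_ratio(b, t, [0, 1, 2])
subset, ratio = backward_select(data, labels)
print(subset, round(ratio, 3), round(full_ratio, 3))
```

In this toy example only the first gene separates the two groups, so eliminating the other two raises the between-group inertia ratio. A real analysis would presumably judge the retained subset by cross-validated or independent-set prediction accuracy, as the abstract indicates, rather than by the ratio alone.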

Conclusion

OBC considerably improved the classification and predictive accuracy of BGA, when assessed using independent data sets and leave-one-out cross-validation.

Availability

The R code is freely available [see Additional File 1], as is supplementary information [see Additional File 2].

Additional File 1. R code of the OBC algorithm.

Additional File 2. Further description of the sarcoidosis and tumour data. This file gives details about the optimal subset of genes obtained after OBC.