Open Access Research article

To aggregate or not to aggregate high-dimensional classifiers

Cheng-Jian Xu, Huub CJ Hoefsloot* and Age K Smilde

Author Affiliations

Biosystems Data Analysis group, University of Amsterdam, P.O. Box 94215 1090 GE Amsterdam, The Netherlands

BMC Bioinformatics 2011, 12:153  doi:10.1186/1471-2105-12-153

Published: 13 May 2011

Abstract

Background

High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate methods for the analysis of this kind of data.

Results

Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, was selected as an example of a base learner. The multiple versions of the PCDA model obtained from repeated double cross-validation were aggregated, and the final classification was made by majority voting. The performance of this approach was evaluated on simulated, genomics, proteomics and metabolomics data sets.
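The aggregation scheme can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several simplifying assumptions: PCDA is taken as PCA followed by LDA on the component scores, with the number of components fixed at one (so the inner cross-validation loop that would choose it is omitted), equal class priors, and only the outer loop of repeated k-fold cross-validation used to produce the model versions. All function names and the synthetic data set are hypothetical.

```python
import random
from collections import Counter


def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def leading_pc(rows, rng, iters=50):
    """Mean and leading principal component of `rows` by power iteration.

    Computes X^T (X v) so the p x p covariance matrix is never formed,
    which matters when p (variables) is much larger than n (samples)."""
    mu = column_means(rows)
    xc = [[a - m for a, m in zip(r, mu)] for r in rows]
    v = [rng.gauss(0.0, 1.0) for _ in mu]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(r, v)) for r in xc]
        w = [sum(s * r[j] for s, r in zip(scores, xc)) for j in range(len(mu))]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mu, v


def fit_pcda(rows, labels, rng):
    """Assumed one-component PCDA: project on PC1, then 1-D LDA.

    With equal priors and a pooled variance, 1-D LDA assigns a sample to
    the class whose mean score is nearest, so only class means are kept."""
    mu, v = leading_pc(rows, rng)

    def score(x):
        return sum((a - m) * b for a, m, b in zip(x, mu, v))

    means = {}
    for c in sorted(set(labels)):
        s = [score(x) for x, lab in zip(rows, labels) if lab == c]
        means[c] = sum(s) / len(s)
    return score, means


def predict_pcda(model, x):
    score, means = model
    t = score(x)
    return min(means, key=lambda c: abs(t - means[c]))


def aggregate_models(rows, labels, repeats=10, folds=5, seed=0):
    """Train one PCDA model per outer training split of repeated k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    models = []
    for _ in range(repeats):
        rng.shuffle(idx)
        for k in range(folds):
            held_out = set(idx[k::folds])
            train = [i for i in idx if i not in held_out]
            models.append(fit_pcda([rows[i] for i in train],
                                   [labels[i] for i in train], rng))
    return models


def predict_vote(models, x):
    """Majority vote over all model versions (ties go to the first label seen)."""
    votes = Counter(predict_pcda(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: 20 samples, 50 variables; class 1 is shifted in 5 variables.
data_rng = random.Random(1)
p = 50
X, y = [], []
for i in range(20):
    c = i % 2
    X.append([(3.0 if c == 1 and j < 5 else 0.0) + data_rng.gauss(0.0, 0.1)
              for j in range(p)])
    y.append(c)

models = aggregate_models(X, y)
print(len(models))  # 10 repeats x 5 folds = 50 model versions
print(predict_vote(models, [0.0] * p), predict_vote(models, [3.0] * 5 + [0.0] * 45))
```

Voting over many model versions, rather than refitting one final model, is what makes the spread of the individual models visible: inspecting the `Counter` returned inside `predict_vote` shows how unanimous the versions are for a given sample.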

Conclusions

The aggregated PCDA learner can improve prediction performance, provide more stable results, and help assess the variability of the models. The disadvantages and limitations of aggregating are also discussed.