Open Access Methodology article

Dimension reduction with gene expression data using targeted variable importance measurement

Hui Wang1* and Mark J van der Laan2

Author affiliations

1 Department of Pediatrics, Stanford University, MSOB X111, Stanford, CA 94305, USA

2 Division of Biostatistics, University of California Berkeley, 101 Haviland Hall, Berkeley, CA 94720, USA

Citation and License

BMC Bioinformatics 2011, 12:312  doi:10.1186/1471-2105-12-312

Published: 29 July 2011

Abstract

Background

When a large number of candidate variables are present, a dimension reduction procedure is usually conducted to reduce the variable space before the subsequent analysis is carried out. The goal of dimension reduction is to find a list of candidate genes of more manageable length that ideally includes all the relevant genes. Leaving many uninformative genes in the analysis can lead to biased estimates and reduced power. Dimension reduction is therefore often considered a necessary precursor to the analysis: it not only reduces the cost of handling numerous variables, but also has the potential to improve the performance of the downstream analysis algorithms.

Results

We propose a TMLE-VIM dimension reduction procedure based on the variable importance measurement (VIM) in the framework of targeted maximum likelihood estimation (TMLE). TMLE is an extension of maximum likelihood estimation that targets the parameter of interest. TMLE-VIM is a two-stage procedure. The first stage uses a machine learning algorithm to obtain an initial estimate, and the second stage improves the first-stage estimate with respect to the parameter of interest.
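
The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: a partially linear working model E[Y | A, W] = beta*A + f(W) in which beta plays the role of the importance of variable A, a ridge regression as a stand-in for the first-stage machine learning algorithm, an ordinary least squares fit of E[A | W], and a one-step update along the covariate A - E[A | W]. All function names and the simulated data are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X, y, lam):
    """Ridge regression on centered data; returns (coefficients, fitted values)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + lam * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean + Xc @ coef


def tmle_vim_sketch(A, W, Y, lam):
    """Return (initial, targeted) estimates of beta under the assumed working model."""
    # Stage 1: initial estimate of E[Y | A, W] from a (stand-in) learner.
    # The ridge penalty shrinks the coefficient of A, so beta0 is biased.
    coef, Q0 = ridge_fit_predict(np.column_stack([A, W]), Y, lam)
    beta0 = coef[0]

    # Stage 2: targeting step. Estimate E[A | W] (lam=0 gives plain OLS),
    # then regress the stage-1 residual on h = A - E[A | W].
    _, g = ridge_fit_predict(W, A, 0.0)
    h = A - g
    eps = np.sum(h * (Y - Q0)) / np.sum(h * h)
    return beta0, beta0 + eps


def simulate(n=5000, seed=0):
    """Hypothetical data: A is correlated with W[:, 0]; the true beta is 2."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, 3))
    A = 0.8 * W[:, 0] + rng.normal(size=n)
    Y = 2.0 * A + 1.5 * W[:, 0] - W[:, 1] + rng.normal(size=n)
    return A, W, Y


if __name__ == "__main__":
    A, W, Y = simulate()
    beta0, beta1 = tmle_vim_sketch(A, W, Y, lam=5000.0)
    print(f"stage 1 (initial): {beta0:.3f}")
    print(f"stage 2 (targeted): {beta1:.3f}")
```

In this simulation the heavily penalized first-stage fit shrinks the coefficient of A away from its true value, and the second-stage update moves the estimate back toward it. Ranking candidate genes would then amount to repeating the procedure with each gene in turn taking the role of A.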

Conclusions

We demonstrate with simulations and data analyses that our approach not only retains the predictive power of machine learning algorithms, but also accounts for the correlation structure among variables and therefore produces better variable rankings. When used for dimension reduction, TMLE-VIM can help to obtain the shortest possible list containing the largest number of truly associated variables.