BMC Bioinformatics

Open Access, Highly Accessed Methodology article

Bias in random forest variable importance measures: Illustrations, sources and a solution

Carolin Strobl1*, Anne-Laure Boulesteix2, Achim Zeileis3 and Torsten Hothorn4

Author Affiliations

1 Institut für Statistik, Ludwig-Maximilians-Universität München, Ludwigstr. 33, 80539 München, Germany

2 Institut für medizinische Statistik und Epidemiologie, Technische Universität München, Ismaningerstr. 22, 81675 München, Germany

3 Department für Statistik und Mathematik, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria

4 Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 6, D-91054 Erlangen, Germany

BMC Bioinformatics 2007, 8:25 doi:10.1186/1471-2105-8-25

Published: 25 January 2007

Additional files

Additional File 1:

R source code. The example R source code contains all function calls used in the simulation studies and in the application to C-to-U conversion data, with comments on all important options of the randomForest and cforest functions. Please install the latest versions of the randomForest and party packages before use.

Format: R. Size: 8 KB.
