Research article

Translating microarray data for diagnostic testing in childhood leukaemia

Katrin Hoffmann1*, Martin J Firth2, Alex H Beesley1, Nicholas H de Klerk2 and Ursula R Kees1

Author Affiliations

1 Division of Children's Leukaemia and Cancer Research, Telethon Institute for Child Health Research and Centre for Child Health Research, The University of Western Australia, Perth, Australia

2 Division of Biostatistics and Genetic Epidemiology, Telethon Institute for Child Health Research and Centre for Child Health Research, The University of Western Australia, Perth, Australia


BMC Cancer 2006, 6:229 doi:10.1186/1471-2407-6-229

Published: 26 September 2006



Recent findings from microarray studies have raised the prospect of a standardized diagnostic gene expression platform to improve the accuracy of diagnosis and risk stratification in paediatric acute lymphoblastic leukaemia (ALL). However, both the robustness and the format of such a diagnostic test remain to be determined. As a step towards clinical application of these findings, we have systematically analyzed a published ALL microarray data set using Robust Multi-array Analysis (RMA) and Random Forest (RF).
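RMA background-corrects probe-level intensities, quantile-normalizes them across arrays and summarizes each probe set by median polish on the log2 scale. As a rough illustration of the normalization step only, the sketch below quantile-normalizes a few hypothetical arrays in pure Python. It is an assumption-laden toy, not the Bioconductor `affy::rma` implementation: the function name `quantile_normalize` is hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (hypothetical helper).

    Each array is a list of (already log2-transformed) probe intensities;
    all arrays must have the same length. Ties are broken by position.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n_probes)]
    normalized = []
    for a in arrays:
        # Replace each value by the reference value at the same rank.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After normalization every array has the same distribution of values, which removes array-wide intensity differences while preserving the rank order of probes within each array.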


We examined published microarray data from 104 ALL patient specimens representing six subgroups defined by cytogenetic features and immunophenotype. Using the decision-tree-based supervised learning algorithm Random Forest (RF), we identified a small set of genes for optimal subgroup distinction and subsequently validated their predictive power in an independent patient cohort.
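The select-then-validate workflow described above can be illustrated with a minimal pure-Python random forest (bootstrap samples, a random subset of genes tried at each split, Gini impurity). This is a sketch under stated assumptions, not the authors' pipeline: genes are ranked by Gini importance (mean decrease in impurity), which may differ from the importance measure used in the study; tree depth, number of trees and `k` are arbitrary; all function names are hypothetical; and `make_toy_cohort` produces hypothetical synthetic data, not patient specimens. The essential design point is that genes are ranked on the training cohort only, so the independent cohort gives an unbiased accuracy estimate.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, idx, n_feat, rng, importance, depth=0, max_depth=5):
    """Grow one tree on rows idx. Leaves are labels; inner nodes are
    (gene, threshold, left, right). Accumulates Gini importance per gene."""
    labels = [y[i] for i in idx]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    parent = gini(labels)
    best = None  # (gain, gene, threshold, left_rows, right_rows)
    for f in rng.sample(range(len(X[0])), n_feat):  # random gene subset
        values = sorted({X[i][f] for i in idx})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            child = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t, left, right)
    if best is None or best[0] <= 0:
        return majority
    gain, f, t, left, right = best
    importance[f] += gain * len(idx)  # impurity decrease weighted by node size
    return (f, t,
            build_tree(X, y, left, n_feat, rng, importance, depth + 1, max_depth),
            build_tree(X, y, right, n_feat, rng, importance, depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def random_forest(X, y, n_trees=50, seed=0):
    rng = random.Random(seed)
    n_feat = max(1, int(len(X[0]) ** 0.5))  # about sqrt(p) genes per split
    importance = [0.0] * len(X[0])
    trees = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap rows
        trees.append(build_tree(X, y, boot, n_feat, rng, importance))
    return trees, importance


def predict(trees, x):
    """Majority vote across trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


def select_and_validate(X_train, y_train, X_test, y_test, k, seed=0):
    """Rank genes on the training cohort only, keep the top k, retrain on
    them and report accuracy on the independent test cohort."""
    _, imp = random_forest(X_train, y_train, seed=seed)
    top = sorted(range(len(imp)), key=lambda f: imp[f], reverse=True)[:k]

    def subset(X):
        return [[row[f] for f in top] for row in X]

    trees, _ = random_forest(subset(X_train), y_train, seed=seed)
    preds = [predict(trees, x) for x in subset(X_test)]
    acc = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    return top, acc


def make_toy_cohort(n_per_class, n_genes, seed):
    """Hypothetical synthetic data: class number c over-expresses gene c."""
    rng = random.Random(seed)
    X, y = [], []
    for c, label in enumerate(["A", "B", "C"]):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            row[c] += 5.0
            X.append(row)
            y.append(label)
    return X, y
```

A usage sketch: build a training cohort with `make_toy_cohort(20, 20, seed=1)`, an "independent" cohort with a different seed, and call `select_and_validate(..., k=3)`; in this toy setting the informative genes 0, 1 and 2 should dominate the ranking.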


We achieved very high overall ALL subgroup prediction accuracies of about 98% and verified the robustness of these genes in an independent panel of 68 specimens obtained from a different institution and processed in a different laboratory. Our study established that the selection of discriminating genes is strongly dependent on the analysis method. This may have profound implications for clinical use, particularly when the classifier is reduced to a small set of genes. We have demonstrated that as few as 26 genes yield accurate class prediction and, importantly, almost 70% of these genes have not previously been identified as essential for distinguishing the six ALL subgroups.
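An overall accuracy figure can hide weaker performance on a small subgroup, so it is commonly reported alongside per-subgroup accuracy. A minimal sketch, assuming lists of true and predicted subgroup labels for the specimens of a validation cohort; the function name `subgroup_accuracy` and the labels `"A"` and `"B"` are hypothetical placeholders, not the study's subgroups or results.

```python
from collections import defaultdict


def subgroup_accuracy(true_labels, predicted):
    """Return (overall accuracy, {subgroup: accuracy}) for one cohort."""
    if len(true_labels) != len(predicted):
        raise ValueError("label lists must have equal length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(true_labels, predicted):
        total[t] += 1
        correct[t] += (t == p)  # True counts as 1
    overall = sum(correct.values()) / len(true_labels)
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


print(subgroup_accuracy(["A", "A", "B", "B"], ["A", "B", "B", "B"]))
# → (0.75, {'A': 0.5, 'B': 1.0})
```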


Our findings support the feasibility of qRT-PCR technology for standardized diagnostic testing in paediatric ALL and should, in conjunction with conventional cytogenetics, lead to a more accurate classification of the disease. In addition, we have demonstrated that microarray findings from one study can be confirmed in an independent study, using an entirely independent patient cohort and with microarray experiments performed by a different research team.