Methodology article

Prediction of the outcome of preoperative chemotherapy in breast cancer using DNA probes that provide information on both complete and incomplete responses

René Natowicz1, Roberto Incitti2, Euler Guimarães Horta1,3, Benoît Charles1, Philippe Guinot1, Kai Yan4, Charles Coutant5, Fabrice Andre6, Lajos Pusztai4 and Roman Rouzier5,7*

Author Affiliations

1 University of Paris – Est. ESIEE-Paris, Computer Sciences Department. Cité Descartes BP. 99, 93162 Noisy-le-Grand, France

2 Université Paris 12, Faculté de Médecine, Institut Mondor de Médecine Moléculaire (IFR10), Créteil, F-94000, France

3 Federal University of Minas Gerais, Brazil, Departamento de Engenharia Eletronica, Campus da UFMG (Pampulha), Av. Antonio Carlos, 6627, Belo Horizonte, MG 31270-901, Brazil

4 University of Texas M.D. Anderson Cancer Center, Department of Breast Medical Oncology, Unit 1354, PO Box 301439, Houston, Texas, USA

5 AP-HP, Hôpital Tenon, Department of Gynecology, 4 rue de la Chine, F-75020 Paris, France

6 Institut Gustave Roussy, Breast Cancer Unit, 39 rue Desmoulins, 94805 Villejuif, Cedex, France

7 UPMC Univ Paris 06, UPRES EA 4053, F-75005, Paris, France

BMC Bioinformatics 2008, 9:149  doi:10.1186/1471-2105-9-149

Published: 15 March 2008



DNA microarray technology has emerged as a major tool for exploring cancer biology and addressing clinical questions. Predicting a patient's response to chemotherapy is one such question; successful prediction would make it possible to give each patient the most appropriate chemotherapy regimen. Patient response can be classified as either a pathologic complete response (PCR) or residual disease (NoPCR), and this classification correlates strongly with patient outcome. Microarrays can be used as multigenic predictors of patient response, but probe selection remains problematic. In this study, each probe set was considered as an elementary predictor of the response and was ranked on its ability to predict a high number of PCR and NoPCR cases in a ratio similar to that seen in the learning set. We defined a valuation function that assigned high values to probe sets according to how different the expression of the genes was between PCR and NoPCR cases and how closely the relative proportions of PCR and NoPCR predictions matched the proportions observed in the learning set. Multigenic predictors were designed by selecting probe sets that ranked highly on their predictions, and were tested using several validation sets.
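The ranking idea described above can be illustrated with a minimal sketch. It is not the authors' valuation function, whose exact form is not given here. It assumes that each probe set emits, for every learning case, a prediction of "PCR", "NoPCR", or None (no prediction), and it scores a probe set by the average of the fractions of PCR and NoPCR cases it predicts correctly. The names valuation and rank_probe_sets are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def valuation(predictions: Sequence[Optional[str]], labels: Sequence[str]) -> float:
    """Score one probe set on the learning set.

    ASSUMPTION (illustrative form, not taken from the article): the score is
    the mean of the fraction of PCR cases and the fraction of NoPCR cases
    that the probe set predicts correctly. A probe set scores high only if
    it predicts many cases of both classes, each relative to its class size.
    """
    n_pcr = sum(1 for y in labels if y == "PCR")
    n_nopcr = sum(1 for y in labels if y == "NoPCR")
    if n_pcr == 0 or n_nopcr == 0:
        raise ValueError("learning set must contain both PCR and NoPCR cases")

    # Count correct predictions per class; None means "no prediction".
    hit_pcr = sum(1 for p, y in zip(predictions, labels) if p == y == "PCR")
    hit_nopcr = sum(1 for p, y in zip(predictions, labels) if p == y == "NoPCR")
    return 0.5 * (hit_pcr / n_pcr + hit_nopcr / n_nopcr)


def rank_probe_sets(
    probe_predictions: Dict[str, Sequence[Optional[str]]],
    labels: Sequence[str],
) -> List[str]:
    """Return probe set names sorted from highest to lowest valuation."""
    return sorted(
        probe_predictions,
        key=lambda name: valuation(probe_predictions[name], labels),
        reverse=True,
    )
```

Under these assumptions, a multigenic predictor would be built from the first k names returned by rank_probe_sets, with k chosen on the learning set.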


Our method defined three types of probe sets: 71% were mono-informative (59% predicted only NoPCR and 12% predicted only PCR), 25% were bi-informative, and 4% were non-informative. Ranking the probe sets with the valuation function allowed us to select those that correctly predicted the response of a high number of patient cases in the learning set and that predicted a PCR/NoPCR ratio for the validation sets similar to that of the whole learning set. With both diagonal linear discriminant analysis (DLDA) and the nearest centroid method, bi-informative probes proved more successful predictors than probes selected using a t-test.
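The three probe set types can be made concrete with a short sketch. The criterion used here is an assumption, not the article's definition: a probe set counts as informative for a class if it correctly predicts at least one learning case of that class. The function names probe_type and type_shares are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def probe_type(hit_pcr: int, hit_nopcr: int) -> str:
    """Classify a probe set from its correct-prediction counts.

    ASSUMPTION: "informative for a class" means at least one correctly
    predicted learning case of that class.
    """
    if hit_pcr > 0 and hit_nopcr > 0:
        return "bi-informative"
    if hit_pcr > 0:
        return "mono-informative (PCR)"
    if hit_nopcr > 0:
        return "mono-informative (NoPCR)"
    return "non-informative"


def type_shares(hits: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of probe sets of each type, given (hit_pcr, hit_nopcr) pairs."""
    counts = Counter(probe_type(p, n) for p, n in hits)
    total = sum(counts.values())
    return {kind: 100.0 * c / total for kind, c in counts.items()}
```

Applied to every probe set on the array, type_shares would produce a breakdown of the same kind as the 59%, 12%, 25% and 4% figures reported above, although the exact values depend on the criterion the authors actually used.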


Prediction of the response to breast cancer preoperative chemotherapy was significantly improved by selecting DNA probe sets that were successful in predicting outcomes for the entire learning set, both in terms of accurately predicting a high number of cases and in correctly predicting the ratio of PCR to NoPCR cases.