
Open Access Highly Accessed Research article

A gene expression predictor of response to EGFR-targeted therapy stratifies progression-free survival to cetuximab in KRAS wild-type metastatic colorectal cancer

Justin M Balko and Esther P Black*

BMC Cancer 2009, 9:145  doi:10.1186/1471-2407-9-145


Model refinement on test data constitutes a dangerous case of overfitting

Olivier Gevaert   (2009-08-05 10:09)  University of Leuven, Belgium

The reviewers already pointed out that the model refinement in this article is a dangerous case of overfitting, yet the authors still overemphasize that the refinement improves their model's prediction of progression-free survival (PFS). In this article, the authors report that a 180-gene signature, developed on lung data with a different EGFR inhibitor, predicts therapy response in metastatic colorectal cancer independently of KRAS mutation status. This is an important finding, because it shows that the original signature does not depend on the disease or on the particular anti-EGFR agent. Next, the researchers attempted to reduce the number of genes in the original signature to a number manageable with methods other than microarrays, such as qRT-PCR. This is obviously an important step, because a smaller set of genes will facilitate clinical application of the signature. However, the authors should have stopped there and should not have emphasized that the reduction improves predictive accuracy. The improvement is trivial, because information from the test set was used to reduce the 180-gene signature to a set of 26 genes. Thus, the results presented in Figure 3 and Table 3 correspond to training set performance. The true performance of the 26-gene signature is therefore not known at this moment and can only be verified on another independent test set.
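
The point about reusing the test set can be illustrated with a small simulation. What follows is a minimal sketch and not the authors' method or data. It assumes a binary responder label in place of PFS, purely random expression values with no true signal, and a simple sign-weighted score as the classifier. The function name `simulate` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def simulate(n_samples=40, n_genes=180, n_keep=26, seed=0):
    """Select genes on a cohort, then score that cohort and a fresh one.

    All data are random noise, so any accuracy above chance on the
    cohort used for gene selection is purely an artifact of selection.
    """
    rng = random.Random(seed)

    def make_cohort():
        # Expression values are independent of the label by construction.
        X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_samples)]
        y = [i % 2 for i in range(n_samples)]  # balanced hypothetical labels
        return X, y

    def mean_diff(X, y, g):
        # Difference in mean expression of gene g between the two classes.
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        return sum(pos) / len(pos) - sum(neg) / len(neg)

    def accuracy(X, y, genes, signs):
        # Classify by the sign of a sign-weighted sum over the kept genes.
        correct = 0
        for row, label in zip(X, y):
            score = sum(s * row[g] for g, s in zip(genes, signs))
            correct += int((score > 0) == (label == 1))
        return correct / len(y)

    X_test, y_test = make_cohort()  # the "test" set that gets reused
    X_new, y_new = make_cohort()    # a truly independent cohort

    # Gene reduction using test-set outcomes: this is the leakage step.
    diffs = [mean_diff(X_test, y_test, g) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: abs(diffs[g]), reverse=True)
    genes = ranked[:n_keep]
    signs = [1 if diffs[g] > 0 else -1 for g in genes]

    reused = accuracy(X_test, y_test, genes, signs)
    fresh = accuracy(X_new, y_new, genes, signs)
    return reused, fresh


if __name__ == "__main__":
    reused, fresh = simulate()
    print(f"accuracy on the set used for gene selection: {reused:.2f}")
    print(f"accuracy on an independent set:             {fresh:.2f}")
```

Because the simulated genes carry no information, the reduced signature should perform near chance on the independent cohort while appearing accurate on the cohort used to select it. That gap is the reason performance measured on the selection set has to be read as training set performance.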

Competing interests

I have no financial or non-financial competing interests in this subject.

