Open Access Research article

Statistical techniques to construct assays for identifying likely responders to a treatment under evaluation from cell line genomic data

Erich P Huang1, Jane Fridlyand2, Nicholas Lewin-Koh2, Peng Yue2, Xiaoyan Shi2, David Dornan2 and Bart Burington2*

Author Affiliations

1 Biometric Research Branch, Department of Cancer Treatment and Diagnosis, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA

2 Genentech, Inc., South San Francisco, CA 94080, USA

BMC Cancer 2010, 10:586  doi:10.1186/1471-2407-10-586

Published: 27 October 2010



Background

Developing the right drugs for the right patients has become a mantra of drug development. In practice, however, it is very difficult to identify the subset of patients who will respond to a drug under evaluation. Most of the time, no single diagnostic will be available, and more complex decision rules, based for instance on mRNA expression, protein expression or DNA copy number, will be required to define a sensitive population. Moreover, diagnostic development will often begin with in vitro cell line data and a high-dimensional exploratory platform, and only later be transferred to a diagnostic assay for use with patient samples. In this manuscript, we present a novel approach to developing robust genomic predictors that not only generalize from in vitro data to patients, but are also amenable to clinically validated assays such as qRT-PCR.


Methods

Using our approach, we constructed a predictor of sensitivity to dacetuzumab, an investigational drug for CD40-expressing malignancies such as lymphoma, from genomic measurements of cell lines treated with dacetuzumab. Additionally, we evaluated several state-of-the-art prediction methods by independently pairing the feature selection and classification components of the predictor. In this way, we constructed several predictors, which we validated on an independent diffuse large B-cell lymphoma (DLBCL) patient dataset. Similar analyses were performed on genomic measurements of breast cancer cell lines and patients to construct a predictor of estrogen receptor (ER) status.
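The idea of independently pairing feature selection with classification can be sketched as a small grid: every selector is combined with every classifier, each pair is trained on one dataset (standing in for cell lines) and scored on a separate one (standing in for patients). The sketch below is illustrative only and is not the method evaluated in the article: the data are synthetic, the two selectors (mean difference, variance) and two classifiers (nearest centroid, one nearest neighbor) are simple stand-ins, and every function and variable name is hypothetical.

```python
import itertools
import random
import statistics


def make_data(n_samples, n_features, n_informative, shift, seed):
    """Synthetic data: the first n_informative features are shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so both classes are present
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def select_by_mean_difference(X, y, k):
    """Supervised selector: top k features by absolute difference in class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(statistics.mean(pos) - statistics.mean(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def select_by_variance(X, y, k):
    """Unsupervised selector: top k features by variance (labels are ignored)."""
    scores = [statistics.pvariance([row[j] for row in X]) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def fit_nearest_centroid(X, y):
    """Classifier: assign a sample to the class with the closest mean profile."""
    centroids = {}
    for label in set(y):
        rows = [row for row, l in zip(X, y) if l == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]
    return lambda row: min(centroids, key=lambda c: sq_dist(row, centroids[c]))


def fit_one_nearest_neighbor(X, y):
    """Classifier: assign a sample the label of its closest training sample."""
    return lambda row: y[min(range(len(X)), key=lambda i: sq_dist(row, X[i]))]


def evaluate_pairings(train, test, selectors, classifiers, k):
    """Train every selector/classifier pair on `train`; report accuracy on `test`."""
    (X_train, y_train), (X_test, y_test) = train, test
    results = {}
    for (s_name, select), (c_name, fit) in itertools.product(
        selectors.items(), classifiers.items()
    ):
        genes = select(X_train, y_train, k)  # selection uses training data only
        predict = fit([[row[j] for j in genes] for row in X_train], y_train)
        hits = sum(
            predict([row[j] for j in genes]) == label
            for row, label in zip(X_test, y_test)
        )
        results[(s_name, c_name)] = hits / len(y_test)
    return results


selectors = {"mean_diff": select_by_mean_difference, "variance": select_by_variance}
classifiers = {"centroid": fit_nearest_centroid, "1nn": fit_one_nearest_neighbor}
cell_lines = make_data(40, 20, 3, shift=3.0, seed=1)  # stand-in training set
patients = make_data(60, 20, 3, shift=3.0, seed=2)    # stand-in validation set
results = evaluate_pairings(cell_lines, patients, selectors, classifiers, k=3)
for pair, accuracy in sorted(results.items()):
    print(pair, round(accuracy, 3))
```

Keeping the selector and the classifier as separate, interchangeable functions is what allows the components to be paired independently. Selection is fitted on the training set alone, so the validation set plays no part in choosing features.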


Results

The best dacetuzumab sensitivity predictors involved ten or fewer genes and accurately classified lymphoma patients by their survival and known prognostic subtypes. The best ER status classifiers involved one or two genes and predicted ER status correctly more than 85% of the time. The novel method we proposed performed as well as or better than the other methods evaluated.


Conclusions

We demonstrated the feasibility of combining feature selection techniques with classification methods to develop, from cell line genomic measurements, assays that perform well on patient data. In both case studies, we constructed parsimonious models that generalized well from cell lines to patients.