Predicting how a patient will respond to a particular course of treatment is especially important in cancer, where the available treatment options often have a narrow therapeutic index. In an effort to strike the best balance for each patient between targeting the cancer and limiting unwanted side effects on healthy tissue, researchers have looked to genetic biomarkers for clues to how a patient’s body will respond. Stephanie Huang and colleagues from the University of Chicago, USA, present a novel method for predicting this response based solely on a patient’s pre-treatment tumor gene expression profile. Here Huang discusses how their method, published in a recent study in Genome Biology, fared when tested against existing clinical trial data, and what their findings imply for clinical practice.
How does the approach taken in your study differ from previous studies into cancer biomarkers for the prediction of chemotherapeutic responses?
Our model was built on cell line data: whole-genome gene expression was fitted against drug sensitivity measurements obtained from a large panel of cell lines. The resulting relationship matrix, linking every gene to drug sensitivity, was then applied to expression levels measured in a patient’s tumor before treatment to predict that patient’s response to the drug. Our method captures clinical drug response in multiple independent datasets covering completely different cancer types and drugs, using no prior biological knowledge. Because the models are developed on cell lines, the method could easily be extended to other drugs or compounds of interest.
For example, in drug development our method could be used to enrich for drug responders without exposing patients to highly toxic agents. The strong performance we saw from statistical models that allow small contributions from every gene also supports the idea of ‘omics’-level prediction, in which a very large number of molecular markers are incorporated into a complex model, rather than prediction from a single nucleotide polymorphism (SNP) or a small-scale gene signature.
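The fit-then-apply workflow described above can be sketched as follows. This is a minimal illustration, not the study’s software: it assumes the cell line and tumor expression matrices are already on a common scale, that drug sensitivity is a continuous value per cell line (for example log IC50), and it uses closed-form ridge regression because the study describes a whole-genome ridge model. The function names fit_ridge and predict_response and the penalty value lam are hypothetical choices for illustration.

```python
import numpy as np

def fit_ridge(X, Y, lam=1.0):
    """Whole-genome ridge regression fitted on cell lines (illustrative sketch).

    X: (n_cell_lines, n_genes) expression matrix.
    Y: (n_cell_lines,) or (n_cell_lines, n_drugs) drug sensitivity, e.g. log IC50.
    Returns per-gene weights of shape (n_genes,) or (n_genes, n_drugs),
    i.e. a gene-by-drug relationship matrix, plus an intercept.
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    n = X.shape[0]
    # Dual form: Xc'(Xc Xc' + lam*I_n)^-1 Yc equals (Xc'Xc + lam*I_p)^-1 Xc'Yc,
    # but needs only an n-by-n solve, which is cheap when genes >> cell lines.
    weights = Xc.T @ np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), Yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept

def predict_response(X_tumor, weights, intercept):
    """Apply the cell-line weights to pre-treatment tumor expression."""
    return X_tumor @ weights + intercept

# Toy usage on random numbers (not real expression data): every gene has a small true effect.
rng = np.random.default_rng(0)
n_cells, n_genes, n_tumors = 150, 2000, 20
true_w = rng.normal(scale=0.02, size=n_genes)
X_cells = rng.normal(size=(n_cells, n_genes))
y_cells = X_cells @ true_w + rng.normal(scale=0.1, size=n_cells)
weights, intercept = fit_ridge(X_cells, y_cells, lam=50.0)
X_tumors = rng.normal(size=(n_tumors, n_genes))
predicted = predict_response(X_tumors, weights, intercept)
print(predicted.shape)  # (20,)
```

Because the penalty shrinks every weight toward zero without setting any of them exactly to zero, each gene keeps a small contribution to the prediction. In practice lam would be tuned, for example by cross-validation on the cell lines.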
Your method was developed using gene expression microarray data from almost 700 cell lines. How did you account for the innate differences in gene expression between cell lines and primary tumor tissue?
The innate differences between gene expression in cell lines and primary tumor tissue were corrected using a method previously applied to batch correction in microarray experiments. It employs an empirical Bayesian approach to standardize the mean and variance of each gene across samples. This data homogenization is a critical step in our approach. The ‘standardized’ gene expression levels are then fitted in a whole-genome ridge regression model, which captures a substantial proportion of the variability in in vivo drug response.
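A simplified version of per-gene mean and variance standardization can be sketched as below, treating cell lines and tumors as two batches. Empirical Bayes batch correction of this kind (ComBat is the best-known example) additionally shrinks each batch’s per-gene mean and variance estimates toward values pooled across genes, which stabilizes them when a batch is small. That shrinkage step is deliberately omitted here, so this is a plain location/scale adjustment rather than the method used in the study. The function name location_scale_adjust and the toy data are hypothetical.

```python
import numpy as np

def location_scale_adjust(batches):
    """Per-gene location/scale batch adjustment WITHOUT empirical Bayes shrinkage.

    batches: list of (n_samples, n_genes) arrays with identical gene columns,
    e.g. [cell_line_expression, tumor_expression].
    Returns the adjusted arrays in the same order.
    """
    n_total = sum(B.shape[0] for B in batches)
    # Grand per-gene mean over all samples (batch means weighted by batch size).
    grand_mean = np.vstack(batches).mean(axis=0)
    # Pooled within-batch variance per gene, so the batch offsets themselves
    # do not inflate the shared target scale.
    pooled_var = sum((B.shape[0] - 1) * B.var(axis=0, ddof=1) for B in batches)
    pooled_sd = np.sqrt(pooled_var / (n_total - len(batches)))

    adjusted = []
    for B in batches:
        batch_mean = B.mean(axis=0)
        batch_sd = B.std(axis=0, ddof=1)
        batch_sd = np.where(batch_sd > 0, batch_sd, 1.0)  # avoid dividing by zero for constant genes
        # Remove this batch's own mean and scale, then map onto the shared scale.
        adjusted.append((B - batch_mean) / batch_sd * pooled_sd + grand_mean)
    return adjusted

# Toy usage: two "batches" of the same 50 genes, shifted and rescaled differently.
rng = np.random.default_rng(0)
cells = rng.normal(loc=5.0, scale=2.0, size=(100, 50))
tumors = rng.normal(loc=8.0, scale=0.5, size=(30, 50))
cells_adj, tumors_adj = location_scale_adjust([cells, tumors])
print(np.allclose(cells_adj.mean(axis=0), tumors_adj.mean(axis=0)))  # True
```

Once both sets share per-gene means and variances, weights fitted on the cell lines can be applied to the tumors on the same scale.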
How did your method compare when tested against existing clinical trial data? Were these results expected or surprising?
We validated our approach in three independent clinical trial datasets and obtained predictions approximately as good as, or even better than, those from gene signatures derived directly from the trials. These results were surprising at first, but several considerations help explain them. The cell line training data included many more samples than any of the clinical datasets, offering greater statistical power. It is also possible that drug response is measured more precisely in cell lines, which are screened in a controlled environment. Statistical models developed on a very large clinical dataset with a precisely measured drug response phenotype would undoubtedly outperform models developed on a comparable number of cell lines, but practical and ethical considerations make such clinical data extremely difficult and expensive to obtain. In comparison, cell lines offer a cheap, readily available model system on which drugs can be quickly screened at no risk to patients.
Our results clearly show that, when the data are correctly analysed and integrated, cell lines can often offer a practical and useful alternative. Furthermore, the cell lines used to construct our model represent many different types of cancer, whereas clinical trials often focus on a single cancer type. Drug sensitivity data obtained across various cancer types are likely to be more informative for predicting in vivo drug response.
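One common way to score predictions against trial outcomes is the area under the ROC curve (AUC), which measures how well the predictions rank responders above non-responders. This is an illustrative choice, not necessarily the metric used in the study, and the sketch assumes binary outcomes (responder or non-responder) and that a lower predicted log IC50 means greater predicted sensitivity. The function roc_auc and the six-patient data are hypothetical.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC = probability that a random responder scores higher than a random non-responder.

    Computed by counting all responder/non-responder pairs (the Mann-Whitney
    formulation); tied pairs count as half.
    scores: higher means "more likely to respond".
    labels: 1 for responder, 0 for non-responder.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every responder minus every non-responder
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical predicted log IC50 values: lower means more sensitive,
# so negate them to obtain a "likely to respond" score.
predicted_log_ic50 = np.array([-1.2, 0.3, -0.8, 1.1, 0.9, 0.5])
responded = np.array([1, 0, 1, 0, 0, 1])
print(roc_auc(-predicted_log_ic50, responded))  # 8 of 9 pairs ordered correctly, i.e. 8/9
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes scores comparable across trials with different numbers of patients.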
Your method is based solely on whole-genome gene expression data. Do you think the absence of other parameters (e.g. cancer type, drug mode of action, other genomic aberrations) affects the power of your model?
Our approach could likely be improved in future by incorporating additional predictors into the models. However, it should be noted that expression data act as a surrogate for many unmeasured molecular phenotypes (e.g. tissue of origin, genomic aberrations), so it remains debatable whether this additional information would drastically improve performance in many cases.
We have presented some evidence that more rigorous quantification of the transcriptome (e.g. through RNA-seq) or the incorporation of additional ‘omic’ information (e.g. microRNA expression) could be particularly valuable in improving prediction accuracy. In addition, data from more sophisticated cell line drug sensitivity screens (e.g. quantifying drug sensitivity under different microenvironments) may also improve prediction accuracy.
Efforts to leverage current cancer treatments have led to combination therapies. Do you think your method could be used to predict treatment responses in instances where more than one drug is administered?
This is a great question and is something that we are actively pursuing. We are working with our clinical colleagues to test the applicability of our approach in clinical trials where patients were treated with multiple drugs. It is likely that our method could capture the additive effect of several drugs; however, given the current model system (i.e. panels of cell lines treated with a single drug), interactions (such as synergistic effects) between different drugs, or between drugs and other treatments (e.g. radiation), would be missed. Capturing these types of interactions would require a suitable model system. It may be possible to develop such a system using cell lines, mouse models or even clinical data (for already established multi-drug regimens).
In essence, developing similar statistical models for multi-drug regimens would be no different from doing so for single drugs; it would simply require a suitable model system, accurately measured molecular and drug response phenotypes, and the correct type of machine learning algorithm to relate these data to each other.
How feasible is it to translate your method to the clinic, in terms of the time, cost, and expertise/resources needed?
Given the falling cost of gene expression quantification, and the fact that expression-based diagnostics are already used in several clinical settings (e.g. Oncotype DX for breast cancer and colon cancer), it is feasible to incorporate whole-genome expression profiling into the clinic. We are building user-friendly software that implements the workflow presented in our study. It will compute the expression-drug sensitivity relationships and allow a clinician to upload patients’ tumor expression data and obtain predicted drug sensitivities for a group of patients. Of course, prospective studies are needed to address multi-drug treatment combinations and the interpretation of predictions for other drugs.
How important is open access to clinical trials data in facilitating your research in this field?
Access to clinical datasets is absolutely key for pharmacogenomics research in general. We could identify only a small number of datasets on which to test our method, and data access issues may have played a role in this. Open access to clinical trials data would clearly facilitate more widespread and rigorous testing of this or similar methods.
What’s next for your research?
The immediate follow-up work will involve studying prediction in the combination therapy setting. We are also actively investigating whether the predictive power of our approach can be improved by incorporating additional ‘omic’-level data (e.g. genomic abnormalities and microRNA expression). We are seeking collaborators to conduct prospective clinical trials that explore the real-world utility of our method. Furthermore, mechanistic studies are ongoing to examine some of the novel genes and pathways identified through our approach. We believe the key to unravelling a complex trait like drug response lies in obtaining very large quantities of relevant data that can be leveraged using sophisticated machine learning algorithms, rather than in traditional low-throughput approaches, which have had very limited success in most cases.