Abstract
Background
As today, there are hundreds of targeted therapies for the treatment of cancer, many of which have companion biomarkers that are in use to inform treatment decisions. If we would consider this whole arsenal of targeted therapies as a treatment option for every patient, very soon we will reach a scenario where each patient is positive for several markers suggesting their treatment with several targeted therapies. Given the documented side effects of anticancer drugs, it is clear that such a strategy is unfeasible.
Results
Here, we propose a strategy that optimizes the design of combinatorial therapies to achieve the best response rates with the minimal toxicity. In this methodology markers are assigned to drugs such that we achieve a high overall response rate while using personalized combinations of minimal size. We tested this methodology in an in silico cancer patient cohort, constructed from in vitro data for 714 cell lines and 138 drugs reported by the Sanger Institute. Our analysis indicates that, even in the context of personalized medicine, combinations of three or more drugs are required to achieve high response rates. Furthermore, patienttopatient variations in pharmacokinetics have a significant impact in the overall response rate. A 10 fold increase in the pharmacokinetics variations resulted in a significant drop the overall response rate.
Conclusions
The design of optimal combinatorial therapy for anticancer treatment requires a transition from the onedrug/onebiomarker approach to global strategies that simultaneously assign makers to a catalog of drugs. The methodology reported here provides a framework to achieve this transition.
Keywords:
Cancer; Personalized medicine; Targeted therapy; Combinatorial therapyBackground
Personalized cancer therapy has been proposed as the next battle in the war on cancer and targeted therapies as the new warfare machinery [1]. Targeted therapies are designed to treat cancers carrying specific molecular alterations. In turn these molecular alterations can be used as companion biomarkers to inform the decision of using, or not using, the targeted therapy to treat a patient [2]. For example, in the context of breast cancer, the level of the receptor tyrosine kinase HER2/neu is used to select trastuzumab (Herceptin; Genentech) as adjuvant therapy [3].
By design, a targeted therapy is expected to be effective in a subset of cancer patients (e.g., trastuzumab in HER2/neu positive breast cancer patients). However, even within this subset, the longterm response may be reduced. Some patients may initially respond to the targeted therapy but later on regress due to the occurrence of secondary molecular alterations. For example, in the context of melanoma, cancers with the BRAF(V600E) mutation can be treated with vemurafenib (Zelboraf, Plexxikon) resulting in outstanding response [4]. However, in about one year most patients regress due to upregulation of compensatory pathways [5,6]. The molecular background of a cancer can also modulate the response to a targeted therapy, even when treatment is suggested by the biomarker. For example, as a difference with melanoma patients, colon cancer patients harbouring the same BRAF(V600E) mutation show a very limited response to vemurafenib [7]. One mechanism explaining this difference is the feedback activation of EGFR upon treatment with vemurafenib and the fact that EGFR levels are higher in colon cancer than in melamoma cells [8].
Although targeted therapies may fail as single agents, they can still be effective when used in combination with other agents. Combinatorial therapy is a rational approach to overcome the failure of single drugs [9]. One hypothesis is that one agent in the combination can cover for the caveats of other agents, increasing the response rate [10]. As for the case of single agents, biomarkers can be used to inform the inclusion of targeted therapies in a drug combination, which we name personalized combinatorial therapy.
The shift from single drug targeted therapy to combinatorial personalized therapies introduces a new challenge. As today, there are hundreds of targeted therapies with their associated biomarkers, some of which are already in use to inform treatment decisions. If we would consider the whole arsenal of targeted therapies as a treatment option for every patient, very soon we will reach a scenario where each patient is positive for several markers suggesting their treatment with several targeted therapies [11]. Given the documented side effects of anticancer drugs, it is clear that such a strategy is unfeasible. A new strategy is needed to optimize the design of combinatorial therapies to achieve the best respond rates with the minimal toxicity. In this work we introduce a methodology to achieve this goal.
Results and discussion
The shift from single drug targeted therapy to personalized combinatorial therapies introduces a new challenge. We need to define a protocol to design the personalized combinations given a catalog of drugs, a catalog of markers and the status of those markers in the patient’s cancer. To formally address this problem we introduce the scheme depicted in Figure 1. We are given as input a cohort of patients together with the status of m markers in those patients. To be more precise, the markers status of each patient is represented by a barcode or Boolean vector X_{i}=(x_{i1},…,x_{im}), where x_{il}=1 when marker l is observed in patient i and 0 otherwise. We are also given as input a set of drugs that are available for anticancer treatment. In the context of personalized medicine we would like to assign markers to a drug to identify the patient subpopulation with the best response rates. Again, to be precise, the marker assignment to each drug is represented by a barcode or Boolean vector Y_{j}=(y_{j1},…,y_{jm}), where y_{jl}=1 if marker l is used to inform the treatment with drug j and 0 otherwise. A drugtosample protocol f_{j}(X_{i},Y_{j}) is used to inform the treatment options, where f_{j}(X_{i},Y_{j})=1 indicates to consider drug j as a treatment option for sample i and f_{j}(X_{i},Y_{j})=0 otherwise. For example, Figure 1 illustrates the protocol where f_{j}(X_{i},Y_{j})=1 if the sample and the drug share a marker in common. Once the treatment options are determined for each sample, we then apply a patient protocol g to choose the personalized therapies for each patient. For example, Figure 1 illustrates the protocol g indicating the treatment with the drug with highest expected response rate among the treatment options identified for each patient (g_{best,1}). Another possibility is to treat with the c drugs with the higher response rates among those suggested for each patient (g_{best,c}).
Figure 1. Optimization of personalized therapies. 1 We are given as input a set of patients, Boolean vectors reporting the markers status on each patient (X) and a set of drugs available for treatment. 2 A Boolean vector reporting the markers that will be used to inform treatment is specified for each drug (Y). 3 Drugs are suggested for the treatment of each patient using a drugtosample protocol depending on the sample and drug markers (f_{j}(X_{i},Y_{j})). In this example a drug is suggested for the treatment of a patient whenever they share at least one marker. 4 Finally, a sample protocol is used to specify the treatment to each patient (g). In this example the best treatment for each sample is selected. Finally, we optimize the marker assignments to drugs (Y_{j}), the drugtosample protocols (f_{j}(X_{i},Y_{j})) and the sample protocol (g) to obtain the maximum overall response rate, as represented in the figure by the arrow.
The current approach to targeted therapies is to assign markers to drugs based either on the target for which the drug was developed or some preliminary study suggesting an increase response rate in patients having the marker. We take a more general approach where the markers are assigned to drugs to maximize the response rate to therapy. To this end, we define the following optimization problem:
Find the drug marker assignments Y_{j}, the drugtosample protocols f_{j }and sample protocol g that maximize the overall response rate O.
Response model
To calculate O we require the probability that each patient responds to a drug when the drug is used as a single agent and some quantification of drug interactions. In the simplest scenario where there are no drug interactions, the probability P_{i} that a patient responds to is personalized therapy is given by the probability that it responds to at least one of the drugs on its personalized combination
where e_{ij}=1 if drug j is included in the personalized therapy of patient i and p_{ij }is the probability that patient iresponds to drug j when the latter is used as a single agent. When interactions are present we can improve on (Eq. 1) after adding correction terms accounting for twodrug interactions and so on
In this equation values of J_{jk}<0 will result in response rates higher than what expected if the drugs do not interact (synergy) while values of J_{jk}>0 will result in response rates lower than what expected if the drugs do not interact (antagonism). We note that antagonism could take place at the level of pharmacodynamics (antagonism at the cellular level) or at the level of pharmacokinetics (antagonism at the drug metabolism level) and the latter may result in increased toxicity.
The average of P_{i }across samples defines the overall response rate O of the personalized combinatorial therapies
We are aware of documented examples of drug interactions in the context of cancer treatment [12]. However, for most combinations we do not have a quantitative estimate of how these interactions affect the response rate. For the purpose of illustrating our methodology, we will use the noninteracting drugs approximation (Eq. 1) in our simulations.
Responsebymarker approximation
In the clinical practice we cannot test the response of each cancer patient to each approved anticancer drug. However, we can estimate the response rate to a drug depending on the present/absence of the markers assigned to that drug. For example, let us consider the case where K_{j }markers are used to inform the treatment with drug j. The patients are divided into groups depending on the status of those markers. We can conduct a clinical trial to estimate the response rate q(j,s) of drug j for each group of patients. Once the q(j,s) are known, we can estimate the response rate to any patient. To be more precise we enumerate the patient groups using the index
where is the list of markers assigned to drug j and x_{l }is the status of the lth marker. Using this notation we obtain the response bymarker approximation
In short, the probability that a given patient i responds to a given drug j is approximated by the estimated fraction of patients that responds to that drug within the group of patients having the same status as patient i for the markers assigned to drug j.
Finding the optimal personalized combinations
We need some procedure to find the optimal treatment combinations. In the Methods section we report a simulated annealing algorithm that performs an exploration of the space of markers assigned to drugs and drugtosample protocols with a gradual increased bias towards improvements on the overall response rate. Although this algorithm may not find the optimal solution, it can provide a good approximation to hard computational problems [13].
Updating the drugtosample protocols
During the optimization procedure we need to explore different marker assignments to drugs and different choices of drugtosample protocols. To this end we need some precise representation of the Boolean functions and the transformations among them. The drugtosample protocols are represented by a Boolean function f_{j}(X_{i},Y_{j}) that returns 0 (do not suggest) or 1 (suggest) depending on the status of the markers assigned to the drug on a given sample. For computational convenience it is easier to write the Boolean functions as , where K_{j} is the number of markers assigned to drug j, is the list of markers assigned to drug j and f_{j} is a Boolean function of K_{j }inputs. Given K markers there are 2^{k} possible input states (x_{1},…,x_{k}), which can be enumerated as follows: . For each of these input states we can set the output o_{a }to 0 or 1. We can enumerate the Boolean functions with K inputs using the mapping . Therefore, we can represent every Boolean function with two indexes (K,b), the first one denoting the number of inputs and the second one the specific Boolean function with Kinputs.
Figure 2a and b show all Boolean functions with one (1,b) and two (2,b) inputs, respectively. Each Boolean function is represented by a truthtable where for each imput the output 0 or 1 is specified. The letters A and B are used to denote the inputs and the b index of each function is indicated on the upper raw of the truthtable. We note that functions where the output is independent of at least one input are not considered, because they can be reduced to a simpler function. For example function (1,0) (Figure 2a) is equivalent to have no markers assigned and function (2,3) (Figure 2b) is equivalent to (1,1) (Figure 2a) after removing the marker B.
Figure 2. Boolean functions and operations among them. a) Boolean functions with one input. Functions with a dashed line are not considered because the output is independent of the input. b) Boolean functions with two inputs. Functions with a dashed line are not considered because the output is independent of at least one input. c) An example showing the removal of the right marker (B) from function (2,2), which can be either result into function (1,0) or (1,1). Since function (1,0) is excluded then the function (1,1) is always chosen. d) Same example but removing the left marker (A). e) All mappings from (2,b) to (1,b’) following removal of the right (B) marker. f) All mappings from (2,b) to (1,b’) following removal of the left marker. e) All mappings from (1,b) to (2,b’) following the addition of a marker. For simplicity, we have chosen the reverse of the rightmarker removal (panel e) as the mapping for the marker addition.
To explore different Boolean functions we change the function, add a new marker or remove one marker. When changing a Boolean function, (K,b)→(K,b’), a new function is selected at random among all considered Boolean functions with the same number of inputs. When removing a marker, (K,b)→(K1,b’), if the drug has one marker then we remove it, the drug will have no markers assigned and, therefore, it will not be considered for the treatment of any patient. If the drug has two markers assigned then we remove one of the two markers and use the transformations illustrated in Figure 2c and d. For example, in Figure 2c we start with the function (2,2) and remove the B (right) input. For this function the output is always 0 when the A (left) input is 1 but the output can be 0 or 1 when the A input is 0. Therefore, (2,2) can be mapped to (1,0) or (1,1) after removing the B input. Since the output of (1,0) is independent of the input state it is not considered. A similar reasoning can be applied to obtain the mappings for function (2,2) when removing the A marker instead (Figure 2d). Applying this approach to every (2,b) function we obtain the mappings in Figure 2e and f. Finally, if a marker is added, (K,b)→(K+1,b’), then we use the mappings in Figure 2g, which are the reverse of (K1,b’)→(K,b) removing the A (left) input. In all cases, when more that one choice is available we choose one of them with equal probability.
Case study
To test our methodology we investigate an in silico case study where we can actually quantify the response of each sample to each drug. The in silico case study is based on in vitro growth inhibition data reported by the Sanger Institute [14]. In the Sanger screen 714 cell lines were tested for their responses against 138 drugs. For several sampledrug pairs the natural logarithm of the drug concentration to achieve a 50% growth inhibition relative to untreated controls (logIC50) was reported. The logIC50 data is missing for 26,031 drugcell line pairs, representing 20% of all drugsample pairs. The missing logIC50 data was imputed using the weighted average approach described in the Methods section. The Pearson Correlation Coefficient (PCC) between the imputed and actual log50s, when the latter were available, was 0.89 (Figure 3).
Figure 3. Comparison between the imputed and reported logIC50 data. The solid line represents the case when the imputed and reported values coincide.
For each cell line the cancer subtype and the status of 47 cancer related genes was also reported, including somatic mutations and copy number alterations. We use as markers the observation of a specific cancer type (e.g., breast cancer), somatic mutations (e.g., TP53:wildtype, TP53:R175H, TP53:R248Q, etc.), and copy number alterations (gene:, gene:0, gene:+ for deletion, normal and amplification, respectively). This procedure resulted in 921 markers. Among those, we retained 181 markers that are observed in at least 10 cell lines.
To each cell line we associate a sample that is fully composed of that cell line. We assume that different drugs are used at different treatment doses because they are active at different concentration ranges. The mean logIC50 of a drug across cancer cell lines is a good estimate of the typical concentration for the drug activity in this in vitro setting. Thus, for each drug we set the treatment logconcentration y_{j}=mean(logIC50)_{j}+logh, where h represents the fold change in the dose. Values of h below 1 represent low dose therapy, while those above 1 represent high dose therapy. In average, cancer cells have IC50s that are about 2 fold lower than those of normal cells [15]. Based on this we assume that the highest tolerated dose is h=2, and that is the dose used for treatment.
We assume that due to variations in drug delivery the actual logdose reaching the cancer cells, denoted by Z_{j}, is different from y_{j}. Pharmacokinetic variables generally follow a normal distribution after a logtransformation [16] and, therefore, we assume that Z_{j }(the logdose) is a random variable following a normal distribution, with mean y_{j }and variance σ. Here σ models variations associated with drug pharmacokinetics in patients. Pharmacokinetic parameters characterizing the steady state plasma drug concentrations and drug clearance rates can vary as much as 2–10 fold [17,18]. To model such variations we will use σ=1,10.
We define a response as the achievement of at least 50% growth inhibition. In this case a sample responds to a drug if Z_{j}>logIC50_{ij }and does not respond otherwise. Under these assumptions, the probability p_{ij }that sample i responds to drug j is given by
where erfc(x) is the complementary error function. When the cell line logIC50_{ij} is much higher than the treatment dose reaching the cancer cells (logIC50_{ij}y_{j}>>σ) then p_{ij}≈0. In contrast, when the cell line logIC50_{ij }is much lower than the treatment dose reaching the cancer cells (logIC50_{ij}y_{j}<<σ) then p_{ij}≈1.
To test a more realistic scenario, we are not going to use the response probabilities in (Eq. 6). Instead, we are going to use the response bymarker approximation in (Eq. 5). To this end, given a drug and its assigned markers, we divide the cell lines into groups depending on the status of those markers, and estimate the response probability of q(j,s) as the average of p_{ij }over all cell lines in that group. To avoid biases from small group sizes, we set q(j,s)=0 for any group with less than 10 samples.
We do not have an estimate of the possible interactions between the 138 drugs in this in silico study. We assume that the drugs do not interact and we approximate the response to a personalized drug combination by (Eq. 1), but replacing p_{ij} by the response bymarker approximation (Eq. 5).
In the optimization problem defined above we could attempt to optimize the marker assignments to drugs, the drugtosample protocols f_{j}(X_{i},Y_{j}) and the sample protocol g. However, to reduce the computational complexity of the problem, we will impose the sample protocol g_{best,c}, assign at most two markers to each drug and optimize over marker assignments to drugs and the drugtosample protocols.
Using the simulatedannealing algorithm we obtained the optimal personalized therapies for the in silico cohort. In general we have no way to warranty that the simulatedannealing algorithm did not get stuck at a local minimum, precluding it from finding the optimal solution. However, by starting at different initial assignments of markers/Booleanfunctions and monitoring the improvement on the solutions found we can get an idea of how close we are from the optimal solution. Figure 4 shows the highest overall response rate (as estimated with the bymarker approximation, O*) as more initial conditions were tested. There are no significant improvements between a 100 and 1,000 initial conditions indicating that the simulatingannealing algorithm is close to the optimal solution.
Figure 4. Convergence of the simulatedannealing algorithm for the in silico study. The overall response rate (as estimated with the bymarker approximation, O*) as a function of the number of initial conditions tried.
We note that in this study we count with the actual response probability of each cell line to each drug. Therefore, we can use as input the optimal personalized combinations obtained by using the response bymarker approximation (Eq. 5) and then calculate the overall response rate using the original cell line response rates (Eq. 6).
When the pharmacokinetic variations are small (σ=1), the predicted overall response rate is as high as 90% when treating with personalized therapies using one drug alone. Then it increases towards 100% as we move to personalized combinations using more drugs (Figure 5a). However, a 10fold increase in the pharmacokinetic variations (σ=10) results in a drop of the overall response rate to about 60% when treating with one drug alone (Figure 5a). This observation indicates that the success of personalized therapy will also depend on the magnitude of pharmacokinetic variations and on our ability to personalize the drug dosage for each patient to counteract those pharmacokinetic variations.
Figure 5. Predictions of the in silico study. Model predictions as a function of the maximum combination size allowed for two values of the pharmacokinetic variations parameter σ. a) The overall response rate. b) Number of drugs used for the treatment of at least one sample.
We note that not all drugs are included in the treatment of at least one sample, resulting in a smaller effective drug catalog (Figure 5b). For all the maximum combination sizes tested, less than 80 out of 138 (58%) of the drugs are needed. Furthermore, beyond personalized combinations of three drugs, we observe a decrease in the number of needed drugs as we increased the maximum allowed combination size (Figure 5b). This observation suggests that the need for only 58% of the drugs will hold for larger combination sizes. We note that the decrease of the needed drugs is unexpected. For example, if the response rates were independent identically distributed random variables then the probability that a drug is selected for the treatment of a samples is c/d, the probability that a drug is selected for the treatment of at least one sample is 1(1c/d)^{s} and the average number of drugs used for the treatment of at least one sample is d* = d[1 − (1 − c/d)^{s}]. Therefore, for independent identically distributed response rates d* increases monotonically with increased the combination size c. The departure from this expectation in Figure 5b could be due to the existence of correlations in the response rates of different drugs when treating different cells lines. Furthermore, we cannot exclude that for large c the simulatedannealing algorithm gets trapped in local optima and that for the actual global optimal d* does increases with increasing c. In any event this discrepancy should motivate future work to obtain theoretical estimates for d* based on the patterns of correlations between the response rates and the ability of the simulatingannealing algorithm to reach the global optimum.
In Table 1 we report the effective drug catalog for the small pharmacokinetic variations case (σ=1) and maximum combination size c=3 drugs. In addition, we report whether those drugs were included in the catalogs for c=1 and 2, showing the percent of samples treated when included and (−) otherwise. Most drugs in the c=3 catalog are also included in the c=1 and 2 catalogs, indicating that there is a core set of drugs that is relevant independent of the maximum combination size allowed. The percentage of samples treated with a given drug in the catalog increases from c=1 to 3. This effect can be explained by the fact that, as we allow combinations of more drugs, a drug can be included in personalized combinations as a second or third choice.
Table 1. The catalog of drugs in the optimized personalized combinatorial therapies
We note that in some instances the marker assigned to a drug coincides with what expected given the known drug target (Table 1, Markers and Target columns). For example, the marker TP53:wt (i.e., TP53 wildtype) is suggested to inform the treatment with nutlin3a. This makes sense because nutlin3a releases TP53 from the inhibition by its negative regulator MDM2 and the outcome of nutlin3a treatment is modulated by the TP53 status [19]. In another case, the marker BRAF:V600E is assigned to the BRAF inhibitor PLX4720 [20]. The marker KRAS:G12D is assigned to another BRAF inhibitor, AZ628, which still makes sense because KRAS is just upstream of BRAF in the RAS/RAF/MAPK/ERK signaling pathway [21]. In another case, the marker ERBB2:0 (i.e., normal ERBB2 copy number) and the Boolean function (1,1) (i.e., suggest in the absence of the marker) are assigned to the ERBB2/EGFR inhibitor BIBW2992, which again makes sense since ERBB2 inhibitors are expected to be more effective in the presence of ERBB2 amplifications [22]. However, in most instances the relation between the assigned marker/Booleanfunction and the known target is not obvious. The best example is the assignment of a tissue type as a marker, rather than the status of the gene coding for the target or another gene in the same pathway.
Conclusions
We have proposed a methodology that optimizes the assignment of companion biomarkers to drugs to achieve the highest possible response rate with the minimal toxicity. The outcome of our methodology is an optimal drug catalog, the assignment of optimal biomarkers to each drug and a treatment decision protocol where a drug is used to treat a patient when the latter is positive for the drug companion biomarker. The application of the treatment decision protocol for every drug in the catalog results in optimal personalized combinatorial therapies for every patient.
An interest future direction is the investigation of the impact of drug interactions. We expect that the optimization approach will favor drugs that synergize with many other drugs in the catalog relative to those that do not interact or antagonize with other drugs in the catalog. At the end, the interplay between manifesting a high response rate in a group of patients and the degree of synergy (or absence of antagonism) with other drugs in the catalog will determine the suitability of a given drug for its use in personalized combinations. The challenge will be to estimate of the degree of synergy/antagonism between current anticancer drugs.
Our methodology is entirely based on estimated response rates given a marker. The latter can be estimated from clinical trails testing each anticancer drug as a single agent, where all patients enrolled are tested for a set of predefined biomarkers. Using this information we can estimate the overall response rate given a marker, for each of the markers considered. In second step, we should select a cohort of patients where the status of all these biomarkers has been determined. This cohort could be, in principle, the union of all cohorts where the drugs were tested as single agents. Using the mutation status of each gene and the estimated response rates given a marker we can estimate the response rate of each patient in an approximate manner. With those estimates at hand we can then apply the methodology introduced here and make a prediction for the optimal drug catalog, the assignment of optimal biomarkers to each drug and a treatment decision protocol where a drug is used to treat a patient when it is positive for the drug marker. Finally, the predicted personalized combinatorial therapy should be tested in a two arms clinical trial to determine how it performs compared to the standard of care.
The optimization scheme introduced here can be generalized in several directions. We can improve the response rate calculation including drug interactions, provided the direction and the magnitude of those interactions is given. Our approach is also suitable for the inclusion of genetic markers affecting drug metabolism [2]. These markers can be included in the optimization scheme as well, provided we specify a model for their impact on the response rate. Further generalizations are also needed to model toxicity. However, these generalizations will result in more complicated models with more parameters, many of which will be difficult to quantify. In the mean time, the simplifications introduced here allow us to implement the personalized combinatorial therapies approach in the clinical context, by routinely sequence a subset of genes on each patient enrolled in clinical trials.
Methods
Simulated annealing algorithm
The simulatedannealing algorithm aims to maximize the overall response rate, or equivalently to minimize E=−sO, where s is the number of samples. The algorithm starts from no markers assigned to drugs (Y_{j}=(0,…,0) for all drugs) and explores random changes of the Y_{j }and the drugtosample protocols f_{j}(X,Y_{j}). At each step of the algorithm, a drug j is selected and, for that drug, either a marker is added or removed or a new drugtosample protocol is selected. Changes are accepted when E decreases, and when E increases they are accepted with probability exp(β(E_{0}E)), where E_{0 }and E are calculated before and after the change, respectively, and β is the “annealing” parameter. β is gradually increased such that, as the algorithm proceeds, changes increasing E are more likely to be rejected. The pseudocode for the simulated annealing algorithm implementation for our specific optimization problem is shown in Figure 6.
Figure 6. Pseudocode for the simulatedannealing algorithm. *This step is introduced to avoid the accumulation of markers in drugs that are not used for treatment.
In our simulations we have chosen the parameters T=10,000d, dt=d, β_{0}=0 and dβ=0.01.
The simulatedannealing algorithm can get trapped in drug marker assignments that are suboptimal. To overcome this limitation we repeat the algorithm several times and report the solution with minimum c. We did not observe significant changes from a 100 to a 1,000 repetitions. The results discussed below are obtained for 1,000 repetitions.
IC50 imputation
The missing logIC50 data was imputed using the weighted average over samples with available data
where NA denotes missing value,
is a weighted distance between samples, and
is the Euclidean distance between drugs based on the available data. The exploration of the parameters (α_{sample}, α_{drug}) in the range (1–22,122) resulted in Pearson Correlation Coefficients (PCCs) between imputed and actual logIC50s, when available, in the range 0.830.89, with a maximum of 0.89 for (α_{sample} = 20, α_{drug} = 3).
Competing interests
The authors declare they have no competing interests.
Author contribution
AV conceived, executed and wrote this work.
References

Trusheim MR, Berndt ER, Douglas FL: Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers.
Nat Rev Drug Discov 2007, 6(4):287293. PubMed Abstract  Publisher Full Text

Kelloff GJ, Sigman CC: Cancer biomarkers: selecting the right drug for the right patient.
Nat Rev Drug Discov 2012, 11(3):201214. PubMed Abstract  Publisher Full Text

Burstein HJ: The distinctive nature of HER2positive breast cancers.
N Engl J Med 2005, 353(16):16521654. PubMed Abstract  Publisher Full Text

Chapman PB, Hauschild A, Robert C, Haanen JB, Ascierto P, Larkin J, Dummer R, Garbe C, Testori A, Maio M: Improved survival with vemurafenib in melanoma with BRAF V600E mutation.
N Engl J Med 2011, 364(26):25072516. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Nazarian R, Shi H, Wang Q, Kong X, Koya RC, Lee H, Chen Z, Lee MK, Attar N, Sazegar H: Melanomas acquire resistance to BRAF(V600E) inhibition by RTK or NRAS upregulation.
Nature 2010, 468(7326):973977. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Poulikakos PI, Persaud Y, Janakiraman M, Kong X, Ng C, Moriceau G, Shi H, Atefi M, Titz B, Gabay MT: RAF inhibitor resistance is mediated by dimerization of aberrantly spliced BRAF(V600E).
Nature 2011, 480(7377):387390. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Roth AD, Tejpar S, Delorenzi M, Yan P, Fiocca R, Klingbiel D, Dietrich D, Biesmans B, Bodoky G, Barone C: Prognostic role of KRAS and BRAF in stage II and III resected colon cancer: results of the translational study on the PETACC3, EORTC 40993, SAKK 60–00 trial.
J Clin Oncol 2010, 28(3):466474. PubMed Abstract  Publisher Full Text

Prahallad A, Sun C, Huang S, Di Nicolantonio F, Salazar R, Zecchin D, Beijersbergen RL, Bardelli A, Bernards R: Unresponsiveness of colon cancer to BRAF(V600E) inhibition through feedback activation of EGFR.
Nature 2012, 483(7387):100103. PubMed Abstract  Publisher Full Text

Feala JD, Cortes J, Duxbury PM, Piermarocchi C, McCulloch AD, Paternostro G: Systems approaches and algorithms for discovery of combinatorial therapies.
Wiley Interdiscip Rev Syst Biol Med 2010, 2(2):181193. PubMed Abstract  Publisher Full Text

Vazquez A: Optimal drug combinations and minimal hitting sets.
BMC Syst Biol 2009, 3:81. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Harris T: Gene and drug matrix for personalized cancer therapy.
Nat Rev Drug Discov 2010, 9(8):660. PubMed Abstract  Publisher Full Text

Scripture CD, Figg WD: Drug interactions in cancer therapy.
Nat Rev Cancer 2006, 6(7):546558. PubMed Abstract  Publisher Full Text

Kirkpatrick S, Gelatt CD Jr, Vecchi MP: Optimization by simulated annealing.
Science 1983, 220(4598):671680. PubMed Abstract  Publisher Full Text

Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, Greninger P, Thompson IR, Luo X, Soares J: Systematic identification of genomic markers of drug sensitivity in cancer cells.
Nature 2012, 483(7391):570575. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Vazquez A: The universe of nrmal and cancer cell line responses to anticancer treatment: Lessons for cancer therapy.
Nature Precedings 2011. Publisher Full Text

Lacey LF, Keene ON, Pritchard JF, Bye A: Common noncompartmental pharmacokinetic variables: are they normaly or lognormally distributed?
Journal of Biopharmaceutical Statistics 1997, 7(1):171178. PubMed Abstract  Publisher Full Text

Galpin AJ, Evans WE: Therapeutic drug monitoring in cancer management.
Clin Chem 1993, 39(11 Pt 2):24192430. PubMed Abstract  Publisher Full Text

Gurney H: How to calculate the dose of chemotherapy.
Br J Cancer 2002, 86(8):12971302. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Vassilev LT, Vu BT, Graves B, Carvajal D, Podlaski F, Filipovic Z, Kong N, Kammlott U, Lukacs C, Klein C: In vivo activation of the p53 pathway by smallmolecule antagonists of MDM2.
Science 2004, 303(5659):844848. PubMed Abstract  Publisher Full Text

Tsai J, Lee JT, Wang W, Zhang J, Cho H, Mamo S, Bremer R, Gillette S, Kong J, Haass NK: Discovery of a selective inhibitor of oncogenic BRaf kinase with potent antimelanoma activity.
Proc Natl Acad Sci U S A 2008, 105(8):30413046. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Downward J: Targeting RAS signalling pathways in cancer therapy.
Nat Rev Cancer 2003, 3(1):1122. PubMed Abstract  Publisher Full Text

Dowsett M, Procter M, McCaskillStevens W, de Azambuja E, Dafni U, Rueschoff J, Jordan B, Dolci S, Abramovitz M, Stoss O: Diseasefree survival according to degree of HER2 amplification for patients treated with adjuvant chemotherapy with or without 1 year of trastuzumab: the HERA Trial.
J Clin Oncol 2009, 27(18):29622969. PubMed Abstract  Publisher Full Text  PubMed Central Full Text