
Open Access Highly Accessed Research article

Expression profiling of blood samples from an SU5416 Phase III metastatic colorectal cancer clinical trial: a novel strategy for biomarker identification

Samuel E DePrimo1*, Lily M Wong1, Deepak B Khatry2, Susan L Nicholas1, William C Manning1, Beverly D Smolich1, Anne-Marie O'Farrell1 and Julie M Cherrington1

Author Affiliations

1 Preclinical Research and Exploratory Development, SUGEN, Inc. South San Francisco, CA 94080, USA

2 Bioinformatics, SUGEN, Inc., South San Francisco, CA 94080, USA


BMC Cancer 2003, 3:3  doi:10.1186/1471-2407-3-3

The electronic version of this article is the complete one and can be found online at:

Received: 19 November 2002
Accepted: 7 February 2003
Published: 7 February 2003

© 2003 DePrimo et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.



Microarray-based gene expression profiling is a powerful approach for the identification of molecular biomarkers of disease, particularly in human cancers. Utility of this approach to measure responses to therapy is less well established, in part due to challenges in obtaining serial biopsies. Identification of suitable surrogate tissues will help minimize limitations imposed by those challenges. This study describes an approach used to identify gene expression changes that might serve as surrogate biomarkers of drug activity.


Expression profiling using microarrays was applied to peripheral blood mononuclear cell (PBMC) samples obtained from patients with advanced colorectal cancer participating in a Phase III clinical trial. The PBMC samples were harvested pre-treatment and at the end of the first 6-week cycle from patients receiving standard of care chemotherapy or standard of care plus SU5416, a vascular endothelial growth factor (VEGF) receptor tyrosine kinase (RTK) inhibitor. Results from matched pairs of PBMC samples from 23 patients were queried for expression changes that consistently correlated with SU5416 administration.


Thirteen transcripts met this selection criterion; six were further tested by quantitative RT-PCR analysis of 62 additional samples from this trial and a second SU5416 Phase III trial of similar design. This method confirmed four of these transcripts (CD24, lactoferrin, lipocalin 2, and MMP-9) as potential biomarkers of drug treatment. Discriminant analysis showed that expression profiles of these 4 transcripts could be used to classify patients by treatment arm in a predictive fashion.


These results establish a foundation for the further exploration of peripheral blood cells as a surrogate system for biomarker analyses in clinical oncology studies.


Identification of biomarkers that provide rapid and accessible readouts of drug exposure, activity, toxicity or efficacy is becoming increasingly important in the clinical development of novel molecularly targeted therapeutics. Surrogate endpoints can be applied in the assessment of biological activity or clinical responses and perhaps in selection of patients most likely to respond to therapy. Methodologies for large-scale molecular profiling of disease tissues have been well established [1-3] and have been shown to be of utility both diagnostically [4] and prognostically [5]. However, application of such approaches in the assessment of changes induced by cancer therapeutic agents in solid tumors in man has been hindered by limited accessibility or availability of tumor tissue at multiple time points during treatment. We evaluated the utility of patients' blood cells, a readily accessible source of material, for the identification of surrogate molecular markers of biological activity of SU5416, a small molecule kinase inhibitor that is a VEGF receptor (VEGFR) antagonist with anti-angiogenic properties in vitro and in vivo [6,7].

Microarray technologies such as the Affymetrix Genechip® platform facilitate rapid measurement of the expression levels of thousands of transcripts in a single experiment and allow comparison of expression patterns across many samples [8]. Previous reports have described gene expression profiles in blood that distinguish patients with relapsing-remitting multiple sclerosis [9] and systemic lupus erythematosus [10] from healthy controls. We focused on blood cell samples from oncology clinical trials that evaluated SU5416 in patients with metastatic colorectal cancer. We reasoned that peripheral blood cells may serve as a surrogate tissue since VEGF receptors are expressed in certain blood cell types such as monocytes and platelets [11,12] and thus signal transduction in those cells may be directly impacted by SU5416; also, gene expression changes in blood cells caused indirectly as a result of therapy-induced perturbations might also be detected in this approach.

Subjects in the initial investigation were participants in an open-label, multicenter, international Phase III study in which patients were randomized to receive either the standard-of-care 5-FU/LV chemotherapy regimen alone (control arm) or 5-FU/LV chemotherapy plus SU5416 (treatment arm), with SU5416 given twice weekly via intravenous infusion at a dose of 145 mg/m² (for more detail see [13,14]). Affymetrix expression profiling technology was applied to RNA from matched peripheral blood mononuclear cell (PBMC) sample pairs (before and after treatment) harvested from subjects, for assessment of changes in gene expression that might correlate with SU5416 administration. Here we describe the approach and summarize key findings from the study as well as some of the practical challenges that were encountered. A set of transcripts that correlated with administration of the SU5416 regimen was identified and independently validated in additional clinical samples; discriminant analysis of change in levels of these transcripts demonstrated their potential utility in class prediction. The implications of gene expression profiling applications such as this one in the clinical development of novel molecular therapies are discussed.


Study population

Patient samples were derived from 2 randomized, open-label, multicenter Phase III clinical trials comparing standard of care chemotherapy alone or combined with SU5416 in patients with metastatic colorectal cancer. In both trials SU5416 was delivered twice weekly at a dose of 145 mg/m² via I.V. infusion. In the first trial (designated Trial A), the standard of care chemotherapy consisted of weekly administration of 5-FU and leucovorin (Roswell Park regimen); in the second trial (designated Trial B), the standard of care chemotherapy consisted of weekly or bi-weekly administration of 5-FU, leucovorin, and irinotecan (CPT-11). A total of 23 pairs of patient samples were included in Affymetrix microarray expression profiling analysis (2 females and 9 males in the SU5416 treatment arm, and 2 females and 10 males in the control arm). The median patient age was 66 and 65 years for the SU5416 treatment arm and control arm, respectively. For RT-PCR verification experiments, samples from 12 females and 24 males from the SU5416 treatment arm, and 14 females and 17 males from the control arm were used. The median age for these patients was 62 and 60 years, respectively. Clinical response criteria were defined according to RECIST guidelines; briefly, complete response (CR) is defined as complete disappearance of all measurable and evaluable clinical evidence of cancer; partial response (PR) is defined as at least a 50% reduction in the size of all measurable tumor areas; progressive disease (PD) is defined as an increase of ≥25% (compared to baseline or best response) in the size of all measurable tumor areas; and stable disease (SD) is defined as neither sufficient shrinkage to qualify for PR nor sufficient increase to qualify for PD.

Patient samples

All clinical samples for biomarker analysis were harvested and handled in accordance with full Institutional Review Board-approved protocol, and study participants had signed the study informed consent prior to any study-related procedures. All blood samples were collected into Vacutainer tubes containing sodium heparin. Ten ml of blood was withdrawn from patients prior to receiving any treatment on day 1 and also prior to dosing at end of cycle 1 (day 56 in Trial A, day 42 in Trial B). For PBMC preparations, blood samples were shipped overnight at ambient temperature to a central processing facility (Quest Diagnostics, Inc., Collegeville, PA, USA) for PBMC isolation via Ficoll gradient method [15]. Purified PBMCs were shipped in RNA lysis buffer (Clontech, Palo Alto, CA, USA) to SUGEN where isolation of total RNA was performed. For Trial B, whole peripheral blood samples were directly frozen at the clinical sites and shipped on dry ice to SUGEN for RNA isolation.

RNA sample processing

Total RNA was purified from PBMC samples using Clontech Nucleospin RNA II kit reagents (Clontech, Palo Alto, CA) and from whole blood samples using MRC TRI Reagent BD (Molecular Research Center, Cincinnati, OH, USA), an adaptation of the Chomczynski single-step method [16], according to the manufacturer's instructions. All sample preparations included a treatment with RNase-free DNase. RNA yields were measured by UV absorbance and RNA quality was assessed by agarose gel electrophoresis with ethidium bromide staining for visualization of ribosomal RNA band integrity.

Affymetrix high-density oligonucleotide microarray analysis of PBMC expression profiles

In general, the standard RNA processing and hybridization protocols as recommended by Affymetrix (Santa Clara, CA, USA) were followed in this study; these protocols are available in the Genechip® Expression Analysis Technical Manual. Yields of total RNA for PBMC samples were generally low and for the majority of patients it was not possible to use the standard amount of total RNA (≥ 5 μg) as recommended in the standard protocol. Therefore a double linear amplification approach [17] was used in the generation of cRNA for hybridization. In these experiments, equal amounts of starting material were used for pre- and post-treatment samples from each donor (typically 2 μg). Briefly, the protocol was as follows: double-stranded cDNA was synthesized from total RNA, with Invitrogen Life Technologies SuperScript Choice system reagents (Invitrogen, Carlsbad, CA, USA). The T7-(dT)24 oligomer was used for priming first-strand cDNA synthesis. Double-stranded cDNA product was purified via phenol-chloroform extraction, and then used as template in first round of in vitro transcription (IVT) of cRNA. The IVT reaction was performed with BioArray HighYield RNA Transcript Labeling Kit (Affymetrix) according to manufacturer's protocol, but with substitution of non-biotinylated ribonucleotides for biotinylated ribonucleotides. The cRNA product was then purified with Qiagen spin column clean-up protocol and used as template in second round of cDNA synthesis. This second round of cDNA synthesis was similar to the first round except that random hexamers were used in priming of first-strand synthesis, with T7-(dT)24 oligomer priming the second-strand. The second round of IVT of cRNA was as in the first round but with biotinylated ribonucleotides rather than non-biotinylated ribonucleotides. 
Purified cRNA was quantitated, chemically fragmented according to Affymetrix protocol, and then hybridized overnight on Human Genome U95A Arrays (which contain probe sets for the detection of approximately 12,600 transcripts). Hybridized arrays were washed and stained with phycoerythrin-conjugated streptavidin detection chemistry in an Affymetrix Fluidics station. Images were scanned with a Hewlett-Packard GeneArray scanner.

Data Analysis

Data files were generated from scanned array images in the Affymetrix Microarray Suite Version 4.0 program. In this program, Average Difference (AD) values serve as relative indicators of the expression level of transcripts represented on the arrays. Average Difference determination relies on difference between background-subtracted signal from perfect match (PM) oligos and corresponding mismatch control (MM) oligos within a probe set representing a given transcript. To enable comparison of all hybridization data, global scaling was applied by multiplying the output of each experiment by a Scaling factor (SF) to make its average intensity equal to a user-defined Target Intensity (which was set at 1500 for these experiments). For comparisons between time points from a single patient, batch files were generated with Microarray Suite. These files contain calculated fold change (FC) values, which represent differential expression between day 56 compared to baseline, and also Difference Calls (DC), which represent a more conservative estimate of differential expression, with qualitative scores assigned to each transcript measurement according to the following system: Increased (I), Marginally Increased (MI), No Change (NC), Marginally Decreased (MD), and Decreased (D).
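The global scaling step described above can be sketched as follows. This is a minimal illustration, not the Microarray Suite implementation: it assumes the "average intensity" is a plain arithmetic mean of the Average Difference values (the software's exact averaging rule is not stated here), and the function names and intensity values are hypothetical.

```python
from statistics import mean

TARGET_INTENSITY = 1500.0  # Target Intensity stated in the text


def scaling_factor(avg_diff_values, target=TARGET_INTENSITY):
    """Return SF such that mean(SF * values) equals the target intensity.

    Assumption: a plain mean is used as the array's average intensity.
    """
    array_mean = mean(avg_diff_values)
    if array_mean <= 0:
        raise ValueError("mean intensity must be positive to scale")
    return target / array_mean


def scale_array(avg_diff_values, target=TARGET_INTENSITY):
    """Multiply every Average Difference value by the array's scaling factor."""
    sf = scaling_factor(avg_diff_values, target)
    return [sf * v for v in avg_diff_values]


# Hypothetical Average Difference values for two arrays of differing brightness.
array_a = [200.0, 800.0, 1400.0, 3600.0]  # mean 1500, so SF = 1
array_b = [100.0, 400.0, 700.0, 1800.0]   # mean 750, so SF = 2

scaled_a = scale_array(array_a)
scaled_b = scale_array(array_b)
```

After scaling, both hypothetical arrays have the same mean intensity, so values from the day 1 and day 56 hybridizations can be compared on a common scale.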

Subsequent data analysis was performed primarily with the Spotfire DecisionSite for Functional Genomics software package (version 7.0) and its Array Explorer component (Spotfire, Somerville, MA). Hierarchical clustering analysis and statistical comparisons were included in this step. Further refinement of the data, including filtering by Difference Call scores, was done with the Microsoft Access 97 database analysis program.

SYBR Green quantitative RT-PCR verification of array results

Primers were designed with Primer Express 1.5 software (Applied Biosystems). In all cases, primers were designed to bind within the sequence represented by Affymetrix probe sets (target sequence information available at webcite). Total RNA samples (1 μg) were reverse transcribed to yield first-strand cDNA using the Applied Biosystems Reverse Transcription Reagents protocol (Applied Biosystems, Foster City, CA, USA). The reverse transcription reactions were then diluted 1:5 in distilled H2O. SYBR Green PCR reactions were performed in 96-well optical plates and run in an ABI PRISM® 7700 Sequence Detection System (SDS) machine. For individual reactions, 10 μl of each sample were combined with 15 μl of SYBR Green PCR Master Mix (Applied Biosystems) containing the appropriate primer pair at 350 nM. Data were extracted and amplification plots generated with ABI SDS software. All amplifications were done in duplicate and threshold cycle (Ct) scores were averaged for subsequent calculations of relative expression values. The Ct scores represent the cycle number at which fluorescence signal (ΔRn) crosses an arbitrary (user-defined) threshold. Heat dissociation curve analysis was performed after each SYBR Green run as a test of whether a single product had been generated in each PCR reaction; multiple peaks in the dissociation curves are indicative of multiple PCR products and thus reduced specificity and sensitivity.

Quantitation and statistical analysis of SYBR Green RT-PCR data

The Ct scores for genes of interest for each sample were normalized against Ct scores for the corresponding endogenous control gene, which was the β-glucuronidase (GUS) gene. Relative expression for day 56 compared to day 1 was determined by the following calculation, as described in the Applied Biosystems User Bulletin on Relative Quantitation of Gene Expression and in [18]:

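$\mathrm{Rel\ Exp} = 2^{-\Delta\Delta C_t}$,

where $\Delta\Delta C_t = (C_{t,\mathrm{Target}} - C_{t,\mathrm{GUS}})_{\mathrm{day\ 56}} - (C_{t,\mathrm{Target}} - C_{t,\mathrm{GUS}})_{\mathrm{day\ 1}}$.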
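As a worked illustration of this calculation, the sketch below averages duplicate Ct scores and applies the ΔΔCt formula. The function names and Ct values are hypothetical. The base of 2 assumes that the product roughly doubles each cycle for both the target and GUS amplicons.

```python
def mean_ct(replicates):
    """Average the Ct scores of replicate wells (duplicates in this study)."""
    return sum(replicates) / len(replicates)


def delta_delta_ct(target_post, gus_post, target_pre, gus_pre):
    """(Ct Target - Ct GUS) at day 56 minus (Ct Target - Ct GUS) at day 1."""
    return (target_post - gus_post) - (target_pre - gus_pre)


def relative_expression(target_post, gus_post, target_pre, gus_pre):
    """Rel Exp = 2 ** (-ddCt); assumes product doubles every cycle."""
    return 2.0 ** (-delta_delta_ct(target_post, gus_post, target_pre, gus_pre))


# Hypothetical duplicate Ct scores for one patient.
target_day1 = mean_ct([28.25, 27.75])   # 28.0
gus_day1 = mean_ct([24.0, 24.0])        # 24.0
target_day56 = mean_ct([25.0, 25.0])    # 25.0
gus_day56 = mean_ct([24.0, 24.0])       # 24.0

ddct = delta_delta_ct(target_day56, gus_day56, target_day1, gus_day1)  # -3.0
rel_exp = relative_expression(target_day56, gus_day56, target_day1, gus_day1)  # 8.0
```

In this example the target crosses threshold three cycles earlier at day 56 after GUS normalization, which corresponds to an 8-fold induction.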

The relative expression data for a selected subset of potential biomarkers were tested for differences between the SU5416 (treatment) and the standard of care (control) arms. The Mann-Whitney U Test with a critical alpha level of 0.05 was used for statistical significance. Individual genes observed to be significantly different by Affymetrix analysis and in both sets of SYBR Green RT-PCR experiments were screened as potential biomarker candidates. This subset of potential biomarker candidates was tested subsequently for utility as class predictors to discriminate between the SU5416 and the standard of care arms. Discriminant analysis [19], a multivariate statistical technique, was used for this purpose. The genes were tested individually, using all possible combinations, and by reducing dimensions (Principal Component Analysis) in order to determine the subset of genes (predictor variables) that yielded highest classification accuracy. Cross-validation was used to test the robustness of classification accuracy. Results from three different cross-validations were evaluated to select the best set of predictor biomarkers: (1) jackknife method (dropping one case at a time), (2) randomly splitting the pooled data into two halves, prediction (for building model) and validation (for testing model) sets, and (3) using one trial as prediction and the second trial as validation sets, respectively. All statistical analyses were carried out after natural-log transformation of the data; SYSTAT 9.01 (SPSS, Inc., Chicago, IL, USA) software was used in statistical analysis.
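For readers unfamiliar with the Mann-Whitney U Test, a minimal sketch follows. It is not the SYSTAT procedure: the two-sided p-value uses a normal approximation with a tie correction and no continuity correction, which is one common choice and an assumption here, and the input values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) with a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))
    if sigma == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2.0) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return min(u1, u2), p


# Hypothetical natural-log relative expression values for one transcript.
su5416_arm = [3.1, 2.4, 4.0, 1.9, 2.8]
control_arm = [0.2, -0.1, 0.5, 0.0, 0.3]
u_stat, p_value = mann_whitney_u(su5416_arm, control_arm)
significant = p_value < 0.05  # critical alpha level used in the study
```

Because the test works on ranks, it does not require the relative expression values to be normally distributed, which suits fold-change data with a wide range across patients.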


Affymetrix expression profiling of pre- and post-treatment matched PBMC samples

Total RNA was isolated from PBMC prepared from patients' blood samples taken before (pre-dose day 1) and after (pre-dose day 56) administration of SU5416 or corresponding control regimen in a Phase III trial, designated as Trial A. Due to typically low RNA yields, many sample pairs were of insufficient quality for further use (failure rate of at least 1 of 8 samples, or 25% of sample pairs). Only sample pairs in which both the day 1 and the day 56 sample yielded 1 μg of RNA or more were used in expression profiling analysis. Samples were hybridized to U95A high-density oligonucleotide arrays. A total of 11 sample pairs from the SU5416 treatment arm and 12 from the control arm were analyzed in the primary dataset. The change in expression (ratio of day 56 measurements to day 1 for each patient as calculated by Affymetrix software) was defined for each transcript; these are referred to as fold change (FC) values. The FC values for the 23 cases were analyzed with Spotfire DecisionSite software tools, to compare patients in the SU5416 arm and control arm.

A t-test analysis was used to identify transcripts that were statistically significantly different between the two treatment arms. Over 100 genes with p-values less than 0.02 were identified; however, because there are over 12,000 transcripts measured on the U95A arrays, some of the genes would potentially be identified by chance. To further refine this subset of genes, queries based on Difference Call (DC) status were performed; Difference Calls offer a more stringent but non-numerical measure of differential expression, and are derived from a decision matrix that weighs comparison results from four metrics used in the Affymetrix analysis platform. The data were filtered to identify genes that were 'Increased' (I) or 'Decreased' (D) in a majority of the SU5416 arm cases but not in the control arm. A group of 13 genes that frequently showed increased expression was identified. Figure 1 displays a schema of the DC scores assigned to each gene for each patient sample pair; all cases from the SU5416 arm show induction in at least 6 of the 13 genes. Table 1 lists the number of cases in each arm in which an 'Increased' call was assigned and includes a brief description of putative functions of the gene products. The average fold change of all of these transcripts was higher in the SU5416 arm (the lowest average fold change was 2.6 for hypothetical protein FLJ13052, the highest was 33 for lactoferrin); the range of fold changes was also broader in this category, presumably reflecting variability among patients.
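The concern about chance findings can be made concrete with a rough calculation. It assumes, purely for illustration, that no transcript truly differs between arms and that the p-values are uniformly distributed, and it uses the approximate probe set count of 12,600 given in the Methods:

```latex
\[
\mathbb{E}[\text{false positives}] = m\,\alpha \approx 12{,}600 \times 0.02 = 252
\]
```

Under these assumptions a list of this size is compatible with chance alone, which is why an additional filter such as the Difference Call criterion is useful.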

Figure 1. Differential expression of candidate biomarker transcripts in patient PBMC at day 56 relative to day 1 of therapy. The diagram is a depiction of the Affymetrix Difference Calls assigned to each day 56:day 1 expression comparison among the patient sample pairs analyzed via GeneChip hybridization analysis. Letters within blocks represent the Difference Call assigned to each relative expression comparison. The abbreviations are: I = Increased; MI = Marginally Increased; NC = No Change; MD = Marginally Decreased; D = Decreased. Cases in which an Increased or Marginally Increased call is assigned to a day 56:day 1 comparison are shaded in gray. Each column represents a different patient. Column headings in each grid represent patient response assessed at end of first treatment cycle: PR = partial response, CR = complete response, PD = progressive disease.

Table 1. List of potential biomarker transcripts as detected in Affymetrix analysis.

Quantitative RT-PCR validation of differentially expressed transcripts

A subset of these transcripts was chosen for validation by quantitative RT-PCR (qRT-PCR) analysis. Primer sets were designed for 6 of the 13 genes: matrix metalloproteinase-9 (MMP-9), thrombospondin-1 (TSP-1), CD24, defensin α3, lipocalin-2 (LCN2), and lactoferrin (see Table 2). These 6 genes were chosen based on potential functions of encoded proteins (in the cases of thrombospondin-1 and MMP-9, these have known roles in angiogenesis) [20,21], or because of the extent of differential regulation between treatment arms. Also, the lipocalin-2 gene (LCN2) has been reported to be inducible by dexamethasone in murine cells [22]; dexamethasone is one of the premedications administered to patients in the SU5416 arm.

Table 2. Primer sequences used in RT-PCR validation

SYBR Green RT-PCR was used to validate the microarray expression profiling data. SYBR Green is a dye that fluoresces when bound to double-stranded DNA, so the signal is directly proportional to the amount of product formed during PCR amplification [18,23]. This method allows rapid and inexpensive comparison of gene expression across a large number of samples. The qRT-PCR validation was performed with a total of 31 sample pairs, 8 of which had previously been analyzed on Affymetrix U95A arrays and thus allowed a direct comparison of the correlation between the 2 transcript profiling methods. Data for each gene were normalized to expression of a housekeeping gene, β-glucuronidase (GUS). Direct comparison of SYBR Green RT-PCR results and Affymetrix results from the same RNA samples (n = 8 day 1-day 56 pairs) showed an overall qualitative agreement (i.e., same trend of induction or no change detected by both methods for each target) of 73%, as 35 of 48 comparisons were concordant. This number may be an underestimate since results for one patient were inconsistent for all 6 transcripts.
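The agreement figure can be reproduced with simple arithmetic, sketched below. The second calculation is an assumption: it reads "inconsistent for all 6 transcripts" as meaning that one patient was discordant on all 6 comparisons, and asks what the agreement would be with that patient set aside.

```python
# Counts stated in the text.
sample_pairs = 8
transcripts = 6
concordant = 35

total_comparisons = sample_pairs * transcripts          # 48
agreement = concordant / total_comparisons
print(f"{agreement:.0%}")                               # prints 73%

# Assumption: one patient contributed 6 discordant comparisons and is excluded.
remaining_comparisons = total_comparisons - transcripts  # 42
agreement_excluding_one = concordant / remaining_comparisons
print(f"{agreement_excluding_one:.0%}")                  # prints 83%
```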

Figure 2 summarizes the results from the RT-PCR validation and compares them to those from Affymetrix analysis. Mann-Whitney U test comparison of SU5416 and control results from both analyses indicates that three of the six genes displayed statistical significance (p-values less than 0.05) based on the SYBR Green RT-PCR data (Table 3); these are CD24, lactoferrin, and LCN2. MMP-9 exhibited a p-value that was close to the significance cutoff and thus was also selected for further analysis. Defensin α3 and TSP-1 were not pursued further. Thus, four of the six initial transcripts selected from Affymetrix expression profiling were confirmed as differentially expressed using a PCR-based approach, in a larger sample set from the same clinical trial.

Figure 2. Differential expression of 6 transcripts as measured by microarray and quantitative RT-PCR. The percentage of cases in 5-FU/LV (control) and 5-FU/LV + SU5416 trial arms with increased expression (at predose day 56 relative to predose day 1) of each transcript is displayed. Panel A displays results from Affymetrix analysis and Panel B displays results from SYBR Green RT-PCR verification. For the Affymetrix data, an increase is determined by Difference Call status; for the SYBR Green data, an increase is defined here as relative expression value of 2-fold or greater. A total of 31 sample pairs were used in RT-PCR analysis; 18 were from SU5416 arm (5 PR, 1 CR, 11 PD, and 1 SD response at end of cycle 1), and 13 were from the control arm (9 PR, 3 PD, and 1 SD). The relative expression values as determined in the RT-PCR analysis for each patient are displayed (see Additional File 1).

Table 3. Mann-Whitney U Test comparisons of expression fold changes from SU5416 treatment and control arms (Trial A).

Quantitative RT-PCR validation of differentially expressed transcripts with samples from a second Phase III SU5416 trial

To further confirm these transcripts as potential biomarkers of SU5416 administration, SYBR Green RT-PCR analysis was carried out in samples from a second Phase III trial using SU5416 (Trial B). This trial was also a randomized metastatic colorectal cancer study, comparing a new chemotherapy standard of care, 5-FU/leucovorin/CPT-11 [24,25], to standard of care plus SU5416. RNA was isolated from patients' peripheral blood samples (rather than PBMC) harvested at the beginning (pre-dose day 1) and end (day 42) of cycle 1. To test whether similar gene expression changes occurred, analysis was performed on 36 sample pairs, 18 from the SU5416 treatment arm and 18 from the control arm.

Figure 3 summarizes the overall frequency of induction of 2-fold or greater in each arm of the trial. It is clear that these transcripts are more frequently induced at day 42 of treatment in the SU5416 arm than in the control arm. This is supported by statistical analysis, as indicated in results of the Mann-Whitney U Test (Table 4). A visual representation of hierarchical clustering analysis of the qRT-PCR relative expression values from both trials for each of the transcripts is displayed in Figure 4. This clustering pattern displays the distinction between the SU5416 and control arms based on relative expression data, and also indicates further distinctions among subsets of patients as well as the degree of overlap between trial arms in the clustering pattern. The extent of similarity between the relative expression patterns for each transcript (represented in columns) is also indicated; the pattern of MMP-9 is distinct from the others as it appears in a separate branch in the dendrogram structure. A table containing the raw relative expression measurements included in this dataset can be viewed in Additional file 1.

Additional file. Additional file 1


Figure 3. Differential expression of four transcripts in a second Phase III trial as measured by quantitative RT-PCR. Percentage of cases in CPT-11/5-FU/LV (control) and CPT-11/5-FU/LV + SU5416 trial arms with increased expression (at predose day 42 relative to predose day 1) of 4 candidate biomarker transcripts in a second SU5416 Phase III clinical trial is displayed. The convention is the same as in panel B in Figure 2. A total of 36 sample pairs was included in this analysis; 18 from the SU5416 arm and 18 from the control arm (8 PR and 10 SD responses at end cycle 1 in each group). The relative expression values for each patient in this group are displayed (see Additional File 1).

Figure 4. Hierarchical clustering of relative expression ratios for four biomarker transcripts. This mosaic depicts association between patient samples and relative expression of the 4 potential biomarker transcripts. Natural log-transformed SYBR Green RT-PCR ratio data (relative expression of day 56: day 1) were used in analysis. In the color scheme, higher ratios are indicated in red, lower ones in green (scale ranges from -4 to +4). Results from individual patients are oriented as rows and transcripts are oriented as columns. Red bars on the right side of the map indicate cases from the SU5416 arm. The hierarchical clustering method is average linkage and the distance metric is Euclidean. A table containing the relative expression values that were used in the clustering analysis can be viewed (see Additional File 1).
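The clustering settings named in the Figure 4 legend (average linkage, Euclidean distance) can be illustrated with a small sketch. This is a naive agglomerative procedure written for clarity and is not the Spotfire implementation. The patient profiles are hypothetical natural-log ratios for four transcripts.

```python
import math
from itertools import combinations


def average_linkage(points, a, b):
    """Mean Euclidean distance over all pairs with one member in each cluster."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(points, clusters[ab[0]], clusters[ab[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters


# Hypothetical ln(relative expression) profiles: rows are patients,
# columns are four transcripts.
profiles = [
    [2.0, 1.8, 2.2, 0.5],    # patient 0, strongly induced
    [1.9, 2.1, 2.0, 0.7],    # patient 1, strongly induced
    [0.1, -0.2, 0.0, 0.1],   # patient 2, essentially unchanged
    [-0.1, 0.1, 0.1, 0.0],   # patient 3, essentially unchanged
]
two_groups = agglomerate(profiles, 2)
```

With these values the two induced profiles fall into one cluster and the two unchanged profiles into the other, which is the kind of separation between arms that the mosaic is intended to display.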

Table 4. Mann-Whitney U Test of relative expression data from Trial B

Discriminant analysis of the classification power of candidate biomarkers

An important next step was to test whether relative expression data from these samples could be used in a predictive fashion to classify samples to the appropriate trial arm. In order to assess this, discriminant analysis [19] of the RT-PCR data was performed. The relative expression values from both trials were combined into a single dataset and then natural log-transformed to reduce the scale of the values, making control and treated arms more comparable. When the cases were pooled (n = 67) and subjected to classification prediction, the overall accuracy of assignment to the appropriate trial arm was 84% when lactoferrin, CD24, and LCN2 were used as the predictor gene set (inclusion of MMP-9 slightly reduced the accuracy of cross-validation). Further cross-validation was performed by the jackknife method (which does a series of predictions, randomly removing 1 case from the total each time), and by splitting the data set into 2 random halves (one a 'training' set and the other a 'testing' set).

The results from each of these steps are summarized in Table 5 for a set of 3 of the 4 transcripts that gave the best accuracy percentage. As indicated, it is predicted that expression data from these 3 genes would accurately distinguish SU5416 arm patients from control arm in 67% to 84% of cases. As a further test, the dataset from the first clinical trial was used as the 'training' set and the set from the second as the 'testing' set, as opposed to pooling the two trials and randomly selecting cases. In this scenario, the accuracy in cross-validation was 86% and 77% for the training and testing set, respectively. This suggests that results derived from one trial might be applied prospectively in analysis of subsequent similar trials.
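The jackknife (leave-one-out) cross-validation idea can be sketched as follows. Note the substitution: a nearest-centroid rule stands in for the discriminant analysis performed in SYSTAT, so this shows only the validation logic, not the authors' classifier. All fold-change values, labels, and function names are hypothetical.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign the sample to the class whose centroid is closest (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        d = math.dist(sample, centroid(rows))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def jackknife_accuracy(xs, ys):
    """Drop one case at a time, train on the rest, and score the held-out case."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical day 56:day 1 fold changes for three transcripts per patient.
fold_changes = [
    [8.0, 6.0, 10.0], [5.0, 9.0, 7.0], [12.0, 4.0, 6.0], [1.1, 0.9, 1.2],
    [1.0, 1.2, 0.8], [0.9, 1.1, 1.0], [1.3, 0.7, 1.1], [1.0, 1.0, 0.9],
]
arms = ["SU5416"] * 4 + ["control"] * 4

# Natural-log transformation, as applied before the statistical analyses.
log_data = [[math.log(v) for v in row] for row in fold_changes]
accuracy = jackknife_accuracy(log_data, arms)  # 7 of 8 held-out cases correct
```

In this toy dataset the fourth SU5416 case shows no induction and is assigned to the control arm when held out, illustrating how heterogeneity of response lowers classification accuracy.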

Table 5. Effectiveness of LCN2, CD24, and Lactoferrin as a predictor set for discriminating between the control and SU5416 treatment arms.*


Large-scale gene expression analysis was applied to blood RNA samples from a clinical trial of the signal transduction inhibitor SU5416 to investigate changes in gene expression that might correlate with exposure to this experimental cancer therapy. A set of 4 transcripts (CD24, lactoferrin, LCN2, and MMP-9) was identified, whose expression was significantly induced at the end of one treatment cycle relative to baseline following SU5416 administration. Discriminant analysis indicated that changes in expression of these transcripts predicted the trial arm to which a patient belonged with accuracy as high as 80%.

This work represents a novel approach to clinical biomarker discovery wherein expression profiling of patients' blood cell RNA is utilized as a surrogate readout of dynamic changes occurring in patients bearing solid tumors. A recent report describes an expression profiling approach as applied to bone marrow samples before and after treatment with the tyrosine kinase inhibitor STI-571 in patients with acute lymphoblastic leukemia [26]. That report clearly demonstrates the utility of the approach in investigating the development of drug resistance in hematological malignancies. However, extending similar approaches to solid tumor oncology is technically and clinically challenging due to the relative inaccessibility of tumor tissue and difficulties in obtaining samples at multiple time points. In our investigation, a relatively small number of transcript level changes were identified that were specific to patients receiving an SU5416 dosing regimen, and 4 of these were independently verified in a larger sample set. Inter-individual variation and heterogeneity of response are two variables that have likely impacted this dataset, and it is possible that a greater number of statistically significant changes might be observed if a larger group of patients were used in the initial Affymetrix analysis. Further, inclusion of more than a single sampling time point after initiation of treatment would likely be informative in future studies, and would presumably provide a window onto more acute physiological changes triggered by drug exposure. Also, a relatively stringent criterion for significance (the Affymetrix Difference Call) was used in the selection of transcripts that were expressed differently in each trial arm. Mining the dataset with other analytical approaches might lead to a different set of potential biomarkers. However, it is worth noting that we have retrospectively applied Significance Analysis of Microarrays (SAM) [27] to the Affymetrix dataset: 3 of the 10 most significant genes identified by that method were among the 4 that were verified by RT-PCR (LCN2, CD24, and lactoferrin), with MMP-9 ranked sixteenth. All of the transcripts listed in Table 1 were included among the 27 most significant, with the exception of thrombospondin (ranked 50th).
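
The SAM ranking mentioned above rests on a per-gene "relative difference" score [27]. As a minimal illustrative sketch (not the analysis performed in this study), the Python below computes a paired form of that score, d = mean change / (standard error + s0), for made-up log2 post/pre ratios. The gene names, the values, and the fixed s0 are assumptions chosen purely for illustration; SAM itself selects s0 from the data and uses permutations to estimate a false discovery rate, and both steps are omitted here.

```python
import math
import statistics


def sam_paired_scores(log_ratios, s0):
    """Return {gene: d}, where d = mean change / (standard error + s0).

    log_ratios maps a gene name to its per-patient log2(post/pre) ratios.
    s0 is a small positive constant that damps genes with tiny variance.
    """
    scores = {}
    for gene, values in log_ratios.items():
        mean_change = statistics.mean(values)
        std_error = statistics.stdev(values) / math.sqrt(len(values))
        scores[gene] = mean_change / (std_error + s0)
    return scores


# Hypothetical per-patient log2(post/pre) ratios; NOT data from the trial.
toy = {
    "gene_A": [1.2, 0.9, 1.4, 1.1],      # large, consistent induction
    "gene_B": [0.1, -0.2, 0.3, 0.0],     # no consistent change
    "gene_C": [0.05, 0.06, 0.05, 0.06],  # tiny change, tiny variance
}

# s0 is fixed by hand here (an assumption); SAM estimates it from the data.
scores = sam_paired_scores(toy, s0=0.1)

# Rank genes from most to least significant by absolute score.
ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
print(ranked)
```

The constant s0 is the design choice worth noting: with s0 set to zero, the nearly invariant gene_C would outrank gene_A despite its negligible change, which is the behaviour the SAM statistic is intended to suppress.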

Independent quantitative RT-PCR verification of initial array hybridization results was performed on larger sample populations obtained from two conceptually similar Phase III clinical trials that used SU5416. Further exploration of the differential expression data for these 4 transcripts suggests that differences in expression are relatively robust and adequate to allow classification of patients into appropriate arms of the trial, with an expected accuracy of greater than 70%. It must be stressed that the subject population size in this qRT-PCR study is not large (n = 67); however, the results are encouraging given the challenges of working with samples from these two large multicenter clinical trials.
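
To make the classification step concrete, the following sketch shows how expression changes for four transcripts could be used to assign samples to a trial arm and how accuracy could be estimated by leave-one-out cross-validation. It uses a simplified diagonal variant of linear discriminant analysis (feature covariances ignored, equal priors assumed), which may differ from the discriminant procedure actually used in this study. All sample values and the labels "SU5416" and "control" are hypothetical, and the output is not expected to reproduce the accuracies reported here.

```python
import statistics


def fit_diagonal_lda(samples, labels):
    """Estimate per-class feature means and pooled per-feature variances."""
    classes = sorted(set(labels))
    n_features = len(samples[0])
    means = {}
    pooled = [0.0] * n_features
    for cls in classes:
        rows = [s for s, y in zip(samples, labels) if y == cls]
        means[cls] = [statistics.mean(r[j] for r in rows) for j in range(n_features)]
        for j in range(n_features):
            pooled[j] += sum((r[j] - means[cls][j]) ** 2 for r in rows)
    dof = len(samples) - len(classes)  # pooled degrees of freedom
    variances = [p / dof for p in pooled]
    return means, variances


def predict(model, sample):
    """Assign the class whose mean is nearest in variance-scaled distance."""
    means, variances = model

    def distance(cls):
        return sum((x - m) ** 2 / v for x, m, v in zip(sample, means[cls], variances))

    return min(means, key=distance)


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit_diagonal_lda(train_x, train_y)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)


# Hypothetical log2 fold changes for 4 transcripts per patient; NOT trial data.
samples = [
    [1.5, 1.2, 1.8, 1.0], [1.1, 0.9, 1.4, 0.8],
    [1.7, 1.4, 1.6, 1.3], [1.3, 1.0, 1.9, 0.9],
    [0.1, -0.2, 0.3, 0.0], [-0.1, 0.2, 0.1, 0.2],
    [0.3, 0.0, -0.2, -0.1], [0.0, 0.1, 0.2, 0.1],
]
labels = ["SU5416"] * 4 + ["control"] * 4

accuracy = leave_one_out_accuracy(samples, labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Holding out each sample before fitting keeps the accuracy estimate from being inflated by testing on the training data, which matters most when the sample size is small.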

These four transcripts are considered to be biomarkers of the SU5416 administration regimen rather than activity of SU5416 specifically; with the available samples we could not exclude the possibility that altered gene expression resulted from exposure to the administration vehicle (Cremophor) or the concomitant premedications (dexamethasone and H1- and H2-blocker antihistamines) rather than SU5416. The mouse homologue of one of the candidate biomarkers, LCN2, has been shown to be regulated at the transcriptional level by dexamethasone in murine cells [22]. However, there is no published evidence that the human LCN2 gene is likewise dexamethasone-inducible, and there is no description of glucocorticoid-responsive elements in the 5' regulatory region of the human LCN2 gene [28]. Further, preliminary investigation in our laboratory, using in vitro cultures of purified PBMCs from healthy volunteers, indicates that neither the LCN2 gene nor any of the other 3 transcripts is directly inducible by acute treatment with either SU5416 or dexamethasone (data not shown). The biological connection between expression of these four transcripts and this therapy regimen remains to be further elucidated. However, the fact that SU5416 is an inhibitor of VEGF-mediated angiogenesis may be of relevance to the expression of MMP-9 mRNA, as the gene encodes a well-characterized enzyme with roles in angiogenesis and tissue remodeling. Also noteworthy is a report describing bovine lactoferrin as an inhibitor of VEGF-mediated angiogenesis in a rat model [29]. Intriguingly, a recent report has identified a high molecular weight urinary MMP to be a complex of MMP-9 and LCN2 (herein referred to as neutrophil gelatinase-associated lipocalin, or NGAL) proteins [30]. Such urinary MMP complexes have been described as useful markers for cancer diagnosis and prognosis. The relevance of the MMP-9/NGAL protein complex to the observed changes in the levels of their transcripts in blood cells remains to be determined.

Investigation of expression patterns of these four genes in samples from clinical trials designed to test other angiogenesis inhibitors will allow further examination of their validity as biomarkers of mechanism or drug exposure. This approach may also be useful in the identification of biomarkers of clinical response. Our initial analysis has not yielded a definitive set of gene expression changes, in PBMC samples, that are correlated with objective response at end of first cycle; this may be related to a limited sample size or features of the specific trial design and patient population. Also, as previously discussed, it may be essential to profile samples harvested at multiple time points during the course of therapy in order to fully capture the variability and extent of dynamic responses specific to a given therapy. However, the methodology will be applied to subsequent clinical trials involving other anti-angiogenesis agents and molecularly targeted therapeutics.

Additionally and importantly, improvements in reagents available for storage and purification of RNA from clinical blood samples should also enhance quality of specimens, especially in cases where specimens are initially harvested at multiple clinical sites; the recently introduced PAXgene blood RNA stabilization reagent is an example of such improvement [31]. The analysis of whole blood rather than purified PBMC as surrogate specimens in future studies may also minimize variations due to sample handling and processing. Nevertheless, the results described here demonstrate that human blood samples can serve as surrogate specimens for biomarker investigations in cancer patients and imply that large-scale gene expression analysis may be a useful approach for characterization of drug activity from clinical trial samples. This expression profiling approach might also both lead to and be enhanced by improvements in oncology trial design, such as more rational and tailored selection of indications or individuals that are most likely to be responsive to a given molecularly targeted therapy.


Conclusions

Large-scale gene expression profiling has been applied to peripheral blood cells harvested before and after treatment from patients participating in a Phase III clinical trial of SU5416. Four transcripts were identified as potential biomarkers of SU5416 administration and verified using a qRT-PCR approach. Discriminant analysis indicated that expression profiles of these transcripts could be used to predict the trial arm to which patients belonged. These results suggest that expression profiling of peripheral blood cells is a valid surrogate approach for biomarker discovery in oncology clinical trials.

Competing interests

All authors are employed by SUGEN, Inc.

Authors' contributions

S. DePrimo planned and performed gene expression profiling laboratory work and data analysis, and drafted the manuscript; L. Wong performed gene expression profiling laboratory work and aided in clinical sample tracking; D. Khatry performed statistical data analysis and discriminant analysis; S. Nicholas performed curation and tracking of clinical samples; W. Manning participated in the supervision and coordination of the study and manuscript preparation; B. Smolich contributed to the conception, design and coordination of the study; A. O'Farrell initiated expression profiling laboratory work and manuscript preparation and participated in the design, supervision, and coordination of the study; J. Cherrington contributed to the conception, design, coordination, and guidance of the study. All authors read and approved the final manuscript.


Acknowledgements

We would like to thank Helene Yuen and Brian Schryver for technical assistance, Christopher Bauer for guidance in qRT-PCR analyses, and Alyssa Morimoto, Mark Jacobs, Sucha Sudarsanam, and Gerard Manning for helpful discussions and suggestions.


References

  1. Brown PO, Botstein D: Exploring the new world of the genome with DNA microarrays.

    Nat Genet 1999, 21:33-37.

  2. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

    Science 1999, 286:531-537.

  3. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D: Molecular portraits of human breast tumours.

    Nature 2000, 406:747-752.

  4. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J Jr, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Staudt LM, et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.

    Nature 2000, 403:503-511.

  5. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH: Gene expression profiling predicts clinical outcome of breast cancer.

    Nature 2002, 415:530-536.

  6. Mendel DB, Laird AD, Smolich BD, Blake RA, Liang C, Hannah AL, Shaheen RM, Ellis LM, Weitman S, Shawver LK, Cherrington JM: Development of SU5416, a selective small molecule inhibitor of VEGF receptor tyrosine kinase activity, as an anti-angiogenesis agent.

    Anticancer Drug Des 2000, 15:29-41.

  7. Shaheen RM, Davis DW, Liu W, Zebrowski BK, Wilson MR, Bucana CD, McConkey DJ, McMahon G, Ellis LM: Antiangiogenic therapy targeting the tyrosine kinase receptor for vascular endothelial growth factor receptor inhibits the growth of colon cancer liver metastasis and induces tumor and endothelial cell apoptosis.

    Cancer Res 1999, 59:5412-5416.

  8. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL: Expression monitoring by hybridization to high-density oligonucleotide arrays.

    Nat Biotechnol 1996, 14:1675-1680.

  9. Ramanathan M, Weinstock-Guttman B, Nguyen LT, Badgett D, Miller C, Patrick K, Brownscheidle C, Jacobs L: In vivo gene expression revealed by cDNA arrays: the pattern in relapsing-remitting multiple sclerosis patients compared with normal subjects.

    J Neuroimmunol 2001, 116:213-219.

  10. Rus V, Atamas SP, Shustova V, Luzina IG, Selaru F, Magder LS, Via CS: Expression of cytokine- and chemokine-related genes in peripheral blood mononuclear cells from lupus patients by cDNA array.

    Clin Immunol 2002, 102:283-290.

  11. Sawano A, Iwai S, Sakurai Y, Ito M, Shitara K, Nakahata T, Shibuya M: Flt-1, vascular endothelial growth factor receptor 1, is a novel cell surface marker for the lineage of monocyte-macrophages in humans.

    Blood 2001, 97:785-791.

  12. Selheim F, Holmsen H, Vassbotn FS: Identification of functional VEGF receptors on human platelets.

    FEBS Lett 2002, 512:107-110.

  13. Rosen PJ, Amado R, Hecht JR, Chang D, Mulay M, Parson M, Laxa B, Brown J, Cropp GF, Hannah A, Rosen L: A Phase I/II Study of SU5416 in Combination with 5-FU/Leucovorin in Patients with Metastatic Colorectal Cancer.

    Proc Am Soc Clin Oncol 2000, 19:3a (abstract 5D).

  14. Rothenberg ML, Berlin JD, Cropp GF, Fleischer AC, Schumaker RD, Hande KR, Culley A, Dorminy C, Donnelly E, Chen J, Schaaf L, Hannah AL: Phase I/II Study of SU5416 in Combination with Irinotecan/5-FU/LV (IFL) in Patients with Metastatic Colorectal Cancer.

    Proc Am Soc Clin Oncol 2001, 20:75a (abstract 298).

  15. Tripodi D, Lyons S, Davies D: Separation of peripheral leukocytes by Ficoll density gradient centrifugation.

    Transplantation 1971, 11:487-488.

  16. Chomczynski P: A reagent for the single-step simultaneous isolation of RNA, DNA and proteins from cell and tissue samples.

    Biotechniques 1993, 15:532-534.

  17. Eberwine J, Yeh H, Miyashiro K, Cao Y, Nair S, Finnell R, Zettel M, Coleman P: Analysis of gene expression in single live neurons.

    Proc Natl Acad Sci U S A 1992, 89:3010-3014.

  18. Schmittgen TD, Zakrajsek BA, Mills AG, Gorn V, Singer MJ, Reed MW: Quantitative reverse transcription-polymerase chain reaction to study mRNA decay: comparison of endpoint and real-time methods.

    Anal Biochem 2000, 285:194-204.

  19. Huberty CJ: Applied Discriminant Analysis. New York: Wiley-Interscience; 1994.

  20. Nguyen M, Arkell J, Jackson CJ: Human endothelial gelatinases and angiogenesis.

    Int J Biochem Cell Biol 2001, 33:960-970.

  21. de Fraipont F, Nicholson AC, Feige JJ, Van Meir EG: Thrombospondins and tumor angiogenesis.

    Trends Mol Med 2001, 7:401-407.

  22. Devireddy LR, Teodoro JG, Richard FA, Green MR: Induction of apoptosis by a secreted lipocalin that is transcriptionally regulated by IL-3 deprivation.

    Science 2001, 293:829-834.

  23. Schneeberger C, Speiser P, Kury F, Zeillinger R: Quantitative detection of reverse transcriptase-PCR products by means of a novel and sensitive DNA stain.

    PCR Methods Appl 1995, 4:234-238.

  24. Saltz LB, Cox JV, Blanke C, Rosen LS, Fehrenbacher L, Moore MJ, Maroun JA, Ackland SP, Locker PK, Pirotta N, Elfring GL, Miller LL: Irinotecan plus fluorouracil and leucovorin for metastatic colorectal cancer. Irinotecan Study Group.

    N Engl J Med 2000, 343:905-914.

  25. Douillard JY, Cunningham D, Roth AD, Navarro M, James RD, Karasek P, Jandik P, Iveson T, Carmichael J, Alakl M, Gruia G, Awad L, Rougier P: Irinotecan combined with fluorouracil compared with fluorouracil alone as first-line treatment for metastatic colorectal cancer: a multicentre randomised trial.

    Lancet 2000, 355:1041-1047.

  26. Hofmann WK, de Vos S, Elashoff D, Gschaidmeier H, Hoelzer D, Koeffler HP, Ottmann OG: Relation between resistance of Philadelphia-chromosome-positive acute lymphoblastic leukaemia to the tyrosine kinase inhibitor STI571 and gene-expression profiles: a gene-expression study.

    Lancet 2002, 359:481-486.

  27. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response.

    Proc Natl Acad Sci U S A 2001, 98:5116-5121.

  28. Cowland JB, Borregaard N: Molecular characterization and pattern of tissue expression of the gene for neutrophil gelatinase-associated lipocalin from humans.

    Genomics 1997, 45:17-23.

  29. Norrby K, Mattsby-Baltzer I, Innocenti M, Tuneberg S: Orally administered bovine lactoferrin systemically inhibits VEGF(165)-mediated angiogenesis in the rat.

    Int J Cancer 2001, 91:236-240.

  30. Yan L, Borregaard N, Kjeldsen L, Moses MA: The high molecular weight urinary matrix metalloproteinase (MMP) activity is a complex of gelatinase B/MMP-9 and neutrophil gelatinase-associated lipocalin (NGAL). Modulation of MMP-9 activity by NGAL.

    J Biol Chem 2001, 276:37258-37265.

  31. Rainen L, Oelmueller U, Jurgensen S, Wyrich R, Ballas C, Schram J, Herdman C, Bankaitis-Davis D, Nicholls N, Trollinger D, Tryon V: Stabilization of mRNA expression in whole blood samples.

    Clin Chem 2002, 48:1883-1890.
