Department of Mathematical Sciences, Cameron University, Lawton, OK, 73505, USA

Cancer Centers of Southwest Oklahoma, Lawton, OK, 73505, USA

Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35295, USA

Abstract

Background

We explore the benefits of applying a new proportional hazard model to analyze survival of breast cancer patients. As a parametric model, the hypertabastic survival model offers a closer fit to experimental data than Cox regression, and furthermore provides explicit survival and hazard functions which can be used as additional tools in the survival analysis. In addition, one of our main concerns is utilization of multiple gene expression variables. Our analysis treats the important issue of interaction of different gene signatures in the survival analysis.

Methods

The hypertabastic proportional hazards model was applied in survival analysis of breast cancer patients. This model was compared, using statistical measures of goodness of fit, with models based on the semi-parametric Cox proportional hazards model and the parametric log-logistic and Weibull models. The explicit functions for hazard and survival were then used to analyze the dynamic behavior of hazard and survival functions.

Results

The hypertabastic model provided the best fit among all the models considered. Use of multiple gene expression variables also provided a considerable improvement in the goodness of fit of the model, as compared to use of only one. By utilizing the explicit survival and hazard functions provided by the model, we were able to determine the magnitude of the maximum rate of increase in hazard, and the maximum rate of decrease in survival, as well as the times when these occurred. We explore the influence of each gene expression variable on these extrema. Furthermore, in the cases of continuous gene expression variables, represented by a measure of correlation, we were able to investigate the dynamics with respect to changes in gene expression.

Conclusions

We observed that use of three different gene signatures in the model provided a greater combined effect and allowed us to assess the relative importance of each in determination of outcome in this data set. These results point to the potential to combine gene signatures to a greater effect in cases where each gene signature represents some distinct aspect of the cancer biology. Furthermore we conclude that the hypertabastic survival models can be an effective survival analysis tool for breast cancer patients.

Background

A number of important papers have appeared in recent years using gene expression as a predictor of outcome in cancer patients, and it has become clear this genomic information will greatly improve prognostic capabilities. In the statistical survival analysis, these papers have utilized the semi-parametric Cox proportional hazard model and the Kaplan-Meier estimator for the survival and hazard curves. One purpose of this paper is to show the advantages that can be gained by utilizing a parametric model, which allows use of explicitly defined, continuous hazard and survival functions as tools in the analysis. Parametric models in general have a higher accuracy, and the recently introduced hypertabastic model

Breast cancer patients with similar clinical profiles may experience widely differing outcomes and different responses to therapy, and means for more accuracy in prognosis will fill an important need. The development of variables with more prognostic power was a primary goal in the development of gene expression signatures for breast cancer outcome. Early papers utilizing gene expression to predict the progression of breast cancer determined several distinct categories

More recently researchers

Clinical trials have begun for gene expression signatures in breast cancer

The combined model we form in this paper illustrates how a quantitative prediction of hazard and survival can be formed that incorporates the predictive capabilities of these three gene expression variables. Note that each of these variables has medical significance in breast cancer progression. In our discussion of this model in the Results and discussion section, we explore the role of these variables, how they affect one another in the context of the model, and what information can be gained from variation in the levels of CSR correlation, ErbB2+ correlation, and good or poor seventy gene signature. This analysis addresses the important issue of how multiple gene expression signatures representing different aspects of the underlying biology can be combined and how they may interact. We have found a partial answer in the context of the given model; however, it is far from a complete answer to this important question. We claim this is an important issue that should receive further attention and possibly alternative approaches in modeling.

Methods

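Here we present the proportional hazard form of the hypertabastic model, which will be applied in the survival analysis of the breast cancer patients. One important feature of the hypertabastic survival model is the ability of the hazard function to assume many different shapes, in contrast to the Weibull, lognormal, and log-logistic distributions. The hypertabastic distribution function is defined as

$$F(t) = 1 - \operatorname{sech}\!\left[\frac{\alpha\left(1 - t^{\beta}\coth(t^{\beta})\right)}{\beta}\right], \qquad t > 0.$$

The hypertabastic proportional hazard model has a hazard function of the form

$$h(t \mid x, \theta) = h_{0}(t)\, g(x \mid \theta), \tag{1}$$

where h_{0}(t) is the baseline hazard function, given by

$$h_{0}(t) = \alpha\left(t^{2\beta-1}\operatorname{csch}^{2}(t^{\beta}) - t^{\beta-1}\coth(t^{\beta})\right)\tanh[W(t)],$$

and where

$$W(t) = \frac{\alpha\left[1 - t^{\beta}\coth(t^{\beta})\right]}{\beta}, \qquad g(x \mid \theta) = \exp\!\left(\sum_{k=1}^{p} \theta_{k} x_{k}\right),$$

where the x_{k} are covariates and the θ_{k} are the associated parameters. Similarly the hypertabastic survival function has the form

$$S(t \mid x, \theta) = \left[S_{0}(t)\right]^{g(x \mid \theta)}, \tag{2}$$

where S_{0}(t) is the baseline survival function, given by

$$S_{0}(t) = \operatorname{sech}[W(t)].$$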

For further detail, see
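As a minimal sketch of how the hazard function (1) and survival function (2) can be evaluated numerically, the following Python code assumes the published hypertabastic form S_0(t) = sech[W(t)] with W(t) = α[1 − t^β coth(t^β)]/β, the baseline hazard h_0(t) = W′(t) tanh[W(t)], and the covariate factor g(x|θ) = exp(Σ θ_k x_k). The function names and the parameter values in the usage lines are illustrative assumptions; they are not taken from the SAS program used in this study.

```python
import numpy as np

def w(t, alpha, beta):
    """W(t) = alpha * (1 - t^beta * coth(t^beta)) / beta."""
    tb = np.power(t, beta)
    return alpha * (1.0 - tb / np.tanh(tb)) / beta

def baseline_survival(t, alpha, beta):
    """S_0(t) = sech(W(t))."""
    return 1.0 / np.cosh(w(t, alpha, beta))

def baseline_hazard(t, alpha, beta):
    """h_0(t) = W'(t) * tanh(W(t)), with W'(t) written in closed form."""
    tb = np.power(t, beta)
    csch_sq = 1.0 / np.sinh(tb) ** 2
    coth = 1.0 / np.tanh(tb)
    w_prime = alpha * (np.power(t, 2 * beta - 1) * csch_sq
                       - np.power(t, beta - 1) * coth)
    return w_prime * np.tanh(w(t, alpha, beta))

def covariate_factor(x, theta):
    """g(x | theta) = exp(sum_k theta_k * x_k)."""
    return float(np.exp(np.dot(theta, x)))

def hazard(t, x, theta, alpha, beta):
    """Proportional hazards form: h_0(t) * g(x | theta)."""
    return baseline_hazard(t, alpha, beta) * covariate_factor(x, theta)

def survival(t, x, theta, alpha, beta):
    """Survival form: S_0(t) ** g(x | theta)."""
    return baseline_survival(t, alpha, beta) ** covariate_factor(x, theta)

# Illustrative values only (hypothetical, not the fitted estimates).
alpha, beta = 0.7, 0.6
theta = np.array([1.2, 0.5])
x = np.array([1.0, 0.2])
print(survival(10.0, x, theta, alpha, beta), hazard(10.0, x, theta, alpha, beta))
```

Because the hazard is the negative derivative of the log survival function, comparing `hazard` against a finite-difference derivative of `-log(survival)` is a quick consistency check on any implementation of this kind.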

This model is applied to the 295 patient study from the Netherlands Cancer Institute which is presented in

Here we further discuss the different variables that were included as potential covariates in the model. The first class of variables was the clinical variables, including the following: estrogen receptor status (ERS), tumor grade (TG1 and TG2), age (AGE), diameter (DIAM), and lymph node status (LN1 and LN2). The primary gene expression variable we tested was the seventy gene signature (70G) of

In applying the hypertabastic survival model to this data set, we considered the clinical, gene expression, and classification variables described above. We applied a standard stepwise forward variable selection procedure. In addition, since some of the variables are highly correlated, we used a procedure ensuring that no two of the variables considered had a pairwise correlation of 0.5 or higher. The parameters were estimated using a SAS program, and these parameter estimates were double-checked using Mathematica. A SAS program for the hypertabastic proportional hazard model using log-time is provided in the Additional file

**Data cancer.**

Once the parameters had been estimated, these values were used in the survival function (2) and hazard function (1). Then Mathematica was utilized to sketch graphs of the hazard and survival functions for the desired cases. Further dynamic analysis of these curves and their derivatives was also made using Mathematica.
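The text does not spell out how the pairwise correlation constraint was enforced during variable selection. One simple possibility, shown here purely as a hypothetical sketch, is a greedy pass that admits a candidate variable only if its absolute pairwise correlation with every variable already admitted is below 0.5. The function name, the variable names `V1` to `V3`, and the toy data are assumptions for illustration.

```python
import numpy as np

def filter_correlated(columns, names, threshold=0.5):
    """Greedy pass over candidates in the given order.

    A variable is kept only if |Pearson r| with every previously
    kept variable is strictly below `threshold`.
    """
    kept = []
    for j in range(len(names)):
        ok = all(
            abs(np.corrcoef(columns[:, j], columns[:, i])[0, 1]) < threshold
            for i in kept
        )
        if ok:
            kept.append(j)
    return [names[j] for j in kept]

# Toy data (hypothetical): V2 is a linear function of V1, V3 is weakly related.
v1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
v2 = 2.0 * v1 + 1.0
v3 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
data = np.column_stack([v1, v2, v3])
selected = filter_correlated(data, ["V1", "V2", "V3"])
print(selected)
```

A greedy filter of this kind depends on the order in which candidates are presented, so in practice it would be combined with the forward selection ordering rather than applied independently.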

Results and discussion

Model based on gene expression and clinical variables

In this section we apply the model selection procedure to determine an effective model to represent the survival of the breast cancer patients in the Netherlands study of

| Model | −2 Log likelihood | AIC | −2 Log likelihood without covariates |
| --- | --- | --- | --- |
| Hypertabastic | 387.755 | 399.755 | 467.952 |
| Weibull | 399.000 | 411.000 | 474.089 |
| Log Logistic | 502.126 | 514.126 | 544.930 |
| Cox Regression | 764.001 | 772.001 | 836.598 |
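The AIC values reported above are consistent with the usual definition AIC = −2 log L + 2k, where k is the number of estimated parameters. The short check below assumes k = 6 for the three parametric models (two distribution parameters plus four covariates) and k = 4 for Cox regression; these counts are inferred from the reported numbers rather than stated in the text.

```python
def aic(neg2_loglik, n_params):
    """Akaike Information Criterion from -2 log-likelihood and parameter count."""
    return neg2_loglik + 2 * n_params

# (reported -2 log L, assumed parameter count k, reported AIC)
reported = {
    "Hypertabastic": (387.755, 6, 399.755),
    "Weibull": (399.000, 6, 411.000),
    "Log Logistic": (502.126, 6, 514.126),
    "Cox Regression": (764.001, 4, 772.001),
}

for model, (neg2ll, k, reported_aic) in reported.items():
    print(f"{model}: computed {aic(neg2ll, k):.3f}, reported {reported_aic:.3f}")
```

Note that the Cox partial likelihood is not on the same scale as the full likelihoods of the parametric models, so the comparison of its −2 log likelihood with the parametric values should be read with that caveat in mind.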

In Table

| Parameter | Estimate | Standard Dev. | Wald test | P-value | Hazard ratio |
| --- | --- | --- | --- | --- | --- |
| a (model) | 0.7247 | 0.2888 | 6.298 | 0.01209 | NA |
| b (model) | 0.6205 | 0.1244 | 24.873 | 6.125 × 10^-7 | NA |
| c (AGE) | −0.07350 | 0.01480 | 24.645 | 6.891 × 10^-7 | 0.9291 |
| d (70G) | 1.199 | 0.3872 | 9.585 | 0.001962 | 3.316 |
| e (CSR) | 2.661 | 0.7025 | 14.343 | 0.0001524 | 14.305 |
| f (CERBB) | 1.561 | 0.7285 | 4.594 | 0.03208 | 4.766 |
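The entries of the parameter table appear consistent with the standard relations Wald = (estimate / standard deviation)², a p-value taken from the chi-square distribution with one degree of freedom, and hazard ratio = exp(estimate). That these are the formulas used is an inference from the reported numbers, not a statement from the text; the sketch below recomputes the covariate rows under that assumption, with small differences expected from rounding of the printed estimates.

```python
import math

# (estimate, standard deviation) as printed in the parameter table
rows = {
    "c (AGE)": (-0.07350, 0.01480),
    "d (70G)": (1.199, 0.3872),
    "e (CSR)": (2.661, 0.7025),
    "f (CERBB)": (1.561, 0.7285),
}

def wald_summary(estimate, std_dev):
    """Return (Wald statistic, chi-square(1) p-value, hazard ratio)."""
    wald = (estimate / std_dev) ** 2
    # For one degree of freedom, P(chi2 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    hazard_ratio = math.exp(estimate)
    return wald, p_value, hazard_ratio

for name, (est, sd) in rows.items():
    wald, p, hr = wald_summary(est, sd)
    print(f"{name}: Wald={wald:.3f}, p={p:.4g}, HR={hr:.4f}")
```

For the continuous correlation variables, the hazard ratio refers to a one-unit increase in correlation, which is larger than the observed range of the data; the ratio for a smaller change Δ is exp(estimate × Δ).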

Inclusion of the clinical variables improved the goodness of fit of the model for each of the gene signatures considered, consistent with the results of

In the absence of a combined model, researchers and doctors are already aware of the possibility for several important variables to point toward different conclusions. Our combined model addresses this question of how much weight to assign to each of several significant variables. This model offers a scientific approach to this issue, based on statistical techniques and quantitative analysis. The added advantage of use of a good-fitting parametric model, such as the hypertabastic survival model, is the ability to analyze the temporal dynamics of the hazard and survival functions, as we illustrate in the remainder of this section. Since two of the gene expression variables are continuous, as given by levels of correlation to an established gene expression, we are also able to investigate the dynamics of hazard and survival with respect to changes in level of gene expression.

Dynamics of survival and hazard

The temporal dynamics of hazard and survival curves for the combined model follow from the above determination of parameter values. In the following we work out the details of this time course, as well as the influence of the covariates, with particular attention to the gene expression variables and their interactions. In order to isolate the effects of one or two of the variables within the combined model we will hold all other variables at a fixed level, usually the median. We begin with the seventy gene signature 70G, both in relation to the other gene expression variables CSR and CERBB, and also in comparison to 70G as a single variable model.

We now analyze the interaction between the seventy gene signature and CSR correlation within our multivariable model, while holding our other variable of ErbB2+ correlation fixed at its median value. The graphs in Figure

**Survival function for varying CSR correlation and seventy gene signature.**

Notice that when the seventy gene signature has a poor prognosis, the effect of CSR correlation on survival is also magnified. We can determine the maximum rate of decrease in survival probability for each of the cases, and these are given in Table

| Case | Time of min | Veloc. at min |
| --- | --- | --- |
| **Survival** | | |
| Good prognosis (70G = 0) | | |
| CSR min | 4.003 | −0.004703 |
| CSR max | 3.446 | −0.04187 |
| Only 70 gene sig. | 8.332 | −0.007713 |
| Poor prognosis (70G = 1) | | |
| CSR min | 3.814 | −0.01516 |
| CSR max | 2.682 | −0.1198 |
| Only 70 gene sig. | 3.743 | −0.05187 |
| **Hazard** | | |
| Good prognosis (70G = 0) | | |
| CSR min | 2.187 | 0.006365 |
| CSR max | 2.187 | 0.05976 |
| Only 70 gene sig. | 5.107 | 0.009010 |
| Poor prognosis (70G = 1) | | |
| CSR min | 2.187 | 0.02111 |
| CSR max | 2.187 | 0.1982 |
| Only 70 gene sig. | 5.107 | 0.07695 |

We note that in the case of a poor prognosis for the seventy gene signature, the maximum rate of decrease in the survival function occurs sooner in all of the cases. Furthermore, this rate of change has a larger magnitude, indicating a larger rate of decrease in the survival function, when there is a poor prognosis. These graphs also compare the curve in the middle, where 70G is the only covariate with the curves on the outside. For these curves all four variables are included in the model, while the focus is on the variation in CSR correlation from the minimum value to the maximum value, with other variables at median level. Here the differences in shape also come about due to the variation in the values of α and β between these cases, a feature of the hypertabastic distribution allowing greater variability in the location and magnitude of the maximum rate of decrease for the survival functions.
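The times and velocities discussed above are extrema of the time derivatives of the survival and hazard functions. The following sketch locates them on a fine time grid by numerical differentiation, assuming the hypertabastic forms S = [sech W(t)]^g and h = g W′(t) tanh W(t) with W(t) = α[1 − t^β coth(t^β)]/β. The values of α, β, and the covariate factor g are illustrative assumptions, not the fitted estimates, and the grid search is a stand-in for the symbolic analysis carried out in Mathematica.

```python
import numpy as np

def w(t, alpha, beta):
    tb = np.power(t, beta)
    return alpha * (1.0 - tb / np.tanh(tb)) / beta

def survival(t, alpha, beta, g=1.0):
    """S(t) = sech(W(t)) ** g, with g the covariate factor."""
    return (1.0 / np.cosh(w(t, alpha, beta))) ** g

def hazard(t, alpha, beta, g=1.0):
    """h(t) = g * W'(t) * tanh(W(t))."""
    tb = np.power(t, beta)
    w_prime = alpha * (np.power(t, 2 * beta - 1) / np.sinh(tb) ** 2
                       - np.power(t, beta - 1) / np.tanh(tb))
    return g * w_prime * np.tanh(w(t, alpha, beta))

def steepest_point(values, t, kind):
    """Time and slope where d(values)/dt is smallest ('min') or largest ('max')."""
    slope = np.gradient(values, t)
    idx = np.argmin(slope) if kind == "min" else np.argmax(slope)
    return t[idx], slope[idx]

t = np.linspace(0.01, 20.0, 20000)
alpha, beta = 0.7, 0.6          # illustrative, hypothetical values
results = {}
for g in (1.0, 3.0):            # two hypothetical covariate factors
    t_s, v_s = steepest_point(survival(t, alpha, beta, g), t, "min")
    t_h, v_h = steepest_point(hazard(t, alpha, beta, g), t, "max")
    results[g] = (t_s, v_s, t_h, v_h)
    print(f"g={g}: survival falls fastest at t={t_s:.3f} ({v_s:.4f}); "
          f"hazard rises fastest at t={t_h:.3f} ({v_h:.4f})")
```

Because the covariates enter the hazard only through the multiplicative factor g, the time of the steepest increase in hazard does not change with g when α and β are fixed, while its magnitude scales with g. This is consistent with the identical hazard times reported for the CSR minimum and maximum cases within the combined model. The survival extremum, in contrast, shifts in time as g changes.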

Figure

**Hazard function for varying CSR correlation and seventy gene signature.**

For the two correlation variables (CSR and CERBB), an increased level of correlation is associated with a poor outcome, and both cases exhibit the same general profile of more invasiveness, more resistance to treatment, and shorter times until recurrence. In the following we compare the effect of the ErbB2+ correlation (CERBB) to the CSR correlation (CSR) treated above. We note that although there are some similarities, these biological processes measured by the two gene expression variables play different roles in tumor progression. The CSR correlation treated above deals with the role of fibroblasts in both wound healing and tumor progression in cancer and relates to the proposed wound-like phenotype that has been observed in a number of human cancers

The different means of action of ErbB2 and CSR allow both of these variables to be used together in determining the probability of survival. The effect of ErbB2+ correlation (CERBB) in the survival model follows approximately the same pattern as the CSR correlation (CSR) described above, although the magnitude is somewhat smaller, as described below. The hazard ratios and p-values for these two variables are comparable when considered individually, with hazard ratios of 45.489 and 30.036 for CSR correlation and ErbB2+ correlation, respectively, and p-values of 1.462 × 10^-9 and 2.990 × 10^-7, respectively. However, when considered with all the other variables in the model, these become hazard ratios of 14.305 and 4.766 for CSR correlation and ErbB2+ correlation, respectively, and p-values of 0.0001524 and 0.03208, respectively. The effect of the seventy gene signature on the ErbB2+ correlation will be comparable to the effect on the CSR correlation, as demonstrated above. Thus the ErbB2+ correlation will display the same pattern as the CSR correlation, with a somewhat smaller magnitude due to the difference in hazard ratios. In the following we will also investigate each of these correlations, CSR and ErbB2+, as continuous variables within our overall model. We will also consider the relation between these variables below, where an increase in correlation of one variable can be expected to amplify the effects of the other, as observed above for the seventy gene signature.

The graphs in Figure

**Survival function with varying ErbB2+ correlation.**

| Case | Time | Velocity |
| --- | --- | --- |
| **Good prognosis** | | |
| Min ErbB2+ | 3.929 | −0.008628 |
| Max ErbB2+ | 3.627 | −0.02704 |
| 70G only | 8.332 | −0.007713 |
| **Poor prognosis** | | |
| Min ErbB2+ | 3.624 | −0.02722 |
| Max ErbB2+ | 3.024 | −0.07859 |
| 70G only | 3.743 | −0.05187 |

The effect of the ErbB2+ correlation is comparable to that for CSR correlation observed above, although the magnitude is smaller. The differences in 20-year survival rates between the minimum and maximum ErbB2+ correlations are 0.2316 in the case of good seventy gene signature and 0.4097 in the case of poor seventy gene signature. These are just over half of the differences observed between minimum and maximum CSR correlation, which are 0.4235 for the good seventy gene signature and 0.7021 for the poor seventy gene signature.

In the remainder of the study we further describe interactions between our three gene expression variables, 70G, CSR, and CERBB, in determining the survival function. As the variables for CSR correlation and ErbB2+ correlation are continuous variables, we study the effect of variation of the level of correlation on the survival function. We first investigate separately the effects of each of these correlations, CSR and ErbB2+, in determining the probability of survival beyond ten years. Then, as a function of two variables we are able to investigate the combined effect of these two correlations on the probability of survival beyond ten years. We also use two variables to consider the effect of each of these individual variables in combination with time. In each case we analyze the survival function to explore quantitatively how change in the level of correlation will affect the prognosis and the probability of survival beyond a given time. It is also possible to determine at what time a given correlation will display its largest impact on survival. This analysis will further allow us to compare the influence of these two variables, CSR correlation and ErbB2+ correlation, and how they affect the survival and hazard curves, over time.

We first investigate the role of CSR correlation (CSR) while holding the other variables at median level and assuming a poor prognosis in seventy gene signature (70G). We consider the probability of survival past three fixed times: 5, 10, and 20 years. These survival curves, followed by their rates of change, are given in Figure

**Survival and hazard at 5, 10, and 20 years, as functions of CSR correlation.**

As expected, survival drops off with increasing CSR correlation. The effect from the CSR correlation increases with time, as may also be expected. For survival beyond 5 years, the decrease in survival with increasing CSR correlation occurs at an increasing rate throughout the experimental range of CSR correlations, reaching a maximum rate of decrease of (−0.8387) at the maximum correlation. However at 10 and 20 years, the effect of CSR correlation in decreasing survival is even larger, with a maximum rate of decrease occurring at correlations within the experimental range. The specific values are given in Table

| Time | Correlation | Velocity | Correlation | Velocity |
| --- | --- | --- | --- | --- |
| **Effect of variation of CSR correlation** | | | | |
| 5 years | 0.6855 | −0.9788 | Max | −0.8387 |
| 10 years | 0.3855 | −0.9788 | 0.3855 | −0.9788 |
| 20 years | 0.1498 | −0.9788 | 0.1498 | −0.9788 |
| **Effect of variation of ErbB2+ correlation** | | | | |
| 5 years | 1.000 | −0.5652 | Max | −0.3868 |
| 10 years | 0.6063 | −0.5744 | Max | −0.5590 |
| 20 years | 0.2063 | −0.5744 | 0.2063 | −0.5744 |

Note: Max[CSR] = 0.455306 and Max[CERBB] = 0.451045 for this data set.
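The recurring velocity values in the table above can be understood directly from the survival function (2). The derivation below is a sketch under the assumption that survival has the form S = S_0(t)^g with g proportional to exp(θx), where x is the correlation variable of interest, θ is its estimated coefficient, and all other covariates are held fixed.

```latex
\frac{\partial S}{\partial x}
  = \theta\, g \,\ln S_{0}(t)\; S_{0}(t)^{g}
  = -\theta\, u\, e^{-u},
\qquad u = -g \ln S_{0}(t) > 0 .

% u e^{-u} attains its maximum 1/e at u = 1, i.e. where S = e^{-1} \approx 0.368, so
\min_{x} \frac{\partial S}{\partial x} = -\frac{\theta}{e} .

% With the printed estimates:
\theta_{\mathrm{CSR}} = 2.661 \;\Rightarrow\; -\theta/e \approx -0.979,
\qquad
\theta_{\mathrm{CERBB}} = 1.561 \;\Rightarrow\; -\theta/e \approx -0.574 .
```

These values agree with the tabulated −0.9788 and −0.5744 up to rounding of the printed estimates. The bound depends only on θ, which is why the same velocity appears at every time; what changes with time is the correlation at which u = 1 is reached. Since −ln S_0(t) grows with t, that correlation decreases from 5 to 20 years, and it can lie beyond the maximum correlation observed in the data.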

The hazard function continues increasing for both increasing time and increasing correlation, as we observe in the hazard graphs found in Figure

We now investigate how ErbB2+ correlation affects the probability of survival beyond times of 5, 10, and 20 years. The graphs representing these survival curves appear in Figure

**Survival at 5, 10, and 20 years, as functions of ErbB2+ correlation.**

To further illustrate the quantitative difference for these two variables, we give Table

**5 year survival:**

| Correlation | CSR | ErbB2+ | Correlation | CSR | ErbB2+ |
| --- | --- | --- | --- | --- | --- |
| −0.3 | 0.9299 | 0.8967 | 0.1 | 0.8101 | 0.8157 |
| −0.2 | 0.9095 | 0.8803 | 0.2 | 0.7597 | 0.7881 |
| −0.1 | 0.8836 | 0.8615 | 0.3 | 0.6987 | 0.7570 |
| 0 | 0.8509 | 0.8401 | 0.4 | 0.6263 | 0.7223 |

**10 year survival:**

| Correlation | CSR | ErbB2+ | Correlation | CSR | ErbB2+ |
| --- | --- | --- | --- | --- | --- |
| −0.3 | 0.8506 | 0.7844 | 0.1 | 0.625607 | 0.6354 |
| −0.2 | 0.8097 | 0.7528 | 0.2 | 0.542263 | 0.5885 |
| −0.1 | 0.7592 | 0.7177 | 0.3 | 0.44998 | 0.5380 |
| 0 | 0.6981 | 0.6784 | 0.4 | 0.352761 | 0.4845 |

We consider how the survival function depends on both of these continuous variables. Note that since Table
_{0}] as a three dimensional graph for any fixed value of t_{0}. In Figure
_{0} = 10. The other variables are fixed at median age and a seventy gene signature representing a poor prognosis.

**Survival beyond 10 years for CSR and ErbB2+ correlation.**

The dotted and dashed curves along the surface of this graph correspond to the 10 year (dotted) survival curves in Figures

The graph in Figure

**Survival as a function of time and correlation.**

The comparative effects of CSR correlation and ErbB2+ correlation are evident from these graphs. At each time, a change in CSR correlation has a much larger impact than the same change in ErbB2+ correlation. Similarly, for each given level of correlation, the decrease of survival percentage with respect to time is much larger for CSR correlation.

Since the function (2) with the parameter values estimated by the model contains all of this information, it is possible to compute probabilities of survival to any time for any given combination of the variables. As representative examples of the types of computations that can be made, in Table

| Case | 10 years | 20 years | 20 years given 10 years |
| --- | --- | --- | --- |
| Good prognosis | 0.8988 | 0.8193 | 0.9116 |
| Poor prognosis | 0.7020 | 0.5164 | 0.7357 |
| **Good prognosis** | | | |
| Low CSR | 0.9428 | 0.8958 | 0.9502 |
| High CSR | 0.8114 | 0.6769 | 0.8342 |
| Low ErbB2+ | 0.9214 | 0.8583 | 0.9315 |
| High ErbB2+ | 0.8420 | 0.7253 | 0.8614 |
| **Poor prognosis** | | | |
| Low CSR | 0.8225 | 0.6942 | 0.8440 |
| High CSR | 0.5001 | 0.2742 | 0.5482 |
| Low ErbB2+ | 0.7624 | 0.6024 | 0.7903 |
| High ErbB2+ | 0.5654 | 0.3447 | 0.6097 |
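The conditional probabilities in the last column follow from the definition of conditional survival, P(T > 20 | T > 10) = S(20) / S(10). As a brief check, the sketch below recomputes that column from the 10-year and 20-year survival probabilities for a few of the tabulated cases; the function name is an illustrative choice.

```python
def conditional_survival(s_later, s_earlier):
    """P(T > t2 | T > t1) = S(t2) / S(t1) for t2 > t1."""
    return s_later / s_earlier

# (S(10), S(20), reported P(T > 20 | T > 10)) copied from the table
cases = {
    "Good prognosis": (0.8988, 0.8193, 0.9116),
    "Poor prognosis": (0.7020, 0.5164, 0.7357),
    "Good prognosis, high CSR": (0.8114, 0.6769, 0.8342),
    "Poor prognosis, high CSR": (0.5001, 0.2742, 0.5482),
    "Poor prognosis, high ErbB2+": (0.5654, 0.3447, 0.6097),
}

for label, (s10, s20, reported) in cases.items():
    computed = conditional_survival(s20, s10)
    print(f"{label}: computed {computed:.4f}, reported {reported:.4f}")
```

The same ratio applies to any pair of times, so a patient's prognosis can be updated as time passes without refitting the model.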

In this four-variable model we observed how each of the three gene expression variables influenced the survival and hazard functions for breast cancer patients. For the two continuous gene expression variables, CSR correlation and ErbB2+ correlation, we analyzed the effect of changes in levels of gene expression. We were able to assess the combined effect of these variables, and also to examine them separately and compare their effects, as in the above comparison of the effects of change in CSR correlation and ErbB2+ correlation. The explicit hazard and survival functions produced by the hypertabastic survival model allowed us to analyze these dynamics. Additionally, we were able to compute explicit survival probabilities for any given patient profile. In concluding this survival analysis using several clinical and gene expression variables, we mention our recent work

Conclusions

The new model presented in this article combines several features not included in previous models in survival analysis of breast cancer patients. Through use of the hypertabastic survival model, a parametric model, we attain a better fit. It furthermore offers explicitly defined hazard and survival functions for use as tools in analysis. As demonstrated in this article, these functions can be used for computation of probabilities, such as those given in the tables above. Furthermore, analysis of these functions allows scientists to study the time course of the progression of hazard and the decline in survival for these patients. The influence of the variables, collectively or individually, on this time course can also be investigated. This analysis illustrates the value of parametric models in survival analysis in cases where a suitable distribution can be found to be close enough to the underlying distribution of the data. We recommend consideration of the hypertabastic distribution as it is shown in

The novel feature of the current model of investigating collective behavior of distinct gene expression variables offers an important new direction of research. The three gene expression variables included in this model originate from three distinct types of gene expression signatures: one signature representing early distant metastasis, one representing the relation of the wound healing microenvironment to that of tumor progression, and the third representing classification of breast cancer tumors into molecular subtype. Furthermore the model gives a means to determine the relative contribution of each variable, quantitatively, in determining survival and hazard. For the two continuous gene expression variables we were also able to investigate the rate of change of hazard and survival with respect to change in the level of gene expression.

By consideration of a wider range of gene expression variables together with clinical variables, this model has moved beyond previous models toward a quantitative assessment of hazard and survival involving all relevant information. These results show the potential to use multiple gene expression signatures to a combined greater effect when the signatures represent different aspects of the cancer biology. We note however that the current model has limitations in its representation of potential interactions between the various gene expression signatures. We feel this issue of interactions among gene expression variables, as well as other variables, is a critical issue for current research. We propose further investigations in this direction, as well as development of new and more refined models designed for this purpose. Certainly the new generation of gene signatures being developed for clinical use

Abbreviations

ErbB2: v-erb-b2 erythroblastic leukemia viral oncogene homolog 2; HER2: Human epidermal growth factor receptor 2; CSR: Core Serum Response; AIC: Akaike Information Criterion; ER: Estrogen Receptor.

Competing interests

The authors declare they have no competing interests.

Authors’ contributions

The work presented in this paper was carried out in collaboration among all authors. M.A.T. and W.M.E. applied the hypertabastic proportional hazards model to the breast cancer data, analyzed and interpreted the data, and wrote the paper. N.N. and K.P.S. participated in the interpretation and analysis of the data and gave technical assistance. H.L. assisted with running the SAS aspects of the program for the hypertabastic proportional hazards model, as well as the log-logistic, Weibull, and Cox regression cases. H.L. also participated in discussion of the results. All authors read and approved the final manuscript.

Acknowledgements

This research was partially supported by the National Institutes of Health grant P30 CA13148.
