Biology, images, analysis, design...
Use/Abuse Principles How To Related
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)





In Unit 9 we considered non-parametric methods to analyse and compare survival curves. These methods are heavily used, but they do have their limitations. This is especially so if we have additional information on the characteristics of each individual which may be affecting its survival. We can take some account of such information using the stratified log-rank test - for example we can consider separate survival curves for each age group. But we cannot look at the effect that several such explanatory factors have on the time an individual survives. For that we need a regression approach much like the multiple regression techniques that we have considered in this unit.

The most common approach (although not perhaps the most obvious) is to base the regression model on the proportional hazards assumption. If we are comparing a new treatment with the standard treatment, it is assumed that the ratio of the hazard for an individual on a new treatment to that for an individual on the standard treatment remains constant over time. This applies even if the magnitude of hazards varies over time. The constant ratio is known as the hazard ratio. This assumption is met surprisingly often in medical research - a new treatment tends to change the overall mortality rate, rather than change the pattern of mortality over time. In this approach any explanatory variable acts multiplicatively on the hazard ratio - not directly on the failure time. In the most popular of these models - Cox's proportional hazards model - no underlying distribution of failure times is assumed. In another model - the Weibull proportional hazards model - the failure times are assumed to follow a theoretical distribution known as the Weibull distribution.

In an alternative group of models, the explanatory variables act multiplicatively directly on the failure time. These are known as the accelerated time failure models, and generally do not assume proportional hazards. Failure times are assumed to follow a particular distribution for which there are a number of candidates including the Weibull and gamma distributions. These models are used in ecological applications where the proportional hazards assumption may not be met. For example the event can be germination of seeds. Various treatments can be applied to seeds to break dormancy. The effect of those treatments may be to change the timing of germination, rather than necessarily the eventual germination rate.


Cox's proportional hazards model

The basic model

The most frequently used regression model for survival analysis is Cox's proportional hazards model. We will first consider the model for the 'two group' situation since it is easier to understand the implications and assumptions of the model. We will then extend the model to the multivariate situation.

If we have two groups, one receiving the standard treatment and the other receiving the new treatment, and the proportional hazards assumption is met, then the ratio of the hazard function in one group to that in the other group at any point of time will be constant:

Algebraically speaking -

h1(t)  =  ψ      or      h1(t)  =  ψh0
  • ψ is the hazard ratio,
  • h1(t) is the hazard function at time t for the group receiving the new treatment,
  • h0(t) is the hazard function at time t for the group receiving the standard treatment - this is also known as the baseline hazard function.

The hazard ratio can be regarded as a measure of relative risk. If the hazard ratio is less than 1, the new treatment is superior. If the hazard ratio is less than 1, then the standard treatment is superior.

Because hazards are always positive, the hazard ratio must also be positive. It is more convenient when modelling to have parameters in the model that can vary in an unlimited way so instead of using the hazard ratio (ψ) we use the natural log of the hazard ratio (β). We then rewrite the model in what is called its general form so that it gives the hazard function for an individual at time t rather than for a group. This is a function of (1) the log hazard ratio, (2) an indicator or dummy variable X which defines which group the individual is in, and (3) the baseline hazard. The equation can then be rearranged to give a linear model for the logarithm of the hazard ratio:

Algebraically speaking -

hi(t)   =  exp(βxi) h0(t)

or in linear form:

loge [hi(t)/ h0(t)]   =   βxi where:

  • hi(t) is the hazard function at time t for the ith individual in the study,
  • β is the log of the hazard ratio,
  • xi is the value of the dummy variable X for the ith individual.
    For the two group situation, this is zero if the individual is taking the standard treatment, and 1 if the individual is taking the new treatment ;
  • h0(t) is the baseline hazard function at time t.

This is known as the proportional hazards model for two groups. Note that, unlike the regression models we have encountered so far, there is no constant on the right hand side of the equation other than the regression coefficient β. This is because, were we to include a constant, it could be cancelled out by dividing the baseline hazard function by exp (constant). What we have to do now is to see how we estimate the unknown regression coefficient, and hence fit our data on survival times to Cox's model.


Fitting the model

The coefficient is estimated using the maximum likelihood method in much the same way as we did with logistic regression. But to do this we need the likelihood function for the proportional hazards model.


{Fig. 1}
Let's consider a hypothetical data set comprising two treatment groups each of five individuals suffering from new variant CJD. One group receives the standard treatment, whilst the other receives a new treatment previously untested in humans. The first individual, a member of the standard treatment group, dies on day 20. At the time just before that death all the individuals in the two groups were at risk or are in the risk set. So what is the probability (p1) that the death observed on day 20 was of the subject who did die at that time?

This obviously depends on how many other individuals are in the risk set and their relative risks of death. If all others had the same risk of death, the probability would be one out of ten, or 0.1. But the risk of death varies depending on which group the individual is in. The risk of death for each of the risk set relative to baseline is equal to exp (βxi). Probability theory tells us that the required probability for the first individual to die (p1) is given by exp ( βx1) divided by the sum of all the exp (βxi) for other members of the risk set.

A similar probability (p2 to p10)can be calculated each time a death occurs with the risk set steadily decreasing in size. These probabilities are then combined by multiplying them together to obtain the likelihood function for the proportional hazards model. Because probabilities tend to be very small, we usually use the log likelihood function which is obtained by adding together the natural logarithms of the probabilities. All that then remains is to determine the value of β which maximises this function - in other words gives the maximum likelihood. This is usually done by software such as R using an iterative procedure such as the Newton-Raphson method. However, with the two group situation it is feasible to obtain the estimate just by substituting a range of probable values into the likelihood function as we have done below. The full calculations for three of the points in the graph below are given in this note:

Algebraically speaking -

L(β)   =   Π exp (βxj)
Σi ∈ Rj exp(βxi)
  • L(β) is the likelihood function for β,
  • β is the regression coefficient,
  • xj is the value of the explanatory variable for the individual that died at time j,
  • Σi ∈ Rj exp(βxi) is the sum of the risks for members of the risk set R at time j.

{Fig. 2}

This gives a hazard ratio of 0.1003. Note especially that the actual times of death are not used in maximising the likelihood function - only the rank order of the deaths in each group determines the risk set. Hence the treatment of time is non-parametric. Not surprisingly we therefore have to consider how to deal with ties. What happens if we have two individuals who die at the same time? The conventional approach (termed the Breslow method) is to consider the several deaths at time t are distinct and occur sequentially. Alternative methods include the Efron method and the exact method (all of which are offered in R).

The data we have used to demonstrate maximising the likelihood function contained no censored points, but censored points are readily dealt with. The product of probabilities is only taken over the individuals for whom death times are recorded. Hence the censored individuals do not contribute to the numerator of the likelihood function. But they are included in the summation over the risk sets at death times that occur before a censored time.


Significance testing

The preferred method to test the null hypothesis that the hazard ratio is equal to one (β = 0) is to use a likelihood ratio test, much like the G-test in Unit 9. We have already determined the log likelihood for our model which incorporates the explanatory variable. We can also determine the log likelihood for the null model - that model for which there are no explanatory variables in the linear component of the model. We have done this by taking β as equal to 0 (exp β = 1) as shown in the blue column in the table in the note. The log of the likelihood ratio is then the difference between these two log likelihoods. As with the G-test, where all frequencies are large, the natural log of L2 (or twice the log of the likelihood ratio) has an approximately chi-square distribution. In practice we work with minus twice the log of the likelihood ratio as the log of the likelihoods are always negative.

Hence the log likelihood ratio statistic is given by -2 log Lnull model - (-2 log Lfull model).

    For our example this is equal to (15.1044 x 2)-(12.0705 x2) = 6.0678.
The number of degrees of freedom is equal to the difference between the number of β-parameters being fitted under the two models, so in this case df = 1. Checking this value for the χ2 distribution we get P = 0.0137.

The analysis in R including estimation of the hazard ratio and significance testing is given below. We conclude that the explanatory variable has a significant effect on the hazard ratio.

Using R
coxph(formula = surv1 ~ coxp1$treat)

  n= 10, number of events= 10

              coef exp(coef) se(coef)     z Pr(>|z|)
coxp1$treats 2.299     9.966    1.112 2.067   0.0387 *
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

             exp(coef) exp(-coef) lower .95 upper .95
coxp1$treats     9.966     0.1003     1.127     88.14

Rsquare= 0.455   (max possible= 0.951 )
Likelihood ratio test= 6.07  on 1 df,   p=0.01377
Wald test            = 4.27  on 1 df,   p=0.03869
Score (logrank) test = 6.21  on 1 df,   p=0.01267

Algebraically speaking

Z  =  Estimate of β
SE (β)
  =  (Estimate of β)2
SE (β)

As you can see from the R output there are two other ways of testing the significance of the hazard ratio. If you assume the coefficient β is approximately normally distributed, we can use a Z-test to determine if the value of β differs significantly from 0. For this you need the standard error of β which is computed from the second order partial derivatives of the log-likelihood function. Squaring the Z value gives the Wald statistic which is approximately distributed as χ2. In our example we get a standard error of 1.112, which gives a Z-value of 2.06748 (P = 0.0387).

You have probably noticed that the two P-values are rather different - 0.0137 compared to 0.0387. This highlights the point that in these tests we have not met the 'large sample' assumptions of the tests. We only have a sample size of ten 'deaths' observed, so at best the tests can only be regarded as very approximate. For large samples (n > 100) the two methods should give similar results where we have only one explanatory variable. However, the log ratio tests are much to be preferred when dealing with several explanatory variables.

Given that we now have the standard error of the coefficient (and assuming a large sample size), we can also attach an approximate 95% confidence interval, assuming that the maximum likelihood estimate follows a normal distribution. This is given by β ±1.96 SE(β) which for our example is -4.4795 to -0.1205. We can convert back to the hazard ratios using exp(β) which gives us 0.011 to 0.886. The Z-test we have given above is equivalent to determining whether or not this confidence interval includes unity.


Model checking

We look now at the matter of checking the assumptions of the model, in particular the assumption of proportional hazards. We saw previously that we could test this by simply plotting the cumulative survival function for each group against time. If the lines were parallel, we could assume proportional hazards.

There is another way of doing this which tells us not only whether the proportional hazards assumption is met, but also, as we shall see later, informs us about the distribution of failure times. Instead of using the cumulative survival function we use the cumulative hazard function (H). If we could divide the total time period into many very short periods, H would be the sum of the time specific hazard functions.

Algebraically speaking -

H(t)   =   -logeS(t) where:

  • H(t) is the cumulative hazard function,
  • S(t) is the cumulative survival function

In practice we estimate H from our data by taking minus the natural log of the cumulative survival function as shown here. We then plot the natural log of H(t) against either time or the natural log of time - a plot sometimes known (confusingly) as a log minus log plot because one is plotting ln(-ln S(t)) on the y axis. These plots for our example above are shown here:

{Fig. 3}

If the two lines or curves are parallel, then the hazards can be considered proportional. The vertical separation of the two lines gives an approximate estimate of the log hazard ratio.


{Fig. 3a}

Another way to test the proportional hazards assumption (more appropriate with multivariate models) is to plot score residuals (also termed scaled Schoenfeld residuals or beta (t) - see below) against (transformed) time. If the proportional hazards assumption is true, beta(t) will be a horizontal line. The printout gives a test for slope=0.

As expected in this case we can readily accept that the line is horizontal, given that we could draw such a line within the dotted lines showing the 95% confidence band.


{Fig. 4}

Let's see what these plots looks like for a data set where the proportional hazards assumption is clearly not met. These data are the emergence times of seedlings subjected to two different treatments to break dormancy. Since the survival plots cross we already suspect that the hazard ratio is not constant over time. the log cumulative hazard plot confirms our suspicions - clearly the lines are not parallel and we cannot assume proportional hazards.

This example can also be used to demonstrate the dangers of premature censoring. If the experiment had been terminated at day 400, we would have concluded that treatment one produced a more rapid emergence rate - a conclusion not supported by events after day 400.

{Fig. 4a}

We reach the same conclusions about the failure to meet the proportional hazards assumption if we look at the time-dependent coefficient beta(t). Clearly the line is not horizontal. The significance test also indicates a highly significant deviation (P = 0.001).


The multivariate model

Up till now we have only considered the situation where there are two groups, with a dummy variable taking the value 0 or 1 to define whether the individual is in the control or treatment group. The full power of the Cox regression model is realised when we extend it to the multivariate situation. Let us first recap on the assumptions and implications of the model for two groups (that is with a single explanatory variable), before we extend it to cover several explanatory variables:

  1. The proportional hazards assumption means that we are assuming that the explanatory variable only changes the chance of failure - not the timing of periods of high hazard.
  2. The explanatory variable acts directly on the baseline hazard function and not on the failure time, and remains constant over time.
  3. We have not assumed any particular form of probability distribution for survival times.

Multivariate Cox regression is most heavily used in medical research, although it also increasingly used in veterinary and ecological studies. One common use in medical research is to adjust the estimator of the treatment effect in a randomised controlled clinical trial. Use of covariates allows one to deal with any confounding problems if there are any imbalances between the covariate and the treatment group . In addition any factor used as a basis for stratification at randomization should be included in the regression model or we would overestimate the variance, and get an overly conservative test. Cox regression is also used for observational studies on the main risk factors affecting survival of patients suffering from a chronic disease.

How then do we extend the Cox regression model to include several explanatory variables? The explanatory variables are assumed to act multiplicatively on the hazard ratio. So if we have several such variables, the log of the hazard ratio for an individual is equal to the sum of the effects of the explanatory variables:

Algebraically speaking -

hi(t)   =  expβjxji}h0(t)

or in linear form:

loge [hi(t)/ h0(t)]   =   Σ βjxji
  • hi(t) is the hazard of death at time (t) for the ith individual out of n individuals in a study,
  • βj is the log hazard ratio for the jth explanatory variable out of p variables in the study relative to the baseline level,
  • xji is the baseline value of the jth explanatory variable for the ith individual,
  • h0 is the baseline hazard function.

The model accommodates factors with multiple levels as well as variates in the same way as the other linear models that we have considered:

  • For multilevel factors we establish n-1 dummy variables to cope with the n levels of the factor. The factor now has n-1 degrees of freedom. One level of the factor is chosen to be 0, in other words the baseline level.
  • Each variate is included with its own β-coefficient. For variates the baseline hazard function is the function for an individual for whom all the variates take the value zero.
  • We can also include interactions between factors (for example age and sex) and mixed terms formed from a factor and a variate.

In order to fit the multivariate model we have to modify the likelihood function to allow for multiple explanatory variables. We will not go into the details of this aspect, but the modification is straight forward, and model fitting can be carried out on most good statistical software packages. Selecting the 'best' model is then carried out much as in other multiple regression techniques.


Time-dependent variables

Up till now we have assumed that all explanatory variables are measured for each individual at the start of the study, and do not change over time. Whilst this is sufficient for variables that remain constant (such as ethnic group), it may not be so for variables that are time dependent. These may be internal variables that relate to a particular individual in the study such as blood pressure, or external variables such as levels of atmospheric pollutants. Such time-dependent variables can also be introduced into the Cox regression model to give what is known as the updated covariates (proportional hazards) model.

It is important to note at this stage that we are still assuming that the coefficients of the regression model remain constant - it is the values taken by the covariates that are changing. However, since the hazard of death at time t is no longer proportional to the baseline hazard, one could challenge whether it should still be described as a proportional hazards model. The hazard ratio can only be given after a specified time interval, rather than as a constant measure of risk.

The basic model is modified such that the hazard for an individual at time t is dependent on the value of the explanatory variable not at the start of the study but at time t:

Algebraically speaking -

hi(t)   =  expβjxji(t)}h0(t) where:

  • xji(t) is the value of the jth explanatory variable for the ith individual at time t,
  • all other variables are as defined above.

The likelihood function is modified in like manner and the model is fitted in the usual way, although computational time can be greatly increased. The time dependent variable values must be known at the time of each death for all members of the risk set. The only problem comes in knowing exactly what value of the variable to use if they are not being measured very frequently and do not vary in a predictable way. For continuous variables linear interpolation can be used to estimate values between recording occasions. For categorical variables the value from the most recent reading is used.


Checking the multivariate model

  • Log cumulative hazard plots
    We can extend the log cumulative hazard procedure that we used for the two group model to the multivariate situation. This is done by comparing between levels for one variable within each level of the other explanatory variables. If you have any continuous variables these must be grouped into a number of categories for this purpose. As you can imagine, this approach rapidly becomes impractical as the number of explanatory variables increases! This method of checking the assumption of proportional hazards also has the problem that it is subjective - how do we assess whether lines are parallel or not? They will clearly not be precisely parallel because of random variation, so how 'parallel' do they have to be?
  • An objective test for time dependent coefficients
    Clearly an objective way of testing the proportional hazards assumption for any particular variable in the Cox regression model would be preferable. For this we utilise the ability of the model to accommodate time-dependent variables. Assuming the variable to be tested is in the form of an indicator variable(s) X1, we create a new variable(s) X2 by multiplying the indicator variable(s) by time. The new variable is then tested in the model as a time-dependent variable. If the coefficient β is significantly greater than 0 then the hazard ratio is increasing with time; if it is less than 0 it is decreasing with time. If it does not differ significantly from 0 we can assume the hazard ratio is constant at eβ.

  • Examination of residuals
    The other main approach to model checking for the proportional hazards model is to examine residuals. Up till now we have viewed residuals as the differences between observed values and those predicted by the regression model. More generally residuals are values calculated for each individual whose behaviour is known when the fitted model is satisfactory. A variety of different residuals are given for Cox's regression model by the various software packages - unfortunately their interpretation is not as straightforward as for the other regression models we have looked at.

    The most widely used, at least until recently, are known as Cox-Snell residuals. This residual is the estimated value of the cumulative hazard function of the ith individual at that individual's survival time. After adjusting values for those subjects whose survival times are censored , plots of the hazard estimates versus expected exponential values with unit mean indicate if the fit of the model to the observed data is satisfactory. The Cox-Snell residuals can be further modified by subtracting one, so they now have a mean of zero, and multiplied by -1 to give what are known as martingale residuals . In large samples these residuals have an expected value of zero. However, they are not symmetrically distributed about zero so they are still difficult to interpret. This can be rectified by in turn converting the martingale residuals to deviance residuals which are symmetrically distributed around zero. Lastly there is another class of residuals called score residuals which differ from the others in that a separate set of residuals is calculated for each explanatory variable in the model.

    The different residuals (most frequently martingale, deviance or score) can be plotted in a variety of ways to investigate the appropriateness of the model for the data. They can be plotted:

    1. against observation number to give index plots - this will reveal observations not well fitted by the model,
    2. against survival time or the rank of survival time - this enables one to check on departures from proportional hazards,
    3. against individual explanatory variables - this is done to assess whether variable needs to be included and whether a transformation is necessary.
  • Explained variation

    As with logistic regression there are several measures of explained variation (loosely equivalent to R2 for linear regression) which have been suggested for the Cox proportional hazards regression model. Schemper & Stare (1996) suggest there is no uniformly superior measure, especially as it depends on whether the analysis is done on uncensored or censored populations. For uncensored populations, the squared rank correlation between survival time and the predictor is one of the recommended choices. With censored populations, Schemper's measure, Vz, should be considered.



We have spent much of this page discussing checking the various assumptions, but we summarize them here briefly to be consistent with other More Information pages.

  1. The mechanisms giving rise to censoring of individuals must not be related to the probability of an event occurring. This assumption is known as non-informative censoring and applies for nearly all forms of survival analysis.
  2. Survival curves for different strata must have hazard functions that are proportional over time. This is the proportional hazards assumption discussed above for a single explanatory variable and for the multivariate model.

  3. There should be a linear relationship between the log hazard and each covariate. This is best checked using residual plots.

    topics :

    Distributions of failure times