"It has long been an axiom of mine that the little things are infinitely the most important."
Up to now we have not assumed any particular form of distribution for the survival times. That is why Cox's proportional hazards model is described as semi-parametric and, in part, why it is so heavily used. However, there are two main reasons why we may prefer a fully parametric model, in other words one where we assume an underlying theoretical distribution. Firstly, even if the proportional hazards assumption is met, a fully parametric proportional hazards model is more powerful than the Cox regression model, provided that its own distributional assumptions are met. Secondly, the proportional hazards assumption may not be met, in which case we have to use an alternative type of model.
The probability density function, f(t), describes the distribution of survival times directly. However, we can also use the cumulative survival function or the hazard function to assess the goodness of fit between a particular theoretical distribution and the data, since all three functions are mathematically related. We start the figure below with three survival curves (types I, II and III).
Cumulative survival function: Human populations follow a type I curve, with the survival function not decreasing markedly until the oldest ages.
Hazard function: The type I curve has mortality concentrated late in life. The mortality rate is constant for the type II curve, whilst for the type III curve the mortality rate is much higher early in life.
Probability density function: We have a characteristic peak of survival times late in life.
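For reference, the standard relationships that tie the three functions together can be written as follows. Apart from f(t), the symbols are notation assumed here rather than taken from the original figure: F(t) is the cumulative distribution function, S(t) the cumulative survival function and h(t) the hazard function.

```latex
\begin{align}
S(t) &= 1 - F(t) = \int_t^{\infty} f(u)\,du \\
h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t) \\
S(t) &= \exp\!\left(-\int_0^{t} h(u)\,du\right)
\end{align}
```

Knowing any one of the three functions is therefore enough to recover the other two, which is why any of them can be used to judge how well a theoretical distribution fits.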
In order to build parametric models for survival times, we need distributions that can mimic the range of shapes of the probability density functions shown above. If we keep the hazard function constant we get our simplest survival model, the exponential distribution, as we have already seen.
The exponential distribution
Since the distribution of survival times is very skewed, the mean survival time is not an appropriate summary measure for the data. The median survival time is more suitable and is given by tm = (ln 2)/λ, where λ is the constant hazard rate. For the plot given above the median survival time works out to 66 days.
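The median formula can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the rate is not read from the original plot but back-calculated from the stated median of 66 days, on the assumption that the plotted curve is exactly exponential.

```python
import math

def exp_survival(t, lam):
    """Cumulative survival function S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def exp_density(t, lam):
    """Probability density function f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)

def exp_median(lam):
    """Median survival time: the t at which S(t) = 0.5."""
    return math.log(2) / lam

def exp_mean(lam):
    """Mean survival time of the exponential distribution."""
    return 1.0 / lam

# Assumption: rate implied by a median of 66 days (lam = ln 2 / 66).
lam = math.log(2) / 66

print(round(lam, 4))                         # about 0.0105 per day
print(round(exp_median(lam), 1))             # 66.0 days
print(round(exp_mean(lam), 1))               # about 95.2 days
print(round(exp_survival(exp_median(lam), lam), 3))  # 0.5
```

Under this assumption the mean (about 95 days) sits well above the median (66 days), which illustrates why the mean is a poor summary for such a skewed distribution.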
Although the exponential distribution provides a good starting point for consideration of possible distributions, its assumptions are too restrictive for most studies. Consequently the related Weibull distribution is more commonly used.
The Weibull distribution
In this distribution we now have two parameters, λ and γ, known as the scale and shape parameters respectively. If the shape parameter is equal to 1, the formula reduces to the exponential distribution; in other words, the hazard function is constant. If γ < 1 the hazard function declines over time, and if γ > 1 it increases over time. Note, however, that the changes are monotonic: the hazard function cannot increase initially and then decline again.
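The effect of the shape parameter can be illustrated with a short sketch. It assumes one common parameterisation of the Weibull distribution, h(t) = λγt^(γ-1) with S(t) = exp(-λt^γ), which may differ from the formula used in the original figure. The function names and the value λ = 0.01 are hypothetical, chosen only for illustration.

```python
import math

def weibull_hazard(t, lam, gamma):
    """Hazard function h(t) = lam * gamma * t**(gamma - 1) (assumed form)."""
    return lam * gamma * t ** (gamma - 1)

def weibull_survival(t, lam, gamma):
    """Cumulative survival function S(t) = exp(-lam * t**gamma)."""
    return math.exp(-lam * t ** gamma)

def weibull_median(lam, gamma):
    """Median survival time, obtained by solving S(t) = 0.5 for t."""
    return (math.log(2) / lam) ** (1.0 / gamma)

def hazard_trend(lam, gamma, times=(1.0, 10.0, 100.0)):
    """Classify the hazard as constant, decreasing or increasing over `times`."""
    h = [weibull_hazard(t, lam, gamma) for t in times]
    if all(math.isclose(a, b) for a, b in zip(h, h[1:])):
        return "constant"
    if all(a > b for a, b in zip(h, h[1:])):
        return "decreasing"
    if all(a < b for a, b in zip(h, h[1:])):
        return "increasing"
    return "non-monotonic"

lam = 0.01  # hypothetical scale parameter
for gamma in (0.5, 1.0, 2.0):
    print(gamma, hazard_trend(lam, gamma))
```

With γ = 1 the hazard equals λ at every time point, so the model collapses to the exponential case and the median reduces to (ln 2)/λ.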