Distributions of failure times

On this page: The exponential distribution  The Weibull distribution 

Up to now we have not assumed any particular form of distribution for the survival times. That is why Cox's proportional hazards model is described as semi-parametric, and (in part) why it is so heavily used. However, there are two main reasons why we may prefer to use a fully parametric model - in other words one where we assume an underlying theoretical distribution. Firstly, even if the proportional hazards assumption is met, a fully parametric proportional hazards model is more powerful than the Cox regression model, provided the assumed distribution fits the data. Secondly, the proportional hazards assumption may not be met, in which case we have to use an alternative type of model.

The probability density function, f(t), describes the distribution of survival times directly. However, we can also use the cumulative survival function or the hazard function to assess the goodness of fit between a particular theoretical distribution and the data, since all three functions are mathematically related. The figure below shows all three functions for each of three survival curves covering the range of types that may occur:
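The way the three functions are related can be written out explicitly. The block below gives the standard definitions for a continuous survival time T; the symbols S(t) for the cumulative survival function, h(t) for the hazard function and F(t) for the cumulative distribution function are notation assumed here for illustration.

```latex
\begin{align*}
S(t) &= P(T > t) = 1 - F(t) = \int_t^{\infty} f(u)\,du \\
f(t) &= -\frac{dS(t)}{dt} \\
h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t) \\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right)
\end{align*}
```

Knowing any one of the three functions is therefore enough to recover the other two.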

{Fig. 1}

Cumulative survival function: Human populations follow a type I curve, with the survival function not decreasing markedly until the oldest age groups. The type II curve, with a constant survival rate (and hence a constant mortality rate) for the whole lifetime, is rare. Lastly, the type III curve is characteristic of organisms which have low survivorship in the juvenile stages, such as fish and most insects.

Hazard function: The type I curve has mortality concentrated late in life. The mortality rate is constant for the type II curve, whilst for the type III curve the mortality rate is much higher early in life.

Probability density function: We have a characteristic peak of survival times late in life for the type I curve. Survival times follow an exponential distribution for the type II curve, whilst for the type III curve we have an extreme right skew.

In order to build parametric models for survival times, we need distributions that mimic the range of distributions of the probability density functions shown above. If we keep the hazard function constant, we have already seen above that we get our simplest survival model, the exponential distribution:


The exponential distribution

Algebraically speaking

f(t) = λexp(-λt)
  • f(t) is the probability density function of survival times
  • λ is a positive constant equal to the hazard function
This distribution has been used to model failure times in biological studies when only a portion of the lifespan is of interest. The biological model that would lead to such a distribution is one in which hazards occur in the environment at random (following a Poisson process) and failure occurs the first time such a hazard is encountered. It provides a model for data which follow a type II survival curve.

Only the formula for the probability density function is given above; the corresponding cumulative survival function is S(t) = exp(-λt) and the hazard function is h(t) = λ. To fit the model to a set of data, the parameter λ is estimated from the relationship 1/λ = mean survival time. A graphical representation of the shapes of the three functions when λ is approximately equal to 0.01 is shown below.
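As a minimal sketch of the three exponential functions and of the estimate 1/λ = mean survival time, the Python below may help. The function names and the five failure times are hypothetical and chosen only for illustration. The estimate as written assumes that every failure time is observed, that is, there are no censored observations.

```python
import math

def exp_pdf(t, lam):
    """Probability density f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)

def exp_survival(t, lam):
    """Cumulative survival S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def exp_hazard(t, lam):
    """Hazard h(t) = f(t) / S(t); for the exponential this is constant (= lam)."""
    return exp_pdf(t, lam) / exp_survival(t, lam)

def estimate_lambda(times):
    """Estimate lam from 1/lam = mean survival time (assumes no censoring)."""
    return len(times) / sum(times)

# Hypothetical, made-up failure times in days (illustration only).
times = [12.0, 45.0, 80.0, 150.0, 213.0]
lam_hat = estimate_lambda(times)  # mean is 100 days, so lam_hat is 0.01

for t in (10.0, 100.0, 300.0):
    # The hazard column stays at lam_hat whatever the value of t.
    print(t, exp_pdf(t, lam_hat), exp_survival(t, lam_hat), exp_hazard(t, lam_hat))
```

The hazard is computed as f(t)/S(t) rather than returned directly so that the constancy of the hazard can be seen to follow from the other two functions.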

{Fig. 2}

Since the distribution of survival times is very skewed, the mean survival time is not an appropriate summary measure for the data. The median survival time is more suitable and is given by: t_m = ln(2)/λ. For the plot given above the median survival time works out to 66 days.
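The median formula follows from setting the cumulative survival function exp(-λt) equal to 0.5 and solving for t. A short sketch of the arithmetic is given below; the function names are hypothetical. It also inverts the formula, which suggests that a median of 66 days corresponds to a λ slightly above 0.01.

```python
import math

def exp_median(lam):
    """Median survival time for an exponential distribution: ln(2) / lam."""
    return math.log(2) / lam

def lambda_from_median(median):
    """Invert the median formula to recover lam from a given median."""
    return math.log(2) / median

print(round(exp_median(0.01), 1))        # 69.3
print(round(lambda_from_median(66), 4))  # 0.0105
```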

Although the exponential distribution provides a good starting point for consideration of possible distributions, its assumptions are too restrictive for most studies. Consequently the related Weibull distribution is more commonly used.


The Weibull distribution

In this distribution we now have two parameters, λ and γ, known as the scale and shape parameters respectively. If the shape parameter is equal to 1, the formula below reduces to that of the exponential distribution - in other words the hazard function is constant. If γ < 1 the hazard function declines over time; if γ > 1 it increases over time. Note, however, that changes are monotonic - in other words the hazard function cannot increase initially and then decline again.

Algebraically speaking

f(t) = λγ t^(γ-1) exp(-λ t^γ)

  • f(t) is the probability density function of survival times,
  • λ is the scale parameter,
  • γ is the shape parameter.
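The effect of the shape parameter on the hazard can be checked numerically. The sketch below assumes the parameterisation used on this page, for which the cumulative survival function is S(t) = exp(-λ t^γ) and the hazard function is h(t) = λγ t^(γ-1); other texts and software may define the scale parameter differently. The function names and the value λ = 0.01 are hypothetical choices for illustration.

```python
import math

def weibull_pdf(t, lam, gamma):
    """Density f(t) = lam * gamma * t^(gamma-1) * exp(-lam * t^gamma), for t > 0."""
    return lam * gamma * t ** (gamma - 1) * math.exp(-lam * t ** gamma)

def weibull_survival(t, lam, gamma):
    """Cumulative survival S(t) = exp(-lam * t^gamma)."""
    return math.exp(-lam * t ** gamma)

def weibull_hazard(t, lam, gamma):
    """Hazard h(t) = f(t) / S(t) = lam * gamma * t^(gamma-1), for t > 0."""
    return lam * gamma * t ** (gamma - 1)

lam = 0.01  # assumed scale parameter, for illustration only
for gamma in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, lam, gamma) for t in (1.0, 10.0, 100.0)]
    # gamma < 1: hazards fall; gamma = 1: constant (exponential); gamma > 1: hazards rise
    print(gamma, hazards)
```

Because t^(γ-1) is either always decreasing, constant, or always increasing in t, the hazard can only change in one direction, which is the monotonic behaviour described above.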

{Fig. 3}