InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

Comparing survival rates

On this page: Mantel-Haenszel survival tests; The proportional hazards assumption
Survival analysis refers to the analysis of data where we have recorded, for a number of individuals, the time period from a defined time origin up to a certain event. That event is often termed a 'failure', and the length of time up to the failure is the survival time. We looked at how to estimate survival rates in Unit 1. For example, we looked at the results of a randomised clinical trial (data are based on, but not identical to, the study of Burri et al.).

Standard errors and confidence intervals

We have already covered the variance of the interval- (time-) specific mortality rates back in Unit 4 when looking at simple proportions. Since the time-specific survival rate (p) is derived from the mortality rate (q) using p = 1 − q, the variance of each of these is pq/n. The standard error of these rates is the square root of pq/n, so you can obtain an approximate 95% confidence interval, using the normal approximation, as the rate plus or minus 1.96 times the standard error. Usually one is more interested in attaching confidence intervals to the cumulative survival probability (S), the hazard function (h) or the probability density function (P). We will look at how to estimate the standard error of the survivorship function here.
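The standard error and normal-approximation interval described above can be sketched as follows (the figures of 230 survivors out of 250 at risk are illustrative values, not taken from the trial data):

```python
import math

def survival_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a time-specific
    survival proportion p estimated from n individuals at risk.
    Variance is pq/n with q = 1 - p; the interval is p +/- z * SE."""
    q = 1.0 - p
    se = math.sqrt(p * q / n)
    return se, (p - z * se, p + z * se)

# illustrative example (assumed values): 230 of 250 surviving an interval
se, (lo, hi) = survival_ci(230 / 250, 250)
```

The same function applies to the mortality rate q, since p and q share the variance pq/n.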
For examples we return to the development of encephalopathy in patients treated for sleeping sickness with melarsoprol.

Remember that the width of these approximate intervals is determined solely by the proportion surviving and the sample size. Proportions close to 1 or 0 will have the smallest standard errors, but the confidence intervals will be unreliable. The rule we adopted previously in Unit 5 was that the normal approximation is only valid if npq is greater than 5. For the early part of the curve this condition will not be met, since S (=p) is very close to 1. Examination of the intervals in the figure will show the upper limit of the first interval to be greater than 1.0, clearly an impossible value. The problem gets much more serious for the lower part of the curve, as the proportion surviving (=p) decreases below 0.3 and few individuals at risk (=n) remain. In the past these problems were sometimes minimised by using a double log transformation. This will always give values within the permissible range, but does little for the accuracy of the intervals when numbers are low. Various score intervals are available, but probably the best approach is to use bootstrap methods.

We can get a rough comparison of survival curves by attaching confidence intervals to each cumulative survival estimate and comparing the two step plots visually. Even with this simple method we can see there is no evidence for any difference between the two survival plots shown above. But we need a more rigorous way to compare survival curves. This is done using the Mantel-Haenszel method for combining results from multiple 2×2 contingency tables.

Mantel-Haenszel survival tests
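A percentile bootstrap for a survival proportion can be sketched as below. This is a deliberately minimal version for a single fixed time point with no censoring (the data are assumed, not from the trial); with censored data one would instead resample subjects and recompute the whole Kaplan-Meier estimate on each replicate:

```python
import random

def bootstrap_ci(outcomes, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the proportion surviving, given a
    list of 1 (survived) / 0 (died) outcomes at one time point.
    Resamples the outcomes with replacement and takes the alpha/2 and
    1 - alpha/2 quantiles of the resampled proportions."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(reps))
    return stats[int((alpha / 2) * reps)], stats[int((1 - alpha / 2) * reps)]

# illustrative data (assumed): 8 survivors among 10 remaining at risk
lo, hi = bootstrap_ci([1] * 8 + [0] * 2)
```

Unlike the normal approximation, the resulting interval can never stray outside 0 to 1.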
The first step is to combine the two separate group life tables (shown to the right) into a single combined table (shown below). The first events are recorded on day 3, when there is one event out of 250 patients for each group. These are entered into the combined table as the first row. Then on day 4 there were two events out of 249 patients for group 1, but no events out of 249 patients for group 2. These results therefore comprise the next row in the combined data table. For day 5 the position is reversed, and there are no events out of 247 patients in group 1 and two events out of 249 patients for group 2. This combined data table is shown below. Each row can then be displayed as a 2×2 contingency table; those containing the data from the first three rows are shown to the right of the combined data table:
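Each row of the combined table yields an observed count, an expected count under the null hypothesis, and a hypergeometric variance, which is what the Mantel-Haenszel procedure sums over intervals. A sketch, illustrated with the day-3 row (one event in each group of 250):

```python
def interval_stats(d1, n1, d2, n2):
    """For one 2x2 table (d events among n at risk in each group),
    return the observed events in group 1, the expected number under
    the null hypothesis of no group difference, and the hypergeometric
    variance used by the Mantel-Haenszel / logrank test."""
    d, n = d1 + d2, n1 + n2
    expected = n1 * d / n
    variance = (n1 * n2 * d * (n - d)) / (n * n * (n - 1))
    return d1, expected, variance

# day 3 from the combined table: one event out of 250 in each group
obs, exp, var = interval_stats(1, 250, 1, 250)
# expected events in group 1 = 250 * 2 / 500 = 1.0
```

Repeating this for each of the thirteen event days and summing the three quantities gives the totals used in the chi-square statistic below.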
We then deal with the series of thirteen 2×2 contingency tables using the Mantel-Haenszel procedure. The Mantel-Haenszel chi-square statistic is obtained by summing the observed numbers of events (14), the expected numbers of events (14.0203) and the variances (6.9572); squaring the difference between the observed and expected totals; and dividing this by the summed variance to give a chi-square value. In this case we get a value of 0.00006 (df = 1), for which P = 0.994. Hence there is no evidence for any difference between these two survival curves, and we cannot reject the null hypothesis.

This test can be extended to other, more complex situations. For example, three or more (k) groups can be compared in exactly the same way, except that the final MH statistic is tested with k − 1 degrees of freedom. Alternatively a stratified test can be used: data can be split into strata defined by the level of some confounding variable such as age, or by different sites in a multicentre study. Before we look at the assumptions of this test we will briefly consider another test which is very similar to the logrank test: the Wilcoxon test.

When discussing confidence intervals, we noted that the normal approximation should not be used when numbers have dropped very low. In addition, of course, the intervals will tend to be much wider; in other words, we have less confidence in our estimate of the true value of S. But the logrank test gives the same weight to each observation irrespective of the value of n. As a result, chance differences when there are few survivors may bias the result of the test. The Wilcoxon test is simply a weighted logrank test, where the contribution each 2×2 table makes to the total is weighted by the total number at risk (N = n1 + n2) at the start of the interval.
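The final step of the calculation can be sketched directly from the totals quoted above:

```python
import math

def mh_chisq(observed, expected, variance):
    """Mantel-Haenszel (logrank) chi-square: square the difference
    between the summed observed and expected events, then divide by
    the summed variance. P-value is for 1 degree of freedom, computed
    via the complementary error function."""
    chisq = (observed - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chisq / 2))
    return chisq, p

# totals from the text: observed 14, expected 14.0203, variance 6.9572
chisq, p = mh_chisq(14, 14.0203, 6.9572)
# chisq is about 0.00006 and p about 0.994, matching the text
```

For the Wilcoxon version, each table's observed count, expected count and variance would simply be multiplied by its weight N before summing.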
One then has to ask which of these tests is the more appropriate for the data being analysed. The logrank statistic weights each event equally. The Wilcoxon statistic places more weight on the early events, and so is less sensitive to events that occur later on. Some argue that the test to be used should be specified in advance, as there is otherwise a risk of bias in choosing the statistic most likely to give a significant result. An alternative view is that the choice of test depends on whether certain assumptions are met. The most important of those assumptions, and one that we shall meet again when we look at parametric approaches to comparing survival curves, is that of proportional hazards. If the proportional hazards assumption is met, the most powerful test is the logrank test; if not, one of the weighted logrank tests is more appropriate.

The proportional hazards assumption

For the logrank test to be valid it is assumed that the relative probability of an event between groups remains constant over time. In other words, if an event is twice as likely to occur in group one as in group two in the first time interval, it should also be twice as likely in all other time intervals. Note that this assumption is identical to the homogeneity or 'no interaction' assumption we made before when we used the Mantel-Haenszel test of association.

We can see below two hypothetical survival curves of numbers against time. Numbers in the 'new treatment' group decline more slowly than those in the 'standard treatment' group, and the curves do not cross. We have set the hazard functions to steadily increase over time, with the hazard function in the 'new treatment' group (h1) set at half that in the standard treatment group (h0). If we plot the natural log of the hazard function against time, we get two parallel curves separated by the log of the ratio of the two hazards. Because the curves are parallel, the proportional hazards assumption is met.
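The parallel-curves property can be verified numerically. The sketch below uses constant (exponential) hazards rather than the increasing hazards of the figure, and the hazard values are assumed for illustration; the point is that when h1 = h0/2 the log cumulative hazard curves differ by exactly log 2 at every time:

```python
import math

# assumed hazard rates: new treatment's hazard is half the standard's
h0, h1 = 0.10, 0.05

def log_cum_hazard(h, t):
    """log(-log S(t)) for a constant-hazard survival curve,
    where S(t) = exp(-h * t), so -log S(t) = h * t."""
    return math.log(h * t)

# under proportional hazards the gap between the two curves is constant
gaps = [log_cum_hazard(h0, t) - log_cum_hazard(h1, t) for t in (1, 5, 10, 20)]
# every gap equals log(h0 / h1) = log 2, so the plots are parallel
```

If the hazards were not proportional, the gap would change with t and the plotted curves would converge, diverge or cross.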
In real life, plots of the hazard function can be very difficult to assess because of random variation in the number of events at any particular point in time. It makes more sense instead to use cumulative functions, which show much less random variation.

The problem with a graphical approach is that it cannot cope with random variation in small samples. It is, for example, possible for the lines to cross even when the assumption is met. Look at our example of encephalopathy events after administration of melarsoprol. There is very little difference between the two survival curves, yet they do cross over. Is this just the result of random variation, or does it mean that the assumptions for our statistical test are not met? One possible way of testing our two survival curves for homogeneity over time would be to use the Mantel-Haenszel test for homogeneity.
