Worked example 1

Survival times (days) of children born with Down syndrome in relation to occurrence of heart defects (CAVD) and leukemia.

        With CAVD                    Without CAVD
No.     Time    Leuk         No.     Time    Leuk
1       37      N            31      28      N
2       55      N            32      61      N
3       73      Y            33      113     Y
4       110     N            34      135     N
5       146     N            35      146     N
6       164     N            36      153     N
7       219     Y            37      183     N
8       310     N            38      256     N
9       329     N            39      292     N
10      475     N            40      336     N
11      730     N            41      365     N
12      949     N            42      548     N
13      1095    N            43      694     N
14-30   >3650   N            44      803     N
                             45      913     N
                             46      1497    N
                             47      1643    N
                             48-130  >3650   N

Leuk = leukemia (Y = yes, N = no). Times given as >3650 days are censored at the end of the ten-year study.


These are hypothetical data on the ten-year survival of children born with Down syndrome; they are loosely based on a recent study carried out in Ireland. We have focused on two factors known to affect the survival of these children: serious heart defects (CAVD) and leukemia. Note that the researchers also recorded data on various other possible explanatory variables, such as the birth weight of the child and the age of the mother. We have grouped the data according to presence or absence of CAVD. Certain points are apparent from a careful examination of the data:
- Most of the deaths (19 of the 30) occur within the first year of life (< 365 days),
- Although only 23% of the children (30 of 130) had CAVD, they accounted for 43% of the deaths (13 of 30) within the ten-year study period,
- Leukemia occurred only rarely (3 children, 2.3% of the total), but all three died within the first year of life,
- All observations where survival exceeded ten years are censored; this marked the end of the study period.
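The counts behind these points can be checked directly from the table. The Python sketch below transcribes the table (the names `with_cavd`, `without_cavd`, `build_records` and the tuple layout are our own), gives each censored child a time of 3650 days for convenience, and recomputes the proportions quoted above.

```python
# Transcription of the survival table: (time in days, leukemia Y/N) for each death.
with_cavd = [(37, "N"), (55, "N"), (73, "Y"), (110, "N"), (146, "N"), (164, "N"),
             (219, "Y"), (310, "N"), (329, "N"), (475, "N"), (730, "N"), (949, "N"),
             (1095, "N")]
without_cavd = [(28, "N"), (61, "N"), (113, "Y"), (135, "N"), (146, "N"), (153, "N"),
                (183, "N"), (256, "N"), (292, "N"), (336, "N"), (365, "N"), (548, "N"),
                (694, "N"), (803, "N"), (913, "N"), (1497, "N"), (1643, "N")]


def build_records():
    """Return (time, leukemia, cavd, died) tuples for all 130 children.

    Censored children (alive beyond 3650 days) are given time 3650 and died=False.
    """
    records = [(t, leuk == "Y", True, True) for t, leuk in with_cavd]
    records += [(3650, False, True, False)] * 17    # children 14-30
    records += [(t, leuk == "Y", False, True) for t, leuk in without_cavd]
    records += [(3650, False, False, False)] * 83   # children 48-130
    return records


records = build_records()
n = len(records)
deaths = [r for r in records if r[3]]
n_cavd = sum(1 for r in records if r[2])
cavd_deaths = sum(1 for r in deaths if r[2])
leuk = [r for r in records if r[1]]
first_year_deaths = sum(1 for r in deaths if r[0] < 365)

print(f"children: {n}, deaths: {len(deaths)}, deaths before day 365: {first_year_deaths}")
print(f"CAVD: {100 * n_cavd / n:.1f}% of children, {100 * cavd_deaths / len(deaths):.1f}% of deaths")
print(f"leukemia: {len(leuk)} children ({100 * len(leuk) / n:.1f}%), "
      f"all died before day 365: {all(r[3] and r[0] < 365 for r in leuk)}")
```

Run as written, this reports 130 children and 30 deaths (19 before day 365), CAVD in 23.1% of children but 43.3% of deaths, and 3 leukemia cases (2.3%), all of whom died before day 365.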
We fitted the Cox regression model to these data using several software packages. Some give slightly different results (possibly because they use different methods to deal with ties), but here we present only the results from R. CAVD and leukemia were entered as main effects, together with an interaction term between these factors. Keeping the interaction term in the model would imply that having both CAVD and leukemia affects survival disproportionately. In most accounts we have seen the interaction is not tested, but testing it seems rather important: it is at least possible (if not probable) that the combined effect of the two conditions on the log hazard exceeds the sum of their separate effects, that is, that the hazard ratio for having both conditions exceeds the product of the two separate hazard ratios. We first summarize the output from R, giving the fitted parameters for the model along with hazard ratios and P-values obtained from the Wald statistics.

Variable       β        Standard error   Hazard ratio   Wald statistic   P-value
Leukemia       3.570    1.103            35.516         10.478           .001
CAVD           1.048    .392             2.851          7.140            .008
Leuk x CAVD    -1.459   1.312            .233           1.237            .266
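Because the Cox model is additive on the log-hazard scale, the coefficients in this table can be combined to give a hazard ratio for each combination of conditions. The Python sketch below (variable names are illustrative) computes the fitted hazard ratio for a child with both conditions, exp(β_leuk + β_CAVD + β_interaction), relative to a child with neither, and compares it with the product of the two separate hazard ratios, which is the combined effect that would hold if the interaction coefficient were zero. These are point estimates from the rounded coefficients only.

```python
import math

# Coefficients (log hazard ratios) from the interaction model in the table above.
beta_leuk, beta_cavd, beta_inter = 3.570, 1.048, -1.459

# Hazard ratios relative to a child with neither condition.
hr_leuk_only = math.exp(beta_leuk)
hr_cavd_only = math.exp(beta_cavd)
hr_both = math.exp(beta_leuk + beta_cavd + beta_inter)  # log hazards add

# Combined effect if the interaction coefficient were zero: the two ratios multiply.
hr_both_if_no_interaction = hr_leuk_only * hr_cavd_only

print(f"leukemia only: {hr_leuk_only:.2f}, CAVD only: {hr_cavd_only:.2f}")
print(f"both: {hr_both:.2f} vs product of separate ratios: {hr_both_if_no_interaction:.2f}")
```

With these coefficients the negative interaction term pulls the combined hazard ratio (about 23.5) well below the product of the separate ratios (about 101).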


The output from R is given below. Note that the hazard ratio is labelled exp(coef) and the test statistic is given as a z value; squaring it gives the Wald statistic.
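The relationships in the table (hazard ratio = exp(β), z = β/SE, Wald statistic = z², P-value from a chi-square distribution on 1 df) can be sketched in a few lines of Python. The helper `wald_summary` is hypothetical, not part of any package; the P-value uses the identity P(χ²₁ > z²) = P(|Z| > |z|) = erfc(|z|/√2). Because it starts from the rounded β and SE, the results agree with the R output only to about the second or third decimal place.

```python
import math


def wald_summary(beta, se):
    """Return (hazard ratio, z, Wald chi-square, two-sided P) for one Cox coefficient."""
    z = beta / se                          # the value R prints as "z"
    wald = z ** 2                          # Wald statistic, chi-square on 1 df
    p = math.erfc(math.sqrt(wald / 2))     # upper chi-square(1) tail = P(|Z| > |z|)
    return math.exp(beta), z, wald, p


coefs = {"Leukemia": (3.570, 1.103), "CAVD": (1.048, 0.392), "Leuk x CAVD": (-1.459, 1.312)}
for name, (beta, se) in coefs.items():
    hr, z, wald, p = wald_summary(beta, se)
    print(f"{name:12s} HR={hr:7.3f} z={z:6.3f} Wald={wald:6.3f} P={p:.3f}")
```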
The hazard ratios and P-values suggest that whilst CAVD and leukemia are significant risk factors, the interaction between the two factors is not significant. So should we drop the interaction term from the model? In this particular case (as we shall see) that would be the right decision, but Wald tests should not in general be used as an aid to model selection in multivariable analyses. This is because the estimates of the individual regression coefficients are not independent of one another, so the P-value for each coefficient changes depending on which other variables are in the model. Instead, we should use likelihood ratio tests to decide which variables should be included in the model.
This is done with the proviso that only nested models can be compared. One model is said to be nested within another if the latter contains all the variables of the former plus at least one other. So, for our analysis, we can compare the fit of a model containing only CAVD with one containing both CAVD and leukemia, but we cannot directly compare the fit of a model containing only CAVD with one containing only leukemia. As before, the degrees of freedom for the likelihood ratio test are given by the difference in the number of β parameters in the two models. Hence the comparison of a model containing CAVD with one containing both CAVD and leukemia has 1 degree of freedom.
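The nesting rule and the degrees-of-freedom rule can be expressed as a small check on sets of variable names. This is a hypothetical helper, written under the assumption that each variable (including an interaction term) contributes exactly one β parameter, which holds here because both factors are binary.

```python
def is_nested(smaller, larger):
    """True if `smaller` is nested in `larger`: every variable of `smaller`
    is in `larger`, and `larger` has at least one extra variable."""
    return set(smaller) < set(larger)   # proper subset


def lr_degrees_of_freedom(smaller, larger):
    """Degrees of freedom for a likelihood ratio test of two nested models,
    assuming one beta parameter per variable."""
    if not is_nested(smaller, larger):
        raise ValueError("likelihood ratio tests need nested models")
    return len(set(larger)) - len(set(smaller))


print(is_nested({"CAVD"}, {"CAVD", "leukemia"}))             # True
print(lr_degrees_of_freedom({"CAVD"}, {"CAVD", "leukemia"}))  # 1
print(is_nested({"CAVD"}, {"leukemia"}))                      # False
```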
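We can readily obtain the log likelihoods for the different models using R. For each fit R reports two log likelihoods: the first is for the null model (-142.3934), the second for the particular model under test. Multiplying a log likelihood by -2 gives the -2 log L values used for model testing (for the null model, -2 × -142.3934 = 284.787).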
Model testing proceeds as follows:

Model #   Variables in model              -2 log L
1         Null model                      284.787
2         CAVD                            276.362
3         Leukemia                        273.614
4         CAVD, leukemia                  268.105
5         CAVD, leukemia, interaction     267.164

- We first compare model 5 with model 4 to assess whether there is a significant interaction between CAVD and leukemia. The likelihood ratio statistic, $(-2\log L_{\text{reduced}}) - (-2\log L_{\text{full}})$, is 0.89, for which P = 0.345. This is not significant, so we can eliminate the interaction term from the model.
- We then compare model 4 with models 2 and 3. The likelihood ratio statistics are 8.257 (model 2 versus model 4) and 5.509 (model 3 versus model 4), with P-values of 0.004 and 0.019 respectively. Hence adding leukemia to a model containing CAVD improves the fit, as does adding CAVD to a model containing leukemia. Both factors should therefore be in the model, and we accept model 4 as the best-fit model.
- The overall significance of the fitted model is obtained by comparing model 4 with model 1 (the null model). The likelihood ratio statistic (with 2 df) is 16.675, for which P = 0.0002.
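These comparisons can be reproduced from the -2 log L column of the model table. The sketch below (hypothetical names; the chi-square tail probability uses closed forms valid only for 1 and 2 df) computes each likelihood ratio statistic as (-2 log L_reduced) - (-2 log L_full), with df equal to the difference in the number of β parameters.

```python
import math

# -2 log L for each model, from the model table; model 1 is the null model.
minus2logL = {1: 284.787, 2: 276.362, 3: 273.614, 4: 268.105, 5: 267.164}
n_params = {1: 0, 2: 1, 3: 1, 4: 2, 5: 3}   # number of beta parameters per model


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def lr_test(reduced, full):
    """Return (LR statistic, df, P) for nested models `reduced` within `full`."""
    stat = minus2logL[reduced] - minus2logL[full]
    df = n_params[full] - n_params[reduced]
    return stat, df, chi2_sf(stat, df)


for reduced, full in [(4, 5), (2, 4), (3, 4), (1, 4)]:
    stat, df, p = lr_test(reduced, full)
    print(f"model {reduced} vs {full}: LR = {stat:.3f}, df = {df}, P = {p:.4f}")
```

Working from the rounded table entries, the model 4 versus model 5 statistic comes out as 0.941 (P ≈ 0.33) rather than the 0.89 quoted above, and the model 1 versus model 4 statistic as 16.682 rather than 16.675; neither difference alters the conclusions.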
The standard errors and confidence intervals of the hazard ratios for the best-fit model are obtained from the R analysis of that model. Note that the confidence interval for the CAVD hazard ratio is fairly narrow, whereas that for the leukemia hazard ratio is much wider. This is because the estimate for leukemia is based on a very small number of deaths (only three children had leukemia).

Variable    β       Standard error   Hazard ratio   95% CI
Leukemia    2.442   .685             11.493         3.001–44.012
CAVD        0.947   .387             2.579          1.208–5.505
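The intervals in this table are of the usual Wald form, exp(β ± 1.96 × SE), so their width on the log scale is set entirely by the standard error. A minimal Python sketch (the function name is hypothetical) recomputes them from the rounded β and SE; small differences in the third decimal place come from that rounding.

```python
import math


def hazard_ratio_ci(beta, se, z=1.96):
    """Hazard ratio and Wald-type 95% CI: exp(beta), exp(beta - z*se), exp(beta + z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


for name, beta, se in [("Leukemia", 2.442, 0.685), ("CAVD", 0.947, 0.387)]:
    hr, lo, hi = hazard_ratio_ci(beta, se)
    print(f"{name:9s} HR = {hr:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

The leukemia standard error (.685) is almost twice that of CAVD (.387), which is why its interval spans a much wider range of hazard ratios.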


We should then embark on a careful process of checking model diagnostics. First note the rather small value of R-square (0.12) compared with its maximum possible value (0.89). This should warn us that the model explains only a small proportion of the variability, suggesting that important explanatory variables are missing from our model. We next check the proportional hazards assumption.
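The R-square quoted here appears to be the generalised (Cox and Snell) R-square reported by `summary()` for `coxph` fits in (at least older versions of) R's survival package: R² = 1 - exp(-LR/n), with maximum possible value 1 - exp(2ℓ₀/n), where LR is the likelihood ratio statistic for the model, ℓ₀ the null log likelihood and n the number of observations. We are assuming this definition; the Python sketch below shows that it reproduces both reported values from numbers given earlier in this example.

```python
import math

n = 130                            # number of children in the study
loglik_null = -142.3934            # null-model log likelihood reported by R
lr_statistic = 284.787 - 268.105   # -2 log L (null) minus -2 log L (model 4)

# Generalised (Cox and Snell) R-square and its upper bound.
rsquare = 1 - math.exp(-lr_statistic / n)
max_rsquare = 1 - math.exp(2 * loglik_null / n)
print(f"Rsquare = {rsquare:.2f} (max possible = {max_rsquare:.2f})")
# prints: Rsquare = 0.12 (max possible = 0.89)
```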
Plots of beta(t) for leukemia and CAVD against time are shown below:
{Fig. 5}
These suggest that we can safely accept the proportional hazards assumption, a decision reinforced by the P-values of the tests for leukemia and CAVD (0.612 and 0.968 respectively) and by the overall (global) P-value of 0.875.
We leave you to carry out the remaining checks, namely for influential points and for nonlinearity in the relationship between the log hazard and the covariates (see Fox (2002)).