InfluentialPoints.com
Biology, images, analysis, design...
"It has long been an axiom of mine that the little things are infinitely the most important" (Sherlock Holmes)

# Serial correlation

## and the Durbin-Watson test

### Serial correlation

One of the assumptions of linear regression is that successive residuals are not correlated over time. A serial correlation coefficient (also termed an autocorrelation coefficient) measures the degree of similarity of observations in a time series separated by different time periods. The Pearson serial correlation coefficient uses the (bivariate) Pearson correlation coefficient on pairs of observations k time periods apart. Used in this way, the equation for the coefficient becomes:

#### Algebraically speaking -

$$r = \frac{\sum_{i=1}^{N-k}(e_i - \bar{e})(e_{i+k} - \bar{e}) \,/\, (N-1)}{\sum_{i=1}^{N}(e_i - \bar{e})^2 \,/\, N}$$
where
• $r$ is Pearson's serial correlation coefficient at lag $k$,
• $e_i$ and $e_{i+k}$ are the residuals at time $i$ and time $i + k$, where $k$ is the specified lag period,
• $\bar{e}$ is the overall mean of the series of residuals,
• $N$ is the number of observations.
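As a minimal sketch, the coefficient can be computed in plain Python. The function name `serial_correlation` is our own, chosen for illustration. For $k = 1$ it follows the formula above exactly. For larger lags it divides the numerator by $N - k$, the number of pairs actually available, which reduces to $N - 1$ when $k = 1$. That generalisation is our assumption, not something stated in the formula.

```python
def serial_correlation(residuals, k=1):
    """Serial (auto)correlation coefficient of a residual series at lag k.

    Numerator: sum of cross-products of deviations from the overall mean for
    the N - k pairs (e_i, e_{i+k}), divided by N - k (= N - 1 when k = 1;
    using N - k for k > 1 is an assumption of this sketch).
    Denominator: sum of squared deviations over all N residuals, divided by N.
    """
    n = len(residuals)
    if not 1 <= k < n:
        raise ValueError("lag k must satisfy 1 <= k < N")
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    # Pair each deviation with the one k periods later
    cross = sum(dev[i] * dev[i + k] for i in range(n - k))
    var = sum(d * d for d in dev)
    return (cross / (n - k)) / (var / n)


print(serial_correlation([1, 2, 3, 4, 5], k=1))  # 0.5
```

A steadily rising series such as `[1, 2, 3, 4, 5]` gives a positive lag-1 coefficient. A series that alternates in sign gives a negative one.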

The presence of autocorrelation at different lag times can be assessed with a plot of the serial correlation coefficient against time lag, known as a correlogram.
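A correlogram is the lag-$k$ coefficient computed for $k = 1, 2, \ldots$ and plotted against $k$. The sketch below draws a crude text version in the terminal rather than a proper chart. The function names and the residual series are hypothetical. As before, the numerator divisor $N - k$ for lags above 1 is our assumption.

```python
def serial_correlation(residuals, k):
    """Lag-k serial correlation (numerator divided by the N - k available pairs)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    cross = sum(dev[i] * dev[i + k] for i in range(n - k))
    return (cross / (n - k)) / (sum(d * d for d in dev) / n)


def correlogram(residuals, max_lag):
    """Return a list of (k, r_k) pairs for lags k = 1..max_lag."""
    if not 1 <= max_lag < len(residuals):
        raise ValueError("max_lag must satisfy 1 <= max_lag < N")
    return [(k, serial_correlation(residuals, k)) for k in range(1, max_lag + 1)]


def print_correlogram(points, width=20):
    """Text correlogram: bars left of '|' are negative, right of '|' positive."""
    for k, r in points:
        bar = "#" * min(width, round(abs(r) * width))
        left = bar.rjust(width) if r < 0 else " " * width
        right = bar.ljust(width) if r >= 0 else " " * width
        print(f"lag {k:>2} {left}|{right} {r:+.3f}")


# Hypothetical residuals that alternate in sign (strong negative lag-1 correlation)
print_correlogram(correlogram([1.0, -0.8, 0.9, -1.1, 0.7, -0.9, 1.2, -1.0], max_lag=4))
```

For residuals that alternate in sign, the bars alternate between the negative and positive sides as the lag increases.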

### The Durbin-Watson test

This is a formal method of testing whether first-order serial correlation is a serious problem in a regression analysis. The test is carried out on the raw residuals generated in the regression analysis.

#### Algebraically speaking -

$$d = \frac{\sum_{i=2}^{N}(e_i - e_{i-1})^2}{\sum_{i=1}^{N} e_i^2}$$
where
• $e_i$ are the ordinary residuals ($Y_i - \hat{Y}_i$) from the regression,
• $e_{i-1}$ are the residuals for each previous observation. To obtain these, tabulate the residuals ($e_i$) by observation number and shift them down one row to form the $e_{i-1}$ column. There is one fewer value of $e_{i-1}$ than of $e_i$, since the first observation has no predecessor, so the numerator sums over $N - 1$ differences while the denominator sums over all $N$ squared residuals.
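The "shift down one row" procedure translates directly into list slicing. The sketch below assumes a simple straight-line regression so that it can produce the ordinary residuals $Y_i - \hat{Y}_i$ itself. Both helper names (`ols_residuals`, `durbin_watson`) and the data are hypothetical. Following the standard definition of $d$, the numerator uses the $N - 1$ successive differences and the denominator uses all $N$ squared residuals.

```python
def ols_residuals(x, y):
    """Ordinary residuals Y_i - Yhat_i from a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(residuals):
    """Durbin-Watson d from residuals listed in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # 'Shift down one row': pair e_i with e_{i-1} for i = 2..N
    lagged = residuals[:-1]   # e_{i-1} column (N - 1 values)
    current = residuals[1:]   # e_i for i = 2..N
    numerator = sum((e - e_prev) ** 2 for e, e_prev in zip(current, lagged))
    denominator = sum(e * e for e in residuals)  # all N residuals
    return numerator / denominator


x = [1, 2, 3, 4]  # hypothetical data
y = [1, 3, 2, 4]
print(durbin_watson(ols_residuals(x, y)))  # approximately 3.4
```

Here the residuals are -0.3, 0.9, -0.9 and 0.3. They flip sign at every step, so d lies well above 2.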

Since d is approximately equal to $2(1 - r)$, where r is the sample lag-1 autocorrelation of the residuals, d lies between 0 and 4, and its expected value is close to 2 when the error terms are uncorrelated. The further d falls below 2, the stronger the evidence for positive first-order serial correlation.

Tabulated values of d for a given N, number of explanatory variables and α give a lower limit ($d_L$) and an upper limit ($d_U$) for the test statistic. If d falls above the upper limit, we do not reject the null hypothesis that there is no significant positive serial correlation. If d falls below the lower limit, we reject the null hypothesis and conclude that there is significant positive serial correlation. If d falls between the lower and upper limits, the result is considered 'inconclusive'.
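The approximation and the three-way decision rule can both be checked numerically. Expanding the numerator of d gives the exact identity $d = 2(1 - r) - (e_1^2 + e_N^2)/\sum e_i^2$, with $r = \sum_{i=2}^{N} e_i e_{i-1} / \sum e_i^2$. The end-correction term shrinks as N grows. This sketch assumes the residuals have mean close to zero, which holds for least-squares residuals from a model that includes an intercept. The function names are hypothetical. The bounds `d_lower` and `d_upper` have to be read from published Durbin-Watson tables. The numbers in the example are placeholders, not real table values.

```python
def durbin_watson(e):
    """d = sum_{i=2..N} (e_i - e_{i-1})^2 / sum_{i=1..N} e_i^2."""
    return sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e))) / sum(v * v for v in e)


def lag1_autocorrelation(e):
    """r = sum_{i=2..N} e_i * e_{i-1} / sum e_i^2 (assumes residual mean ~ 0)."""
    return sum(e[i] * e[i - 1] for i in range(1, len(e))) / sum(v * v for v in e)


def dw_positive_decision(d, d_lower, d_upper):
    """Three-way outcome for H0: no positive first-order serial correlation.

    d_lower and d_upper must come from published Durbin-Watson tables for
    the sample size N, the number of explanatory variables and alpha.
    """
    if d < d_lower:
        return "reject H0: significant positive serial correlation"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


e = [0.5, 0.7, 0.4, -0.2, -0.6, -0.5, 0.1, 0.6]  # hypothetical residuals
d, r = durbin_watson(e), lag1_autocorrelation(e)
print(f"d = {d:.3f}, 2(1 - r) = {2 * (1 - r):.3f}")
# Placeholder bounds for illustration only, NOT real table values:
print(dw_positive_decision(d, d_lower=1.0, d_upper=1.5))
```

With only eight residuals, the gap between d and $2(1 - r)$ is noticeable. That gap is the end-correction term in the identity above.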

The test can also be used to test for significant negative serial correlation by comparing the statistic (4 − d) with the same tabulated limits.
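For negative serial correlation, the same decision rule is applied to 4 − d in place of d. A minimal sketch follows, with hypothetical function names and residuals. As before, the bounds shown are placeholders standing in for values read from Durbin-Watson tables.

```python
def durbin_watson(e):
    """d = sum_{i=2..N} (e_i - e_{i-1})^2 / sum_{i=1..N} e_i^2."""
    return sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e))) / sum(v * v for v in e)


def dw_negative_decision(d, d_lower, d_upper):
    """Test H0: no negative first-order serial correlation by applying the
    tabulated bounds to 4 - d instead of d."""
    d_neg = 4 - d
    if d_neg < d_lower:
        return "reject H0: significant negative serial correlation"
    if d_neg > d_upper:
        return "do not reject H0"
    return "inconclusive"


e = [1.0, -0.9, 1.1, -1.0, 0.8, -1.2]  # hypothetical residuals alternating in sign
d = durbin_watson(e)
print(f"d = {d:.3f}, 4 - d = {4 - d:.3f}")
# Placeholder bounds for illustration only, NOT real table values:
print(dw_negative_decision(d, d_lower=1.0, d_upper=1.5))
```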

The test assumes that all the other assumptions of regression analysis are met, in particular that the errors are normally distributed.