InfluentialPoints.com Biology, images, analysis, design... 


Nonparametric correlation and regression

Spearman rank-order correlation coefficient

Unlike Pearson's correlation coefficient, Spearman's correlation coefficient only requires that each variable be measured on at least an ordinal scale. It also makes no distributional assumptions, so it can be used for measurement variables where the assumption of bivariate normality does not hold. The data may consist of numeric observations to which ranks are applied, or of non-numeric observations that can only be ranked. Where there are ties in either the X or the Y values, each tied observation is assigned the average of the ranks involved.

Computationally, Spearman's correlation coefficient is simply Pearson's correlation coefficient applied to the ranks of the observations. The value of the coefficient can range from −1 (perfect negative correlation) through 0 (complete independence between rankings) to +1 (perfect positive correlation). Since it is a measure of the linearity of the ranked observations, it provides a test of a monotonic trend in the original data. Note, however, that it cannot be used to detect a non-monotonic trend, for example where Y initially increases with X but then decreases at higher values. Hence relationships should always be plotted before calculating the coefficient.

The Spearman correlation coefficient (ρ), for which we use r_{s} for the sample statistic, is given by:
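Because the statistic is Pearson's coefficient applied to the ranks, one standard way to write it is sketched below. The notation is an assumption made here: R(X_i) and R(Y_i) denote the (average) ranks of the i-th pair, n is the number of pairs, and (n + 1)/2 is the mean rank.

```latex
r_{s} = \frac{\sum_{i=1}^{n} R(X_i)\,R(Y_i) - n\left(\frac{n+1}{2}\right)^{2}}
             {\sqrt{\left[\sum_{i=1}^{n} R(X_i)^{2} - n\left(\frac{n+1}{2}\right)^{2}\right]
                    \left[\sum_{i=1}^{n} R(Y_i)^{2} - n\left(\frac{n+1}{2}\right)^{2}\right]}}
```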
If there are no ties, this simplifies to:
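The usual no-ties form is written in terms of the rank differences. Here d_i is assumed to denote the difference between the X rank and the Y rank of the i-th pair, and n the number of pairs.

```latex
r_{s} = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad d_i = R(X_i) - R(Y_i)
```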
Spearman's correlation coefficient is not especially sensitive to ties, and if there are only a few, the simpler formula can be used. However, ties do bias the value of the statistic upwards, so for borderline values it is safer to use the longer formula.

Testing significance

For small samples (n ≤ 10) the significance of r_{s} can be tested using a permutation test, or by comparing the value obtained with published tables (see, for example, Table A10 in Conover). For larger samples (n > 10) we can studentize the statistic by dividing it by its standard error:
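A common form of this studentized statistic is sketched below, assuming the usual standard error for a correlation coefficient. The result is referred to a t-distribution with n − 2 degrees of freedom.

```latex
t = \frac{r_{s}}{\sqrt{\dfrac{1 - r_{s}^{2}}{n - 2}}}
  = r_{s} \sqrt{\frac{n - 2}{1 - r_{s}^{2}}}
```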
Important! It has unfortunately become common practice in some disciplines to calculate a nonparametric correlation coefficient with its associated P-value, but then to plot a least-squares line of best fit through the data. This is very bad practice and is highly misleading. The P-value does not apply to a linear fit of the (untransformed) Y against X, but to a linear fit of rank(Y) against rank(X).

Use as test for trend

The Spearman correlation coefficient can be used as a test for trend. In other words, if one has a set of estimates of (say) the population density of an organism over time, one can assess whether numbers are declining, increasing, or showing no significant change. Measurements are simply paired with the time at which they were taken. The test for trend based on the Spearman correlation coefficient is generally considered more powerful than the Cox and Stuart test for trend.
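To make the calculation concrete, here is a minimal Python sketch of a Spearman test for trend. It computes r_{s} as Pearson's coefficient on average ranks and then the studentized statistic, assuming the usual t approximation with n − 2 degrees of freedom. The function names and the density figures are hypothetical, chosen only for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r_s: Pearson's coefficient applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def studentized(r_s, n):
    """t statistic for r_s (undefined when r_s is exactly +1 or -1)."""
    return r_s * math.sqrt((n - 2) / (1 - r_s ** 2))


# Hypothetical yearly density estimates, each paired with its survey year.
years = list(range(1, 13))
density = [52, 48, 50, 45, 47, 41, 43, 38, 36, 37, 31, 29]

r_s = spearman(years, density)
t = studentized(r_s, len(years))
print(f"r_s = {r_s:.3f}, t = {t:.3f} on {len(years) - 2} df")
```

A negative r_{s} here would point to a declining trend. The t value would then be compared with the critical value for n − 2 degrees of freedom, or a permutation test used instead for small samples.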
Kendall rank-order correlation coefficient

The Kendall rank correlation coefficient is another measure of association between two variables measured on at least an ordinal scale. As with the Spearman rank-order correlation coefficient, the value of the coefficient can range from −1 (perfect negative correlation) through 0 (complete independence between rankings) to +1 (perfect positive correlation), and it likewise provides a test of a monotonic trend in the original data.

The coefficient is computed in a similar way to the Wilcoxon-Mann-Whitney statistic. It is based on the principle that, if the X ranks are arranged in ascending order, the corresponding Y ranks should show an increasing trend if there is a positive association and a decreasing trend if there is a negative association. Starting from the first Y rank, we therefore assess whether the difference between it and each subsequent Y rank is positive (a concordant pair) or negative (a discordant pair). We then do the same for the second Y rank, and so on until all observations are covered.

Kendall's correlation coefficient (τ), for which we use r_{k} for the sample statistic, is then given by:
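In its standard no-ties form the coefficient is the excess of concordant over discordant pairs, divided by the total number of pairs. The symbols N_c and N_d, for the numbers of concordant and discordant pairs, are assumed here, with n the number of observations.

```latex
r_{k} = \frac{N_c - N_d}{n\,(n - 1)/2}
```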
Kendall's coefficient calculated as above can only reach the limits of +1 and −1 if there are no ties. If there are ties in the data, an alternative formulation should be used. The most popular approach is to calculate what is known as the Gamma coefficient, which we will denote by r_{g}. In this situation, if two Y ranks are equal and the two corresponding X ranks are not equal, the pair is counted as ½ concordant and ½ discordant, and the totals are adjusted accordingly. If two X ranks are equal, no comparison is made. The coefficient is then calculated using the following formula:
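Assuming the standard form of the gamma coefficient, this is r_{g} = (N_c − N_d) / (N_c + N_d), where N_c and N_d (symbols assumed here) are the adjusted concordant and discordant totals. The counting rule described above can be sketched in Python as follows. The function names are hypothetical, and the inputs are assumed to be numeric values or ranks.

```python
def concordance_counts(x, y):
    """Return (n_c, n_d) for paired data, using the tie rules described above."""
    n_c = 0.0
    n_d = 0.0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0:
                continue  # tied X values: no comparison is made
            if dy == 0:
                # tied Y values with distinct X: half concordant, half discordant
                n_c += 0.5
                n_d += 0.5
            elif dx * dy > 0:
                n_c += 1  # X and Y move in the same direction
            else:
                n_d += 1  # X and Y move in opposite directions
    return n_c, n_d


def kendall_no_ties(x, y):
    """Kendall's r_k, valid when neither variable contains ties."""
    n = len(x)
    n_c, n_d = concordance_counts(x, y)
    return (n_c - n_d) / (n * (n - 1) / 2)


def gamma_with_ties(x, y):
    """Gamma coefficient r_g (undefined if every pair is tied on X)."""
    n_c, n_d = concordance_counts(x, y)
    return (n_c - n_d) / (n_c + n_d)
```

When there are no ties the two functions agree, because N_c + N_d then equals the total number of pairs, n(n − 1)/2.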
Another way of dealing with ties is to do the following:
Assumptions of rank-order correlation coefficients

Unlike the Pearson product-moment correlation coefficient, no distributional assumptions are made by the rank-order coefficients. However, they do assume the following:

Hence individual observations can be ranked into two ordered series. The coefficients should not be used for U-shaped or hat-shaped relationships between X and Y. Monotonicity can be checked by simple inspection of the XY scatterplot, or by plotting the rank of each Y observation against the rank of each X observation. Some texts claim there is no point in making this plot, but in fact it provides the most sensitive way to assess whether the relationship between X and Y really is monotonic.

Related topics: Sen's estimator of slope; LOWESS regression

