Negative binomial distribution
The negative binomial distribution, like the normal distribution, arises from a mathematical
Anscombe (1950) described 5 causal models of the negative binomial, some of which can be interpreted as due to aggregation:
To what extent any of these models might plausibly apply to your data is another matter - particularly since it is possible to devise models that yield an aggregated distribution, whose observations are not distributed negative binomially. Various two parameter models give rise to count
Sometimes the occurrence of an event is not independent of other events in the same sampling unit. For example many organisms are not distributed randomly, or are not sampled randomly, and thus the Poisson distribution does not provide a good description of their pattern of dispersion. The most common pattern of spatial dispersion is aggregated, rather than random or regular. The same may also happen for events occurring through time - one event may 'spark off' other events resulting in a contagious distribution. The negative binomial distribution is one of several probability distributions that can be used to describe such a pattern of dispersion.
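As a rough illustration of how aggregation departs from the Poisson, the sketch below compares the variance-to-mean ratio of purely random counts with that of clumped counts. It assumes one standard route to a negative binomial, namely Poisson counts whose expected value varies between sampling units according to a gamma distribution. The function names, the default mean of 5, k of 0.5 and the seed are illustrative assumptions, not values taken from the text.

```python
import math
import random
import statistics


def poisson_draw(mean, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def variance_to_mean(counts):
    """Sample variance (n - 1 denominator) divided by the sample mean."""
    return statistics.variance(counts) / statistics.mean(counts)


def simulate(mean=5.0, k=0.5, n_units=5000, seed=1):
    """Return variance-to-mean ratios for random and for clumped counts."""
    rng = random.Random(seed)
    # Random dispersion: every sampling unit has the same expected count.
    random_counts = [poisson_draw(mean, rng) for _ in range(n_units)]
    # Aggregated dispersion (assumed model): the expected count of each unit
    # is itself gamma distributed with shape k and scale mean / k.
    clumped_counts = [
        poisson_draw(rng.gammavariate(k, mean / k), rng) for _ in range(n_units)
    ]
    return variance_to_mean(random_counts), variance_to_mean(clumped_counts)


ratio_random, ratio_clumped = simulate()
print("random:", ratio_random, "clumped:", ratio_clumped)
```

For random counts the ratio should lie close to 1, whereas for the clumped counts it should be well above 1, which is the overdispersion the negative binomial is used to describe.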
So what does the distribution look like? The distribution is shown below for values of k ranging from 8 to 0.1.
With k = 8 the distribution is more or less symmetrical. But for all values of k less than about 8, the distribution is right skewed, indicating an aggregated distribution. In fact the parameter k is often used as a measure of the degree of clumping or aggregation. It can range from 0 to infinity: the lower the value of k, the higher the degree of contagion. Note that the distribution is always unimodal - the only reason for the apparent bimodality at lower values of k is the pooling of classes above 140.
The negative binomial is the easiest to calculate, and the most widely applicable, of the overdispersion models. Like the Poisson distribution, the negative binomial is discrete, unimodal and skewed. Statistically, its parameters are both simple and flexible.
If a set of organisms conforms to the negative binomial model, then k approaches zero when the organisms are maximally clumped, and k approaches infinity when they are randomly distributed. Unfortunately the converse is not true.
The negative binomial model may be described as being 'versatile, but without carrying too deep a causative commitment'. Very often it is used as a fairly arbitrary, but convenient, approximation to how counts are distributed - and, provided the data have a negative binomial distribution, k is used as a measure of that distribution's shape.
In generalized linear models (which we meet in
Be aware however that, whilst a particular model produces a certain value of k, this does not mean the converse is true. Furthermore, there is as yet no unambiguous biological definition of aggregation, nor any agreement on how best to quantify it.
Since there are so many different models that can give rise to the negative binomial, it is hard to give a simple set of assumptions for its validity. However, we can look at what factors may lead to misleading results:
For random samples, because the negative binomial distribution is unimodal, any additional modes may be ascribed to statistical noise, assuming the discrepancy is not too great. In that case the observed mean is taken to be an estimate of the true mean, and the sample variance, calculated using the ordinary sample variance formula, is taken to be an estimate of the parametric variance (albeit this is usually an underestimate). Then k can be treated as a sample statistic, and its standard error (or confidence limits) estimated.
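One simple way to turn a sample mean and variance into a value of k is the moment estimate, k = mean² / (variance - mean). The sketch below assumes that estimator, which is only one of several in use and is not necessarily the one intended here. The function name and the example counts are hypothetical.

```python
import statistics


def moment_estimate_k(counts):
    """Moment estimate of the negative binomial shape parameter k.

    Uses the ordinary sample variance (n - 1 denominator). The estimate is
    only defined when the sample variance exceeds the sample mean.
    """
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)
    if variance <= mean:
        raise ValueError(
            "sample variance does not exceed the mean; "
            "the moment estimate of k is undefined"
        )
    return mean ** 2 / (variance - mean)


# Hypothetical counts per sampling unit, for illustration only.
example_counts = [0, 0, 1, 2, 7]
print(moment_estimate_k(example_counts))
```

The explicit error for samples whose variance does not exceed their mean is a design choice: in that situation the data show no overdispersion, and a negative or infinite k would be hard to interpret.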
If the samples were not taken at random this reasoning does not apply, and the population for which inferences are made should be confined to the actual data at hand - extending its results to other populations (or 'superpopulations') can be horribly misleading - and its standard errors (or confidence limits) equally so.
Using k as an index of aggregation assumes all your samples have the same sample size, and the same population density.
As its name implies, the negative binomial shape parameter, k, describes the shape of a negative binomial distribution. In other words, k is only a reasonable measure to the extent that your data represent a negative binomial distribution. However, testing observations against this distribution must be viewed with caution, because small samples are unlikely to be shown to be significantly different from it, and very large sets of real data are almost certain to be. One way to address this is to compare the fit of one or more of the alternative models.
Other long-tailed distributions, such as the Neyman type A and Polya-Aeppli, yield the relationship
Because the negative binomial distribution can arise for many different reasons, the likelihood that a particular one is responsible is correspondingly small. Using k as a measure of aggregation assumes this is the most likely model and that 'aggregation' can be unambiguously defined. Defining 'aggregation' as what it is that k measures is a circular argument, albeit a common one.
Whilst they can be useful descriptively, neither the negative binomial nor its shape parameter can tell you which - if any - biological, statistical or sampling process was responsible for your results being distributed in the way that you observe. A number of ecological studies have shown k is neither constant within a species, nor consistently different between species when compared across a wide range of population densities. Nevertheless, there have been repeated attempts to use k as a parameter for population models, and to support particular ecological and behavioural models. Sadly perhaps, k is not a fundamental biological constant, and does not describe any sort of general ecological or behavioural property - it is merely a rough and ready measure, which assumes an arbitrary model.
Very often therefore, assuming data are negative binomially distributed is much like assuming they are normal - it provides a convenient approximation which discreetly conceals many uncertainties. It is unwise to analyse detailed biological processes which may have produced a distribution your observations approximately fit - unless you propose to test these hypotheses using other observations.
Negative binomial population parameters
The mean, variance, skew and kurtosis of a negative binomial population can be calculated as follows:
Be aware that not only is the use of the terms 'success' and 'failure' wholly arbitrary, there are several other ways in which these terms are commonly specified.
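As a hedged sketch, the population parameters can be computed from the mean, m, and the shape parameter, k. This assumes the ecological parametrisation in which the 'success' probability is p = k / (k + m) and the count is the number of 'failures'; other sources specify these terms differently. The function name is hypothetical, and the kurtosis returned is the excess kurtosis (zero for a normal distribution).

```python
import math


def nb_population_moments(m, k):
    """Mean, variance, skewness and excess kurtosis of a negative binomial.

    Assumed parametrisation: mean m > 0, shape k > 0, p = k / (k + m).
    """
    p = k / (k + m)
    q = 1.0 - p
    mean = m
    variance = m + m * m / k                     # always exceeds the mean
    skewness = (2.0 - p) / math.sqrt(k * q)      # positive: right skewed
    excess_kurtosis = 6.0 / k + p * p / (k * q)
    return mean, variance, skewness, excess_kurtosis


# Illustrative values only: same mean, decreasing k.
for k in (8.0, 1.0, 0.1):
    print(k, nb_population_moments(5.0, k))
```

With the mean held fixed, lowering k raises the variance and the skewness, consistent with k acting as an inverse measure of aggregation.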
The expected probability of obtaining a given value of a count, r, can be expressed as shown below.
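A minimal sketch of that probability, assuming the distribution is specified by its mean, m, and shape parameter, k, so that P(r) = Γ(k + r) / (r! Γ(k)) × (k / (k + m))^k × (m / (k + m))^r. The function name is hypothetical, and the calculation is done on the log scale to avoid overflow in the gamma functions.

```python
import math


def nb_probability(r, m, k):
    """Probability of observing a count r under a negative binomial.

    Assumed parametrisation: mean m > 0 and shape k > 0; r is a
    non-negative integer count.
    """
    log_prob = (
        math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r + 1)
        + k * math.log(k / (k + m))
        + r * math.log(m / (k + m))
    )
    return math.exp(log_prob)


# Illustrative values only: probabilities of the first few counts.
probabilities = [nb_probability(r, 5.0, 0.5) for r in range(5)]
print(probabilities)
```

Summing these probabilities over all counts should give 1, and summing r × P(r) should recover the mean, which is a quick check on any particular choice of m and k.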