


Negative binomial distribution



The negative binomial distribution, like the normal distribution, arises from a mathematical formula. It is commonly used to describe the distribution of count data, such as the numbers of parasites in blood specimens. Also like the normal distribution, it can be completely defined by just two parameters - its mean (m) and shape parameter (k). However, unlike the normal distribution, the negative binomial does not naturally result from the use of large samples - nor does it arise from a single causal model.

Anscombe (1950) described 5 causal models of the negative binomial, some of which can be interpreted as due to aggregation:

  1. Inverse binomial sampling:
    Let a proportion P of individuals possess a certain character. If observations are selected at random, the number of observations in excess of k that are taken in order to obtain just k individuals with the character has a negative binomial distribution, with exponent k.
    This model is the least relevant biologically, but the one given by most standard texts.

  2. Constant birth-death-immigration rates:
    If birth and death rates are constant, and there is a constant rate of immigration, then the population's size will follow a negative binomial distribution.

  3. Heterogeneous Poisson sampling:
    If the mean of a Poisson distribution (λ) varies randomly from place to place or from time to time, under certain conditions (in particular, when λ follows a gamma distribution) the observed distribution of individuals from a mixture of several such Poisson distributions will follow a negative binomial distribution.

  4. Compounding of Poisson and logarithmic distributions:
    If colonies of individuals are distributed randomly over an area or over time, and the numbers of individuals within a colony follow a logarithmic distribution, then a count of individuals will follow a negative binomial distribution.

  5. True clumping aggregation or contagion:
    If the presence of one individual affects the chance of other individuals occurring within that sampling unit, then this can lead to a negative binomial distribution.
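Two of these causal models are easy to simulate, which makes the point that quite different processes can produce the same distribution. The Python sketch below (all function names are illustrative, not from any library) draws counts by inverse binomial sampling (model 1, taking the proportion with the character as P = k/(k+m)) and by heterogeneous Poisson sampling (model 3, assuming λ is gamma distributed with shape k and mean m). Both samples should have a mean close to m and a variance close to m + m²/k.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson count by Knuth's multiplication method (adequate for moderate lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def inverse_binomial_draw(k, m, rng):
    """Model 1: observations in excess of k needed to find k individuals with
    the character, when each has it with probability P = k/(k+m)."""
    p = k / (k + m)
    found = extra = 0
    while found < k:
        if rng.random() < p:
            found += 1
        else:
            extra += 1
    return extra


def gamma_poisson_draw(k, m, rng):
    """Model 3: a Poisson count whose mean varies between sampling units.
    Assumption: that mean is gamma distributed, shape k and scale m/k (mean m)."""
    lam = rng.gammavariate(k, m / k)
    return poisson_draw(lam, rng)


def mean_and_variance(xs):
    """Sample mean and ordinary (n - 1 divisor) sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


rng = random.Random(2024)
k, m, n = 2, 5.0, 50_000  # model 1 needs a whole-number k
results = {}
for name, draw in [("inverse binomial", inverse_binomial_draw),
                   ("gamma-Poisson", gamma_poisson_draw)]:
    sample = [draw(k, m, rng) for _ in range(n)]
    results[name] = mean_and_variance(sample)
    print(f"{name}: mean {results[name][0]:.2f}, variance {results[name][1]:.2f}")

print(f"theory: mean {m:.2f}, variance {m + m * m / k:.2f}")
```

Because the two samples agree with each other and with theory, nothing in the counts alone reveals which process generated them.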
Subsequent work suggested that an unlimited number of models may give rise to this distribution; Boswell and Patil reviewed 14 potential models, most of which arise from compounded random processes.

To what extent any of these models might plausibly apply to your data is another matter, particularly since it is possible to devise models that yield an aggregated distribution whose observations are not distributed negative binomially. Various two-parameter models give rise to count data. Depending upon how they are set up, a number of these alternative models can have more than one mode, and some can also yield a negative binomial.



Sometimes the occurrence of an event is not independent of other events in the same sampling unit. For example, many organisms are not distributed randomly, or are not sampled randomly, and thus the Poisson distribution does not provide a good description of their pattern of dispersion. The most common pattern of spatial dispersion is aggregated, rather than random or regular. The same may also happen for events occurring through time: one event may 'spark off' other events, resulting in a contagious distribution. The negative binomial distribution is one of several probability distributions that can be used to describe such a pattern of dispersion.

So what does the distribution look like? The distribution is shown below for values of k ranging from 8 to 0.1.

Fig. 1. The negative binomial distribution for values of k from 8 to 0.1.

With k = 8 the distribution is more or less symmetrical, but as k falls below about 8 the distribution becomes increasingly right-skewed, indicating an aggregated distribution. In fact, k is often used as a measure of the degree of clumping or aggregation. It can range from 0 to infinity: the lower the value of k, the higher the degree of contagion. Note that the distribution is always unimodal; the apparent bimodality at lower values of k arises only because classes above 140 are pooled.
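The effect of k on shape can be seen numerically with two closed forms that follow from the negative binomial: the expected proportion of empty sampling units, (k/(k+m))^k, and the variance-to-mean ratio, 1 + m/k. The sketch below holds the mean fixed at m = 10 (an arbitrary assumption; the mean used for the figure is not stated) and steps k from 8 down to 0.1. Both quantities rise as k falls, which is the sense in which a low k indicates strong aggregation.

```python
def proportion_zero(k, m):
    """Expected proportion of empty sampling units: P(0) = (k/(k+m))**k."""
    return (k / (k + m)) ** k


def variance_to_mean(k, m):
    """Variance/mean ratio: (m + m**2/k) / m = 1 + m/k."""
    return 1 + m / k


m = 10.0  # illustrative mean, not taken from the figure
ks = [8, 4, 2, 1, 0.5, 0.1]
zeros = [proportion_zero(k, m) for k in ks]
ratios = [variance_to_mean(k, m) for k in ks]
for k, z, vm in zip(ks, zeros, ratios):
    print(f"k = {k:>4}: P(0) = {z:.3f}, variance/mean = {vm:.1f}")
```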

The negative binomial is the easiest to calculate, and the most widely applicable, of the overdispersion models. Like the Poisson distribution, the negative binomial is discrete, unimodal and skewed. Statistically, its parameters are both simple and flexible.

  • For example, its parametric (population) variance is m + m²/k. Hence, unlike with the Poisson distribution, the variance is always greater than the mean. But if k is sufficiently large, m + m²/k approaches m, and the negative binomial converges to the Poisson distribution.
  • Where k = 1, the negative binomial takes the mathematical form of the 'geometric distribution'. If, however, k is very small and the zero category is ignored, the negative binomial converges to a 'log-series' distribution. Unfortunately, although the log-series is widely employed to provide an index of species richness, there is no plausible causal model for it.
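These limiting cases can be checked numerically. The Python sketch below (function names are illustrative) computes negative binomial probabilities from the mean m and shape k, using `math.lgamma` so that fractional k is allowed. It shows that the largest gap from a Poisson with the same mean shrinks as k grows, that k = 1 reproduces the geometric distribution, and that with a tiny k and the zero class dropped the probabilities approach a log-series. For that last check the ratio m/(k+m) is held fixed at 0.9 while k shrinks, an assumption made here so that the limit is not trivial.

```python
import math


def nb_pmf(r, m, k):
    """Negative binomial probability of a count r, given mean m and shape k."""
    log_p = (math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
             + r * math.log(m / (k + m)) + k * math.log(k / (k + m)))
    return math.exp(log_p)


def poisson_pmf(r, m):
    """Poisson probability of a count r, given mean m."""
    return math.exp(-m + r * math.log(m) - math.lgamma(r + 1))


def geometric_pmf(r, m):
    """Geometric distribution on r = 0, 1, 2, ... with mean m."""
    return (1 / (1 + m)) * (m / (1 + m)) ** r


def log_series_pmf(r, theta):
    """Fisher's log-series probability for r = 1, 2, ... (0 < theta < 1)."""
    return theta ** r / (r * -math.log(1 - theta))


m, counts = 4.0, range(61)

# Poisson limit: the largest gap between the two shrinks as k grows.
poisson_gaps = [max(abs(nb_pmf(r, m, k) - poisson_pmf(r, m)) for r in counts)
                for k in (1, 10, 100, 1000, 10000)]

# k = 1 gives the geometric distribution.
geometric_gap = max(abs(nb_pmf(r, m, 1) - geometric_pmf(r, m)) for r in counts)

# Tiny k with the zero class dropped, keeping m/(k+m) fixed at theta.
theta, tiny_k = 0.9, 1e-6
m_small = tiny_k * theta / (1 - theta)
p_zero = nb_pmf(0, m_small, tiny_k)
log_series_gap = max(
    abs(nb_pmf(r, m_small, tiny_k) / (1 - p_zero) - log_series_pmf(r, theta))
    for r in range(1, 21))

print("NB vs Poisson, k = 1 ... 10000:", [f"{g:.1e}" for g in poisson_gaps])
print("NB(k = 1) vs geometric:", geometric_gap)
print("zero-truncated NB(k = 1e-6) vs log-series:", log_series_gap)
```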

If a set of organisms conforms to the negative binomial model, k approaches zero as the organisms become maximally clumped, and approaches infinity as they become randomly distributed; unfortunately, the converse is not true.


The negative binomial model may be described as being 'versatile, but without carrying too deep a causative commitment'. Very often it is used as a fairly arbitrary, but convenient, approximation to how counts are distributed - and, provided the data have a negative binomial distribution, k is used as a measure of that distribution's shape.

In generalized linear models (which we meet in Unit 14) the negative binomial is sometimes used as a convenient, if somewhat arbitrary, 'error structure'. More problematically, k and 1/k are used as measures of 'aggregation' - and, in some cases, aggregation is defined as such.


Be aware, however, that whilst a particular model produces a certain value of k, a given value of k does not imply that model. Furthermore, there is as yet no unambiguous biological definition of aggregation, nor any agreement on how best to quantify it.

"The phrase 'degree of aggregation' describes a vague, undefined notion that is open to several interpretations... Thus the several existing ways of measuring aggregation are not different ways of measuring the same thing: they measure different things."
Pielou (1977) Mathematical ecology. Wiley, New York.



Since there are so many different models that can give rise to the negative binomial, it is hard to give a simple set of assumptions for its validity. However, we can look at what factors may lead to misleading results:

  1. Which population?

    For random samples, because the negative binomial distribution is unimodal, any additional modes may be ascribed to statistical noise, assuming the discrepancy is not too great. In that case, the observed mean is taken as an estimate of the true mean, and the sample variance, calculated using the ordinary sample variance formula, as an estimate of the parametric variance (albeit usually an underestimate). k can then be treated as a sample statistic, and its standard error (or confidence limits) estimated.

    If the samples were not taken at random, this reasoning does not apply, and inferences should be confined to the actual data at hand. Extending the results to other populations (or 'superpopulations') can be horribly misleading, and so can their standard errors (or confidence limits).

  2. Spatial scale and density of organisms

    Using k as an index of aggregation assumes all your samples have the same sample size, and the same population density.

    • Few indices of aggregation are truly independent of density, and the negative binomial k is certainly not one of them. For entirely mathematical reasons, k is both scale and density dependent, though not linearly. In addition, when the density of organisms relative to the sampling unit drops very low, the distribution will appear to tend towards randomness (that is, a Poisson distribution) irrespective of the true situation; this is sometimes described as an 'integer effect'.

    • Quite aside from sampling artefacts, we would expect sparsely distributed organisms to interact differently from densely packed individuals of the same species. In the limit, extremely dispersed individuals do not interact, which, under the simplest (and least plausible) models, allows them to be distributed at random. In reality this can only occur among populations in a completely homogeneous environment that do not reproduce and are never subject to chance events, such as predation or lightning. At the other extreme, very densely packed organisms often repel one another and adopt a more regular distribution, which the negative binomial cannot model.

    • How organisms are distributed also depends upon the scale at which you sample them. Very few measures of aggregation, dispersion or nonrandomness are truly scale-free, with the (controversial) exception of Taylor's power law. Viewed on a gross enough scale, any organism can be described as aggregated, if for no other reason than that it is restricted to this part of our galaxy! Unless there is a natural sampling unit (for example, each host in parasite studies), it is always wise to sample at several different spatial scales (for example, with several different sizes of quadrat).

  3. Is k a reasonable measure?

    As its name implies, the negative binomial shape parameter, k, describes the shape of a negative binomial distribution. In other words, k is only a reasonable measure to the extent that your data follow a negative binomial distribution. However, tests of observations against this distribution must be viewed with caution: small samples are unlikely to differ significantly from it, whereas very large sets of real data almost certainly will. One way to address this is to compare its fit with that of one or more alternative models.

    Other long-tailed distributions, such as the Neyman type A and Polya-Aeppli, yield the same form of variance-mean relationship, v = m + am² (for the negative binomial, a = 1/k). However, whilst single-cause models may be criticised as overly specific, the negative binomial model is anything but.

  4. Inference

    Because the negative binomial distribution can arise for many different reasons, the likelihood that a particular one is responsible is correspondingly small. Using k as a measure of aggregation assumes this is the most likely model and that 'aggregation' can be unambiguously defined. Defining 'aggregation' as what it is that k measures is a circular argument, albeit a common one.

    Whilst they can be useful descriptively, neither the negative binomial nor its shape parameter can tell you which - if any - biological, statistical or sampling process was responsible for your results being distributed in the way that you observe. A number of ecological studies have shown k is neither constant within a species, nor consistently different between species when compared across a wide range of population densities. Nevertheless, there have been repeated attempts to use k as a parameter for population models, and to support particular ecological and behavioural models. Sadly perhaps, k is not a fundamental biological constant, and does not describe any sort of general ecological or behavioural property - it is merely a rough and ready measure, which assumes an arbitrary model.

    Very often therefore, assuming data are negative binomially distributed is much like assuming they are normal - it provides a convenient approximation which discretely conceals many uncertainties. It is unwise to analyse detailed biological processes which may have produced a distribution your observations approximately fit - unless you propose to test these hypotheses using other observations.
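As a concrete illustration of treating k as a sample statistic and of comparing the fit of alternative models, the sketch below obtains the simple moment estimate of k by solving v = m + m²/k for k, using the ordinary (n − 1 divisor) sample variance, then compares Poisson and negative binomial log-likelihoods. The counts are made up for illustration and the function names are hypothetical. The moment estimate is generally regarded as inefficient when k is small, so maximum likelihood is usually preferred in practice; and because the negative binomial has one more parameter, a raw log-likelihood comparison flatters it, so treat the result as a rough indication rather than a test.

```python
import math


def moment_estimate_k(counts):
    """Moment estimate k = mean**2 / (variance - mean), i.e. the variance
    formula v = m + m**2/k solved for k."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)
    if var <= mean:
        return mean, var, math.inf  # no overdispersion: the Poisson limit
    return mean, var, mean ** 2 / (var - mean)


def nb_log_likelihood(counts, m, k):
    """Log-likelihood of the counts under a negative binomial (mean m, shape k)."""
    return sum(math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
               + r * math.log(m / (k + m)) + k * math.log(k / (k + m))
               for r in counts)


def poisson_log_likelihood(counts, m):
    """Log-likelihood of the counts under a Poisson with mean m."""
    return sum(-m + r * math.log(m) - math.lgamma(r + 1) for r in counts)


counts = [0, 0, 1, 3, 0, 7, 2, 0, 12, 5]  # made-up counts per sampling unit
mean, var, k_hat = moment_estimate_k(counts)
print(f"mean {mean:.2f}, variance {var:.2f}, k (moments) {k_hat:.3f}")
print(f"log-likelihood: Poisson {poisson_log_likelihood(counts, mean):.2f}, "
      f"negative binomial {nb_log_likelihood(counts, mean, k_hat):.2f}")
```

If the sample variance does not exceed the mean, the function returns an infinite k, which corresponds to the Poisson limit.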


Negative binomial population parameters

The mean, variance, skew and kurtosis of a negative binomial population can be calculated as follows:

  • The mean number of failures, m, can also be calculated as k(1 − P)/P, where k is the number of successes and P is the probability of a success.

  • The variance is m(k+m)/k

  • The skew is (1 + m/(k+m)) ÷ √(km/(k+m)), which simplifies to (k + 2m)/√(km(k+m))

  • The kurtosis is 3 + 6/k + k/(m(k+m))
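These formulas can be checked numerically. The sketch below (m = 3 and k = 1.5 are arbitrary illustrative values; function names are hypothetical) builds the probabilities from the recurrence P(0) = (k/(k+m))^k, P(r+1) = P(r) × (k+r)/(r+1) × m/(k+m), which follows from the ratio of successive terms of the mass function, then compares moments found by direct summation with the closed forms. The skew is written in the simplified form (k + 2m)/√(km(k+m)), and the kurtosis is the ordinary (not 'excess') kurtosis.

```python
import math


def nb_probabilities(m, k, n_terms=400):
    """P(0), P(1), ... for mean m and shape k, built with the recurrence
    P(r+1) = P(r) * (k + r)/(r + 1) * m/(k + m); the tail beyond n_terms is ignored."""
    ratio = m / (k + m)
    probs = [(k / (k + m)) ** k]
    for r in range(n_terms - 1):
        probs.append(probs[-1] * (k + r) / (r + 1) * ratio)
    return probs


def summed_moments(probs):
    """Mean, variance, skew and kurtosis obtained by direct summation."""
    mean = sum(r * p for r, p in enumerate(probs))

    def central(power):
        return sum((r - mean) ** power * p for r, p in enumerate(probs))

    var = central(2)
    return mean, var, central(3) / var ** 1.5, central(4) / var ** 2


def formula_moments(m, k):
    """Closed forms for the mean, variance, skew and (non-excess) kurtosis."""
    return (m,
            m * (k + m) / k,
            (k + 2 * m) / math.sqrt(k * m * (k + m)),
            3 + 6 / k + k / (m * (k + m)))


m, k = 3.0, 1.5  # arbitrary illustrative values
summed = summed_moments(nb_probabilities(m, k))
closed = formula_moments(m, k)
for label, s, c in zip(("mean", "variance", "skew", "kurtosis"), summed, closed):
    print(f"{label:>8}: summed {s:.6f}, formula {c:.6f}")
```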

Be aware that not only is the use of the terms 'success' and 'failure' wholly arbitrary, but the parameters are also commonly specified in several other ways:

  • As P and Q, where P = k/(k+m) and Q = m/(k+m)
  • As p and q, where p = Q/P = m/k and q = 1/P = (k+m)/k
  • k may be described as the 'shape', or 'size' parameter.
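A small sketch of how these parametrizations relate, assuming the widely used convention in which p = m/k and q = (k+m)/k = 1 + p, under which the variance equals kpq. The function name is hypothetical and the variable names simply mirror the symbols used here; each check should print True.

```python
import math


def parametrizations(m, k):
    """Convert mean m and shape k into the P, Q and p, q forms."""
    P = k / (k + m)   # probability of a 'success'
    Q = m / (k + m)   # probability of a 'failure'; P + Q = 1
    p = m / k         # equals Q / P
    q = (k + m) / k   # equals 1 / P, so q = 1 + p
    return P, Q, p, q


m, k = 3.0, 1.5  # arbitrary illustrative values
P, Q, p, q = parametrizations(m, k)
variance = m + m * m / k
checks = {
    "P + Q = 1": math.isclose(P + Q, 1.0),
    "mean = kQ/P": math.isclose(k * Q / P, m),
    "mean = kp": math.isclose(k * p, m),
    "variance = kpq": math.isclose(k * p * q, variance),
    "variance = kQ/P**2": math.isclose(k * Q / P ** 2, variance),
    "q = 1 + p": math.isclose(q, 1 + p),
    # the non-gamma part of the mass function is the same in both forms
    "P**k Q**r = p**r q**-(k+r)": all(
        math.isclose(P ** k * Q ** r, p ** r * q ** -(k + r)) for r in range(20)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```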


Mass probability function

The expected probability of obtaining a given value of a count, r, can be expressed as shown below.

Algebraically speaking -

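$$P_r = \frac{\Gamma(k+r)}{r!\,\Gamma(k)} \left(\frac{m}{k}\right)^{r} \left(\frac{k+m}{k}\right)^{-(k+r)} = \frac{\Gamma(k+r)}{r!\,\Gamma(k)} \left(\frac{m}{k+m}\right)^{r} \left(\frac{k}{k+m}\right)^{k}$$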

  • Pr is the probability of getting r individuals in the sampling unit,
  • m is the mean,
  • k is the 'shape' parameter.

  • Γ(k) is the gamma function of k. For a positive integer k, Γ(k) = (k − 1)!; using the gamma function rather than factorials allows the formula to cope with fractional values of k.
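The mass function translates directly into code. This sketch (function names are hypothetical) works on the log scale with `math.lgamma`, which avoids overflow in Γ(k + r) and accepts fractional k. It evaluates both algebraic forms, checks they agree, checks that the probabilities sum to 1 with mean m, and confirms that for a whole-number k the gamma ratio Γ(k+r)/(r! Γ(k)) equals the binomial coefficient C(k + r − 1, r).

```python
import math


def nb_pmf(r, m, k):
    """P_r in the form with m/(k+m) and k/(k+m); k may be fractional."""
    log_coef = math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
    return math.exp(log_coef + r * math.log(m / (k + m)) + k * math.log(k / (k + m)))


def nb_pmf_pq(r, m, k):
    """The same probability written with m/k and (k+m)/k."""
    log_coef = math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
    return math.exp(log_coef + r * math.log(m / k) - (k + r) * math.log((k + m) / k))


m, k = 2.5, 0.7  # illustrative values; fractional k is allowed
probs = [nb_pmf(r, m, k) for r in range(200)]
total = sum(probs)
mean = sum(r * x for r, x in enumerate(probs))
print("P(0..5):", [round(x, 4) for x in probs[:6]])
print("total probability:", total, "mean:", mean)

# With whole-number k the gamma ratio reduces to a binomial coefficient.
binomial_match = all(
    math.isclose(math.exp(math.lgamma(3 + r) - math.lgamma(r + 1) - math.lgamma(3)),
                 math.comb(3 + r - 1, r))
    for r in range(30))
print("integer-k coefficients match C(k+r-1, r):", binomial_match)
```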

Related topics:

Logarithmic Series Distribution

Taylor's Power Law