InfluentialPoints.com Biology, images, analysis, design... 

"It has long been an axiom of mine that the little things are infinitely the most important" 

The log series distribution
Species richness & rarefaction
Species heterogeneity measures
The log series coefficient, α
The discrete lognormal model
Which is best?

"A great deal of time and expertise has been expended on the compilation of faunal lists for particular habitats, but the consequent increase in our understanding ... is still meagre." 
The oldest and simplest measure of diversity is species richness: the total number of species (S) within a given community. One problem with this measure is that it is rarely possible to observe every species within a community, in other words to conduct a complete census. This can only be done if you examine a very small area.
In practice, the longer you search, or the bigger the area you examine, the more species you can expect to observe. You are likely to observe the most common species fairly quickly, but in time the cumulative number of species discovered will level off, although you can never be quite certain you have found them all.
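A minimal sketch of this levelling-off, assuming every individual is equally detectable and that individuals are found one at a time in random order. The function `accumulation_curve` and the abundances are hypothetical, for illustration only.

```python
import random


def accumulation_curve(abundances, seed=0):
    """Cumulative number of species seen as individuals are found one by one.

    abundances[i] is the (hypothetical) number of individuals of species i.
    Assumes every individual is equally detectable and found in random order.
    """
    # One list entry per individual, labelled with its species index.
    individuals = [sp for sp, count in enumerate(abundances) for _ in range(count)]
    rng = random.Random(seed)
    rng.shuffle(individuals)

    seen = set()
    curve = []
    for sp in individuals:
        seen.add(sp)
        curve.append(len(seen))  # species observed after this much effort
    return curve


# Hypothetical community: two common species and several rare ones.
curve = accumulation_curve([120, 60, 10, 5, 2, 1, 1])
print(curve[9], curve[49], curve[-1])
```

Printing the curve at a few levels of effort typically shows the common species appearing almost at once, while the last singleton may only turn up near the end of the search.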
One very practical implication of this arises when we try to compare survey results: for instance, where one survey finds 231 individuals and 20 species, and another survey finds 157 individuals and 12 species. A rarefaction model may be used to estimate how many species the first survey would be expected to find if 157 individuals were randomly selected from the first survey's sample.
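One widely used form of rarefaction is Hurlbert's expected-species formula, sketched here as an illustration rather than as the exact method this page relies on. For a sample of N individuals in which species i has N_i individuals, the expected number of species in a random subsample of m individuals is the sum over species of 1 - C(N - N_i, m)/C(N, m). The function name `rarefy` and the species counts below are hypothetical; only the totals (231 individuals, 20 species) come from the example in the text.

```python
from math import comb


def rarefy(abundances, m):
    """Expected number of species in a random subsample of m individuals.

    abundances: individuals per species in the full sample (N = their sum).
    Uses E[S_m] = sum_i [1 - C(N - N_i, m) / C(N, m)], i.e. one minus the
    probability that species i is missing from the subsample.
    """
    n_total = sum(abundances)
    if not 0 <= m <= n_total:
        raise ValueError("m must lie between 0 and the total number of individuals")
    denom = comb(n_total, m)
    # comb(a, m) is 0 when m > a, so such a species is certain to be present.
    return sum(1 - comb(n_total - n_i, m) / denom for n_i in abundances)


# Hypothetical counts for the first survey: 20 species, 231 individuals.
first = [79, 40, 30, 20, 15, 10, 8, 6, 5, 4, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1]
print(round(rarefy(first, 157), 2))
```

Comparing `rarefy(first, 157)` with the 12 species of the second survey is then a comparison at equal sample size, subject to the model's assumptions about detectability.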
An important weakness of the rarefaction model is that it assumes that, once you have allowed for their relative abundances, every individual of each species is equally likely to be detected, and that your effort in finding them remains constant. Aside from the human failings inherent in such surveys, species that are aggregated, seasonal, man-shy, trap-shy, nocturnal or camouflaged will unavoidably bias this measure, and make the variability of your estimate differ from that predicted by the rarefaction model.
These points aside, an additional constraint in estimating the number of species in a community arises because virtually no community is wholly free of immigrants and visitors. This renders attempts to enumerate every species rather futile. The alternative to spending the rest of your life attempting to identify every species is to perform a standardized but thorough survey, and proceed from there.
Instead of merely attempting to estimate how many species a community contains, various measures have been devised which allow for both the number of species and the relative abundance of each species. For large animals the most common measure of abundance is the number of individuals observed of each species but, for plants, ground cover or biomass may be a better measure.
A surprising number of species heterogeneity indices have been developed, of which the simplest is the Berger-Parker dominance index: the proportion of all the organisms recorded that belong to the most common species. However, for good or ill, this index is insensitive to the number of less abundant species in a community.
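The Berger-Parker index is a single division. This sketch (function name hypothetical, counts invented for illustration) also shows how little the index moves when a few rare species are added.

```python
def berger_parker(abundances):
    """Berger-Parker dominance: proportion of all individuals in the commonest species."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("need at least one individual")
    return max(abundances) / total


# Adding three rare species barely changes the index.
print(berger_parker([50, 30, 20]))
print(berger_parker([50, 30, 20, 1, 1, 1]))
```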
Strangely enough, the ratio between the total number of individuals in the community ($N$) and the number of individuals in the rarest species ($N_{\min}$) is often discussed (as
The commonly used indices of species heterogeneity can be divided into two classes, 'parametric' indices which assume species obey some model resulting in a predefined pattern of abundance, and 'nonparametric' indices, which do not. Since the nonparametric indices are easier to describe let us begin there.
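If a sample of n individuals, containing $n_i$ individuals of species i, represents a very much larger population, then the approximate probability of observing an individual of species i is $n_i/n$, and the probability of obtaining two individuals from that same species is $(n_i/n)^2$. Summed over all species, this gives Simpson's index, $D = \sum_i (n_i/n)^2$, the probability that two individuals drawn at random belong to the same species.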
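A minimal sketch, assuming D is Simpson's index, the sum of $(n_i/n)^2$ over species. The second function is the commonly quoted variant for drawing without replacement from the sample itself, included only for comparison; both function names are hypothetical.

```python
def simpson_d(abundances):
    """Simpson's D: sum of (n_i / n) ** 2, the approximate probability that two
    individuals drawn at random (with replacement) belong to the same species."""
    n = sum(abundances)
    return sum((n_i / n) ** 2 for n_i in abundances)


def simpson_d_finite(abundances):
    """Variant for drawing without replacement from the sample itself:
    sum of n_i * (n_i - 1) divided by n * (n - 1)."""
    n = sum(abundances)
    return sum(n_i * (n_i - 1) for n_i in abundances) / (n * (n - 1))


counts = [5, 5]  # hypothetical: two equally common species
print(simpson_d(counts), simpson_d_finite(counts))
```

The two versions converge as n grows; for small samples the finite form is noticeably lower.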
In reality such samples are anything but random or unbiased, so D or
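In theory H' can be very large, but for biological communities it seldom exceeds 5. For a single-species sample the smallest H' is zero, but if there are S species in a sample of n individuals, H' cannot be less than the value obtained when every species but one is represented by a single individual; using natural logarithms, $H'_{\min} = -\frac{n-S+1}{n}\ln\frac{n-S+1}{n} + \frac{S-1}{n}\ln n$.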
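A sketch of H' and its bounds, assuming the usual Shannon definition $H' = -\sum_i p_i \ln p_i$ with $p_i = n_i/n$ and natural logarithms (the log base is an assumption here). Function names are hypothetical.

```python
import math


def shannon(abundances):
    """Shannon index H' = -sum p_i ln p_i, with p_i = n_i / n (natural logs)."""
    n = sum(abundances)
    return -sum((c / n) * math.log(c / n) for c in abundances if c > 0)


def shannon_bounds(n, s):
    """Smallest and largest possible H' for s species among n individuals (n >= s >= 1).

    Minimum: one species holds n - s + 1 individuals, every other species one.
    Maximum: ln(s), reached exactly only when all species are equally abundant.
    """
    big = (n - s + 1) / n
    h_min = -big * math.log(big) + (s - 1) / n * math.log(n)
    return h_min, math.log(s)


print(shannon([8, 1, 1]), shannon_bounds(10, 3))
```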
Once again this index assumes the survey is a random and unbiased sample of the community. In the real world this is not the case, which biases H' (usually downwards), and its variability has to be estimated by simulation. One way around this is to confine your estimates to the individuals you have actually sampled, using Brillouin's index, $H = \frac{\ln n! - \sum_i \ln n_i!}{n}$.
For field data where n is large, H and H' yield similar results, although H cannot be calculated where $n_i$ is the area covered, or the biomass, of species i, because its factorials require whole-number counts.
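A minimal comparison of Brillouin's H with Shannon's H', assuming natural logarithms. `math.lgamma(k + 1)` gives ln k!, so large counts do not overflow. Function names and counts are hypothetical.

```python
import math


def brillouin(abundances):
    """Brillouin's H = (ln n! - sum ln n_i!) / n, for whole-number counts only."""
    if any(c < 0 or c != int(c) for c in abundances):
        raise ValueError("Brillouin's index needs non-negative whole-number counts")
    n = sum(abundances)
    # lgamma(k + 1) == ln(k!), avoiding huge factorials.
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in abundances)) / n


def shannon(abundances):
    """Shannon's H' with natural logarithms, for comparison."""
    n = sum(abundances)
    return -sum((c / n) * math.log(c / n) for c in abundances if c > 0)


for counts in ([10, 5, 3], [1000, 600, 300]):
    print(counts, round(brillouin(counts), 4), round(shannon(counts), 4))
```

For the same counts H is always a little below H', and the gap shrinks as n grows.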
Underlying the parametric indices, three models of species abundance have received particular attention.
Until quite recently the log series model was the most popular and, since it is closely related to the negative binomial, let us now consider it in more detail.
Many surveys of natural biological communities observe most individuals belong to just a few species, and most species are represented by very few individuals. In other words, the distribution of species' abundance appears to be strongly skewed.
Fisher et al. (1943) noted that species abundance data such as these could be approximated by a negative binomial with a k-value approaching zero, from which the zero class was omitted. The zero class corresponded to all the species in the community that had not been recorded. Fisher justified this model on two grounds:
Under the negative binomial model, m is the mean number of individuals observed per species in that community, and large values of k indicate similar numbers of each species, that is, a homogeneous abundance of species in that community. Where k approaches zero, because we are not interested in species absent from our sample, the zero-truncated negative binomial can be reduced to its mathematical equivalent, a log series distribution.
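A numerical check of this reduction, under one assumed parameterisation: writing x = m/(m + k) and holding x fixed while k shrinks (so m shrinks too), the zero-truncated negative binomial converges on Fisher's log series with parameter x. Function names and the value x = 0.9 are hypothetical.

```python
import math


def zero_truncated_negbin(r, k, x):
    """P(r individuals | r >= 1) under a negative binomial with exponent k,
    parameterised by x = m / (m + k), where m is the untruncated mean."""
    log_p = (math.lgamma(k + r) - math.lgamma(k) - math.lgamma(r + 1)
             + r * math.log(x) + k * math.log(1 - x))
    p_not_zero = -math.expm1(k * math.log(1 - x))  # 1 - (1 - x) ** k, computed stably
    return math.exp(log_p) / p_not_zero


def log_series(r, x):
    """Fisher's log series: P(r) = x ** r / (r * -ln(1 - x)), r = 1, 2, ..."""
    return x ** r / (r * -math.log(1 - x))


x = 0.9  # hypothetical log series parameter
for k in (1.0, 0.1, 1e-6):
    gap = max(abs(zero_truncated_negbin(r, k, x) - log_series(r, x)) for r in range(1, 30))
    print(f"k = {k:g}: largest difference from the log series = {gap:.2e}")
```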
The parameters of this model can be expressed in various ways.
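In Fisher's standard formulation, writing s for the number of species and n for the number of individuals, the expected number of species represented by r individuals, and the two totals, are:

```latex
S_r = \frac{\alpha x^{r}}{r}, \qquad r = 1, 2, 3, \ldots

s = \sum_{r \ge 1} S_r = -\alpha \ln(1 - x) = \alpha \ln\!\left(1 + \frac{n}{\alpha}\right),
\qquad
n = \sum_{r \ge 1} r S_r = \frac{\alpha x}{1 - x}
```

Dividing the second total by the first eliminates α, giving n/s = x / [-(1 - x) ln(1 - x)], which depends on x alone.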
Provided the log series model is reasonable, x depends only upon the overall number of individuals per species, not upon the number of species in the community, whereas α is a function of both the number of species (s) and the number of individuals per species (n/s), and is therefore used as an index of species diversity. For most surveys x is extremely close to one. So for species abundance data, such as the carabids, the distribution can be fitted by looking up n/s in tables, then using it as an initial estimate of
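Rather than looking n/s up in tables, α can be found numerically. This sketch assumes the standard relationships s = α ln(1 + n/α) and x = n/(n + α), with s species and n individuals, and solves the first by bisection. The function name and survey totals are hypothetical.

```python
import math


def fit_log_series(s, n, tol=1e-12):
    """Solve s = alpha * ln(1 + n / alpha) for Fisher's alpha by bisection,
    then return (alpha, x) with x = n / (n + alpha). Requires 0 < s < n."""
    if not 0 < s < n:
        raise ValueError("need 0 < s < n")

    def f(alpha):
        return alpha * math.log1p(n / alpha) - s  # increases with alpha

    lo, hi = 1e-9, 1.0
    while f(hi) < 0:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    alpha = (lo + hi) / 2
    return alpha, n / (n + alpha)


alpha, x = fit_log_series(s=40, n=1000)  # hypothetical survey totals
print(round(alpha, 3), round(x, 5))
```

Because x depends only on n/s, doubling both s and n leaves x unchanged while doubling α.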
Because of its form, the log series distribution always predicts that species represented by a single individual form the largest abundance class. In reality, since sexually reproducing species seldom survive as single individuals, log series-type distributions may simply result from sampling a minute portion of the biological population. Moreover, even if we could estimate the total number of organisms in a community (N), it seems unlikely this model would provide a useful estimate of how many species it contains.
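With hypothetical values of α and x, the expected abundance classes S_r = αx^r/r can be listed directly. Since each class is x(r - 1)/r times the one before, singletons are always the largest class, although when x is close to one they make up well under half of all species. Function names and parameter values are hypothetical.

```python
import math


def log_series_classes(alpha, x, r_max):
    """Expected number of species with r = 1..r_max individuals: alpha * x**r / r."""
    return [alpha * x ** r / r for r in range(1, r_max + 1)]


def singleton_share(x):
    """Fraction of all species expected to be singletons: x / -ln(1 - x)."""
    return x / -math.log(1 - x)


classes = log_series_classes(alpha=10.0, x=0.99, r_max=5)  # hypothetical parameters
print([round(c, 2) for c in classes])
print(round(singleton_share(0.99), 3))
```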
A number of studies have shown that, where there are sufficient data, surveys can yield a skewed but two-tailed species abundance distribution, although the tail comprising the least abundant species is often truncated because they are too few to penetrate the 'veil' of our inefficient sampling.
This has led to suggestions that the 'natural' distribution of species abundance is lognormal. The simplest statistical model of this assumes that the counts of any one species are Poisson distributed, whereas the species' means are lognormally distributed. Since the number of individuals is discrete, the Poisson-lognormal model is sometimes known as the discrete lognormal, although measures of abundance such as biomass, or area covered, are continuous.
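A sketch of the Poisson-lognormal probability that a species contributes r individuals to a sample, assuming ln(mean count) is normal with mean `mu` and standard deviation `sigma`, and integrating numerically with the midpoint rule over mu plus or minus 8 sigma (a choice made here for illustration). P(0) is the expected fraction of species that go unrecorded. The function name and parameter values are hypothetical.

```python
import math


def poisson_lognormal_pmf(r, mu, sigma, steps=1000):
    """P(a species contributes r individuals to the sample).

    Assumes ln(lambda) ~ Normal(mu, sigma) across species and, given lambda,
    a Poisson(lambda) count. The integral over z = ln(lambda) is approximated
    with the midpoint rule on [mu - 8 sigma, mu + 8 sigma].
    """
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    dz = (hi - lo) / steps
    log_norm_const = -math.log(sigma * math.sqrt(2 * math.pi))
    total = 0.0
    for j in range(steps):
        z = lo + (j + 0.5) * dz
        log_poisson = r * z - math.exp(z) - math.lgamma(r + 1)  # ln Poisson(r; e^z)
        log_normal = log_norm_const - 0.5 * ((z - mu) / sigma) ** 2
        total += math.exp(log_poisson + log_normal) * dz
    return total


mu, sigma = 1.0, 2.0  # hypothetical parameters for ln(mean count)
p_unseen = poisson_lognormal_pmf(0, mu, sigma)
print(f"expected fraction of species absent from the sample: {p_unseen:.3f}")
```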
Like the log series model, the lognormal has two parameters: its location and its dispersion.
Provided the distribution's mode is sufficiently distinct, and assuming the distribution is symmetrical, the missing (zero-truncated) tail can be estimated, enabling you to calculate the total number of species in that community. But this is probably not a very reliable estimate unless your sample contains more than 1000 individuals and has observed at least 80% of the community's species. Even where these assumptions are met, and a clear mode is visible, fitting a Poisson-lognormal to a truncated distribution is not easy, so the ordinary lognormal is often used as an approximation.
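A sketch of the simpler approach using the ordinary (continuous) lognormal. It assumes that `mu` and `sigma`, the mean and standard deviation of ln(abundance), already describe the full, untruncated distribution, and that species expected to have fewer than `veil` individuals go unseen. Both assumptions are made here for illustration; estimating mu and sigma from truncated data is the difficult step described above and is not attempted. The function name and values are hypothetical.

```python
import math
from statistics import NormalDist


def total_species(s_observed, mu, sigma, veil=1.0):
    """Estimate the total number of species behind and in front of the 'veil'.

    mu, sigma: mean and SD of ln(abundance) for the whole community (assumed known).
    veil: hypothetical abundance below which species are assumed unrecorded.
    """
    hidden_fraction = NormalDist(mu, sigma).cdf(math.log(veil))
    return s_observed / (1.0 - hidden_fraction)


print(round(total_species(120, mu=2.0, sigma=1.5)))  # hypothetical values
```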
A number of studies show that sampling a larger proportion of a community alters the lognormal distribution's location, but not its dispersion. However, because many studies find a very similar dispersion, this parameter does not provide a useful index of species diversity. For these 'canonical' distributions, the geometric mean number of individuals per species is used instead as a measure of species heterogeneity.
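The geometric mean is the exponential of the mean log abundance. A minimal sketch (function name hypothetical) returning it together with the standard deviation of ln(abundance), the spread parameter. Only recorded species can be included, so both values describe the observed part of the distribution.

```python
import math
from statistics import geometric_mean, pstdev


def lognormal_location_and_spread(abundances):
    """Return (geometric mean abundance, SD of ln abundance) for observed species.

    Species with zero abundance are dropped, since ln(0) is undefined and
    such species are, by definition, behind the sampling 'veil'.
    """
    observed = [a for a in abundances if a > 0]
    logs = [math.log(a) for a in observed]
    return geometric_mean(observed), pstdev(logs)


print(lognormal_location_and_spread([1, 10, 100, 0]))  # hypothetical counts
```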
Like the log series model, this discrete lognormal assumes that every individual of a given species is equally likely to be observed when the community is sampled, that observing each individual is an independent event, and that the population remains unchanged during sampling. For strongly territorial or aggregated species, sporadically available ones, rare species that are destructively sampled, or those that learn to avoid our sampling procedures, we must expect these assumptions to be compromised. In other words, given the grossly unequal and nonrandom efficiency with which many species are sampled, a number of authors find it hard to accept the assumptions required by parametric models.
Which index is best depends upon the use to which it is to be put. For example, theoretical ecologists tend to want the least variable and most robust measures, whereas applied ecologists may be more interested in the underlying model. Nonparametric methods have been criticized for yielding results that are imprecise and model-dependent. On the other hand, although parametric indices assume a predefined model, or models, it can be impossible to infer from the observed distribution which is the most appropriate model for a particular set of field data.
For the conservationist, whilst some of these indices are more affected than others by the least abundant species in a community, none of them provides a quantitative measure of their conservation status. In other words, simply because a species is rare in a study area does not mean that it is rare anywhere else. As a result, indices of species heterogeneity, by themselves, provide a misleading measure of a community's conservation value, or of the impact of an intervention upon it.
At first sight you might assume that a good index of a community's conservation importance would be the abundance of species that are rare worldwide. Unfortunately, global rarity by itself is not a reliable measure of endangerment, because a number of species are uncommon but very widespread, whereas other species are common but locally rare.
The classical view of endangered species was that they were being specifically hunted out: for food, fur, ivory, or as 'vermin'. However, there are many endangered species which are not especially persecuted, and many species that are persecuted are not endangered. Experience shows that slow-breeding top predators (such as leopards and eagles) are the most vulnerable, and fast-breeding opportunists (such as rats and crows) the least.
These points aside, the most endangered species share two important properties:
Indices which attempt to quantify these issues include the rarity index (an additive score of red-listed species), the endemism index (an additive score of highly localized species), and an assortment of combined, weighted indices of species diversity. In reality, of course, none of these measures is of the slightest use where there is no political will to conserve endangered species or the habitats upon which they depend.
