Phys Biol. 2020 Mar 19;17(3):031001.
doi: 10.1088/1478-3975/ab6754.

Diversity in biology: definitions, quantification and models

Song Xu et al. Phys Biol. 2020.

Abstract

Diversity indices are useful single-number metrics for characterizing a complex distribution of a set of attributes across a population of interest. The utility of these different metrics or sets of metrics depends on the context and application, and whether a predictive mechanistic model exists. In this topical review, we first summarize the relevant mathematical principles underlying heterogeneity in a large population, before outlining the various definitions of 'diversity' and providing examples of scientific topics in which its quantification plays an important role. We then review how diversity has been a ubiquitous concept across multiple fields, including ecology, immunology, cellular barcoding experiments, and socioeconomic studies. Since many of these applications involve sampling of populations, we also review how diversity in small samples is related to the diversity in the entire population. Features that arise in each of these applications are highlighted.

Figures

Figure 1.
Examples of complex, multicomponent populations in which diversity may be a meaningful quantitative concept. (a) Diversity in island ecology. A large number of species may migrate onto an island. Organisms can proliferate and die, leading to a specific time-dependent pattern of species diversity on the island. (b) Microbes are ingested and form a community in the gut by proliferating, competing, and dying. They can also be cleared from the gut. (c) Naive T cell generation in vertebrates. Naive T cells develop in the thymus. Each T cell expresses only one type of T cell receptor (TCR). Naive T cells can proliferate and die in the peripheral blood. The possible number of T cell receptors that can be expressed is enormous (> 10^15), but only perhaps 10^6 to 10^8 different TCRs usually exist in an organism. The diversity of the T cell receptor repertoire is an important determinant of the organism's response to antigens. (a) Island biodiversity. (b) Gut microbiota. (c) T cell production.
Figure 2.
Number counts and clone counts vary depending on the definition and thresholding of discrete species. This consideration arises in designing experimental measurements.
Figure 3.
Plot of ln R versus ln A, with area A measured in km^2. Species counts of long-horned beetles in the Florida Keys are plotted against the island size [98]. The linear regression line yields a slope of z = 0.29. Usually, fits of the species-area exponent z yield a small number.
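The regression described in this caption can be sketched as follows. This is a minimal illustration, assuming the usual power-law form R = c A^z, so that ln R = ln c + z ln A. The function name `fit_species_area`, the prefactor `c`, and the data points are hypothetical and are not taken from the beetle data set [98].

```python
import math

def fit_species_area(areas_km2, richness):
    """Ordinary least-squares fit of ln R = ln c + z ln A; returns (z, c)."""
    xs = [math.log(a) for a in areas_km2]
    ys = [math.log(r) for r in richness]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    z = sxy / sxx                        # slope = species-area exponent
    c = math.exp(y_mean - z * x_mean)    # intercept gives the prefactor
    return z, c

# Hypothetical, noise-free data generated from R = 10 * A**0.29
areas = [0.1, 1.0, 10.0, 100.0]
counts = [10.0 * a ** 0.29 for a in areas]
z, c = fit_species_area(areas, counts)
```

With real island data the points scatter around the line, so the fitted z carries an uncertainty that this sketch does not estimate.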
Figure 4.
Frequencies of approximately 200 species of bacteria distributed across about a dozen phyla. (a) Group 1 depicts the relative abundance distribution for healthy individuals while (b) Group 2 shows the pattern for irritable bowel syndrome (IBD) patients. The differences in abundance patterns are apparent and have been quantified using the Shannon index for each individual plotted in (c). From Park et al [102]. (a) Group 1. (b) Group 2.
Figure 5.
(a) Protocol for Viral Integration site (VIS) barcoding studies of hematopoiesis in rhesus macaque [55, 117, 118]. Here, ‘barcodes’ are defined by the random integration sites of a lentiviral vector. (b) Xenograft barcode experiments using mice [119] in which a library of barcodes was used to tag leukemia-propagating cells before direct transplantation into mice.
Figure 6.
(a) The fractional populations of the largest clones (barcodes) detected in granulocyte blood samples from rhesus macaque. Relative populations are described by the distances between neighboring curves. (b) Diversity indices derived from the data in (a). The Simpson’s index and Shannon diversity are rescaled to fit on the same plot.
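As a rough illustration of the two indices named in this caption, the sketch below computes them from a list of clone (or species) sizes. It assumes the common conventions H = −Σ p_i ln p_i for the Shannon index and Σ p_i^2 for Simpson's index. Other conventions exist (for example 1 − Σ p_i^2), and the function names are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over clones with nonzero count."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson's index sum(p_i**2): the probability that two individuals
    drawn with replacement belong to the same clone."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

# Hypothetical populations: four equally sized clones, and one dominant clone
even = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]
h_even, h_skewed = shannon_index(even), shannon_index(skewed)
s_even, s_skewed = simpson_index(even), simpson_index(skewed)
```

Under these conventions the even population has the larger Shannon index and the smaller Simpson's index, so the two move in opposite directions as a few clones come to dominate.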
Figure 7.
A simple multispecies birth-death-immigration (BDI) process [55, 136-138]. A constant source (i.e. stem cells with slow dynamics) of 16 cells, each belonging to a different clone, undergoes asymmetric differentiation with rate α to produce differentiated cells that can undergo birth or death with rates r(N) and μ(N), which may depend on the total population N in the differentiated pool. In this example, the differentiated population contains N = 30 cells from R = 9 different clones (barcodes), thus leaving c_0 = 16 − 9 = 7 unseen species.
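The bookkeeping in this example can be sketched with clone counts c_k, taken here to mean the number of clones represented by exactly k differentiated cells. The clone sizes below are hypothetical, chosen only so that the totals match the caption (16 clones, N = 30, R = 9). They are not read off the figure.

```python
from collections import Counter

def clone_counts(clone_sizes):
    """Map k -> c_k, the number of clones that have exactly k cells."""
    return Counter(clone_sizes)

# Hypothetical sizes: 9 clones present in the pool, 7 clones absent (size 0)
sizes = [8, 6, 5, 4, 3, 1, 1, 1, 1] + [0] * 7
c = clone_counts(sizes)

N = sum(k * ck for k, ck in c.items())        # total number of cells
R = sum(ck for k, ck in c.items() if k > 0)   # richness: clones seen at least once
c0 = c[0]                                     # unseen clones
```

The same relations, N = Σ k c_k and R = Σ_{k≥1} c_k, hold for any configuration, so c_0 follows once the number of source clones is known.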
Figure 8.
Examples of recently published clone count data. (a) Clone counts derived from a small sample (10^5 sequences) of T cells [142]. Note the broad distribution described by a biphasic power-law curve. Ignoring the largest clones, power-law fits for each regime yield slopes of −1.13 and −1.76. However, one should be cautious describing sampled TCR (and BCR) clone counts using power laws as they hold typically for far less than two decades. (b) Human TCR clone counts for three HIV-infected (red) and three uninfected (black) individuals show qualitative differences between the distributions (unpublished). Other data from mice and humans, under different conditions and in different cell types, have been recently published [149, 150].
Figure 9.
(a) Ordering of all N = 100 individuals in increasing wealth or income. The hypothetical wealth distributions plotted are wi = 3 (equal wealth, black curve), wi = 10 + (i − 1)/2 (linear distribution, red), Wi = 5 + ei/5−15e−14.8 (green), and wi = 14.5 + 50/(101 − i) (blue). The latter three represent distributions with some amount of inequity. (b) These inequalities can be visually quantified by their corresponding Lorenz curves, plotted as the relative fraction of the population f. The Lorenz curve for a perfectly uniform wealth distribution is given by the straight diagonal line. The area between the diagonal equality line and any other Lorenz curve can be used to visualize the Gini coefficient of the associated wealth distribution. The Gini coefficient, Gini = A/(A + B), is calculated by dividing the difference in areas between the equality line and the Lorenz curve in question (A) by the total area (A + B = 1/2) under the equality curves. The ‘Robin Hood’ index is defined as the maximum difference between the equality line and a given Lorenz curve, and is indicated by arrow for the red and green Lorenz curves.
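The quantities in this caption can be sketched for a finite population as follows. The sketch assumes a piecewise-linear Lorenz curve through the cumulative wealth shares, uses Gini = A/(A + B) with A + B = 1/2 (so Gini = 1 − 2B, where B is the area under the Lorenz curve), and takes the 'Robin Hood' index as the largest vertical gap between the equality line and the Lorenz curve. Function names are hypothetical, and only the equal and linear distributions from the caption are used.

```python
def lorenz_curve(wealth):
    """Points (f, L(f)): cumulative wealth share L held by the poorest fraction f."""
    w = sorted(wealth)
    total = sum(w)
    n = len(w)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, wi in enumerate(w, start=1):
        running += wi
        points.append((i / n, running / total))
    return points

def gini(wealth):
    """Gini = A / (A + B) = 1 - 2B, with B from the trapezoid rule."""
    pts = lorenz_curve(wealth)
    area_b = sum((f1 - f0) * (l0 + l1) / 2
                 for (f0, l0), (f1, l1) in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area_b

def robin_hood(wealth):
    """Maximum vertical distance between the equality line and the Lorenz curve."""
    return max(f - l for f, l in lorenz_curve(wealth))

equal = [3.0] * 100                               # w_i = 3
linear = [10 + (i - 1) / 2 for i in range(1, 101)]  # w_i = 10 + (i - 1)/2
g_equal, g_linear = gini(equal), gini(linear)
rh_equal, rh_linear = robin_hood(equal), robin_hood(linear)
```

Because the Lorenz curve is piecewise linear here, the maximum gap is attained at one of its nodes, which is why checking the nodes alone is sufficient.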

References

    1. Nei M 1973 Analysis of gene diversity in subdivided populations Proc. Natl Acad. Sci. 70 3321–3
    2. Heywood VH et al. 1995 Global Biodiversity Assessment vol 1140 (Cambridge: Cambridge University Press)
    3. Purvis A and Hector A 2000 Getting the measure of biodiversity Nature 405 212
    4. Whittaker RJ, Willis KJ and Field R 2001 Scale and species richness: towards a general, hierarchical theory of species diversity J. Biogeogr. 28 453–70
    5. Sala OE et al. 2000 Global biodiversity scenarios for the year 2100 Science 287 1770–4
