
[Figure 8.20: three genealogies. Left: strong directional selection or population growth, excess of rare mutations (D < 0). Center: standard coalescent, neutral alleles, constant Ne, D = 0, rare and intermediate-frequency mutations about equally common. Right: balancing selection or recent bottleneck, excess of intermediate-frequency mutations (D > 0).]

Figure 8.20 Differences in the shape of genealogies are the basis of Tajima's D test. In the standard coalescent model of genealogical branching, the probability of coalescence per pair of lineages is constant over time. The standard coalescent therefore gives the expected branch lengths when all alleles are selectively neutral and the effective population size is constant (center). Changes in the effective population size over time (population growth, population bottlenecks) also change the probability of coalescence over time. Natural selection alters the probability of coalescence as well, depending on the fitness of the alleles each lineage bears. Both kinds of change alter the expected time to coalescence and therefore the expected branch lengths in a genealogical tree. If the chance of coalescence is greater in the present than in the past (right), most coalescent events occur near the present and internal branches are long in comparison with external branches. If the chance of coalescence is smaller in the present than in the past (left), most coalescent events occur in the more distant past and external branches are long in comparison with internal branches. Since mutations occur at a constant rate over time, longer branches are expected to accumulate more mutations.

An alternative way to think about how Tajima's D works is to consider the frequency distribution of mutations under different types of natural selection or population histories. A mutation that occurs on an internal branch of a genealogy is inherited by every sampled lineage descending from that branch, so it tends to be found at intermediate frequency. In contrast, a mutation that occurs on an external branch is unique to a single sampled lineage and is therefore rare. Since the standard coalescent model predicts a characteristic balance between total internal and total external branch length, it also predicts a characteristic balance between rare and intermediate-frequency mutations. Population growth and strong directional selection (a recent selective sweep) can lead to an excess of rare mutations, since these processes increase the external branch length; purifying selection against deleterious mutations has a similar effect because it keeps those mutations at low frequency. In contrast, balancing selection, a shrinking population size, or a population bottleneck can lead to an excess of intermediate-frequency mutations because these processes increase the amount of internal branch length.
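The branch-length bookkeeping behind this argument can be checked with a short simulation. The Python sketch below, a minimal illustration rather than a method from the text, builds genealogies under the standard coalescent (neutral alleles, constant Ne) and records the total length of branches carrying each number of sampled descendants; a mutation on such a branch appears in that many sequences. Time is assumed to be in coalescent units, so each pair of lineages coalesces at rate 1, and the function names, sample size and seed are hypothetical choices.

```python
import random
from collections import defaultdict


def branch_lengths_by_size(n, rng):
    """Simulate one standard-coalescent genealogy for n sampled lineages.

    Returns a dict mapping the number of sampled descendants below a
    branch to the total length of branches with that many descendants.
    """
    # Each entry is the number of samples descending from a lineage;
    # all n lineages start as external branches leading to one sample.
    lineages = [1] * n
    lengths = defaultdict(float)
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1.
        t = rng.expovariate(k * (k - 1) / 2)
        for size in lineages:
            lengths[size] += t  # every current lineage grows by t
        i, j = sorted(rng.sample(range(k), 2))
        merged = lineages[i] + lineages[j]
        lineages.pop(j)  # pop the larger index first so i stays valid
        lineages.pop(i)
        lineages.append(merged)
    return lengths


def mean_branch_lengths(n, reps, seed=1):
    """Average branch length per descendant count over many genealogies."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(reps):
        for size, length in branch_lengths_by_size(n, rng).items():
            totals[size] += length
    return {size: totals[size] / reps for size in range(1, n)}


if __name__ == "__main__":
    n = 10
    means = mean_branch_lengths(n, reps=20000)
    external = means[1]
    internal = sum(means[s] for s in range(2, n))
    print(f"mean external length {external:.2f}, mean internal length {internal:.2f}")
    for size in range(1, n):
        # Neutral expectation: branches carrying `size` samples average 2/size.
        print(size, round(means[size], 3), round(2 / size, 3))
```

Under these assumptions the mean length of branches carrying i samples should approach 2/i. With n = 10, external branches therefore average about 2, while internal branches together average about 2(a − 1) ≈ 3.66, where a is the sum of 1/i for i = 1 to 9. This fixed neutral proportion is the baseline that growth, bottlenecks and selection distort.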

Tajima's D statistic is computed from the difference between $\theta_\pi$ and $\theta_S$, divided by the standard deviation of that difference:

$$D = \frac{\theta_\pi - \theta_S}{\sqrt{\widehat{\mathrm{Var}}\left(\theta_\pi - \theta_S\right)}}$$

where the variance in the denominator is estimated from $p_S$, the number of segregating sites per nucleotide site. Recall that the standard deviation is the square root of the variance, so dividing by it puts D in units of standard deviations away from the mean of approximately zero expected for standard coalescent genealogies. Only when the observed value lies about two or more standard deviations from that mean do we reject the null hypothesis of D = 0, and with it the null model of a neutral genealogy with constant effective population size (see the confidence limits in Table 2 of Tajima 1989a).

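The quantities used to compute the variance depend only on the number of sequences sampled, $n$. Writing $a_1 = \sum_{i=1}^{n-1} 1/i$ and $a_2 = \sum_{i=1}^{n-1} 1/i^2$,

$$b_1 = \frac{n+1}{3(n-1)}, \qquad b_2 = \frac{2(n^2+n+3)}{9n(n-1)},$$

$$c_1 = b_1 - \frac{1}{a_1}, \qquad c_2 = b_2 - \frac{n+2}{a_1 n} + \frac{a_2}{a_1^2},$$

$$e_1 = \frac{c_1}{a_1}, \qquad e_2 = \frac{c_2}{a_1^2 + a_2}.$$

In Tajima's (1989a) formulation, written in terms of $S$, the number of segregating sites in the sample, the estimated variance of the difference is $e_1 S + e_2 S(S-1)$.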
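The whole calculation can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's own procedure. It uses Tajima's (1989a) count-based form: S is the number of segregating sites, θ_π is the mean number of pairwise differences per sequence (not per nucleotide site), θ_S = S/a_1, and the variance is e_1 S + e_2 S(S − 1), with the constants computed from n. The sketch also assumes equal-length aligned sequences with no gaps or missing data. The function name `tajimas_d` and its error handling are hypothetical choices.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D from equal-length aligned sequences (count-based form)."""
    n = len(seqs)
    if n < 4:
        # For n = 2 or 3 the constants e1 and e2 are both zero.
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) columns in the alignment.
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")

    # theta_pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    theta_pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta_s = S / a1
    return (theta_pi - theta_s) / sqrt(e1 * S + e2 * S * (S - 1))


if __name__ == "__main__":
    # Every mutation is a singleton: an excess of rare mutations.
    print(tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]))
    # Every mutation is at frequency 2/4: an excess of intermediate mutations.
    print(tajimas_d(["TT", "TT", "AA", "AA"]))
```

The first toy alignment gives D of about −0.78 and the second about 1.89. The signs match the left and right panels of Figure 8.20. Samples this small only illustrate the arithmetic; whether a value is significant should be judged against the confidence limits in Tajima (1989a).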
