$$\text{Probability}\{N(t)\ \text{substitutions}\} = \frac{e^{-\lambda t}(\lambda t)^{N(t)}}{N(t)!} \tag{8.38}$$

where N(t) is the total number of substitutions (an integer), t is time in years, and λ is the rate of substitutions per year. Under this model, the expected number of substitutions at time t is λt, the product of the substitution rate and the number of time steps that have elapsed. A critical feature of this model is that the rate of substitution, λ, is constant: it changes neither with time nor with the total number of substitutions, N(t). The Poisson molecular clock is illustrated in Fig. 8.15. The top panel shows the probability of each value of N(t) between zero and 14 after one time step when the rate of substitution is λ = 4. The bottom panel of Fig. 8.15 shows variation in the number of substitutions among five replicate sequences that all evolved for the same period of time at the same constant rate of substitution (λ = 4).
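The probabilities described for the top panel of Fig. 8.15 can be computed directly from the Poisson formula. The following is a minimal sketch, not the code used to draw the figure; the function name `poisson_pmf` and its arguments are hypothetical names chosen for illustration.

```python
import math

def poisson_pmf(n, rate, t=1.0):
    """Probability of exactly n substitutions after time t at a constant rate."""
    mean = rate * t  # expected number of substitutions
    return math.exp(-mean) * mean**n / math.factorial(n)

# Probabilities of 0 through 14 substitutions after one time step at a rate of 4
probs = [poisson_pmf(n, rate=4.0) for n in range(15)]
for n, p in enumerate(probs):
    print(f"N(t) = {n:2d}  probability = {p:.4f}")
```

With a rate of 4 and one time step, three and four substitutions are equally likely and are the most probable outcomes, while the probabilities for counts above 14 are small enough that the listed values sum to nearly one.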

The Poisson model for a molecular clock implies that the time intervals between substitutions are random in length (Fig. 8.16). Thus, substitutions that follow a Poisson molecular clock will be separated by variable lengths of time. This stands in contrast to our everyday notion of a clock or watch, which has uniform lengths of time separating each event (events are seconds, minutes, and hours). Therefore, a molecular clock that is based on a random process has inherent variation in the number of substitutions that occur over a given time interval even though the rate of substitution remains constant. This means that independent lineages that each diverged from a common ancestor at the same time can display variation in the number of substitutions that have occurred. In other words, if substitution is a random process then we expect some variation in the number of substitutions among lineages and loci even if divergence time and the substitution rate are constant. This explanation for variation in the numbers of substitutions is often referred to by saying that rates of molecular evolution follow a Poisson clock.

[Figure 8.16 panels, top to bottom: literal clock (no variance in time between mutations); Poisson process (variance in time between mutations). Horizontal axis: time.]

Figure 8.16 Two representations of the rate at which substitution events (circles) occur over time. Mutations might occur with metronome-like regularity, showing little variation in the time that elapses between each mutation event. If substitution is a stochastic process, an alternative view is that the time that elapses between substitutions is a random variable. The Poisson distribution is a commonly used distribution to model the number of events that occur in a given time interval, so the bottom view is often called the Poisson molecular clock. Note that in both cases the number of substitutions and the time elapsed are the same so that the average substitution rate is identical.
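The contrast between a literal clock and a Poisson clock can be illustrated with a small simulation. This is a sketch under the standard assumption that waiting times between events in a Poisson process are exponentially distributed with mean equal to one over the rate; the function names, the seed, and the rate of 4 per time unit are illustrative choices, not values taken from a data set.

```python
import random

def poisson_clock_events(rate, total_time, rng):
    """Event times on a Poisson clock: waiting times are exponential with mean 1/rate."""
    times = []
    t = rng.expovariate(rate)
    while t <= total_time:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def literal_clock_events(rate, total_time):
    """Event times on a literal clock: one event every 1/rate time units."""
    n_events = int(rate * total_time)
    return [(i + 1) / rate for i in range(n_events)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
rate, total_time = 4.0, 1.0

print("literal clock:", len(literal_clock_events(rate, total_time)), "events")
for lineage in range(5):
    events = poisson_clock_events(rate, total_time, rng)
    print(f"Poisson lineage {lineage + 1}: {len(events)} events")
```

The literal clock always produces the same number of events, whereas the replicate Poisson lineages generally differ from one another even though the rate and the elapsed time are identical for all of them.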

The Poisson process model of the molecular clock leads to a specific prediction about the variation in numbers of substitutions that should be observed if the rate of molecular evolution follows a Poisson process. The Poisson distribution has the special property that the mean is equal to the variance. Therefore, the mean number of substitutions and the variance in the number of substitutions should be equal for independent DNA sequences evolving at the same rate according to a Poisson process. The ratio of the variance to the mean in the number of substitutions is called the index of dispersion, defined as

$$R(t) = \frac{\mathrm{Var}[N(t)]}{E[N(t)]}$$

where E indicates an expected or mean value. The index of dispersion defines the degree of spread among divergence estimates that should be seen under a Poisson process, just as a Markov chain defines the spread in allele frequencies that is expected in an ensemble of finite populations. If the numbers of substitutions in a sample of pairwise sequence divergences follow a Poisson molecular clock then the variance and mean number of substitutions should be equal and therefore R(t) should equal one. If the variance is larger than the mean then R(t) is greater than one, a situation referred to as an overdispersed molecular clock since substitution rates have a wider range of values than predicted by the Poisson process model.
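As a concrete illustration, the index of dispersion can be estimated from a set of substitution counts as the sample variance divided by the sample mean. The sketch below uses two small sets of made-up counts (hypothetical values, not data from any study), and the function name `index_of_dispersion` is an illustrative choice.

```python
from statistics import mean, variance

def index_of_dispersion(counts):
    """Sample variance of the substitution counts divided by their mean."""
    return variance(counts) / mean(counts)

# Hypothetical substitution counts for two sets of independent lineages
tight_counts = [2, 4, 4, 6]
spread_counts = [1, 3, 4, 8]

print(index_of_dispersion(tight_counts))   # below one: less spread than a Poisson clock
print(index_of_dispersion(spread_counts))  # above one: overdispersed
```

Both sets have a mean of four substitutions, so the difference in the ratio comes entirely from the variance. With so few lineages the estimate is very noisy, so a real test for overdispersion would need many more independent comparisons.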

Overdispersed molecular clock: Absolute divergence rates from many independent pairs of species that overall exhibit more variance in divergence rate than expected under the Poisson process molecular clock model; a dispersion index value that is greater than one.

Ancestral polymorphism and Poisson process molecular clock

The molecular clock modeled as a Poisson process assumes it is possible to compare pairs of DNA sequences that were derived from a single DNA sequence in the past and then diverged instantly into two completely isolated species. Actual DNA sequences usually have a more complex history that involves processes that operated in the ancestral species followed by the process of divergence in two separate species (Fig. 8.17). In the ancestral species, the number and frequency of neutral alleles per locus were determined by population processes such as genetic drift and mutation (assuming the ancestral species was panmictic). This zone of ancestral polymorphism is the period of time when genetic variation in the ancestral species was dictated by drift-mutation equilibrium. Within this ancestral population, two lineages split at some point and eventually became lineages within the two separate species (Fig. 8.17). DNA sequences from these two lineages were sampled in the present to estimate substitution rates. Recognizing this more complex history of diverged DNA sequences shows two things. First, lineages and species may have diverged at different points in time, with lineages often diverging earlier than species. Second, two distinct processes can contribute to the nucleotide differences between sequences that are seen as substitutions when observed in the present. Referring to Fig. 8.17, during the time period T, polymorphisms among sequences were caused by the population processes dictating polymorphism in the ancestral species. Later, during the time period t, substitutions were the product of the divergence process between species.

[Figure 8.17 label: Ancestral species]

Figure 8.17 An illustration of the history of two DNA sequences that might be sampled from two species in the present time to estimate the rate of substitutions. The history is like a water pipe in an upside-down Y shape. The tube at the top contains the total population of lineages in the ancestral species, eventually splitting into populations of lineages that compose two species. The time when two lineages diverged from a common ancestor is not necessarily the same as the time of speciation. Therefore, a population process governing polymorphism operates for T generations in the ancestral species while a divergence process operates for t generations in the diverged species. The polymorphism process initially dictates the number of nucleotide changes between two sequences. Later, the divergence process dictates the number of nucleotide changes between two sequences. In two DNA sequences sampled in the present it is impossible to distinguish which process has caused the nucleotide changes observed.

The existence of both ancestral polymorphism and divergence processes complicates testing for overdispersion of the molecular clock (Gillespie 1989, 1994). To make this point, Gillespie articulated the distinction between origination processes and fixation processes. An origination process describes the times at which the subset of new mutations that will ultimately fix first enter the population. A fixation process, in contrast, describes the times at which the subset of new mutations that will ultimately fix reach a frequency of 1 in the population. At a conceptual level, it is clear that the two processes are not identical. The distribution of origination times is a product of the causes of mutation. Times until fixation depend both on the causes of mutation (the origination process) and on the causes of fixation, such as genetic drift in a finite population of neutral alleles or natural selection. Measuring times until fixation of new mutations would require that we be able to follow the populations of diverging species over time and watch as new mutations segregate and eventually go to fixation or loss, recording the times for those that fix and then calling these the substitution times. In Fig. 8.2 originations are the events at the bottom of the y axis and fixations are events at the top of the y axis. In practice, we have only the accumulated amino acid or DNA differences between pairs of species observed at one point in time. Such sequence differences are a product of the origination process because they are a sample of mutations that came into the population some time ago and have fixed by the time we observe them. This is not the same thing as having observed the "tick" of fixations over a long period of time.

Gillespie and Langley (1979) showed that a molecular clock combining polymorphism and divergence does not necessarily constitute a Poisson process in which the index of dispersion is expected to equal one. To see this, it is necessary to develop expectations for the mean and variance in the number of nucleotide differences between two sequences when both polymorphism and divergence processes are operating over the history of two DNA sequences.

Assuming that the ancestral species is panmictic, there is no recombination, and there is selective neutrality, Watterson (1975) showed that the expected number of segregating sites (S) for a sample of two DNA sequences is

$$E(S) = \theta \tag{8.40}$$

and that the variance in the number of segregating sites is

$$\mathrm{Var}(S) = \theta + \theta^{2}$$

under the infinite sites model of mutation where θ = 4Neμ. (The relationship between θ and the number of segregating sites is derived in Section 8.2. Notice that the factor of a1 is not shown in equation 8.40 because a1 = 1 for a sample of two sequences.) This is a prediction for the amount of polymorphism expected under neutrality in a finite population. This result tells us that the mean and variance of the number of segregating sites are expected to be different in a sample of two DNA sequences from the ancestral polymorphism zone in Fig. 8.17.
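The role of the a1 factor can be seen by computing the mean and variance of the number of segregating sites for any sample size. The sketch below assumes the standard neutral, no-recombination, infinite sites results, in which the mean is a1 times θ and the variance is a1 times θ plus a2 times θ squared, where a1 is the sum of 1/i and a2 is the sum of 1/i² for i from 1 to n - 1. The a2 term and the function name `watterson_moments` are not taken from the text above and are labeled here as assumptions.

```python
def watterson_moments(theta, n=2):
    """Mean and variance of the number of segregating sites S for n sequences.

    Assumes neutrality, panmixia, no recombination, and infinite sites mutation.
    """
    a1 = sum(1 / i for i in range(1, n))     # equals 1 when n = 2
    a2 = sum(1 / i**2 for i in range(1, n))  # equals 1 when n = 2 (assumed standard term)
    mean_s = a1 * theta
    var_s = a1 * theta + a2 * theta**2
    return mean_s, var_s

# For two sequences the mean is theta and the variance is theta + theta squared
print(watterson_moments(theta=2.0, n=2))
```

For two sequences both factors equal one, so the variance exceeds the mean by θ squared. This excess is what makes the ancestral polymorphism contribution differ from a Poisson process.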

Now shift focus to the divergence zone of Fig. 8.17. For one DNA sequence from species 1 and another from species 2, the divergence time is 2t because each species has diverged independently for t generations. Based on the Poisson process in equation 8.38, both the expected number of diverged sites and the variance in the number of diverged sites are 2μt. The means and variances in the number of nucleotide sites that differ between two sequences are shown in Table 8.3 as they apply to polymorphism and divergence in Fig. 8.17.

Table 8.3 Mean and variance in the number of substitutions at a neutral locus for the cases of divergence between two species and polymorphism within a single panmictic population. The rate of divergence is modeled as a Poisson process so the mean is identical to the variance. The mutation rate is μ and θ = 4Neμ. Refer to Fig. 8.17 for an illustration of divergence and ancestral polymorphism.

                          Expected value or mean    Variance
Ancestral polymorphism    θ                         θ + θ²
Divergence                2μt                       2μt
Sum                       θ + 2μt                   θ + θ² + 2μt

Given the means and variances of the number of changes between two DNA sequences for both polymorphism and divergence processes, we can then combine these expectations into a new expression for the index of dispersion. The index of dispersion for the number of differences between a sequence from species 1 and a sequence from species 2 is then

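$$R(t) = \frac{\theta + \theta^{2} + 2\mu t}{\theta + 2\mu t} = 1 + \frac{\theta^{2}}{\theta + 2\mu t}$$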
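The effect of ancestral polymorphism on the index of dispersion can be explored numerically. The sketch below assumes that ancestral polymorphism contributes a mean of θ and a variance of θ + θ² (the two-sequence infinite sites result, with θ = 4Neμ), that divergence contributes 2μt to both the mean and the variance, and that the two contributions add. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
def combined_index_of_dispersion(theta, mu, t):
    """Variance over mean for differences from ancestral polymorphism plus divergence."""
    divergence = 2 * mu * t                   # Poisson: mean equals variance
    mean_total = theta + divergence
    var_total = theta + theta**2 + divergence
    return var_total / mean_total

# Hypothetical values: the same theta and mutation rate at increasing divergence times
for t in (1, 10, 100, 1000):
    print(t, round(combined_index_of_dispersion(theta=2.0, mu=0.5, t=t), 4))
```

Under these assumptions the ratio exceeds one whenever θ is greater than zero, and it approaches one as the divergence time grows, because the Poisson divergence term comes to dominate both the mean and the variance.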