a Mates by self-fertilization. b Mates by outcrossing.

of n sequences is n(n - 1)/2, so dividing the sum of d_ij by this number gives the average number of differences per pair of sequences. The average number of pairwise differences can also be divided by the number of nucleotide sites examined (L) to express π per nucleotide site. Figure 8.11 shows an example computation of π for a hypothetical sample of four DNA sequences.
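The averaging step described here can be sketched in a few lines of Python. This is a minimal illustration, not the computation in Figure 8.11: the four sequences and the function names (`count_differences`, `nucleotide_diversity`) are made up for the example, and the sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ (d_ij)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nucleotide_diversity(sequences):
    """Return (average differences per pair, average differences per site)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(sequences[0])                      # L in the text
    total = sum(count_differences(a, b)
                for a, b in combinations(sequences, 2))
    n_pairs = n * (n - 1) // 2                       # number of unique pairs
    per_pair = total / n_pairs
    return per_pair, per_pair / n_sites


# Hypothetical sample of four aligned sequences (not the data in Figure 8.11).
sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGAACGTAT"]
pi_per_pair, pi_per_site = nucleotide_diversity(sample)
print(pi_per_pair, pi_per_site)
```

For this made-up sample the six pairs differ at 8 sites in total, so the average is 8/6 differences per pair, or about 0.13 per site over the 10 sites examined.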

In larger samples that may include multiple identical DNA sequences, the nucleotide diversity can be estimated by

$$\hat{\pi} = \sum_{i=1}^{k} \sum_{j=1}^{k} p_i p_j d_{ij}$$

where p_i and p_j are the frequencies of alleles i and j, respectively, in a sample of k different sequences that each represent one allele. This version of the formula just provides an average of d_ij that is weighted by the frequency of each type of DNA sequence found in a sample. The nucleotide diversity can be underestimated if there are rare sequence polymorphisms in a population that are unlikely to be sampled (see Renwick et al. 2003). Information on the sampling variance of π can be found in Nei and Kumar (2000).
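A frequency-weighted average of this kind might be computed as in the sketch below. It assumes the estimator is the plain double sum of p_i p_j d_ij over all ordered pairs of distinct alleles, with no small-sample correction. The sample data and the function name `weighted_nucleotide_diversity` are hypothetical.

```python
from collections import Counter


def count_differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ (d_ij)."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def weighted_nucleotide_diversity(sequences):
    """Sum of p_i * p_j * d_ij over all ordered pairs of distinct alleles."""
    counts = Counter(sequences)          # collapse identical sequences into alleles
    total = sum(counts.values())
    freqs = {allele: c / total for allele, c in counts.items()}
    pi = 0.0
    for allele_i, p_i in freqs.items():
        for allele_j, p_j in freqs.items():
            # d_ii is 0, so pairing an allele with itself adds nothing
            pi += p_i * p_j * count_differences(allele_i, allele_j)
    return pi


# Hypothetical sample: one allele appears twice, the other two once each.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
pi_weighted = weighted_nucleotide_diversity(sample)
print(pi_weighted)
```

Because this form draws pairs with replacement, it comes out a factor of (n - 1)/n below the per-pair average over the n sampled sequences. Here that is 0.875 against 7/6, so a correction of n/(n - 1) recovers the per-pair value.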

Some values of π from different organisms and loci are shown in Table 8.1. Estimates of nucleotide diversity are useful because π is a measure of heterozygosity for DNA sequences. As such, the value of π is a function of 4N_eμ under an equilibrium between genetic drift and mutation. With an estimate of π and the mutation rate at a locus (μ), it is then possible to estimate the effective population size. Because π is an estimator of the scaled mutation rate θ, it is sometimes referred to as θ_π.
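As a rough illustration of that last step, the sketch below rearranges θ = 4N_eμ to solve for N_e. It assumes a diploid population at drift-mutation equilibrium, with π and μ expressed on the same per-site scale (μ per generation). The numerical values and the function name `effective_population_size` are hypothetical, not taken from Table 8.1.

```python
def effective_population_size(pi_per_site, mu_per_site):
    """Estimate N_e from pi = theta = 4 * N_e * mu (diploid, equilibrium assumed)."""
    if mu_per_site <= 0:
        raise ValueError("mutation rate must be positive")
    return pi_per_site / (4.0 * mu_per_site)


# Hypothetical inputs: pi = 0.004 per site, mu = 1e-8 per site per generation.
ne_estimate = effective_population_size(0.004, 1e-8)
print(round(ne_estimate))
```

With these made-up inputs the estimate is on the order of 100,000 individuals. The result is only as reliable as the mutation rate estimate and the equilibrium assumption behind it.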

8.3 DNA sequence divergence and the molecular clock

• The molecular clock hypothesis for DNA divergence.

• Dating divergence events with a molecular clock.

One key result of the neutral theory is the prediction that the rate of substitution is equal to the mutation rate.
