## N

Therefore, the expected time to coalescence is the inverse of the probability of coalescence, or $\frac{2(2N_e)}{k(k-1)}$. This expected time to coalescence can then be substituted into equation 8.24 to give

$$E[S] = \mu \sum_{k=2}^{n} k \, \frac{4N_e}{k(k-1)}$$

This equation simplifies by canceling each $k$, taking the constant $4N_e$ outside the summation, and adjusting the range of the summation to remove the $-1$ after $k$ in the denominator:

$$E[S] = 4N_e\mu \sum_{k=1}^{n-1} \frac{1}{k}$$

to give the expected number of segregating sites in a sample of $n$ DNA sequences. Once the expected number of segregating sites $E[S]$ is known, it can be solved for $\theta$. Notice that $\theta = 4N_e\mu$ can be substituted in equation 8.26 and the result rearranged to give

$$\theta = \frac{E[S]}{\sum_{k=1}^{n-1} \frac{1}{k}}$$

which expresses $\theta$ in terms of the number of segregating sites divided by the total branch length of the genealogy. An estimate of the scaled mutation rate determined from the number of segregating sites in a sample of DNA sequences is symbolized as $\hat{\theta}_W$ (W for Watterson) or $\hat{\theta}_S$ (S for segregating sites). If we define

$$a_1 = \sum_{k=1}^{n-1} \frac{1}{k}$$

then

$$\hat{\theta}_W = \frac{S}{a_1}$$

using the absolute number of segregating sites $S$, or

$$\hat{\theta}_S = \frac{p_S}{a_1}$$

using $p_S$, the number of segregating sites per nucleotide site sampled. The importance of these two final equations is that $4N_e\mu$ can be estimated from the number of segregating sites and the number of sequences in a sample.
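The estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the toy alignment `sample`, and the assumption that sequences are equal-length, gap-free strings are all introduced here for the example and do not come from the text. The sketch also checks numerically that summing $k$ times the expected coalescence time $4N_e/(k(k-1))$ over $k = 2, \ldots, n$ equals $4N_e$ times the harmonic sum $\sum_{k=1}^{n-1} 1/k$.

```python
def harmonic_sum(n):
    """Return sum_{k=1}^{n-1} 1/k for a sample of n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / k for k in range(1, n))


def expected_total_branch_length(n, Ne):
    """Sum k * E[T_k] for k = 2..n, with E[T_k] = 4*Ne / (k*(k-1))."""
    return sum(k * 4.0 * Ne / (k * (k - 1)) for k in range(2, n + 1))


def count_segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs, per_site=False):
    """Estimate theta from segregating sites (absolute, or per site if requested)."""
    n = len(seqs)
    s = count_segregating_sites(seqs)
    if per_site:
        s = s / len(seqs[0])  # proportion of sites that are segregating
    return s / harmonic_sum(n)


# Hypothetical toy alignment: 4 sequences, 8 sites, 3 segregating sites.
sample = ["ATGCATGC", "ATGCATGA", "ATGAATGC", "TTGCATGC"]

print(count_segregating_sites(sample))
print(watterson_theta(sample))
print(watterson_theta(sample, per_site=True))

# The simplification step: both quantities should agree for any n and Ne.
print(expected_total_branch_length(4, 1000), 4 * 1000 * harmonic_sum(4))
```

For this toy alignment there are 3 segregating sites and the harmonic sum is $1 + 1/2 + 1/3 = 11/6$, so the absolute estimate is $3 / (11/6) = 18/11 \approx 1.64$.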

A second measure of DNA polymorphism is the nucleotide diversity in a sample of DNA sequences, symbolized by $\pi$ (pronounced "pie"), and also known as the average pairwise differences in a sample of DNA sequences (Nei & Li 1979; Nei & Kumar 2000). The nucleotide diversity is equivalent to the heterozygosity measured using alleles represented by DNA sequences (assuming random mating and the infinite sites model of mutation). The nucleotide diversity summarizes nucleotide polymorphism by averaging the number of nucleotide site differences found when each unique pair of DNA sequences in a sample is compared. In contrast with the proportion of segregating sites, the nucleotide diversity is sensitive to the frequency of each DNA sequence allele in a sample, since more frequent sequences appear in more of the pairwise comparisons. The nucleotide diversity is the sum of the number of nucleotide differences $d_{ij}$ seen for each pair of DNA sequences $i$ and $j$, divided by the number of unique pairs:

$$\hat{\pi} = \frac{1}{\binom{n}{2}} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij}$$
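The averaging over unique pairs can be illustrated with a short Python sketch. The function names and the toy samples below are hypothetical and chosen only for illustration, and the code assumes equal-length, gap-free sequences. The two final samples have the same single segregating site but different allele frequencies, which shows how the average pairwise difference responds to frequency while the count of segregating sites does not.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Count sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs, per_site=False):
    """Average number of differences over all unique pairs of sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))  # n(n-1)/2 unique pairs
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    pi = total / len(pairs)
    if per_site:
        pi /= len(seqs[0])  # express per nucleotide site
    return pi


# Hypothetical toy alignment: 4 sequences, 8 sites.
sample = ["ATGCATGC", "ATGCATGA", "ATGAATGC", "TTGCATGC"]
print(nucleotide_diversity(sample))

# Same single segregating site, different allele frequencies.
rare = ["AAAA", "AAAA", "AAAA", "AAAT"]    # variant carried by 1 of 4 sequences
common = ["AAAA", "AAAA", "AAAT", "AAAT"]  # variant carried by 2 of 4 sequences
print(nucleotide_diversity(rare), nucleotide_diversity(common))
```

In the first sample the six pairwise comparisons differ at 1, 1, 1, 2, 2, and 2 sites, so the average is $9/6 = 1.5$. In the last two samples, 3 of 6 pairs differ when the variant is rare and 4 of 6 differ when it is at intermediate frequency, giving averages of $0.5$ and about $0.67$.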