
The frequency of DNA sequences with a given nucleotide at a site does not influence S (compare sites 2 and 6 or 8 in Fig. 8.11), but S will increase as the number of individuals sampled increases since DNA sequences with additional polymorphisms will be added to the sample.

The number of segregating sites (S) under neutrality is a function of the scaled mutation rate θ = 4N_eμ. Watterson (1975) first developed a way to estimate θ from the number of segregating sites observed in a sample of DNA sequences. The expected number of segregating sites at drift-mutation equilibrium can more easily be determined using the logic of the coalescent model (Watterson used a different approach). Under the infinite sites model of mutation, each mutation that occurs increases the number of segregating sites by one. The expected number of segregating sites is therefore just the expected number of mutations for a given genealogy. If each lineage has the probability μ of mutating each generation and there are k

Sequence 1 AATGTCAACG

Sequence 2 AATGTCAACG

Sequence 3 ATTGTCAACG

Sequence 4 ATTGTGATCG

Site number: 1 2 3 4 5 6 7 8 9 10

Segregating sites (S and pS):

Sites 2, 6, and 8 have variable base pairs among the four sequences (columns marked with *). These are segregating sites. Therefore, for these sequences S = 3 segregating sites and pS = 3/10 = 0.3 segregating sites per nucleotide site examined.

Nucleotide diversity (π):

Number of nucleotide differences between sequences i and j (d_ij) for each unique pair:

d12 = 0, d13 = 1, d23 = 1, d14 = 3, d24 = 3, d34 = 2

Number of pairs of sequences compared = [n(n - 1)]/2 = [4(3)]/2 = 6

π = 10 differences/6 pairs = 1.67 average pairwise differences

π = 1.67 avg. differences/10 sites = 0.167 pairwise differences per site

Figure 8.11 A hypothetical sample of four DNA sequences that are each 10 nucleotides long. There are a total of three segregating sites (S = 3) or three-tenths of a segregating site per nucleotide (pS = 0.3). The nucleotide diversity is calculated by summing the nucleotide sites that differ between each unique pair of DNA sequences and dividing by the number of pairs. In this example there are 1.67 average pairwise nucleotide differences or 0.167 average pairwise nucleotide differences per nucleotide site.
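The arithmetic in Fig. 8.11 can be checked with a few lines of Python. This is only a minimal sketch: the function names (`segregating_sites`, `pairwise_differences`, `average_pairwise_differences`) are made up for this illustration, and the code assumes the sequences are already aligned and of equal length.

```python
from itertools import combinations

# The four hypothetical sequences from Fig. 8.11.
sequences = [
    "AATGTCAACG",  # sequence 1
    "AATGTCAACG",  # sequence 2
    "ATTGTCAACG",  # sequence 3
    "ATTGTGATCG",  # sequence 4
]


def segregating_sites(seqs):
    """Return the 1-based positions at which the sample is polymorphic."""
    return [
        position + 1
        for position, column in enumerate(zip(*seqs))
        if len(set(column)) > 1
    ]


def pairwise_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def average_pairwise_differences(seqs):
    """Average number of differences over all n(n - 1)/2 unique pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / len(pairs)


length = len(sequences[0])
sites = segregating_sites(sequences)
S = len(sites)                      # number of segregating sites
p_S = S / length                    # segregating sites per nucleotide site
avg_diff = average_pairwise_differences(sequences)
pi = avg_diff / length              # nucleotide diversity per site

print(sites, S, p_S)                       # [2, 6, 8] 3 0.3
print(round(avg_diff, 2), round(pi, 3))    # 1.67 0.167
```

Site 2 is polymorphic in two of the four sequences, while sites 6 and 8 are polymorphic in only one, yet each adds exactly one to S. The same sites contribute unequally to the pairwise differences, which is why π depends on allele frequencies and S does not.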

lineages, then the expected number of mutations in one generation is kμ. If the expected time to coalescence for k lineages is T_k, then kμT_k mutations are expected for each value of k. The expected number of mutations (E indicates an expectation or average) is obtained by summing over all k between the present and the most recent common ancestor (MRCA):

$$E[S] = \sum_{k=2}^{n} k \mu T_k \qquad (8.24)$$

where n is the total number of lineages in the present. To see an illustration of this equation, refer to Fig. 3.26 where n = 6 in the summation of equation 8.24 and imagine summing up the probability of a mutation in each time interval between coalescent events.
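The summation can be evaluated numerically once a value for the expected coalescence time T_k is supplied. The sketch below assumes the standard diploid coalescent expectation T_k = 4N_e/[k(k - 1)] generations, which is not stated in the text up to this point. Under that assumption the sum collapses to θ(1 + 1/2 + ... + 1/(n - 1)), and dividing an observed S by the harmonic sum gives the estimator usually attributed to Watterson. All function names and the parameter values in the example are hypothetical.

```python
def expected_coalescence_time(k, n_e):
    """Expected generations until k lineages coalesce to k - 1.

    ASSUMPTION: standard diploid coalescent result, 4*N_e / (k*(k - 1)).
    """
    return 4 * n_e / (k * (k - 1))


def expected_segregating_sites(n, n_e, mu):
    """Sum k * mu * T_k over k = 2..n, the expected number of mutations."""
    return sum(
        k * mu * expected_coalescence_time(k, n_e)
        for k in range(2, n + 1)
    )


def harmonic_sum(n):
    """1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    return sum(1 / i for i in range(1, n))


def watterson_theta(s, n):
    """Estimate theta from s observed segregating sites in n sequences."""
    return s / harmonic_sum(n)


# Illustrative values only: n = 4 sequences, N_e = 1000, mu = 1e-4 per sequence.
n, n_e, mu = 4, 1000, 1e-4
theta = 4 * n_e * mu

print(expected_segregating_sites(n, n_e, mu))  # term-by-term sum
print(theta * harmonic_sum(n))                 # closed form under the assumption
print(watterson_theta(3, 4))                   # estimate for the Fig. 8.11 sample
```

The first two printed values agree, because the factor k in kμ cancels against the k in the denominator of the assumed T_k. Since the harmonic sum grows with n, the expected number of segregating sites also grows as more sequences are sampled.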

A fundamental result of the coalescent model is that the probability of k lineages coalescing is
