species are nonsynonymous than synonymous. The rate of nonsynonymous substitutions is 41.2% of the substitution rate for synonymous substitutions. Under neutrality, we expect about 41% of polymorphic sites within D. melanogaster to be at nonsynonymous sites. In contrast, the observed data show that only about 4.5% of polymorphic sites are nonsynonymous. Thus, polymorphic sites have too many synonymous changes or too few nonsynonymous changes to be consistent with neutral levels of polymorphism. (Note that if polymorphic sites are used as the frame of reference, then in this case there is an elevated rate of divergence at nonsynonymous sites compared to neutral expectations.)
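The comparison of fixed and polymorphic changes can be framed as a test of independence on a 2x2 table. The following is a minimal sketch that assumes Fisher's exact test is used for that table; the function name `fisher_exact_two_sided` and the counts are illustrative. The counts were chosen only so that the two ratios match the percentages quoted above and are not copied from the source table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Illustrative counts (an assumption, not taken from the source table):
# rows are fixed differences and polymorphisms, columns are nonsynonymous and synonymous.
fixed_nonsyn, fixed_syn = 7, 17
poly_nonsyn, poly_syn = 2, 42

# Nonsynonymous substitutions relative to synonymous substitutions (about 0.412).
divergence_ratio = fixed_nonsyn / fixed_syn
# Proportion of polymorphic sites that are nonsynonymous (about 0.045).
polymorphism_fraction = poly_nonsyn / (poly_nonsyn + poly_syn)

p_value = fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)
print(f"divergence ratio: {divergence_ratio:.3f}")
print(f"nonsynonymous fraction of polymorphisms: {polymorphism_fraction:.3f}")
print(f"two-sided p-value: {p_value:.4f}")
```

A small p-value indicates that the class of change (nonsynonymous or synonymous) is not independent of whether the change is fixed or polymorphic, which is the pattern the text describes as inconsistent with neutrality.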

A common observation in studies of coding DNA sequences is that numbers of nonsynonymous and synonymous DNA changes are not equal. A neutral explanation for this pattern is that these two types of DNA change have different underlying mutation rates. It is expected that nonsynonymous changes will be more frequent than synonymous changes if mutations occur at random nucleotide sites. In fact, 96% of nucleotide changes in the first nucleotide position of a codon, all changes in the second position, and 30% of changes at the third position are nonsynonymous. Overall, if mutation occurs at random within coding sequences, 75.3% of all mutations will be nonsynonymous and 24.7% will be synonymous.
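The per-position percentages can be approximated by enumerating every single-nucleotide change in the standard genetic code. The sketch below makes several assumptions that the text does not state: all codons and all base changes are weighted equally, only the 61 sense codons are used as starting points, and a change to a stop codon is counted as nonsynonymous. Under other conventions the figures shift slightly, so the output should be close to, but not necessarily identical to, the percentages quoted above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nonsynonymous_fraction_by_position():
    """Fraction of single-base changes that alter the encoded amino acid,
    reported separately for codon positions 1, 2 and 3."""
    changed = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # assumption: start only from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] != aa:
                    changed[pos] += 1  # changes to a stop codon count as nonsynonymous
    return [c / t for c, t in zip(changed, total)]


fractions = nonsynonymous_fraction_by_position()
for position, fraction in enumerate(fractions, start=1):
    print(f"codon position {position}: {fraction:.1%} nonsynonymous")
print(f"all positions: {sum(fractions) / 3:.1%} nonsynonymous")
```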

An alternative explanation is that rates of synonymous and nonsynonymous mutation are roughly equal, but that nonsynonymous mutations commonly alter proteins in ways that impair their function. Nonsynonymous mutations that disrupt function also reduce fitness and are therefore acted against by natural selection. This form of natural selection is sometimes called purifying selection because selection acts to purify the pool of mutations by removing low-fitness sequence changes. It is also possible that some nonsynonymous mutations result in enhancement of function and are fixed rapidly by positive selection. A third, non-neutral alternative is that nonsynonymous mutations are maintained at intermediate frequencies in the population by balancing selection. An example of strong balancing selection is the human leukocyte antigen (Hla) B gene, as detected with an MK test by Garrigan and Hedrick (2003) using divergence between humans and chimpanzees (Table 8.6d). There are no fixed DNA differences between humans and chimpanzees for this locus, suggesting low rates of mutation since divergence of these two species. In contrast, human populations show high levels of polymorphism and 1.6 nonsynonymous changes for every synonymous change, inconsistent with the neutral hypothesis that polymorphism and divergence are correlated. Hla genes form the major histocompatibility complex (MHC) region that encodes cell-surface antigen-presenting proteins important in immune system function. Heterozygotes for these loci have higher fitness since they present more diverse cell-surface antigens.

Tajima's D

Tajima's D is a test of the standard coalescent model (neutral alleles in a population of constant size) that is commonly applied to DNA polymorphism data sampled from a single species (Tajima 1989a, 1989b). The test uses the nucleotide diversity and the number of segregating sites observed in a sample of DNA sequences to make two estimates of the scaled mutation rate θ = 4N_eμ. This section will refer to an estimate of θ based on the nucleotide diversity as θ_π and an estimate of θ based on the number of segregating sites as θ_S. Tajima's D test relies on the fact that θ_π and θ_S are expected to be approximately equal under the standard coalescent model where all mutations are selectively neutral and the population remains a constant size through time. The null hypothesis of the test is that the sample of DNA sequences was taken from a population with constant effective population size and selective neutrality of all mutations. Natural selection operating on DNA sequences as well as changes in effective population size through time lead to rejection of this null hypothesis.
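For reference, the two estimators can be written out explicitly. The notation below is introduced here and is not taken from this section: n is the number of sampled sequences, d_ij is the number of nucleotide differences between sequences i and j, S is the number of segregating sites, and a_1 is the usual harmonic-sum constant. Both estimators are stated per sequence rather than per site.

```latex
\hat{\theta}_{\pi} = \frac{1}{\binom{n}{2}} \sum_{i<j} d_{ij},
\qquad
\hat{\theta}_{S} = \frac{S}{a_1},
\qquad
a_1 = \sum_{i=1}^{n-1} \frac{1}{i}
```

Under the null hypothesis described above, both quantities estimate the same θ, so their difference is expected to be close to zero.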

Tajima's D takes advantage of the fact that mutations that occurred further back in time in a genealogy are counted more times when computing the nucleotide diversity (π) from all unique pairs of sequences. In contrast, the position of a mutation on a genealogy does not influence the number of segregating sites (S) since any number of sequences bearing a given nucleotide always represent just one segregating site (Fig. 8.19). The coalescent process with neutral alleles and constant effective population size results in approximately the same total length along interior and exterior branches in a genealogy. In contrast, processes that alter the probability of coalescence also change the ratio of interior and exterior branch length. Both sustained increases in the effective population size and balancing selection lead to decreasing probabilities of coalescence toward the present time and longer external branches (Fig. 8.20). Longer external branches in a genealogy can also be caused by population structure if the DNA sequences compared are sampled from different demes. Shrinking effective population size or population bottlenecks as well as strong directional selection lead to increasing probabilities of coalescence toward the present time and shorter external branches (Fig. 8.20). Substantial changes to the genealogical branching pattern lead to differences in π and S that cause Tajima's D to differ from zero.

Figure 8.19 The scaled mutation rate θ is estimated differently using nucleotide diversity (θ_π) and the number of segregating sites (θ_S) depending on the location of mutations in a genealogy. Each mutation makes a single segregating site under the assumptions of the infinite sites model no matter where it occurs. However, mutations on internal branches will appear in multiple pairwise comparisons and cause π to be larger (a). In contrast, mutations that occur on external branches (b) that cause a nucleotide change in only a single lineage contribute less to π. Each mutation is counted four times (d13, d23, d14, and d24) in (a) but three times (d12, d23, and d24) in (b) when computing π.
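The effect of mutation position can be checked with a small calculation. The sketch below computes the mean number of pairwise differences, the number of segregating sites, and Tajima's D for a list of aligned sequences. The variance constants follow the standard form usually attributed to Tajima (1989); they are not given in this section, so treat them as an assumption of the sketch. The function names and the two four-sequence toy samples are illustrative. The toy samples mimic one mutation on an internal branch (shared by two lineages) and one on an external branch (carried by a single lineage).

```python
from itertools import combinations
from math import sqrt


def mean_pairwise_differences(seqs):
    """Average number of differences over all unique pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D for aligned sequences; None when it is undefined."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        return None  # the variance term is zero for n < 4 or when S = 0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_pi = mean_pairwise_differences(seqs)
    theta_s = s / a1
    return (theta_pi - theta_s) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy samples (hypothetical): the first site varies, the second is invariant.
internal = ["AC", "AC", "TC", "TC"]  # mutation shared by lineages 3 and 4
external = ["AC", "TC", "AC", "AC"]  # mutation carried only by lineage 2

for name, sample in (("internal", internal), ("external", external)):
    print(name,
          "S =", segregating_sites(sample),
          "mean pairwise differences =", round(mean_pairwise_differences(sample), 3),
          "D =", round(tajimas_d(sample), 3))
```

Both samples have a single segregating site, but the internal-branch mutation appears in four of the six pairwise comparisons and the external-branch mutation in three, so the two samples give D values of opposite sign. With only four sequences the values are not meaningful as a test; the example only illustrates the direction of the effect.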
