
to estimate in practice because θ cannot be estimated in the ancestral species as it no longer exists. However, the main point of this model is not to provide a practical test of the molecular clock. Instead, the model shows that R(t) > 1 is not necessarily strong evidence to reject a constant rate of substitution. One cause of R(t) > 1 is that the Poisson process accurately describes the substitution process but that substitution rates are not constant. Alternatively, the Poisson process model of divergence itself may not be accurate even though the rate of DNA change is constant. The latter possibility suggests that the index of dispersion may be a poor way to test the neutral molecular clock hypothesis.

Ancestral polymorphism also presents difficulties for dating divergences using the molecular clock (Maddison 1997; Arbogast et al. 2002). The problem arises because sequence lineage history (genealogy) and species divergence history (species phylogeny) are not identical. Two sequences sampled in the present from two different species have been accumulating substitutions since the most recent common ancestor of the two sequences gave rise to the lineages (Fig. 8.17). The total sequence divergence between two species that would be used to date a speciation event has occurred during two distinct time intervals. One time interval T is the period when the two lineages accumulated changes in the ancestral species. The second time interval t is the period when substitutions accumulated after the current species split. Estimates of time since divergence estimate the total elapsed time since the divergence of the two lineages rather than just the time since divergence of the two species. Thus, the use of the molecular clock to date divergence time yields over-estimates of the species divergence time. As the divergence time t increases relative to the polymorphism time T the degree of over-estimation shrinks. However, it is usually impossible to determine t relative to T in practice and so the degree of over-estimation of the species divergence time is usually unknown.
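The shrinking over-estimate can be illustrated with a few lines of arithmetic. The sketch below assumes the clock dates the full lineage divergence time T + t, so that the over-estimate relative to the species divergence time is T/t. The function name and the value T = 200,000 generations are hypothetical choices made for illustration and do not come from any data set.

```python
def lineage_divergence(t_species, t_ancestral):
    """Return (total lineage divergence time, relative over-estimation).

    t_species   -- time since the two species split (t in the text)
    t_ancestral -- time the two lineages spent in the ancestral species (T)
    """
    total = t_species + t_ancestral          # what the clock actually dates
    relative_error = t_ancestral / t_species  # over-estimate as a fraction of t
    return total, relative_error


# Hypothetical polymorphism time, in generations (an assumed value).
T = 200_000

for t in (100_000, 1_000_000, 10_000_000):
    total, rel = lineage_divergence(t, T)
    print(f"t = {t:>10,}  T + t = {total:>10,}  over-estimate = {rel:.0%}")
```

Holding T fixed while t grows shows the relative over-estimate falling toward zero, which is the pattern described above. In practice T is not known, so this is an illustration of the logic rather than a correction method.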

Relative rate tests of the molecular clock

One method to circumvent some of the limitations inherent in comparing absolute rates of divergence is to compare relative rates instead. The relative rate test compares the number of nucleotide or amino acid changes since divergence from an ancestor represented by a DNA sequence from closely related species (Sarich & Wilson 1967; Fitch 1976). Rates of nucleotide substitution in two different species can be estimated by comparing the number of DNA or amino acid changes that have occurred independently in each of two species using a third outgroup species to assign sequence changes to each lineage. If rates of substitution are equal in the two species, then the number of sequence changes should be equal in the two species within a statistical confidence interval. Unequal numbers of sequence changes lead to rejection of the null hypothesis that the two species have an equal rate of substitution. Relative rate tests avoid the need for a date of divergence that is often imprecise and also do not rely on the dispersion index and its underlying assumption that the molecular clock is a simple Poisson process.

Tajima's (1993a) 1D test of the molecular clock is a relative rate test that uses the number of nucleotide substitutions that occurred along two lineages being compared as well as an outgroup lineage. The basis of the test is shown in Fig. 8.18. In the figure, the letters i, j, and k are used to represent the identity of the nucleotide found at the same nucleotide site in each of the three sequences. The outgroup is used to identify the point in time that nucleotide changes took place since lineages 1 and 2 should share the same base pair as the outgroup due to identity by descent if no substitution has occurred. Only changes that can be assigned unambiguously to a lineage are useful when comparing rates between lineages 1 and 2. Nucleotide substitutions of the pattern iji indicate the change occurred on lineage 2 whereas pattern ijj indicates the change occurred on lineage 1. These two instances allow unambiguous assignment of a substitution to a lineage to estimate the numbers of substitutions. The other three possible nucleotide patterns cannot be used to estimate rates of substitution for one lineage. Nucleotide sites with the pattern iii are not useful because no substitution occurred and there is no information available to estimate the rate of change. For nucleotide sites with the pattern jji, the substitution to j could have occurred in the ancestor to lineages 1 and 2 or both lineages 1 and 2 could have experienced a substitution but it is not clear which event occurred. The pattern ijk for a nucleotide site indicates that no two lineages share a nucleotide, so again it is unclear at what point in the past these substitutions occurred and they cannot be used to estimate the rates of substitution for lineages 1 and 2.

Figure 8.18 Patterns of nucleotide changes that are possible when comparing DNA (or amino acid) sequences from two lineages and an outgroup. The letters i, j, and k are used to represent the identity of the nucleotide found at the same nucleotide site in each of the three sequences. For example, iij indicates that the first two lineages have an identical base pair and the third lineage has a different base pair. Tajima's 1D relative rate test utilizes substitutions that can be unambiguously assigned to one lineage (iji and ijj). If rates of substitution are identical for lineage 1 and 2, E(n_iji) = E(n_ijj). The lineages where the substitution took place are ambiguous for the patterns jji and ijk. The pattern iii indicates identical nucleotides in all three sequences and therefore no substitution events.

Under the molecular clock hypothesis, the number of substitutions that occurred on lineage 1 should be identical to the number of substitutions that occurred on lineage 2. Since the divergence time is identical for lineages 1 and 2, identical substitution rates for the two lineages would give the same number of substitutions observed on each lineage. Therefore, the number of substitutions observed for sequence 1 that occurred on lineage 1 (ijj) should be equal to the number of substitutions observed for sequence 2 that occurred on lineage 2 (iji):

\[ E(n_{ijj}) = E(n_{iji}) \]

where E means expected or average value, n_ijj is the total number of nucleotide substitutions that occurred on lineage 1, and n_iji is the total number of nucleotide substitutions that occurred on lineage 2. This expectation can be tested with the Chi-squared statistic

\[ \chi^2 = \frac{(n_{ijj} - n_{iji})^2}{n_{ijj} + n_{iji}} \]

where there is one degree of freedom. A Chi-squared value greater than 3.84 indicates that it is unlikely (P < 0.05) that the difference in the number of substitutions between the two lineages is due to chance. In other words, a large Chi-squared value is evidence to reject the molecular clock hypothesis that substitution rates are equal for the two lineages and is evidence of rate heterogeneity. The Chi-squared approximation is accurate as long as n_iji and n_ijj are both 6 or greater.
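The counting and testing steps of the 1D test can be sketched in a short program. The code below is a minimal illustration rather than Tajima's own implementation. The function names, the choice to skip sites with gaps or ambiguous bases, and the example sequences are all assumptions made here. The P value uses the fact that the upper tail of a Chi-squared distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def count_informative_sites(seq1, seq2, outgroup):
    """Count the site patterns ijj and iji in three aligned sequences.

    Sites are read in the order (sequence 1, sequence 2, outgroup).
    ijj: sequence 1 differs from the other two -> change on lineage 1.
    iji: sequence 2 differs from the other two -> change on lineage 2.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    n_ijj = 0
    n_iji = 0
    for a, b, c in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if not {a, b, c} <= valid:
            continue  # assumption: ignore gaps and ambiguous bases
        if a != b and b == c:
            n_ijj += 1
        elif a != b and a == c:
            n_iji += 1
        # patterns iii, jji and ijk are not informative and are skipped
    return n_ijj, n_iji


def tajima_1d(n_ijj, n_iji):
    """Return (chi-squared, P value) for the 1D test with one degree of freedom."""
    total = n_ijj + n_iji
    if total == 0:
        raise ValueError("no informative sites")
    chi2 = (n_ijj - n_iji) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 d.f.
    return chi2, p_value


# Toy alignment (hypothetical sequences, far too short for a real test).
counts = count_informative_sites("ACGTA", "ACCTT", "AGCTA")
print("n_ijj, n_iji =", counts)

# Hypothetical counts large enough for the approximation to apply.
chi2, p = tajima_1d(20, 8)
print(f"chi-squared = {chi2:.2f}, P = {p:.3f}")
```

With real data, the counts returned by the first function would be passed to the second only when both are 6 or greater, following the accuracy condition given above.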

Tajima's 1D test for equal divergence rates in two taxa is simple to employ because it does not require an explicit nucleotide substitution model. Hamilton et al. (2003) took advantage of this aspect of the 1D test when they compared rates of divergence using both nucleotide and insertion/deletion (indel) variation between species of Brazil nut trees. (Because a range of molecular mechanisms leads to the formation of indels, there are no generally employed models of sequence change by indels and many relative rate tests cannot be used with indel variation.) Comparing substitution rates among eight species with the 1D test, they found that two tree species consistently failed to support a molecular clock for both nucleotide and indel changes. One species (Lecythis zabucajo) had an accelerated rate of substitution whereas the other species (Eschweilera romeucardosoi) had a slowed rate of substitution.

Relative rate tests provide no information about rates of molecular evolution in the outgroup taxon nor any information about absolute rates of DNA sequence change. The outcome of relative rate tests depends critically on the outgroup used (Bromham et al. 2000). As the time since divergence of the common ancestor of both taxa and the outgroup increases, so does the time over which the evolutionary rates are averaged. If rate heterogeneity is a short-term or recent phenomenon then averaging from a distant outgroup may obscure it. Conversely, if rate heterogeneity is only apparent over long time periods, the rate of substitution may appear homogeneous if a recently diverged outgroup is employed. Finally, since natural selection depends on population-specific fitness values, it is considered unlikely that selection acting simultaneously on both lineages subject to a relative rate test would result in rate homogeneity.

Three-taxon relative rate tests that incorporate nucleotide-substitution models and use a maximum likelihood framework are described in Gu and Li (1992) and Muse and Weir (1992). A variety of relative rate tests that utilize phylogenetic trees are also available that test the molecular clock hypothesis using sequences from many taxa simultaneously (see Nei & Kumar 2000; Page & Holmes 1998).

Patterns and causes of rate heterogeneity

Ohta and Kimura (1971) were the first to carry out a test of the Poisson process molecular clock with rigorous statistical comparisons. They used protein sequences from three loci (β globin, α globin, and cytochrome c) sampled from a range of species.

Lineage effect: Variation in the rate of divergence among multiple species that could be explained by the different lineages having variable neutral mutation rates.

Replication-independent causes of mutation: Causes of mutation that can occur at any time and are therefore independent of the rate of cell division. Examples include environmental mutagens such as ultraviolet radiation, γ particles, and chemicals.

Residual effect: Variation or unevenness in the rate of divergence within a lineage that cannot be explained by rate heterogeneity among lineages or loci.

Based on the observed divergences between pairs of sequences and estimates of the time that has elapsed since those species diverged, they estimated a series of absolute rates of divergence. These absolute rates varied widely (the dispersion index for their data falls between 1.37 and 2.05), leading them to reject the hypothesis of a constant molecular clock (see Gillespie 1991). A few years later, Langley and Fitch (1974) published a larger analysis of absolute substitution rates for the same three loci as well as fibrinopeptide A and used phylogenies to better estimate the number of substitutions for each species. They too found that the dispersion index was greater than one for all loci. These papers attracted a great deal of attention because the variation in rates of sequence change required explanation. Since these early results, a great deal of data on both absolute and relative rates of molecular evolution has shown clearly that rates of molecular evolution are commonly more variable than expected under a Poisson process model. In fact, rate heterogeneity may now be considered the norm and a constant rate of molecular evolution the exception. This section focuses on hypotheses to explain variation in rates of molecular evolution.
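The index of dispersion referred to here is the ratio of the variance to the mean of the number of substitutions, which is expected to equal one under a Poisson process. The sketch below shows the calculation only. It assumes that the counts come from lineages that have diverged for the same amount of time, and the substitution counts are hypothetical values, not the data of Ohta and Kimura or Langley and Fitch.

```python
from statistics import mean, variance


def index_of_dispersion(substitution_counts):
    """Ratio of the sample variance to the mean of substitution counts.

    A value near 1 is expected for a Poisson process; values well above 1
    indicate more variation among lineages than a Poisson clock predicts.
    """
    return variance(substitution_counts) / mean(substitution_counts)


# Hypothetical numbers of substitutions observed on five lineages.
counts = [10, 14, 7, 19, 12]
print(f"R = {index_of_dispersion(counts):.2f}")
```

Note that `statistics.variance` returns the sample variance (dividing by n - 1), which is a design choice in this sketch. With only a handful of lineages the estimate of R is itself quite uncertain.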

Under neutrality, variation in the rate of divergence at different loci can be explained by differences in rates of mutation. Similarly, variable rates of divergence at the same locus in different species can be explained by different mutation rates among species. Such variation in rates of molecular evolution for the same locus in different species is called a lineage effect on the molecular clock (Gillespie 1989). There may be rate heterogeneity evident at a locus even after accounting for variation among lineages, called residual effects (reviewed in Gillespie 1991). Residual effects are the variation in the rate of divergence or unevenness in the tick rate of the molecular clock within lineages over time (see Fig. 8.16). Residual effects are sometimes described as a pattern where substitutions occur in bursts or clusters with periods of no change in between. The cause of residual effects must be a process that is changing over time within a lineage. Under the neutral theory, the mutation rate within a lineage must change over time to explain residual effects. While temporally variable mutation rates are possible, it is considered more plausible that mutation rates themselves are constant over time but that substitution rates are variable over time. For example, if natural selection acts during some periods and not others, the probability that a mutation goes on to substitution changes over time even though mutations arise at a constant rate.

Kimura (1983a) argued that mutation rates in different species are roughly constant per year. This could be true if the processes that caused mutations were constant over time units like years. Examples are replication-independent causes of mutation such as exposure to ultraviolet radiation, γ particles, or chemical mutagens. The free radical ions constantly produced within cells are another example of a replication-independent cause of mutation. It seems likely that exposure to these extrinsic causes of mutation is constant over calendar time and so a portion of mutations due to replication-independent causes have a rate that is also set in calendar time.

Returning to the basis of the neutral theory shows why different species might not experience substitutions at the same rates. As shown at the beginning of the chapter, neutral theory predicts that the substitution rate is equal to the mutation rate. But since the mutation rate is measured in nucleotide changes per generation, the substitution rate is also expressed in per generation terms. This leads to the difficulty that a constant molecular clock might not exist if species differ in their generation times. As an example, imagine two species with identical mutation rates of μ = 1 × 10^-5 errors per base pair per generation. Now imagine the species have generation times of 10 and 100 years. The species with the shorter generation time has

\[ \frac{1 \times 10^{-5}\ \text{mutations generation}^{-1}}{10\ \text{years generation}^{-1}} = 1 \times 10^{-6}\ \text{mutations per year} \qquad (8.51) \]

Generation-time hypothesis: The hypothesis that variation in rates of substitution is due to differences in generation times among species that have constant rates of substitution per generation. This explanation for rate heterogeneity is consistent with neutral molecular evolution.

Replication-dependent causes of mutation: Causes of mutation that occur during replication of DNA, such as replication errors, so that the rate of mutation depends on the rate of cell division.

whereas the species with the longer generation time has

\[ \frac{1 \times 10^{-5}\ \text{mutations generation}^{-1}}{100\ \text{years generation}^{-1}} = 1 \times 10^{-7}\ \text{mutations per year} \qquad (8.52) \]

Thus, the constant molecular clock per generation predicted by neutral theory can produce variable rates of substitution per year when comparing species with different generation times.
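The unit conversion in equations 8.51 and 8.52 can be checked with a few lines of code. This is a sketch of the arithmetic only. The function name is a hypothetical choice, and the mutation rate and generation times are the illustrative values used in the text.

```python
def rate_per_year(rate_per_generation, generation_time_years):
    """Convert a per-generation rate into a per-year rate."""
    return rate_per_generation / generation_time_years


mu = 1e-5  # mutations per base pair per generation (value from the text)

short_generation = rate_per_year(mu, 10)   # generation time of 10 years
long_generation = rate_per_year(mu, 100)   # generation time of 100 years

print(f"10-year generations:  {short_generation:.1e} mutations per year")
print(f"100-year generations: {long_generation:.1e} mutations per year")
print(f"ratio of yearly rates: {short_generation / long_generation:.0f}")
```

The two species share the same clock per generation, yet the species with the shorter generation time accumulates mutations ten times faster per calendar year.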

The observation that neutral mutation rates that are constant per generation may simultaneously be variable per year leads to the generation-time hypothesis, a neutral theory explanation for variation in rates of substitution as caused by differences in generation times of species that have constant rates of substitution per generation. Numerous studies have shown evidence for a generation-time effect in rates of substitution (Li et al. 1987, 1996; Ohta 1993, 1995). Substitution rates observed over many nuclear genes in different groups of mammals are shown in Table 8.4. Rodents have shorter generation times than primates and artiodactyls and also have higher substitution rates, so substitution rates are negatively correlated with generation times. In contrast, comparisons within these groups, such as comparing rates of substitution between mice and rats, show nearly equal substitution rates. The faster rate in rodents compared to primates and artiodactyls is a classic example of the generation-time effect and is consistent with a neutral explanation for heterogeneity in the rate of molecular evolution.

A generation-time effect can be explained by replication-dependent causes of mutation. If mutations occur mostly during the process of cell division when chromosomes are replicated, then more cell replications per generation lead to a higher rate of neutral divergence per generation. In animals, variation in replication-dependent mutation rates per generation may be explained by the fixed number of cell divisions leading to germ-line cells (cells that produce gametes). This explains the observation that mutations occur more frequently in male gametes than in female gametes since more germ-line cell divisions occur in males than in females. The generation-time effect in animals could then be explained if generation times are correlated with the number of germ-line cell divisions (e.g. animals with longer generation times have more germ-line cell divisions). Yet plants with shorter time intervals to first flowering have been shown to have higher rates of substitution (Gaut 1998; Kay et al. 2006). Variation in rates of molecular evolution in plants suggests that germ-line cell divisions are not the only explanation for rate heterogeneity because plants do not have separate germ and somatic cell lines.

The metabolic rate hypothesis proposed by Martin and Palumbi (1993; reviewed by Rand 1994) was based on the observation that sharks have rates of synonymous substitution five to seven times lower than those observed in primates and artiodactyls despite the fact that all taxa examined have relatively similar generation times. Mutation rates may be correlated with metabolic rate of organisms for several reasons. Organisms with high metabolic rates have rapidly operating cellular functions and one of these cellular functions is DNA replication. Therefore, high rates of metabolism cause high rates of DNA replication and high rates of replication-dependent

Table 8.4 Number of substitutions per nucleotide site observed over 49 nuclear genes for different orders of mammals. Divergences are divided into those observed at synonymous and nonsynonymous sites. Primates and artiodactyls (hoofed mammals such as cattle, deer, and pigs with an even number of digits) have longer generation times than do rodents. There were a total of 16,747 synonymous sites and 40,212 nonsynonymous sites. Data from Ohta (1995).


Mammalian group    Synonymous sites    Nonsynonymous sites
Primates           0                   0
