Gene genealogies and the coalescent model

• Modeling the branching of lineages to predict the time to the most recent common ancestor.

At this point in the chapter, we need an interlude in the discussion of genetic drift and effective population size to develop a new approach based around lineage branching or gene genealogy. Initially, it is necessary to introduce some basic terminology and concepts used in this approach. Although it may not be evident at first, the lineage-branching approach to population genetics has a great deal in common with the material in the first two sections of this chapter. The immediate goal of this section is to establish and motivate the building blocks necessary to model lineage branching events. The next section of this chapter will then show how the concept of effective population size applies in genealogical branching models. A major advantage of coalescent models is that the action of population genetic processes (genetic drift, gene flow, and natural selection) on the branching pattern of lineages is independent of the allelic states of the lineages. Details about the lineages themselves are developed in Chapter 5, but bear in mind that each lineage represents an independent copy of an allele or DNA sequence. Once both the branching processes and the mutation processes that change allelic states are brought together, the coalescent approach serves to make testable predictions for the evolution of DNA sequences under a combination of population genetic processes.

Tracing the pattern of ancestry for allele copies in a pedigree provides a means to understand the present patterns in those allele copies (see section 2.6). For example, the pedigree in Fig. 2.14 shows the equivalence of homozygosity in the present and the probability that two allele copies descended from a single ancestor in the past. Given the known individuals at each generation in that pedigree, we traced ancestor-descendant relationships forward in time to predict autozygosity in the most recent generation. Thus, that pedigree is an example of using a prospective or time-forward model, using knowledge of ancestors back in time and basic probability to work forward in time to predict the autozygosity at the most recent point in time.

Another type of analysis of ancestor-descendant relationships is possible based on a retrospective or time-backward model. Imagine that we have a sample of individuals taken in the present time, analogous to individual G in Fig. 2.14, but we have no knowledge of their parents or grandparents or any of their genealogical relationships. Would it be possible to learn something about the past population genetic events that led up to that sample of individuals? The answer is yes, if we have models of ancestor-descendant relationships (genealogy) that allow us to predict identity by descent in the past based only on knowledge of the present. With such models, we look at patterns among the individuals available to us in the present and try to reconstruct versions of events such as inbreeding, gene flow, or natural selection in the past that could have led to the individuals in the present. These models are referred to collectively as coalescent theory, since their perspective is to predict the probability of possible patterns of genealogical branching working back in time from the present to the point of a single common ancestor in the past. When two lineages trace back in time to a single ancestral lineage, this is called a coalescent event, hence the term coalescent theory.

A central concept in coalescent theory is connecting a group of lineages in the present back through time to a single ancestor in the past. This single ancestor is the first ancestor (going backward in time) of all the lineages in a sample of lineages in the present time and is referred to as the most recent common ancestor or MRCA. Section 3.2 develops a time-forward model of genetic drift that predicts that a sample of alleles (or lineages) will eventually arrive at fixation or loss. Fixation is reached by random sampling that expands the numbers of a given lineage or allele in the population. The lineage that reaches fixation can be traced back to a single ancestor at some point in the past. In the process of reaching fixation, a population loses all lineages except one, the one that was fixed by genetic drift. This same genetic drift process can be viewed from a time-backward perspective. A sample of lineages in the present must eventually be the product of a single ancestral lineage at some point back in the past that happened to become more frequent under random sampling. The coalescent model turns the random sampling process around, asking: what is the probability that two lineages in the present can be traced back to a single lineage in the previous generation? Answering this question relies on the same probability tools that were used earlier in the chapter to describe the process of genetic drift.

Before moving on with a more formal description, let's consider a metaphor for the coalescence process to set the stage. Imagine a sealed box full of bugs. Each bug moves around the box at random. Whenever two bugs meet by chance, one of them (picked at random) completely eats the other one in an instant. When a bug is eaten the population of bugs decreases by one and the remaining bugs continue to move about the box at random. The time that elapses between bug meetings tends to get longer as the number of bugs in the box gets smaller. This is because chance meetings between bugs depend on the density of bugs in the box. Eventually, the entire box that was full of bugs initially will wind up holding only a single bug after some time has passed. Each bug is analogous to a lineage and one bug eating another is analogous to a coalescent event. The very last bug is analogous to the lineage that is the most recent common ancestor.
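The bug metaphor can be mimicked with a short simulation. The Python sketch below is only an illustration under stated assumptions: it assumes a haploid Wright-Fisher population of 2N gene copies in which every lineage picks its ancestor at random, and the function name `time_at_each_count`, the sample size of 5, and the population size of 100 are hypothetical choices, not values from the text. It records how many generations pass while a given number of lineages remains.

```python
import random

def time_at_each_count(n_sample, two_n, rng):
    """Trace n_sample lineages back in time until one ancestor remains.

    Returns a dict mapping a lineage count to the number of generations
    spent with that many lineages (assumed Wright-Fisher sampling).
    """
    lineages = n_sample
    time_spent = {}
    while lineages > 1:
        # One more generation back in time with this many lineages.
        time_spent[lineages] = time_spent.get(lineages, 0) + 1
        # Each lineage picks one of the two_n possible ancestors at random;
        # lineages that pick the same ancestor coalesce.
        parents = {rng.randrange(two_n) for _ in range(lineages)}
        lineages = len(parents)
    return time_spent

rng = random.Random(1)   # fixed seed so the run is repeatable
reps = 2000              # number of independent genealogies (arbitrary)
totals = {}
for _ in range(reps):
    for count, generations in time_at_each_count(5, 100, rng).items():
        totals[count] = totals.get(count, 0) + generations

# Average generations spent at each lineage count, from 5 down to 2.
for count in sorted(totals, reverse=True):
    print(count, "lineages:", round(totals[count] / reps, 1), "generations")
```

In a run of this sketch, the average time spent with two lineages should be much longer than the time spent with five, in the same way that meetings become rarer as the box holds fewer bugs.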

Coalescence or coalescent event: The point in time where a pair of lineages or genealogies trace back in time to a single common ancestral lineage (to coalesce literally means to grow together or to fuse).

Genealogy: The record of ancestor-descendant relationships for a family or locus.

Lineage: A line of descent or ancestry for a homologous DNA sequence or a locus (regardless of whether or not copies of the locus are identical or different).

Most recent common ancestor (MRCA): The first common ancestor of all lineages (or gene copies) at some time in the past for a sample of lineages taken in the present.

Gene copy or allele copy: A replicated DNA sequence that has passed from an ancestor to a descendant; used synonymously with the term lineage.

Waiting time: The mean or expected time back in the past until a single coalescence event in a sample of lineages.

A schematic representation of the ancestor-descendant process for two generations can be seen in Fig. 3.23a for a set of haploid lineages. Using rules of random sampling based around the Wright-Fisher model (and its assumptions) we can develop a prediction for the number of generations back in time until two lineages "find" their MRCA or coalesce to a single lineage. Consider a random sample of two of the 2N total lineages in the present generation. Given that one of these two sampled lineages finds its ancestor in the previous generation, what is the probability that the other lineage also shares that same common ancestor such that a coalescent event occurs? Given that one of the lineages has a given common ancestor, for coalescence to occur the other lineage must have the same ancestor among the 2N possible ancestors in the previous generation. Thus, the probability that the two lineages coalesce in the previous generation is 1/(2N).
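The sampling argument for two lineages can be checked numerically. The Python sketch below assumes Wright-Fisher sampling, in which each of the two lineages picks its ancestor independently and with equal probability from the 2N gene copies of the previous generation. Under that assumption the fraction of trials in which both lineages pick the same ancestor should approach 1/(2N). The function name `estimate_coalescence_probability`, the value 2N = 20, and the number of trials are illustrative choices, not values from the text.

```python
import random

def estimate_coalescence_probability(two_n, trials, rng):
    """Estimate the chance that two lineages share an ancestor one generation back."""
    hits = 0
    for _ in range(trials):
        # Each lineage draws its ancestor from the two_n possible ancestors.
        first_parent = rng.randrange(two_n)
        second_parent = rng.randrange(two_n)
        if first_parent == second_parent:
            hits += 1  # same ancestor: a coalescent event
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
two_n = 20               # hypothetical number of gene copies (2N)
estimate = estimate_coalescence_probability(two_n, 200_000, rng)
print(f"simulated: {estimate:.4f}   1/(2N): {1 / two_n:.4f}")
```

The two printed values should agree closely, and repeating the run with a larger `two_n` shows that coalescence in any one generation becomes less likely as the population grows.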
