
is greater than the average frequency of heterozygotes in the subdivided populations (see Table 4.7).

Now let's determine the variance in allele frequency among the two populations before and after fusion. Initially, the variance in allele frequency for the two subdivided populations is

whereas var(g) is zero after fusion because there is no longer any subdivision for allele frequencies. Take note of the fact that the initial variance in the allele frequencies (0.08) is exactly twice the difference between the average frequency of albinos before fusion (equation 4.32) and the expected frequency of albinos in the fused population (equation 4.34)! With fusion of the subdivided populations, each homozygote has decreased by 4% and the heterozygote has increased by exactly the same total amount or 8%.

This example shows that removing the allele frequency differences between the two subpopulations by making them into a panmictic population has changed the total population heterozygosity. The result is exactly what is predicted by the Wahlund effect, with more total population heterozygosity under panmixia than under subdivision. Subdivided populations store some genetic variation as differences (variation) in allele frequency among populations at the expense of heterozygosity in the total population. Another way to think of this is that population subdivision is equivalent to inbreeding that increases the total population homozygosity (or reduces the total population heterozygosity). A fused or panmictic population has a larger effective size than individual subdivided populations with restricted gene flow. In the subpopulations, mating is most probable within the subpopulation rather than with a migrant from the total population. The subpopulations therefore have more autozygosity compared to a panmictic population of equivalent size, analogous to the decline in heterozygosity seen in a single finite population due to genetic drift (see section 3.4).
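The bookkeeping behind the Wahlund effect can be checked with a short Python sketch. The two subpopulation allele frequencies used here (0.2 and 0.6) are hypothetical values, not taken from the text; they are chosen only because they reproduce the figures quoted above (a variance of 0.08, a 4% drop in each homozygote and an 8% rise in heterozygotes). The function name `genotype_freqs` is likewise illustrative.

```python
from statistics import mean, pvariance, variance

# Hypothetical recessive-allele frequencies in two equal-sized subpopulations
# (assumed values, chosen to match the numbers quoted in the text).
q_subpops = [0.2, 0.6]


def genotype_freqs(q):
    """Hardy-Weinberg frequencies (AA, Aa, aa) for recessive-allele frequency q."""
    p = 1 - q
    return (p * p, 2 * p * q, q * q)


# Average genotype frequencies across the subdivided populations.
per_pop = [genotype_freqs(q) for q in q_subpops]
avg_AA, avg_Aa, avg_aa = (mean(col) for col in zip(*per_pop))

# Genotype frequencies after fusion into one panmictic population.
fused_AA, fused_Aa, fused_aa = genotype_freqs(mean(q_subpops))

homozygote_drop = avg_aa - fused_aa        # excess of aa under subdivision
heterozygote_gain = fused_Aa - avg_Aa      # deficit of Aa under subdivision

print(f"drop in each homozygote : {homozygote_drop:.4f}")
print(f"gain in heterozygotes   : {heterozygote_gain:.4f}")
print(f"population variance of q: {pvariance(q_subpops):.4f}")
print(f"sample variance of q    : {variance(q_subpops):.4f}")
```

With these assumed inputs, the drop in each homozygote equals the population variance (divisor n), while the sample variance (divisor n - 1) is twice as large for two populations, which is consistent with the factor of two noted in the text.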

A more realistic application of Wahlund's principle can be found in forensic DNA profiling. As covered in section 2.4, the use of DNA markers to determine the expected frequency of a given genotype occurring by chance relies on estimates of allele frequencies in various racially defined human populations. Although allele frequencies at loci used in DNA profiles have been estimated in many populations, there are a limited number of these reference allele-frequency databases available. It is therefore possible that population-specific allele-frequency estimates are not available for some individuals depending on their racial, ethnic, or geographic background. A further complication is that many individuals have ethnically diverse ancestry that may not be represented by any single set of available reference allele frequencies. None of this would be a problem in DNA profiling if human populations exhibited panmixia, since there would then be uniform allele frequencies among all racially defined human populations. However, racially and geographically defined human populations like those used to construct allele-frequency reference databases show up to 3-5% population divergence of allele frequencies (Rosenberg et al. 2002).

Table 4.8 Expected frequencies for individual DNA-profile loci and the three loci combined with and without adjustment for population structure. Calculations assume that $F_{IS} = 0$ and use the upper-bound estimate of $F_{ST} = 0.05$ in human populations. Allele frequencies are given in Table 2.3.

| Locus | Expected genotype frequency with panmixia | Expected genotype frequency with population structure |
| --- | --- | --- |
| D3S1358 | $2(0.2118)(0.1626) = 0.0689$ | $2(0.2118)(0.1626)(1 - 0.05) = 0.0655$ |
| D21S11 | $2(0.1811)(0.2321) = 0.0841$ | $2(0.1811)(0.2321)(1 - 0.05) = 0.0799$ |
| D18S51 | $(0.0918)^2 = 0.0084$ | $(0.0918)^2 + 0.0918(1 - 0.0918)(0.05) = 0.0126$ |
| All loci | $(0.0689)(0.0841)(0.0084) = 0.000049$ | $(0.0655)(0.0799)(0.0126) = 0.000066$ |

We can use Wahlund's principle to adjust DNA-profile odds ratios for the effects of population structure. This requires a method to adjust the expected genotype frequency at each locus to account for the increased frequency of homozygotes and the decreased frequency of heterozygotes caused by the divergence of allele frequencies among populations. The adjusted expected frequencies for homozygote genotypes are

$$f(A_iA_i) = p_i^2 + p_i(1 - p_i)F_{IT} \tag{4.37}$$

and the adjusted expected frequencies for heterozygote genotypes are

$$f(A_iA_j) = 2p_ip_j - 2p_ip_jF_{IT} = 2p_ip_j(1 - F_{IT}) \tag{4.38}$$

where i and j represent different alleles at the A locus and $F_{IT}$ measures the total departure of genotype frequencies from frequencies expected under panmixia due both to non-random mating within populations and allele-frequency divergence among populations (National Research Council, Commission on DNA Forensic Science 1996). If mating within populations is random ($F_{IS} = 0$) then $F_{IT}$ is equivalent to $F_{ST}$ in these two equations. In that case, applying these corrections increases the frequency of homozygotes and decreases the frequency of heterozygotes in proportion to the degree of allele frequency divergence among populations.

In section 2.4, the expected frequency of a three-locus DNA profile was determined under the assumptions of Hardy-Weinberg and panmixia. Let's return to that example and adjust the expected genotype frequency and odds ratio to compensate for population structure in human populations. The expected genotype frequencies are given in Table 4.8 based on the upper-bound estimate of FST = 0.05 in human populations. The adjustment reduces the expected frequencies of the two heterozygous loci and increases the expected frequency of the homozygous locus. The odds ratio for a chance match of this three-locus genotype was one in 20,408 under the assumption of panmixia and becomes one in 15,152 after adjusting for population structure. Thus, population structure increases the expected frequency of this three-locus genotype by about 35% of its expected frequency under panmixia. A random match for this three-locus genotype is more probable after adjustment for population structure. Compensating for population structure in determining DNA-profile odds ratios is required to obtain an accurate estimate of how often DNA profiles match by chance alone (National Research Council, Commission on DNA Forensic Science 1996). Using equations 4.37 and 4.38 to adjust for population structure is necessary when an appropriate reference allele-frequency database is not available, the ethnicity of the individual is not known, or the genotype comes from a person of mixed ancestry and therefore the choice of the appropriate database is not obvious.

Problem box 4.3 Account for population structure in a DNA-profile match probability

Return to section 2.4 and Problem box 2.1 to determine the expected genotype frequency and probability of a random match after compensation for population structure seen in human populations. Assume that FST = 0.05 for human populations. How does the expected genotype frequency change at individual loci when there is population structure and why? Is the 10-locus genotype still rare enough that the chance of a random match is low?
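As a rough check on the Table 4.8 calculation, the two adjustments (equations 4.37 and 4.38) can be sketched in Python. The function names are illustrative, the allele frequencies are those shown in Table 4.8, and FIT is set equal to FST on the table's stated assumption that FIS = 0.

```python
def homozygote_freq(p, f_it=0.0):
    """Adjusted homozygote frequency, equation 4.37: p^2 + p(1 - p)F_IT."""
    return p ** 2 + p * (1 - p) * f_it


def heterozygote_freq(p_i, p_j, f_it=0.0):
    """Adjusted heterozygote frequency, equation 4.38: 2 p_i p_j (1 - F_IT)."""
    return 2 * p_i * p_j * (1 - f_it)


def profile_freq(f_it):
    """Expected frequency of the three-locus genotype in Table 4.8."""
    d3s1358 = heterozygote_freq(0.2118, 0.1626, f_it)  # heterozygous locus
    d21s11 = heterozygote_freq(0.1811, 0.2321, f_it)   # heterozygous locus
    d18s51 = homozygote_freq(0.0918, f_it)             # homozygous locus
    return d3s1358 * d21s11 * d18s51


F_ST = 0.05  # upper-bound estimate used in Table 4.8 (F_IS assumed to be 0)

panmixia = profile_freq(0.0)
structured = profile_freq(F_ST)

print(f"panmixia:   {panmixia:.6f}  (about one in {1 / panmixia:,.0f})")
print(f"structured: {structured:.6f}  (about one in {1 / structured:,.0f})")
print(f"relative increase: {structured / panmixia - 1:.1%}")
```

Because this sketch carries unrounded intermediate values, the odds it prints differ slightly from the one in 20,408 and one in 15,152 quoted above, which follow from the four-decimal values in Table 4.8. The relative increase of roughly 35% is the same either way.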

The next section explores models of population structure that can be used to infer the causes of a given pattern of population structure.

## 4.5 Models of population structure

• Continent-island, two-island, and infinite island models.

• Stepping-stone and metapopulation population models.

• General expectations and conclusions from the different migration models.

The various models of population structure attempt to approximate various gene-flow properties likely to be found in actual populations. However, these models do not necessarily capture the exact mixture of gene-flow features in actual populations. It is likely, in fact, that gene flow within and among actual subpopulations of real organisms is not as easily categorized nor as invariant as is assumed in these models. Nonetheless, these models of population structure are useful tools to study the general principles that cause population differentiation. The utility of these different models of population structure is their ability to show basic and somewhat general features of the impact of rates of gene flow, the size of subpopulations, and the patterns of genetic connectedness among subpopulations on the evolution of genotype and allele frequencies within and among populations.

### Continent-island model

Perhaps the simplest model of gene flow is called the continent-island model (Fig. 4.12a). It assumes that there is one very large population where allele frequency changes very little over short periods of time and a smaller population that receives migrants from the large continent population each generation. The island population experiences the replacement of a proportion m of its individuals through migration, with 1 - m of the original individuals remaining each generation. (We assume that the proportion m of island individuals replaced by gene flow each generation either die or emigrate to the continent population, which is so large that immigrants do not impact allele frequencies.) The continent-island model assumes no genetic drift, no natural selection, random mating in both populations, that migrants are a random sample of genotypes, and no mutation.

**Continent-island model:** An idealized model of population subdivision and gene flow that assumes one very large population where allele frequency changes only slowly over time (like a continent full of many individuals) connected by gene flow with a small population where migrants make up a finite proportion of the individuals present each generation. Gene flow from the island to the continent occurs but the continent population is assumed to be large enough that immigration has a negligible effect on allele frequencies.

Based on this situation along with its assumptions, it is possible to predict how gene flow changes allele frequency at a diallelic locus in the island population over one generation. Allele frequency in the island population one generation in the future (call it $p_{t=1}^{\text{island}}$) is a function of (i) the allele frequency in the proportion of the island population that are not migrants and (ii) the allele frequency in the proportion of the island population that arrives via gene flow from the continent population. This can be stated in an equation as

$$p_{t=1}^{\text{island}} = p_{t=0}^{\text{island}}(1 - m) + p^{\text{continent}}m \tag{4.39}$$

and used to predict the island population allele frequency after one generation of gene flow from the continent. Expanding the right side of this equation gives

$$p_{t=1}^{\text{island}} = p_{t=0}^{\text{island}} - p_{t=0}^{\text{island}}m + p^{\text{continent}}m \tag{4.40}$$

which can be rearranged to an equation that gives the change in allele frequency in the island population over one generation:

$$p_{t=1}^{\text{island}} - p_{t=0}^{\text{island}} = -m\left(p_{t=0}^{\text{island}} - p^{\text{continent}}\right) \tag{4.41}$$

in a form readily interpreted in biological terms.

Figure 4.12 Classic models of population structure make different assumptions about the paths and rates of gene flow among subpopulations. (a) In the continent-island model, gene flow is essentially unidirectional from a very large population to a smaller population. The continent population is so large that allele frequencies are not impacted by emigration or drift whereas allele frequencies in the small population(s) are strongly influenced by immigration. (b) The island model has equal rates of gene flow exchanged by all populations regardless of the number of populations or their physical locations. (c, d) Stepping-stone models restrict gene flow to populations that are either adjacent or nearby in one (c) or two (d) dimensions and thereby incorporate isolation by distance. Gene-flow models can also incorporate the extinction and re-colonization of subpopulations, a feature commonly added to stepping-stone model populations. Each panel shows the rate of gene flow indicated by the arrows if m percent of each population is composed of migrants and 1 - m is composed of non-migrating individuals each generation.

Equation 4.41 predicts that the degree of difference between allele frequencies in the island and continent populations ($p_{t=0}^{\text{island}} - p^{\text{continent}}$) will determine the direction as well as the rate of change in the island allele frequency as long as the rate of gene flow is not zero ($m \neq 0$). For example, if $p_{t=0}^{\text{island}} > p^{\text{continent}}$ then the island allele frequency should decrease. Likewise, the island allele frequency is expected to increase if $p_{t=0}^{\text{island}} < p^{\text{continent}}$. To use a numerical example, suppose that $p_{t=0}^{\text{island}} = 0.1$ and $p^{\text{continent}} = 0.9$. The difference between the island and continent allele frequencies is -0.8, so according to equation 4.41 the island allele frequency should increase for any amount of gene flow. If m = 0.1, then the island allele frequency will increase by 0.08 to $p_{t=1}^{\text{island}} = 0.18$ in one generation.
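The numerical example can be reproduced with a minimal Python sketch of equations 4.39 and 4.41. The function names `island_next` and `island_change` are illustrative and not from the text.

```python
def island_next(p_island, p_continent, m):
    """Island allele frequency after one generation of gene flow (equation 4.39)."""
    return p_island * (1 - m) + p_continent * m


def island_change(p_island, p_continent, m):
    """One-generation change in island allele frequency (equation 4.41)."""
    return -m * (p_island - p_continent)


p0, pc, m = 0.1, 0.9, 0.1  # values from the numerical example in the text

p1 = island_next(p0, pc, m)
delta = island_change(p0, pc, m)

print(f"change in one generation: {delta:+.2f}")
print(f"island allele frequency at t = 1: {p1:.2f}")
```

Both routes should agree: adding the change from equation 4.41 to the starting frequency gives the same value as applying equation 4.39 directly.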

The expected change in allele frequency due to a single generation of gene flow can also be extended to predict allele frequency in the island population over an arbitrary number of generations. If there is a second generation of gene flow, the allele frequency in the island population is then

$$p_{t=2}^{\text{island}} = p_{t=1}^{\text{island}}(1 - m) + p^{\text{continent}}m \tag{4.42}$$

Substituting the expression for $p_{t=1}^{\text{island}}$ from equation 4.39 into this equation,

$$p_{t=2}^{\text{island}} = \left(p_{t=0}^{\text{island}}(1 - m) + p^{\text{continent}}m\right)(1 - m) + p^{\text{continent}}m$$

and rearranging terms,

$$p_{t=2}^{\text{island}} = p_{t=0}^{\text{island}}(1 - m)^2 + p^{\text{continent}}\left(m(1 - m) + m\right)$$

eventually gives an expectation for the island allele frequency after two generations of gene flow in terms of the initial island allele frequency. Because $m(1 - m) + m = 1 - (1 - m)^2$, this expectation is

$$p_{t=2}^{\text{island}} = p_{t=0}^{\text{island}}(1 - m)^2 + p^{\text{continent}}\left(1 - (1 - m)^2\right)$$

Notice that the exponents are equal to the number of generations that have elapsed. Replacing these exponents with an arbitrary number of generations t gives the allele frequency in the island population after t generations have elapsed, starting from the initial allele frequency:

$$p_{t}^{\text{island}} = p_{t=0}^{\text{island}}(1 - m)^t + p^{\text{continent}}\left(1 - (1 - m)^t\right)$$

which can be rearranged to

$$p_{t}^{\text{island}} = p^{\text{continent}} + \left(p_{t=0}^{\text{island}} - p^{\text{continent}}\right)(1 - m)^t \tag{4.47}$$

The rate of allele frequency change in the island population can also be seen in this equation. The proportion of the island population that still carries its initial allele frequency is $(1 - m)^t$, which approaches zero as time passes. This means that the island population is increasingly composed of immigrants from the continent. Therefore, the allele-frequency difference between the island and continent decreases toward zero over time and the allele frequency of the island approaches the allele frequency of the continent. Figure 4.13 shows how the island allele frequency approaches the continent allele frequency over time for a range of initial island allele frequencies. Notice the smooth approach to the continent allele frequency: this is a consequence of the fact that the outcome is completely determined by a constant rate of gene flow and has no random processes such as genetic drift to introduce chance variation.

These predictions of the continent-island model are consistent with intuition. Given that the continent population has a constant allele frequency over time, the island population should eventually reach an identical allele frequency when the two are mixed. How long it takes for the two populations
