## Info


Candidate parent An individual in the pool of possible parents that shares one or both alleles found in an offspring genotype at all loci.

Cryptic gene flow Gene-flow events incorrectly assigned to candidate parents but actually due to unobserved parents outside the area where candidate parents were sampled, leading to an underestimate of gene-flow distances.

Exclusion Rejection of an individual as a possible parent due to genetic mismatch (neither allele in the individual's genotype is identical to one of the alleles in the progeny genotype).

Exclusion probability The chance that an individual can be rejected as a candidate parent due to genetic mismatch; depends on allele frequencies and increases with the number of loci and the numbers of alleles per locus employed in a parentage analysis.

We can express the probability that an individual taken at random from a population would be ruled out as a parent due to genetic mismatch. Equation 4.2 gives the probability of a random match at a single locus, or the probability that a genotype has a matching allele by chance alone. If a genotype does not match by chance then it is excluded from possibly being the parent. This means that the exclusion probability for a single individual sampled at random from a population is just one minus the probability of a random match:

$$P(\text{exclusion}) = 1 - P(\text{random match}) \tag{4.4}$$

If more than one candidate parent is sampled from a population, the exclusion of each individual is independent of the exclusion of the others (the genotype of each individual represents a random sample of the alleles present in the population). Therefore, the total probability of ruling out, or excluding, all candidate parents is the product of the exclusion probabilities for each individual. For a sample of n individuals from a population, the total probability of exclusion is then:

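$$P(\text{exclusion for } n \text{ individuals}) = P(\text{exclusion})^{n} = \left[1 - P(\text{random match})\right]^{n} \tag{4.5}$$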

This means that the chance of excluding all candidate parents decreases as more individuals are sampled from a population. Equivalently, the chance of sampling an individual that matches a parental haplotype just by chance increases as more candidate parents are sampled.

Based on the exclusion probability in a population of n candidate parents, we can estimate the chance that a random match does occur. Since the exclusion probability is the chance of not matching at random, the probability that at least one of the n candidate parents matches the offspring's haplotype is just one minus the probability of exclusion for n individuals, or

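$$P(\text{random match in } n \text{ individuals}) = 1 - P(\text{exclusion for } n \text{ individuals}) = 1 - \left[1 - P(\text{random match})\right]^{n} \tag{4.6}$$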

This is the probability that a haplotype matching the true parent will occur at random in a sample of n candidate parents.

The probability of a random match in a sample of n candidate parents (equation 4.6) can be thought of as the chance that a candidate parent is mistakenly assigned as the true parent since its genotype provides the matching haplotype by chance, while the true parent is not identified since it is not included in the sample of candidate parents. This phenomenon is referred to as cryptic gene flow in paternity analysis since the true gene flow event is not identified, even though a parent has been mistakenly inferred for the progeny. If the true parent was not included in the sample of candidate parents because it was outside the sampling area, incorrectly inferred parentage results in an underestimate of gene flow distances. Equation 4.6 shows that the probability of incorrectly assigning parentage due to random matches increases as the number of candidate parents increases for a given expected genotype frequency.
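To make these relationships concrete, here is a minimal Python sketch. Like the text, it assumes candidate parents are sampled independently and takes the single-individual random-match probability (the expected frequency of the required paternal haplotype) as an input. The function names and the example value of 0.01 are illustrative assumptions, not values from any study.

```python
def p_exclusion_single(p_match: float) -> float:
    """Chance that one random candidate parent is excluded, given the
    chance p_match that it carries the required haplotype by chance."""
    return 1.0 - p_match


def p_exclusion_n(p_match: float, n: int) -> float:
    """Chance that all n independent candidate parents are excluded."""
    return p_exclusion_single(p_match) ** n


def p_random_match_n(p_match: float, n: int) -> float:
    """Chance that at least one of n candidate parents matches by chance."""
    return 1.0 - p_exclusion_n(p_match, n)


if __name__ == "__main__":
    p_match = 0.01  # hypothetical expected haplotype frequency
    for n in (1, 10, 50, 100, 500):
        print(f"n={n:4d}  P(exclusion)={p_exclusion_n(p_match, n):.4f}  "
              f"P(random match)={p_random_match_n(p_match, n):.4f}")
```

For a fixed haplotype frequency, the printed table shows the exclusion probability shrinking and the random-match probability growing with n, which is the growing risk of cryptic gene flow described above.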

Returning to the C. alta example in Table 4.2, we can determine both the chance of paternity exclusion among the 30 candidate parents in the study and the chance that one of the candidate parents is incorrectly inferred to be the father while the true father remains undetected. For seed 3-1 the maternal and paternal parents are the same (Table 4.4), indicating a self-fertilization event. Based on the expected frequency of the paternal haplotype (0.044), the chance of paternity exclusion is $(1 - 0.044)^{30} = 0.259$ and the probability of a random match is therefore $1 - 0.259 = 0.741$. Since the four-locus inferred paternal haplotype is expected to occur by chance in about 74% of samples of 30 candidate parents, there is a good chance that the seed could appear self-fertilized even though it was actually sired by an individual not in the sample of candidate parents. For seed 989 1-1, where tree 1946 was the only candidate parent that could not be excluded, the chance of paternity exclusion is $(1 - 0.0000665)^{30} = 0.9980$ and the probability of a random match is therefore 0.0020. Given the estimated allele frequencies, the five-locus inferred paternal haplotype for seed 989 1-1 is expected to appear by chance in only about two of 1000 samples of 30 candidate parents.
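The arithmetic for the two C. alta seeds can be checked with a few lines of Python. The expected haplotype frequencies (0.044 and 0.0000665) and the 30 candidate parents come from the text; the helper name `exclusion_and_match` is a hypothetical choice for this sketch.

```python
def exclusion_and_match(p_match, n):
    """Return (P(exclusion of all n), P(at least one random match))
    for a haplotype with expected frequency p_match."""
    p_excl = (1.0 - p_match) ** n
    return p_excl, 1.0 - p_excl


N_CANDIDATES = 30  # candidate parents in the C. alta study
seeds = {
    "3-1 (apparent selfing)": 0.044,
    "989 1-1 (tree 1946)": 0.0000665,
}

for seed, p in seeds.items():
    p_excl, p_match_n = exclusion_and_match(p, N_CANDIDATES)
    print(f"{seed}: P(exclusion) = {p_excl:.4f}, P(random match) = {p_match_n:.4f}")
```

Both results round to the values quoted above.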

There are four general outcomes for each offspring-known-parent pair in parentage analysis, as follows.

1 A single candidate parent is identified as the parent. Such single parentage assignments need to be interpreted in light of the exclusion probability or likelihood of parentage.

Problem box 4.1 Calculate the probability of a random haplotype match and the exclusion probability

The seed 25-1 from maternal tree 989 shows exact haplotype matches with candidate paternal tree 4865 (see Table 4.2). Using the allele frequencies provided in Table 4.3, calculate the probability of a random match for the paternal haplotype. Then use this probability of a random match to calculate the exclusion probability for the sample of 30 candidate parents. What loci are most and least useful in determining paternity for these two seed progeny? Why?

Interact box 4.1 Average exclusion probability for a locus

In planning a parentage analysis study, it is necessary to determine whether a set of genetic markers will have a sufficiently high probability of exclusion (this is called the power of the genetic markers). As shown in equation 4.4, the exclusion probability will depend on the expected genotype frequency for a single parental haplotype. This expected genotype frequency is in turn a function of the number of alleles and the allele frequencies at each locus. Since there are many possible genotypes for a locus with three or more alleles, the average probability of exclusion is used to estimate the power of a set of genetic markers to demonstrate nonpaternity (see Chakraborty et al. 1988; Weir 1996).

You can use an Excel spreadsheet that has been set up on the textbook website to calculate the average probability of exclusion (abbreviated as PE in the spreadsheet) for a case of one locus with six alleles and one locus with 12 alleles. The spreadsheet uses the allele frequencies (which you can modify) to calculate (i) the expected frequencies of each maternal parent-offspring genotype combination and (ii) the exclusion probabilities for the paternal haplotype(s) for each maternal parent-offspring genotype combination. The average exclusion probability is then the average of these exclusion probabilities, each weighted by the expected frequency of its maternal parent-offspring genotype combination. The spreadsheet follows the derivation for a locus with three alleles given in Table 1 of Chakraborty et al. (1988). The maximum average exclusion probability occurs when all alleles at a locus have identical allele frequencies (e.g. each allele has a frequency of 1/6 when there are six alleles). The maximum average exclusion probability is computed in each spreadsheet according to:

$$PE_{\max} = 1 - \frac{2k^{3} + k^{2} - 5k + 3}{k^{4}}$$

where k is the number of alleles at the locus (Selvin 1980).

Compare the average probability of exclusion for cases where the frequencies of each allele are very similar to cases where one or a few alleles are very common and the remaining alleles are rare. How does the evenness of the frequencies for the alleles influence the average exclusion probability? How do you combine the average exclusion probabilities for multiple loci? What is the average exclusion probability of two loci with 12 alleles or two loci with six alleles when allele frequencies are all equal for each locus? How many independent loci with 12 equally frequent alleles would be required for a probability of exclusion of 90% when there are 50 candidate parents?

2 Multiple candidate parents are identified for a single progeny. In these cases one commonly used criterion is to assign as parent the candidate parent with the lowest probability of matching by chance. Additional criteria might also include spatial separation from the known parent, degree of reproductive overlap with the known parent, or reproductive dominance, if such information is available.

3 None of the candidate parents have a genotype that could have combined with the known parent to yield the progeny genotype. In this case the actual parent may not be present in the sample of candidate parents. Such an outcome is often used to infer that the progeny resulted from a relatively long-distance gene-flow event involving a parent outside the sample area of candidate parents (so-called off-plot gene flow). However, it is also possible that the actual parent is in the population of candidate parents but has a genetic mismatch at one or more loci due to a genotyping error or mutation. An additional alternative is that the actual parent was inside the sampling area of candidate parents when mating took place, but the individual either died or migrated before sampling of the candidate parents was carried out.

4 Parentage is assigned to a candidate parent, but the true parent is an individual not included in the sample of possible parents. When making paternity assignments, the chance of incorrectly assigning paternity to a sampled individual when the father is actually outside the sampled population, thereby missing a "cryptic gene flow" event, depends on the expected frequency of the required multilocus haplotype and on the number of candidate parents sampled.
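The average exclusion probability described in Interact box 4.1 can also be sketched in Python. This is not the textbook spreadsheet but an independent sketch that assumes one known parent (the mother), Hardy-Weinberg genotype frequencies, and independent loci. For each maternal genotype it enumerates the transmitted maternal allele and a paternal allele drawn from the allele pool, finds which paternal alleles are compatible with the resulting mother-offspring pair, and weights the chance that a random candidate carries none of them by the expected frequency of that pair. `max_average_exclusion` is the equal-frequency special case, $1 - (2k^3 + k^2 - 5k + 3)/k^4$, obtained by evaluating the same enumeration with every frequency set to $1/k$. Loci are combined with the commonly used rule $1 - \prod_l (1 - PE_l)$, and, following the product rule for independent candidates, the chance of excluding all n non-parents is that combined value raised to the power n. All function names and the skewed allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def average_exclusion_one_parent_known(freqs):
    """Average paternity exclusion probability at one locus when the mother
    is known, assuming Hardy-Weinberg genotype frequencies."""
    k = len(freqs)
    total = 0.0
    for i, j in combinations_with_replacement(range(k), 2):
        # Expected frequency of maternal genotype i/j
        p_mother = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        maternal_alleles = {i, j}
        for m in (i, j):        # maternal allele, transmitted with probability 1/2
            for a in range(k):  # paternal allele, drawn with probability freqs[a]
                weight = p_mother * 0.5 * freqs[a]
                # Paternal alleles compatible with this mother and offspring m/a:
                # a always is; m also is when a could have come from the mother.
                compatible = {a, m} if a in maternal_alleles else {a}
                p_compatible = sum(freqs[x] for x in compatible)
                # A random candidate is excluded if it carries no compatible allele
                total += weight * (1.0 - p_compatible) ** 2
    return total


def max_average_exclusion(k):
    """Equal-frequency value of the function above for k alleles."""
    return 1 - (2 * k**3 + k**2 - 5 * k + 3) / k**4


def combine_loci(pe_values):
    """Combined exclusion probability over independent loci."""
    return 1 - prod(1 - pe for pe in pe_values)


def loci_needed(pe_locus, n_candidates, target):
    """Smallest number of identical loci (pe_locus > 0) for which all
    n_candidates non-parents are excluded with probability >= target."""
    loci = 1
    while combine_loci([pe_locus] * loci) ** n_candidates < target:
        loci += 1
    return loci


if __name__ == "__main__":
    even = [1 / 6] * 6
    skewed = [0.5, 0.3, 0.05, 0.05, 0.05, 0.05]  # hypothetical uneven frequencies
    print("6 even alleles:  ", round(average_exclusion_one_parent_known(even), 4))
    print("6 skewed alleles:", round(average_exclusion_one_parent_known(skewed), 4))
    pe12 = max_average_exclusion(12)
    print("two 12-allele loci:", round(combine_loci([pe12, pe12]), 4))
    print("loci for 90% with 50 candidates:", loci_needed(pe12, 50, 0.90))
```

Comparing the even and skewed six-allele cases illustrates the question posed in the box: with the same number of alleles, uneven allele frequencies give a lower average exclusion probability than equal frequencies.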

Parentage analyses measure gene flow by inferring numerous mating events within the population of candidate parents that lead to each sampled progeny or juvenile in a population. This provides estimates of quantities such as the average distance between parents or the number of matings where both parents were within a sample area compared to the number of matings where a parent was outside that area. This means that resulting estimates of gene flow do not rely on any model of population structure or gene flow other than the assumptions that are used to construct the parentage assignments themselves. The resulting estimates of gene flow are therefore considered "direct." The clear strength of parentage analyses is that much can be learned about patterns of mating since parental pairings that lead to a specific offspring can often be identified with medium to high confidence.
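As an illustration of how such summaries are tallied once parentage has been assigned, the Python sketch below computes a mean distance between parents and the number of progeny with no compatible sampled parent from a hypothetical list of assignments. The coordinates, the record layout, and counting every unassigned progeny as a mating with a parent outside the sample area are assumptions for the example.

```python
from math import dist
from statistics import mean

# Hypothetical records: (known parent x/y, assigned parent x/y), in metres.
# None means no sampled candidate parent was compatible with the progeny.
assignments = [
    ((0.0, 0.0), (30.0, 40.0)),
    ((10.0, 5.0), (10.0, 5.0)),   # apparent self-fertilization, distance 0
    ((50.0, 20.0), (80.0, 60.0)),
    ((5.0, 90.0), None),
]

# Matings where both parents were inside the sampled area
within = [(known, other) for known, other in assignments if other is not None]
n_outside = len(assignments) - len(within)

distances = [dist(known, other) for known, other in within]
print(f"Mean distance between parents: {mean(distances):.1f} m")
print(f"Matings with a parent outside the area: {n_outside} of {len(assignments)}")
```

Treating a missing assignment as off-plot gene flow, and an assigned parent as the true one, carries the caveats of outcomes 3 and 4 above: the first can also reflect genotyping error, mutation, or a parent that died or moved before sampling, and the second can hide cryptic gene flow.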

Parentage analyses have been a critically important tool used to learn about mating and relatedness patterns in wild populations. An example is the numerous studies of parentage among bird nestlings that overturned the long-held idea that birds were usually monogamous breeders. Instead, birds have variable and complex mating patterns where mating outside of nesting pairs by both females and males can be common and juveniles in the nest may not be related to one or both of the nest-attendant "parents" (Westneat & Stewart 2003). Parentage analyses have also been used in a wide variety of plant and animal species to produce detailed descriptions of mating and gene-flow patterns.

Although the term direct has connotations of precision and ready insight, it is important to recognize that parentage analyses do have limitations when used to infer patterns of gene flow. A major limitation stems from the fact that most parentage studies cover a time scale of only a few generations at most. In all organisms with population sizes that are stable through time, each parent produces just one offspring on average that survives to reproduce successfully. The other progeny die or do not reproduce. This means that many, perhaps even most, of the progeny included in parentage studies ultimately do not reproduce. This problem is particularly acute in long-lived organisms, where parentage studies examine only a very small fraction of progeny produced over a period much less than the average individual lifetime. Gene flow can be thought of as the long-term average of the matings that lead to individuals that survive and contribute progeny to the next generation. How effective parentage studies are at estimating longer-term patterns of gene flow then depends on the sampling duration of parentage studies relative to generation time and how variable parentage patterns are over the short term compared to their long-term averages.

4.3 Fixation indices to measure the pattern of population subdivision

• Extending the fixation index to measure the pattern of population structure through FIS, FST, and FIT
