Math box Variance of a binomial variable

The variance in the outcome of binomial sampling over many trials is surprisingly easy to derive. Assuming a diallelic locus, let p be the fraction of successes (such as sampling the A allele) and q be the fraction of failures (such as failing to sample the A allele and getting an a allele instead) so that p + q = 1. In the Appendix, a variance is defined as the average of the squared differences between each observation and the average. If 1 is used to represent a success and 0 a failure, and p and q are used to represent the frequency of each outcome instead of the sums used in the Appendix, the variance in successes is:

$$\sigma^2 = p(1 - \bar{x})^2 + q(0 - \bar{x})^2$$

The average of a binomial variable ($\bar{x}$) is simply the probability of a success, p, just as when flipping a fair coin a large number of times the proportion of heads would approach the expected value of one-half. Substituting p for $\bar{x}$ gives:

$$\sigma^2 = p(1 - p)^2 + q(0 - p)^2$$

which can then be simplified by substituting 1 - p = q into the left-hand term:

$$\sigma^2 = pq^2 + q(0 - p)^2$$

and multiplying out the right-hand term to give

$$\sigma^2 = pq^2 + qp^2 \tag{3.10}$$

This result can then be rearranged by factoring out pq, which is common to both terms:

$$\sigma^2 = pq(q + p) \tag{3.11}$$

which simplifies after noticing that q + p = 1:

$$\sigma^2 = pq \tag{3.12}$$
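The derivation can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `bernoulli_variance` is hypothetical, and it simply evaluates the definition of the variance for a variable coded 1 (success) or 0 (failure) and compares it with pq.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable computed directly from the definition.

    Each outcome's squared deviation from the mean is weighted by the
    frequency of that outcome (p for a success, q for a failure).
    """
    q = 1.0 - p
    mean = p * 1 + q * 0  # the average of a 0/1 variable is p
    return p * (1 - mean) ** 2 + q * (0 - mean) ** 2


# Compare the definition with the shortcut pq for a few allele frequencies.
for p in (0.1, 0.5, 0.9):
    print(p, bernoulli_variance(p), p * (1 - p))
```

The two printed values should agree for any p between 0 and 1, apart from floating-point rounding.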

Markov chains

The next step in understanding genetic drift is to consider its effects in a large number of replicate populations. Instead of focusing on allele frequency in just a single population, as in the last section, let's now explore the case where there is a collection of numerous independent but identical populations (an infinite number of replicate populations is sometimes called an ensemble in physics and mathematics). By following genetic drift in multiple finite populations, this section will build an understanding of how drift works on average among many populations and will develop a prediction of how rapidly genetic drift causes populations to reach fixation and loss.

To get started, consider populations that each consist of a single diploid individual with a diallelic locus. Since there are only two allele copies in each population, there are three possibilities for the number of copies of one of the alleles: zero, one, or two. Each of these possible states can be referred to by the number of A alleles, 0 through 2, and summarized in notation as P(0), P(1), and P(2). With this very basic type of population, we can ask: what are the chances that a population starting out in one of these three states ends up in each of the three states due to sampling error? For example, what is the chance of starting out with two copies of A and ending up with one copy of A with a sample size of one individual (two gametes) between generations? This chance is known as the transition probability for allelic states. The transition probability is determined with the binomial formula:

$$P_{i \to j} = \binom{2N}{j} p^j q^{2N-j} \tag{3.13}$$

where i is the initial number of copies of the allele, j is the number of copies of the allele after sampling, and N is the sample size of diploid individuals. As before, $\binom{2N}{j} = \frac{(2N)!}{j!\,(2N-j)!}$ enumerates the possible draws that yield j copies of the allele and $p^j q^{2N-j}$ is the probability of sampling j copies of the allele given the allele frequencies in the initial population (p = i/2N and q = 1 - p).
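The transition probability can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the original text: the function name `transition_probability` is hypothetical, and the initial allele frequency is taken to be p = i/2N.

```python
from math import comb


def transition_probability(i, j, N):
    """Chance that a population with i copies of A has j copies of A
    after binomial sampling of 2N gametes (N diploid individuals)."""
    p = i / (2 * N)  # frequency of A before sampling (assumed p = i/2N)
    q = 1.0 - p      # frequency of a before sampling
    return comb(2 * N, j) * p**j * q ** (2 * N - j)


# Example from the text: start with two copies of A, end with one copy,
# with a sample size of one diploid individual (two gametes).
print(transition_probability(2, 1, N=1))

# All transition probabilities out of the state with one copy of A.
print([transition_probability(1, j, N=1) for j in range(3)])
```

A population fixed for A (i = 2 when N = 1) has p = 1, so it cannot transition to any other state, and the probabilities out of any one starting state sum to one.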

Equation 3.13 can be used to determine the expected frequencies of populations with a given allelic state in one generation based on the frequencies of populations in each allelic state in the previous generation. To predict the frequency of populations with one allelic state, we need to add up the chances that populations in all states in the previous generation transition to this state. Let's work through what is essentially bookkeeping to see this. The expected frequency of populations with two A alleles in generation one is the sum of the probabilities that populations a generation before with zero, one, and two A alleles become populations with two A alleles through sampling error. This can be stated in an equation as:

$$P_{t=1}(2) = P_{2 \to 2}\,P_{t=0}(2) + P_{1 \to 2}\,P_{t=0}(1) + P_{0 \to 2}\,P_{t=0}(0) \tag{3.14}$$

for the case of populations of a single diploid individual. In equation 3.14, the probability or frequency of a given allelic state is indicated by P(x) with subscripts to indicate the generation. In a population with one A and one a allele, the chance of sampling two A alleles, $P_{1 \to 2}$, is

$$P_{1 \to 2} = \binom{2}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^0 = \tfrac{1}{4}$$
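The bookkeeping described above can be sketched as a short Python example. The function names and the starting ensemble are assumptions made for illustration only: every population is taken to start with one A and one a allele, and each state's frequency in the next generation is the sum, over all starting states, of the transition probability times the starting-state frequency.

```python
from math import comb


def transition_probability(i, j, N):
    """Binomial chance of going from i to j copies of A when sampling 2N gametes."""
    p = i / (2 * N)  # assumed initial frequency of A
    return comb(2 * N, j) * p**j * (1.0 - p) ** (2 * N - j)


def next_generation(state_freqs, N):
    """Expected frequency of populations in each allelic state after one
    generation of sampling; state_freqs[i] is the frequency of populations
    that currently have i copies of A."""
    n_states = 2 * N + 1
    return [
        sum(transition_probability(i, j, N) * state_freqs[i] for i in range(n_states))
        for j in range(n_states)
    ]


# Hypothetical starting ensemble: every population has exactly one copy of A.
start = [0.0, 1.0, 0.0]
after_one = next_generation(start, N=1)
print(after_one)  # [0.25, 0.5, 0.25]
```

The last entry, 0.25, is the expected frequency of populations with two A alleles after one generation, which for this starting ensemble equals the chance of sampling two A alleles from a population with one A and one a allele.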