Appendix

Statistical uncertainty

• Quantities in population genetics are estimated with error.

• Parameters and parameter estimates.

• Introduction to variance, standard deviation, and standard error.

Statistical concepts arise in this book in several places. Chapter 1 points out the distinction between idealized parameters, which are exact values, and parameter estimates, which are obtained by sampling from populations and therefore carry uncertainty. Both the variance and the covariance are important concepts that appear in a number of chapters, especially Chapters 9 and 10, which cover quantitative genetics. This Appendix is meant to provide a basic introduction to the statistical concepts relevant to these topics for readers without much prior background.

Imagine drawing a random sample of objects, say a handful of jelly beans from a candy dish, and weighing each one. The weights will not be identical, but there will be some value that occurs most often and some range of values. Plotting these values on a graph such as that in Fig. A.1 would show how often each value occurs between the lowest and highest values, that is, their frequency distribution (often shortened to just "distribution" in conversation). We often use the average (mean and average are used synonymously here) or the mode (the most frequently occurring value) to describe the central tendency or middle of a frequency distribution.
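As a minimal sketch of these ideas, the Python snippet below tallies a frequency distribution and computes the mean and the mode. The ten weights are invented for illustration and are not the data behind Fig. A.1; the variable names are likewise assumptions made for this example.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical jelly bean weights in grams (invented for illustration).
weights = [4.0, 5.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 7.0, 3.0]

# Frequency distribution: how many times each weight occurs in the sample.
distribution = Counter(weights)

# Two measures of central tendency.
sample_mean = mean(weights)  # sum of the weights divided by their number
sample_mode = mode(weights)  # the most frequently occurring weight

# Print a crude text histogram, one row per observed weight.
for value in sorted(distribution):
    print(f"{value:.1f} g: {'*' * distribution[value]}")
print(f"mean = {sample_mean:.2f} g, mode = {sample_mode:.1f} g")
```

In this small sample the mean and the mode happen to coincide at 5 grams, which need not be true for a real sample.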

Let's examine a hypothetical case that will show the distinction between a parameter and a parameter estimate as well as illustrate a common means to quantify uncertainty caused by sampling variance. Imagine we would like to estimate the frequency of the A allele (for a locus with two alleles) in a population of mice. These mice inhabit barns in an area where there are many isolated farms, each with a suitable barn. Therefore, the mice are found in many discrete populations that make up a larger total population. An example of this type of population is diagrammed in Fig. A.2.

We would like to estimate the frequency of the A allele in the entire population, which we will call p. The entire population has an exact allele frequency, the parameter p, which we could only know if we determined the genotype of every mouse in the population. Since it would be very difficult to sample every mouse, we take samples from a number of distinct, independent populations to estimate the allele frequency within each population. The average of allele frequencies in sampled populations will be our estimate of the parameter p. Call the estimate of allele frequency for each barn p_i, where the subscript i is just an index of which barn the value came from. For simplicity, we assume in this illustration that each value of p_i within a barn is known without error.
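The estimate described above can be sketched in a few lines of Python. The barn labels and allele frequencies are hypothetical values chosen for this example, and the name p_hat for the estimate of p is an assumption of the sketch rather than notation taken from the text.

```python
from statistics import mean

# Hypothetical allele frequencies p_i for five sampled barns (invented values).
# Each p_i is treated as known without error, as assumed in the text.
barn_freqs = {
    "barn_1": 0.25,
    "barn_2": 0.50,
    "barn_3": 0.75,
    "barn_4": 0.625,
    "barn_5": 0.375,
}

# The estimate of the parameter p is the unweighted average of the p_i values.
p_hat = mean(list(barn_freqs.values()))

print(f"number of barns sampled = {len(barn_freqs)}")
print(f"estimate of p = {p_hat:.3f}")
```

Sampling a different set of barns would generally give a different estimate, which is the sampling uncertainty this Appendix sets out to quantify.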

A common quantitative tool to measure and express the range of values within a sample is called the variance (symbolized by σ² or a lower-case "sigma" squared). In plain language, the variance is a standardized measure of the range of observed values relative to the average. It is simple to obtain the average allele frequency among all of the sampled

Figure A.1 The frequency distribution of hypothetical weights for 200 jelly beans. The mean is 5.06 and the variance is 1.06. (Horizontal axis: weight in grams.)
