## Parameters and parameter estimates

While developing the expectations of population genetics in this book, we will most often be working with idealized quantities. For example, allele frequency in a population is a fundamental quantity. For a genetic locus with two alleles, A and a, it is common to say that p equals the frequency of the A allele and q equals the frequency of the a allele. In mathematics, parameter is another term for an idealized quantity like an allele frequency. It is assumed that parameters have an exact value. Put another way, parameters are idealized quantities where the messy, real-life details of how to measure the quantities they represent are completely ignored.

Empirical population genetics produces parameter estimates, such as estimated allele frequencies, by sampling and then measuring the alleles and genotypes present in actual populations. All experiments, observations, and even simulations in population genetics produce parameter estimates of some sort. There is a subtle notational convention used to indicate an estimate: the hat or ^ character above a variable. Estimates wear hats whereas parameters do not. Using allele frequency as an example, we would say p̂ (pronounced "p hat") equals the number of A alleles sampled divided by the total number of alleles sampled. Intuitively, we can see from the denominator in the expression for p̂ that the allele frequency estimate will depend on the sample we gather to make the estimate.
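As a minimal sketch of the counting described above, the following Python snippet computes an allele frequency estimate from a sample of diploid genotypes. The function name `estimate_allele_frequency`, the two-letter genotype strings, and the ten-individual sample are all assumptions made up for illustration; they do not come from any real data set.

```python
from collections import Counter


def estimate_allele_frequency(genotypes, allele="A"):
    """Estimate the frequency of `allele` from a sample of diploid genotypes.

    Each genotype is a two-character string such as "AA", "Aa", or "aa".
    """
    allele_counts = Counter()
    for genotype in genotypes:
        # Each character of the genotype string is one sampled allele.
        allele_counts.update(genotype)
    total_alleles = sum(allele_counts.values())
    if total_alleles == 0:
        raise ValueError("sample contains no alleles")
    # Number of copies of the allele divided by the total alleles sampled.
    return allele_counts[allele] / total_alleles


# Hypothetical sample of 10 diploid individuals (20 alleles in total).
sample = ["AA", "Aa", "Aa", "aa", "AA", "Aa", "aa", "AA", "Aa", "AA"]

p_hat = estimate_allele_frequency(sample, "A")
q_hat = estimate_allele_frequency(sample, "a")
print(p_hat, q_hat)
```

The denominator is the number of alleles, not the number of individuals, so a sample of 10 diploid individuals contributes 20 alleles to the estimate.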

In all populations a parameter has one true value. For the allele frequency p, knowing this true value would require examining the genotype of every individual and counting all A and a alleles to determine their frequency in the population. This task is impractical or impossible in most cases. Instead, we rely on an estimate of allele frequency, p̂, obtained from a sample of individuals from the population. Sampling leads to some uncertainty in parameter estimates because repeating the sampling and estimation process would likely lead to a somewhat different parameter estimate each time. Quantifying this uncertainty is important to determine whether repeated sampling might change a parameter estimate by just a little or by a lot. When dealing with parameters, we expect that p + q = 1 exactly if there are only two alleles with allele frequencies p and q. However, if we are dealing with estimates, we might say the two allele frequency estimates should sum to approximately one (p̂ + q̂ ≈ 1) since each allele frequency is estimated with some error. The more uncertain the estimates of p and q, the less we should be surprised to find that their sum does not equal the expected value of one.
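The effect of repeated sampling can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true allele frequency of 0.6, the sample sizes, the number of replicates, and the function names are hypothetical choices, and each allele is assumed to be drawn independently and at random from a very large population.

```python
import random
import statistics


def sample_allele_frequency(true_p, n_alleles, rng):
    """Draw n_alleles alleles at random and return the estimated frequency of A."""
    # Each draw is an A allele with probability true_p (assumed independent draws).
    count_a = sum(1 for _ in range(n_alleles) if rng.random() < true_p)
    return count_a / n_alleles


def repeated_estimates(true_p, n_alleles, n_replicates, seed=1):
    """Repeat the sampling process and collect one estimate per replicate."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        sample_allele_frequency(true_p, n_alleles, rng)
        for _ in range(n_replicates)
    ]


# Assumed parameter value; it is known here only because this is a simulation.
true_p = 0.6

for n_alleles in (20, 200, 2000):
    estimates = repeated_estimates(true_p, n_alleles, n_replicates=1000)
    mean_estimate = statistics.mean(estimates)
    spread = statistics.stdev(estimates)  # variation among repeated estimates
    print(n_alleles, round(mean_estimate, 3), round(spread, 3))
```

The spread among replicate estimates is one way to quantify the uncertainty: it shows whether repeating the sampling would change the estimate by a little or by a lot for a given sample size.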

Parameter: A variable or constant appearing in a mathematical expression; a value (usually unknown) used to represent a certain population characteristic; any factor that defines a system and determines or limits its performance.

Estimate: An indication of the value of an unknown quantity based on observed data; an approximation of a true score, parameter, or value; a statistical estimate of the value of a parameter.

It could be said that statistics sits at the intersection of theoretical and empirical population genetics. Parameters and parameter estimates are fundamentally different things. Estimation requires effort to understand sampling variation and quantify sources of error and bias in samples and estimates. The distinction between parameters and estimates is critical when comparing actual populations with expectations to test hypotheses. When large, random samples can be taken, estimates are likely to have minimal error. However, there are many cases where estimates have a great deal of uncertainty, which limits the ability to evaluate expectations. There are also instances where very different processes may produce very similar expected results. In such cases it may be difficult or impossible to distinguish the different potential causes of a pattern due to the approximate nature of estimates. While this book focuses mostly on parameters, it is useful to bear in mind that testing or comparing expectations requires the use of parameter estimates and statistics that quantify sampling error. The Appendix provides a review of some basic statistics that are used in the text.
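To make the link between sample size and error concrete, here is a rough worked example. It rests on an assumption not stated in the text above: that the n sampled alleles are drawn independently and at random, so that the count of A alleles follows a binomial distribution. Under that assumption the standard error of the allele frequency estimate is approximately:

```latex
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}
\qquad\text{so that, for } p = 0.5:\qquad
n = 100 \;\Rightarrow\; \mathrm{SE}(\hat{p}) = \sqrt{\frac{0.25}{100}} = 0.05,
\qquad
n = 10\,000 \;\Rightarrow\; \mathrm{SE}(\hat{p}) = \sqrt{\frac{0.25}{10\,000}} = 0.005
```

Here n is the number of alleles sampled (twice the number of diploid individuals). In this sketch a hundredfold increase in sample size reduces the standard error only tenfold, which is one reason small samples can leave estimates too uncertain to evaluate expectations.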

### Inductive and deductive reasoning

Population genetics employs both inductive and deductive reasoning in an effort to understand the biological processes operating in actual populations as well as to elucidate the general processes that cause population genetic phenomena. The inductive approach to population genetics involves assembling measures of genetic variation (parameter estimates) from various populations to build up evidence that can be used to identify the underlying processes that produced the observed patterns. This approach is logically identical to that used by Isaac Newton, who used knowledge of how objects fall to the surface of the Earth as well as knowledge of the movement of planets to arrive at the general principles of gravity. Application of inductive reasoning requires detailed familiarity with the various empirical data types in population genetics, such as DNA sequences, along with the results of studies that report observed patterns of genetic variation. From this accumulated empirical information it is then possible to draw more general conclusions about the qualities and quantities of genetic variation in populations. Model organisms like Drosophila melanogaster and Arabidopsis thaliana play a large role in population genetic conclusions reached by inductive reasoning. Because model organisms receive a large amount of scientific effort, such as complete sequencing of their genomes, a great deal of genetic data has accumulated for these species. Based on this evidence, many firm conclusions have been made about the population genetics of particular model species. Although model organisms provide very rich sources of empirical information, the number of model species is limited by definition, so generalizations drawn from them may not apply universally to all species.
