where d is the observed proportion of sites that differ between two DNA sequences, also called the p distance. K is then the estimated number of substitutions (divergence events) per site, corrected for multiple hits with the Jukes-Cantor nucleotide-substitution model.
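The two quantities can be computed with the short Python sketch below. It assumes the standard Jukes-Cantor form K = -3/4 ln(1 - 4d/3). It also assumes two already aligned, ungapped sequences with no ambiguous bases. The function names `p_distance` and `jukes_cantor_k` are hypothetical helpers written for illustration, not part of any library.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ (the p distance, d).

    Assumes an ungapped alignment with no ambiguity codes.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_k(d: float) -> float:
    """Jukes-Cantor corrected divergence K = -3/4 * ln(1 - 4d/3).

    The correction is undefined once d reaches 0.75, the expected
    p distance between two unrelated random sequences.
    """
    if not 0.0 <= d < 0.75:
        raise ValueError("d must satisfy 0 <= d < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


if __name__ == "__main__":
    d = p_distance("ACGTACGTAC", "ACGTACGTAA")  # 1 difference in 10 sites
    print(f"d = {d:.2f}, K = {jukes_cantor_k(d):.3f}")  # prints "d = 0.10, K = 0.107"
```

The guard at d = 0.75 reflects the model itself. Under Jukes-Cantor, two completely unrelated sequences are expected to match at only one site in four, so observed distances at or above 0.75 carry no information about the number of substitutions.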

A few examples help show how the Jukes-Cantor correction works in practice. Imagine two DNA sequences that differ at 1 site in 10, so the p distance is 10%, or d = 0.10. This observed divergence is an underestimate because it does not account for multiple hits. To adjust for multiple hits, we compute the corrected divergence as

$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}d\right) = -\frac{3}{4}\ln\left(1 - \frac{4}{3}(0.10)\right) \approx 0.107$$
