Chapter 3 – Hardy-Weinberg Equilibrium Or Not?
One very common starting point for Quantitative Genetics theory is the exploration of what happens at a single locus in a large, random mating population when there has been no selection, migration or mutation. This is the classic concept of Hardy-Weinberg Equilibrium. It is a very simplistic view of a Utopian situation that hopefully never occurs in a domestic livestock or plant population. Why not? Domesticated animals and plants are the result of selection, so the very fact that they are domesticated nullifies the selection assumption. It nullifies the migration assumption as well, since domestic plants and animals have been moved around the globe. As for the random mating assumption, we hope that one has been nullified too if geneticists are doing a successful job of selection. So why teach things that are not happening? Well, assuming an equilibrium state gives us a stable reference from which to branch out. Although real populations rarely meet the assumptions to the letter, a large, random mating population with no selection, migration or mutation in the current generation is a good place to start. In this e-book, we will use the assumptions of Hardy-Weinberg Equilibrium (HWE) as a framework for our exploration of the single locus. In other words, we will systematically smash the HWE conditions and assumptions one by one. But before we can do that, we need to set the stage and understand allele and genotypic frequencies.
Allele Frequencies
We’ll start with calculating genotypic and allele frequency. Calculating allele frequencies is one of the fundamental things in Quantitative Genetics and we will use this over and over throughout this e-book. There are several possible levels of information availability.
If you know the number of individuals with at least two of the three genotypes, you can compute the frequencies from there. For example, suppose you know that out of 200 individuals, 72 have the AA genotype and 96 have the Aa genotype. That leaves 200 – 72 – 96 = 32 individuals with the aa genotype since there are only three possible genotypes. One way to calculate allele frequencies is to count the copies of each allele. AA individuals have two copies of A, giving 72 x 2 = 144 A alleles, and Aa individuals have one copy, giving 96 A alleles, for a total of 144 + 96 = 240 A alleles in our example. Out of a possible 400 alleles (two per individual), that means f(A) = 240 / 400 = 0.6. Similarly, for the a alleles, there are 96 Aa individuals with one copy and 32 aa individuals with two copies for a total of 96 + (2 x 32) = 160 out of 400 alleles, or f(a) = 160 / 400 = 0.4. By convention, we usually refer to the frequency of the dominant allele with a lower case p, so in this example p = f(A) = 0.6. We usually use a lower case q for the frequency of the recessive allele, so q = f(a) = 0.4. Note that since there are only two alleles and they represent 100% of the alleles, p + q = 1, p = 1 – q and q = 1 – p.
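The allele-counting steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `allele_freqs_from_counts` and its argument names are made up for this example and are not part of any library.

```python
def allele_freqs_from_counts(n_AA, n_Aa, n_aa):
    """Return (p, q) by counting alleles; every individual carries two alleles."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    count_A = 2 * n_AA + n_Aa  # two copies per AA, one per Aa
    count_a = 2 * n_aa + n_Aa  # two copies per aa, one per Aa
    return count_A / total_alleles, count_a / total_alleles


# The example from the text: 72 AA, 96 Aa and 200 - 72 - 96 = 32 aa
p, q = allele_freqs_from_counts(72, 96, 32)
print(p, q)  # 0.6 0.4
```

Because every allele is either A or a, the two returned values always sum to 1.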
That process is a good learning tool but it can get cumbersome and many of the formulae we will encounter are based on a more direct route. From the same example, note that 72 out of 200 individuals are AA so let’s label that genotypic frequency or proportion with a capital P as P = f(AA) = 72 / 200 = 0.36 . Then, for the heterozygotes, let’s label that genotypic frequency or proportion with a capital H, H = f(Aa) = 96 / 200 = 0.48 . Finally, for the homozygous recessive individuals, let’s label that genotypic frequency or proportion with a capital Q, as Q = f(aa) = 32 / 200 = 0.16. Since the heterozygotes have one copy of each allele, they contribute ½ to the frequency of each allele. This yields two formulae you will see and use often:
[latex]p = f(A) = P + \frac{1}{2} H[/latex]
[latex]q = f(a) = Q + \frac{1}{2} H[/latex]
In this example, [latex]p = f(A) = P + \frac{1}{2} H = 0.36 + \frac{1}{2} (0.48) = 0.6[/latex]. Similarly, [latex]q = f(a) = Q + \frac{1}{2} H = 0.16 + \frac{1}{2} (0.48) = 0.4[/latex]. Note again that p + q = 1, so q = 1 – p and p = 1 – q. Also, calculating P, H, and Q the way we did, P + H + Q = 1.
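The same shortcut can be written as a small Python function. This is a minimal sketch; the function name and the sanity check on P + H + Q are assumptions added for illustration.

```python
def allele_freqs_from_genotype_freqs(P, H, Q):
    """Return (p, q) from genotypic frequencies P = f(AA), H = f(Aa), Q = f(aa)."""
    # Illustrative sanity check: the three genotypic frequencies must sum to 1.
    if abs(P + H + Q - 1.0) > 1e-9:
        raise ValueError("P + H + Q must equal 1")
    p = P + H / 2  # heterozygotes contribute half of their frequency to A
    q = Q + H / 2  # and the other half to a
    return p, q


p, q = allele_freqs_from_genotype_freqs(0.36, 0.48, 0.16)
print(round(p, 4), round(q, 4))  # 0.6 0.4
```

The values are rounded for printing only because decimal fractions such as 0.36 are not stored exactly in floating point.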
Hardy-Weinberg Equilibrium
One topic that always comes up right after a discussion of Mendelian genetics is Hardy-Weinberg Equilibrium (HWE). Two researchers working at the turn of the 20th century, after Mendel’s work was rediscovered, Godfrey Hardy (1877-1947) and Wilhelm Weinberg (1862-1937), independently came up with the idea that if nothing messes with a population, allele frequencies stay constant and those allele frequencies define genotypic frequencies via random mating. So each successive generation should have the same allele and genotypic frequencies. This is obviously an idealistic view of a population but it does give us a foundation upon which we can build a number of concepts. Some of the spin-off properties of Hardy-Weinberg Equilibrium include ways to exploit the consistency of frequencies to estimate allele frequencies.
HWE says “Allele frequencies and genotypic frequencies remain constant and genotypic frequencies are determined by allele frequencies in a large, random mating population in the absence of selection, migration and mutation.” This means that if p = 0.6 in one generation then it is 0.6 in all generations and f(AA) = [latex]p^2[/latex] = 0.36, f(Aa) = 2pq = 2(0.6)(0.4) = 0.48 and f(aa) = [latex]q^2[/latex] = 0.16, since f(AA) = f(A) x f(A) = p x p = [latex]p^2[/latex] and so on. In practice, random mating means calculating the frequencies of the progeny genotypes from the parental allele frequencies. Filling in a Punnett square, or calculating [latex]p^2[/latex], 2pq and [latex]q^2[/latex], is random mating.
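As a quick illustration of that statement, the sketch below does one round of random mating in Python and then recovers the allele frequency from the progeny genotypes. The function name `hwe_genotype_freqs` is made up for this example.

```python
def hwe_genotype_freqs(p):
    """Return (f(AA), f(Aa), f(aa)) expected after random mating."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


P, H, Q = hwe_genotype_freqs(0.6)
print(round(P, 4), round(H, 4), round(Q, 4))  # 0.36 0.48 0.16

# Allele frequency in the progeny generation: the same as in the parents
p_next = P + H / 2
print(round(p_next, 4))  # 0.6
```

Feeding `p_next` back into the function gives the same genotypic frequencies again, which is the "constant from generation to generation" part of the definition.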
Suppose we know a recessive condition, aa, has a frequency of 1% in the population but since it is recessive, we cannot differentiate the AA from the Aa individuals due to dominance of the A allele and we don’t know what H is. Can we estimate q? Yes, but we need to make an assumption. If we assume the population is in Hardy-Weinberg Equilibrium, or more precisely that we have a single generation resulting from random mating, then we can assume that P = [latex]p^2[/latex], H = 2pq and Q = [latex]q^2[/latex]. If we know that Q = f(aa) = [latex]q^2[/latex] = 0.01 then we can estimate [latex]q = \sqrt{0.01} = 0.1[/latex] and then p = 1 – 0.1 = 0.9. How bad is the assumption required to calculate this? Basically, you are ignoring the contribution of the heterozygotes to the a allele frequency. Thus, this estimation only works if the heterozygotes are 2pq of the population.
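Here is a minimal Python sketch of that estimate. It is only valid under the random mating assumption just described, and the function name is hypothetical. The implied heterozygote frequency 2pq is included as an extra illustration of what the assumption commits us to.

```python
import math


def estimate_from_recessive(Q):
    """Estimate (p, q, H) from Q = f(aa), ASSUMING P = p^2, H = 2pq and Q = q^2."""
    q = math.sqrt(Q)
    p = 1 - q
    H = 2 * p * q  # heterozygote frequency implied by random mating
    return p, q, H


p, q, H = estimate_from_recessive(0.01)
print(round(p, 4), round(q, 4), round(H, 4))  # 0.9 0.1 0.18
```

Under the assumption, 18% of the population would be heterozygous even though only 1% shows the recessive condition.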
Hardy-Weinberg Equilibrium also provides a means to determine the maximum frequency of heterozygous individuals in a large, random mating population. The frequency of the heterozygous genotype is 2pq. Since p + q = 1, this is equivalent to [latex]H = 2(1-q)q = 2q - 2q^2[/latex]. To find the maximum, take the partial derivative of H with respect to q and set it equal to zero (did I hear a groan with the mention of calculus?).
[latex]\frac{\partial H}{\partial q} = \frac {\partial (2q - 2q^2)}{\partial q} = 2 - 4q[/latex]
and set equal to zero so 2 – 4q = 0 so 2 = 4q so q = 0.5 at Hmax
So the maximum value of H occurs at q = 0.5 and therefore p = 0.5 so H = 2pq = 0.5 is the maximum value of H with random mating.
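If calculus is not your thing, the same answer can be checked numerically. The sketch below simply evaluates H = 2(1 – q)q on a grid of q values from 0 to 1 and picks the largest; the grid spacing of 0.001 is an arbitrary choice for illustration.

```python
def heterozygosity(q):
    """Heterozygote frequency H = 2pq under random mating, with p = 1 - q."""
    return 2 * (1 - q) * q


# Evaluate H at q = 0.000, 0.001, ..., 1.000 and keep the q giving the largest H
qs = [i / 1000 for i in range(1001)]
best_q = max(qs, key=heterozygosity)
print(best_q, heterozygosity(best_q))  # 0.5 0.5
```

A grid search like this only confirms the result to the resolution of the grid; the derivative gives the exact answer.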
What if we have more than two alleles?
We can use Hardy-Weinberg Equilibrium as an assumption to extend our theory to the three allele scenario also. Suppose we have three alleles. If we have HWE, then the frequencies of the genotypes will depend upon the frequencies of the alleles, exactly as they do with the bi-allelic case.
As an example, for rabbit coat colour one locus has three alleles creating six genotypes and resulting in three phenotypes. Full (or normal) colour, F, f(F) = p, is dominant to Himalayan, H, f(H) = q, which is dominant to Albino, A, f(A) = r. Note we can’t use capital and small letters for alleles since there are three alleles in play. So the full colour phenotype can be achieved with the FF, FH and FA genotypes with frequencies [latex]p^2[/latex], 2pq and 2pr respectively. The Himalayan phenotype can be achieved with the HH and HA genotypes with frequencies [latex]q^2[/latex] and 2qr respectively. And finally, the Albino phenotype can be achieved with just one genotype, AA, with frequency [latex]r^2[/latex]. But we have a whole bunch of variables. How can we figure out allele frequencies?
The number of full colour rabbits will be the number of FF, FH and FA genotypes and the proportion of full colour rabbits will be that number divided by the total number of rabbits. But the proportion of genotypes for that phenotype is [latex]p^2 + 2pq + 2pr = \frac{total\; full\; colour}{total\; rabbits}[/latex]. Similarly the number of Himalayan rabbits will be the number of HH and HA genotypes and the proportion of Himalayan rabbits will be that number divided by the total number of rabbits. But the proportion of genotypes for that phenotype is [latex]q^2 + 2qr = \frac{total\; Himalayan}{total\; rabbits}[/latex]. And finally, the number of Albino rabbits will be the number of AA genotypes and the proportion of Albino rabbits will be that number divided by the total number of rabbits. But the proportion of genotypes for that phenotype is [latex]r^2 = \frac{total\; Albino}{total\; rabbits}[/latex].
Starting from the Albino rabbits and working our way up while assuming random mating, we can solve for the frequencies of the alleles.
First notice [latex]r^2 = \frac{total\; Albino}{total\; rabbits} \;so\; r = \sqrt\frac{total\; Albino}{total\; rabbits}[/latex]
Then notice that if we combine the Himalayan and Albino categories, the number of Himalayan and Albino rabbits will be the number of HH and HA genotypes plus the number of AA genotypes and the proportion of Himalayan and Albino rabbits will be that number divided by the total number of rabbits. But the proportion of genotypes for those two phenotypes is:
[latex]q^2 + 2qr + r^2 = \frac{total\; Himalayan\; +\; total\; Albino}{total\; rabbits}[/latex]
The left-hand side is a perfect square, [latex]q^2 + 2qr + r^2 = (q + r)^2[/latex], so we can take the square root of both sides since [latex]\sqrt{q^2 + 2qr + r^2} = q + r[/latex]
so [latex]\sqrt{q^2 + 2qr + r^2} = \sqrt\frac{total\; Himalayan\; +\; total\; Albino}{total\; rabbits}[/latex]
so [latex]q + r = \sqrt\frac{total\; Himalayan\; +\; total\; Albino}{total\; rabbits}[/latex]
and therefore [latex]q = \sqrt\frac{total\; Himalayan\; +\; total\; Albino}{total\; rabbits} - r[/latex]
and then p = 1 – q – r
Let’s do an example. If rabbits with the Himalayan colour pattern represent 31% of the population and albinos represent 14% of the population, what is the frequency of the F allele (i.e., what is f(F) = p)?
First, calculate [latex]r = f(A) = \sqrt{0.14} = 0.3742[/latex]
Next, calculate [latex]q = f(H) = \sqrt{0.31 + 0.14} - 0.3742 = 0.2966[/latex]
And finally p = f(F) = 1 – q – r = 1 – 0.2966 – 0.3742 = 0.3292
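The three steps of this worked example can be sketched in Python as follows. The function name and argument names are made up for illustration, and the calculation relies on the same random mating assumption as the derivation. The last lines rebuild the phenotype proportions from the estimated allele frequencies as a check.

```python
import math


def three_allele_freqs(prop_himalayan, prop_albino):
    """Estimate (p, q, r) = (f(F), f(H), f(A)) from phenotype proportions,
    ASSUMING random mating and the dominance order F > H > A."""
    r = math.sqrt(prop_albino)                       # Albino = r^2
    q = math.sqrt(prop_himalayan + prop_albino) - r  # Himalayan + Albino = (q + r)^2
    p = 1 - q - r
    return p, q, r


p, q, r = three_allele_freqs(0.31, 0.14)
print(round(p, 4), round(q, 4), round(r, 4))

# Check: rebuild the phenotype proportions from the allele frequencies
full_colour = p**2 + 2*p*q + 2*p*r
himalayan = q**2 + 2*q*r
albino = r**2
print(round(full_colour, 4), round(himalayan, 4), round(albino, 4))
```

The code carries r at full precision instead of rounding it to 0.3742 first, so its q can differ from the hand calculation in the fourth decimal place.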
Thankfully, that’s the last time we will look at three alleles at a single locus in this e-book. More than two alleles is just an extension of the two allele situation so we can focus on two alleles from now on knowing we could extend to three or more if we had to.