Figure 6.6 Gene creation through segmental duplication. Fusion of segmental duplications from TBC1D3 and USP32 results in a hominid-specific oncogene, USP6. Figure redrawn and reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics (Bailey and Eichler 2006), copyright 2006.

and synonymous changes (Ka and Ks, respectively) within and between species showed evidence of strong positive selection. The differences between humans and Old World monkeys, for example, were very striking, with Ka/Ks quotients of 13 (values above one are generally taken as evidence of positive selection). Further analysis suggested that positive selection operated mainly in a common ancestor of humans and African apes, after the separation of the human and chimpanzee lineages from the orangutan less than 12 million years ago (Johnson et al. 2001).
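To make the Ka/Ks quotient concrete, the sketch below computes it from counts of nonsynonymous and synonymous differences and sites, applying the Jukes-Cantor correction to each observed proportion in the style of the Nei-Gojobori approach. It assumes those counts have already been obtained from a codon alignment; the function names and example counts are hypothetical and are not data from Johnson et al. (2001).

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site for an observed
    proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates.

    Inputs are pre-counted differences and sites (a hypothetical interface);
    counting them from an alignment is outside this sketch.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("no synonymous differences: Ka/Ks is undefined")
    return ka / ks


# Hypothetical counts: 30 nonsynonymous differences over 300 nonsynonymous
# sites, and 2 synonymous differences over 100 synonymous sites.
ratio = ka_ks(30, 300, 2, 100)
print(f"Ka/Ks = {ratio:.2f}")  # a value above one suggests positive selection
```

Because synonymous changes are assumed to be largely neutral, Ks acts as the baseline; a quotient well above one means amino-acid-changing substitutions have accumulated faster than that neutral baseline.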

In their review, Conrad and Antonarakis describe a number of other factors influencing gene retention following duplication, including the degree of conservation, the sensitivity of genes to dosage effects (with loss of fitness, for example, if only one copy were available), and the regulatory and architectural complexity of the gene (duplicated genes tending to encode longer proteins with more cis-regulatory domains) (Conrad and Antonarakis 2007). The other major mechanism for generating proteomic diversity, alternative splicing (Box 1.15; Section 11.6), shows a negative correlation with gene duplication when analysed among gene families of different sizes (Kopelman et al. 2005; Su et al. 2006). Duplicated genes show fewer alternatively spliced isoforms than single-copy genes, particularly recently duplicated genes.

6.3.3 Duplication rates over evolutionary timescales

Gene duplication has been considered to play a key role in evolution for several decades. In 1970, Ohno proposed that gene duplication was very important in allowing genomes to grow and diversify (Ohno 1970). The subsequent availability of DNA sequence data from a diverse range of species allowed the rate of gene duplication and the fate of duplicated genes to be assessed. Work by Lynch and Conery published in 2000 suggested that the rate of duplication was much higher than previously thought, with, on average, a gene duplicating once every 100 million years (Lynch and Conery 2000). The authors compared the protein-coding sequences available for human, mouse, chicken (Gallus gallus), worm, fly, rice (Oryza sativa), the flowering plant thale cress (Arabidopsis thaliana), and yeast (Saccharomyces cerevisiae). Dating of duplication events from the number of silent nucleotide changes indicated that most duplicates were relatively young. Almost all were silenced by degenerative mutations within a few million years (the average half-life of a gene duplicate was estimated to be 4 million years), while strong purifying selection was noted for the few surviving functional duplicates. The high rates of duplication were similar across species, such that a genome of 15 000 genes was likely to acquire 60-600 duplicate genes over a million years.
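The arithmetic behind these rates can be sketched as follows, taking the round figure of one duplication per gene per 100 million years (0.01 per gene per million years) and treating the loss of duplicates as exponential decay with the 4-million-year half-life. Both the constant rate and the exponential form are simplifying assumptions of this sketch, and the function names are hypothetical.

```python
def expected_new_duplicates(n_genes: int, rate_per_gene_per_myr: float,
                            myr: float) -> float:
    """Expected number of gene duplications arising in `myr` million years,
    assuming a constant per-gene duplication rate."""
    return n_genes * rate_per_gene_per_myr * myr


def surviving_fraction(t_myr: float, half_life_myr: float = 4.0) -> float:
    """Fraction of a cohort of duplicates still functional after t_myr
    million years, assuming simple exponential loss with a fixed half-life."""
    return 0.5 ** (t_myr / half_life_myr)


RATE = 1 / 100  # one duplication per gene per 100 million years
new = expected_new_duplicates(15_000, RATE, 1.0)
print(f"new duplicates per Myr: {new:.0f}")
for t in (4, 8, 20):
    print(f"after {t} Myr: {surviving_fraction(t):.1%} still functional")
```

With these inputs a 15 000-gene genome gains about 150 duplicates per million years, inside the 60-600 range quoted above, and only about 3% of a cohort of duplicates would remain functional after 20 million years.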

In 2006, Demuth and colleagues analysed a number of mammalian whole-genome sequences and proposed that, since the split from chimpanzees, 689 genes have been gained and 86 lost along the lineage leading to modern humans (Demuth et al. 2006). Combining the 689 genes gained in humans with the 729 lost in chimpanzees shows that humans and chimpanzees differ by at least 6% in their complement of genes (1418 out of 22 000 genes).
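The 6% figure follows directly from the counts reported by Demuth et al. (2006):

```latex
\[
\frac{689 + 729}{22\,000} = \frac{1418}{22\,000} \approx 0.0645 \approx 6.4\%
\]
```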

6.3.4 Evolutionary fate of duplicated genes

In evolutionary terms, the most likely outcome for a duplicated gene is loss of function (Ohno 1970). In the absence of selective pressure on the copy, a duplicated gene can diverge rapidly; deleterious mutations lead to loss of function, so that the copy becomes a pseudogene (Box 6.3) and eventually disappears from the genome.

By contrast, a gain or change in function through mutation in the duplicated copy may allow that mutation to become fixed in the population by natural selection, and the duplicate to persist in the genome (Force et al. 1999; Lynch and Conery 2000). Such effects are seen for only a small minority of duplicated genes. Mutations may confer a selectively advantageous novel function on the duplicated copy while the other copy retains the original function, a process described as 'neofunctionalization'. This may, for example, involve mutations in noncoding DNA that alter the tissue or developmental specificity of gene expression or, more rarely, a change in the coding sequence. Alternatively, both the original and the duplicated copies may accumulate mutations that partition the functions of the ancestral gene between them, so that together they complement one another, a process described as 'subfunctionalization' (Lynch and Force 2000). A number of other models have been proposed for the duplicate gene copy, notably 'genetic robustness', whereby the copies are redundant and the highly conserved duplicate acts as a backup in the event of deleterious mutations occurring in the original gene (Gu et al. 2003).

Box 6.3 Pseudogenes

Pseudogenes show a high degree of sequence homology to a non-allelic functional gene but are themselves non-functional, usually because they lack protein-coding ability (Jacq et al. 1977). A pseudogene may be generated by nonsense mutation, frameshift mutation, or partial nucleotide deletion. There are many examples of pseudogenes across the genome, notably within the major histocompatibility complex (MHC) region.
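The alternative fates of a duplicate pair can be summarised as a toy classifier. In this sketch, which is an illustration rather than a method from the cited studies, each gene copy is represented by a hypothetical set of function labels (for example, tissues of expression). A copy with no remaining function is treated as a pseudogene, and copies that split the ancestral functions between them are labelled as subfunctionalization, in the spirit of the duplication-degeneration-complementation model of Force et al. (1999).

```python
from __future__ import annotations


def classify_fate(ancestral: set[str], copy_a: set[str], copy_b: set[str]) -> str:
    """Toy classification of a duplicate pair's fate.

    Each argument is a hypothetical set of function labels (e.g. tissues
    of expression) for the ancestral gene and the two copies.
    """
    # One copy has lost all function: it has become a pseudogene.
    if not copy_a or not copy_b:
        return "nonfunctionalization"
    a_novel = bool(copy_a - ancestral)
    b_novel = bool(copy_b - ancestral)
    # One copy keeps the original role while the other gains a new one.
    if (copy_a >= ancestral and b_novel) or (copy_b >= ancestral and a_novel):
        return "neofunctionalization"
    # Both copies keep the full original role: redundant backups.
    if copy_a == ancestral and copy_b == ancestral:
        return "redundancy"
    # Each copy keeps only part of the original role; together they cover it.
    if copy_a < ancestral and copy_b < ancestral and (copy_a | copy_b) == ancestral:
        return "subfunctionalization"
    return "unclassified"


ancestral = {"liver", "brain"}
print(classify_fate(ancestral, {"liver", "brain"}, {"liver", "brain", "testis"}))
print(classify_fate(ancestral, {"liver"}, {"brain"}))
```

The 'redundancy' label corresponds loosely to the genetic robustness model (Gu et al. 2003); intermediate cases fall into the 'unclassified' bin, and real analyses would have to infer these function sets from expression or functional data.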

6.4 Gene duplication and multigene families

Gene duplication and conversion have led to the evolution of multigene families (Ohta 2000; Nei and Rooney 2005). Multigene families are thought to arise from a common ancestral gene, giving rise to a group of genes with similar functions and DNA sequences. In some cases, 'supergene' families composed of related multigene families are seen.

6.4.1 Olfactory receptor and globin supergene families

The largest superfamily known in vertebrate genomes is that of the olfactory receptors, comprising 17 gene families (Glusman et al. 2001). In humans the superfamily comprises about 800 genes, found in clusters of tandem arrays on all chromosomes except 22 and Y; 42% of olfactory receptor genes are found on chromosome 11. The mean cluster size is 300 kb, with 80% of clusters comprising six to 138 genes. This is a remarkable proportion of the genome: it has been described as the 'olfactory subgenome', comprising nearly 1% of the human genome and over 30 Mb of sequence. Relative to the mouse lineage, humans have similar numbers of gene clusters but appear to have lost many olfactory receptor genes, while mice have gained many (mice have about 1400 olfactory receptor genes, of which about 1040 are functional) (Niimura and Nei 2005). In humans only about 390 olfactory receptor genes are functional, the remainder being pseudogenes. The number of olfactory receptor genes is thought to have increased by tandem duplication and chromosomal rearrangements (Nei and Rooney 2005).
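The approximate counts above imply very different pseudogene fractions in the two species. This short calculation uses only the rounded figures given in the text (about 800 human genes with about 390 functional; about 1400 mouse genes with about 1040 functional); the helper name is hypothetical.

```python
from __future__ import annotations


def pseudogene_summary(total: int, functional: int) -> tuple[int, float]:
    """Return (number of pseudogenes, fraction of genes that are functional)."""
    return total - functional, functional / total


# Approximate olfactory receptor gene counts quoted in the text.
OR_COUNTS = {"human": (800, 390), "mouse": (1400, 1040)}

for species, (total, functional) in OR_COUNTS.items():
    pseudo, frac = pseudogene_summary(total, functional)
    print(f"{species}: ~{pseudo} pseudogenes, ~{frac:.0%} functional")
```

On these figures roughly half of human olfactory receptor genes are pseudogenes, compared with about a quarter in the mouse.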

In the conclusion to his paper published in 1961 on gene evolution and the haemoglobins (Fig. 6.7), Vernon Ingram wrote:

The suggestion is made that a single primitive myoglobin like haem protein is the evolutionary forerunner of all four types of peptide chain in the present day human haemoglobins, and of the corresponding peptide chains in other vertebrate haemoglobins. Such a scheme involves an increase in the number of haemoglobin genes from one to five by repeated gene duplications and translocations; the scheme may thus illustrate a general phenomenon in gene evolution. (Ingram 1961)

[Figure 6.7: myoglobin; α2 (all Hbs); γ2 (fetal Hb); Hb A2]

Figure 6.7 Evolution of the haemoglobin chains. This figure from Ingram's paper published in 1961 shows points of gene duplication followed by translocation (denoted by an open circle); the α chain is the ancestral peptide chain. Reprinted by permission from Macmillan Publishers Ltd: Nature (Ingram 1961), copyright 1961.

A large body of research has subsequently demonstrated that, over approximately the last 800 million years, a superfamily of genes has become established from an ancestral globin gene (Fig. 6.8). The encoded proteins, haemoglobin, myoglobin, neuroglobin, and cytoglobin, continue to share the ability to bind oxygen, although evolution has led, for example, to tissue-specific expression (myoglobin in muscle, neuroglobin in neuronal tissues). Divergence of α and β globin is thought to have occurred 450-500 million years ago, with duplication within β globin 150-200 million years ago leading to the proto-β and proto-ε genes (Czelusniak et al. 1982; Goodman et al. 1987). A complex series of gene duplication, inactivation, fusion, and conversion events appears to have occurred specific to different mammalian lineages (Fig. 6.9) (Aguileta et al. 2004). The consequences of genetic diversity at the human α and β globin gene clusters for disease were discussed in Chapter 1, and the evidence for selection, notably in relation to malaria, is discussed in Sections 10.2 and 13.2.

6.4.2 Models for the evolution of multigene families

Nei and Rooney have reviewed different models for the evolution of multigene families (Nei and Rooney 2005). Analysis of the genes encoding haemoglobin α, β, γ, δ, and myoglobin provided a paradigm for a divergent mode of evolution, in which phylogenetically related genes gradually diverged with the acquisition of new gene functions by duplicate genes (Ingram 1961). The

[Figure 6.8: the globin superfamily arising from an ancestral globin gene (~800 Myr ago), with time points at about 550, 450, and 150 Myr]
