[Figure 6.4 image not reproduced; axis label: Physical location]

Figure 6.4 Tandemly arranged genes show variable distribution across chromosomes. Open boxes represent centromeres. Reproduced from Shoja and Zhang (2006), by permission of Oxford University Press.

duplication to nonhomologous pericentromeric regions (Bailey and Eichler 2006). For example, a 9.7 kb segment of the adrenoleukodystrophy (ALD) locus was duplicated from the X chromosome to a pericentromeric region of chromosome 2, followed by duplication from 2p11 to 10p11, 16p11, and 22q11. These duplicated regions retain high sequence homology and are believed to have arisen over the last 5-10 million years (Eichler et al. 1997).

Subtelomeric regions are a second major 'hotspot' of duplication, in which a high concentration of recent interchromosomal segmental duplications has been found (Linardopoulou et al. 2005). Subtelomeres are transition zones ranging in size from 10 to 300 kb, found near the tips of chromosomes between chromosome-specific sequences and the arrays of telomeric repeats that cap each chromosome (Box 7.3) (Mefford and Trask 2002). Subtelomeres are remarkably dynamic and variable regions, comprising 25 gene families in a patchwork array of duplicated blocks showing high homology but great diversity in copy number and chromosomal location. Gene products include olfactory receptors and cytokines as well as transcription factor proteins. The patchwork of segmental duplications is thought to have arisen through complex double-stranded DNA breakage and repair, leading to numerous repeated translocations at the ends of chromosomes (Linardopoulou et al. 2005). Multiple events have led to a mosaic of adjacent duplicons, with sequence orientation maintained between copies. These interchromosomal duplications are thought to have occurred over very recent evolutionary time: 49% of known subtelomeric sequence is believed to have been generated after humans and chimpanzees diverged (Linardopoulou et al. 2005).

For such a small portion of the genome, subtelomeres account for a very high proportion of highly homologous duplications: 40% of all duplications in the sequenced genome with a sequence identity of 98.7% or more (Linardopoulou et al. 2005). The rate of gene duplication in subtelomeres is four times the genome-wide average, with seven gene duplicates having arisen in human subtelomeres per million years. Polymorphism in the extent of duplication is recognized in healthy individuals, for example stable interindividual variation in the length of alleles involving the subtelomeric region on the short arm of chromosome 16 (Wilkie et al. 1991). The consequences can also be severe, with a number of genetic diseases associated with subtelomeric chromosomal rearrangements (Section 5.3).

6.2.3 Non-allelic homologous recombination, segmental duplications and genomic disorders

Non-allelic homologous recombination (Section 5.2.1) and replication error are thought to be the most important mechanisms leading to tandem duplications (Bailey and Eichler 2006). Segmental duplications, and in particular LCRs, are themselves prone to non-allelic homologous recombination at meiosis, which can lead to genomic rearrangements and genomic disorders (Section 5.2). Such events in mitosis can result in a mosaic somatic cell population carrying genomic rearrangements, which are associated with cancer or mosaic manifestations of genomic disorders (mechanisms of genomic rearrangements reviewed by Gu et al. 2008). If LCRs are present on the same chromosome and in direct orientation with each other, non-allelic homologous recombination may result in duplications and deletions; if LCRs are present in opposite orientations, it may result in inversions. The efficiency of non-allelic homologous recombination is related to the distance between LCRs. Within LCRs, a minimal length of extremely homologous sequence is required, usually between 300 and 500 bp. Particular sequences are associated with double-stranded DNA breaks (for example palindromes, minisatellites, and mobile DNA elements) and are seen as 'hotspots' for non-allelic homologous recombination (Gu et al. 2008).
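The relationship between LCR orientation and the class of rearrangement can be made concrete with a short sketch. This is an illustration only: the `LCR` class, its field names, the `predict_nahr_outcome` function, and the 300 bp cut-off (the lower end of the range quoted above) are assumptions introduced here, not part of any published tool. LCR pairs on different chromosomes are not discussed in this section, so the sketch does not classify them.

```python
from dataclasses import dataclass

# Assumed cut-off: the lower end of the 300-500 bp range quoted in the text.
MIN_HOMOLOGY_BP = 300


@dataclass(frozen=True)
class LCR:
    """A low copy repeat. Field names are hypothetical."""
    chromosome: str
    strand: str  # "+" or "-", the orientation of the repeat


def predict_nahr_outcome(a: LCR, b: LCR, shared_homology_bp: int) -> str:
    """Return the rearrangement class expected from recombination between a and b."""
    # A minimal tract of extremely homologous sequence is required.
    if shared_homology_bp < MIN_HOMOLOGY_BP:
        return "recombination unlikely"
    # Pairs on different chromosomes are outside the cases described in the text.
    if a.chromosome != b.chromosome:
        return "not covered by this sketch"
    # Direct orientation: duplications and deletions.
    if a.strand == b.strand:
        return "duplication or deletion"
    # Opposite orientation: inversions.
    return "inversion"


if __name__ == "__main__":
    direct = (LCR("chr22", "+"), LCR("chr22", "+"))
    inverted = (LCR("chr22", "+"), LCR("chr22", "-"))
    print(predict_nahr_outcome(*direct, shared_homology_bp=400))
    print(predict_nahr_outcome(*inverted, shared_homology_bp=400))
```

The sketch ignores the distance between LCRs, which the text notes also affects the efficiency of recombination.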

Specific chromosomal regions which have been the target of multiple independent duplication events are described as duplication hubs or acceptor regions (reviewed in Bailey and Eichler 2006). These can be associated with disease phenotypes such as DiGeorge syndrome, a genomic disorder caused by non-allelic homologous recombination between three highly identical (greater than 99%) duplication hubs on chromosome 22q11 (Box 5.2).

6.2.4 Alu elements and segmental duplications

Alu elements are a family of retrotransposons (class I mobile DNA elements), noncoding DNA sequences approximately 300 bp long which are found extensively across the human genome (Section 8.4). They are particularly associated with segmental duplications and low copy repeats. Alu insertions are seen to be enriched at the junctions of duplications, with 27% of segmental duplications found to terminate within an Alu repeat (Bailey et al. 2003). A burst of Alu retroposon activity has been identified in primates occurring 35-40 million years ago (Shen et al. 1991). Bailey and colleagues propose that this sensitized the ancestral genome to Alu-Alu-mediated recombination events and initiated an expansion of gene-rich segmental duplications (Bailey et al. 2003).
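As an illustration of how a figure such as the 27% above might be derived, the sketch below counts duplications whose boundaries fall inside an annotated Alu repeat. It is a toy calculation under stated assumptions, not the method of Bailey et al. (2003): coordinates are half-open intervals on a single chromosome, a duplication is counted if either of its two ends lies within an Alu, and the function names and example data are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def position_in_any(pos: int, intervals: List[Interval]) -> bool:
    """True if pos lies inside at least one interval."""
    return any(start <= pos < end for start, end in intervals)


def fraction_terminating_in_alu(duplications: List[Interval],
                                alus: List[Interval]) -> float:
    """Fraction of duplications with at least one end inside an Alu repeat."""
    if not duplications:
        return 0.0
    hits = 0
    for start, end in duplications:
        first_base, last_base = start, end - 1  # the two boundary positions
        if position_in_any(first_base, alus) or position_in_any(last_base, alus):
            hits += 1
    return hits / len(duplications)


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    duplications = [(100, 500), (1000, 2000), (3000, 4000), (5000, 6000)]
    alus = [(90, 390), (1900, 2200)]
    print(fraction_terminating_in_alu(duplications, alus))  # 0.5
```

A linear scan is adequate for a toy example; genome-scale annotation would normally call for an interval index.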

6.2.5 Segmental duplications in primates and other species

Segmental duplications have been found in a variety of other genomes, ranging from the worm (Caenorhabditis elegans) (Mounsey et al. 2002) and the fly (Drosophila melanogaster) (Fiston-Lavier et al. 2007), to the mouse (Mus musculus) (Cheung et al. 2003b; Bailey et al. 2004), rat (Rattus norvegicus) (Tuzun et al. 2004), and dog (Canis familiaris), where an estimated 2-4% of the genome is duplicated. Segmental duplications among mammalian genomes are larger than those seen in the worm or fly. Differences are seen between mammals, with interchromosomal segmental duplications more common in humans (48%) compared with mice and rats (13% and 15%, respectively) (Bailey et al. 2004; Tuzun et al. 2004), while tandemly duplicated segments are less common (45% in humans versus 70-90% in mice, chickens, and rats) (She et al. 2006).

Publication of the chimpanzee (Pan troglodytes) genome sequence (Box 10.1) highlighted significant differences in segmental duplications, with one third of human duplications (showing more than 94% sequence identity) not found in the chimpanzee genome (Cheng et al. 2005). These were remarkable differences, highlighting the extent of recent segmental duplications in primate evolution. Cheng and colleagues estimated that differences involving duplicated segments accounted for much more of the sequence difference between humans and chimpanzees than fine-scale sequence diversity (2.7% compared to 1.2% at the level of single base pair differences). Even within shared duplications, there was evidence of significant copy number variation. Further insights came from analysis of the genome sequences of other primates. The rhesus macaque (Macaca mulatta) is an Old World monkey thought to share a common ancestor with humans some 25 million years ago (Fig 6.5). Sequencing of the macaque genome (Box 10.2) highlighted much lower levels of segmental duplication, comprising 2.3% of the genome, in comparison with humans and chimpanzees, where 5 to 6% of the genome consists of segmental duplications (Gibbs et al. 2007).

A recent analysis of duplications among four primate genomes (human, chimpanzee, orang-utan, and macaque) resolved these differences further, highlighting how 80% of human segmental duplications arose after the divergence of the hominoid lineages from Old World monkeys (analyzing duplications more than 20 kb in size with more than 94% identity) (Marques-Bonet et al. 2009). Indeed, Eichler and colleagues showed evidence of a highly significant increase in duplication activity in the common ancestor of humans and African great apes, and after the divergence of the gorilla and human-chimpanzee lineages. The authors note how this contrasts with the 'slowing' of other processes generating genetic diversity in the hominoid lineage, including point mutations and retrotransposon activity (Marques-Bonet et al. 2009).
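The size and identity thresholds quoted above (more than 20 kb, more than 94% identity) amount to a simple filter over candidate duplications. The sketch below shows that filter in isolation. The `Duplication` record, its field names, and the example values are hypothetical, and the actual analysis pipeline of Marques-Bonet et al. (2009) is not reproduced here.

```python
from typing import List, NamedTuple


class Duplication(NamedTuple):
    """A candidate duplicated segment. Field names are hypothetical."""
    name: str
    length_bp: int
    identity: float  # fraction of identical bases between copies, 0.0 to 1.0


def select_duplications(candidates: List[Duplication],
                        min_length_bp: int = 20_000,
                        min_identity: float = 0.94) -> List[Duplication]:
    """Keep candidates strictly above both thresholds ("more than" in the text)."""
    return [d for d in candidates
            if d.length_bp > min_length_bp and d.identity > min_identity]


if __name__ == "__main__":
    # Invented values, for illustration only.
    candidates = [
        Duplication("dupA", 25_000, 0.97),  # passes both thresholds
        Duplication("dupB", 15_000, 0.99),  # too short
        Duplication("dupC", 40_000, 0.93),  # identity too low
        Duplication("dupD", 20_000, 0.95),  # exactly 20 kb, not more than 20 kb
    ]
    print([d.name for d in select_duplications(candidates)])  # ['dupA']
```

Changing either threshold changes which duplications are counted, which is one reason percentages reported under different criteria are not directly comparable.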

6.3 Duplication and evolution

6.3.1 Whole genome duplications

Segmental duplications are thought to have had a less dominant evolutionary impact than ancient whole genome duplications, which were associated with major evolutionary changes involving evolutionary transitions and adaptive radiation of species (Maere et al. 2005). Current hypotheses suggest either two rounds of whole genome duplication early in vertebrate evolution (Dehal and Boore 2005) or a single round of whole genome duplication followed by gene family expansion through small-scale segmental and tandem duplication 50-150 million years ago (Gu et al. 2002).

6.3.2 Gene creation

Segmental duplications are, however, thought to be one of the main mechanisms for the creation of new genes over evolutionary time, predominantly through duplications of entire genes and more rarely through exon shuffling and the creation of fusion transcripts (Box 6.2) (Taylor and Raes 2004; Bailey and Eichler 2006). Genes within segmental duplications have been noted to commonly (1) show an excess of structural (copy number) variation; (2) have strong positive signatures of selection (Section 10.2); and (3) encode proteins involved in the immune response, reproduction, nuclear function, olfactory reception, and drug detoxification (Bailey and Eichler 2006). Such features are consistent with an important role in primate and human adaptive evolution.

[Figure 6.5 image not reproduced. Clade labels: New World Monkeys, Old World Monkeys, Hominoids, African Great Apes. Taxa: Marmoset, Macaque, Orang-utan, Gorilla, Chimpanzee, Human.]

Fig 6.5 Phylogeny of hominoids, old world and new world monkeys. Estimated times of divergence indicated (Myr, million years ago). Figure adapted and reprinted by permission from MacMillan Publishers Ltd: Nature Reviews Genetics (Samonte and Eichler 2002), copyright 2002; and Nature (Marques-Bonet et al. 2009), copyright 2009.

On the short arm of chromosome 16, segmental duplications are highly variable in copy number between species, with evidence of strong positive selection (Johnson et al. 2001). Segmental duplications account for more than 10% of the euchromatic sequence on the short arm of chromosome 16, with 20 elements described (denoted low copy repeat sequences on chromosome 16, LCR16a to LCR16t) (Loftus et al. 1999). Of particular interest is LCR16a, a duplicated segment 20 kb in length with high sequence identity, which is present as 15 duplicated copies in humans (Johnson et al. 2001). Johnson and colleagues found evidence of a major proliferation in the number of these duplicons in great apes relative to Old World monkeys, with 17 copies in gorillas and 25-30 copies in chimpanzees but only one or two copies of LCR16a in all Old World monkeys.

A likely ancestral origin at 16p13.1 was found, with duplication to other chromosomes noted in orangutans and chimpanzees consistent with lineage-specific expansion. Strikingly, exonic regions within the repeats were found to be hypervariable (10% nucleotide divergence compared to 2% in intronic sequence). Analysis of average nucleotide substitution rates for nonsynonymous

Box 6.2 Juxtaposition of segmental duplications leading to new genes

A fusion of segmental duplications from two genes on the long arm of chromosome 17 was shown to have created the oncogene USP6 (Tre2) (ubiquitin-specific protease 6) on the short arm of the same chromosome, at 17p13.2 (Fig. 6.6) (Paulding et al. 2003). This chimeric gene is expressed in a variety of cancers and is hominoid-specific. It is thought to have arisen 21-33 million years ago from duplications of USP32, an ancient, highly conserved gene, and TBC1D3 (TBC1 domain family member 3). TBC1D3 is itself thought to have arisen from a recent segmental duplication, leading to rapid dispersal through the primate lineage.
