
on comparative gains or losses of DNA. Iafrate and colleagues used large insert clone arrays to cover about 12% of the genome with a detection size of approximately 50 kb, while Sebat and coworkers used much smaller probes (oligonucleotides) with on average one probe every 35 kb (detection size 105 kb). Both technologies allowed only limited resolution and genomic coverage, and captured only a fraction of the likely variation across the genome. However, the unexpected scale of copy number variation that was found prompted great research interest and further high resolution analyses.

Iafrate and colleagues analysed 55 unrelated individuals, of whom 16 had chromosomal imbalances (the latter group was used to assess the sensitivity and specificity of the approach: all expected abnormalities were found) (Iafrate et al. 2004). DNA from this panel of individuals was hybridized to the array and compared to pooled male or female genomic DNA from karyotypically and phenotypically normal individuals. In total 255 copy number variants were found (Fig. 4.2), an average of 12.4 per individual, with 102 occurring in more than one individual, 24 in more than 10% of people, and six in more than 20%. The most common copy number polymorphism identified, present in nearly half of the people analysed, involved a 150-425 kb region spanning AMY1A and AMY2A (amylase alpha loci), with gain or loss of the region in equal proportions among those bearing the polymorphism. Entire genes were spanned by 67 of the 255 copy number variants. The authors believed that most of the variation represented tandem copy number changes and established the Database of Genomic Variants as a repository for structural genomic variation (http://projects.tcag.ca/variation/).

Sebat and colleagues (2004) analysed a smaller number of individuals (20) and identified 76 unique copy number variants, only five of which were previously known. They found that, on average, any two of the individuals studied differed by 11 such variants (a similar figure to that of Iafrate and colleagues), with an average variant length of 465 kb. Cytogenetic analysis confirmed a high proportion of the variants tested. In this study the copy number variants were again widely distributed across chromosomes, although clusters of three or more variants were noted at regions of chromosomes 6, 8, and 15, suggestive of hotspots of variation. A six-fold excess of segmental duplications was noted among deleted regions and a 12-fold excess among duplicated regions (see Section 4.2.6 for a discussion of the role of segmental duplications in copy number variation).

4.2.2 Towards a global map of copy number variation

In contrast to these initial studies, Tuzun and colleagues in 2005 published the results of a sequence-based approach intended to produce a 'fine scale map' of structural variation across the genome (Tuzun et al. 2005). They hoped to assess a much greater proportion of the human genome, and at a higher level of precision, than the array CGH approaches reported by Sebat and Iafrate, so as to detect deletions and insertions together with inversions and smaller insertion/deletion polymorphisms (indels). Tuzun and coworkers compared sequence data for a North American female donor individual (NA15510) with that of the reference genome, which is more than 70% derived from a single individual (RPCI-11). This involved mapping 'paired end sequence data from the fosmid DNA genomic library' for NA15510, analysing 581 Mb of sequence, and identifying structural variants greater than 8 kb in size.
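The logic of paired-end mapping can be illustrated with a minimal sketch. It assumes, for illustration only, a fosmid insert length of about 40 kb and treats a deviation of more than 8 kb as significant (chosen to match the detection size quoted above); the class `EndPair`, the function `classify`, and both constants are hypothetical and are not taken from Tuzun et al. (2005).

```python
from dataclasses import dataclass

# Assumed values for illustration only (not stated in the text above).
EXPECTED_INSERT_BP = 40_000  # assumed typical fosmid insert length
TOLERANCE_BP = 8_000         # assumed cut-off, matching the >8 kb detection size


@dataclass
class EndPair:
    """Placement of the two end sequences of one clone on the reference."""
    start: int               # leftmost reference coordinate of the pair
    end: int                 # rightmost reference coordinate of the pair
    same_orientation: bool   # True if both ends map to the same strand


def classify(pair: EndPair) -> str:
    """Return a coarse label for what one end pair suggests about the donor."""
    if pair.same_orientation:
        # Ends of one clone should face each other on opposite strands.
        return "inversion breakpoint"
    span = pair.end - pair.start
    if span > EXPECTED_INSERT_BP + TOLERANCE_BP:
        # Ends lie too far apart on the reference: donor lacks sequence here.
        return "deletion"
    if span < EXPECTED_INSERT_BP - TOLERANCE_BP:
        # Ends lie too close together: donor carries extra sequence here.
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    for pair in (EndPair(0, 40_000, False), EndPair(0, 55_000, False),
                 EndPair(0, 25_000, False), EndPair(0, 40_000, True)):
        print(pair, "->", classify(pair))
```

A single discordant clone is weak evidence on its own; a practical analysis would presumably require several independent clones to support the same site before calling a variant.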

Overall 297 sites of variation were found: 139 insertions, 102 deletions, and 56 inversion breakpoints. Of these, 112 structural variants were validated, and when 57 sites were subsequently analysed in a panel of 47 individuals, 28% showed copy number variation by array CGH (Tuzun et al. 2005). The data of Tuzun and colleagues showed a ten-fold enrichment of structural variation at segmentally duplicated regions of the genome. In common with other surveys of structural variation, they also noted overrepresentation of structural variation involving genes involved in drug detoxification, innate immunity and inflammation, surface integrity, and antigens.

In 2006 Redon and colleagues published a first-generation map of copy number variation across the human genome for a large panel of ethnically diverse individuals (Redon et al. 2006). The team studied 270 lymphoblastoid cell lines from the International HapMap Project, established from individuals of African, European, and Asian origin (Box 9.1). An important question with such immortalized cell lines is whether somatic artefacts would confound analysis of germline copy number variation. To address this, the investigators


Figure 4.2 Copy number variation across the genome. An unexpectedly high level of variation in copy number was found on array CGH analysis (Iafrate et al. 2004). Circles shown to the right of each chromosome indicate the number of individuals with copy gains (black) and losses (grey) for each clone among 39 unrelated, healthy control individuals. Redrawn and reprinted by permission from Macmillan Publishers Ltd: Nature Genetics (Iafrate et al. 2004), copyright 2004.


performed extensive karyotyping and removed aberrant chromosomes from analysis. Two platforms were used for detecting copy number variation: comparative analysis of hybridization intensities on SNP arrays, and comparative genomic hybridization (CGH) using a Whole Genome TilePath array (Fig. 4.3). The two approaches were complementary, the former facilitating detection of smaller copy number variants. An important caveat of this and other studies of the time is their limited power to detect smaller copy number variants (notably in the range 1-20 kb), such that many such variants had yet to be identified.
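Both platforms compare signal from a test genome against a reference, and the comparison is conventionally expressed as a log2 ratio. The following sketch shows the idealized values expected for a diploid reference; the function names and the calling thresholds of plus or minus 0.3 are assumptions made for illustration and are not taken from Redon et al. (2006).

```python
import math


def log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2 ratio for a region, given copy numbers in test and reference."""
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a log2 ratio")
    return math.log2(test_copies / reference_copies)


def call(ratio: float, gain_threshold: float = 0.3, loss_threshold: float = -0.3) -> str:
    """Label a region as gain, loss, or no change (thresholds are hypothetical)."""
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "no change"


if __name__ == "__main__":
    for copies in (1, 2, 3, 4):
        ratio = log2_ratio(copies)
        print(f"{copies} copies vs 2: log2 ratio = {ratio:+.3f} ({call(ratio)})")
```

Under these idealized assumptions a heterozygous deletion (one copy against two) gives a ratio of -1, and a single-copy gain (three against two) gives about +0.585. Measured ratios are noisier and typically compressed towards zero, which is one reason small variants are hard to detect.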

Redon and colleagues found 1447 copy number variable regions among the 270 individuals, spanning a total of 360 Mb of sequence (Redon et al. 2006). This equates to 12% of the human genome and accounts for more nucleotide content than SNPs. Two-thirds of variants were replicated, predominantly using the second technological platform. Within parent-child trios, biallelic copy number variants were found to be heritable. The chromosomal distribution of copy number variants was heterogeneous, with the proportion of any given chromosome involved ranging from 6% to 19% (Fig. 4.4). For most variants the ancestral state could not be determined: the minor allele was denoted as being derived, with deletions having a minor allele of lower copy number and duplications a minor allele of higher copy number.
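The 12% figure can be checked with simple arithmetic. The sketch below assumes a haploid human genome size of roughly 3,000 Mb; that figure and the function name `genome_fraction` are assumptions made for illustration and are not stated in the text.

```python
# Assumed approximate haploid human genome size (not stated in the text).
GENOME_SIZE_MB = 3_000


def genome_fraction(span_mb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Fraction of the genome covered by a span given in megabases."""
    return span_mb / genome_mb


if __name__ == "__main__":
    # 360 Mb is the total span of copy number variable regions reported above.
    print(f"{genome_fraction(360):.0%}")
```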

Copy number variable regions were more commonly found outside genes and ultraconserved elements, with a greater bias away from genes for deletions compared to duplications (Redon et al. 2006). This is thought to be due to deletions being under stronger selective pressure (purifying selection) for the removal of deleterious variants from the population (Brewer et al. 1998, 1999;

[Figure 4.3: two panels, 'Comparative genome hybridization using Whole Genome TilePath array' and 'Comparative intensity analysis Affymetrix 500K early access SNP chip'. Vertical axis: log2 ratio of copy number (test/reference genome), with ticks at 1, 0, and -1.]
