Figure 4.7 Whole gene deletions among individuals from the International HapMap Consortium. Frequencies of deleted alleles among 269 individuals from the three HapMap populations are shown: CEU (European geographic ancestry), JCH (Japanese and Chinese), and YRI (African). Gene expression levels are for lymphoblastoid cell lines established from individuals in HapMap populations. Redrawn and reprinted by permission from Macmillan Publishers Ltd: Nature Genetics (McCarroll et al. 2006), copyright 2006.

Box 4.3 Reference DNA and copy number variants

Identifying structural variation requires comparison with a reference DNA source, dataset, or genomic sequence. In 2007, the human genome reference assembly released by the US National Center for Biotechnology Information (Build 36) was a mosaic of 708 different sources (Feuk et al. 2006a) with 302 known gaps and incomplete coverage; this may confound comparative analysis and, for unannotated segments, lead to structural variation being missed (Scherer et al. 2007). Moreover, the use of multiple different DNAs or pools of DNA as reference controls significantly complicates both the analysis of copy number differences and database standardization; adoption of a standardized reference control DNA would be of significant benefit (Scherer et al. 2007).

terminology, complete reporting of sample descriptions and experimental methodologies, quality control, and annotation of structural variants (Scherer et al. 2007). Only recently has the terminology for describing structural variants become more uniform (Redon et al. 2006) but differences in terminology still underlie much of the heterogeneity currently observed between studies of structural variation. The different technologies currently in use differ significantly in sensitivity and specificity, and the extent of smaller copy number variants remains largely unknown because of the limitations of current detection methods. Moreover, lack of resolution means the boundaries or breakpoints of variants remain poorly resolved.

4.2.5 Extent of copy number variation

With these caveats in mind, the number of copy number variants reported in specific surveys (see Fig. 4.1) ranges from 76 (Sebat et al. 2004) to 3654 (Wong et al. 2007), with the latter thought to include a high proportion of false positives. In total, 17 641 copy number variants are recorded at the Database of Genomic Variants (http://; date of access September 2008), which combines data from 49 studies. Using this database, Scherer and coauthors noted that structural variation covered 18.8% of the euchromatic genome (538 Mb), based on analysis of fewer than 1000 genomes of people without a known disease phenotype (Scherer et al. 2007). Copy number variants are found genome-wide but show evidence of clustering in pericentromeric and subtelomeric regions, which are known to be rich in segmental duplications (Section 6.2.3). As well as with segmental duplications, copy number variants show strong correlations with exons and mobile DNA elements such as Alu repeats (Section 8.4). At present there is considerable difficulty in defining the precise locations and specific DNA sequences of copy number variants within copy number variable regions (the 'variant breakpoints'), as the resolution of current techniques is generally poor and based on, for example, the coordinates of the BAC probe.
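As a quick consistency check on the coverage figure quoted above, the stated 538 Mb and 18.8% together imply the size of the euchromatic genome used as the denominator. The short Python sketch below does that arithmetic; the variable names are illustrative, and only the two input numbers come from the text.

```python
# Figures quoted in the text (Scherer et al. 2007).
covered_mb = 538.0        # structural variation, in megabases
covered_fraction = 0.188  # 18.8% of the euchromatic genome

# Implied denominator: the euchromatic genome size assumed in that estimate.
implied_euchromatic_mb = covered_mb / covered_fraction

print(f"Implied euchromatic genome: {implied_euchromatic_mb:.0f} Mb")
```

The result, a little under 2.9 Gb, is an inference from the two rounded figures rather than a value reported in the source.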

In order to 'complete the map of human genetic variation', a project has been launched from the National Human Genome Research Institute by the Human Genome Structural Variation Working Group which aims to sequence large insert clones from many different phenotypically normal individuals of African, European, and Asian ancestry so as to systematically identify and resolve structural variants (Eichler et al. 2007). The increased capacity of new sequencing technologies such as the 454 system employing highly parallel array-based pyrosequencing will facilitate such an effort (Box 1.24). Korbel and colleagues published work in 2007 showing how structural variation down to a 3 kb resolution could be defined by paired-end mapping using the 454 sequencing system for two individuals, a female thought to be of European descent previously analysed by Tuzun and colleagues (NA15510) and a female from the Yoruba population in Nigeria (NA18505) (Korbel et al. 2007). This work, which involved sequencing more than 10 million and 21 million paired ends for the two individuals, respectively, fine mapped 853 deletions, 322 insertions, and 122 inversions. Based on full genomic coverage, 761 structural variants of more than 3 kb in size were predicted for NA15510 compared to the reference genome, and 887 for NA18505. Overall, 45% of structural variants were shared between the two individuals. In terms of alleles, 23% and 15-20% of structural variants were homozygous for NA15510 and NA18505, respectively. The authors noted that two genomic regions known to be associated with genomic disorders were hotspots for structural variation with 13 structural variants in an 8 Mb region at 22q11.2 and 29 in an 18 Mb region at 7q11 (Korbel et al. 2007).
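The logic of paired-end mapping can be illustrated with a minimal sketch: the two ends of a sequenced fragment are mapped to the reference genome, and a pair whose mapped span or orientation disagrees with the expected fragment size suggests a structural variant. The code below is a toy illustration, not the pipeline used by Korbel and colleagues; the `PairedEnd` type, the `classify_pair` function, and the `expected_span` and `tolerance` values are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class PairedEnd:
    start: int                  # reference coordinate of the leftmost mapped end
    end: int                    # reference coordinate of the rightmost mapped end
    expected_orientation: bool  # True if the ends map in the expected relative orientation


def classify_pair(pair: PairedEnd, expected_span: int = 3000, tolerance: int = 1000) -> str:
    """Classify one paired end against the reference (illustrative thresholds only)."""
    if not pair.expected_orientation:
        # Ends mapping in an unexpected relative orientation suggest an inversion.
        return "inversion"
    span = pair.end - pair.start
    if span > expected_span + tolerance:
        # Ends map further apart on the reference than the fragment is long:
        # the sample lacks sequence present in the reference.
        return "deletion"
    if span < expected_span - tolerance:
        # Ends map closer together than the fragment is long:
        # the sample carries sequence absent from the reference.
        return "insertion"
    return "concordant"


pairs = [
    PairedEnd(1000, 4000, True),
    PairedEnd(1000, 9000, True),
    PairedEnd(1000, 1500, True),
    PairedEnd(1000, 4000, False),
]
calls = [classify_pair(p) for p in pairs]
print(calls)
```

In practice a call would rest on several independent paired ends supporting the same event, and an insertion can only be detected this way if it is shorter than the fragment; both refinements are omitted here for brevity.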

4.2.6 Segmental duplications and identifying copy number variation

The presence of highly homologous, large segmental duplications flanking a region predisposes it to recurrent chromosomal rearrangements and genomic disease (Section 5.2.1). Sharp and colleagues investigated the role such duplications might play in generating structural variation within normal populations by designing a CGH array targeted at regions flanked by highly homologous intrachromosomal duplications (Sharp et al. 2005). One hundred and thirty such regions were identified, which the authors described as 'potential hotspots of recombination' or 'regions of potential genomic instability'. The authors analysed 47 ethnically diverse, phenotypically normal individuals and found 119 regions of copy number polymorphism, 73 of which were previously unknown. Copy number polymorphisms were found in 39% of hotspots, a four-fold enrichment in regions flanked by or containing large highly homologous segmental duplications. Equal numbers of deletions and duplications were found among the copy number polymorphisms.

In a follow-up study of larger numbers of normal individuals, Locke and colleagues analysed 269 individuals from the International HapMap Project compared to a single reference individual using their targeted CGH array. This again showed a clear association with copy number variation: 84 out of 130 hotspot regions showed copy number differences (Locke et al. 2006). The presence of parent-child trios among the individuals analysed also allowed heritability to be assessed, and a mendelian pattern of inheritance was observed for copy number polymorphisms. These studies provided important further evidence to support a key role for segmental duplications in mediating chromosomal rearrangements, leading to both genomic disease and structural variation within normal human populations. The approach also proved an efficient way of identifying new, likely pathogenic microdeletions and duplications among individuals with mental retardation (Section 5.4).
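The hotspot figures from the two studies can be put on a common footing with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; the variable names are illustrative, and the hotspot count derived for Sharp et al. is approximate because it is back-calculated from a rounded percentage.

```python
TOTAL_HOTSPOTS = 130  # regions flanked by segmental duplications on the array

# Sharp et al. 2005: 39% of hotspots were copy number polymorphic.
sharp_fraction = 0.39
sharp_hotspots_approx = round(sharp_fraction * TOTAL_HOTSPOTS)

# Locke et al. 2006: 84 of 130 hotspots showed copy number differences.
locke_hotspots = 84
locke_fraction = locke_hotspots / TOTAL_HOTSPOTS

print(f"Sharp et al.: about {sharp_hotspots_approx} of {TOTAL_HOTSPOTS} hotspots")
print(f"Locke et al.: {locke_fraction:.1%} of hotspots")
```

The larger fraction in the follow-up study is what would be expected from sampling 269 rather than 47 individuals, although the two experiments also differed in other respects.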

4.2.7 Structural versus nucleotide diversity

Surveys of structural variation have highlighted how, for any two individuals in a population, there is greater difference at the structural level than at the level of nucleotide diversity (Sebat 2007). Copy number variation between individuals has been conservatively estimated at 4 Mb of genetic difference in comparison with 2.5 Mb for SNPs; the contribution of copy number variation is likely to be significantly higher given our limited ability to detect small copy number variants with available technologies. The total genomic variability between people has been estimated at a difference of at least 0.2%, with more than 0.12% at a structural level and 0.08% at the nucleotide level (Sebat 2007). Structural variation also contributes significantly more to genetic diversity between species than single nucleotide substitutions (Cooper et al. 2007). Sequence divergence between humans and chimpanzees (Pan troglodytes) is ~1.2% based on 35 million fixed single nucleotide substitutions; this increases to ~5% when approximately 5 million structural variants involving gain or loss of DNA between species are considered (Fig. 4.8) (Cheng et al. 2005; Feuk et al. 2005; Newman et al. 2005).
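The estimates in this section can be compared directly with a short calculation. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and treating the difference between ~5% and ~1.2% as the structural contribution to human-chimpanzee divergence is a simplifying assumption made for this example.

```python
# Per-individual estimates quoted in the text (Sebat 2007).
cnv_mb, snp_mb = 4.0, 2.5                     # megabases of difference
structural_pct, nucleotide_pct = 0.12, 0.08   # percent of the genome

total_pct = structural_pct + nucleotide_pct   # combined variability
ratio_by_mb = cnv_mb / snp_mb                 # structural vs nucleotide, in Mb
ratio_by_pct = structural_pct / nucleotide_pct

# Human-chimpanzee divergence: ~1.2% from substitutions, ~5% overall.
substitution_divergence_pct = 1.2
total_divergence_pct = 5.0
structural_share = (
    total_divergence_pct - substitution_divergence_pct
) / total_divergence_pct

print(f"Total variability between people: {total_pct:.2f}%")
print(f"Structural:nucleotide ratio: {ratio_by_mb:.1f} (Mb), {ratio_by_pct:.1f} (percent)")
print(f"Structural share of human-chimpanzee divergence: {structural_share:.0%}")
```

Both ways of expressing the within-species comparison give a ratio of roughly 1.5 to 1.6 in favour of structural variation, while between species the structural share is larger still.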

4.3 Copy number variation and gene expression

Copy number variation may modulate levels of gene expression through effects on gene dosage in which there is loss or gain of functional gene copies (McCarroll et al. 2006), or through disruption of the gene or noncoding DNA sequences involved in control of gene expression (Kleinjan and van Heyningen 2005; Lee et al. 2006). The latter proved surprisingly common when a genome-wide analysis of the association between copy number variation and gene expression was published in 2007 (Stranger et al. 2007a). Stranger and colleagues analysed gene expression in resting lymphoblastoid cell lines established from 210 unrelated individuals in the International HapMap Project (Section 9.2.4) from four populations. They were able to analyse the association between expression levels of 14 925 transcripts from 14 072 genes and copy number variation; for this collection of cell lines, data on



[Figure residue: axis label 'Number of events'; size scale beginning at 1 bp]
