Figure 4.5 Examples of population differentiation in copy number variation indicating positive selection. Frequency histograms for individuals in different HapMap populations are shown for two examples, encompassing CCL3L1 and a region near to MAPT. Redrawn and reprinted by permission from Macmillan Publishers Ltd: Nature (Redon et al. 2006), copyright 2006.
McCarroll and colleagues describe how 'footprints' in the contiguous SNP data allow for identification of deletions: a deletion is likely if a cluster of SNPs deviates from the expected mendelian inheritance within a family trio, if a cluster of SNPs shows apparent deviation from Hardy-Weinberg equilibrium, or if there is a cluster of null genotypes (Fig. 4.6) (McCarroll et al. 2006). There is a high risk of false positives in such an approach due to artefacts or genotyping errors, but by looking for clusters and using appropriate statistical thresholds, the authors were able to analyse 1.3 million SNPs among 269 individuals of four ethnic groups from the Phase I HapMap (Box 9.1). This led to the identification of 541 candidate deletion variants with a median size of 7 kb (range 1-745 kb); of these, 278 occurred in multiple, unrelated individuals.
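As an illustration of the Hardy-Weinberg footprint, the following minimal Python sketch computes a chi-square statistic for each SNP and then looks for runs of consecutive flagged SNPs. It is not the authors' method: the function names, the threshold of 3.84 (the 5% critical value for one degree of freedom), and the minimum run length of three SNPs are assumptions chosen only for illustration.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 d.f.) for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs) to avoid dividing by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def flagged_runs(flags, min_run=3):
    """Return (start, end) index pairs for runs of at least min_run consecutive True flags."""
    runs = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(flags) - start >= min_run:
        runs.append((start, len(flags) - 1))
    return runs


# Hypothetical genotype counts (AA, AB, BB) for seven contiguous SNPs.
# SNPs 2-4 show no heterozygotes, as expected if hemizygous individuals
# over a deletion are miscalled as homozygous.
counts = [(25, 50, 25), (30, 45, 25), (50, 0, 50), (60, 0, 40),
          (45, 0, 55), (24, 52, 24), (26, 48, 26)]
THRESHOLD = 3.84  # assumed cut-off: 5% critical value, chi-square with 1 d.f.
flags = [hwe_chi_square(*c) > THRESHOLD for c in counts]
candidate_regions = flagged_runs(flags, min_run=3)
```

Requiring a run of flagged SNPs rather than acting on a single deviant SNP reflects the point made above: isolated deviations are more plausibly genotyping errors than deletions.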
McCarroll and colleagues validated 90 predicted variants using a range of FISH and PCR-based approaches. The approach allowed the identification of ten genes relatively commonly deleted among the populations studied, although the frequencies of the deleted alleles varied considerably between populations (Fig. 4.7) and the deletions were present in some cases as homozygous nulls. The observed variation in gene expression for GSTT1 and GSTM1, encoding glutathione S-transferases involved in detoxification by conjugation with glutathione (Section 4.4.3), and UGT2B17, encoding a uridine diphosphate glucuronosyltransferase involved in steroid hormone metabolism, was largely explained by the differences in gene dosage between individuals (Fig. 4.7).
Other investigators have also mined the dense genotyping data from HapMap to identify deletions. Conrad and colleagues looked for mendelian inconsistencies among trios to identify deletions transmitted to the child, and performed extensive validation using a custom tiling path array for comparative genome hybridization, which indicated a false discovery rate of 14% (Conrad et al. 2006). A total of 345 predicted deletions were found among 30 individuals of European descent (CEU) with a median size of 10.6 kb (range 0.3-404 kb), while 30 individuals of African origin (YRI) had 590 predicted deletions (median 8.5 kb, range 0.5-1200 kb), consistent with greater genetic diversity in African populations. The deletions included 267 known or predicted genes, of which 92 were completely deleted. Overall, the authors suggested a typical individual was hemizygous for between 30 and 50 deletions of greater than 5 kb in size (lower and upper range derived from European and African HapMap populations, respectively) (Conrad et al. 2006).
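The trio-based logic can be sketched in a few lines of Python. This is a simplified illustration rather than the published algorithm: genotypes are assumed to be two-character strings of allele labels, and the function names are hypothetical. A parent who is hemizygous A/- is typically called AA, and a child who inherits the deleted chromosome together with a B allele from the other parent is called BB, so the trio appears to violate mendelian inheritance.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's two alleles can be drawn one from each parent.

    Genotypes are two-character strings such as "AA", "AB" or "BB".
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def inconsistent_positions(trio_genotypes):
    """Indices of SNPs at which a (father, mother, child) trio is inconsistent."""
    return [i for i, (father, mother, child) in enumerate(trio_genotypes)
            if not is_mendelian_consistent(father, mother, child)]


# Hypothetical calls at five contiguous SNPs. At SNPs 1-3 the father is truly
# hemizygous (A/-) but is called "AA"; the child inherits the deletion and is
# called "BB", so these SNPs appear mendelian-inconsistent.
trio = [
    ("AB", "AB", "AA"),
    ("AA", "BB", "BB"),
    ("AA", "AB", "BB"),
    ("AA", "BB", "BB"),
    ("AB", "BB", "AB"),
]
suspect = inconsistent_positions(trio)
```

In this toy example the three adjacent inconsistent SNPs would mark a candidate transmitted deletion; a single inconsistent SNP would more plausibly be a genotyping error.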
Finally, Hinds and colleagues used an array CGH approach to analyse 24 unrelated individuals and find evidence of deletions based on comparison with the reference human genome (Hinds et al. 2006). The investigators screened 100-200 Mb of DNA and found 215 deletions ranging in size from 70 bp to 7 kb (median size of 0.75 kb); 100 of these deletions were validated (Hinds et al. 2006).
Combining the three studies by McCarroll, Conrad, and Hinds to screen for deletions among apparently normal individuals, 1000 deletions were identified. There was surprisingly little overlap between the studies in terms of the deletions identified, illustrating differences in approach and the likely large number of predominantly smaller deletions still to be identified (Eichler 2006). Greater diversity in deletions was noted for those of African descent, the deletions found were smaller than perhaps expected, and they were underrepresented on the X chromosome and in coding exons. Compared with SNPs, deletions appeared to be under more extreme selective pressure, with a greater number of rare variants (Eichler 2006).
Between 2004 and 2007, at least 12 surveys were published investigating the extent of human genomic structural variation among individuals with apparently normal phenotypes (summarized in Fig. 4.1). These studies were reviewed by Scherer and colleagues (2007), who noted the significant heterogeneity in many aspects of these and other studies of structural variation. These included the number of genomes studied, the tissue sources of DNA analysed, the different reference samples used for genome comparisons (Box 4.3), the range of technologies and approaches used for discovery (having very different resolutions and abilities to detect variants of particular sizes), the experimental quality controls used, and the extent to which putative structural variants were validated.
The difficulties of integrating current surveys of structural variation led Scherer and colleagues to call for improved standardization in the field with respect to
[Figure panel label: Clusters of mendelian inconsistency (due to hemizygous genotypes being miscalled as homozygous)]