
Scherer and colleagues define copy number variation as DNA segments greater than 1 kb in size in which a comparison of two or more genomes reveals gains (by insertion or duplication) or losses (by deletions or null genotypes) of genomic copy number relative to a designated reference genome sequence; copy number polymorphism is present when such variation is present in more than 1% of the reference or general population (Scherer et al. 2007).
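The definition above can be expressed as a small decision rule. The following is a minimal Python sketch, not part of the original text: the `Segment` record, its field names and the `classify` function are hypothetical names introduced for illustration, and only the two thresholds (1 kb and 1%) are taken from the definition of Scherer et al. (2007).

```python
from dataclasses import dataclass

# Thresholds taken from the definition quoted above (Scherer et al. 2007).
MIN_CNV_SIZE_BP = 1_000      # segments must be greater than 1 kb
POLYMORPHISM_FREQ = 0.01     # variation present in more than 1% of the population


@dataclass
class Segment:
    """Hypothetical record for one DNA segment (illustrative field names)."""
    size_bp: int            # length of the segment in base pairs
    copies: int             # copy number observed in the test genome
    reference_copies: int   # copy number in the designated reference genome
    carriers: int           # individuals in the population carrying the variant
    population: int         # total individuals surveyed


def classify(seg: Segment) -> str:
    """Label a segment according to the size and frequency criteria."""
    if seg.size_bp <= MIN_CNV_SIZE_BP or seg.copies == seg.reference_copies:
        return "not a CNV"
    kind = "gain" if seg.copies > seg.reference_copies else "loss"
    frequency = seg.carriers / seg.population
    if frequency > POLYMORPHISM_FREQ:
        return f"copy number polymorphism ({kind})"
    return f"copy number variant ({kind})"
```

For example, a 5 kb segment present in three copies against a two-copy reference, carried by 30 of 1000 individuals, would be labelled a copy number polymorphism (gain) under this rule, whereas the same segment carried by only 5 of 1000 individuals would be a copy number variant.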

technological advances to detect submicroscopic structural variation at a genome-wide level. These include microarray-based comparative genomic hybridization (array CGH) (Box 4.2) using oligonucleotides (Sebat et al. 2004; Hinds et al. 2005) and bacterial artificial chromosome (BAC) clones (Iafrate et al. 2004; Sharp et al. 2005; Redon et al. 2006; Wong et al. 2007), comparing clone paired-end sequence to the reference human assembly sequence (Tuzun et al. 2005) or human genome assemblies (Khaja et al. 2006) and detection of deletions and duplications based on single nucleotide polymorphism (SNP) mapping (Conrad et al. 2006; McCarroll et al. 2006) (Fig. 4.1). These techniques have bridged the gap between classical cytogenetic techniques using microscopy to detect structural variation greater than 3-15 Mb in size (depending on the banding pattern of the chromosome), and molecular approaches for the detection of small scale variation including targeted fluorescence in situ hybridization (FISH) (25-300 kb), multiplex ligation-dependent probe amplification (MLPA), quantitative polymerase chain reaction (PCR), and DNA sequencing (1-700 bp) that can detect down to the resolution of single nucleotide changes.

4.2.1 Copy number variation is common within normal populations

In 2004 two landmark papers were published which demonstrated that among apparently phenotypically normal individuals there was a much higher level of large scale copy number variants than previously appreciated (Iafrate et al. 2004; Sebat et al. 2004). Both research teams used microarray technology to hybridize DNA from panels of healthy people and identify copy number variation based

Box 4.2 Using DNA microarrays to analyse copy number variation

Array CGH has proved a powerful method of identifying copy number variation (reviewed in Carter 2007). Conventional metaphase CGH has its origins in tumour biology, in particular analysis of metaphase chromosomes (a stage in mitosis when chromosomes are condensed and centrally aligned prior to separation into two daughter cells) (Kallioniemi et al. 1992). Test and reference DNA samples were labelled with different fluorochromes and analysed by FISH. The relative intensities of fluorescence for the two fluorochromes analysed along the chromosomes allowed the detection of regions of gain or loss but afforded only low resolution, down to 5-10 Mb and often less, at telomeres. Arrays spotted with large insert clone DNAs originally prepared for sequencing as part of the Human Genome Project were subsequently used and provided much improved resolution, with test and reference DNAs hybridized together to the array after each was fluorescently labelled with a different dye (Pinkel et al. 1998). Iafrate and colleagues, for example, used one BAC clone every 1 Mb (Iafrate et al. 2004). Using BACs provides genome-wide coverage and low noise but the resolution is still limited (maximum theoretical resolution is the size of one BAC, ~100-300 kb, but ~2-3 Mb in practice); this can be improved with fosmid and cosmid clones and cDNA clones, and PCR products have also been used. However, to date the highest resolutions are achieved with oligonucleotide arrays, theoretically as low as 5 kb using the highest density arrays containing, for example, 2 million probes (Nimblegen HD2) (Carter 2007). Oligonucleotide arrays nonetheless have limitations in terms of signal to noise ratio and coverage of regions rich in low copy repeats (LCRs) and segmental duplications. Genotyping (SNP) arrays have also been used successfully, based on hybridizing a single DNA sample and determining relative copy number variation compared with a reference set of individuals (Redon et al. 2006).
The challenge remains how to identify copy number variation in the range 500 bp to 5 kb and this may be facilitated by lower cost high throughput sequencing by synthesis on arrays, for example the Roche-Nimblegen 454 and Illumina (Solexa) systems (Bentley 2006).
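The principle of comparing test and reference fluorescence described in this box can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the method of any study cited here: the function names, the input layout and the ±0.3 log2 thresholds are hypothetical choices, and real analyses normalize the intensities and apply dedicated segmentation algorithms before calling gains and losses.

```python
import math

# Assumed, illustrative thresholds on the log2(test/reference) ratio.
# In an ideal diploid sample, 3 copies vs 2 gives log2(1.5), about 0.58,
# and 1 copy vs 2 gives log2(0.5) = -1.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def call_probe(test_intensity: float, reference_intensity: float) -> str:
    """Call one array probe from its two fluorescence intensities."""
    ratio = math.log2(test_intensity / reference_intensity)
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "no change"


def call_regions(probes):
    """Merge consecutive probes with the same call into regions.

    probes: list of (position_bp, test_intensity, reference_intensity),
    sorted by position along one chromosome.
    Returns a list of (start_bp, end_bp, call) for gains and losses only.
    """
    regions = []
    for position, test, reference in probes:
        call = call_probe(test, reference)
        if regions and regions[-1][2] == call:
            # Extend the current region to this probe.
            regions[-1] = (regions[-1][0], position, call)
        else:
            regions.append((position, position, call))
    return [region for region in regions if region[2] != "no change"]


# Hypothetical intensities for five probes spaced 1 kb apart.
example_probes = [
    (1000, 100.0, 100.0),
    (2000, 150.0, 100.0),
    (3000, 160.0, 100.0),
    (4000, 100.0, 100.0),
    (5000, 50.0, 100.0),
]
example_regions = call_regions(example_probes)
```

In this sketch the size of a called region is bounded by the spacing of the probes, which is why denser arrays give the higher resolution discussed above.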

Reference | Coverage | Analysis | Number of individuals | Number of events or regions | Median size (bp) | Total bp
Mills 2006 | 16 million whole genome shotgun traces | Alignment of sequence traces from SNP Consortium resequencing
