
Figure 6.14 Insertion/deletion (indel) polymorphisms in the human genome. Data mining of resequencing traces from three diverse human populations involving 36 subjects revealed 415 436 non-redundant indels (Mills et al. 2006). Figure prepared using data from Mills et al. (2006), with permission from Cold Spring Harbor Laboratory Press.

330 candidate genes in 24 individuals of African descent and 23 individuals of European descent revealed 33 829 nucleotide substitutions and 2393 indels. On average, seven indels were found per gene, occurring once every 2714 bases and ranging in length from 1 to 543 bp; 46% of indels involved a single base pair and 84% were less than 5 bp. The authors noted that the allele frequencies and patterns of linkage disequilibrium for indels were similar to those of the nucleotide substitutions. In a later study, Bhangale and coworkers applied their algorithm for automated detection and genotyping of indels from sequencing traces to regions of the genome functionally characterized in the ENCODE (ENCyclopedia Of DNA Elements) Project (Section 9.2.4) (Bhangale et al. 2006).

Mills and colleagues have recently produced a first map of indel variation across the human genome by using a computational strategy to mine DNA resequencing traces (Mills et al. 2006). These traces were generated by SNP discovery projects involving 36 individuals from three diverse human population groups. Their map contains 415 436 unique indels (Fig. 6.14). About half could be mapped onto the chimpanzee genome sequence, which served as a reference ancestral sequence: on this basis, 47% were insertions and 53% deletions. The indels ranged in length from 1 to 9989 bp. Approximately one-third of the indels involved the deletion or insertion of a single base pair; of these, 84% were A:T or T:A. A further third of the indels were either monomeric base pair expansions or multibase repeat expansions involving 2-15 bp repeat units.
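The use of the chimpanzee sequence to decide whether an indel is an insertion or a deletion can be illustrated with a minimal sketch. It assumes, as a simplification, that the allele shared with chimpanzee is ancestral; the function name `polarize_indel` and the allele strings are hypothetical and are not taken from Mills et al. (2006).

```python
def polarize_indel(allele_a: str, allele_b: str, chimp_allele: str) -> str:
    """Label a two-allele human indel as an insertion or a deletion.

    Assumption: the human allele that matches the chimpanzee sequence
    is treated as ancestral, and the other allele as derived.
    """
    if chimp_allele == allele_a:
        ancestral, derived = allele_a, allele_b
    elif chimp_allele == allele_b:
        ancestral, derived = allele_b, allele_a
    else:
        # Chimpanzee matches neither human allele: direction unknown.
        return "unresolved"

    if len(derived) > len(ancestral):
        return "insertion"
    if len(derived) < len(ancestral):
        return "deletion"
    return "unresolved"  # equal lengths: not an indel


# Chimpanzee carries the short allele, so the long allele arose by insertion.
print(polarize_indel("A", "AT", "A"))
# Chimpanzee carries the long allele, so the short allele arose by deletion.
print(polarize_indel("A", "AT", "AT"))
```

Indels for which the chimpanzee sequence matches neither human allele stay unresolved in this sketch, which mirrors the fact that only about half of the indels could be polarized in this way.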

Such repeats have been of great utility as genetic markers, for example (CA)n repeat expansions, as reviewed in Chapter 7; they may also be associated with significant phenotypes, as seen with the trinucleotide (CGG)n repeat expansion of more than 200 repeats at the FMR1 gene associated with fragile X mental retardation syndrome (Box 7.8) (Penagarikano et al. 2007). The remaining indels contained either random DNA sequence or, in a very small minority, transposon insertions.

The dataset from Mills and colleagues indicated an indel occurring on average once every 7.2 kb. Some genomic regions appeared to be 'hotspots' of diversity with a much higher frequency of indels (up to 24-fold); increased SNP diversity was found at the same sites. Overall, it was estimated that indels accounted for 15.6% of discovered polymorphisms and that human populations were likely to have approximately 1.5 million indels.
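As a rough consistency check on these figures, the short calculation below multiplies the number of mapped indels by the average spacing, and divides that spacing by the maximum fold enrichment reported for hotspots. It is back-of-envelope arithmetic on the numbers quoted in the text, not a reconstruction of the authors' analysis; the variable names are illustrative.

```python
# Figures quoted in the text (Mills et al. 2006).
n_indels = 415_436          # unique indels in the map
mean_spacing_bp = 7_200     # one indel per 7.2 kb on average
max_hotspot_fold = 24       # hotspots show up to 24-fold more indels

# Total sequence length implied by the count and the average spacing.
implied_span_bp = n_indels * mean_spacing_bp

# Average spacing between indels in the most extreme hotspots.
hotspot_spacing_bp = mean_spacing_bp / max_hotspot_fold

print(f"Implied span: {implied_span_bp / 1e9:.2f} Gb")
print(f"Hotspot spacing: one indel per {hotspot_spacing_bp:.0f} bp")
```

The implied span of about 3 Gb is close to the size of the human genome, so the quoted density is consistent with a genome-wide average, while the most extreme hotspots would carry roughly one indel every 300 bp.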

6.6.3 Functional consequences of indels

When indels occur in coding DNA, a change in the amino acid sequence will always occur, in contrast to single nucleotide substitutions, which can be synonymous. The change may insert or delete one or more amino acids, or cause a frameshift and loss of protein function. Over the course of this book many examples of such events are described, including in relation to the globin genes (Section 1.3.5) and the delta-F508 mutation (p.F508del) in cystic fibrosis (Section 2.3.1). Other examples include a heterozygous ACCC deletion (c.989_992delACCC) in the PAX8 gene on chromosome 2q12-q14, which results in a frameshift and premature stop codon, truncating the encoded protein and rendering it transcriptionally inactive. PAX8 is an important transcription factor involved in thyroid cell proliferation and development, and this deletion mutation is associated with thyroid dysfunction (de Sanctis et al. 2004). A 27 bp deletion in the GPIBA gene (encoding the platelet glycoprotein 1b receptor for von Willebrand factor) on chromosome 17 has been associated with a severe bleeding disorder, platelet-type von Willebrand's disease (Othman et al. 2005). As noted previously, there is evidence of strong selective pressures acting over time on indels occurring in coding DNA: since the divergence of humans and chimpanzees, the occurrence of indels in coding DNA has been found to be highly suppressed compared with intergenic or intronic DNA (Chen et al. 2007; de la Chaux et al. 2007; Messer and Arndt 2007).
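The distinction between gaining or losing whole amino acids and shifting the reading frame depends on whether the indel length is a multiple of three. A minimal sketch follows, using the examples from this section; the function name `coding_consequence` is hypothetical, and the 3 bp length given for p.F508del reflects the loss of a single codon's worth of sequence. The rule considers length only and ignores other effects, such as an in-frame indel that happens to create a stop codon or disrupt a splice site.

```python
def coding_consequence(indel_length_bp: int) -> str:
    """Classify a coding indel by length alone.

    Lengths divisible by 3 preserve the reading frame;
    all other lengths shift it.
    """
    if indel_length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return "in-frame" if indel_length_bp % 3 == 0 else "frameshift"


examples = {
    "CFTR p.F508del (3 bp deletion)": 3,
    "PAX8 c.989_992delACCC (4 bp deletion)": 4,
    "GPIBA 27 bp deletion": 27,
}

for name, length in examples.items():
    print(f"{name}: {coding_consequence(length)}")
```

By this rule the PAX8 deletion shifts the reading frame, in keeping with the premature stop codon described above, whereas the CFTR and GPIBA deletions remove whole codons.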

When indels occur in the promoter and other regions of DNA important to regulation of gene expression, the consequences can also be marked. For example, a dramatic effect of a single base pair indel is seen at the MMP1 (encoding matrix metalloproteinase 1) gene promoter: the presence of a G insertion (AAGAT to AAGGAT) 1607 nt upstream of the transcriptional start site created an erythroblast transformation-specific (Ets) transcription factor binding site (GGA) and was associated with increased transcriptional activity (Rutter et al. 1998). The '2G' indel was present at a high frequency in Caucasian individuals (allele frequency 0.5) and even higher among cancer cell lines where a copy of the 2G allele was present in seven out of eight lines tested. A number of studies have since demonstrated highly significant associations with cancer, notably ovarian (Kanamori et al. 1999), lung (Zhu et al. 2001), and colorectal cancer (Ghilardi et al. 2001; Hinoda et al. 2002).
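How a single inserted base can create a transcription factor binding site can be shown with a simple string search on the two allele sequences quoted above. This is an illustrative sketch only: it looks for the GGA core on the strand as written and does not model flanking sequence or binding affinity. The names `ETS_CORE` and `has_motif` are hypothetical.

```python
ETS_CORE = "GGA"  # core Ets binding motif quoted in the text


def has_motif(sequence: str, motif: str = ETS_CORE) -> bool:
    """Return True if the motif occurs in the sequence as written."""
    return motif in sequence.upper()


allele_1g = "AAGAT"   # allele without the inserted G
allele_2g = "AAGGAT"  # allele with the G insertion ('2G')

for label, seq in (("1G", allele_1g), ("2G", allele_2g)):
    print(f"{label} allele {seq}: Ets core present = {has_motif(seq)}")
```

Only the 2G allele contains the GGA core, which is consistent with the increased transcriptional activity reported for that allele.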

In the promoter region of the NFKB1 gene on chromosome 4q24, which encodes the p105/p50 isoforms of the NF-κB transcription factor protein, an ATTG indel was found 94 nt upstream of the transcriptional start site (Karban et al. 2004). This polymorphism modulated protein-DNA interactions and transcriptional activity in a reporter gene system and showed association with risk of the inflammatory bowel disease ulcerative colitis (Karban et al. 2004) and sporadic cancers (Lewander et al. 2007); other investigators, however, found no association with susceptibility to ulcerative colitis (Mirza et al. 2005; Oliver et al. 2005). Another example of an indel modulating gene expression is a 6 bp deletion in CASP8, which has been associated with loss of a binding site for the transcription factor stimulatory protein 1 (Sp1) and with cancer risk (Sun et al. 2007), although this remains controversial (Frank et al. 2007; Haiman et al. 2008).

6.7 Summary

Deletion and duplication events have led to much of the diversity we see today in our genomic landscape and are thought to have been a major force in enabling evolutionary change. Segmental duplications are common, comprising an estimated 5% of the human genome, with an excess noted on particular chromosomes such as chromosomes 15 and 22, and within specific chromosomal regions, notably subtelomeric and pericentromeric regions (Bailey
