Contents

Preface xvii

Acknowledgements xix

1 Lessons from haemoglobin 1

1.1 Introduction 1

Box 1.1 Haemoglobinopathies 1

1.2 Genetic variation and a molecular basis for disease 3

1.2.1 A difference at the protein level between haemoglobin molecules 3

Box 1.2 Sickle cell disease 3

Box 1.3 Genotype and phenotype 3

Box 1.4 Chromosomes 4

Box 1.5 Genes 5

Box 1.6 An amino acid difference responsible for Hb S 6

1.2.2 Mendelian inheritance, alleles and traits 6

Box 1.7 Phenotype of sickle cell disease 7

Box 1.8 Alleles 8

1.2.3 Sequencing the HBB gene and defining the variant responsible for Hb S 9

Box 1.9 DNA structure 10

Box 1.10 DNA sequencing 12

Box 1.11 Transcription 14

Box 1.12 Translation 14

Box 1.13 Nomenclature to describe sequence variants 19

Box 1.14 Mutation and polymorphism 24

1.2.4 Methods of detecting the Hb S DNA sequence variant 24

1.3 Genetic diversity involving the globin genes 27

1.3.1 Structural variants of haemoglobin and the thalassaemias 27

1.3.2 HBB sequence diversity and sickle cell disease 30

Box 1.15 Alpha thalassaemia 31

Box 1.16 Beta thalassaemia 31

1.3.3 Transitions versus transversions 32

Box 1.17 CpG dinucleotides and C to T transitions 32

1.3.4 Synonymous versus nonsynonymous changes 33

Box 1.18 Codon position and degeneracy 33

1.3.5 Insertions or deletions may result in frameshift events 35

1.3.6 Deletions, duplications, and copy number variation 36

Box 1.19 Copy number variation 37

1.3.7 Gene fusion 37

1.3.8 Sequence variation, RNA splicing, and RNA processing 38

Box 1.20 Splicing 38

1.3.9 Sequence diversity in noncoding DNA modulating gene expression 40

1.3.10 Tandem repeats 43

Box 1.21 Tandemly repeated DNA 43

1.3.11 Mobile DNA elements and chromosomal rearrangements 43

1.3.12 Monosomy and trisomy of the terminal end of chromosome 16p 44

Box 1.22 Translocations 44

Box 1.23 Alpha thalassaemia and mental retardation 45

1.4 Diversity across the genome 46

1.4.1 Classifying genetic variation 46

1.4.2 Sequencing the human genome 48

1.4.3 Repetitive DNA sequences are common 50

1.4.4 Whose genome was sequenced? 50

1.4.5 Resequencing diploid human genomes 50

Box 1.24 Next generation sequencing 51

1.5 Summary 52

1.6 Reviews 52

2 Finding genes and specific genetic variants responsible for disease 53

2.1 Introduction 53

2.2 Linkage analysis 53

2.2.1 Defining linkage 53

Box 2.1 Meiosis 54

Box 2.2 Homologous recombination (crossing over) 54

2.2.2 Genetic markers 54

2.3 Application of linkage analysis and positional cloning to mendelian diseases 56

Box 2.3 Recombination fraction and genetic distance 58

Box 2.4 Lod score 58

Box 2.5 Positional cloning 58

2.3.1 Cystic fibrosis and the delta-F508 mutation 59

Box 2.6 Cystic fibrosis 59

2.3.2 Treacher Collins syndrome 60

Box 2.7 Treacher Collins-Franceschetti syndrome 60

2.3.3 Linkage disequilibrium mapping and mendelian disease 61

Box 2.8 Linkage disequilibrium and haplotypes 62

Box 2.9 Diastrophic dysplasia 64

2.3.4 Allelic and genetic heterogeneity in mendelian diseases 64

2.3.5 Linkage analysis and common disease 66

2.4 Genetic association studies and common disease 66

2.4.1 A small number of robustly demonstrated associations? 67

2.4.2 Study design and statistical power 67

2.4.3 Genetic admixture and association with disease 70

2.4.4 TNF and candidate gene association studies 72

2.5 Alzheimer's disease 72

Box 2.10 Alzheimer's disease 73

2.5.1 Early onset familial Alzheimer's disease: rare variants underlying a mendelian trait 74

2.5.2 APOE ε4 and late onset Alzheimer's disease 77

2.6 Common and rare genetic variants associated with venous thrombosis 80

Box 2.11 Thrombophilia and venous thrombosis 80

2.6.1 Factor V Leiden 81

2.6.2 Genetic diversity and thrombophilia: insights and applications 82

2.7 Summary 83

2.8 Reviews 84

3 Cytogenetics and large scale structural genomic variation 85

3.1 Introduction 85

3.2 A historical perspective on cytogenetics 85

Box 3.1 Electronic resources and databases of human structural genomic variation 86

3.3 Chromosomal diversity involving gain or loss of complete chromosomes 89

3.3.1 Constitutional and somatic variation in chromosome number 89

Box 3.2 Aneuploidy and polyploidy 89

3.3.2 Chromosomal abnormalities and development 90

3.3.3 Polyploidy 90

3.3.4 Trisomy 91

3.3.5 Monosomy 91

Box 3.3 Down syndrome 92

Box 3.4 Klinefelter syndrome and sex chromosome aneuploidy 93

3.4 Translocations 93

3.4.1 Reciprocal translocations 93

Box 3.5 Turner syndrome 94

3.4.2 Robertsonian translocations 94

Box 3.6 Palindromic AT-rich repeats and recurrent reciprocal translocation 97

Box 3.7 Duchenne muscular dystrophy 97

3.5 Chromosomal rearrangements 98

3.5.1 Large scale structural variation resulting from intrachromosomal rearrangements 98

3.5.2 Genomic disorders 99

3.5.3 Marker chromosomes 99

3.5.4 Isochromosomes 99

3.6 Summary 101

Box 3.8 Cat eye syndrome 102

3.7 Reviews 103

4 Copy number variation in health and susceptibility to disease 105

4.1 Introduction 105

4.2 Surveys of copy number variation 105

Box 4.1 Copy number variation and polymorphism 105

Box 4.2 Using DNA microarrays to analyse copy number variation 106

4.2.1 Copy number variation is common within normal populations 106

4.2.2 Towards a global map of copy number variation 108

4.2.3 Finding deletions across the genome within normal human populations 112

4.2.4 Integrating surveys of structural variation 113

Box 4.3 Reference DNA and copy number variants 115

4.2.5 Extent of copy number variation 116

4.2.6 Segmental duplications and identifying copy number variation 116

4.2.7 Structural versus nucleotide diversity 117

4.3 Copy number variation and gene expression 117

4.4 Copy number variation, diet, and drug metabolism 118

4.4.1 Duplication of the salivary amylase gene and high starch diet 118

4.4.2 Copy number variation and drug metabolism: role of CYP2D6 119

Box 4.4 CYP genes encode cytochrome P450 enzymes 119

Box 4.5 Consequences of CYP2D6 duplication: a case report 120

4.4.3 Whole gene deletions of glutathione S-transferase enzymes, catalytic activity, and cancer risk 121

4.5 Copy number variation and susceptibility to common multifactorial disease 121

4.5.1 Psoriasis risk and β-defensin gene copy number 121

Box 4.6 Psoriasis 121

4.5.2 Copy number variation of FCGR3B and susceptibility to autoimmune disease 122

4.5.3 CCL3L1, HIV, and autoimmunity 122

4.5.4 Copy number and complement genes 123

4.6 Summary 123

4.7 Reviews 124

5 Submicroscopic structural variation and genomic disorders 125

5.1 Introduction 125

5.2 Genomic disorders 125

5.2.1 Segmental duplications and genomic disorders 125

Box 5.1 Genomic disorders 125

5.2.2 Recurrent rearrangements involving chromosome 22q11 126

Box 5.2 DiGeorge syndrome and velocardiofacial syndrome 128

Box 5.3 Williams-Beuren syndrome 128

5.2.3 Reciprocal genomic disorders 128

Box 5.4 Charcot-Marie-Tooth disease 130

5.2.4 Non-recurrent genomic disorders 131

Box 5.5 Parkinson's disease 131

5.2.5 Genomic disorders and control of gene expression 132

5.2.6 Genomic disorders showing parent of origin effects 132

5.3 Terminal deletions and subtelomeric disease 132

Box 5.6 Prader-Willi syndrome 133

Box 5.7 Angelman syndrome 133

Box 5.8 Terminal deletion of chromosome 1p36 syndrome 135

Box 5.9 Cri du chat syndrome 135

5.4 Pathogenic copy number variation, mental retardation, and autism 136

5.4.1 Subtelomeric rearrangements and idiopathic mental retardation 136

5.4.2 Copy number variation among cases of mental retardation 136

Box 5.10 De novo deletion at 17q21.31 and mental retardation 137

Box 5.11 Autism spectrum disorders 138

5.4.3 De novo copy number mutations and autism 138

5.5 Inversions in health and disease 139

5.5.1 Inversions may cause severe disease 139

Box 5.12 Haemophilia A 139

5.5.2 Inversion and deletion at 17q21.31 with evidence of selection 140

5.5.3 Finding inversions across the human genome 140

5.6 Summary 141

5.7 Reviews 141

6 Segmental duplications and indel polymorphisms 143

6.1 Introduction 143

6.2 Nature and extent of segmental duplications 143

6.2.1 Segmental duplications are common in the human genome 143

Box 6.1 Terminology relating to duplication events 143

6.2.2 Pericentromeric and subtelomeric regions are hotspots for segmental duplications 144

6.2.3 Non-allelic homologous recombination, segmental duplications and genomic disorders 147

6.2.4 Alu elements and segmental duplications 147

6.2.5 Segmental duplications in primates and other species 147

6.3 Duplication and evolution 148

6.3.1 Whole genome duplications 148

6.3.2 Gene creation 148

Box 6.2 Juxtaposition of segmental duplications leading to new genes 149

6.3.3 Duplication rates over evolutionary timescales 150

6.3.4 Evolutionary fate of duplicated genes 150

Box 6.3 Pseudogenes 151

6.4 Gene duplication and multigene families 151

6.4.1 Olfactory receptor and globin supergene families 151

6.4.2 Models for the evolution of multigene families 152

Box 6.4 Ribosomal RNA 154

Box 6.5 Gene conversion 154

Box 6.6 CASP12 gene duplication and selective advantage of an inactive pseudogene 155

6.4.3 Immunoglobulin gene families 155

6.5 Segmental duplication, deletion, and gene conversion 156

6.5.1 Lessons from the study of the genetics of colour vision 156

Box 6.7 Trichromatic colour vision 158

Box 6.8 Red-green colour vision defects 158

Box 6.9 Rhesus blood group and disease 160

6.5.2 Rhesus blood groups: genetic diversity involving duplication and deletion 161

6.6 Insertion/deletion polymorphisms: 'indels' 161

Box 6.10 Indels 162

6.6.1 Human-specific indels and selection 163

6.6.2 Mapping the extent of indel polymorphism 163

6.6.3 Functional consequences of indels 165

6.7 Summary 165

6.8 Reviews 166

7 Tandem repeats 167

7.1 Introduction 167

7.2 Satellite DNA 167

Box 7.1 Satellites, minisatellites, and microsatellites 168

7.2.1 A functional role for satellite DNA? 169

7.2.2 Satellite repeats and disease 169

Box 7.2 Facioscapulohumeral muscular dystrophy 169

7.3 Minisatellite DNA 170

7.3.1 Polymorphic minisatellites 170

Box 7.3 Telomeres and tandem repeats 170

7.3.2 Functional effects of minisatellites 171

7.3.3 Minisatellites and disease: examples from epilepsy and diabetes 171

Box 7.4 Progressive myoclonic epilepsy 172

Box 7.5 INS variable number tandem repeat 172

7.4 Genetic profiling using mini- and microsatellites 173

7.4.1 DNA fingerprinting 173

7.4.2 Genetic profiling using a panel of short tandem repeats 174

7.5 Microsatellite DNA 174

7.5.1 Short tandemly repeated DNA sequences are common and polymorphic 174

7.5.2 Classification and occurrence of microsatellites 174

7.5.3 Generation and loss of microsatellite DNA 176

Box 7.6 Genome-wide survey of human microsatellites 177

7.5.4 Utility of microsatellite markers 179

7.5.5 Functional consequences of microsatellites 180

Box 7.7 Hereditary non-polyposis colorectal cancer and microsatellite instability 181

7.6 Unstable repeats and neurological disease 182

7.6.1 Trinucleotide repeat expansion and loss of function: lessons from fragile X and Friedreich's ataxia 184

Box 7.8 Fragile X syndrome 184

Box 7.9 Fragile X tremor/ataxia syndrome 186

Box 7.10 Friedreich's ataxia 187

7.6.2 Polyglutamine disorders 187

Box 7.11 Spinocerebellar ataxia 188

Box 7.12 Spinal and bulbar muscular atrophy (Kennedy's disease) 189

Box 7.13 Huntington's disease 189

Box 7.14 Predictive testing for Huntington's disease 190

7.6.3 Disease resulting from RNA-mediated gain of function: myotonic dystrophy 190

Box 7.15 Myotonic dystrophy 191

7.7 Summary 191

7.8 Reviews 193

8 Mobile DNA elements 195

8.1 Introduction 195

Box 8.1 Mobile DNA elements 195

8.2 DNA transposons: a fossil record in the genome 198

Box 8.2 Transposable elements and exaptation 199

8.3 L1 retrotransposable elements 199

Box 8.3 SETMAR and gene fusion 201

8.4 Alu elements: parasites of L1s 202

8.4.1 Extent and diversity of Alu elements 202

8.4.2 Consequences of Alu insertions 202

8.5 Mobile DNA elements and human population genetics 204

Box 8.4 Diversity in recent Alu and L1 insertions 204

8.5.1 Genetic diversity and human origins 204

Box 8.5 Recent African Origins hypothesis 206

8.6 Summary 207

8.7 Reviews 209

9 SNPs, HapMap, and common disease 211

9.1 Introduction 211

9.2 SNPs, association, and genetic susceptibility to common disease 211

9.2.1 Strategic approaches 212

9.2.2 Surveying SNP diversity: lessons from the SNP Consortium and Human Genome Project 213

9.2.3 Haplotype blocks and haplotype tagging SNPs 214

9.2.4 The International HapMap Project 215

Box 9.1 Phase I HapMap populations 217

9.2.5 Large scale SNP mapping and insights into recombination 221

Box 9.2 SNP-related databases 222

9.3 Genome-wide association studies 224

9.3.1 Insights into the design, analysis, and interpretation of genome-wide association studies 226

9.3.2 The Wellcome Trust Case Control Consortium study of seven common diseases 230

9.3.3 FTO and common obesity traits 234

9.4 Age-related macular degeneration 234

Box 9.3 Age-related macular degeneration 236

9.4.1 Genome-wide association, linkage, and complement factor H gene 237

9.4.2 Disease association at the 10q26 susceptibility locus 238

9.5 Lessons from inflammatory bowel disease 240

9.5.1 A role for inherited factors in inflammatory bowel disease 240

Box 9.4 Inflammatory bowel disease 241

9.5.2 Crohn's disease and variants of the NOD2 gene 241

Box 9.5 NOD2 and genome-wide association studies 245

9.5.3 Linkage studies and other inflammatory bowel disease susceptibility loci 245

9.5.4 Genome-wide association studies and inflammatory bowel disease 248

9.5.5 Genetic diversity in IL23R pathway genes and inflammatory bowel disease 248

9.5.6 Autophagy and Crohn's disease 252

Box 9.6 Autophagy 253

9.5.7 Insights from the Wellcome Trust Case Control Consortium study 253

9.5.8 Gene deserts and other loci 254

9.6 Summary 255

9.7 Reviews 256

10 Fine scale sequence diversity and signatures of selection 257

10.1 Introduction 257

10.2 Genetic diversity and evidence of selection 257

10.2.1 Hitch-hiking and selective sweeps 258

10.2.2 Extended haplotypes of high frequency 258

10.2.3 Differences in allele frequency 258

10.2.4 Comparisons between species 260

10.3 Evidence for selection at a nucleotide level from sequencing the chimpanzee and macaque genomes 261

10.3.1 Diversity between the human and chimpanzee genomes 261

Box 10.1 Sequencing the chimpanzee genome 262

10.3.2 Sequencing the rhesus macaque provides new insights into genetic diversity and selection 264

Box 10.2 Sequencing the macaque genome 264

10.4 Lactase persistence 265

Box 10.3 Lactase persistence, adult-type hypolactasia, and congenital lactase deficiency 265

10.4.1 Genetic diversity and lactase persistence in European populations 266

10.4.2 Different alleles show association among African pastoralists 269

10.4.3 Lactase persistence in Middle Eastern populations 269

10.4.4 Diversity within an enhancer region modulating LCT expression 270

10.5 Human pigmentation, diversity at SLC24A5, and insights from zebrafish 271

10.5.1 Genetics of pigmentation 271

10.5.2 Golden zebrafish mutants led to the identification of SLC24A5 271

10.5.3 Variation at SLC24A5, skin pigmentation, and evidence of selection 272

10.6 Genome-wide analyses 272

10.7 Summary 275

10.8 Reviews 276

11 Genetics of gene expression 277

11.1 Introduction 277

11.2 Variation in gene expression is common and heritable 278

Box 11.1 Lymphoblastoid cell lines 279

11.3 Mapping the genetic basis of variation in gene expression 279

11.3.1 Genetical genomics 279

11.3.2 Local and distant regulatory variation 280

Box 11.2 Quantitative trait loci 281

11.3.3 Genetical genomics and model organisms 281

11.4 Mapping genetic variation and gene expression in human populations 285

11.4.1 Insights from lymphoblastoid cell lines 285

11.4.2 Insights into genetic susceptibility to asthma 288

Box 11.3 Asthma 289

11.4.3 Genetics of gene expression in primary human cells and tissues 291

Box 11.4 Genetic determinants of HDL cholesterol 293

11.5 Allele-specific gene expression 294

11.5.1 Allele-specific gene expression among autosomal non-imprinted genes 294

Box 11.5 Genomic imprinting 295

Box 11.6 Quantification of allele-specific gene expression 296

11.5.2 Large scale analysis of allele-specific gene expression 298

11.5.3 Allele-specific expression based on RNA polymerase loading 299

Box 11.7 Chromatin immunoprecipitation 300

11.6 Genetic variation and alternative splicing 300

11.6.1 Alternative splicing in health and disease 301

11.6.2 Common genetic variation and alternative splicing 301

11.7 Genetic variation: from transcriptome to proteome 303

Box 11.8 Genetic variation and IL6 receptor levels 305

11.8 Summary 306

11.9 Reviews 307

12 Extreme diversity in the major histocompatibility complex 309

12.1 Introduction 309

12.2 MHC genes, the immune response, and disease 309

12.2.1 MHC class I and II molecules 310

Box 12.1 Class I molecules and antigen presentation via the endogenous pathway 311

Box 12.2 Class II molecules and antigen presentation via the exogenous pathway 312

12.2.2 Biological complexity among the many genes found in the human MHC 312

12.2.3 Genetic diversity in the MHC and disease: insights from rheumatoid arthritis 313

Box 12.3 Non-MHC disease associations with rheumatoid arthritis 315

12.3 Polymorphism, haplotypes, and disease 316

12.3.1 Infectious disease, selection, and maintenance of MHC polymorphism 316

12.3.2 Ancestral haplotypes 317

12.3.3 Abacavir hypersensitivity 317

Box 12.4 Functional consequences of the 8.1 haplotype 317

12.3.4 Extreme polymorphism at HLA-DRB1 318

Box 12.5 Pharmacogenomics and abacavir hypersensitivity 318

12.3.5 The MHC Haplotype Project 319

Box 12.6 MHC haplotypes associated with disease 320

12.3.6 A map of diversity across the MHC 321

12.4 Getting in the groove: diversity in MHC class II alleles 321

12.4.1 Narcolepsy 321

Box 12.7 Narcolepsy 323

12.4.2 Coeliac disease 323

12.4.3 Type 1 diabetes 324

Box 12.8 Non-MHC associations with type 1 diabetes 325

12.5 HLA-B27 and susceptibility to ankylosing spondylitis 326

Box 12.9 Non-HLA-B27 associations with ankylosing spondylitis 327

Box 12.10 Models of HLA-B27 and disease mechanism in ankylosing spondylitis 329

12.6 Genetic variation and haemochromatosis 330

Box 12.11 Lessons from murine studies of HFE 332

12.7 Many forms of genetic diversity are exhibited by complement C4 333

Box 12.12 Complement C4A and C4B 334

12.8 A SNP modulating the splicing of the BTNL2 gene is associated with sarcoidosis 337

Box 12.13 Sarcoidosis 337

12.9 Summary 339

12.10 Reviews 340

13 Parasite wars 341

13.1 Introduction 341

13.2 Malaria, genetic diversity, and selection 342

13.2.1 Inherited factors and resistance to malaria 342

Box 13.1 Malaria 343

13.2.2 Thalassaemia, natural selection and malaria 345

13.2.3 Malaria and structural haemoglobin variants 348

13.2.4 Duffy antigen and vivax malaria 350

13.2.5 Malaria parasites, oxidative stress, and G6PD enzyme deficiency 352

Box 13.2 Glucose-6-phosphate dehydrogenase deficiency 352

13.2.6 Polymorphism of immune genes 353

Box 13.3 Resistance to malaria among the Fulani 356

13.2.7 Cytoadhesion and immune evasion: host and parasite diversity 357

13.3 Genetic diversity and susceptibility to Leishmaniasis in mouse and man 358

Box 13.4 Leishmaniasis 358

13.4 Helminth infection 360

13.4.1 Genetic susceptibility to Ascaris infection 360

13.4.2 Schistosomiasis and other helminth infections 360

Box 13.5 Schistosomiasis 363

13.5 Summary 363

13.6 Reviews 364

14 Human genetic diversity and HIV 365

14.1 Introduction 365

Box 14.1 HIV and AIDS 365

14.2 Genetic variation in coreceptors and coreceptor ligands 367

14.2.1 Polymorphism of CCR5 and HIV-1 infection 367

Box 14.2 HIV infection 369

Box 14.3 Variants of HIV, coreceptor specificity, and disease progression 370

14.2.2 Haplotypic structure of the CCR5 locus: evolutionary insights, variation between ethnic groups, and relationship to HIV-1 disease susceptibility 372

Box 14.4 CCR2 polymorphism and disease progression in HIV-1 infection 373

14.2.3 Coreceptor ligands and HIV-1 375

14.2.4 Copies count: copy number variation in CCL3L1, a natural ligand of CCR5, and HIV disease 376

14.3 Barriers to retroviral infection 377

14.3.1 Genetic diversity in TRIM5α gives insights into the impact of retroviruses during primate evolution 377

Box 14.5 Analysis of primate sequence diversity reveals ancient positive selection in TRIM5α and defines a key functional element of the protein 380

14.3.2 APOBEC3G: an innate host defence mechanism against retroviral infection 380

14.4 Genetic diversity in HLA, KIR, and HIV-1: strategies for survival 381

Box 14.6 Polymorphism of KIRs in health, evolution, and disease 383

14.5 Summary 384

14.6 Reviews 385

15 Concluding remarks and future directions 387

15.1 Introduction 387

15.2 Cataloguing human genetic variation 387

15.3 Genetics of disease 389

15.4 Functional consequences of genetic variation 389

15.5 Medical applications and pharmacogenomics 390

15.6 Lessons from the past, looking to the future 391

Glossary 393

References 401

Index 467
