
Figure 2.3 Finding sequence variants responsible for diastrophic dysplasia (DTD). (A) Linkage disequilibrium mapping showed that almost all DTD chromosomes carried an ancestral haplotype 1-1 based on restriction enzyme digestion. (B) SLC26A2 gene showing the site of the T to C substitution abolishing the splice site in intron 1, together with restriction sites for StyI and EcoRI in the adjacent CSF1R gene. (C) Sequencing traces illustrating the nucleotide substitution responsible for almost all Finnish cases of DTD. (D) Restriction digest using HphI illustrating the loss of the restriction site in the presence of the T to C substitution for an affected patient; in the carrier both T and C alleles are present, leading to two bands. Redrawn and reprinted by permission from Macmillan Publishers Ltd: Nature Genetics (Hastbacka et al. 1992), copyright 1992; European Journal of Human Genetics (Hastbacka et al. 1999), copyright 1992. Panel B adapted from screenshot of UCSC Genome Browser (Kent et al. 2002) (http://genome.ucsc.edu/) (Human March 2006 Assembly).


the finding of allelic heterogeneity is seen as strengthening the evidence of a causal relationship between variation in the gene and the phenotype (Risch 2000).

Variation at different loci can also lead to the same phenotype (described as genetic heterogeneity) as illustrated by early onset familial Alzheimer's disease, which is discussed in more detail later in Section 2.5.1. In this disease, rare variants at three different genes (APP, PSEN1, and PSEN2) have been shown to cause the observed phenotype. Within families the relationship is, however, specific between a given gene variant and the phenotype, which allows linkage and positional cloning approaches to be applied. In this example a common pathway leading to Alzheimer's disease based on amyloid accumulation provides a unifying mechanism within which variation at these different genes may lead to the same phenotype.

Other genetic and environmental variation may also contribute to the phenotypic heterogeneity observed in some mendelian disorders. For example, marked variation in penetrance is seen for the iron storage disorder haemochromatosis, which in most cases arises due to homozygosity for a missense mutation in the HFE gene (Section 12.6). Among the proposed environmental and genetic modifiers is a common SNP of the BMP2 gene (encoding bone morphogenetic protein 2) which modulates iron burden (Milet et al. 2007).

2.3.5 Linkage analysis and common disease

For mendelian diseases, linkage studies have proved a very robust approach with a low false positive rate (Risch 2000). By contrast, much less success has been achieved using linkage-based approaches for common multifactorial traits. Here it has been increasingly recognized that diversity in many genes is likely to be involved, each with an individually modest effect size and further modulated by environmental factors to a much greater extent than in rare diseases showing mendelian inheritance. Critically, for polygenic diseases there is usually no clear pattern of inheritance within a pedigree.

Linkage approaches have been applied in common multifactorial diseases but in only a minority of cases has this been highly informative. There have been some striking successes such as the mapping of a Crohn's disease susceptibility locus to chromosome 16q by genome-wide linkage analysis (Section 9.5.2). Crohn's disease is a common debilitating inflammatory bowel disorder in which genetic factors have been extensively investigated (Section 9.5). Positional cloning and linkage disequilibrium mapping helped to refine the inflammatory bowel disease locus (IBD1) on chromosome 16 and led to the identification of specific mutations of the NOD2 gene conferring significantly increased risk of disease (Section 9.5.2). Other loci resolved by linkage analysis in Crohn's disease include regions of chromosome 6p (IBD3) and chromosome 5q31 (IBD5); however, many other IBD loci reported from linkage scans have not been convincingly replicated (Section 9.5.3).

Linkage analysis has also been applied successfully to study particular subtypes of some common diseases where individuals display a mendelian pattern of inheritance as seen with early onset Alzheimer's disease (Section 2.5.1) and maturity onset diabetes of the young (OMIM 606391). The latter is caused by mutations in a number of different genes including HNF4A (encoding hepatocyte nuclear factor-4-alpha) on chromosome 20q12-q13.1 (MODY type 1) (Yamagata et al. 1996).

2.4 Genetic association studies and common disease

Genetic association studies have been extensively used to try to define genetic variation associated with disease susceptibility, notably in the context of common multifactorial traits. The approach was advocated by Risch and Merikangas in 1996 as more powerful than linkage analysis for such diseases where a modest effect size was likely, such that application of linkage analysis would require an unfeasibly high number of families (Risch and Merikangas 1996). It was only 10 years later, however, that the ability to genotype hundreds of thousands of SNPs as common genetic markers, together with a publicly available finished human genome sequence and an improved understanding of the genetic architecture of human diversity, allowed the successful application of genome-wide association studies (Section 9.3).

Prior to this, association studies adopting a candidate gene approach based on biological plausibility were extensively used across a very wide range of common diseases for which there were varying levels of evidence to support a role for genetic factors. There were notable successes but also increasing scepticism as in many cases initial studies failed to be replicated. There are many reasons proposed for this but in essence what was not clear at that time was the small size of relative risks associated with possession of the majority of individual alleles and the nature of the underlying allelic architecture. We now know that common multifactorial traits are likely to involve several genes and multiple variants of individually modest magnitude of effect, which are neither necessary nor sufficient for disease to occur.

Many studies did not have sufficient statistical power to find an association or to replicate it, and there were issues with the significance thresholds chosen, how to correct for multiple comparisons, and the potential bias from underlying population stratification (Section 2.4.3). The effects of linkage disequilibrium were difficult to dissect, particularly as often only a very limited number of genetic markers were selected for analysis. Issues with phenotype definition, overestimation of the magnitude of initial association, testing of multiple hypotheses, publication bias, population-specific differences in underlying linkage disequilibrium, and gene-gene and gene-environment interactions have all been raised as further reasons for failure to replicate genetic association studies (Cardon and Bell 2001; Hirschhorn et al. 2002; Healy 2006). A number of these factors are considered in more detail below and elsewhere in the context of specific diseases or traits.

2.4.1 A small number of robustly demonstrated associations?

In a comprehensive literature review of association studies published between 1986 and 2000, a total of 603 different gene-disease associations involving 238 genes and 133 common diseases were analysed by Hirschhorn and colleagues (2002). These excluded the very many associations reported with the MHC on chromosome 6 and those with blood group antigens, and were restricted to variants with a minor allele frequency of at least 1% in or close to known genes. For 166 associations in which three or more publications were available to review, only six associations were judged to be reproducible with a high level of statistical confidence (Hirschhorn et al. 2002). This sets the bar high, and the six should be regarded as a minimal set of robust associations; many other reported associations are likely to be informative.

The six associations noted by Hirschhorn and colleagues included factor V Leiden and venous thrombosis (reviewed in Section 2.6.1); possession of the APOE e4 allele and late onset Alzheimer's disease (reviewed in Section 2.5.2); a nonsynonymous SNP of CTLA4 and Graves' disease (an autoimmune disease involving the thyroid gland) (Donner et al. 1997); a 32 bp deletion of the CCR5 gene and HIV-1 infection (Section 14.2.1); a tandem repeat upstream of the INS gene and type 1 diabetes (Box 7.5); and a nonsynonymous SNP of the prion protein gene PRNP on chromosome 20p13 associated with sporadic Creutzfeldt-Jakob disease (Palmer et al. 1991). Despite this very low number of apparently robust associations, in a subsequent paper Hirschhorn and colleagues highlighted that, while false positive results are common among initial reports, a substantial number of real associations of modest effect are likely to be present and become significant on meta-analysis (Lohmueller et al. 2003). Such analysis showed significant replication among follow-up studies for eight out of 25 studies selected from the initial set of 166 frequently studied associations (Lohmueller et al. 2003). This and other studies illustrate the power of meta-analysis in this context (Ioannidis et al. 2001).
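Why pooling helps can be shown with a small calculation. The following is a minimal sketch of fixed-effect (inverse-variance) pooling of odds ratios, which is one common way to combine studies and is not necessarily the procedure used in the papers cited above. The five studies, their odds ratios, and their standard errors are invented for illustration, and the function name is hypothetical.

```python
import math

def pooled_odds_ratio(studies):
    """Fixed-effect (inverse-variance) pooling of per-study odds ratios.

    `studies` is a list of (odds_ratio, standard_error_of_log_odds_ratio).
    Returns (pooled_or, ci_low, ci_high, z_score).
    """
    # Each study is weighted by the inverse of its variance on the log scale.
    weights = [1.0 / se ** 2 for _, se in studies]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in studies]
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z_score = pooled_log / pooled_se
    ci_low = math.exp(pooled_log - 1.96 * pooled_se)
    ci_high = math.exp(pooled_log + 1.96 * pooled_se)
    return math.exp(pooled_log), ci_low, ci_high, z_score

# Hypothetical small studies: each shows a modest effect that is not
# significant on its own (|z| < 1.96 for every study).
example_studies = [(1.20, 0.16), (1.30, 0.16), (1.25, 0.16),
                   (1.15, 0.16), (1.35, 0.16)]

individual_z = [math.log(odds_ratio) / se for odds_ratio, se in example_studies]
pooled_or, ci_low, ci_high, pooled_z = pooled_odds_ratio(example_studies)
print("individual z scores:", [round(z, 2) for z in individual_z])
print("pooled OR %.2f (95%% CI %.2f to %.2f), z = %.2f"
      % (pooled_or, ci_low, ci_high, pooled_z))
```

In this invented example no single study crosses the conventional threshold, yet the pooled estimate does, because the standard error shrinks as the studies are combined. A fixed-effect model assumes one shared true effect; where studies differ in population or phenotype definition a random-effects model may be more appropriate.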

2.4.2 Study design and statistical power

For any genetic study of human disease, whether linkage or association based, a clearly defined phenotype using specific diagnostic criteria is essential. There will often be variation within a phenotype related to aetiology or other factors, and minimizing such heterogeneity is important to maximize the chances of success.

The levels of resolution achieved by genetic association studies are potentially very different from linkage analysis. With linkage analysis, sets of markers are used to map chromosomal location by statistical analysis of observed informative recombination events. The resolution will be coarse as there are relatively few opportunities for such events within the generations of the pedigrees studied, and the detail achieved can be seen as large blocks of inherited haplotypes with a 'disease region' that may span hundreds of genes. This contrasts with genetic association studies. Here an association will usually arise due to linkage disequilibrium between the genetic marker and functional variant unless the causative variant has been included in the set of genetic markers analysed. As it is the population which is being sampled rather than a limited number of generations of a family, the scale of resolution will be much finer as the observed linkage disequilibrium at a given chromosomal region will reflect human ancestry and the multiple complex recombination and mutational events which will have occurred (Fig. 2.4) (Cardon and Bell 2001). Despite this, fine mapping disease association studies remains a formidable challenge and our ability to resolve causative functional variants, particularly for noncoding changes, still represents a major road block.
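Linkage disequilibrium between a marker and a functional variant is usually summarized by the coefficient D and the derived measures D' and r². The sketch below computes these standard quantities from the four haplotype frequencies at two biallelic loci. The function name and the example frequencies are assumptions chosen for illustration and do not come from any study cited here.

```python
def ld_measures(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    The arguments are the population frequencies of the four
    haplotypes (alleles A/a at locus 1, B/b at locus 2); they must sum to 1.
    """
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A
    p_B = f_AB + f_aB  # frequency of allele B
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the AB haplotype over its expectation
    # if the two loci were independent.
    d = f_AB - p_A * p_B
    # D' rescales |D| by the largest value it could take
    # given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    # r squared is the squared correlation between alleles at the two loci.
    r_squared = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r_squared

# Hypothetical haplotype frequencies for a marker and a nearby variant.
d, d_prime, r_squared = ld_measures(0.4, 0.1, 0.1, 0.4)
print("D = %.3f, D' = %.2f, r^2 = %.2f" % (d, d_prime, r_squared))
```

Of the two derived measures, r² is the more directly relevant to association studies: the lower the r² between a genotyped marker and the causative variant, the larger the sample needed to detect the association through the marker.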

Both case-control and family-based study designs have been extensively used in genetic association studies, the case-control design looking for evidence of statistically significant association between the frequency of a given genetic variant among cases of the disease compared to controls. The selection of gene regions for candidate gene association studies was driven by biological plausibility with well characterized genes, such as TNF encoding the cytokine tumour necrosis factor, the subject of many reported disease association studies ranging from infectious and autoimmune disease to cancer (Section 2.4.4).
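The basic case-control comparison can be illustrated with a 2 x 2 table of allele counts. This is a minimal sketch of an allelic test, assuming a simple chi-square test with one degree of freedom and a Woolf (log-based) confidence interval; the counts are invented, and real analyses typically also consider genotype-based tests and adjustment for covariates.

```python
import math

def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio, 95% CI, chi-square statistic and P value from allele counts.

    Counts are numbers of alleles (two per individual), not individuals.
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    odds_ratio = (a * d) / (b * c)
    # Woolf's approximation for the standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    # Pearson chi-square for a 2 x 2 table (no continuity correction).
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (ci_low, ci_high), chi2, p_value

# Hypothetical counts: 100 cases and 100 controls, 200 alleles in each group.
odds_ratio, ci, chi2, p_value = allelic_association(120, 80, 90, 110)
print("OR = %.2f (95%% CI %.2f to %.2f)" % (odds_ratio, ci[0], ci[1]))
print("chi-square = %.2f, P = %.4f" % (chi2, p_value))
```

Counting alleles rather than individuals treats the two alleles carried by each person as independent, which is reasonable only when genotypes are in Hardy-Weinberg proportions.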

Often, single genetic markers (usually SNPs) were selected for analysis in a case-control design, while later studies sought to increase SNP coverage for a gene region as awareness of the extent of diversity, haplotype structure, and linkage disequilibrium increased. Given that there were very few instances where a causative variant had been identified, candidate gene association studies relied on either successfully genotyping the latter or finding association with SNPs in linkage disequilibrium with the causative variant. The low prior probability of finding a true association when testing a small number of markers in a candidate gene is likely to have been a major factor in the large number of false positive studies reporting an initial association (Risch 2000). In Chapter 12 the nature and consequences of genetic diversity in the MHC are reviewed: relative to other genomic loci many more robust disease associations were found in this region, which Risch suggested was due to the much higher prior probability of a functional or linked variant being analysed (Risch 2000).
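The effect of prior probability on the reliability of a reported association follows from Bayes' rule. The sketch below is an illustration under stated assumptions: the priors, power, and significance threshold are invented values chosen to contrast a low-prior and a high-prior setting, and are not estimates given by Risch (2000).

```python
def probability_association_is_true(prior, power, alpha):
    """Probability that a 'significant' result reflects a true association.

    prior: probability, before testing, that the marker is truly associated
    power: probability of a significant result when the association is true
    alpha: probability of a significant result when there is no association
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)

# Hypothetical settings: an arbitrary candidate-gene marker versus a marker
# in a region with a much higher prior chance of harbouring a linked variant.
low_prior = probability_association_is_true(prior=0.001, power=0.5, alpha=0.05)
high_prior = probability_association_is_true(prior=0.1, power=0.5, alpha=0.05)
print("low prior:  %.3f" % low_prior)
print("high prior: %.3f" % high_prior)
```

With the same power and threshold, the proportion of significant findings that are real differs by well over an order of magnitude between the two settings, which is the sense in which a low prior probability generates false positive reports.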

Study power is extremely important in association studies, with many reported studies underpowered to detect a given minimal magnitude of effect at a given allele frequency. Larger sample sizes are required for more modest effect sizes and rarer alleles. Power is also influenced by local patterns of linkage disequilibrium and by other alleles at the same or a different locus which lead independently to the same disease (allelic and genetic heterogeneity, respectively). Cardon and Bell (2001) noted that sample sizes reported for case-control studies up to that time had been modest, with 100 or fewer cases and equal numbers of controls.
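The dependence of sample size on effect size and allele frequency can be sketched with the standard normal-approximation formula for comparing two proportions. This is a rough illustration, assuming an allelic test, equal numbers of cases and controls, Hardy-Weinberg proportions, and that the causative allele itself is genotyped; the function names and example inputs are hypothetical, and dedicated power calculators should be preferred for study planning.

```python
import math
from statistics import NormalDist

def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and allelic OR."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)

def cases_needed(control_freq, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with equal controls) for an allelic test."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each individual contributes two alleles.
    return math.ceil(alleles_per_group / 2)

# Hypothetical scenarios at a nominal threshold of 0.05 and 80% power.
for freq, effect in [(0.20, 2.0), (0.20, 1.25), (0.05, 1.25)]:
    print("allele frequency %.2f, OR %.2f: about %d cases"
          % (freq, effect, cases_needed(freq, effect)))
```

Under these assumptions a modest odds ratio requires far more than 100 cases even at a nominal significance threshold, and the requirement rises further for rarer alleles, for more stringent thresholds, or when the marker is only in partial linkage disequilibrium with the causative variant.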

For the modest effect sizes typically seen with common multifactorial disease traits, insufficient statistical power can lead to false negative results on initial reporting of association testing or with replication studies. This was illustrated by work looking at genetic variation in the PPARG gene at chromosome 3p25 encoding a specific transcription factor, 'peroxisome proliferator-activated receptor gamma', important in adipocyte differentiation and gene expression. A specific nonsynonymous SNP, rs1801282, in which a C to G nucleotide substitution leads to substitution of alanine for proline at amino acid position 12 (p.P12A), was initially reported as showing a strong association with type 2 diabetes with an odds ratio (OR) of 4.35 (P = 0.03) for individuals homozygous for the C allele (Deeb et al. 1998). Four of five subsequent studies failed to find a significant association with diabetes although a modest elevated risk was present and the studies were thought to have insufficient sample sizes to reliably detect the association (Altshuler et al. 2000a). A larger study of 3000 individuals comprising different family-based and case-control cohorts did find significant association for the C allele encoding proline although the odds ratio was modest (OR 1.25, P = 0.002) (Altshuler et al. 2000a), highlighting that initial reports may be overestimates of an association and that replication studies should be well powered (Hirschhorn et al. 2002). This association also serves to illustrate that it is not necessarily possession of the rarer allele which is associated with disease risk; the high frequency of the
