Figure 2.4 Genetic association analysis. Genetic association studies sample from a population in which many historical recombination and mutational events will have occurred, allowing increased resolution from genotyped markers. Linkage disequilibrium between the genotyped genetic marker and functional variant(s) allows association with disease even if the latter is not directly genotyped. Panel annotations: association study using a case-control design; resolution from association analysis is much finer as the allelic architecture of the population sampled is based on many historical recombination and mutational events, and the haplotype on which the functional mutation arose breaks down over time. Adapted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics (Cardon and Bell 2001), copyright 2001.

m m allele also emphasizes that while the effect for the individual on disease susceptibility is modest, the attributable risk may be much higher.
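The contrast between a modest individual effect and a large attributable risk can be made concrete with a short calculation. The sketch below assumes Levin's formula for the population attributable fraction, which the text does not itself name, and the frequencies and relative risks are purely hypothetical values chosen for illustration.

```python
def population_attributable_fraction(exposure_freq: float, relative_risk: float) -> float:
    """Levin's formula: PAF = p(RR - 1) / (1 + p(RR - 1)).

    exposure_freq is the proportion of the population carrying the risk
    genotype (p) and relative_risk is the risk in carriers relative to
    non-carriers (RR). Both are supplied by the caller.
    """
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical values only: a common genotype with a modest effect ...
common_modest = population_attributable_fraction(0.80, 1.25)
# ... compared with a rare genotype with a large effect.
rare_strong = population_attributable_fraction(0.01, 5.0)

print(f"common genotype, RR 1.25: PAF = {common_modest:.3f}")
print(f"rare genotype, RR 5.0:   PAF = {rare_strong:.3f}")
```

Under these assumed inputs the common, modest-effect genotype accounts for a larger share of disease in the population than the rare, strong-effect genotype, which is the point the sentence above makes.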

Much larger genetic association studies involving thousands of individuals are needed, and these have more recently become possible through collaboration between research groups, allowing genome-wide association studies and replication studies to become a reality (WTCCC 2007). Large ongoing prospective cohort collections such as UK Biobank, which aims to recruit 500 000 people aged 40-69 years, will prove very exciting resources for future genetic studies with a wealth of associated epidemiological data; power calculations suggest, for example, that after 6 years of recruitment the study will be powered to detect an odds ratio of 1.3 or higher for type 2 diabetes (Palmer 2007).

In Chapter 9 the development and application of genome-wide genetic association studies are reviewed in detail (Section 9.3). A number of different factors enabled such studies to become a reality, not least the availability of large cohorts of carefully phenotyped cases of different diseases. The remarkable efforts to catalogue SNPs and the underlying haplotypic diversity across populations (Section 9.2) were to dramatically improve our understanding of allelic architecture and allow the selection of informative common SNP markers for use in genome-wide association studies. These factors, together with the capacity for high throughput genotyping, the availability of the human genome sequence, and advances in statistical analysis, were to set the stage for dramatic improvements in our ability to interrogate the genetic basis of common multifactorial disease.

2.4.3 Genetic admixture and association with disease

A major potential confounder of case-control disease association studies is the effect of genetic admixture or population subdivision. This can arise when a population containing two or more ethnic groups or subgroups is studied. The frequency of many alleles varies between ethnic groups or 'segments' of a population relating to genetic drift or founder effects, as may the prevalence of disease (Slatkin 1991; Cavalli-Sforza et al. 1994; Pritchard and Rosenberg 1999). If there has not been careful matching of ethnic groups between cases and controls, apparent disease associations may arise. Thus if disease prevalence in a given group is higher than the others, that group may be overrepresented among the cases compared to the controls and any genetic markers that are present at a high frequency in that particular ethnic group will appear to be associated with disease (Hirschhorn et al. 2002). A number of conditions vary in prevalence between ethnic groups, such as hypertension, which is more common among African Americans than Caucasians; study of a mixed population will run the risk of spurious association with disease for any marker allele more frequent in the African American population (Reich and Goldstein 2001).

The issues relating to genetic admixture and disease association are well illustrated by a study of type 2 diabetes among the Pima and Papago Native American tribes of southern Arizona (Knowler et al. 1988). A large longitudinal study over 20 years allowed the association with disease of a particular immunoglobulin heavy chain haplotype (Gm3;5,13,14) to be analysed among a cohort of 4920 individuals with a high prevalence of diabetes. The association appears dramatic in a well powered study with a clear protective effect associated with possession of a specific Gm haplotype and type 2 diabetes (Fig. 2.5A) (Knowler et al. 1988). However, this result arises due to Caucasian admixture among the Native American population studied leading to population stratification. Among those of full Pima and Papago Indian heritage, type 2 diabetes has a much higher prevalence compared to Caucasians or those with lower fractions of Indian heritage; conversely the particular Gm haplotype studied has a very low frequency among those of full Indian heritage (0.006) compared to Caucasians in the United States (0.665) (Fig. 2.5B). Thus when the data are stratified based on Caucasian admixture (expressed as the fraction of full Indian heritage), the prevalence of diabetes is the same among those who do or do not possess the Gm haplotype (Fig. 2.5C) (Knowler et al. 1988). The risk of diabetes does vary dependent on the degree of Caucasian admixture but this has nothing to do with the Gm haplotype studied - that haplotype is simply a very good marker of the degree of Caucasian admixture in Native Americans (Williams et al. 1986).
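The way a pooled association can vanish on stratification can be sketched numerically. The example below uses entirely hypothetical counts (they are not the data of Knowler et al. 1988) and compares a crude prevalence ratio with a Mantel-Haenszel pooled ratio across strata of heritage. The Mantel-Haenszel estimator is an assumption made here for illustration; the text does not state which stratified method the original study used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stratum:
    label: str
    cases_with: int      # diabetic, haplotype present
    total_with: int      # all subjects with the haplotype
    cases_without: int   # diabetic, haplotype absent
    total_without: int   # all subjects without the haplotype


def crude_prevalence_ratio(strata: Sequence[Stratum]) -> float:
    """Prevalence ratio after pooling all strata, ignoring heritage."""
    cases_with = sum(s.cases_with for s in strata)
    total_with = sum(s.total_with for s in strata)
    cases_without = sum(s.cases_without for s in strata)
    total_without = sum(s.total_without for s in strata)
    return (cases_with / total_with) / (cases_without / total_without)


def mantel_haenszel_ratio(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel pooled prevalence (risk) ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.total_with + s.total_without
        numerator += s.cases_with * s.total_without / n
        denominator += s.cases_without * s.total_with / n
    return numerator / denominator


# Hypothetical counts: within each stratum the prevalence is identical in
# carriers and non-carriers, but the haplotype is rare where diabetes is common.
HYPOTHETICAL_STRATA = [
    Stratum("full heritage", 8, 20, 800, 2000),      # 40% prevalence in both
    Stratum("intermediate", 40, 200, 100, 500),      # 20% prevalence in both
    Stratum("low heritage", 30, 300, 10, 100),       # 10% prevalence in both
]

crude = crude_prevalence_ratio(HYPOTHETICAL_STRATA)
adjusted = mantel_haenszel_ratio(HYPOTHETICAL_STRATA)
print(f"crude prevalence ratio:    {crude:.2f}")
print(f"stratum-adjusted ratio:    {adjusted:.2f}")
```

With these made-up counts the pooled data suggest a strongly protective haplotype, while the stratum-adjusted ratio is 1, mirroring the pattern described for the Gm haplotype: the marker tracks admixture rather than disease.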

Careful matching of cases and controls is important to avoid the effects of significant population stratification but even so 'cryptic stratification' may persist. Family-based methodologies, notably the transmission disequilibrium test, circumvent the problem by analysing affected children and their parents. Actual transmission of alleles to offspring can then be compared to expected transmission, provided that at least one parent is heterozygous for a given allele allowing testing for linkage and association between marker and disease (Spielman et al. 1993; Ewens and Spielman 1995). Alleles showing disease association will be transmitted more often than the expected 50 : 50 transmission based on mendelian inheritance. Recruitment of family members is, however, more demanding than for unrelated individuals, particularly for late onset diseases, and requires additional genotyping (Hirschhorn et al. 2002). Unlinked genetic markers have also been used to both define the extent of population stratification and as a means of statistical correction (Pritchard and Rosenberg 1999; Reich and Goldstein 2001). More recent genome-wide association studies using many hundreds of thousands of genetic markers allow powerful statistical approaches to defining hidden

[Figure 2.5, panel A. All subjects: prevalence of type 2 diabetes by Gm3;5,13,14 haplotype (present versus absent). Prevalence ratio 0.27 (95% CI 0.18-0.40); chi-squared value 61.6, P < 0.001.]
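The transmission disequilibrium test described in the text above compares how often heterozygous parents transmit a given allele to affected offspring with the 50 : 50 expectation. A minimal sketch follows, assuming the McNemar-type statistic (b - c)^2 / (b + c) on one degree of freedom associated with Spielman et al. (1993); the function name and the transmission counts are hypothetical and serve only as illustration.

```python
import math
from typing import Tuple


def tdt_statistic(transmitted: int, not_transmitted: int) -> Tuple[float, float]:
    """Return (chi-squared, P value) for a biallelic TDT.

    transmitted:     number of times heterozygous parents passed the
                     allele of interest to an affected child (b)
    not_transmitted: number of times they passed the other allele (c)
    Only heterozygous parents are informative, so homozygous parents
    must be excluded before counting.
    """
    if transmitted < 0 or not_transmitted < 0:
        raise ValueError("counts must be non-negative")
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parental transmissions")
    chi_squared = (transmitted - not_transmitted) ** 2 / informative
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_squared / 2.0))
    return chi_squared, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi_squared, p_value = tdt_statistic(60, 40)
print(f"TDT chi-squared = {chi_squared:.2f}, P = {p_value:.4f}")
```

Because each comparison is made within a family, the transmitted and untransmitted alleles come from the same parental genetic background, which is why the approach is robust to the stratification discussed in this section.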
