In 2007 the results of a genome-wide association study of seven common diseases in British individuals were published (WTCCC 2007). This pioneering study, by a consortium of over 50 UK research groups, was seen as establishing the utility of the genome-wide approach: it resolved many design and analytical issues and provided a very important dataset that confirmed previously established susceptibility loci and identified numerous novel susceptibility loci in diseases of major public health importance.
Figure 9.12 Low frequency variants with intermediate penetrance and disease. Schematic diagram illustrating the contrast between very rare variants with high penetrance classically found to be responsible for mendelian disease versus common variants with modest to low penetrance resolved in genome-wide association (GWA) scans of common disease. A third major group may be highly significant in determining disease susceptibility but to date is not detected well by either linkage analysis or genome-wide association scans. Redrawn and reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics (McCarthy et al. 2008), copyright 2008.
A total of 500 568 SNPs were genotyped among 2000 cases for each of the seven diseases and 3000 controls, the latter comprising 1500 individuals from the 1958 British Birth Cohort and 1500 blood donors (WTCCC 2007). Among the methodological and analytical issues in genome-wide association addressed by this study were quality control, genotype calling, use of imputation to infer genotypes, use of common controls, and statistical power. Novel disease-associated loci were shown to have modest effect sizes, namely odds ratios of less than 1.5. This was recognized to have significant consequences for the power of the WTCCC study, which had 80% power to detect an odds ratio of 1.5 but only 43% power to detect an odds ratio of 1.3, at a P value threshold of less than 5 × 10⁻⁷. This was despite the study having one of the largest sample sizes of any genome-wide association study analysed to date: even larger sample sizes would be required to detect further novel loci, which are likely to be associated with odds ratios of 1.2 and below.
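The dependence of power on effect size described above can be illustrated with a minimal sketch. It assumes a simple two-sided allelic test (a normal approximation comparing allele frequencies in cases and controls) and a single, hypothetical control allele frequency of 0.2; the function names are illustrative only. This is not the power calculation used by the WTCCC, whose reported figures depend on the allele frequencies considered, so the output is not expected to reproduce the 80% and 43% values.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq, odds_ratio,
                       n_cases=2000, n_controls=3000, alpha=5e-7):
    """Approximate power of a two-sided allelic test (normal approximation).

    Sample sizes and alpha default to the values quoted in the text;
    control_freq is a hypothetical input chosen by the caller.
    """
    std_normal = NormalDist()
    case_freq = case_allele_frequency(control_freq, odds_ratio)

    # Each individual contributes two alleles.
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls

    # Standard error under the null (pooled frequency) and the alternative.
    pooled = ((case_alleles * case_freq + control_alleles * control_freq)
              / (case_alleles + control_alleles))
    se_null = sqrt(pooled * (1.0 - pooled)
                   * (1.0 / case_alleles + 1.0 / control_alleles))
    se_alt = sqrt(case_freq * (1.0 - case_freq) / case_alleles
                  + control_freq * (1.0 - control_freq) / control_alleles)

    # Two-sided critical value, taken from the lower tail for accuracy.
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    diff = abs(case_freq - control_freq)

    upper = std_normal.cdf((diff - z_crit * se_null) / se_alt)
    lower = std_normal.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


if __name__ == "__main__":
    for odds_ratio in (1.2, 1.3, 1.5):
        power = allelic_test_power(control_freq=0.2, odds_ratio=odds_ratio)
        print(f"odds ratio {odds_ratio}: approximate power {power:.3f}")
```

Under these assumptions the approximate power falls steeply as the odds ratio moves from 1.5 towards 1.2, and rises again if `n_cases` and `n_controls` are increased, which is the qualitative point made in the text.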
The analysis was restricted to individuals of European ancestry; at recruitment almost all individuals self-identified as white Europeans. The study demonstrated the utility of the HapMap dataset for detecting non-European ancestry by multidimensional scaling (Fig. 9.13), on the basis of which 153 individuals were excluded from analysis. Additional samples were excluded because of contamination, false identity, or relatedness, leaving a total of 16 179 individuals to be analysed for 469 557 SNPs passing quality control assessment. The two control groups differed in sample collection and preparation, and in the age and population groups sampled. However, there were few significant differences in allele frequencies between the two control groups, and they were used as a combined 'shared' control group in subsequent analyses.
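As an illustration of how allele frequencies in two control groups might be compared at a single SNP, the sketch below applies a Pearson chi-square test (1 degree of freedom) to a 2 × 2 table of allele counts. The choice of test and the counts are assumptions made for illustration; the text does not state which test the study used, and the numbers are not taken from the WTCCC data.

```python
from math import erfc, sqrt


def allelic_chi_square(a_minor, a_major, b_minor, b_major):
    """Pearson chi-square test (1 df) on allele counts from two groups.

    Returns (statistic, p_value). Raises ValueError if any row or
    column total is zero, because expected counts are then undefined.
    """
    table = [[a_minor, a_major], [b_minor, b_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [a_minor + b_minor, a_major + b_major]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("row and column totals must all be non-zero")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1500 individuals (3000 alleles) per control group.
    stat, p = allelic_chi_square(600, 2400, 630, 2370)
    print(f"chi-square = {stat:.3f}, P = {p:.3f}")
```

In a genome-wide setting a test of this kind would be repeated for every SNP, with SNPs showing strong differences between the control groups flagged for closer inspection.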
A notable result of the study came from the analysis of geographic variation and population structure, performed because of concerns that hidden population structure among the British cases and controls analysed might lead to confounding. Allele frequency
[Figure 9.13: multidimensional scaling plot. Labels: CEU (European ancestry); YRI panel (African ancestry); subsequently identified as individuals of Afro-Caribbean ancestry.]