[Figure 9.13 plot annotations: excluded samples; samples subsequently identified as being of South Asian ancestry (India and Pakistan); (East Asian ancestry)]

Figure 9.13 Use of data from HapMap to identify individuals with evidence of non-European ancestry. Samples from the Wellcome Trust Case Control Consortium (WTCCC) and the YRI (African geographic ancestry), CEU (European), CHB (Chinese), and JPT (Japanese) HapMap samples were plotted on the first two principal components obtained by multidimensional scaling. WTCCC samples that did not cluster with the CEU sample set were then selected for exclusion. Reprinted by permission from Macmillan Publishers Ltd: Nature (WTCCC 2007), copyright 2007.
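
The clustering step described in the caption can be illustrated with a minimal sketch. It is not the Consortium's procedure: it uses principal components analysis via a singular value decomposition (classical multidimensional scaling on Euclidean distances yields the same coordinates), the genotypes are simulated with deliberately exaggerated allele-frequency differences, and the function names and the six-standard-deviation cut-off are assumptions chosen only for illustration.

```python
import numpy as np


def top_two_components(genotypes):
    """Return each sample's coordinates on the first two principal components.

    genotypes: (n_samples, n_snps) array of allele counts (0, 1 or 2).
    """
    g = np.asarray(genotypes, dtype=float)
    centred = g - g.mean(axis=0)  # centre each SNP column
    # SVD of the centred matrix: columns of U scaled by S are the PC scores.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :2] * s[:2]


def flag_outliers(coords, is_reference, n_sd=6.0):
    """Flag samples lying more than n_sd reference standard deviations from
    the reference centroid on either component (n_sd is an assumed cut-off)."""
    ref = coords[is_reference]
    z = np.abs(coords - ref.mean(axis=0)) / ref.std(axis=0)
    return (z > n_sd).any(axis=1)


# Simulated toy data; populations A and B are hypothetical.
rng = np.random.default_rng(0)
n_snps = 1000
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
reference = rng.binomial(2, freq_a, size=(30, n_snps))  # stands in for CEU
study_a = rng.binomial(2, freq_a, size=(50, n_snps))    # same ancestry as reference
study_b = rng.binomial(2, freq_b, size=(5, n_snps))     # different ancestry

genotypes = np.vstack([reference, study_a, study_b])
is_reference = np.arange(len(genotypes)) < len(reference)

coords = top_two_components(genotypes)
flags = flag_outliers(coords, is_reference)
print("study samples flagged for exclusion:", int(flags[~is_reference].sum()))
```

In this toy setting the five samples drawn from population B separate from the reference cluster on the first component and are flagged. Real data would call for linkage-disequilibrium pruning of SNPs and a cut-off chosen by inspecting the plot, neither of which is attempted here.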

differences were found between the 12 geographic regions of the UK analysed for 13 genomic regions with variation noted along a northwest-southeast axis, thought to relate to natural selection in ancestral populations. Highly significant associations were found with LCT (encoding lactase) at 2q21, HLA at 6p21, and TLR genes at 4p14, which had been previously reported, with the remaining associated loci representing new findings that may relate to previous selection by tuberculosis, pellagra, or leprosy (WTCCC 2007). No disease associations were found in the genomic loci showing strong geographic differentiation and it was felt that population structure would not significantly confound the genome-wide disease association mapping study.

In terms of the seven diseases analysed, all had previously been demonstrated to have a significant genetic component in terms of defining disease susceptibility. Classical and Bayesian statistical approaches were used in the analysis with trend and general genotype tests performed between each case and the pooled set of controls. It was notable that the largest numbers of significant associations were with type 1 diabetes and Crohn's disease, which have the highest sibling relative risks. Among these conditions, and the other five diseases analysed (type 2 diabetes, bipolar disorder, hypertension, coronary artery disease, and rheumatoid arthritis), 13 out of 15 previously reported 'robustly replicated' loci showed association in the WTCCC dataset (WTCCC 2007). Of those that did not, variants at APOE on chromosome 19q13 previously associated with coronary artery disease were poorly tagged on the genotyping platform used, while variants at INS (encoding insulin) previously associated with type 1 diabetes on 11p15 (Box 7.5) did show strong association but the SNP genotyping narrowly failed quality control. The Consortium used a genome-wide level of significance of 5 × 10⁻⁷ to identify 25 strongly associated susceptibility



