Sequencing the HBB gene and defining the variant responsible for Hb S

DNA was recognized as the 'heritable material' in 1944 by Avery, MacLeod, and McCarty (Avery et al. 1944); its double helical structure was elucidated by Watson and Crick in 1953 (Watson and Crick 1953); and the nature of the genetic code was solved by Nirenberg, Khorana, and Holley in the early 1960s (Box 1.9) (Fig. 1.6 and Fig. 1.7) (Nirenberg 1963). The determination of the DNA sequence of the globin genes rested on the development of groundbreaking chemical and enzymatic methods for DNA sequencing (Box 1.10). Initial sequencing studies focused on determining the partial, and later the full, sequence of globin RNA. For example, the β globin messenger RNA sequence was determined using the Maxam and Gilbert technique by synthesizing double-stranded DNA from the RNA, and the result was found to agree with predictions based on the amino acid sequence and earlier partial sequencing of RNA (Efstratiadis et al. 1977). In the same year, sequences of the noncoding region of human β globin RNA, determined using the 'plus/minus' method of Sanger, were published (Proudfoot 1977).

Advances in molecular cloning techniques (Cohen et al. 1973) allowed isolation and amplification of the β globin gene, which was localized to the short arm of chromosome 11 (Messing et al. 1977; Wilson et al. 1977; Sanders-Haigh et al. 1980); subsequently the full nucleotide sequence was determined using the Maxam-Gilbert sequencing method (Lawn et al. 1980).

Variation in the DNA sequence of the HBB gene (see Fig. 1.1), encoding β globin, was found to be responsible for Hb S. The HBB gene is located on chromosome 11p15.4 and comprises three exons (Fig. 1.9). The linear flow of information from DNA to RNA to amino acid chain through transcription and translation is illustrated with reference to the HBB gene (Box 1.11 and Box 1.12) (Fig. 1.10 and Fig. 1.11) (Strachan and Read 2004). The sequence variant resulting in Hb S is found near the start of the first exon of HBB and comprises an A to T nucleotide substitution in the non-template strand, which alters the RNA codon from 'GAG' to 'GUG', resulting in a change in amino acid residue from glutamic acid to valine (Fig. 1.12) (Kan and Dozy 1978; Frenette and Atweh 2007).
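The codon change described above can be traced step by step in a few lines of Python. This is a minimal sketch rather than a full translation tool: it works on the single affected codon only, the codon table holds just the two entries needed here, and the function names `substitute` and `transcribe` are hypothetical helpers introduced for illustration.

```python
# Minimal codon table: only the two RNA codons needed for this illustration.
CODON_TABLE = {"GAG": "Glu", "GUG": "Val"}


def substitute(seq: str, index: int, ref: str, alt: str) -> str:
    """Replace the base at `index`, checking that the reference base matches."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at position {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


def transcribe(non_template_dna: str) -> str:
    """The mRNA has the same sequence as the non-template strand, with U for T."""
    return non_template_dna.upper().replace("T", "U")


# Affected codon as read on the non-template (coding) strand of HBB.
normal_dna = "GAG"
variant_dna = substitute(normal_dna, 1, "A", "T")  # A to T at the middle base

normal_rna = transcribe(normal_dna)
variant_rna = transcribe(variant_dna)

print(normal_dna, "->", normal_rna, "->", CODON_TABLE[normal_rna])
print(variant_dna, "->", variant_rna, "->", CODON_TABLE[variant_rna])
```

Because the substitution is described on the non-template strand, the RNA codon can be read off directly by swapping T for U; no reverse complement is needed.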

How should the variation responsible for Hb S be described? A number of different approaches have been taken, and these illustrate some of the complexities of defining and describing DNA sequence diversity (Fig. 1.13) (Beutler 1993; Beutler et al. 1996). Historically, an amino acid-based designation was used to describe variants because sequences were first available at the protein level, as was the case for haemoglobin, in advance of knowledge of the DNA code (Beutler 1993). A numbering system based on the amino acid sequence was possible, with names beginning with a letter, for example E6V (substitution of valine for glutamic acid at position 6) (Beaudet and Tsui 1993; AHCMN 1996). This system described the protein phenotype rather than the genotype and had the advantages of relative simplicity and insight into biological effect. However, a number of problems with an amino acid-based approach were noted, not least that a particular amino acid change may result from a number of different nucleotide changes due to the degeneracy of the genetic code (Beutler 1993; Beutler et al. 1996). For example, a histidine to glutamine substitution may result from a change in the codon from CAU to either CAA or CAG. It is therefore not always possible to deduce the DNA sequence variant from the amino acid change.
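The degeneracy problem can be made concrete by enumerating the single-nucleotide changes that turn a histidine codon into a glutamine codon. The sketch below assumes the standard genetic code, in which histidine is encoded by CAU and CAC and glutamine by CAA and CAG; the function name `single_base_changes` is a hypothetical helper introduced for illustration.

```python
# Standard genetic code: RNA codons for the two amino acids in the example.
HIS_CODONS = ("CAU", "CAC")
GLN_CODONS = ("CAA", "CAG")


def single_base_changes(from_codons, to_codons):
    """List (source, target) codon pairs that differ at exactly one position."""
    changes = []
    for src in from_codons:
        for dst in to_codons:
            differing = [i for i in range(3) if src[i] != dst[i]]
            if len(differing) == 1:
                changes.append((src, dst))
    return changes


his_to_gln = single_base_changes(HIS_CODONS, GLN_CODONS)
for src, dst in his_to_gln:
    print(f"His ({src}) -> Gln ({dst})")
```

Every pairing of a histidine codon with a glutamine codon differs only at the third base, so the same protein-level description (His to Gln) corresponds to several distinct nucleotide changes, including the CAU to CAA and CAU to CAG changes mentioned in the text.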

Furthermore, there was controversy over the starting point for amino acid numbering. Early literature based on protein sequence considered the processed protein, in which the initiator methionine is co-translationally cleaved once the growing amino acid chain is about 25 residues long, so that valine is the first amino acid and the Hb S variant is denoted E6V (Glu6Val). However, current recommendations refer to the unprocessed protein, the primary translation product in which methionine is amino acid +1, so that the Hb S change would now be designated E7V. Ambiguities also arise in terms of whether the native, partly processed,
