Molecular Comparisons

What Darwin didn't - couldn't - know is that the comparative evidence becomes even more convincing when we include molecular genetics, in addition to the anatomical comparisons that were available to him.

Just as the vertebrate skeleton is invariant across all vertebrates while the individual bones differ, and just as the crustacean exoskeleton is invariant across all crustaceans while the individual 'tubes' vary, so the DNA code is invariant across all living creatures, while the individual genes themselves vary. This is a truly astounding fact, which shows more clearly than anything else that all living creatures are descended from a single ancestor. Not just the genetic code itself, but the whole gene/protein system for running life, which we dealt with in Chapter 8, is the same in all animals, plants, fungi, bacteria, archaea and viruses. What varies is what is written in the code, not the code itself. And when we look comparatively at what is written in the code - the actual genetic sequences in all these different creatures - we find the same kind of hierarchical tree of resemblance. We find the same family tree - albeit much more thoroughly and convincingly laid out - as we did with the vertebrate skeleton, the crustacean skeleton, and indeed the whole pattern of anatomical resemblances through all the living kingdoms.

If we want to work out how closely related any pair of species is - say, how close a hedgehog is to a monkey - the ideal would be to look at the complete molecular texts of every gene of both species, and compare every jot and tittle, as a biblical scholar might compare two scrolls or fragments of Isaiah. But it is time-consuming and expensive. The Human Genome Project took about ten years, representing many person-centuries. Although it would now be possible to achieve the same result in a fraction of the time, it would still be a large and expensive undertaking, as would the hedgehog genome project. Like the Apollo moon landings, and like the Large Hadron Collider (which has just been switched on in Geneva as I write - the gigantic scale of this international endeavour moved me to tears when I visited), the complete deciphering of the human genome is one of those achievements that makes me proud to be human. I am delighted that the chimpanzee genome project has now been successfully accomplished, and the equivalent for various other species. If the present rate of progress continues (see 'Hodgkin's Law' below), it will soon be economically feasible to sequence the genome of every pair of species whose closeness of cousinship we might want to measure. Meanwhile, for the most part we have to resort to sampling particular parts of their genomes, and it works pretty well.

We can sample by picking out a few choice genes (or proteins, whose sequences are directly translated from genes) and comparing them across species. I'll come to that in a moment. But there are other ways of doing a kind of crude, automatic sampling, and the technologies to do that have been around for longer. An early method, which works surprisingly well, exploits the immune system of rabbits (you could actually use any animal you like, but rabbits do the job nicely). As part of the body's natural defence against pathogens, the rabbit's immune system manufactures antibodies against any foreign protein that enters the bloodstream. Just as you could tell that I have had whooping cough by looking at the antibodies in my blood, so you can tell what a rabbit has been exposed to in the past by looking at its immune response in the present. The antibodies present in the rabbit constitute a history of the natural shocks to which its flesh has been heir - including artificially injected proteins. If you inject, say, a chimpanzee protein into a rabbit, the antibodies that it makes will subsequently attack the same protein if it is injected again. But suppose your second injection is of the equivalent protein, not from a chimpanzee but from a gorilla? The rabbit's prior exposure to the chimpanzee protein will have partially forearmed it against the gorilla version, but the reaction will be weaker. And it will also have forearmed it against the kangaroo version of the protein, but the reaction will be weaker still, given that the kangaroo is much less closely related to the chimpanzee that did the priming than the gorilla is. The strength of the rabbit's immune response to a subsequent injection of a protein is a measure of the resemblance of that protein to the original to which the rabbit was first exposed. 
It was by this method, using rabbits, that Vincent Sarich and Allan Wilson, at the University of California at Berkeley, demonstrated in the 1960s that humans and chimpanzees are much more closely related to each other than anybody had previously realized.

There are also methods that use the genes themselves, comparing them across species directly rather than comparing the proteins they encode. One of the oldest and most effective of these methods is called DNA hybridization. DNA hybridization is usually what lies behind those statements one often sees along the lines of: 'Humans and chimpanzees share 98 per cent of their genes.' There is some confusion, by the way, about exactly what is meant by percentage figures such as these. Ninety-eight per cent of what is identical? The exact figure depends on how large the units are that we are counting. A simple analogy makes this clear, and it does so in an interesting way, because the differences between the analogy and the real thing are as revealing as the similarities. Suppose we have two versions of the same book and we want to compare them. Perhaps it is the book of Daniel, and we want to compare the canonical version with an ancient scroll that has just been discovered in a cave overlooking the Dead Sea. What percentage of the chapters of the two books are identical? Probably zero, for it takes only one discrepancy, anywhere in the whole chapter, for us to say the two are not identical. What percentage of their sentences are identical? The percentage will now be much higher. Even higher will be the percentage of words that are identical, because words have fewer letters than sentences - fewer opportunities to bust the identity. But a word resemblance is still broken if any one letter in the word differs. Therefore, if you line the two texts up side by side and compare them letter by letter, the percentage of identical letters will be even higher than the percentage of identical words. So an estimate like '98 per cent in common' doesn't mean anything unless we specify the size of the unit we are comparing. Are we counting chapters, words, letters or what? And the same is true when we compare DNA from two species. 
If you are comparing whole chromosomes, the percentage shared is zero, because it only takes one tiny difference, somewhere along the chromosomes, to define the chromosomes as different.
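The point about unit size can be made concrete with a toy comparison. The sketch below is only an illustration: the two short texts are invented, and the helper names (`percent_identical`, `sentences`, `words`) are hypothetical. It counts matching units position by position, first by sentence, then by word, then by letter.

```python
def percent_identical(units_a, units_b):
    """Percentage of units that match exactly, compared position by position.

    Assumes both versions split into the same number of units.
    """
    pairs = list(zip(units_a, units_b))
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def sentences(text):
    """Split on full stops and drop empty pieces."""
    return [s.strip() for s in text.split(".") if s.strip()]

def words(text):
    """Split on whitespace (punctuation stays attached to its word)."""
    return text.split()

# Two invented versions of the same passage; a single letter differs.
text_a = "the king saw a dream. the dream troubled him. he called the wise men."
text_b = "the king saw a dream. the dream troubled hym. he called the wise men."

by_whole_text = 100.0 if text_a == text_b else 0.0
by_sentence = percent_identical(sentences(text_a), sentences(text_b))
by_word = percent_identical(words(text_a), words(text_b))
by_letter = percent_identical(list(text_a), list(text_b))

print(f"whole text: {by_whole_text:.1f}%")
print(f"sentences:  {by_sentence:.1f}%")
print(f"words:      {by_word:.1f}%")
print(f"letters:    {by_letter:.1f}%")
```

One changed letter gives a different percentage at each level, and the smaller the unit, the higher the figure.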

The often-quoted figure of about 98 per cent for the shared genetic material of humans and chimps actually refers neither to numbers of chromosomes nor to numbers of whole genes, but to numbers of DNA 'letters' (technically, base pairs) that match each other within the respective human and chimp genes. But there is a pitfall. If you do the lining up naively, a missing letter (or an added letter), as opposed to a mistaken letter, will cause all subsequent letters to mismatch, because they will then all be staggered, one step out (until there is a mistake in the other direction to bring them back into step again). It is clearly unfair to let the estimate of discrepancies be inflated in this way. A scholar's eye, scanning two scrolls of Daniel, automatically copes with this, in a way that is hard to quantify. How can we do it with DNA? This is where we leave our analogy with books and scrolls and go straight to the real thing because, as it happens, the real thing - DNA - is easier to understand than the analogy!
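Before turning to the laboratory method, the pitfall itself can be shown in a few lines. This is a minimal sketch, not the procedure used in any real study: the DNA string is invented, and the gap-tolerant count uses the standard Levenshtein edit distance as a stand-in for proper sequence alignment.

```python
def naive_mismatches(a, b):
    """Count mismatches position by position, with no allowance for gaps."""
    shared = min(len(a), len(b))
    return sum(a[i] != b[i] for i in range(shared)) + abs(len(a) - len(b))

def edit_distance(a, b):
    """Minimum number of substituted, inserted or deleted letters (Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, letter_a in enumerate(a, start=1):
        curr = [i]
        for j, letter_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete letter_a
                curr[j - 1] + 1,                   # insert letter_b
                prev[j - 1] + (letter_a != letter_b),  # match or substitute
            ))
        prev = curr
    return prev[-1]

# Invented sequence; the second copy has lost its fourth letter.
original = "GATTACAGGCTTACGATCCA"
one_missing = "GATACAGGCTTACGATCCA"

print("naive count:     ", naive_mismatches(original, one_missing))
print("gap-aware count: ", edit_distance(original, one_missing))
```

The naive count charges the single missing letter many times over, because everything after it is one step out; the gap-aware count charges it once.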

If you gradually heat DNA, there comes a point - somewhere around 85°C - when the bonding between the two strands of the double helix breaks, and the two helices separate. You can think of 85°C, or whatever the temperature turns out to be, as a 'melting point'. If you let it cool again, each single helix spontaneously joins up again with another single helix, or fragment of single helix, wherever it finds one with which it can pair, using the ordinary base-pairing rules of the double helix. You might think that this would always be the partner from which it lately separated, and with which, of course, it is perfectly matched. Indeed it could be, but it usually isn't as tidy as that. Fragments of DNA will find other fragments with which they can pair, and they will usually not be exactly their original partners. And indeed, if you add separated fragments of DNA from another species, fragments of the single strands are quite capable of joining up with fragments of single strands from the wrong species, in just the same way as they will join up with single strands from the right species. Why should they not? It is the remarkable conclusion of the Watson-Crick molecular biology revolution that DNA is just DNA. It doesn't 'care' whether it is human DNA, chimp DNA or apple DNA. Fragments will happily pair off with complementary fragments wherever they find them. Nevertheless, the strength of bonding is not always equal. Single-stranded lengths of DNA bond more tightly with matching single strands than they do with less similar single strands. This is because, when the two strands are less similar, more of the 'letters' of the DNA (Watson and Crick's 'bases') find themselves opposite partners with which they cannot pair. The bonding of the strands is therefore weakened - like a zip fastener with some teeth missing.

How shall we measure this strength of bonding, after fragments from different species have found each other and united? By an almost ludicrously simple method. We measure the 'melting point' of the bonds. You remember I said that the melting point of double-stranded DNA is about 85°C. This is true of normal, properly matched double-stranded DNA, as when a strand of human DNA is 'melted' away from a complementary strand of human DNA. But when the bonding is weaker - as when a human strand has bonded with a chimpanzee strand - a slightly lower temperature is sufficient to break the bond. And when human DNA has bonded with DNA from a more distant cousin like a fish or a toad, an even lower temperature suffices to separate them. The difference between the melting point when a strand is bonded to one of its own kind, and the melting point when it is bonded to a strand from another species, is our measure of the genetic distance between the two species. As a rule of thumb, each decrease by 1° Celsius in the 'melting point' is approximately equivalent to a drop of 1 per cent in the number of DNA letters matched (or an increase of 1 per cent in the number of missing teeth in the zip fastener).
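The rule of thumb is simple arithmetic, and a sketch makes it explicit. The function name and the hybrid melting temperature below are hypothetical, chosen only to illustrate the calculation; the 85°C figure and the one-per-cent-per-degree conversion are the ones quoted above.

```python
def percent_letters_matched(same_species_melt_c, hybrid_melt_c,
                            percent_per_degree=1.0):
    """Estimate the percentage of matching DNA letters from a melting-point drop.

    Applies the rule of thumb that each 1 degree Celsius drop in melting
    point corresponds to roughly 1 per cent fewer matched letters.
    """
    drop = same_species_melt_c - hybrid_melt_c
    return 100.0 - drop * percent_per_degree

# Hypothetical reading: a hybrid that melts 2 degrees below same-species DNA.
estimate = percent_letters_matched(same_species_melt_c=85.0, hybrid_melt_c=83.0)
print(f"estimated letters matched: {estimate:.1f}%")
```

A two-degree drop therefore corresponds to about 98 per cent of letters matched, though the conversion is only approximate.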

There are complications in the method, which I haven't gone into, and tricky problems, which have ingenious solutions. For instance, if you mix human with chimp DNA, much of the fragmented human DNA will bond with other human DNA fragments, and much of the chimp DNA will bond with its own kind. How do you separate off the hybrid DNA, whose 'melting point' is what you really want to measure, from the 'same-kind' DNA? The answer is by a clever trick involving previous radioactive labelling. But the details would take us too far off our path. The main point here is that DNA hybridization is the technique that leads scientists to figures like 98 per cent for the genetic similarity between humans and chimpanzees, and it yields predictably lower percentages as you move to more distantly related pairs of animals.

The newest method of measuring the similarity between a pair of matching genes from different species is the most direct, and the most expensive: actually read the sequence of letters in the genes themselves, using the same methods as were used for the Human Genome Project. Although it is still expensive to compare the entire genome, you can get a good approximation by comparing a sample of genes, and this is now increasingly done.

Whichever technique we use for measuring similarity between two species, whether it is rabbit antibodies, or melting points, or direct sequencing, the next step is pretty much the same. Having obtained a single number representing the similarity between each pair of species, we then place the figures in a table. Take a set of species and write their names, in the same order, as both the column headings and the row headings. Then place the percentage similarities in the appropriate cells. The table will be triangular (half of a square) because, for example, the percentage similarity between human and dog will be the same as the similarity between dog and human. So if you filled in all of a square table each of the two halves either side of the diagonal would mirror the other.
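As a sketch of the bookkeeping, the triangular table can be stored with one entry per unordered pair of species. The scores below are invented for illustration and are not measurements; the names `scores`, `similarity` and `triangle_rows` are likewise hypothetical.

```python
from itertools import combinations

# Illustrative species and made-up percentage similarities (not real data).
species = ["human", "chimpanzee", "dog", "squid"]
scores = {
    ("human", "chimpanzee"): 98.0,
    ("human", "dog"): 85.0,
    ("chimpanzee", "dog"): 85.0,
    ("human", "squid"): 40.0,
    ("chimpanzee", "squid"): 40.0,
    ("dog", "squid"): 40.0,
}

def similarity(a, b):
    """Look a pair up in either order, since the table is symmetric."""
    if a == b:
        return 100.0
    return scores[(a, b)] if (a, b) in scores else scores[(b, a)]

def triangle_rows(names):
    """Build the lower triangle: row i holds scores against names[0..i-1]."""
    rows = []
    for i, row_name in enumerate(names):
        cells = [f"{similarity(row_name, col):6.1f}" for col in names[:i]]
        rows.append(f"{row_name:<12}" + "".join(cells))
    return rows

# n species need only n*(n-1)/2 cells, one for each unordered pair.
assert len(scores) == len(list(combinations(species, 2)))

for line in triangle_rows(species):
    print(line)
```

Storing each pair once is what makes the table triangular: the human/dog cell and the dog/human cell are the same cell.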

Now, what sort of results should we expect? On the evolution model we should predict that you'll find yourself putting a high score in the cell connecting human and chimpanzee; a lower score in the cell connecting human and dog. The human/dog cell should theoretically have an identical resemblance score to the chimpanzee/dog cell because humans and chimpanzees have exactly the same degree of relation to dogs. It should be identical, too, to the monkey/dog cell and the lemur/dog cell. This is because humans, chimpanzees, monkeys and lemurs are all connected to the dog via their common ancestor, an early primate (which probably looked a bit like a lemur). The same score should show up in the human/cat, chimpanzee/cat, monkey/cat and lemur/cat cells, because cats and dogs are related to all primates via the shared ancestor of all carnivores. There should be a much lower score - ideally equally low - in all the cells uniting, say, a squid with any mammal. And it shouldn't matter which mammal you choose, since all are equally distant from a squid.

These are strong theoretical expectations, but there is no reason why, in practice, they should not be violated. If they were violated, it would be evidence against evolution. What actually happens turns out to be - within statistical margins of error - just what we should expect on the assumption that evolution has happened. This is another way of saying that, if you put the genetic distances between pairs of species on the limbs of a tree, everything adds up in a satisfying way. Of course the adding up is not quite perfect. Numerical expectations in biology are seldom realized with better than approximate accuracy.
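The expectation can be phrased as a check that could, in principle, fail. The sketch below uses invented scores with a little noise added, and a tolerance picked arbitrarily to stand in for the statistical margin of error; it is not how any published analysis was done.

```python
# Made-up similarity scores (per cent); real measurements would carry noise.
scores = {
    frozenset(["human", "chimpanzee"]): 98.0,
    frozenset(["human", "monkey"]): 93.0,
    frozenset(["chimpanzee", "monkey"]): 93.2,
    frozenset(["human", "dog"]): 85.1,
    frozenset(["chimpanzee", "dog"]): 84.8,
    frozenset(["monkey", "dog"]): 85.3,
}

def spread_against_outsider(group, outsider):
    """Largest minus smallest score between the outsider and group members."""
    values = [scores[frozenset([member, outsider])] for member in group]
    return max(values) - min(values)

def consistent_with_tree(group, outsider, tolerance):
    """True if every group member resembles the outsider about equally."""
    return spread_against_outsider(group, outsider) <= tolerance

primates = ["human", "chimpanzee", "monkey"]
print(consistent_with_tree(primates, "dog", tolerance=1.0))
```

If the primates differed widely in their resemblance to the dog, the check would return False, which is the sense in which the prediction is open to violation.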

Comparative DNA (or protein) evidence can be used to decide - on the evolutionary assumption - which pairs of animals are closer cousins than which others. What turns this into extremely powerful evidence for evolution is that you can construct a tree of genetic resemblances separately for each gene in turn. And the important result is that every gene delivers approximately the same tree of life. Once again, this is exactly what you would expect if you were dealing with a true family tree. It is not what you would expect if a designer had surveyed the whole animal kingdom and picked and chosen - or 'borrowed' - the best proteins for the job, wherever in the animal kingdom they might be found.

The earliest large-scale study along these lines was done by a group of geneticists in New Zealand led by Professor David Penny. Penny's group took five genes which, although not identical across all mammals, were similar enough to have earned the same name in all. The details don't matter, but for the record the five genes were those for haemoglobin A, haemoglobin B (haemoglobins give blood its red colour), fibrinopeptide A, fibrinopeptide B (fibrinopeptides are used in clotting blood) and cytochrome C (which plays an important role in cellular biochemistry). They chose eleven mammals to compare: rhesus monkey, sheep, horse, kangaroo, rat, rabbit, dog, pig, human, cow and chimpanzee.

Penny and his colleagues thought statistically. They wanted to calculate the probability that, purely by chance, two molecules would yield the same family tree, if evolution wasn't true. So they tried to imagine all possible trees that could terminate in eleven descendants. It's a surprisingly large number. Even if you limit yourself to 'binary trees' (that is, trees with branches that only bifurcate - no tri-furcating or higher-furcating), the total number of possible trees is more than 34 million. The scientists patiently looked at every one of the 34 million trees and compared each one with the other 33,999,999 trees. No, of course they didn't! It would take too much computer time. They did, however, devise a clever statistical approximation, a shortcut equivalent to that mammoth calculation.
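The figure of 34 million can be checked with a short calculation. The sketch assumes the count refers to unrooted bifurcating trees with labelled tips, for which the standard formula is the product of the odd numbers up to 2n - 5; the function name is hypothetical.

```python
def count_unrooted_binary_trees(n_species):
    """Number of distinct unrooted bifurcating trees on n labelled tips.

    Uses the standard formula (2n - 5)!!, the product 3 * 5 * ... * (2n - 5).
    For three or fewer tips there is only one possible tree.
    """
    count = 1
    for odd in range(3, 2 * n_species - 5 + 1, 2):
        count *= odd
    return count

trees_for_eleven = count_unrooted_binary_trees(11)
print(f"{trees_for_eleven:,}")  # 34,459,425
```

Under that assumption eleven species give 34,459,425 trees, which matches 'more than 34 million'. Counting rooted trees instead would multiply the total by 19.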

This is how the method of approximation worked. They took the first of the five genes, say haemoglobin-A (in all cases I use the name of the protein to stand for the gene that codes for that protein). Of all those millions of trees, they wanted to find which was the most 'parsimonious' where haemoglobin-A was concerned. Parsimonious here means 'needing to postulate the minimum amount of evolutionary change'. For example, all those thousands of trees that assumed that the closest cousin to a human was a kangaroo while humans and chimpanzees are more distantly related, proved to be very unparsimonious trees: they needed to assume a lot of evolutionary change, in order to yield the result that kangaroos and humans had a recent common ancestor. Haemoglobin-A's verdict would be along these lines:

This is a terribly unparsimonious tree. Not only do I have to put in lots of mutational work in order to end up so different in humans and kangaroos, despite our close cousinship according to this tree, I also have to put in lots of mutational work in the other direction, in order to ensure that, despite their great separation on this particular tree, humans and chimps somehow ended up with such similar haemoglobin-A. I vote against this tree.

Haemoglobin-A delivers a verdict of this kind, some verdicts more favourable than others, on each of the 34 million trees, and finally ends up choosing a few dozen top-ranking trees. Of each of these top-ranking trees, haemoglobin-A would say something like this:

This tree puts humans and chimpanzees as close cousins, and it puts sheep and cows as close cousins, and it puts kangaroos out on a limb. This turns out to be a very good tree, because it makes me do hardly any mutational work at all to explain the evolutionary changes. This is an excellently parsimonious tree. It gets the haemoglobin-A vote!

Of course, it would have been nice if haemoglobin-A, and every other gene, could have come up with a single most parsimonious tree, but that is too much to ask. Among the 34 million trees, it is only to be expected that several slightly different trees should tie for haemoglobin-A's top-ranking slot.
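To see how a gene can 'vote', here is a minimal sketch of one standard way of scoring parsimony, Fitch's counting method, which tallies the fewest letter changes a tree requires. It is offered as an assumption about how such scoring can work, not as the procedure Penny's group used, and the six-letter sequences are invented, not real haemoglobin data.

```python
def fitch_site(tree, letters):
    """Return (candidate ancestral letters, changes needed) for one position.

    A tree is either a species name or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {letters[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_site(left, letters)
    right_set, right_changes = fitch_site(right, letters)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a letter: no extra change needed.
        return shared, left_changes + right_changes
    # The branches disagree: at least one mutation must be postulated.
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, sequences):
    """Total changes needed across all positions; lower is more parsimonious."""
    length = len(next(iter(sequences.values())))
    total = 0
    for i in range(length):
        letters = {name: seq[i] for name, seq in sequences.items()}
        total += fitch_site(tree, letters)[1]
    return total

# Invented toy sequences standing in for one gene in five species.
sequences = {
    "human":      "AAGTCA",
    "chimpanzee": "AAGTCG",
    "sheep":      "ACGACA",
    "cow":        "ACGACA",
    "kangaroo":   "TCCAGA",
}

# A tree pairing human with chimpanzee, and one pairing human with kangaroo.
good_tree = ((("human", "chimpanzee"), ("sheep", "cow")), "kangaroo")
bad_tree = ((("human", "kangaroo"), ("sheep", "chimpanzee")), "cow")

print("human+chimp tree:   ", parsimony_score(good_tree, sequences))
print("human+kangaroo tree:", parsimony_score(bad_tree, sequences))
```

With these toy sequences the tree that pairs humans with chimpanzees needs fewer changes than the one that pairs humans with kangaroos, so it gets the vote.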

Now, how about haemoglobin-B? How about cytochrome-C? Each one of the five proteins is entitled to its own separate vote, to find its own preferred (that is, most parsimonious) trees from among the 34 million trees. It would be perfectly possible for cytochrome-C to come up with a completely different vote on which is the most parsimonious tree. It could turn out that the cytochrome-C of humans really is very similar to that of kangaroos, and very different from that of chimpanzees. Far from saluting the close pairing of sheep and cow discerned by haemoglobin-A, cytochrome-C might find that it hardly needs to mutate at all in order to place sheep very close to, say, monkeys, and in order to place cows very close to rabbits. On the creation hypothesis there is no reason why that shouldn't happen. But what Penny and his colleagues actually found was that there was astonishingly high agreement among all five proteins (and they used yet more clever statistics to show how unlikely such concordance would be by chance). All five proteins 'voted' for pretty much the same subset of trees from among the 34 million possible trees. This is, of course, exactly what we would expect on the assumption that there really is only one true tree relating all eleven animals, and it is the family tree: the tree of evolutionary relationships. What is more, the consensus tree that the five molecules all voted for turned out to be the same as zoologists had already worked out on anatomic and palaeontological, not molecular, grounds.

The Penny study was published in 1982, quite a while ago now. The intervening years have seen a prolific multiplication of detailed evidence on the exact sequences of genes of lots and lots of species of animals and plants. Agreement on the most parsimonious trees now extends far beyond the eleven species and five molecules that Penny and his colleagues studied. Theirs was just a nice example, overwhelming as their statistical evidence proved. The sum total of genetic sequence data now available puts the matter beyond all conceivable doubt. Far more convincingly even than the (also highly convincing) fossil evidence, the evidence from comparisons among genes is converging, rapidly and decisively, on a single great tree of life. Above is a tree for the eleven species of the Penny study, which represents a modern consensus vote from many different parts of the mammalian genome. It is the consistency of agreement among all the different genes in the genome that gives us confidence, not only in the historical accuracy of the consensus tree itself, but also in the fact that evolution has occurred.

[Figure: consensus family tree of the eleven mammal species of the Penny study]
