The History of Gene Families

Conservation of developmental regulatory genes: inferences about animal ancestors from comparative genomics

Animal genomes contain many thousands of genes, many of which are required for basic processes common to cellular life. These "housekeeping" genes are shared by most living organisms and predate the evolution of multicellular animal life. Because housekeeping genes are fundamental to cell structure, viability, and function, they are not the most likely candidates for the genes critical to the evolution of complex animal body plans. Instead, this chapter focuses on the subset of genes that controls patterning and differentiation: the components of the genetic toolkit for development. Previous chapters have described the developmental functions and interactions of many toolkit genes in model organisms such as Drosophila and the mouse. But how and when did this toolkit of genes evolve? And how can we hope to understand the early evolution of toolkit genes, given that the ancient animals that carried those genes are now extinct?

Much of our understanding of the evolution of the genetic toolkit for development is based on deductive logic, using information from the genomes of extant organisms (including mammals such as humans and mice, the insect Drosophila, the nematode worm Caenorhabditis elegans, the yeast Saccharomyces cerevisiae, and some plants) to extrapolate to the past. Similarity between gene sequences found in two (or more) different organisms is most easily explained by common history. In other words, a gene that is conserved among a group of animals was present in the last common ancestor of that group. During the subsequent independent evolution of each animal lineage from the last common ancestor, the gene sequence diverged to create related (but not identical) genes, one in each species. Genes with similar sequences that are found within a single animal genome are also related, as they are products of the duplication and divergence of ancestral genes.
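The inference at the heart of this paragraph, that a gene shared by a group of animals was present in their last common ancestor, can be sketched as a search for the smallest clade that contains every species carrying the gene. The tree below is a simplified, hypothetical rendering of a few lineages; the clade labels, the `(label, [children])` tree format, and the function names `taxa` and `origin_node` are illustrative assumptions. The sketch assumes the gene arose once and was inherited vertically.

```python
# Simplified, hypothetical species tree: each node is (label, [children]).
TREE = ("Metazoa", [
    ("sponge", []),
    ("Cnidaria+Bilateria", [
        ("cnidarian", []),
        ("Bilateria", [
            ("Deuterostomia", [("human", []), ("sea_urchin", [])]),
            ("Drosophila", []),
            ("annelid", []),
        ]),
    ]),
])


def taxa(node):
    """Return the set of leaf taxa below a node."""
    label, children = node
    if not children:
        return {label}
    return set().union(*(taxa(child) for child in children))


def origin_node(node, carriers):
    """Label of the smallest clade containing every taxon that carries the gene.

    Assumes a single origin and vertical inheritance, so the gene must have
    been present in the last common ancestor of that clade.
    """
    label, children = node
    for child in children:
        if carriers <= taxa(child):
            return origin_node(child, carriers)
    return label


print(origin_node(TREE, {"human", "Drosophila"}))  # Bilateria
print(origin_node(TREE, {"human", "cnidarian"}))   # Cnidaria+Bilateria
```

Adding an outgroup that carries the gene (for example, a cnidarian) pushes the inferred origin deeper, which is why outgroups are so informative for establishing the ancestral condition.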

Complete genome sequences for a growing list of animals allow extant genomes to be compared and ancestral genomes to be reconstructed. The systematic comparison of whole genomes provides a history not only of gene sequences but also of repetitive sequences, gene organization, gene order (and rearrangements), gene duplication, and cis-regulatory regions, as well as organismal phylogeny. Furthermore, as more genomes are sequenced, it becomes possible to determine ancestral and derived conditions with respect to the number and identity of genes, and to make well-informed inferences about the genomes of long-extinct animal ancestors. The conservation of toolkit genes between fruit flies and mice, discussed in earlier chapters, anticipated the findings of comparative genomics: it extends to humans, Fugu rubripes (pufferfish), Ciona intestinalis (urochordate, basal to the chordate lineage), C. elegans, and other bilaterian species (Box 4.1). All of these organisms inherited the basic set of toolkit genes from the last common ancestor of all bilaterian phyla (Fig. 4.1).

The characterization of toolkit genes in some organisms is particularly informative. An animal lineage that branches near the base of a clade represents a basal member or an outgroup, which can help to establish the ancestral condition for that clade. For example, the cnidarians are an outgroup to the bilaterian clade, the cephalochordate amphioxus is a close outgroup to the vertebrates, and the onychophorans are an outgroup to the arthropods. The resemblance of these animals to their extinct ancestors has led them to be called "living fossils." The complement of developmental genes shared between these outgroups and representative model organisms reflects the state of the toolkit before the radiation of bilaterians, vertebrates, and arthropods, respectively. As Darwin foreshadowed (see the opening quote for this chapter), they "aid us in forming a picture of the ancient forms of life."

Gene duplication

One major process involved in the assembly of the bilaterian toolkit for development is the duplication and divergence of genes and the creation of gene families. For example, more than 40% of the genes in the nematode C. elegans have sequence similarity to other C. elegans genes and thus arose at some point from gene duplication events. In fact, tens to hundreds of genes are duplicated in animal genomes every million years, a significant contribution to genome evolution. Duplicated genes, which are often linked in tandem, may arise from slippage errors during DNA replication, errors in the repair of double-strand DNA breaks, or unequal crossing over during recombination. Some developmental genes are found as closely related, linked gene pairs, indicating that they derive from tandem duplication events. In Drosophila, for example, the gene pairs engrailed and invected, spalt and spalt-related, and gooseberry and gooseberry-neuro are all tightly linked. Over time, tandemly duplicated genes may become physically separated through chromosomal rearrangements and translocations.
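Grouping the genes of one genome into families by sequence similarity, the step behind estimates such as the C. elegans figure above, can be sketched as finding connected components in a graph whose edges join similar genes. This minimal union-find version assumes the similarity pairs have already been found (for example, by an all-against-all sequence search). The gene list pairs the three linked Drosophila gene pairs named above with four hypothetical single-copy genes (`single1` to `single4`), so the resulting fraction is only a toy number.

```python
def gene_families(genes, similar_pairs):
    """Group genes into families: connected components of the similarity graph."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        parent[find(a)] = find(b)  # merge the two families

    families = {}
    for g in genes:
        families.setdefault(find(g), set()).add(g)
    return list(families.values())


def fraction_in_multigene_families(genes, similar_pairs):
    """Fraction of genes that have at least one paralog in the same genome."""
    families = gene_families(genes, similar_pairs)
    return sum(len(f) for f in families if len(f) > 1) / len(genes)


genes = ["engrailed", "invected", "spalt", "spalt-related",
         "gooseberry", "gooseberry-neuro",
         "single1", "single2", "single3", "single4"]  # last four hypothetical
pairs = [("engrailed", "invected"), ("spalt", "spalt-related"),
         ("gooseberry", "gooseberry-neuro")]

print(len(gene_families(genes, pairs)))              # 7
print(fraction_in_multigene_families(genes, pairs))  # 0.6
```

Single-linkage grouping like this is deliberately permissive: one similarity link is enough to join two families, which suits the broad question of how many genes arose by duplication.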

Such tandemly duplicated genes create a unique opportunity for further expansion of a gene family. Unequal crossing over between mispaired tandem copies produces one chromosome with a duplication and one with the corresponding deletion (Fig. 4.2). The chromosome that carries the duplicated region also carries a new chimeric gene, located between the parental copies. When more than two related genes are tandemly arrayed and the cross-over is out of register by more than one gene, unequal crossing over can duplicate intact genes as well as create a new chimeric gene. This mechanism of expanding tandemly arrayed, related genes may explain, for example, the evolution of the Hox complex: new Hox genes have evolved as chimeras (or duplicates) of older Hox genes, and the Hox genes at each end of the complex may represent the "oldest" members of the complex.
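The register shift described above can be modeled schematically by treating each homolog as an ordered list of gene names and letting the cross-over fall inside gene i on one homolog and gene j on the other. This is a hypothetical sketch: the names `H1` to `H3` are placeholders, genes are assumed to be transcribed left to right (so the left part of a gene is its 5' end), and a chimera is written as "5' parent/3' parent" rather than as real sequence.

```python
def unequal_crossover(cluster, i, j):
    """Recombine two homologs that both carry `cluster`, crossing over inside
    gene i of homolog A and gene j of homolog B.

    i == j means the homologs are in register; i != j means they are mispaired
    by (i - j) genes. Returns (A-left + B-right, B-left + A-right); when i > j
    the first product carries the duplication and the second the deletion.
    """
    a, b = list(cluster), list(cluster)

    def chimera(five_prime, three_prime):
        # A cross-over inside the same gene simply regenerates that gene.
        return five_prime if five_prime == three_prime else f"{five_prime}/{three_prime}"

    a_then_b = a[:i] + [chimera(a[i], b[j])] + b[j + 1:]
    b_then_a = b[:j] + [chimera(b[j], a[i])] + a[i + 1:]
    return a_then_b, b_then_a


cluster = ["H1", "H2", "H3"]  # hypothetical linked paralogs
print(unequal_crossover(cluster, 1, 0))
# (['H1', 'H2/H1', 'H2', 'H3'], ['H1/H2', 'H3'])
print(unequal_crossover(cluster, 2, 0))
# (['H1', 'H2', 'H3/H1', 'H2', 'H3'], ['H1/H3'])
```

Mispairing by one gene yields a chimera between the two parental copies; mispairing by two genes additionally duplicates an intact gene (H2), matching the two outcomes described in the text.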

Large-scale duplications of chromosomal segments or even of entire genomes (tetraploidization) have also occurred during animal evolution. Such duplication events generate large syntenic blocks, duplicated arrays of genes found in the same order and distributed throughout the genome. This mechanism of gene duplication rapidly increases

Box 4.1 Identifying and Analyzing Toolkit Genes in Different Animals

The conservation of developmental regulatory genes allows members of gene families to be identified in different animals based on sequence similarity. Historically, molecular biology techniques such as degenerate polymerase chain reaction (PCR) and library screening have facilitated the isolation of homologous genes from both model and nonmodel organisms. Both of these techniques rely on the ability to detect sequence similarity by nucleic acid hybridization between two homologous genes.

Degenerate PCR takes advantage of mixtures of short oligonucleotides that match all possible codon combinations for two conserved peptide sequences within a gene; these mixtures are used to amplify the intervening region from a target pool of nucleic acid (typically, genomic DNA or cDNA). Thus genes that are characterized by conserved protein motifs are good candidates for degenerate PCR, which can rapidly and selectively isolate a region of a gene from any animal's genome.
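The codon bookkeeping behind a degenerate primer can be sketched by back-translating a conserved peptide with the standard genetic code. The sketch assumes NCBI translation table 1 and uses WFQNRR, a motif found in the recognition helix of many homeodomains, purely as an illustrative input; the helper names are hypothetical. It reports the number of distinct oligonucleotides needed to cover every codon combination, and a per-position IUPAC consensus (coding strand, 5' to 3'), which over-covers amino acids such as arginine whose codons differ at more than one position.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order for the first, second, and third positions.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), CODE):
    CODONS.setdefault(aa, []).append("".join(codon))

# IUPAC nucleotide codes keyed by the sorted set of bases they stand for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
         "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N"}
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degeneracy(peptide):
    """Number of distinct oligos needed to cover every codon combination."""
    return prod(len(CODONS[aa]) for aa in peptide)


def iupac_primer(peptide):
    """Per-position IUPAC consensus (coding strand, 5'->3') for a peptide."""
    out = []
    for aa in peptide:
        for pos in range(3):
            bases = "".join(sorted({c[pos] for c in CODONS[aa]}))
            out.append(IUPAC[bases])
    return "".join(out)


def mixture_size(primer):
    """Number of distinct sequences a written IUPAC primer actually encodes."""
    return prod(CODE_SIZE[c] for c in primer)


motif = "WFQNRR"
print(degeneracy(motif))                  # 288
print(iupac_primer(motif))                # TGGTTYCARAAYMGNMGN
print(mixture_size(iupac_primer(motif)))  # 512 (MGN also covers AGY serine codons)
```

A reverse primer would be the reverse complement of such a consensus; low-degeneracy motifs rich in single-codon residues (Trp, Met) make the best targets.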

Library screening uses a known gene sequence as a probe to isolate other genes with similar DNA sequences. Libraries are unsorted collections of randomly cloned fragments of genomic DNA or cDNA that can be screened repeatedly to identify genes of interest. Such screens are particularly useful for isolating genes from nonmodel organisms.

The identification of toolkit genes in model organisms has become an exercise in searching computer databases. The advent of genome sequencing projects provides large databases for sequence comparison. The number, sequence, and chromosomal map position of members of conserved gene families can be rapidly cataloged based on sequence similarity. Internet-based computer resources, some of which are listed below, are the best way to search and analyze genome sequences from model organisms.
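Sequence similarity between two genes is usually quantified from an alignment. As a minimal sketch, the global (Needleman-Wunsch) dynamic-programming alignment below uses an assumed linear scoring scheme (match +1, mismatch -1, gap -2). Database search tools use substitution matrices, affine gap penalties, and heuristic local search, so this illustrates the principle rather than any particular program. Percent identity is computed here over the full alignment length, one of several conventions in use.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: score[i][j] is the best score for aligning a[:i] with b[:j].
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x, y):
    """Identical, ungapped columns as a percentage of alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100 * same / len(x)


score, x, y = needleman_wunsch("ACGT", "AGT")
print(score, x, y)             # 1 ACGT A-GT
print(percent_identity(x, y))  # 75.0
```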

The identification of conserved genes from different organisms allows gene sequences to be analyzed for evolutionary relatedness. Many computer programs and tools are available for aligning and comparing the conserved sequences of members of gene families. Large molecular biology Internet servers in the USA and Europe include the following:

• The National Center for Biotechnology Information (NCBI) at the National Institutes of Health

• The European Bioinformatics Institute, part of the European Molecular Biology Laboratory

• The ExPASy Molecular Biology Server at the Swiss Institute of Bioinformatics

These sites integrate sequence databases (GenBank, SWISS-PROT), protein family databases (Pfam, PROSITE), genome project websites, and other genome analysis tools. Powerful programs used to generate molecular phylogenies ("gene trees") include PHYLIP and PAUP*. These programs build molecular phylogenies of related gene sequences using several computer models of molecular evolution.
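As a sketch of how a distance-based gene tree can be built, the code below computes uncorrected p-distances between toy aligned sequences and joins them with UPGMA. UPGMA is a deliberately simple stand-in for the methods offered by phylogenetics packages: it assumes a constant rate of evolution, whereas programs such as PHYLIP and PAUP* also provide model-corrected distances, parsimony, and likelihood methods. The sequences and the names `X1_sp1` and so on (two paralogs, X1 and X2, in two species) are invented for illustration, and the output is a topology without branch lengths.

```python
from itertools import combinations


def p_distance(x, y):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    return sum(p != q for p, q in pairs) / len(pairs)


def upgma(names, dist):
    """Return a Newick topology (branch lengths omitted) built by UPGMA.

    dist maps frozenset({name1, name2}) to the distance between them.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster: average weighted by leaf counts.
        for c in sizes:
            d[frozenset({merged, c})] = (
                d[frozenset({a, c})] * size_a + d[frozenset({b, c})] * size_b
            ) / (size_a + size_b)
        # Drop every distance that still refers to a or b.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Toy aligned sequences (invented): paralogs X1 and X2 in species sp1 and sp2.
seqs = {
    "X1_sp1": "ACGTACGTAC",
    "X1_sp2": "ACGTACGTAT",
    "X2_sp1": "ACCTTCGAAC",
    "X2_sp2": "ACCTTCTAAT",
}
dist = {frozenset({x, y}): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
print(upgma(list(seqs), dist))  # ((X1_sp1,X1_sp2),(X2_sp1,X2_sp2));
```

In this toy case the genes group by paralog rather than by species, the pattern expected when a duplication predates the split of the two species.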

Figure 4.1

Metazoan phylogeny

The phylogenetic relationships between many Metazoan phyla have been resolved using molecular data. Three primary bilaterian clades exist: the deuterostomes (including echinoderms, hemichordates, urochordates, cephalochordates, and vertebrates), the arthropod + onychophoran + priapulid clade, and the lophotrochozoans (including annelids, molluscs, most flatworms, and lophophorates). The last common ancestor of all bilaterian phyla is indicated in the figure. Basal branches off the prebilaterian stem lineage lead to the Cnidaria (jellyfish, anemones, corals) and the Porifera (sponges); the position of the nematodes is uncertain.

Figure 4.2

Expansion of a tandemly linked gene cluster

Related genes that are tandemly linked present a target for mispairing and unequal crossing over. The sequence similarity between related, linked gene pairs may facilitate mispairing of homologous chromosomes, and a cross-over event can generate a novel chimeric gene between the parental genes. The example shown here is reminiscent of the Hox genes, with gray representing the conserved homeodomain. Crossing over between adjacent paralogous Hox genes (shown in red and purple) generates a chimera with the 5' sequence of one parent and the 3' sequence of the other, with the break point falling in or near the conserved region. Note that mispairing between more distant genes in the complex can generate duplicated genes in addition to the new chimera. The propensity for the Hox genes to remain in a linked cluster may make them particularly susceptible to tandem expansion by this mechanism.

the total number of genes within a genome. The presence of two, three, or four copies of many developmental genes and syntenic regions of linked genes in vertebrate genomes provides evidence of large-scale duplications or tetraploidization events in the vertebrate lineage (Fig. 4.3). The initial tetraploidy of genes and chromosomes may gradually vanish, as linked genes become separated or are lost due to chromosomal rearrangements and deletions.
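The copy-number argument can be sketched by tracking each gene copy through successive genome duplications. Assuming two rounds, the minimum needed to give four copies of an ancestral gene, every family starts with four copies, and independent losses then leave two, three, or four. The family names `geneA` to `geneC`, the `a`/`b` history tags, and the particular losses below are hypothetical.

```python
from collections import Counter


def duplicate_genome(genome):
    """Double every gene copy; each copy records its duplication history as a/b tags."""
    return [(family, history + tag) for family, history in genome for tag in "ab"]


# Hypothetical ancestral genome with three single-copy gene families.
ancestor = [("geneA", ""), ("geneB", ""), ("geneC", "")]

# Two rounds of whole-genome duplication: every family now has 2**2 = 4 copies.
after_two_rounds = duplicate_genome(duplicate_genome(ancestor))

# Hypothetical losses of individual duplicates after tetraploidization.
lost = {("geneB", "ab"), ("geneB", "bb"), ("geneC", "ba")}
survivors = [copy for copy in after_two_rounds if copy not in lost]

copy_number = Counter(family for family, _ in survivors)
print(dict(copy_number))  # {'geneA': 4, 'geneB': 2, 'geneC': 3}
```

The history tags also record which copies descend from the same first-round duplicate, a crude analogue of the syntenic blocks that reveal ancient duplications in real genomes.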

Because the duplication and divergence of genes parallels the branching of animal lineages, specific terminology has been developed to describe the historical relationships between gene family members found both within and between different animal genomes. All of the genes in a given gene family share sequence similarity, which reflects common ancestry; they are therefore termed homologs. Two important distinctions are made among homologous genes, based on how they arise during evolution. Genes that are found in different animals and that arose from a single gene in the common ancestor of those animals are called orthologs. For example, the divergence of insects has created orthologous labial genes in each insect species. Genes that arose from gene duplication events in a single genome are called paralogs. For example, the Hox genes of Drosophila are a complex of paralogous genes.
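One simple way to apply these definitions to a gene tree is a species-overlap rule: a node whose two subtrees share a species must be a duplication, so genes that meet there are paralogs; otherwise the node is a speciation and the genes are orthologs. The sketch assumes a rooted, binary tree written as nested tuples and gene labels of the form `species_gene`; the topology is a hypothetical simplification in the spirit of the engrailed example, not a reproduction of Figure 4.4. It ignores gene loss and tree-reconstruction error, which fuller gene-tree/species-tree reconciliation methods address.

```python
# Hypothetical rooted, binary gene tree written as nested tuples.
# Leaf labels follow an assumed "species_gene" convention.
TREE = ("urchin_en",
        ("amphioxus_en",
         ("lamprey_en",
          (("chick_En1", ("mouse_En1", "human_En1")),
           ("chick_En2", ("mouse_En2", "human_En2"))))))


def leaves(tree):
    """All gene labels below a node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species(gene):
    return gene.split("_")[0]


def lca(tree, g1, g2):
    """Smallest subtree containing both genes."""
    while not isinstance(tree, str):
        for child in tree:
            below = set(leaves(child))
            if g1 in below and g2 in below:
                tree = child
                break
        else:
            return tree
    return tree


def relationship(tree, g1, g2):
    """Species-overlap rule: shared species below a node imply a duplication."""
    left, right = lca(tree, g1, g2)
    overlap = {species(g) for g in leaves(left)} & {species(g) for g in leaves(right)}
    return "paralogs" if overlap else "orthologs"


print(relationship(TREE, "mouse_En1", "human_En1"))   # orthologs
print(relationship(TREE, "mouse_En1", "mouse_En2"))   # paralogs
print(relationship(TREE, "urchin_en", "lamprey_en"))  # orthologs
```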

The difference between orthologous and paralogous genes is most easily depicted by constructing a phylogenetic tree of related gene sequences. A gene phylogeny depicts the evolutionary relatedness, or history, of members of a gene family. Figure 4.4 shows the relationship between deuterostome engrailed genes. Each node on this gene tree represents either an animal lineage bifurcation or a gene duplication. The divergence of animal lineages created the orthologous engrailed genes found in the sea urchin, amphioxus, and the lamprey. A gene duplication generated the paralogous En1 and En2 genes found in chicks, mice, and humans. The En1 genes are more closely related to one another than to any En2 gene, indicating a more recent common ancestry. Similarly, En1 genes and En2 genes are more closely
