FIGURE 1.2 Gene numbers and genome size. Data include only estimates of total gene number based on genome sequencing. The estimates for Fugu, Strongylocentrotus, and Ciona are extrapolations from minor fractions of the genome sequenced; data for the others are from almost complete genomic sequences. Hs, Homo sapiens; Fr, Fugu rubripes (puffer fish); Sp, Strongylocentrotus purpuratus; Ci, Ciona intestinalis (ascidian); Dm, Drosophila melanogaster; Ce, Caenorhabditis elegans. The numbers of genes indicated by red bars (top scale) are from the following sources: Hs, Collins, 1995; Rubin et al., 2000; Fr, Brenner et al., 1993; Sp, gene number calculation of Cameron et al., 2000 (based on S. purpuratus Genome Project data), and genome size from Hinegardner, 1974; Ci, Simmen et al., 1998; Dm, Rubin et al., 2000; Ce, C. elegans Sequencing Consortium, 1998. Dashed lines indicate larger estimates also consistent with current data. Genome sizes, indicated by blue bars (bottom scale), are given in base pairs (blue numerals), shown within the figure. Gene number estimates are also given (red numerals).
Suppose we concern ourselves specifically with the two classes of gene which it is now clear are most directly engaged in setting up the spatial domains of gene expression which underlie all aspects of the developmental process. These are genes encoding transcription factors and genes encoding intercellular signaling ligands, receptors, and some downstream components of signaling pathways. Regulatory and signaling pathway genes together function in bilaterian development to transduce spatial intercellular signaling events into spatial changes in transcription factor presence or activity. How much do bilaterians of very different body plan and very different phylogenetic affiliation differ from one another in their repertoires of genes encoding these key classes of protein? An enormous amount of recent data indicates that the answer is, basically, very little. Such is the aggregate outcome of thousands of laborious studies in which orthologous genes (i.e., genes belonging to the same immediate family, descended from the same common ancestor gene) have been recovered from different organisms. Generalizing from what we know from the huge mammalian database; from Drosophila, in which developmental roles of many transcription factors and signaling pathway components were first revealed by mutational phenotypes; from the almost complete C. elegans (C. elegans Sequencing Consortium, 1998) and Drosophila (Adams et al., 2000; Rubin et al., 2000) genome projects; and from studies on specific genes in other vertebrates, invertebrate chordates, sea urchins, and a few other creatures, the following quite remarkable statement can be made: Except for a few clade-specific losses, the genetic repertoire of every bilaterian is likely to include genes encoding every known major family of transcription factors, and components of every known signaling pathway.
Results from the C. elegans genome sequence are particularly revealing, since only when the complete sequence is available can we determine what is not present in a genome (see review of Ruvkun and Hobert, 1998). For example, the C. elegans genome lacks a gene encoding the Hedgehog (Hh) intercellular signaling ligand, which is essential in both Drosophila and chordate development, and the hh gene is undoubtedly an example of loss of an otherwise panbilaterian signaling component in the evolutionary line leading to C. elegans. But a gene encoding the downstream transcription factor which Hh signaling affects, that is, the gene encoding the Cubitus Interruptus or Gli transcriptional regulator, is present in the C. elegans genome. So are genes encoding proteins similar to the Patched transmembrane receptor, which is involved in Hh signaling in other organisms (Ruvkun and Hobert, 1998).
Comparison of the known repertoires of signaling and regulatory genes also permits a second, equally significant generalization (Rubin et al., 2000; Ruvkun and Hobert, 1998). In each bilaterian clade, though all the regulatory and signaling gene families are present, these gene families have diversified differently; e.g., the different bilaterian genomes may have different numbers of genes encoding transcription factors belonging to the various subfamilies of homeodomain regulators, ETS regulators, T-box regulators, nuclear receptors, or winged helix regulators, and different numbers of Dpp or TGF-β ligands.
Furthermore, the duplication and diversification of these gene families and subfamilies are always accompanied by—or driven by—diversification of their functional roles in development. This is a major process in bilaterian evolution.
So, we can exclude the proposition that given bilaterian body plans and morphological structures differ from others because each has its own specific classes of gene regulatory protein and its own set of signaling pathways. Instead the opposite is true. A common repertoire of types of transcriptional regulator and of signaling pathways constitutes the shared bilaterian heritage, the utilization of which underlies all forms of bilaterian development. This argument can be broadened to include many other classes of gene, e.g., genes encoding the cytoskeletal proteins or enzymes that carry out metabolic functions; and most importantly, genes encoding the properties of many panbilaterian differentiated cell types, such as muscle cells, neurons, photoreceptors, secretory cells, and so forth.
The working parts of the genome are the genes and their cis-regulatory elements. Since the key classes of gene are shared amongst bilaterians, we come down to the regulatory apparatus. The internal architecture of cis-regulatory elements determines how each gene will function, as we explore in the following; and the architecture of the networks in which they are interconnected determines the deployment of sets of genes in developmental space and time. It was possible to deduce that genomic regulatory architecture constitutes the structural, genetic basis for the morphological features of animals 30 years ago (see, e.g., Britten and Davidson, 1969, 1971); now we know it for a certainty.
Overview of Regulatory Architecture
cis-Regulatory elements can be thought of as information processing units "wired" into the regulatory network so that they receive multiple inputs, in the form of the multiple transcription factors which bind within them. The inputs vary depending on the time in development and on the cell in which a given gene is located, and the output is the "instruction" given to the basal transcription apparatus which determines whether the gene is to be silent, or active at a specified rate. If the gene encodes a transcription factor, the output leads to all the other cis-regulatory elements within which there are functional target sites for that transcription factor. We speak of these as "downstream" linkages, while the inputs to a regulatory gene are the termini, in its own cis-regulatory system, of its "upstream" linkages. The internal architecture of a cis-regulatory element is that which enables it to "process" the various inputs it receives, resolving these inputs into a single output. Perhaps this seems an unnecessarily baroque mode of description. As everyone knows, one can usually find a piece of DNA, often described as an "enhancer," which will generate a certain pattern of developmental expression when introduced into a recipient egg or animal: so why is it important to worry about its internal functional architecture and its information processing activities? There are two direct answers to this question. The first is that only by verifying the functional meaning of the specific transcription factor target sites within a cis-regulatory element do we understand what the genomic DNA sequence of the element means, and understanding the functional meaning of the genomic DNA sequence might well be considered the most important problem in bioscience. The second is that the information processing function of the cis-regulatory element constitutes the link between the diverse circumstances presented in each cell, and the response capacities hardwired into the genomic regulatory sequence.
The fundamental requirement for cis-regulatory information processing is easy to see a priori. Consider the problem faced by a given gene in a given cell at a given moment in development. Somehow this gene has to "know" when it is in the developmental process; what cells are adjacent and what they are doing or saying; what is the lineage or developmental status of its own cell. It must "know" whether the cell is in cycle, if that affects the need for transcripts of our gene; and also what regulatory events (mediated by other genes) have occurred earlier which would causally affect its own activity. The set of transcription factors that bind within a cis-regulatory module can be considered not only as biochemical effectors of function, but also as incident bearers to the gene of these kinds of relevant biological information. Note that "transcription factor" is used here as a neutral term, denoting any protein which displays a high specificity for a particular cis-regulatory DNA sequence, and which performs some function that affects transcriptional output. Transcription factors execute a variety of functions, e.g., repression, activation, transduction of external signaling, or architectural alteration of cis-regulatory complexes, and they mediate diverse cis-regulatory logic functions. For example, a commonly observed cis-regulatory format is one in which two different transcription factors responding to two different inputs, perhaps an intercellular signal and a lineage marker, must both be bound in order for there to be any output ("and" logic). For a transcription factor to affect the output of a given cis-regulatory element its target site must of course be included in the sequence of that element, and this is the genetic component of the system. But the factor also must be presented in the cell nucleus at a concentration that promotes occupancy of these target sites a significant fraction of the time, and it must be presented in an active form.
Transcription factor concentrations and activities depend on circumstances. A factor may be synthesized only in certain spatial domains of the organism, and it may be active only after a signal transduction pathway has modified either it or a bound cofactor. It is in this direct, mechanistic sense that transcription factors convey circumstantial information to cis-regulatory elements.
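The "and" logic and the occupancy requirement just described can be illustrated with a small Python sketch. It is a toy model, not a description of any measured system: the factor names, the dissociation constants, the 0.5 occupancy threshold, and the use of the simple single-site equilibrium expression c / (c + Kd) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FactorState:
    """State of one transcription factor in a given nucleus (hypothetical model)."""
    concentration: float  # arbitrary units
    active: bool          # e.g., after modification by a signal transduction pathway


def occupancy(concentration: float, kd: float) -> float:
    """Fraction of time a single site is occupied, assuming simple equilibrium binding."""
    return concentration / (concentration + kd)


def and_module_output(nucleus, required_sites, min_occupancy=0.5):
    """Return True only if every required factor is present, active, and bound often enough.

    nucleus: maps factor name -> FactorState for the factors present in this cell.
    required_sites: maps factor name -> assumed dissociation constant of its target site.
    """
    for factor, kd in required_sites.items():
        state = nucleus.get(factor)
        if state is None or not state.active:
            return False  # factor absent from this cell, or present but inactive
        if occupancy(state.concentration, kd) < min_occupancy:
            return False  # present and active, but too dilute to occupy its site
    return True


# Hypothetical module with sites for a signal-responsive factor and a lineage factor.
sites = {"signal_factor": 1.0, "lineage_factor": 1.0}

cell_a = {"signal_factor": FactorState(4.0, True), "lineage_factor": FactorState(3.0, True)}
cell_b = {"signal_factor": FactorState(4.0, False), "lineage_factor": FactorState(3.0, True)}
cell_c = {"lineage_factor": FactorState(3.0, True)}

print(and_module_output(cell_a, sites))  # both inputs bound and active
print(and_module_output(cell_b, sites))  # signal factor present but inactive
print(and_module_output(cell_c, sites))  # signal factor not synthesized in this cell
```

Only the first cell yields an output. The same genomic sequence (the `sites` table) gives different results in different cells because the circumstances differ, which is the sense in which the factors carry circumstantial information to the element.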
Bilaterian organisms are complex, and have many parts, and they execute many developmental processes. Genes at all levels or positions in the regulatory operating system, for instance genes encoding signal pathway components, are typically utilized many times over during the life cycle, at different stages, and in different cells. A general feature of cis-regulatory architecture in bilaterians is that the diverse phases of expression of given genes are frequently mediated by diverse cis-regulatory elements, here referred to as regulatory "modules," strung out in the DNA flanking the gene or in its introns. A cartoon illustrating some principles of modular cis-regulatory organization is shown in Fig. 1.3. Use of the term "module" to denote individual cis-regulatory elements has some advantages compared to "enhancer," since the functions of these elements may be more complex and integrative than implied by the verb "to enhance," and indeed some modules function in some cells to repress rather than stimulate transcription. Nonetheless, both terms are to be found in the following, "enhancer" usually in accord with its use in a cited study.
Every cis-regulatory module may be considered a control device which is called into play by those transcription factors for which it contains target sites, when and where these are present and active; and the module is otherwise silent. Each module is an information processing regulatory device in the sense just discussed. A survey of some well-characterized cis-regulatory modules active in developmental processes (Arnone and Davidson, 1997) yielded the conclusion that four to eight different transcription factors typically service each module, as symbolized by the vertical input arrows in Fig. 1.3. Sites for given factors are often multiple within a module. To a first approximation this can be regarded as a means for increasing the probability that that factor will at any one moment be bound within the module. Modular regulatory organization provides the organism with the services of a given gene in multiple developmental contexts, but as also implied in Fig. 1.3 the "price" is an additional layer of complexity: some cis-"trafficking" controls must operate so that the module relevant in any given context is the one which communicates its output to the basal transcription apparatus. This undoubtedly requires control of DNA looping, mediated by direct protein:protein interactions involving the transcription factors bound within the active module, perhaps via cofactors bound in turn to them. The most proximal region of the whole cis-regulatory system may have a special importance for the function of the whole system. This is suggested by several examples in which the proximal cis-regulatory region is required for activity, in addition to one or another of the upstream regulatory modules, but how general this feature is remains to be seen. As will be seen in later chapters, modularity in cis-regulatory systems is essential to their developmental function, and is also a key to understanding the evolution of developmental processes.
For now we leave this discussion with the simple but essential point that modular cis-regulatory structure constitutes a discretely organized DNA map, which represents in physical terms the different phases of gene expression that are to be installed throughout the life cycle, for every gene.
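As a rough illustration of the modular organization described above and cartooned in Fig. 1.3, the following Python sketch treats each module as silent unless all of its activating inputs are present and none of its spatial repressors is. The module names follow the figure; the input names and the all-or-nothing rule are hypothetical simplifications, and the sketch ignores the "trafficking" and proximal-region steps entirely.

```python
def module_active(module, nucleus_factors):
    """A module is called into play only when all its activators are present
    in the nucleus and none of its repressors is (assumed all-or-nothing rule)."""
    has_all_activators = module["activators"] <= nucleus_factors  # subset test
    has_any_repressor = bool(module["repressors"] & nucleus_factors)
    return has_all_activators and not has_any_repressor


def active_modules(modules, nucleus_factors):
    """Return the names of the modules that would produce an output in this nucleus."""
    return [name for name, m in modules.items() if module_active(m, nucleus_factors)]


# Hypothetical inputs for the two upstream modules of one gene.
modules = {
    "MOD1": {"activators": {"signal_A", "lineage_X", "cycling"}, "repressors": {"boundary_R1"}},
    "MOD2": {"activators": {"signal_B", "lineage_Y", "cycling"}, "repressors": {"boundary_R2"}},
}

print(active_modules(modules, {"signal_A", "lineage_X", "cycling"}))
print(active_modules(modules, {"signal_A", "lineage_X", "cycling", "boundary_R1"}))
print(active_modules(modules, {"signal_B", "lineage_Y", "cycling"}))
print(active_modules(modules, set()))
```

The first and third nuclei each call a different module into play, so the same gene is expressed in two distinct contexts; the second nucleus lies across a repression boundary, and the last contains none of the relevant factors, so both modules stay silent.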
The morphological structures of bilaterians are of course never the product of single genes or single regulatory systems. Direct RNA complexity measurements (reviewed by Davidson, 1986), and now an increasingly enormous bank of EST data obtained from cDNA libraries of given tissues, confirm the a priori assumption that many hundreds and often thousands of genes must be expressed in order to create any given tissue, body part, or multicellular structure. These genes are controlled in developmental space and time by large developmental gene
FIGURE 1.3 Modular cis-regulatory information processing. The cartoon shows several kilobases (kb) upstream of a gene operating under the control of two cis-regulatory modules (MOD1 and MOD2), each of which is operative in a particular spatial domain at a particular time in development. Each module receives multiple parallel inputs (arrows). The diagram shows examples of the kinds of inputs each module might receive and of course is not meant to imply that every module utilizes these same particular inputs. The specific inputs of each kind will be different for MOD1 and MOD2, symbolized by the solid vs. dashed colored arrows. The inputs are of two types, positive and negative. The red barred inputs denote different spatial repressors which are utilized in each module to set boundaries of expression in the spatial domain where that module functions, i.e., these inputs repress the gene across the relevant boundaries. The blue activators are downstream of different intracellular signaling pathways; the tan activators turn on the gene when cells are in cycle; the different green activators are present in cells of the respective lineages that constitute the fields in which the gene will be active, but only when all inputs are present. These inputs can be thought of as bringing the indicated kinds of situational biological information to the gene. Each module acts by communication of its output to the proximal cis-regulatory module (PROX), which may receive other inputs, and may further process (e.g., amplify) the output of MOD1 or MOD2. PROX then communicates directly with the basal transcription apparatus (BTA). Note that the major spatial information processing occurs in the developmentally regulated, upstream cis-regulatory modules, and that the BTA simply responds to the various alternative outputs of the two upstream regulatory modules, as transmitted to it by PROX.
regulatory networks, as mentioned at the outset. Hence, returning to the question of what aspects of the genomic regulatory system are responsible for diversity in bilaterian morphologies, the most accurate and comprehensive answer is diversity in the architecture of developmental gene regulatory networks. "Network architecture" is a term that, in brief, denotes the organization of regulatory linkages which connect cis-regulatory elements by means of cis-trans interactions. The character of developmental regulatory network architecture is a problem of major importance, though the real nature of these networks is just beginning to emerge from experimental data. Suffice it to say that in principle genomic changes which alter network architecture have the power to create new developmental processes, because they can affect the activities of large sets of genes. Examples might include the insertion or appearance of new spatial cis-regulatory modules in the vicinity of a gene controlling a battery of other genes, thus causing these genes to be expressed in different spatial domains of an embryo; or cis-regulatory changes in target sites which result in the addition of genes to preexistent gene batteries; or target site changes that bring network subelements under control of different signaling functions. Network architecture is ultimately specified by the identity of the target sites within all the participating cis-regulatory elements. It follows that analysis of genomic cis-regulatory systems and their linkages holds the key to understanding how genomes encode the properties of organisms.
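The statement that network architecture is specified by the target sites can be made concrete with a minimal Python sketch. Linkages are derived purely from a table of which factors have sites in the cis-regulatory elements of which genes; all gene and factor names are hypothetical, and the model says nothing about whether a linkage activates or represses.

```python
def build_network(regulators, target_sites):
    """Derive downstream linkages from target-site content.

    regulators: maps a regulatory gene -> the transcription factor it encodes.
    target_sites: maps each gene -> set of factors with sites in its
                  cis-regulatory elements.
    Returns a dict: regulatory gene -> sorted list of its downstream target genes.
    """
    network = {}
    for gene, factor in regulators.items():
        network[gene] = sorted(t for t, sites in target_sites.items() if factor in sites)
    return network


# Hypothetical regulatory genes and the cis-regulatory site content of their targets.
regulators = {"geneR": "factorR", "geneS": "factorS"}
target_sites = {
    "geneS": {"factorR"},   # geneS lies downstream of geneR
    "gene1": {"factorS"},   # gene1 and gene2 form a battery controlled by geneS
    "gene2": {"factorS"},
    "gene3": set(),         # no relevant sites: outside the network
}

before = build_network(regulators, target_sites)
print(before)

# A target-site change adds gene3 to the preexistent battery controlled by geneS.
target_sites["gene3"] = {"factorS"}
after = build_network(regulators, target_sites)
print(after)
```

Nothing about the regulators themselves changes between the two calls; only the site content of one target element does, yet the derived architecture differs, which parallels the second of the examples given in the text.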