Genome sizes at a glance

Haploid organisms have only one copy of their DNA; diploid organisms have two copies. Humans are diploid: A person's genome consists of two DNA copies, one from Mom and one from Dad. Both copies contain the same type of genes (eye-color genes, for example), whose "specifics" (blue eyes versus brown eyes, for example) may or may not be different.

To standardize across all organisms, when scientists talk about genome size, they talk about the size of a haploid genome. For diploid organisms, genome size corresponds to the amount of DNA in an unfertilized egg or in a sperm cell. Table 15-1 lists the haploid genome sizes and the number of genes for many organisms. It also identifies which branch of the tree of life each organism occupies.

Table 15-1 Genome Sizes

| Organism | Branch of Tree of Life | Estimated Size (Mb = million bases) | Estimated Number of Genes | Average Gene Density* |
| --- | --- | --- | --- | --- |
|  |  | 2900 Mb |  | 1 gene per 100,000 bases |
|  |  | 2500 Mb |  | 1 gene per 100,000 bases |
| Fruit flies |  | 180 Mb |  | 1 gene per 9,000 bases |
| Arabidopsis thaliana (a little flowering plant) |  | 125 Mb |  | 1 gene per 4,000 bases |
| Roundworm | Eukaryote | 97 Mb | 19,100 | 1 gene per 5,000 bases |
|  |  | 12 Mb |  | 1 gene per 2,000 bases |
| E. coli |  | 4.7 Mb |  | 1 gene per 1,400 bases |
| H. influenzae (can cause blood poisoning and meningitis) |  | 1.8 Mb |  | 1 gene per 1,000 bases |
|  |  | 430 Mb |  | 1 gene per 10,000 bases |
| Entamoeba histolytica (a single-celled amoeba) |  | 24 Mb |  | 1 gene per 4,000 bases |

*Average gene density refers to how many bases of DNA there are for each gene.


Note: The reason the numbers in Table 15-1 differ between "Estimated Size" and "Estimated Number of Genes" is that the "Estimated Size" includes both coding and non-coding DNA (explained in the section "Distinguishing between genes and non-coding DNA"), whereas the "Estimated Number of Genes" entries include only the coding DNA.

The tree of life (refer to Chapter 9) has three main branches:

• The Eubacteria: These are all the bacteria you've heard of, including E. coli, staph, strep, and other such critters.

• The Archaea: This is the other group of single-celled organisms without a nucleus. They look pretty similar to the Eubacteria under a microscope but turn out to be very different when scientists compare their DNA sequences.

• The Eukaryotes: These include all the organisms whose cells have a nucleus (yeast, pine trees, you, and so on): basically anything big enough to see, as well as most of the biggest things that are still too small to see.

The C value and the C-value paradox

When scientists began to measure the genome sizes of different organisms, two things became apparent: Within a species, genome sizes are the same, but across species they differ quite a bit and not necessarily in the way you'd expect. The following sections explain.

Genome sizes consistent within a species

Within a species, every organism has the same size of genome. This finding makes perfect sense. The instruction manual to make one person should be the same length as the instruction manual to make somebody else, although the details vary from person to person. Both people need the instructions to make eyes, for example, but the exact details — the color and shape of the eyes — may vary from person to person.

What's true for people is true for other species as well: The instruction manual is the same length, even if some of the instructions are slightly different. There are some exceptions to this, however, such as with E. coli, whose genome size can vary, as explained in the later section, "Getting genes from other lines: Lateral gene transfer."

Because the size of the genome is constant across all individuals in a species, a species' genome size is referred to as its C value, with C standing for constant.

Genome sizes vary between species

Between species, genome size varies greatly — a fact that is extremely puzzling, because although it makes sense that different organisms require different-size instruction manuals, no obvious connection exists between the size of the species' genome and that species' complexity. For that reason, scientists call the discrepancy between complexity and genome size the C-value paradox (or the C-value enigma).


Distinguishing between genes and non-coding DNA

An organism's genome can roughly be divided into two parts:

• Genes (coding DNA): These sequences of DNA are transcribed and are the genes that determine phenotype.

During transcription, DNA sequences are copied to RNA. During another process, called translation, the RNA is used to assemble chains of amino acids, which are called proteins. Not all transcribed RNAs are translated into proteins; some have other jobs. Chapter 3 has the details on these processes.

• Non-coding DNA: These areas of DNA aren't transcribed. In other words, they don't seem to do anything.

Number of genes

If you take a close look at the numbers in Table 15-1, you may notice that some of the differences make sense. You'd probably expect single-celled organisms to have fewer genes than multicelled organisms, and that's what you find in some instances. Humans, for example, have about 20,000 genes, whereas E. coli, a generally beneficial species of Eubacteria that inhabits the human gut, has slightly more than 4,000 genes.

But in other instances, the numbers aren't what you'd expect. Although humans have twice as many genes as fruit flies, rice plants have almost twice as many genes as we do. At first pass, rice isn't obviously twice as complex as humans are. Because scientists don't know what most of the rice genes do, they don't really understand why rice has so many genes, but they do know that having this many genes isn't a universal property of plants. The small weed Arabidopsis has about the same number of genes as humans but far fewer than the rice plant has.

While it's true that the littlest critters (viruses, eubacteria, and archaea) have the smallest genomes and the smallest numbers of genes, there are other single-celled creatures, like certain amoebas, that have enormous genomes and as many genes as some (but not all) multicellular organisms. The single-celled amoeba Entamoeba histolytica has almost 10,000 genes, not that many fewer than a fly!

Amount of non-coding DNA

Another thing you may notice in Table 15-1 is that different organisms have different amounts of non-coding DNA, represented in the "Average Gene Density" column. The more bases there are for each gene, the more non-coding DNA the genome contains. So humans, who have only one gene for every 100,000 bases, have quite a bit of "wasted" space, or non-coding DNA. H. influenzae, on the other hand, has one gene per thousand bases, meaning it has virtually no non-coding DNA.
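As a rough illustration of how the "Average Gene Density" figures relate to genome size and gene count, the sketch below divides one by the other. The roundworm numbers come from Table 15-1. The function names are made up for this example, and the 1,000-base average gene length is an assumption, chosen only because H. influenzae, at one gene per 1,000 bases, is described as having virtually no non-coding DNA. Real genes vary widely in length, so treat the non-coding estimate as a back-of-the-envelope figure.

```python
def bases_per_gene(genome_size_mb: float, gene_count: int) -> float:
    """Average number of DNA bases per gene (the table's "gene density")."""
    return genome_size_mb * 1_000_000 / gene_count


def noncoding_fraction(density: float, assumed_gene_length: float = 1_000) -> float:
    """Rough share of the genome that is non-coding.

    density is bases per gene. assumed_gene_length is a hypothetical
    average gene length in bases (an assumption, not a figure from the text).
    """
    coding_share = min(assumed_gene_length / density, 1.0)
    return 1.0 - coding_share


# Roundworm row of Table 15-1: 97 Mb and 19,100 genes.
roundworm_density = bases_per_gene(97, 19_100)
print(round(roundworm_density))  # 5079, which the table rounds to 5,000

# One gene per 100,000 bases, under the assumed gene length.
print(noncoding_fraction(100_000))
```

Under that assumption, a genome with one gene per 100,000 bases comes out roughly 99 percent non-coding, while one with a gene every 1,000 bases has essentially none.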

Clear patterns appear between the major groups of organisms:

• Eubacteria and Archaea have almost no non-coding DNA.

• Eukaryotes have non-coding DNA, but the amount of non-coding DNA varies widely among them. Some ferns have 100 times as much non-coding DNA as humans do.

• Viruses don't have non-coding DNA, but they don't fit neatly on the tree of life. In fact, viruses probably are not a single group. They have such small genomes that very little information is available to group them with other organisms.

No one knows exactly why organisms on the different branches of the tree of life have different amounts of non-coding DNA, although scientists can make educated guesses:

• Size of the organism: It seems reasonable that the smallest organisms simply don't have room for extra stuff. You can't fit a gallon of milk into a quart container. The same constraints wouldn't exist for eukaryotic cells. Your genome has a lot of non-coding DNA, but the nucleus is still only a small part of the cell; it seems to have room to spare.

• Rapid cell division: For organisms such as viruses and bacteria, for which rapid division is a key component of fitness, the extra time that replicating a larger genome takes is too much of a selective disadvantage, so non-coding DNA doesn't accumulate.

• Population size: Maybe non-coding DNA can accumulate more easily in eukaryotic organisms because they have smaller population sizes, on which genetic drift (random events) can be a more influential evolutionary force. (Head to Chapter 6 for info on genetic drift.)
