Lysozyme is a ubiquitous bacteriolytic enzyme found in virtually all animals. Its function is to cleave the β(1→4) glycosidic bonds between N-acetylglucosamine and N-acetylmuramic acid in the cell walls of bacteria. Because it is present in body fluids such as saliva, serum, and tears, it is often the first line of defense against foreign bacteria. In foregut fermenters, animals in which the anterior part of the stomach functions as a chamber for bacterial fermentation of ingested plant matter, lysozyme is secreted in the posterior parts of the digestive system, where it frees nutrients from within the bacterial cells. This type of digestion has arisen independently twice in the evolution of placental mammals: once in the ruminants and once in the leaf-eating colobine monkeys. In both cases, lysozyme has been recruited to degrade the cell walls of the bacteria that carry out fermentation in the foregut.
Therefore, the use of lysozyme in the digestive system is a derived trait that evolved as an adaptation to a diet consisting mainly of leaves. Another trait that evolved to suit this life history in colobine monkeys is an enlarged stomach with numerous sections, similar to but much less elaborate than that of cows (Fleagle, 1999). Stewart and Wilson (1987) noticed that the lysozyme sequences from cows and langurs uniquely share five amino acids, whereas those from cows and horses uniquely share only one. Since cows and langurs diverged much earlier than the cow and horse lineages separated, the amino acids uniquely shared by cows and langurs are likely to be the result of a series of adaptive parallel substitutions that occurred independently in both lineages (i.e., an example of convergent evolution at the molecular level). These substitutions are adaptive in that some of them contribute to a better performance of lysozyme at low pH values (see Li, 1997).
For these reasons, the molecular evolution of lysozyme has been a favorite example of adaptive evolution, and it often serves as a model case for assessing the performance of statistical methods for detecting selection from DNA sequence data. In the next section, one such method is described.
Statistical Analyses to Detect Positive Selection in Lysozyme Sequences
Substitutions in protein-coding DNA sequences can be divided into two types. Those that change the encoded amino acid are called nonsynonymous substitutions. Those that do not cause any amino acid change, owing to the degeneracy of the genetic code, are called synonymous substitutions (Li, 1997). Nonsynonymous mutations have direct phenotypic consequences (changes in the protein product) and, therefore, may be subject to natural selection. Synonymous mutations are not subject to selection at the protein level, although selection may operate at the RNA or translation level.
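The distinction between the two types of substitution can be illustrated with a minimal Python sketch. It assumes the standard nuclear genetic code and considers only pairs of codons that differ at a single nucleotide; the function name classify_substitution is a hypothetical helper introduced here for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before, codon_after):
    """Label a single-nucleotide codon change as synonymous or nonsynonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if before not in CODON_TABLE or after not in CODON_TABLE:
        raise ValueError("both arguments must be DNA codons of length 3")
    n_differences = sum(x != y for x, y in zip(before, after))
    if n_differences != 1:
        raise ValueError("codons must differ at exactly one position")
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


# CTT and CTC both encode leucine; GAT encodes aspartate and GAA glutamate.
third_position_change = classify_substitution("CTT", "CTC")
amino_acid_change = classify_substitution("GAT", "GAA")
```

Most synonymous changes fall at the third codon position, but not all: TTA and CTA, for instance, both encode leucine although they differ at the first position.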
These differences in the effects of selection on the two types of mutation in protein-coding regions form the basis for inferring the forces underlying DNA sequence evolution. The rate of synonymous substitution (dS) is considered to reflect the mutation rate in that region, while the rate of nonsynonymous substitution (dN) is shaped by the type of selection acting on that region. Therefore, a dN/dS ratio smaller than 1 means that nonsynonymous mutations have been fixed more slowly than the mutation rate, or neutral rate. This can be explained by selection to preserve the existing protein sequence, often called negative or purifying selection. In fact, most protein sequences are assumed to evolve in this fashion because most changes in protein sequences are likely to be deleterious. A dN/dS ratio that is statistically indistinguishable from 1 suggests that mutations in the sequence are all equal in fitness, regardless of their consequences for the protein. This is often referred to as a neutral mode of evolution. On the other hand, a dN/dS ratio significantly greater than 1 means that nonsynonymous substitutions occurred at a higher rate than synonymous substitutions. As the mutation rate within the same gene is likely to be similar for both types of site, this strongly suggests that many nonsynonymous mutations were selectively fixed (i.e., positive selection drove the fixation of such mutations).
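As a rough numerical illustration of the ratio, the sketch below computes dN and dS as simple per-site proportions and labels the resulting ratio. The counts are hypothetical, the function names are introduced here for illustration only, and no correction for multiple substitutions at the same site is applied, so this is not a substitute for the likelihood methods discussed in this article.

```python
def dn_ds_ratio(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Return (dN, dS, dN/dS) from raw counts, as uncorrected per-site proportions."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn, ds, dn / ds


def describe_ratio(ratio):
    """Give the nominal reading of a point estimate; significance needs a formal test."""
    if ratio < 1:
        return "consistent with purifying selection"
    if ratio > 1:
        return "consistent with positive selection"
    return "consistent with neutral evolution"


# Hypothetical counts: 9 nonsynonymous changes over 300 nonsynonymous sites,
# and 6 synonymous changes over 100 synonymous sites.
dn, ds, ratio = dn_ds_ratio(9, 300, 6, 100)
reading = describe_ratio(ratio)
```

With these made-up numbers there are more nonsynonymous changes than synonymous ones in absolute terms, yet the ratio is below 1, because a typical coding sequence contains roughly three times as many nonsynonymous sites as synonymous sites. This is why the comparison is made between rates per site rather than between raw counts.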
Yang (1998) developed a maximum likelihood approach to estimate the dN/dS ratio along each lineage in the phylogenetic tree of the species under study. This method takes into account the transition/transversion rate bias and nonuniform codon usage, factors that are often not straightforward to accommodate in approximate pairwise methods. The method can accommodate a one-ratio model, with a single dN/dS ratio over all lineages of interest, as well as a free-ratio model at the other extreme, which assumes a different underlying dN/dS ratio for each lineage. Intermediate models can also be implemented. A likelihood ratio test can then be performed to compare the fit of the different models.
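The likelihood ratio test mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used by Yang (1998): the log-likelihood values are hypothetical placeholders, the tree is assumed to be unrooted and bifurcating (so that n species give 2n - 3 branches), and the test statistic is assumed to follow a chi-square distribution with degrees of freedom equal to the difference in the number of parameters. The chi-square tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    y = x / 2.0
    # Start from a closed form for df = 2 (even df) or df = 1 (odd df) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... then raise the shape parameter in unit steps using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (statistic, p-value) for a pair of nested models."""
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf(statistic, extra_params)


# One-ratio model: a single dN/dS ratio. Free-ratio model: one ratio per branch.
n_species = 7
n_branches = 2 * n_species - 3   # 11 branches in an unrooted bifurcating tree
extra_params = n_branches - 1    # 10 additional dN/dS parameters

# Hypothetical log-likelihoods, chosen only to illustrate the arithmetic.
statistic, p_value = likelihood_ratio_test(-1000.0, -985.0, extra_params)
free_ratio_fits_better = p_value < 0.05
```

The same function can be used to compare intermediate models, for example a model with one ratio for a branch of interest and another for all remaining branches against the one-ratio model, in which case extra_params would be 1.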
Yang (1998) used this method to test whether the presence of the presumed positive selection can be detected from the DNA sequences of lysozyme. The lysozyme gene sequences of 24 primate species were analyzed. The result from an analysis utilizing a subset of seven sequences is shown in Figure 1.
The free-ratio model, which assumes different dN/dS ratios for different branches, fit the data significantly better than the one-ratio model, which assumes a single dN/dS ratio for all branches. The branch leading to colobine monkeys (branch c) and the branch leading to hominoids (branch h) are long (i.e., they have accumulated many changes) and have very high dN/dS ratios. The dN/dS ratios along the c and h branches were significantly greater than the background ratios. The dN/dS ratio along the h branch was significantly greater than 1, indicating that positive selection had operated during lysozyme evolution along this lineage. This is in agreement with a previous analysis (Messier and Stewart, 1997). The dN/dS ratio along the c branch was not significantly greater than 1. However, the hypothesis that the dN/dS ratio along this branch was greater than 1 could not be rejected either. Therefore, this result is compatible with both relaxed selective constraints and the operation of positive selection along the c lineage. Since lysozyme did not lose function along branch c, but acquired a new function, the hypothesis of positive selection appears more plausible than that of reduced selective constraints.
Figure 1. The maximum-likelihood estimates of the numbers of nonsynonymous and synonymous substitutions in each branch for the entire lysozyme coding sequences in seven primate species. The "free-ratio" model is used, which assumes different dN/dS ratios for different branches. Branch lengths are proportional to the total numbers of substitutions.
A variety of likelihood methods have been developed to detect natural selection at the nucleotide level (for a review, see Yang and Bielawski, 2000), with the aim of accommodating more realistic evolutionary models. For example, the method described above assumes that all amino acid sites are under the same selective pressure, with the same dN/dS ratio. The analysis effectively averages the dN/dS ratio across all sites, and positive selection is detected only if that average is significantly greater than 1. This makes the test very conservative; it is more realistic to imagine positive selection operating on only a few amino acid sites, while most sites are under strong purifying selection due to functional constraint. To address this possibility, Nielsen and Yang (1998) implemented a likelihood ratio test that accounts for several classes of sites with different intensities of selective pressure. This method is more realistic and may provide a hypothesis that certain structural and functional domains of the protein are under positive selection.
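A small numerical sketch shows why averaging over sites is conservative. The site classes below are hypothetical values chosen for illustration, not estimates from the lysozyme data, and average_omega is a helper name introduced here.

```python
def average_omega(site_classes):
    """Average dN/dS over site classes given as (proportion, ratio) pairs."""
    total = sum(proportion for proportion, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    return sum(proportion * ratio for proportion, ratio in site_classes)


# Hypothetical protein: 95% of sites under strong purifying selection
# (ratio 0.1) and 5% of sites under positive selection (ratio 4.0).
site_classes = [(0.95, 0.1), (0.05, 4.0)]
mean_omega = average_omega(site_classes)

# The gene-wide average stays well below 1 even though one class exceeds 1.
has_selected_class = any(ratio > 1 for _, ratio in site_classes)
average_exceeds_one = mean_omega > 1
```

Under these assumptions the average is 0.95 × 0.1 + 0.05 × 4.0 = 0.295, so a test based on the gene-wide average would not indicate positive selection, whereas a model with separate site classes could in principle recover the class whose ratio exceeds 1.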
However, if adaptive evolution occurs only during a short time interval and affects only a few crucial amino acids, this method is unlikely to be powerful, because it can detect positive selection at a site only if the dN/dS ratio averaged over all lineages is greater than 1. Yang and Nielsen (2002) subsequently extended their model to allow the dN/dS ratio to vary both among sites and among lineages in a likelihood framework. These models may be useful for identifying positive selection that affects only a few sites in the protein along prespecified branches.
In reality, however, some models require unnecessarily large numbers of evolutionary parameters. Also, comparisons between different submodels are often biologically meaningless. In addition, implementing a model with a large number of parameters requires long coding sequences and large sample sizes; otherwise, the power of the tests is usually low. In particular, testing whether particular branches were under positive selection requires additional information. Nevertheless, in the case of lysozyme, the branch leading to the ancestor of the colobine monkeys has consistently been shown to be under positive selection (Yang, 1998; Yang and Nielsen, 2002).