Total evidence or analysis of congruence of molecular and morphological character sets?

Owing to the revolution in acquisition of nucleotide sequences, it is commonplace to have data sets with both morphological and molecular data. For some techniques, like maximum likelihood, we cannot combine these data into a single analysis, as there is no single model that embraces the total data set. Indeed, some have argued that it might be best, in all analyses, to consider different types of data separately and then to combine the "best" of both sources of information.

Not so carefully hidden in these arguments is a disdain for morphological data. Nucleotide data are thought to be superior by virtue of numbers alone; after all, we often can get thousands of sites each with 4 character states. Even a good morphological data set is likely to have no more than 100 characters, each with a few states. Because many morphological traits represent continuous measurements (e.g., body size, claw length), it is not clear how or whether to convert them to discontinuous characters suitable for phylogenetic analysis. If an inspection of trees from both data types yields a conflict, a choice of the "better analysis" will likely be biased in favor of the molecular data set. It is not always clear that molecular data are inherently superior. Rapid evolution at sites with only 4 character states makes for multiple hits (i.e., homoplasy) and difficulties in alignment of sequences, especially when additions and deletions occur.
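The multiple-hits problem can be made concrete with a small calculation. The sketch below assumes the Jukes-Cantor substitution model, which is not named in the text and is used here only as the simplest standard model with 4 equally likely states. Under that assumption, the expected proportion of sites that differ between two sequences levels off near 0.75 no matter how many substitutions have actually occurred, so later changes overwrite earlier ones and leave no visible trace.

```python
import math


def observed_difference(substitutions_per_site):
    """Expected proportion of differing sites between two sequences.

    Assumes the Jukes-Cantor model (4 states, equal rates), an
    illustrative assumption rather than anything stated in the text.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * substitutions_per_site / 3.0))


# The observed difference saturates as the true number of changes grows.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"{d:4.1f} substitutions/site -> "
          f"{observed_difference(d):.3f} observed difference")
```

With few states per site, the gap between actual and observed change widens quickly, which is one reason large numbers of sites do not automatically mean large amounts of reliable signal.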

An alternative approach to choosing the "best data set" is to extract a tree from the combination of two or more different data sets. Minimally, this process might allow us to focus on the incongruities between data sets, which would lead to questions about the reliability of given characters (Bremer 1996). For example, a positively goofy cladogram that does not square with all sources of evidence and common sense might be reckoned to derive from a poor molecular alignment. This sort of reasoning, of course, can apply only when we have reasonable expectations of the cladistic relationship in the first place.

There are three basic approaches to extracting an answer from different data sets:

1. Calculate trees from the data sets separately. Then take a qualitative look to see what differences appear, or feel more confident if the two trees are congruent.

2. Calculate trees from the data sets separately, then calculate a consensus tree, which is a tree that contains the minimal set of monophyletic groups that can be supported by both trees.

3. Combine the data at the outset, creating a total evidence data set, and calculate a tree.
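The third approach amounts to joining the character matrices taxon by taxon before any tree is calculated. A minimal sketch follows, assuming each matrix is stored as a dictionary from taxon name to a string of character states; the function name `combine_matrices`, the `?` missing-data symbol, and the toy data are all hypothetical and invented for illustration.

```python
def combine_matrices(*matrices, missing="?"):
    """Concatenate character matrices (dict: taxon -> string of states).

    Taxa absent from one matrix are padded with the missing-data symbol
    so that every row in the combined matrix has the same length.
    """
    taxa = sorted(set().union(*(m.keys() for m in matrices)))
    combined = {taxon: "" for taxon in taxa}
    for matrix in matrices:
        # All rows within one matrix must have the same number of characters.
        lengths = {len(row) for row in matrix.values()}
        if len(lengths) != 1:
            raise ValueError("rows in a matrix must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            combined[taxon] += matrix.get(taxon, missing * width)
    return combined


# Hypothetical toy data, invented for illustration only.
morphology = {"A": "0011", "B": "0111", "C": "1100"}
molecules = {"A": "ACGTAC", "B": "ACGTAA", "D": "TCGTAA"}

total_evidence = combine_matrices(morphology, molecules)
for taxon, row in total_evidence.items():
    print(taxon, row)
```

The combined matrix would then be handed to whatever tree-building method is in use; the sketch covers only the data-assembly step, not the tree search.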

Analyzing two data sets separately (Figure 2.9) yields one tree per data set, and a consensus tree might be the minimal representation of the evolutionary relationships supported by the evidence. A consensus tree extracts the parts of the two trees that are in agreement, even where the trees otherwise contradict each other (Adams 1972; Swofford 1991). Wherever the source trees conflict, the consensus degrades bifurcating nodes to multifurcations, or stars. This is not an improvement so much as an admission of uncertainty. Cases have been found (Figure 2.9), moreover, in which the total data set produces more resolution and a tree that is clearly more informative and correct than the consensus of two trees representing different data types (Barrett et al. 1991; Eernisse and Kluge 1993). Whether this can be generalized to larger numbers of informative characters is unclear.
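The loss of resolution is easy to see in a small example. The sketch below computes a strict consensus, which keeps only the groups present in both trees; this is one of several consensus methods and is not the Adams method cited above. It assumes rooted trees written as nested tuples of taxon names, and the two trees are hypothetical.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a rooted tree.

    The tree is a nested tuple whose leaves are taxon-name strings.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)  # every internal node defines one clade
        return members

    leaves(tree)
    return found


def strict_consensus_clades(tree_a, tree_b):
    """Clades supported by both trees (strict consensus)."""
    return clades(tree_a) & clades(tree_b)


# Hypothetical trees that disagree only on relationships among A, B, and C.
tree_morph = ((("A", "B"), "C"), ("D", "E"))
tree_mol = ((("A", "C"), "B"), ("D", "E"))

shared = strict_consensus_clades(tree_morph, tree_mol)
for clade in sorted(shared, key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

Both trees agree on the groups A+B+C and D+E, but neither A+B nor A+C survives, so the consensus shows A, B, and C as an unresolved three-way split.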

It is as yet unclear whether considering total evidence and taking it as the "answer" is very much superior to comparison of individual data sets in order to search for incongruities. But intuitively, it makes sense that more information will be extracted from a single analysis of the total evidence as opposed to extracting consensus trees from multiple data sets. Construction of consensus trees tends to
