A parsimonious scenario of gene gain and loss in eukaryotic evolution

As discussed in the previous section, the Dollo parsimony tree based on gene presence/absence shows some conflicts with the accepted phylogenetic tree of the eukaryotic crown group, the principal clades of which have been established with considerable confidence. In particular, some conflicting observations notwithstanding, the consensus of many phylogenetic analyses points to an animal-fungus clade, a grouping of microsporidia with the fungi, and a coelomate (chordate-arthropod) clade among...

Links between MP and ML

Given a sequence $C = (w_1, \ldots, w_k)$ of characters on $X$, we put $\Pr_{mp}(C \mid T, p) = \prod_{i=1}^{k} \max\{\Pr(\bar{w}_i \mid T, p) : \bar{w}_i \in c(i)\}$, where the supremum is taken over all admissible choices of $p$ and $c(i) = c(w_i)$ is the set of extensions of $w_i$ to $V$. Note that $\Pr(C \mid T, p)$ is the probability of generating the $k$ characters by independent and identical evolution under a Poisson model with parameters $(T, p)$. Similarly one has analogous definitions for the 'no common mechanism' Poisson model, in which each character evolves independently...

How phylogenetically informative is a single r-state character?

In this section we consider the question of how to quantify the phylogenetic information a single r-state character carries (a priori, without regard to other characters, or to the character's fit on an existing tree). Let $w : X \to R$ be a character. One measure of the phylogenetic information content of $w$, based on compatibility, is the following, where $p(w)$ is the proportion of fully resolved phylogenetic $X$-trees for which $w$ is homoplasy-free. For example, if $w$ assigns the same state to all...
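For the special case of a two-state character, the proportion p(w) can be evaluated in closed form. The sketch below is an illustration under stated assumptions, not the chapter's own procedure: it assumes that trees are unrooted and fully resolved, that the character splits the n leaves into blocks of sizes a and b, and it uses the standard counts (2n-5)!! for all such trees and (2a-3)!!(2b-3)!! for those that display the split. The function names are hypothetical.

```python
from fractions import Fraction
from math import prod


def odd_double_factorial(m: int) -> int:
    """Return m!! for odd m, using the convention (-1)!! = 1!! = 1."""
    return prod(range(m, 1, -2)) if m > 1 else 1


def homoplasy_free_proportion(a: int, b: int) -> Fraction:
    """Proportion of fully resolved unrooted trees on n = a + b leaves
    (n >= 3) on which a two-state character with block sizes a and b
    is homoplasy-free, i.e. the tree displays the split.
    """
    n = a + b
    if n < 3:
        raise ValueError("need at least three leaves")
    if min(a, b) == 0:
        # Constant character: homoplasy-free on every tree.
        return Fraction(1)
    displaying = odd_double_factorial(2 * a - 3) * odd_double_factorial(2 * b - 3)
    total = odd_double_factorial(2 * n - 5)
    return Fraction(displaying, total)


if __name__ == "__main__":
    for a, b in [(0, 5), (1, 4), (2, 3), (3, 3)]:
        print(a, b, homoplasy_free_proportion(a, b))
```

A character that is constant, or that singles out one leaf, fits every tree and so gives a proportion of 1; the more evenly the leaves are split, the smaller the proportion becomes.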

Conclusions and recommendations

The present analysis of the 500-terminal rbcL matrix, conducted with a variety of software settings, demonstrates (unsurprisingly) that the amount of branch swapping that is required to complete a one-stage analysis increases with the number of trees held in memory. However, the rate of increase in the required amount of branch swapping is uneven, and we have demonstrated that the intervals in which the branch swapping requirements ascend most steeply are those in which the average tree length...

Alignment and optimization

Two approaches have been developed to deal with the absence of preordained homologies and analyze sequence data. On one hand, methods have been devised to create the missing primary homology statements that are then analyzed by standard techniques broadly referred to as multiple alignment. Traditionally, sequence data have undergone this pre-phylogenetic analysis step to permit familiar procedures akin to those used with anatomical characters. A second approach is to directly optimize sequence...

What exactly is a terminal branch on a tree, that is, a row in the data matrix?

People who publish phylogenetic analyses are usually cavalier about what their terminal branches represent. One often sees species or other taxon names, or even geographic designations of populations, attached to terminal branches of published trees without explanation. Larger-scale units might indeed be well-justified TUs, but they need to be justified, not assumed a priori. Taxa or populations are never the fundamental things from which phylogenies are actually built. Not even individuals...

The Poisson model

In this section and the next we consider the simplest tree-based model for the evolution of characters with state space $R$, which we will refer to here simply as the Poisson model on $R$ (with parameters $(T, p)$). In this model, we have a tree $T$ on $X$, select any element $x_0 \in X$ as a reference vertex, and direct all edges of $T$ away from $x_0$. We will regard the value from $R$ assigned to vertex $x_0$ as being given (it would make little difference to the arguments below if we allowed the state at $x_0$ to be...
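The set-up described here (a tree with a reference vertex, edges directed away from it, and a given state at that vertex) can be illustrated with a small simulation. The sketch below rests on assumptions that are not spelled out in the text above: each edge e carries a substitution probability p(e), a substitution replaces the current state by one drawn uniformly from the remaining states, and edges evolve independently. The function name, the data layout, and the four-leaf example tree are hypothetical.

```python
import random


def simulate_character(adjacency, edge_prob, root, root_state, states, rng):
    """Evolve one character over a tree, starting from the reference vertex.

    adjacency : dict mapping each vertex to a list of its neighbours
    edge_prob : dict mapping frozenset({u, v}) to the assumed substitution
                probability p(e) on that edge
    Returns a dict assigning a state to every vertex of the tree.
    """
    assignment = {root: root_state}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in assignment:
                continue  # already visited, so this edge points towards the root
            if rng.random() < edge_prob[frozenset((u, v))]:
                # Assumed rule: jump to one of the other states, uniformly.
                others = [s for s in states if s != assignment[u]]
                assignment[v] = rng.choice(others)
            else:
                assignment[v] = assignment[u]
            stack.append(v)
    return assignment


# Hypothetical four-leaf tree: leaves a, b, c, d and internal vertices u, v.
ADJ = {
    "a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
    "u": ["a", "b", "v"], "v": ["u", "c", "d"],
}
EDGE_PROB = {frozenset((x, y)): 0.1 for x in ADJ for y in ADJ[x]}

if __name__ == "__main__":
    result = simulate_character(ADJ, EDGE_PROB, "a", 0, [0, 1, 2, 3], random.Random(0))
    print({leaf: result[leaf] for leaf in "abcd"})
```

Restricting the returned assignment to the leaves gives a character on X; the full assignment corresponds to one extension of that character to all vertices.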

A brief history of parsimony methods for phylogenetic analysis

Willi Hennig, the German dipterist, is widely considered to be the father of modern phylogenetics, and his book Phylogenetic Systematics (Hennig 1966) had a broad-reaching influence in the early development of the field. Hennig's greatest contributions are observed in his clear definitions of monophyly, in his discussion of the evidence used to determine monophyly (i.e. synapomorphy), and in his strict adherence to phylogenetic classifications. However, Hennig's explication of the methods by...

Homology, the Hennig-Farris auxiliary principle, and parsimony analysis

A crucial assumption in the above interpretation of a single character is Hennig's auxiliary principle, stating 'that the presence of apomorphous characters in different species is always reason for suspecting kinship [i.e. that the species belong to a monophyletic group], and that their origin by convergence should not be assumed a priori' (Hennig 1966, p. 121; square brackets present in original). In this quote, the term 'character' refers to a 'special character' (Hennig 1966, p. 89), which is...

Matrices of character presence/absence and Dollo parsimony

A simple but critically important concept that was introduced in the context of the COG analysis is a phyletic (phylogenetic) pattern, which is the pattern of representation (presence/absence) of the analyzed species in each COG (Tatusov et al. 1997; Koonin and Galperin 2002). Similar notions have been independently developed and applied by others (Gaasterland and Ragan 1998; Pellegrini et al. 1999). The COGs show a wide scatter of phyletic patterns, with only a small minority (approximately 1%)...
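A phyletic pattern can be represented as one row of a presence/absence matrix. The following minimal sketch encodes each cluster as a 0/1 string over a fixed species order and tallies how often each pattern occurs. The cluster identifiers, species names, and memberships are made-up toy values chosen only for illustration, and the function names are hypothetical.

```python
from collections import Counter


def phyletic_pattern(members, species_order):
    """Encode presence (1) or absence (0) of each species as a string."""
    return "".join("1" if sp in members else "0" for sp in species_order)


def pattern_counts(cluster_members, species_order):
    """Count how many clusters share each phyletic pattern."""
    return Counter(
        phyletic_pattern(members, species_order)
        for members in cluster_members.values()
    )


if __name__ == "__main__":
    # Toy data, not taken from any real database.
    species = ["sp1", "sp2", "sp3", "sp4"]
    clusters = {
        "clusterA": {"sp1", "sp2", "sp3", "sp4"},  # present in every species
        "clusterB": {"sp1", "sp3"},
        "clusterC": {"sp1", "sp3"},
        "clusterD": {"sp4"},
    }
    for pattern, count in pattern_counts(clusters, species).most_common():
        print(pattern, count)
```

Stacking the pattern strings row by row yields the presence/absence matrix that a Dollo parsimony analysis would take as input.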

Problems with estimations of monophyly by MCMC

In this section, the discussion will be within the realm of the rules and goals postulated by defenders of model-based methods. We also have other general concerns about model-based methods; these reflect a viewpoint not shared by Bayesians, and are therefore discussed in the following section. While the MCMC can be used to estimate any parameter of the evolutionary process, we are concerned here with the estimates that are relevant for phylogenetic studies: estimations of monophyly of groups....

Newer methods for parsimony analysis

The first breakthrough in analyzing what might now be considered large (>150 terminal) data sets came with the introduction of the parsimony jackknife by Farris et al. (1996), which remains the fastest method by which to undertake a parsimony analysis. Using the parsimony jackknife, Källersjö et al. (1998) analyzed a data set of 2538 terminals, and their results include, as far as we are aware, the largest cladogram ever published. Rice et al. (1997, p. 559), referring to the parsimony jackknife...

Genomic characters

This is the era of whole-genome sequencing: molecular data are becoming available at a rate unanticipated even a few years ago. Sequencing projects in a number of countries have produced a growing number of fully sequenced genomes, providing computational biologists with tremendous opportunities. However, comparative genomics has so far largely been restricted to pairwise comparisons of genomes, for instance, to identify syntenic regions, orthologous genes, and common regulatory elements between...

The ontological status of phylogeny: what is ideographic science is not nomothetic science

The historical science of phylogenetic inference is ideographic (Grant 2002). The word ideographic, in this context, springs from the idea that relative recency of common ancestry can be represented directly as a concrete, spatio-temporally restricted, explainable thing, the phylogenetic hypothesis, cladogram, or tree, as can the accompanying transformation of an inherited trait or homologue. For all such things there is orderliness to their unfolding, a transformation series, and for more...

Ideographic theory unification

While FP may be both necessary and sufficient in the inference of phylogeny (see previous two sections), the question remains whether QPS addresses more than the empirical in the evaluation of scientific hypotheses. Can that ideographic theory make significant contributions to the philosophical, to metaphysical system building? In addressing this question from the point of view of theory unification (Friedman 1983; McAllister 2000), I briefly survey a small sample of relevant areas of comparative...

Genomics and Dollo parsimony: validity of the Dollo principle for different types of genomic data

Dollo parsimony assumes that each derived character state originates only once, and homoplasies exist only in the form of reversals to the primitive condition. Obviously, this is not an absolute but a probabilistic notion. It is not physically impossible for dolphins to re-evolve feet or for yeast to re-evolve the lost system for post-transcriptional gene silencing (Aravind et al. 2000), but it appears exceedingly unlikely that these features could reappear in the same form as the lost ones, at...
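The single-origin assumption can be made concrete for one presence/absence character on a fixed rooted tree. In the sketch below, which is an illustration rather than any published implementation, the one gain is placed at the last common ancestor of all leaves that carry the derived state, and one loss is counted for every maximal subtree below that point in which the state is absent from all leaves. The tree layout, node names, and function name are hypothetical.

```python
def dollo_events(children, root, present):
    """Place one gain and the implied losses for a binary character.

    children : dict mapping each internal node to a list of its children
               (leaves do not appear as keys)
    present  : set of leaf names that carry the derived state
    Returns (gain_node, loss_nodes); gain_node is None if no leaf has the state.
    """
    counts = {}

    def fill(node):
        # Number of leaves with the derived state in the subtree at node.
        kids = children.get(node, [])
        if kids:
            counts[node] = sum(fill(kid) for kid in kids)
        else:
            counts[node] = 1 if node in present else 0
        return counts[node]

    total = fill(root)
    if total == 0:
        return None, []

    # Walk down while a single child still contains every derived-state leaf.
    gain = root
    while True:
        holders = [kid for kid in children.get(gain, []) if counts[kid] == total]
        if not holders:
            break
        gain = holders[0]

    # Below the gain, each maximal subtree without the state is one loss.
    losses = []
    stack = [gain]
    while stack:
        node = stack.pop()
        for kid in children.get(node, []):
            if counts[kid] == 0:
                losses.append(kid)
            else:
                stack.append(kid)
    return gain, losses


# Hypothetical rooted tree ((A,B),(C,(D,E))).
CHILDREN = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "n3"], "n3": ["D", "E"]}

if __name__ == "__main__":
    print(dollo_events(CHILDREN, "root", {"C", "E"}))
```

With the derived state in leaves C and E only, the gain falls on the ancestor of C, D, and E, and a single loss on the branch to D accounts for its absence there; no second origin is ever invoked.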

Dollo parsimony applied to evolution of eukaryotic gene structure

Most of the eukaryotic protein-coding genes contain multiple introns that are spliced out of the pre-mRNA by a distinct, large RNA-protein complex, the spliceosome, which is conserved in all eukaryotes (Dacks and Doolittle 2001). The positions of some spliceosomal introns are conserved in orthologous genes from plants and animals (Marchionni and Gilbert 1986; Logsdon et al. 1995; Boudet et al. 2001). A recent systematic analysis of pairwise alignments of homologous proteins from animals, fungi, and...

Dollo parsimony analysis of prokaryotic gene order

As discussed above, Dollo parsimony is hardly applicable to the analysis of evolution of prokaryotic gene repertoires because extensive HGT leads to gross violations of the irreversibility principle. However, it might be possible to come up with nearly irreversible, Dollo-compatible characters even in the case of prokaryotic genome evolution. Elements of gene order are, perhaps, the most obvious candidates for such characters. Genome colinearity is preserved only...