Murphy et al. (2001b) concatenated and expanded the data sets of Madsen et al. (2001) and Murphy et al. (2001a) to generate a data set that included 19 nuclear segments and three mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA) for 42 placental taxa and two marsupial outgroups. Some taxa were chimeric, being composed of sequences from species belonging to the same well-supported (noncontroversial) monophyletic group (see Murphy et al., 2001b). After excluding regions that were judged alignment-ambiguous, the data set was 16,397 nucleotides in length. Of these, 14,750 nucleotides were from nuclear genes and 1,647 nucleotides were from mitochondrial genes.
Data were analyzed using likelihood-based methods, including Bayesian phylogenetic analyses (Huelsenbeck et al., 2001; Huelsenbeck and Ronquist, 2001; Larget and Simon, 1999; Mau et al., 1999). Likelihood methods are statistically consistent given a correct model of sequence evolution and have the potential to resolve complex phylogenetic problems (Whelan et al., 2001). In both maximum likelihood and Bayesian analyses, we used the general time reversible (GTR) model of sequence evolution with a gamma (Γ) distribution of rates and an allowance for a proportion of invariant sites (I), based on the results of Modeltest (Posada and Crandall, 1998). Additional details on model parameters are given in Murphy et al. (2001b). PAUP 4.0 (Swofford, 1998) was used to perform maximum likelihood (ML) analyses, including nonparametric bootstrapping. However, because of computational demands, it was necessary to employ phylogenetic constraints (see asterisks in Figure 1) and to limit searching to nearest neighbor interchanges in ML bootstrap analyses. Whereas ML analyses search for the tree(s) having the highest likelihood score, Bayesian methods sample trees according to their posterior probabilities (Huelsenbeck et al., 2001). An advantage of the Bayesian approach is that complex models of sequence evolution, including GTR + Γ + I, can be employed with large data sets and without the need for phylogenetic constraints.
Even though the Bayesian approach is feasible for large data sets, analytical calculation of Bayesian posterior probabilities requires summation over all topologies and integration over all possible combinations of branch length and substitution model parameter values (Huelsenbeck et al., 2001). These calculations become analytically intractable for even small phylogeny problems (Huelsenbeck et al., 2001), and posterior probabilities must be estimated using other methods. One method is the Markov chain Monte Carlo (MCMC) approach with Metropolis-Hastings sampling (Huelsenbeck et al., 2001; Huelsenbeck and Ronquist, 2001). New states for the Markov chain are proposed using a stochastic mechanism, the acceptance probability for each new state is calculated, and the new state is accepted if the acceptance probability is higher than a uniform random variable between 0 and 1. Using this approach, a large set of trees can be evaluated from the universe of potential phylogenetic trees (Huelsenbeck et al., 2001; Huelsenbeck and Ronquist, 2001). This provides a powerful alternative to searching for a single maximum likelihood tree and evaluating the reliability of this tree using the nonparametric bootstrap. We used MrBayes 2.01 (Huelsenbeck and Ronquist, 2001), which performs Bayesian analyses using Metropolis-coupled Markov chain Monte Carlo (MCMCMC) sampling, to approximate posterior probability distributions for the topology and parameters of the model of sequence evolution. MCMCMC runs n chains simultaneously and allows for state swaps between chains. Relative to approaches that employ a single Markov chain, MCMCMC is less susceptible to local entrapment and is more efficient at crossing deep valleys in a landscape of phylogenetic trees (Huelsenbeck and Ronquist, 2001).
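The acceptance rule and the chain-swapping step described above can be illustrated with a minimal sketch. It is not the MrBayes implementation: the state is a single real number rather than a tree with branch lengths and model parameters, the bimodal `log_target` is a toy stand-in for the posterior, and the function names, the proposal step size, and the heating scheme 1/(1 + heat × i) are all assumptions made for illustration.

```python
import math
import random


def log_target(x):
    """Toy unnormalized log-density with two modes (at -3 and +3).

    Hypothetical stand-in for a log posterior; real analyses score trees.
    """
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(log_ratio, rng):
    """Accept if the acceptance probability min(1, ratio) exceeds a Uniform(0, 1) draw."""
    if log_ratio >= 0.0:
        return True
    return rng.random() < math.exp(log_ratio)


def mc3(n_generations, n_chains=4, heat=0.2, step=1.0, seed=1):
    """Metropolis-coupled MCMC on the toy target; returns the cold-chain states."""
    rng = random.Random(seed)
    # Chain i targets the density raised to the power betas[i].
    # betas[0] == 1.0 is the cold chain; the others are heated (flattened).
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]
    states = [rng.uniform(-5.0, 5.0) for _ in range(n_chains)]
    cold_samples = []
    for _ in range(n_generations):
        # Within-chain update using a symmetric (Gaussian) proposal.
        for i in range(n_chains):
            proposal = states[i] + rng.gauss(0.0, step)
            log_ratio = betas[i] * (log_target(proposal) - log_target(states[i]))
            if accept(log_ratio, rng):
                states[i] = proposal
        # Propose a state swap between two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_ratio = (betas[i] - betas[j]) * (
            log_target(states[j]) - log_target(states[i])
        )
        if accept(log_ratio, rng):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples
```

Only the cold chain is recorded. The heated chains explore a flattened version of the landscape, and swaps let the cold chain inherit states from the far side of a valley that it would rarely cross on its own.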
Bayesian analyses employed four independent chains (three heated, one cold; see Huelsenbeck and Ronquist, 2001), all starting from random trees, and were run for 300,000 or 600,000 generations. Chains were sampled every 20 generations and burnin values were set at 75,000 generations based on empirical evaluation. Additional details are given in Murphy et al. (2001b). Bayesian analyses were also performed with single-taxon outgroup jackknifing and with subsets of nucleotide sequences. The latter included nuclear genes only and mt rRNA genes only. We also partitioned the nuclear data set in two different ways: first, into protein-coding genes (12,988 bp) versus untranslated regions (UTRs; 1,762 bp); second, into 1st + 2nd codon positions (8,658 bp) versus 3rd codon positions + UTRs (6,092 bp). All supplementary analyses were run for 300,000 generations with burnin set at 60,000 or 75,000 generations based on empirical evaluation.
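The run lengths, sampling interval, and burnin values above determine how many sampled trees contribute to each posterior estimate. The following sketch works through that arithmetic. The function name is hypothetical, and the count assumes one sample per 20 generations, ignoring whether the starting state at generation 0 is also recorded.

```python
def retained_samples(n_generations, sample_every, burnin_generations):
    """Number of sampled trees kept after discarding the burnin.

    Assumes exactly one sample every `sample_every` generations and
    does not count a sample at generation 0.
    """
    total = n_generations // sample_every
    discarded = burnin_generations // sample_every
    return total - discarded


# Settings reported in the text: (generations, burnin generations).
for generations, burnin in [(300_000, 75_000), (600_000, 75_000), (300_000, 60_000)]:
    kept = retained_samples(generations, 20, burnin)
    print(f"{generations} generations, burnin {burnin}: {kept} trees retained")
```

Under these assumptions, a 300,000-generation run with a burnin of 75,000 generations retains 11,250 of 15,000 sampled trees, a 600,000-generation run retains 26,250 of 30,000, and a 300,000-generation run with a burnin of 60,000 generations retains 12,000.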