## Posterior probability estimation using Markov chain Monte Carlo

Calculation of the posterior probability of a tree is computationally expensive because it involves summation over all possible trees, and for each tree requires integration over all possible permutations of branch lengths and substitutionmodel parameters (Huelsenbeck et al. 2001, 2002). This is not possible in most practical applications due to computational and time constraints, and requires that posterior probabilities be approximated (Huelsenbeck et al. 2002). Markov chain Monte Carlo (MCMC) methods are used to approximate the posterior probabilities of trees, and allow contemporary Bayesian methods to be computationally feasible (Tierney 1994; Huelsenbeck et al. 2001, 2002). The MCMC method is used to sample the posterior probability distribution of trees. The application of the MCMC to phylogeny inference is discussed in detail by Mau and Newton (1997), Yang and Rannala (1997), Mau et al. (1999), Larget and Simon (1999), Newton et al. (1999) and summarized in Huelsenbeck et al. (2001, 2002) and Pagel et al. (2004). The general process MCMC is used to approximate the posterior probability density is as follows. First, a random tree is selected. Second, another tree is proposed by changing one variable of the original tree (e.g., topology, branch length, model parameters, etc.), and the two trees are compared using the Metropolis-Hastings algorithm (Metropolis et al. 1953; Hastings 1970; Green 1995; Huelsenbeck et al. 2002). If the tree represents an improvement, it is accepted, or sampled. If not, the tree is either accepted or rejected proportional to the likelihood ratio between it and the previous tree (Pagel et al. 2004). The chain stabilizes after a sufficient period of run time (called the "burn-in"). Once stable the chain randomly walks through the universe of trees, sampling each tree in proportion to its frequency in the actual posterior density (Pagel et al. 2004). The longer the chain is run, the greater precision with which the actual posterior distribution of trees is approximated (Pagel et al. 2004). Metropolis-coupled MCMC uses multiple, simultaneous Markov chains, improves mixing and convergence, and allows exceedingly large data sets that are beyond the scope of conventional single-chain MCMC Bayesian methods to be analyzed (Geyer 1991; Huelsenbeck et al. 2001, 2002).