Note

Following includes codes used to run bayou to find shifts in tarsus length variation in birds. It uses bash and R codes interchangeably along with chunks that should be submitted as jobs using a resource manager in a supercomputing cluster (here we use slurm and snakemake). Please make sure you are running the codes in the correct interface.

1 Getting raw data files

Bayou is used to fit Bayesian reversible-jump multi-optima OU models to phylogenetic comparative data (Uyeda and Harmon 2014). See https://github.com/uyedaj/bayou for more information about the bayou code including tutorials. We wanted to get regions of the avian phylognetic tree where there were shifts in tarsus lenght. For this we followed the framework presented in Rombaut et al. (2022). For the phylogenetic tree and tarsus length data, we downloaded the data from Tobias et al. (2022) at this link. Tarsus length measurements are in the file AVONET1_BirdLife.csv in the folder ELEData/TraitData/. The phylogenetic tree generated from Jetz et al. (2012) can be found in the folder ELEData/PhylogeneticData/. The filename is HackettStage1_0001_1000_MCCTreeTargetHeights.nex.

2 Cleaning up the raw data

First we need to get the tarsus length measurements formatted along with the body weight for each species to correct tarsus length for body mass. Similarly, we will need to prune the tree and data to include the same taxa.

The column Group is not present in the original Avonet table. Since running bayou on the entire tree is impractical computationally, we adopted the approach presented in Rombaut et al. (2022) by splitting the dataset into 18 groups. The groups are as follows:

Groups Number_of_taxa
1_Palaeognathae 26
2_Galloanserae 287
3_Mirandornithes 19
4_Columbimorphae 274
5_Strisores 301
6_Gruiformes 45
7_Aequornithes 447
8_Accipitriformes 256
9_Coraciiformes 335
10_Falconiformes 45
11_Psittaciformes 222
12_Suboscines 781
13_Meliphagoidea 171
14_Corvoidea 411
15_Sylvoidea 532
16_Muscicapoidea 451
17_Nectaroidea 54
18_Passeroidea 748
library(ggplot2)
library(ape)
library(bayou)

all_data <- read.csv("AVONET1_BirdLife.csv", header = T)
all_tree <- read.nexus("HackettStage1_0001_1000_MCCTreeTargetHeights.nex")

#change space between species names to underscores
all_data$species <- gsub(" ", "_", all_data$Species1)
#remove species which are not present in the tree
all_data_in_tree <- subset(all_data, species %in% all_tree$tip.label) 
#keep only required columns
all_data_in_tree <- all_data_in_tree[,c("Group", "Family1", "species", "Mass", "Tarsus.Length"),]
#log transform the mass and tarsus lengths
all_data_in_tree$ln_tarsus <- log(all_data_in_tree$Tarsus.Length)
all_data_in_tree$ln_mass <- log(all_data_in_tree$Mass)

#get groups
all_groups <- unique(all_data$Group)

The R script accepts the values 1 through 18 for the respective group and can therefore subset the data for that group and format it for bayou. This allows us to run the program in parallel. We also ran two instance of each run. The number of generations depended on whether or not the chains converged.

use_group <- 7 #this would run for group number 7, 7_Aequornithes

#gather group subset
group <- grep(paste("^",as.numeric(use_group),"_",sep=""), all_groups, value = T)
group_init <- substr(group, 1, 8)

#iteration number and generations
iter <- 1
runs <- 20000000 #20 million
output_folder <- "./"

#prepare data for bayou input
all_data.1 <- subset(all_data_in_tree, Group %in% group)
all_tree.1 <- keep.tip(all_tree, all_data.1$species)
all_data.1$species <- factor(all_data.1$species, levels = all_tree.1$tip.label)
all_data.1 <- all_data.1[order(all_data.1$species),]
#note: order of tips and data should be same

dat <- all_data.1$ln_tarsus
names(dat) <- all_data.1$species

pred <- data.frame(lnMass = all_data.1$ln_mass)
rownames(pred) <- all_data.1$species

3 Setting up bayou priors

You need to set up priors to run bayou. We used similar parameters as mentioned in Rombaut et al. (2022). For tuning parameters we ran short runs (50,000 generations) of different values of tuning parameters until tuning acceptance ratio (which you will see in the output to screen as .alpha for alpha and so on). The bayou manual says good acceptance is 0.2-0.4 so if acceptance ratio is higher than 0.4 increase value of parameter and if acceptance ratio is lower than 0.2 decrease value of parameter.

#priors for bayou
priors <- make.prior(all_tree.1, plot.prior = T,
                     dists=list(dalpha="dhalfcauchy", dsig2="dhalfcauchy", dbeta_lnMass="dnorm",
                                dsb="dsb", dk="cdpois", dtheta="dnorm", dloc="dunif"),
                     param=list(dalpha=list(scale=0.1), dsig2=list(scale=0.1),
                                dbeta_lnMass=list(mean=2, sd=0.5),
                                dk=list(lambda=if(nrow(all_data.1)*0.05 < 1){1}
                                        else{floor(nrow(all_data.1)*0.05)}, 
                                        kmax=floor(nrow(all_data.1)*0.5)),
                                dsb=list(bmax=1,prob=1), dtheta=list(mean=0.3, sd=0.5)))

#list of tuning parameters for the 18 runs
DN1 <- list(list(alpha=4, sig2=3.6, beta_lnMass=0.1, k=c(1,1), theta=1, slide=1),
            list(alpha=7, sig2=1.3, beta_lnMass=0.2, k=c(1,1), theta=2.5, slide=1),
            list(alpha=6, sig2=2, beta_lnMass=0.2, k=c(1,1), theta=1, slide=1),
            list(alpha=9, sig2=1.2, beta_lnMass=0.2, k=c(1,1), theta=2, slide=1),
            list(alpha=7, sig2=1.3, beta_lnMass=0.4, k=c(1,1), theta=2.5, slide=1),
            list(alpha=6, sig2=2.3, beta_lnMass=0.2, k=c(1,1), theta=1, slide=1),
            list(alpha=8, sig2=1, beta_lnMass=0.3, k=c(1,1), theta=3, slide=1),
            list(alpha=8, sig2=1.2, beta_lnMass=0.2, k=c(1,1), theta=2, slide=1),
            list(alpha=9, sig2=1.2, beta_lnMass=0.2, k=c(1,1), theta=2.5, slide=1),
            list(alpha=6, sig2=2.6, beta_lnMass=0.3, k=c(1,1), theta=2, slide=1),
            list(alpha=7, sig2=1.8, beta_lnMass=0.2, k=c(1,1), theta=1.5, slide=1),
            list(alpha=5, sig2=1, beta_lnMass=0.5, k=c(1,1), theta=3.5, slide=1),
            list(alpha=7, sig2=1.8, beta_lnMass=0.1, k=c(1,1), theta=0.5, slide=1),
            list(alpha=7, sig2=1.2, beta_lnMass=0.3, k=c(1,1), theta=2.5, slide=1),
            list(alpha=7, sig2=1, beta_lnMass=0.3, k=c(1,1), theta=3, slide=1),
            list(alpha=6, sig2=1, beta_lnMass=0.3, k=c(1,1), theta=3, slide=1),
            list(alpha=7, sig2=3, beta_lnMass=0.2, k=c(1,1), theta=0.5, slide=1),
            list(alpha=6, sig2=1, beta_lnMass=0.3, k=c(1,1), theta=3, slide=1))

model.N1 <- makeBayouModel(dat ~ lnMass, rjpars = c("theta", "lnMass"),  
                           tree=all_tree.1, dat=dat, pred=pred, prior=priors, D=DN1[[use_group]])
mcmc.N1 <- bayou.makeMCMC(all_tree.1, dat, pred=pred, model=model.N1$model, prior=priors, startpar=model.N1$startpar, 
                          new.dir=output_folder, outname=paste(group_init, iter, sep="_"), plot.freq=NULL, SE = 0.1)

#This will run 20 million generations for group 7 iteration 1
mcmc.N1$run(runs)

It is also cogent to run a few plots and graphs at this stage as well as saving an image of the data in case you need to restart run.

chain.N1 <- set.burnin(mcmc.N1$load(), 0.5)
save.image(paste(output_folder, group_init, ".RData", sep=""))
#plot(chain.N1, auto.layout=FALSE)
par(mfrow=c(1,1))
plotSimmap.mcmc(chain.N1, burnin = 0.3, pp.cutoff = 0.3)
shiftsumsN1 <- shiftSummaries(chain.N1, mcmc.N1, pp.cutoff=0.3)
save.image(paste(output_folder, group_init, ".RData", sep=""))
plotShiftSummaries(shiftsumsN1, lwd=2, single.plot=TRUE, label.pts=FALSE)
dev.off()

4 Analyzing convergence

Once you have two runs for each group, you can check for convergence using the Gelman R statistic for convergence. We used a value less than 1.1 for considering the parameters of two runs converged.

all_data <- read.csv("AVONET1_BirdLife.csv", header = T)
output_folder <- "./"

pdf(paste(output_folder, "convergence", ".pdf", sep=""))
for (x in seq(1,2)){
  output_folder <- "./"
  all_groups <- unique(all_data$Group)
  group <- grep(paste("^",x,"_",sep=""), all_groups, value = T)
  group_init <- substr(group, 1, 8)
  load(paste(output_folder, group_init,"_2.RData", sep=""))
  output_folder <- "./"
  chain2 <- chain.N1
  load(paste(output_folder, group_init,"_1.RData", sep=""))
  print(group)
  summary(chain.N1)
  summary(chain2)
  start_gap <- length(chain2$gen)/2
  post_gap <- length(chain2$gen)*3/4*10
  gelman <- gelman.R("lnL", chain2, chain.N1, freq=10000, start = start_gap)
  abline(h = mean(gelman$R), col = "red", lwd = 4, lty = 4)
  text(x=post_gap, y=mean(gelman$R), labels = round(mean(gelman$R),5), pos=3, col="red")
  gelman <- gelman.R("prior", chain2, chain.N1, freq=10000, start = start_gap)
  abline(h = mean(gelman$R), col = "red", lwd = 4, lty = 4)
  text(x=post_gap, y=mean(gelman$R), labels = round(mean(gelman$R),5), pos=3, col="red")
  gelman <- gelman.R("alpha", chain2, chain.N1, freq=10000, start = start_gap)
  abline(h = mean(gelman$R), col = "red", lwd = 4, lty = 4)
  text(x=post_gap, y=mean(gelman$R), labels = round(mean(gelman$R),5), pos=3, col="red")
  gelman <- gelman.R("sig2", chain2, chain.N1, freq=10000, start = start_gap)
  abline(h = mean(gelman$R), col = "red", lwd = 4, lty = 4)
  text(x=post_gap, y=mean(gelman$R), labels = round(mean(gelman$R),5), pos=3, col="red")
  gelman <- gelman.R("k", chain2, chain.N1, freq=10000, start = start_gap)
  abline(h = mean(gelman$R), col = "red", lwd = 4, lty = 4)
  text(x=post_gap, y=mean(gelman$R), labels = round(mean(gelman$R),5), pos=3, col="red")
  gelman <- gelman.R("ntheta", chain2, chain.N1, freq=10000, start = start_gap)
  abline(h = mean(gelman$R), col = "red", lwd = 4, lty = 4)
  text(x=post_gap, y=mean(gelman$R), labels = round(mean(gelman$R),5), pos=3, col="red")
  gc()
}
dev.off()

If runs have not converged sufficiently, you can run the particular run for longer by rerunning bayou., bayou will pick up where it stopped. We ran to a maximum of 50 million generations for some groups.

5 Visualizing some outputs

We recovered shifts in 10 families of birds. The groups are distributed as follows.

References

Jetz, Walter, Gavin H Thomas, Jeffery B Joy, Klaas Hartmann, and Arne O Mooers. 2012. “The Global Diversity of Birds in Space and Time.” Nature 491 (7424): 444–48.
Rombaut, Louie MK, Elliot JR Capp, Christopher R Cooney, Emma C Hughes, Zoë K Varley, and Gavin H Thomas. 2022. “Allometric Conservatism in the Evolution of Bird Beaks.” Evolution Letters 6 (1): 83–91.
Tobias, Joseph A, Catherine Sheard, Alex L Pigot, Adam JM Devenish, Jingyi Yang, Ferran Sayol, Montague HC Neate-Clegg, et al. 2022. “AVONET: Morphological, Ecological and Geographical Data for All Birds.” Ecology Letters 25 (3): 581–97.
Uyeda, Josef C, and Luke J Harmon. 2014. “A Novel Bayesian Method for Inferring and Interpreting the Dynamics of Adaptive Landscapes from Phylogenetic Comparative Data.” Systematic Biology 63 (6): 902–18.