Note

Following includes codes used to generate an annotated list of conserved elements for birds. It uses bash and R codes interchangeably along with other instructions to run. Please make sure you are running the codes in the correct interface.

1 Introduction

This markdown file documents the generation of a dataset of ATACseq peaks for chicken, mapped to the bGalGal1.mat.broiler.GRCg7b (henceforth only referred to as galgal7b). You can get the genome here.

We originally started with 116 fasta files comprising 43 unique tissues and treatments at different stages of chick development. We mapped them to the galgal7b genome. We omitted 15 fasta files (see bottom of table) due to quality of reads and problems with alaingments leaving us with 101 fasta files of 36 tissue types.

Tissue_type Replicate SRA_accession Total_reads_mapped Data_source
cone cell e5 - 1 SRR11916725 400534 Lonfat et al. (2021)
cone cell e5 - 2 SRR11916726 586532 Lonfat et al. (2021)
retinal cell e5-6 1 SRR10533482 619494 Patoori et al. (2020)
retinal cell e5-6 2 SRR10533483 1155904 Patoori et al. (2020)
wing bud HH24 1 SRR13428042 392412 Jhanwar et al. (2021)
wing bud HH24 2 SRR13428041 501428 Jhanwar et al. (2021)
wing bud HH22 1 SRR13428040 323406 Jhanwar et al. (2021)
wing bud HH22 2 SRR13428039 422077 Jhanwar et al. (2021)
wing bud HH20 1 SRR13428038 195410 Jhanwar et al. (2021)
wing bud HH20 2 SRR13428037 197882 Jhanwar et al. (2021)
limb bud HH24 1 SRR11878467 392158 Malkmus et al. (2021)
limb bud HH24 2 SRR11878466 501842 Malkmus et al. (2021)
presomitic mesoderm HH14 1 SRR10917286 2429861 Mok et al. (2021)
presomitic mesoderm HH14 2 SRR10917285 2801157 Mok et al. (2021)
presomitic mesoderm HH14 3 SRR10917274 2288277 Mok et al. (2021)
epithelial somites HH14 1 SRR10917266 1992608 Mok et al. (2021)
epithelial somites HH14 2 SRR10917265 2105774 Mok et al. (2021)
epithelial somites HH14 3 SRR10917264 1836757 Mok et al. (2021)
hindlimb HH18 1 SRR10059742 1023402 Young et al. (2019)
hindlimb HH18 2 SRR10059741 795418 Young et al. (2019)
hindlimb HH18 3 SRR10059740 603441 Young et al. (2019)
forelimb HH18 1 SRR10059737 915961 Young et al. (2019)
forelimb HH18 2 SRR10059738 781829 Young et al. (2019)
forelimb HH18 3 SRR10059739 855197 Young et al. (2019)
flank HH18 1 SRR10059734 687477 Young et al. (2019)
flank HH18 2 SRR10059735 661340 Young et al. (2019)
flank HH18 3 SRR10059736 881883 Young et al. (2019)
neural crest HH9 1 SRR10279641 465478 Rothstein and Simoes-Costa (2020)
neural crest HH9 2 SRR10279640 1052795 Rothstein and Simoes-Costa (2020)
neural crest HH6 1 SRR10279639 1188400 Rothstein and Simoes-Costa (2020)
neural crest HH6 2 SRR10279638 1828905 Rothstein and Simoes-Costa (2020)
negative control NC2- E2- HH25 3 SRR8492067 1308863 Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH25 2 SRR8492066 1470813 Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH25 1 SRR8492065 984023 Ling and Sauka-Spengler (2019)
neural crest NC2- HH25 3 SRR8492064 739201 Ling and Sauka-Spengler (2019)
neural crest NC2- HH25 2 SRR8492063 840937 Ling and Sauka-Spengler (2019)
neural crest NC2- HH25 1 SRR8492062 699955 Ling and Sauka-Spengler (2019)
neural crest NC2+ HH25 3 SRR8492061 908522 Ling and Sauka-Spengler (2019)
neural crest NC2+ HH25 2 SRR8492060 991016 Ling and Sauka-Spengler (2019)
neural crest NC2+ HH25 1 SRR8492059 717557 Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH18 3 SRR8492058 943111 Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH18 2 SRR8492057 1419320 Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH18 1 SRR8492056 993025 Ling and Sauka-Spengler (2019)
neural crest NC2- E2+ HH18 3 SRR8492055 1080428 Ling and Sauka-Spengler (2019)
neural crest NC2- E2+ HH18 2 SRR8492054 1325955 Ling and Sauka-Spengler (2019)
neural crest NC2- E2+ HH18 1 SRR8492053 1506587 Ling and Sauka-Spengler (2019)
neural crest NC2+ E2+ HH18 3 SRR8492052 1657227 Ling and Sauka-Spengler (2019)
neural crest NC2+ E2+ HH18 2 SRR8492051 1863603 Ling and Sauka-Spengler (2019)
neural crest NC2+ E2+ HH18 1 SRR8492050 1656390 Ling and Sauka-Spengler (2019)
neural crest E2+ HH12 3 SRR8492049 591477 Ling and Sauka-Spengler (2019)
neural crest E2+ HH12 2 SRR8492048 746994 Ling and Sauka-Spengler (2019)
neural crest E2+ HH12 1 SRR8492047 685209 Ling and Sauka-Spengler (2019)
negative control NC2- HH12 3 SRR8492046 566067 Ling and Sauka-Spengler (2019)
negative control NC2- HH12 2 SRR8492045 640875 Ling and Sauka-Spengler (2019)
negative control NC2- HH12 1 SRR8492044 692517 Ling and Sauka-Spengler (2019)
neural crest NC2+ HH12 3 SRR8492043 1337110 Ling and Sauka-Spengler (2019)
neural crest NC2+ HH12 2 SRR8492042 654113 Ling and Sauka-Spengler (2019)
neural crest NC2+ HH12 1 SRR8492041 1474878 Ling and Sauka-Spengler (2019)
non-neural crest 8-10ss 1 SRR8062752 327185 Williams et al. (2019)
non-neural crest 8-10ss 2 SRR8062751 713343 Williams et al. (2019)
non-neural crest 8-10ss 3 SRR8062750 544369 Williams et al. (2019)
non-neural crest 8-10ss 4 SRR8062749 526987 Williams et al. (2019)
non-neural crest 5-6ss 1 SRR8062748 406114 Williams et al. (2019)
non-neural crest 5-6ss 2 SRR8062747 536168 Williams et al. (2019)
non-neural crest 5-6ss 3 SRR8062746 490289 Williams et al. (2019)
non-neural crest 5-6ss 4 SRR8062745 769000 Williams et al. (2019)
neural crest 8-10ss FOXD3 1 SRR8062744 643461 Williams et al. (2019)
neural crest 8-10ss FOXD3 2 SRR8062743 510546 Williams et al. (2019)
neural crest 8-10ss FOXD3 3 SRR8062742 505559 Williams et al. (2019)
epiblast HH4 1 SRR8062741 347257 Williams et al. (2019)
epiblast HH4 2 SRR8062740 355175 Williams et al. (2019)
metatarsal skin e12 retinoic acid 1 SRR6782021 889266 Lai et al. (2018)
metatarsal skin e12 retinoic acid 2 SRR6782020 911850 Lai et al. (2018)
metatarsal skin e12 RCAS-GFP 1 SRR6782019 1672701 Lai et al. (2018)
metatarsal skin e12 RCAS-GFP 2 SRR6782018 1622741 Lai et al. (2018)
metatarsal skin e12 RCAS-CTNNB1 1 SRR6782017 514530 Lai et al. (2018)
metatarsal skin e12 RCAS-CTNNB1 2 SRR6782016 332392 Lai et al. (2018)
keel e10 1 SRR6713569 799482 Sackton et al. (2019)
keel e10 2 SRR6713570 815108 Sackton et al. (2019)
keel e10 3 SRR6713592 705974 Sackton et al. (2019)
hindlimb e4.5 1 SRR6713571 1057708 Sackton et al. (2019)
hindlimb e4.5 2 SRR6713572 816446 Sackton et al. (2019)
hindlimb e4.5 3 SRR6713577 992959 Sackton et al. (2019)
full sternum e10 1 SRR6713573 1243533 Sackton et al. (2019)
full sternum e10 2 SRR6713574 847942 Sackton et al. (2019)
full sternum e10 3 SRR6713580 1091860 Sackton et al. (2019)
inferior sternum e9 1 SRR6713575 1128212 Sackton et al. (2019)
inferior sternum e9 2 SRR6713576 1171526 Sackton et al. (2019)
inferior sternum e9 3 SRR6713578 1127918 Sackton et al. (2019)
forelimb e4.5 1 SRR6713579 883046 Sackton et al. (2019)
forelimb e4.5 2 SRR6713585 351224 Sackton et al. (2019)
forelimb e4.5 3 SRR6713586 717558 Sackton et al. (2019)
flight muscle e10 1 SRR6713581 697283 Sackton et al. (2019)
flight muscle e10 2 SRR6713583 641133 Sackton et al. (2019)
flight muscle e10 3 SRR6713584 513558 Sackton et al. (2019)
flight muscle e9 1 SRR6713582 586575 Sackton et al. (2019)
flight muscle e9 2 SRR6713587 560014 Sackton et al. (2019)
flight muscle e9 3 SRR6713588 403399 Sackton et al. (2019)
superior sternum e9 1 SRR6713589 815243 Sackton et al. (2019)
superior sternum e9 2 SRR6713590 909182 Sackton et al. (2019)
superior sternum e9 3 SRR6713591 1292178 Sackton et al. (2019)
FNP cells HH24 1 SRR6418911 0 Laugsch et al. (2018)
cone cell e5+ 1 SRR11916722 0 Lonfat et al. (2021)
cone cell e5+ 2 SRR11916723 0 Lonfat et al. (2021)
cone cell e5+ 3 SRR11916724 0 Lonfat et al. (2021)
epiblast HH3 1 SRR12965854 118497 Crispatzu et al. (2021)
maturing somites HH14 1 SRR10917280 0 Mok et al. (2021)
maturing somites HH14 2 SRR10917281 0 Mok et al. (2021)
maturing somites HH14 3 SRR10917282 0 Mok et al. (2021)
differentiated somites HH14 1 SRR10917276 0 Mok et al. (2021)
differentiated somites HH14 2 SRR10917275 0 Mok et al. (2021)
differentiated somites HH14 3 SRR10917273 0 Mok et al. (2021)
somites HH10 1 SRR8062739 2405 Williams et al. (2019)
neural crest 5-6ss pax7 1 SRR8062737 23955 Williams et al. (2019)
neural crest 5-6ss FOXD3 1 SRR8062736 0 Williams et al. (2019)
neural crest 5-6ss FOXD3 2 SRR8062735 0 Williams et al. (2019)

2 Methods

2.1 Gather all files

We used sra toolkit to download the files and convert them to fastq files. You can download the toolkit from https://hpc.nih.gov/apps/sratoolkit.html.

#bash

#get SRA files
prefetch \
SRR11916725 SRR11916726 SRR8062740  SRR8062741  SRR10917264 SRR10917265 \
SRR10917266 SRR10059734 SRR10059735 SRR10059736 SRR6713581  SRR6713583 \
SRR6713584  SRR6713582  SRR6713587  SRR6713588  SRR6713579  SRR6713585 \
SRR6713586  SRR10059737 SRR10059738 SRR10059739 SRR6713573  SRR6713574 \
SRR6713580  SRR6713571  SRR6713572  SRR6713577  SRR10059740 SRR10059741 \
SRR10059742 SRR6713575  SRR6713576  SRR6713578  SRR6713569  SRR6713570 \
SRR6713592  SRR11878466 SRR11878467 SRR6782016  SRR6782017  SRR6782018 \
SRR6782019  SRR6782020  SRR6782021  SRR8492056  SRR8492057  SRR8492058 \
SRR8492065  SRR8492066  SRR8492067  SRR8492044  SRR8492045  SRR8492046 \
SRR8062742  SRR8062743  SRR8062744  SRR8492047  SRR8492048  SRR8492049 \
SRR10279638 SRR10279639 SRR10279640 SRR10279641 SRR8492053  SRR8492054 \
SRR8492055  SRR8492062  SRR8492063  SRR8492064  SRR8492050  SRR8492051 \
SRR8492052  SRR8492041  SRR8492042  SRR8492043  SRR8492059  SRR8492060 \
SRR8492061  SRR8062745  SRR8062746  SRR8062747  SRR8062748  SRR8062749 \
SRR8062750  SRR8062751  SRR8062752  SRR10917274 SRR10917285 SRR10917286 \
SRR10533482 SRR10533483 SRR6713589  SRR6713590  SRR6713591  SRR13428037 \
SRR13428038 SRR13428039 SRR13428040 SRR13428041 SRR13428042 -O ./

#convert SRA to fastq
mkdir -p fastq
fasterq-dump \
SRR11916725 SRR11916726 SRR8062740  SRR8062741  SRR10917264 SRR10917265 \
SRR10917266 SRR10059734 SRR10059735 SRR10059736 SRR6713581  SRR6713583 \
SRR6713584  SRR6713582  SRR6713587  SRR6713588  SRR6713579  SRR6713585 \
SRR6713586  SRR10059737 SRR10059738 SRR10059739 SRR6713573  SRR6713574 \
SRR6713580  SRR6713571  SRR6713572  SRR6713577  SRR10059740 SRR10059741 \
SRR10059742 SRR6713575  SRR6713576  SRR6713578  SRR6713569  SRR6713570 \
SRR6713592  SRR11878466 SRR11878467 SRR6782016  SRR6782017  SRR6782018 \
SRR6782019  SRR6782020  SRR6782021  SRR8492056  SRR8492057  SRR8492058 \
SRR8492065  SRR8492066  SRR8492067  SRR8492044  SRR8492045  SRR8492046 \
SRR8062742  SRR8062743  SRR8062744  SRR8492047  SRR8492048  SRR8492049 \
SRR10279638 SRR10279639 SRR10279640 SRR10279641 SRR8492053  SRR8492054 \
SRR8492055  SRR8492062  SRR8492063  SRR8492064  SRR8492050  SRR8492051 \
SRR8492052  SRR8492041  SRR8492042  SRR8492043  SRR8492059  SRR8492060 \
SRR8492061  SRR8062745  SRR8062746  SRR8062747  SRR8062748  SRR8062749 \
SRR8062750  SRR8062751  SRR8062752  SRR10917274 SRR10917285 SRR10917286 \
SRR10533482 SRR10533483 SRR6713589  SRR6713590  SRR6713591  SRR13428037 \
SRR13428038 SRR13428039 SRR13428040 SRR13428041 SRR13428042 -O ./fastq -e 10

2.2 Remove adapters

We use NGmerge to remove adapters. You can get NGmerge at https://github.com/jsh58/NGmerge.

#bash

#We created a file acc_list.txt that lists all the accessions, one per line
mkdir -p fastq_ngmerge
while IFS= read -r line; do
  NGmerge -a -1 ./fastq/${line}_1.fastq -2 ./fastq/${line}_2.fastq -o ./fastq_ngmerge/${line} -v -u 73 -n 6
done < "acc_list.txt"

2.3 Map to galgal7

We used bwa v0.7.17 and samtools v1.9 to map all the files to the galgal7b genome.

#bash
bwa index galgal7b.fna
samtools faidx galgal7b.fna
samtools dict galgal7b.fna -o galgal7b.dict

mkdir -p bams
while IFS= read -r line; do
    bwa mem -t 10 galgal7b.fna ./fastq_ngmerge/${line}_1.fastq ./fastq_ngmerge/${line}_2.fastq | \
    samtools view -b -u | \
    samtools sort --threads 8 -n -o ./bams/${line}.bam
done < "acc_list.txt"

2.4 Call ATAC peaks

We called peaks two ways, one for each tissue type and one for all the files altogether. We used a -e filter to remove mitochondrial sequences.

#bash
mkdir -p beds

Genrich -t ./bams/SRR11916725.bam,./bams/SRR11916726.bam \
-o ./beds/cone_cell_e5_-.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10533482.bam,./bams/SRR10533483.bam \
-o ./beds/retinal_cell_e5-6.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR13428042.bam,./bams/SRR13428041.bam \
-o ./beds/wing_bud_HH24.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR13428040.bam,./bams/SRR13428039.bam \
-o ./beds/wing_bud_HH22.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR13428038.bam,./bams/SRR13428037.bam \
-o ./beds/wing_bud_HH20.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR11878467.bam,./bams/SRR11878466.bam \
-o ./beds/limb_bud_HH24.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10917286.bam,./bams/SRR10917285.bam,./bams/SRR10917274.bam \
-o ./beds/presomitic_mesoderm_HH14.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10917266.bam,./bams/SRR10917265.bam,./bams/SRR10917264.bam \
-o ./beds/epithelial_somites_HH14.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10059742.bam,./bams/SRR10059741.bam,./bams/SRR10059740.bam \
-o ./beds/hindlimb_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10059737.bam,./bams/SRR10059738.bam,./bams/SRR10059739.bam \
-o ./beds/forelimb_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10059734.bam,./bams/SRR10059735.bam,./bams/SRR10059736.bam \
-o ./beds/flank_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10279641.bam,./bams/SRR10279640.bam \
-o ./beds/neural_crest_HH9.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10279639.bam,./bams/SRR10279638.bam \
-o ./beds/neural_crest_HH6.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492067.bam,./bams/SRR8492066.bam,./bams/SRR8492065.bam \
-o ./beds/negative_control_NC2-_E2-_HH25.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492064.bam,./bams/SRR8492063.bam,./bams/SRR8492062.bam \
-o ./beds/neural_crest_NC2-_HH25.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492061.bam,./bams/SRR8492060.bam,./bams/SRR8492059.bam \
-o ./beds/neural_crest_NC2+_HH25.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492058.bam,./bams/SRR8492057.bam,./bams/SRR8492056.bam \
-o ./beds/negative_control_NC2-_E2-_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492055.bam,./bams/SRR8492054.bam,./bams/SRR8492053.bam \
-o ./beds/neural_crest_NC2-_E2+_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492052.bam,./bams/SRR8492051.bam,./bams/SRR8492050.bam \
-o ./beds/neural_crest_NC2+_E2+_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492049.bam,./bams/SRR8492048.bam,./bams/SRR8492047.bam \
-o ./beds/neural_crest_E2+_HH12.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492046.bam,./bams/SRR8492045.bam,./bams/SRR8492044.bam \
-o ./beds/negative_control_NC2-_HH12.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492043.bam,./bams/SRR8492042.bam,./bams/SRR8492041.bam \
-o ./beds/neural_crest_NC2+_HH12_.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8062752.bam,./bams/SRR8062751.bam,./bams/SRR8062750.bam,./bams/SRR8062749.bam \
-o ./beds/non-neural_crest_8-10ss.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8062741.bam,./bams/SRR8062740.bam \
-o ./beds/epiblast_HH4.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6782021.bam,./bams/SRR6782020.bam \
-o ./beds/metatarsal_skin_e12_retinoic_acid.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6782019.bam,./bams/SRR6782018.bam \
-o ./beds/metatarsal_skin_e12_RCAS-GFP.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6782017.bam,./bams/SRR6782016.bam \
-o ./beds/metatarsal_skin_e12_RCAS-CTNNB1.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713569.bam,./bams/SRR6713570.bam,./bams/SRR6713592.bam \
-o ./beds/keel_e10.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713571.bam,./bams/SRR6713572.bam,./bams/SRR6713577.bam \
-o ./beds/hindlimb_e4.5.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713573.bam,./bams/SRR6713574.bam,./bams/SRR6713580.bam \
-o ./beds/full_sternum_e10.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713575.bam,./bams/SRR6713576.bam,./bams/SRR6713578.bam \
-o ./beds/inferior_sternum_e9.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713579.bam,./bams/SRR6713585.bam,./bams/SRR6713586.bam \
-o ./beds/forelimb_e4.5.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713581.bam,./bams/SRR6713583.bam,./bams/SRR6713584.bam \
-o ./beds/flight_muscle_e10.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713582.bam,./bams/SRR6713587.bam,./bams/SRR6713588.bam \
-o ./beds/flight_muscle_e9.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713589.bam,./bams/SRR6713590.bam,./bams/SRR6713591.bam \
-o ./beds/superior_sternum_e9.bed -j -e NC_053523.1 -r

#Call all bams at once
Genrich -t \
./bams/SRR10059734.bam,./bams/SRR10059735.bam,./bams/SRR10059736.bam,\
./bams/SRR10059737.bam,./bams/SRR10059738.bam,./bams/SRR10059739.bam,\
./bams/SRR10059740.bam,./bams/SRR10059741.bam,./bams/SRR10059742.bam,\
./bams/SRR10279638.bam,./bams/SRR10279639.bam,./bams/SRR10279640.bam,\
./bams/SRR10279641.bam,./bams/SRR10533482.bam,./bams/SRR10533483.bam,\
./bams/SRR10917264.bam,./bams/SRR10917265.bam,./bams/SRR10917266.bam,\
./bams/SRR10917274.bam,./bams/SRR10917285.bam,./bams/SRR10917286.bam,\
./bams/SRR11878466.bam,./bams/SRR11878467.bam,./bams/SRR11916725.bam,\
./bams/SRR11916726.bam,./bams/SRR13428037.bam,./bams/SRR13428038.bam,\
./bams/SRR13428039.bam,./bams/SRR13428040.bam,./bams/SRR13428041.bam,\
./bams/SRR13428042.bam,./bams/SRR6713569.bam,./bams/SRR6713570.bam,\
./bams/SRR6713571.bam,./bams/SRR6713572.bam,./bams/SRR6713573.bam,\
./bams/SRR6713574.bam,./bams/SRR6713575.bam,./bams/SRR6713576.bam,\
./bams/SRR6713577.bam,./bams/SRR6713578.bam,./bams/SRR6713579.bam,\
./bams/SRR6713580.bam,./bams/SRR6713581.bam,./bams/SRR6713582.bam,\
./bams/SRR6713583.bam,./bams/SRR6713584.bam,./bams/SRR6713585.bam,\
./bams/SRR6713586.bam,./bams/SRR6713587.bam,./bams/SRR6713588.bam,\
./bams/SRR6713589.bam,./bams/SRR6713590.bam,./bams/SRR6713591.bam,\
./bams/SRR6713592.bam,./bams/SRR6782016.bam,./bams/SRR6782017.bam,\
./bams/SRR6782018.bam,./bams/SRR6782019.bam,./bams/SRR6782020.bam,\
./bams/SRR6782021.bam,./bams/SRR8062740.bam,./bams/SRR8062741.bam,\
./bams/SRR8062742.bam,./bams/SRR8062743.bam,./bams/SRR8062744.bam,\
./bams/SRR8062745.bam,./bams/SRR8062746.bam,./bams/SRR8062747.bam,\
./bams/SRR8062748.bam,./bams/SRR8062749.bam,./bams/SRR8062750.bam,\
./bams/SRR8062751.bam,./bams/SRR8062752.bam,./bams/SRR8492041.bam,\
./bams/SRR8492042.bam,./bams/SRR8492043.bam,./bams/SRR8492044.bam,\
./bams/SRR8492045.bam,./bams/SRR8492046.bam,./bams/SRR8492047.bam,\
./bams/SRR8492048.bam,./bams/SRR8492049.bam,./bams/SRR8492050.bam,\
./bams/SRR8492051.bam,./bams/SRR8492052.bam,./bams/SRR8492053.bam,\
./bams/SRR8492054.bam,./bams/SRR8492055.bam,./bams/SRR8492056.bam,\
./bams/SRR8492057.bam,./bams/SRR8492058.bam,./bams/SRR8492059.bam,\
./bams/SRR8492060.bam,./bams/SRR8492061.bam,./bams/SRR8492062.bam,\
./bams/SRR8492063.bam,./bams/SRR8492064.bam,./bams/SRR8492065.bam,\
./bams/SRR8492066.bam,./bams/SRR8492067.bam \
-o ./beds/all_qc.bed -j -e NC_053523.1 -r

2.5 Generate combined dataset

To generate a combined dataset, we used the Genrich call file using all files at once and added all the peaks that were unique to each of the specific tissue type.

#bash
bedtools intersect -v -b beds/all_qc.bed -a beds/cone_cell_e5_-.bed > \
beds_intersect/cone_cell_e5_-_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/retinal_cell_e5-6.bed > \
beds_intersect/retinal_cell_e5-6_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/wing_bud_HH24.bed > \
beds_intersect/wing_bud_HH24_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/wing_bud_HH22.bed > \
beds_intersect/wing_bud_HH22_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/wing_bud_HH20.bed > \
beds_intersect/wing_bud_HH20_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/limb_bud_HH24.bed > \
beds_intersect/limb_bud_HH24_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/presomitic_mesoderm_HH14.bed > \
beds_intersect/presomitic_mesoderm_HH14_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/epithelial_somites_HH14.bed > \
beds_intersect/epithelial_somites_HH14_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/hindlimb_HH18.bed > \
beds_intersect/hindlimb_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/forelimb_HH18.bed > \
beds_intersect/forelimb_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/flank_HH18.bed > \
beds_intersect/flank_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_HH9.bed > \
beds_intersect/neural_crest_HH9_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_HH6.bed > \
beds_intersect/neural_crest_HH6_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/negative_control_NC2-_E2-_HH25.bed > \
beds_intersect/negative_control_NC2-_E2-_HH25_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2-_HH25.bed > \
beds_intersect/neural_crest_NC2-_HH25_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2+_HH25.bed > \
beds_intersect/neural_crest_NC2+_HH25_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/negative_control_NC2-_E2-_HH18.bed > \
beds_intersect/negative_control_NC2-_E2-_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2-_E2+_HH18.bed > \
beds_intersect/neural_crest_NC2-_E2+_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2+_E2+_HH18.bed > \
beds_intersect/neural_crest_NC2+_E2+_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_E2+_HH12.bed > \
beds_intersect/neural_crest_E2+_HH12_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/negative_control_NC2-_HH12.bed > \
beds_intersect/negative_control_NC2-_HH12_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2+_HH12_.bed > \
beds_intersect/neural_crest_NC2+_HH12__intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/non-neural_crest_8-10ss.bed > \
beds_intersect/non-neural_crest_8-10ss_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/non-neural_crest_5-6ss.bed > \
beds_intersect/non-neural_crest_5-6ss_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/epiblast_HH4.bed > \
beds_intersect/epiblast_HH4_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/metatarsal_skin_e12_retinoic_acid.bed > \
beds_intersect/metatarsal_skin_e12_retinoic_acid_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/metatarsal_skin_e12_RCAS-GFP.bed > \
beds_intersect/metatarsal_skin_e12_RCAS-GFP_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/metatarsal_skin_e12_RCAS-bTNNB1.bed > beds_intersect/metatarsal_skin_e12_RCAS-bTNNB1_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/keel_e10.bed > \
beds_intersect/keel_e10_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/hindlimb_e4.5.bed > \
beds_intersect/hindlimb_e4.5_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/full_sternum_e10.bed > \
beds_intersect/full_sternum_e10_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/inferior_sternum_e9.bed > \
beds_intersect/inferior_sternum_e9_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/forelimb_e4.5.bed > \
beds_intersect/forelimb_e4.5_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/flight_muscle_e10.bed > \
beds_intersect/flight_muscle_e10_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/flight_muscle_e9.bed > \
beds_intersect/flight_muscle_e9_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/superior_sternum_e9.bed > \
beds_intersect/superior_sternum_e9_intersect.bed

#cat all the uniques into one and then add to main file
cat beds_intersect/*.bed | bedtools sort -i "stdin" | \
bedtools merge -i "stdin" > beds/all_uniques.bed

cat beds/all_qc.bed beds/all_uniques.bed | bedtools sort -i "stdin" > atac_all.bed 

2.6 Annotate atac file

To generate annotation we used bedtools annotate.

#bash

bedtools annotate -counts -i atac_all.bed -files \
beds/cone_cell_e5_-.bed beds/epiblast_HH4.bed beds/epithelial_somites_HH14.bed \
beds/flank_HH18.bed beds/flight_muscle_e10.bed beds/flight_muscle_e9.bed \
beds/forelimb_e4.5.bed beds/forelimb_HH18.bed beds/full_sternum_e10.bed \
beds/hindlimb_e4.5.bed beds/hindlimb_HH18.bed beds/inferior_sternum_e9.bed \
beds/keel_e10.bed beds/limb_bud_HH24.bed beds/metatarsal_skin_e12_RCAS-CTNNB1.bed \
beds/metatarsal_skin_e12_RCAS-GFP.bed beds/metatarsal_skin_e12_retinoic_acid.bed \
beds/negative_control_NC2-_E2-_HH18.bed beds/negative_control_NC2-_E2-_HH25.bed \
beds/negative_control_NC2-_HH12.bed beds/neural_crest_E2+_HH12.bed \
beds/neural_crest_HH6.bed beds/neural_crest_HH9.bed beds/neural_crest_NC2-_E2+_HH18.bed \
beds/neural_crest_NC2-_HH25.bed beds/neural_crest_NC2+_E2+_HH18.bed \
beds/neural_crest_NC2+_HH12_.bed beds/neural_crest_NC2+_HH25.bed \
beds/non-neural_crest_5-6ss.bed beds/non-neural_crest_8-10ss.bed \
beds/presomitic_mesoderm_HH14.bed beds/retinal_cell_e5-6.bed \
beds/superior_sternum_e9.bed beds/wing_bud_HH20.bed beds/wing_bud_HH22.bed \
beds/wing_bud_HH24.bed > atac_all_annot.txt

#Clean up in R to include header line

2.7 Stats of annotated ATAC

To generate some statistics for annotated ATACseq file.

library(data.table)
library(ggplot2)
library(gridExtra)

atac_data <- fread("atac_all_annot.txt")

atac_data$length <- atac_data$Stop - atac_data$Start
atac_data$length2 <- atac_data$length
atac_data$length2[which(atac_data$length >= 2000)] <- 2000

#Drop elements not called in individual dataset
atac_data$tissues <- rowSums(atac_data[,5:40])
atac_data <- subset(atac_data, tissues > 0)

g1 <- ggplot(atac_data, aes(x = atac_data$length2)) +
  geom_histogram(bins = 50) +
  theme_classic() +
  geom_vline(xintercept = mean(atac_data$length), col = "red", linetype = "dashed") +
  labs(title="Distribution of length of ATACseq elements",x="Length of Elements (bp)", y = "Count")+
  scale_x_continuous(limits = c(0,2000), expand = c(0, 0)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE), expand = expansion(mult = c(0, .1)))

g2 <- ggplot(atac_data, aes(x = atac_data$tissues)) +
  geom_histogram(bins = 36) +
  theme_classic() +
  labs(x="Shared ATACseq elements", y = "Count")+
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE), expand = expansion(mult = c(0, .1)))

gs <- arrangeGrob(g1, g2, ncol = 2)
grid.arrange(gs)

2.7.0.1 Relationship of loci

We used the annotations to create a presence (1)/absence (0) matrix and use IQTree2 to analyze the matrix in a tree-building framework. The matrix was subsequently formatted into a phylip like format for analysis with IQTree.

library(pheatmap)
atac_data_cor <- atac_data[,5:40]
#corr_map <- cor(atac_data_cor)
corr_map <- read.table("ATAC_corr_map.txt", header = T, row.names = 1, sep = "\t")
pheatmap(log(corr_map,base = 10))

To run the tree:

#bash
iqtree2-mpi -s all_tissues.phylip.varsites.phy --seqtype MORPH -m MK+ASC -T 12

Heatmap of elements

3 Predict ATAC-seq in non chicken

We hope to use these open chromatin regions for genomes on birds other than chicken. It is not well known whether a region that is open in a chicken is open in other avian species. An approach called TACIT was developed by Kaplow et al. (2022) that uses machine learning to predict open chromatin regions in different species using a training dataset. We adopted this approach to predict open chromatin regions in different birds species using data from chicken and emu.

3.1 The code

The code is a python code modified from Kaplow et al. (2022) that takes a config file as input and train a neural network on ATACseq peaks get code here (machinations.py).

3.2 Preparing the datasets

Here we use ATACseq calls for Hindlimbs at HH18 stage of chicken (Gallus gallus) and emu (Dromaius novaehollandiae) and use calls for tinamou (Eudromia elegans) to validate the dataset. First, we need to create a dataset of positive (ATAC peak open) and negative (ATAC peak closed). For this we compared elements between chicken and emu and characterized any element that overlapped with one another and present in both datasets as positive (open). For any elements that were open in one species and absent in the other species we binned the open elements in the positive category and the homologous sequence of the absent element in the negative (closed) category. Then we divided this into a training, validation, and evaluation datasets (binned based on scaffold location). These files were then used to train the model.

These steps include jumping from R to bash and back. Please make sure you run the code in the appropriate framework.

First we need to create a bed file for the two species that include sequences of all the same size. The python code can accommodate different sizes but the missing data gets accounted as information for the neural network and the resulting assignments are segregated by size of element (without Ns) rather than the patterns within it.

#R
#getting pos and neg datasets

###inputs; i use a and b to designate two datasets. Replace a and b with respective bed files 
#to run for any group without having to change any other info
bed_a <- "emu_hindlimb_HH18.bed"
bed_a_id <- "emu"
bed_b <- "galgal_hindlimb_HH18.bed"
bed_b_id <- "gal"

###outputs
working_folder <- "./"

###We need to make sure all elements are the same length. Here we chose 500 bp
ele_size <- 500

#A function to parse bed files from Genrich to output unique elements
#that are of the size we need
process_bed <- function(bed_file, id, size){
  bed <- read.table(bed_file, sep="\t", header = F, 
                    col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Score", 
                                  "Strand", "SignalValue", "p", "q", "Summit"))
  bed$Peak_ID <- paste(id, bed$Peak_ID, sep="_")
  bed$Length <- bed$Stop - bed$Start
  bed$Summit_Base <- bed$Start + bed$Summit
  bed$Summit_Start <- bed$Summit_Base - size/2
  bed$Summit_Stop <- bed$Summit_Base + size/2 - 1
  bed <- subset(bed, Summit_Start >= 0)
  return(bed)
}

#Get bed files for the two species
bed_a.data <- process_bed(bed_a, bed_a_id, ele_size)
bed_b.data <- process_bed(bed_b, bed_b_id, ele_size)

#Write the output as a bed file
write.table(bed_a.data[,c("Scaffold", "Summit_Base", "Summit_Base", "Peak_ID")], 
            paste(working_folder, "bed_a_summit.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep="\t")
write.table(bed_b.data[,c("Scaffold", "Summit_Base", "Summit_Base", "Peak_ID")], 
            paste(working_folder, "bed_b_summit.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep="\t")

Next we will clean up the bed files a bit

#bash

bedtools merge -o count,distinct -d 500 -i bed_a_summit.bed > bed_a_unique_summit.bed
bedtools merge -o count,distinct -d 500 -i bed_b_summit.bed > bed_b_unique_summit.bed

This step removes closeby peaks that overlapped when we extended a peak size to 500 bp.

#R
bed_a.uniq <- read.table(paste(working_folder, "bed_a_unique_summit.bed", sep=""), sep="\t", header = F, 
                         col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Count", "Peak_ID"))
bed_b.uniq <- read.table(paste(working_folder, "bed_b_unique_summit.bed", sep=""), sep="\t", header = F,
                         col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Count", "Peak_ID"))

#if peaks overlapped, only choose the first peak among the mix
bed_a.uniq$Peak_ID <- gsub(",.*", "", bed_a.uniq$Peak_ID)
bed_b.uniq$Peak_ID <- gsub(",.*", "", bed_b.uniq$Peak_ID)

write.table(bed_a.data[bed_a.data$Peak_ID %in% bed_a.uniq$Peak_ID, 
            c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")],
            paste(working_folder, "bed_a_peaks_500.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(bed_b.data[bed_b.data$Peak_ID %in% bed_b.uniq$Peak_ID, 
            c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")],
            paste(working_folder, "bed_b_peaks_500.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep = "\t")

To compare the two datasets, we need it to be in the same genetic coordinate system. For this we will use the hal alignment to liftover the emu sequences into galgal7b and vice versa.

#bash

#using hal to convert each to the other species coordinates and then use sort and merge to get merged peaks
halLiftover all_birds_new.hal Gallus_gallus7 bed_b_peaks_500.bed Dromaius_novaehollandiae bed_b2a_peaks_500.bed
halLiftover all_birds_new.hal Dromaius_novaehollandiae bed_a_peaks_500.bed Gallus_gallus7 bed_a2b_peaks_500.bed

#clean up the liftover files, we'll consider anything 100 bp nearby as being a convergent peak
bedtools sort -i bed_b2a_peaks_500.bed | bedtools merge -o distinct -d 100 -i stdin > bed_b2a_peaks_merged.bed
bedtools sort -i bed_a2b_peaks_500.bed | bedtools merge -o distinct -d 100 -i stdin > bed_a2b_peaks_merged.bed

#get intersects
bedtools intersect -a bed_a_peaks_500.bed -b bed_b2a_peaks_merged.bed -wa -wb > bed_a_intersect.bed
bedtools intersect -a bed_b_peaks_500.bed -b bed_a2b_peaks_merged.bed -wa -wb > bed_b_intersect.bed

More cleanup in R to make all the sequences in the lifted over files again 500 bp

#R

bed_a2b_merged <- read.table(paste(working_folder, "bed_a2b_peaks_merged.bed", sep=""), sep="\t", header = F,
                             col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID"))
bed_b2a_merged <- read.table(paste(working_folder, "bed_b2a_peaks_merged.bed", sep=""), sep="\t", header = F,
                             col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID"))

#remove peaks that span big gaps, or merge two original peaks
bed_a2b_merged <- bed_a2b_merged[!(duplicated(bed_a2b_merged$Peak_ID)|duplicated(bed_a2b_merged$Peak_ID, fromLast=TRUE)),]
bed_b2a_merged <- bed_b2a_merged[!(duplicated(bed_b2a_merged$Peak_ID)|duplicated(bed_b2a_merged$Peak_ID, fromLast=TRUE)),]
bed_a2b_merged <- bed_a2b_merged[!grepl(",", bed_a2b_merged$Peak_ID),]
bed_b2a_merged <- bed_b2a_merged[!grepl(",", bed_b2a_merged$Peak_ID),]

#get length of converted elements and keep elements that are between 400 and 600 bps
bed_a2b_merged$Length <- bed_a2b_merged$Summit_Stop - bed_a2b_merged$Summit_Start
bed_b2a_merged$Length <- bed_b2a_merged$Summit_Stop - bed_b2a_merged$Summit_Start
bed_a2b_merged <- bed_a2b_merged[which(bed_a2b_merged$Length > 400 & 
                                         bed_a2b_merged$Length < 600),]
bed_b2a_merged <- bed_b2a_merged[which(bed_b2a_merged$Length > 400 & 
                                         bed_b2a_merged$Length < 600),]

#readjust lengths of elements to equal 500 bps, remove peaks that are less than 500 bp
bed_a2b_merged$Chop_Left <- ceiling((bed_a2b_merged$Length - 500)/2)+bed_a2b_merged$Summit_Start
bed_a2b_merged$Chop_Right <- bed_a2b_merged$Summit_Stop - floor((bed_a2b_merged$Length - 500)/2) - 1
bed_b2a_merged$Chop_Left <- ceiling((bed_b2a_merged$Length - 500)/2)+bed_b2a_merged$Summit_Start
bed_b2a_merged$Chop_Right <- bed_b2a_merged$Summit_Stop - floor((bed_b2a_merged$Length - 500)/2) - 1
bed_a2b_merged <- subset(bed_a2b_merged, Chop_Left >= 0)
bed_b2a_merged <- subset(bed_b2a_merged, Chop_Left >= 0)

#intersects
bed_a_intersects <- read.table(paste(working_folder, "bed_a_intersect.bed", sep=""), sep="\t", header = F)
bed_b_intersects <- read.table(paste(working_folder, "bed_b_intersect.bed", sep=""), sep="\t", header = F)

#get files in order and get samtools formatted output to get fasta files
#some fasta files will be truncated because we do not account for the size of scaffolds
positives_a <- bed_a.data[bed_a.data$Peak_ID %in% bed_a.uniq$Peak_ID, 
                          c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")]
positives_b <- bed_b.data[bed_b.data$Peak_ID %in% bed_b.uniq$Peak_ID, 
                          c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")]
positives_a$Samtool <- paste(positives_a$Scaffold, ":", positives_a$Summit_Start, "-", positives_a$Summit_Stop, sep="")
positives_b$Samtool <- paste(positives_b$Scaffold, ":", positives_b$Summit_Start, "-", positives_b$Summit_Stop, sep="")

#for negatives we want all the things that did not intersect
negatives_a_id <- setdiff(bed_a.data$Peak_ID, bed_a_intersects$V4)
negatives_b_id <- setdiff(bed_b.data$Peak_ID, bed_b_intersects$V4)

negatives_a <- bed_a2b_merged[bed_a2b_merged$Peak_ID %in% negatives_a_id, 
                          c("Scaffold", "Chop_Left", "Chop_Right", "Peak_ID")]
negatives_b <- bed_b2a_merged[bed_b2a_merged$Peak_ID %in% negatives_b_id, 
                          c("Scaffold", "Chop_Left", "Chop_Right", "Peak_ID")]
negatives_a$Samtool <- paste(negatives_a$Scaffold, ":", negatives_a$Chop_Left, "-", negatives_a$Chop_Right, sep="")
negatives_b$Samtool <- paste(negatives_b$Scaffold, ":", negatives_b$Chop_Left, "-", negatives_b$Chop_Right, sep="")

#write these positives and negatives bed file
write.table(positives_a, paste(working_folder, "positives_a.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(positives_b, paste(working_folder, "positives_b.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(negatives_a, paste(working_folder, "negatives_a.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(negatives_b, paste(working_folder, "negatives_b.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")

Since ATAC peaks within in coding loci might have different patterns than those in intragenic regions, we will filter out all the peaks that overlap a coding loci. We will use a gff file containing only coding cds for each species.

#bash

bedtools intersect -a positives_a.bed -b dronov_cds.bed -v > positives_a_nocds.bed
bedtools intersect -a positives_b.bed -b galgal_cds.bed -v > positives_b_nocds.bed
bedtools intersect -a negatives_a.bed -b dronov_cds.bed -v > negatives_a_nocds.bed
bedtools intersect -a negatives_b.bed -b galgal_cds.bed -v > negatives_b_nocds.bed

Filter out these regions that overlap a coding region

#R

positives_a_cds <- read.table(paste(working_folder, "positives_a_nocds.bed", sep=""), header = F, 
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))
positives_b_cds <- read.table(paste(working_folder, "positives_b_nocds.bed", sep=""), header = F,
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))
negatives_a_cds <- read.table(paste(working_folder, "negatives_a_nocds.bed", sep=""), header = F,
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))
negatives_b_cds <- read.table(paste(working_folder, "negatives_b_nocds.bed", sep=""), header = F,
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))

#At the same time I will also divvy up the data into training, validation, and evaluation datsets. For this I will use the samtools fai index file to get the size of each scaffold for that species, sort it by size and distribute it into training, validation, and evalutaion datasets.

chrom_sizes_a <- read.table("Dromaius_novaehollandiae_droNov1.fna.fai", sep="\t", header = F, stringsAsFactors = F)
chrom_sizes_a <- chrom_sizes_a[,c(1,2)]
chrom_sizes_a <- chrom_sizes_a[order(chrom_sizes_a$V2, decreasing = T),]
chrom_sizes_a$order <- seq(1,nrow(chrom_sizes_a))

#First two will go for training
positives_a_train <- subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(1,60,4)]])
positives_a_train <- rbind(positives_a_train, subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(2,60,4)]]))

#Third will go for validation
positives_a_valid <- subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(3,60,4)]])

#Fourth will go for evaluation
positives_a_eval <- subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(4,60,4)]])

#Same for the negative datasets
negatives_b_train <- subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(1,60,4)]])
negatives_b_train <- rbind(negatives_b_train, subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(2,60,4)]]))
negatives_b_valid <- subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(3,60,4)]])
negatives_b_eval <- subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(4,60,4)]])

#Write the output in samtools format
write.table(positives_a_train$Samtools, paste(working_folder, "positives_a_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_a_valid$Samtools, paste(working_folder, "positives_a_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_a_eval$Samtools, paste(working_folder, "positives_a_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

write.table(negatives_b_train$Samtools, paste(working_folder, "negatives_b_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_b_valid$Samtools, paste(working_folder, "negatives_b_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_b_eval$Samtools, paste(working_folder, "negatives_b_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

#Repeat for the second species
chrom_sizes_b <- read.table("galgal7b.fna.fai", sep="\t", header = F, stringsAsFactors = F)
chrom_sizes_b <- chrom_sizes_b[,c(1,2)]
chrom_sizes_b <- chrom_sizes_b[order(chrom_sizes_b$V2, decreasing = T),]
chrom_sizes_b$order <- seq(1,nrow(chrom_sizes_b))

positives_b_train <- subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(1,60,4)]])
positives_b_train <- rbind(positives_b_train, subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(2,60,4)]]))
positives_b_valid <- subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(3,60,4)]])
positives_b_eval <- subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(4,60,4)]])

negatives_a_train <- subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(1,60,4)]])
negatives_a_train <- rbind(negatives_a_train, subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(2,60,4)]]))
negatives_a_valid <- subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(3,60,4)]])
negatives_a_eval <- subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(4,60,4)]])

write.table(positives_b_train$Samtools, paste(working_folder, "positives_b_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_b_valid$Samtools, paste(working_folder, "positives_b_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_b_eval$Samtools, paste(working_folder, "positives_b_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

write.table(negatives_a_train$Samtools, paste(working_folder, "negatives_a_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_a_valid$Samtools, paste(working_folder, "negatives_a_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_a_eval$Samtools, paste(working_folder, "negatives_a_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

Get the fasta files for each grouping and then combine the different groups for the two species into one. We can finally use this for in the machinations.py file.

#bash

samtools faidx Dromaius_novaehollandiae_droNov1.fna -r positives_a_train.txt > positives_a_train.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r positives_a_valid.txt > positives_a_valid.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r positives_a_eval.txt > positives_a_eval.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r negatives_b_train.txt > negatives_b_train.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r negatives_b_valid.txt > negatives_b_valid.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r negatives_b_eval.txt > negatives_b_eval.fa

samtools faidx galgal7b.fna -r positives_b_train.txt > positives_b_train.fa
samtools faidx galgal7b.fna -r positives_b_valid.txt > positives_b_valid.fa
samtools faidx galgal7b.fna -r positives_b_eval.txt > positives_b_eval.fa
samtools faidx galgal7b.fna -r negatives_a_train.txt > negatives_a_train.fa
samtools faidx galgal7b.fna -r negatives_a_valid.txt > negatives_a_valid.fa
samtools faidx galgal7b.fna -r negatives_a_eval.txt > negatives_a_eval.fa

##combine a and b into one
cat positives_a_train.fa positives_b_train.fa > positives_train.fa
cat positives_a_valid.fa positives_b_valid.fa > positives_valid.fa
cat positives_a_eval.fa positives_b_eval.fa > positives_eval.fa

cat negatives_a_train.fa negatives_b_train.fa > negatives_train.fa
cat negatives_a_valid.fa negatives_b_valid.fa > negatives_valid.fa
cat negatives_a_eval.fa negatives_b_eval.fa > negatives_eval.fa

3.3 Run the python script

Use the config file to run the python script. Once the model is made and evaluated you can use the same file to evaluate any other fasta file.

4 Evaluating comparative ATAC-seq results

4.1 Setting up the comparative data

To check for ATAC-seq peaks, we used the center of the accelerated elements sequences to get a 500 bp region around the element and then used that as an evaluation dataset. We did this for all 79 species.

#for atacseq get 500 bp regions from center of CE
final_data_filtered <- fread("final_data_filtered.txt", sep = "\t", header = T)
phyloacc_subset <- fread("accelerated_elements_all.txt", sep = "\t")
atac_ce <- final_data_filtered[which(final_data_filtered$id %in% phyloacc_subset$original.id), 1:4]

atac_ce$center <- floor((atac_ce$start+atac_ce$end)/2)
atac_ce$center2 <- atac_ce$center + 1

fwrite(atac_ce[,c("scaffold", "center", "center2", "id")], "ce_accl_center_15k.bed", quote = F, sep = "\t", row.names = F, col.names = F)

Liftover the center region to all other species, add 250 bp to the front and back and then get sequences for each species.

#bash

species=("Gavialis_gangeticus" "Struthio_camelus" "Rhea_americana" "Eudromia_elegans" "Dromaius_novaehollandiae" "Anas_platyrhynchos" "Gallus_gallus7" "Phoenicopterus_ruber" "Columba_livia" "Tauraco_erythrolophus" "Cuculus_canorus" "Caprimulgus_europaeus" "Apus_apus" "Calypte_anna" "Porphyrio_hochstetteri" "Pluvialis_apricaria" "Sterna_hirundo" "Aptenodytes_patagonicus" "Aptenodytes_forsteri" "Pygoscelis_adeliae" "Pygoscelis_papua" "Pygoscelis_antarcticus" "Megadyptes_antipodes" "Eudyptula_novaehollandiae" "Eudyptula_albosignata" "Eudyptula_minor" "Spheniscus_demersus" "Spheniscus_humboldti" "Spheniscus_magellanicus" "Eudyptes_pachyrhynchus" "Eudyptes_sclateri" "Eudyptes_chrysolophus" "Eudyptes_schlegeli" "Eudyptes_chrysocome" "Eudyptes_filholi" "Eudyptes_moseleyi" "Macronectes_giganteus" "Theristicus_caerulescens" "Aquila_chrysaetos" "Athene_cunicularia" "Trogon_surrucura" "Bucorvus_abyssinicus" "Halcyon_senegalensis" "Chloroceryle_aenea" "Actenoides_hombroni" "Merops_nubicus" "Pogoniulus_pusillus" "Colaptes_auratus" "Falco_peregrinus" "Melopsittacus_undulatus" "Acanthisitta_chloris" "Myiozetetes_cayanensis" "Chiroxiphia_lanceolata" "Malurus_cyaneus" "Lichenostomus_cassidix" "Lanius_collurio" "Corvus_cornix" "Eremophila_alpestris" "Parus_major" "Acrocephalus_scirpaceus" "Progne_subis" "Riparia_riparia" "Hirundo_rustica" "Brachypodius_melanocephalos" "Pycnonotus_jocosus" "Phylloscopus_trochilus" "Sylvia_atricapilla" "Certhia_americana" "Sturnus_vulgaris" "Ficedula_albicollis" "Taeniopygia_guttata" "Passer_domesticus" "Motacilla_alba" "Serinus_canaria" "Emberiza_elegans" "Molothrus_ater" "Setophaga_coronata" "Diglossa_brunneiventris")

for i in ${species[@]};
do
      halLiftover all_birds_new.hal Gallus_gallus7 ce_accl_center_15k.bed $i regions_new/ce_accl_center_${i}.bed
done

# clean the bed files
for file in regions_new/*.bed
do
        file="$(basename file .bed)"
        bedtools merge -d 20 -c 4 -o "distinct" $file |
        awk 'BEGIN{ FS="\t"; OFS=FS }{ if(($3 -$2) <= 500)) print $1,$2,$3,$4 }'
done

#Add 250bp to either side of the center
while IFS= read -r line
do
        awk '{ if ($2-250>0) print $1":"$2-250"-"$3+248 }' ./regions_new/ce_accl_center_${line}.bed > ./regions/ce_accl_center_${line}.txt
done < "all_species.txt"

#Get fasta sequences for each species, line1 is the species name and line2 is the genome location
while read -r line1 line2
do
        samtools faidx $line2 -r regions/ce_accl_center_${line1}.txt > fastas/ce_accl_center_${line1}.fa
done < "new_bird_aln.txt"

#Concatenate the sequences
cat fastas/*.fa > all_fastas_15k.fa

#Use the fasta as an evaluation dataset in the python script above

4.2 Parsing the results

#R
#parse atac probs for elements
atac_probs <- fread("hindlimb_HH18_epoch20_lay450_batch64_lr1e7_eval_all_atac_prob.txt", sep = "\t")

species=c("Gavialis_gangeticus", "Struthio_camelus", "Rhea_americana", "Eudromia_elegans", 
          "Dromaius_novaehollandiae", "Anas_platyrhynchos", "Gallus_gallus7", "Phoenicopterus_ruber",
          "Columba_livia", "Tauraco_erythrolophus", "Cuculus_canorus", "Caprimulgus_europaeus", 
          "Apus_apus", "Calypte_anna", "Porphyrio_hochstetteri", "Pluvialis_apricaria", 
          "Sterna_hirundo", "Aptenodytes_patagonicus", "Aptenodytes_forsteri", "Pygoscelis_adeliae", 
          "Pygoscelis_papua", "Pygoscelis_antarcticus", "Megadyptes_antipodes", "Eudyptula_novaehollandiae", 
          "Eudyptula_albosignata", "Eudyptula_minor", "Spheniscus_demersus", "Spheniscus_humboldti", 
          "Spheniscus_magellanicus", "Eudyptes_pachyrhynchus", "Eudyptes_sclateri", "Eudyptes_chrysolophus", 
          "Eudyptes_schlegeli", "Eudyptes_chrysocome", "Eudyptes_filholi", "Eudyptes_moseleyi", 
          "Macronectes_giganteus", "Theristicus_caerulescens", "Aquila_chrysaetos", "Athene_cunicularia", 
          "Trogon_surrucura", "Bucorvus_abyssinicus", "Halcyon_senegalensis", "Chloroceryle_aenea", 
          "Actenoides_hombroni", "Merops_nubicus", "Pogoniulus_pusillus", "Colaptes_auratus", 
          "Falco_peregrinus", "Melopsittacus_undulatus", "Acanthisitta_chloris", "Myiozetetes_cayanensis", 
          "Chiroxiphia_lanceolata", "Malurus_cyaneus", "Lichenostomus_cassidix", "Lanius_collurio", 
          "Corvus_cornix", "Eremophila_alpestris", "Parus_major", "Acrocephalus_scirpaceus", 
          "Progne_subis", "Riparia_riparia", "Hirundo_rustica", "Brachypodius_melanocephalos", 
          "Pycnonotus_jocosus", "Phylloscopus_trochilus", "Sylvia_atricapilla", "Certhia_americana", 
          "Sturnus_vulgaris", "Ficedula_albicollis", "Taeniopygia_guttata", "Passer_domesticus", 
          "Motacilla_alba", "Serinus_canaria", "Emberiza_elegans", "Molothrus_ater", 
          "Setophaga_coronata", "Diglossa_brunneiventris")

galgal_elems <- read.table("ce_accl_center_15k.bed", sep = "\t")

for (aspecies in species){
  elem_file <- read.table(paste(aspecies, ".bed", sep = ""), sep = "\t")
  elem_file$samtools <- paste(elem_file$V1, ":", elem_file$V2 - 250, "-", elem_file$V3 + 248, sep="")
  atac_probs_new <- merge(atac_probs, elem_file, by.x = "V1", by.y = "samtools", all = F)
  atac_probs_new <- atac_probs_new[!duplicated(atac_probs_new$V4)]
  colnames(atac_probs_new)[2] <- aspecies
  galgal_elems <- merge(galgal_elems, atac_probs_new[,c(2,6)], by.x = "V4", by.y = "V4", all.x = T, all.y = F)
}


#code to get four categories
#chicken ATAC yes; chicken yes; accl yes + chicken ATAC no; chicken no; accl no
#chicken ATAC yes; chicken yes; accl no + chicken ATAC no; chicken no; accl yes
#chicken ATAC yes; chicken no; accl no + chicken ATAC no; chicken yes; accl yes
#chicken ATAC yes; chicken no; accl yes + chicken ATAC no; chicken yes; accl no

#atac main data
atac_hl <- read.table("ce_accl_center_15k_hl_atac_annot.bed", header = F)
atac_hl$atac <- atac_hl$V5 > 0 
atac_df <- atac_hl[which(atac_hl$V4 %in% galgal_elems$V4), c("V4", "atac")]
colnames(atac_df) <- c("id", "atac")
atac_df <- merge(atac_df, phyloacc_subset[,c(1:3)], by.x = "id", by.y = "original.id")

rate_data <- fread("M2_elem_Z.txt", header = T)
results <- fread("rate_postZ_M2.txt", header = T, sep = "\t")
results <- results[,c(1:6,seq(10,322,4))]

colnames(results) <- gsub("_3", "", colnames(results))

#using 0.5 as the barrier
for (elem in 1:nrow(atac_df)){
  element <- atac_df$id[elem]
  if (galgal_elems[elem, "Gallus_gallus7"] > 0.5){
    atac_df[elem, "atac_pred"] <- TRUE
  }else{
    atac_df[elem, "atac_pred"] <- FALSE
  }
  accl_taxa <- intersect(colnames(rate_data)[rate_data[which(rate_data$`Locus ID` %in% atac_df[elem, "phyloacc.id"]),] == 2],
                         colnames(results)[results[which(results$`Locus ID` %in% atac_df[elem, "phyloacc.id"]),] >= 0.9])
  accl_taxa <- accl_taxa[accl_taxa != "Gallus_gallus4"]
  accl_atac <- table(galgal_elems[elem, accl_taxa] > 0.5)
  if (length(accl_atac)==2){
    atac_df[elem, "accl_atac_pred"] <- unname(accl_atac[2]/(accl_atac[1] + accl_atac[2]))
  }
  else if (length(accl_atac)==0){
    atac_df[elem, "accl_atac_pred"] <- NA
  }
  else if (names(accl_atac)=="TRUE"){
    atac_df[elem, "accl_atac_pred"] <- 1
  }
  else {
    atac_df[elem, "accl_atac_pred"] <- 0
  }
  all_atac <- table(galgal_elems[elem,] > 0.5)
  atac_df[elem, "all_atac_pred"] <- unname(all_atac[2]/(all_atac[1] + all_atac[2]))
}

atac_df$accl_atac_pred_bool <- atac_df$accl_atac_pred > 0.5

table(atac_df[,c("atac_pred", "accl_atac_pred_bool", "atac")])

#where atac shifts
shifted <- atac_df[which(!atac_df$atac & !atac_df$atac_pred & atac_df$accl_atac_pred_bool),]
shifted <- rbind(shifted, atac_df[which(atac_df$atac & atac_df$atac_pred & !atac_df$accl_atac_pred_bool),])

shifted_elems <- subset(final_data[,c(1:4)], id %in% shifted$id)
write.table(shifted_elems, "atac_shift_elems.txt", sep = "\t", col.names = F, row.names = F, quote = F)

#get closeby genes
library(stringr)
shifted_genes <- fread("atac_shift_genes.txt", sep = "\t")
shifted_genes <- subset(shifted_genes, V14 < 10000)

shifted_genes$id <- str_extract(shifted_genes$V13, "Name=[A-Za-z0-9-_.]*")
shifted_genes$id <- gsub("Name=", "", shifted_genes$id)

References

Crispatzu, Giuliano, Rizwan Rehimi, Tomas Pachano, Tore Bleckwehl, Sara Cruz-Molina, Cally Xiao, Esther Mahabir, Hisham Bazzi, and Alvaro Rada-Iglesias. 2021. “The Chromatin, Topological and Regulatory Properties of Pluripotency-Associated Poised Enhancers Are Conserved in Vivo.” Nature Communications 12 (1): 1–17.
Jhanwar, Shalu, Jonas Malkmus, Jens Stolte, Olga Romashkina, Aimée Zuniga, and Rolf Zeller. 2021. “Conserved and Species-Specific Chromatin Remodeling and Regulatory Dynamics During Mouse and Chicken Limb Bud Development.” Nature Communications 12 (1): 1–17.
Kaplow, Irene M, Alyssa J Lawler, Daniel E Schaffer, Chaitanya Srinivasan, Morgan E Wirthlin, BaDoi N Phan, Xiaomeng Zhang, et al. 2022. “Relating Enhancer Genetic Variation Across Mammals to Complex Phenotypes Using Machine Learning.” bioRxiv.
Lai, Yung-Chih, Ya-Chen Liang, Ting-Xin Jiang, Randall B Widelitz, Ping Wu, and Cheng-Ming Chuong. 2018. “Transcriptome Analyses of Reprogrammed Feather/Scale Chimeric Explants Revealed Co-Expressed Epithelial Gene Networks During Organ Specification.” BMC Genomics 19 (1): 1–12.
Laugsch, Magdalena, Michaela Bartusel, Hafiza Alirzayeva, Agathi Karaolidou, Rizwan Rehimi, Giuliano Crispatzu, Milos Nikolic, et al. 2018. “Disruption of the TFAP2A Regulatory Domain Causes Banchio-Oculo-Facial Syndrome (BOFS) and Illuminates Pathomechanisms for Other Human Neurocristopathies.”
Ling, Irving TC, and Tatjana Sauka-Spengler. 2019. “Early Chromatin Shaping Predetermines Multipotent Vagal Neural Crest into Neural, Neuronal and Mesenchymal Lineages.” Nature Cell Biology 21 (12): 1504–17.
Lonfat, Nicolas, Su Wang, ChangHee Lee, Mauricio Garcia, Jiho Choi, Peter J Park, and Connie Cepko. 2021. “Cis-Regulatory Dissection of Cone Development Reveals a Broad Role for Otx2 and Oc Transcription Factors.” Development 148 (9): dev198549.
Malkmus, Jonas, Laurène Ramos Martins, Shalu Jhanwar, Bonnie Kircher, Victorio Palacio, Rushikesh Sheth, Francisca Leal, et al. 2021. “Spatial Regulation by Multiple Gremlin1 Enhancers Provides Digit Development with Cis-Regulatory Robustness and Evolutionary Plasticity.” Nature Communications 12 (1): 1–17.
Mok, Gi Fay, Leighton Folkes, Shannon A Weldon, Eirini Maniou, Victor Martinez-Heredia, Alice M Godden, Ruth M Williams, et al. 2021. “Characterising Open Chromatin in Chick Embryos Identifies Cis-Regulatory Elements Important for Paraxial Mesoderm Formation and Axis Extension.” Nature Communications 12 (1): 1–15.
Patoori, Sruti, Nathalie Jean-Charles, Ariana Gopal, Sacha Sulaiman, Sneha Gopal, Brian Wang, Benjamin Souferi, and Mark M Emerson. 2020. “Cis-Regulatory Analysis of Onecut1 Expression in Fate-Restricted Retinal Progenitor Cells.” Neural Development 15 (1): 1–20.
Rothstein, Megan, and Marcos Simoes-Costa. 2020. “Heterodimerization of TFAP2 Pioneer Factors Drives Epigenomic Remodeling During Neural Crest Specification.” Genome Research 30 (1): 35–48.
Sackton, Timothy B, Phil Grayson, Alison Cloutier, Zhirui Hu, Jun S Liu, Nicole E Wheeler, Paul P Gardner, et al. 2019. “Convergent Regulatory Evolution and Loss of Flight in Paleognathous Birds.” Science 364 (6435): 74–78.
Williams, Ruth M, Ivan Candido-Ferreira, Emmanouela Repapi, Daria Gavriouchkina, Upeka Senanayake, Irving TC Ling, Jelena Telenius, Stephen Taylor, Jim Hughes, and Tatjana Sauka-Spengler. 2019. “Reconstruction of the Global Neural Crest Gene Regulatory Network in Vivo.” Developmental Cell 51 (2): 255–76.
Young, John J, Phil Grayson, Scott V Edwards, and Clifford J Tabin. 2019. “Attenuated Fgf Signaling Underlies the Forelimb Heterochrony in the Emu Dromaius Novaehollandiae.” Current Biology 29 (21): 3681–91.