Note

Following includes codes used to generate an annotated list of conserved elements for birds. It uses bash and R codes interchangeably along with other instructions to run. Please make sure you are running the codes in the correct interface.

1 Introduction

This markdown file documents the generation of a dataset of ATACseq peaks for chicken, mapped to the bGalGal1.mat.broiler.GRCg7b (henceforth only referred to as galgal7b). You can get the genome here.

We originally started with 116 fasta files comprising 43 unique tissues and treatments at different stages of chick development. We mapped them to the galgal7b genome. We omitted 15 fasta files (see bottom of table) due to quality of reads and problems with alaingments leaving us with 101 fasta files of 36 tissue types.

Tissue_type	Replicate	SRA_accession	Total_reads_mapped	Data_source
cone cell e5 -	1	SRR11916725	400534	Lonfat et al. (2021)
cone cell e5 -	2	SRR11916726	586532	Lonfat et al. (2021)
retinal cell e5-6	1	SRR10533482	619494	Patoori et al. (2020)
retinal cell e5-6	2	SRR10533483	1155904	Patoori et al. (2020)
wing bud HH24	1	SRR13428042	392412	Jhanwar et al. (2021)
wing bud HH24	2	SRR13428041	501428	Jhanwar et al. (2021)
wing bud HH22	1	SRR13428040	323406	Jhanwar et al. (2021)
wing bud HH22	2	SRR13428039	422077	Jhanwar et al. (2021)
wing bud HH20	1	SRR13428038	195410	Jhanwar et al. (2021)
wing bud HH20	2	SRR13428037	197882	Jhanwar et al. (2021)
limb bud HH24	1	SRR11878467	392158	Malkmus et al. (2021)
limb bud HH24	2	SRR11878466	501842	Malkmus et al. (2021)
presomitic mesoderm HH14	1	SRR10917286	2429861	Mok et al. (2021)
presomitic mesoderm HH14	2	SRR10917285	2801157	Mok et al. (2021)
presomitic mesoderm HH14	3	SRR10917274	2288277	Mok et al. (2021)
epithelial somites HH14	1	SRR10917266	1992608	Mok et al. (2021)
epithelial somites HH14	2	SRR10917265	2105774	Mok et al. (2021)
epithelial somites HH14	3	SRR10917264	1836757	Mok et al. (2021)
hindlimb HH18	1	SRR10059742	1023402	Young et al. (2019)
hindlimb HH18	2	SRR10059741	795418	Young et al. (2019)
hindlimb HH18	3	SRR10059740	603441	Young et al. (2019)
forelimb HH18	1	SRR10059737	915961	Young et al. (2019)
forelimb HH18	2	SRR10059738	781829	Young et al. (2019)
forelimb HH18	3	SRR10059739	855197	Young et al. (2019)
flank HH18	1	SRR10059734	687477	Young et al. (2019)
flank HH18	2	SRR10059735	661340	Young et al. (2019)
flank HH18	3	SRR10059736	881883	Young et al. (2019)
neural crest HH9	1	SRR10279641	465478	Rothstein and Simoes-Costa (2020)
neural crest HH9	2	SRR10279640	1052795	Rothstein and Simoes-Costa (2020)
neural crest HH6	1	SRR10279639	1188400	Rothstein and Simoes-Costa (2020)
neural crest HH6	2	SRR10279638	1828905	Rothstein and Simoes-Costa (2020)
negative control NC2- E2- HH25	3	SRR8492067	1308863	Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH25	2	SRR8492066	1470813	Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH25	1	SRR8492065	984023	Ling and Sauka-Spengler (2019)
neural crest NC2- HH25	3	SRR8492064	739201	Ling and Sauka-Spengler (2019)
neural crest NC2- HH25	2	SRR8492063	840937	Ling and Sauka-Spengler (2019)
neural crest NC2- HH25	1	SRR8492062	699955	Ling and Sauka-Spengler (2019)
neural crest NC2+ HH25	3	SRR8492061	908522	Ling and Sauka-Spengler (2019)
neural crest NC2+ HH25	2	SRR8492060	991016	Ling and Sauka-Spengler (2019)
neural crest NC2+ HH25	1	SRR8492059	717557	Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH18	3	SRR8492058	943111	Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH18	2	SRR8492057	1419320	Ling and Sauka-Spengler (2019)
negative control NC2- E2- HH18	1	SRR8492056	993025	Ling and Sauka-Spengler (2019)
neural crest NC2- E2+ HH18	3	SRR8492055	1080428	Ling and Sauka-Spengler (2019)
neural crest NC2- E2+ HH18	2	SRR8492054	1325955	Ling and Sauka-Spengler (2019)
neural crest NC2- E2+ HH18	1	SRR8492053	1506587	Ling and Sauka-Spengler (2019)
neural crest NC2+ E2+ HH18	3	SRR8492052	1657227	Ling and Sauka-Spengler (2019)
neural crest NC2+ E2+ HH18	2	SRR8492051	1863603	Ling and Sauka-Spengler (2019)
neural crest NC2+ E2+ HH18	1	SRR8492050	1656390	Ling and Sauka-Spengler (2019)
neural crest E2+ HH12	3	SRR8492049	591477	Ling and Sauka-Spengler (2019)
neural crest E2+ HH12	2	SRR8492048	746994	Ling and Sauka-Spengler (2019)
neural crest E2+ HH12	1	SRR8492047	685209	Ling and Sauka-Spengler (2019)
negative control NC2- HH12	3	SRR8492046	566067	Ling and Sauka-Spengler (2019)
negative control NC2- HH12	2	SRR8492045	640875	Ling and Sauka-Spengler (2019)
negative control NC2- HH12	1	SRR8492044	692517	Ling and Sauka-Spengler (2019)
neural crest NC2+ HH12	3	SRR8492043	1337110	Ling and Sauka-Spengler (2019)
neural crest NC2+ HH12	2	SRR8492042	654113	Ling and Sauka-Spengler (2019)
neural crest NC2+ HH12	1	SRR8492041	1474878	Ling and Sauka-Spengler (2019)
non-neural crest 8-10ss	1	SRR8062752	327185	Williams et al. (2019)
non-neural crest 8-10ss	2	SRR8062751	713343	Williams et al. (2019)
non-neural crest 8-10ss	3	SRR8062750	544369	Williams et al. (2019)
non-neural crest 8-10ss	4	SRR8062749	526987	Williams et al. (2019)
non-neural crest 5-6ss	1	SRR8062748	406114	Williams et al. (2019)
non-neural crest 5-6ss	2	SRR8062747	536168	Williams et al. (2019)
non-neural crest 5-6ss	3	SRR8062746	490289	Williams et al. (2019)
non-neural crest 5-6ss	4	SRR8062745	769000	Williams et al. (2019)
neural crest 8-10ss FOXD3	1	SRR8062744	643461	Williams et al. (2019)
neural crest 8-10ss FOXD3	2	SRR8062743	510546	Williams et al. (2019)
neural crest 8-10ss FOXD3	3	SRR8062742	505559	Williams et al. (2019)
epiblast HH4	1	SRR8062741	347257	Williams et al. (2019)
epiblast HH4	2	SRR8062740	355175	Williams et al. (2019)
metatarsal skin e12 retinoic acid	1	SRR6782021	889266	Lai et al. (2018)
metatarsal skin e12 retinoic acid	2	SRR6782020	911850	Lai et al. (2018)
metatarsal skin e12 RCAS-GFP	1	SRR6782019	1672701	Lai et al. (2018)
metatarsal skin e12 RCAS-GFP	2	SRR6782018	1622741	Lai et al. (2018)
metatarsal skin e12 RCAS-CTNNB1	1	SRR6782017	514530	Lai et al. (2018)
metatarsal skin e12 RCAS-CTNNB1	2	SRR6782016	332392	Lai et al. (2018)
keel e10	1	SRR6713569	799482	Sackton et al. (2019)
keel e10	2	SRR6713570	815108	Sackton et al. (2019)
keel e10	3	SRR6713592	705974	Sackton et al. (2019)
hindlimb e4.5	1	SRR6713571	1057708	Sackton et al. (2019)
hindlimb e4.5	2	SRR6713572	816446	Sackton et al. (2019)
hindlimb e4.5	3	SRR6713577	992959	Sackton et al. (2019)
full sternum e10	1	SRR6713573	1243533	Sackton et al. (2019)
full sternum e10	2	SRR6713574	847942	Sackton et al. (2019)
full sternum e10	3	SRR6713580	1091860	Sackton et al. (2019)
inferior sternum e9	1	SRR6713575	1128212	Sackton et al. (2019)
inferior sternum e9	2	SRR6713576	1171526	Sackton et al. (2019)
inferior sternum e9	3	SRR6713578	1127918	Sackton et al. (2019)
forelimb e4.5	1	SRR6713579	883046	Sackton et al. (2019)
forelimb e4.5	2	SRR6713585	351224	Sackton et al. (2019)
forelimb e4.5	3	SRR6713586	717558	Sackton et al. (2019)
flight muscle e10	1	SRR6713581	697283	Sackton et al. (2019)
flight muscle e10	2	SRR6713583	641133	Sackton et al. (2019)
flight muscle e10	3	SRR6713584	513558	Sackton et al. (2019)
flight muscle e9	1	SRR6713582	586575	Sackton et al. (2019)
flight muscle e9	2	SRR6713587	560014	Sackton et al. (2019)
flight muscle e9	3	SRR6713588	403399	Sackton et al. (2019)
superior sternum e9	1	SRR6713589	815243	Sackton et al. (2019)
superior sternum e9	2	SRR6713590	909182	Sackton et al. (2019)
superior sternum e9	3	SRR6713591	1292178	Sackton et al. (2019)
~~FNP cells HH24~~	1	SRR6418911	0	Laugsch et al. (2018)
~~cone cell e5+~~	1	SRR11916722	0	Lonfat et al. (2021)
~~cone cell e5+~~	2	SRR11916723	0	Lonfat et al. (2021)
~~cone cell e5+~~	3	SRR11916724	0	Lonfat et al. (2021)
~~epiblast HH3~~	1	SRR12965854	118497	Crispatzu et al. (2021)
~~maturing somites HH14~~	1	SRR10917280	0	Mok et al. (2021)
~~maturing somites HH14~~	2	SRR10917281	0	Mok et al. (2021)
~~maturing somites HH14~~	3	SRR10917282	0	Mok et al. (2021)
~~differentiated somites HH14~~	1	SRR10917276	0	Mok et al. (2021)
~~differentiated somites HH14~~	2	SRR10917275	0	Mok et al. (2021)
~~differentiated somites HH14~~	3	SRR10917273	0	Mok et al. (2021)
~~somites HH10~~	1	SRR8062739	2405	Williams et al. (2019)
~~neural crest 5-6ss pax7~~	1	SRR8062737	23955	Williams et al. (2019)
~~neural crest 5-6ss FOXD3~~	1	SRR8062736	0	Williams et al. (2019)
~~neural crest 5-6ss FOXD3~~	2	SRR8062735	0	Williams et al. (2019)

2 Methods

2.1 Gather all files

We used sra toolkit to download the files and convert them to fastq files. You can download the toolkit from https://hpc.nih.gov/apps/sratoolkit.html.

#bash

#get SRA files
prefetch \
SRR11916725 SRR11916726 SRR8062740  SRR8062741  SRR10917264 SRR10917265 \
SRR10917266 SRR10059734 SRR10059735 SRR10059736 SRR6713581  SRR6713583 \
SRR6713584  SRR6713582  SRR6713587  SRR6713588  SRR6713579  SRR6713585 \
SRR6713586  SRR10059737 SRR10059738 SRR10059739 SRR6713573  SRR6713574 \
SRR6713580  SRR6713571  SRR6713572  SRR6713577  SRR10059740 SRR10059741 \
SRR10059742 SRR6713575  SRR6713576  SRR6713578  SRR6713569  SRR6713570 \
SRR6713592  SRR11878466 SRR11878467 SRR6782016  SRR6782017  SRR6782018 \
SRR6782019  SRR6782020  SRR6782021  SRR8492056  SRR8492057  SRR8492058 \
SRR8492065  SRR8492066  SRR8492067  SRR8492044  SRR8492045  SRR8492046 \
SRR8062742  SRR8062743  SRR8062744  SRR8492047  SRR8492048  SRR8492049 \
SRR10279638 SRR10279639 SRR10279640 SRR10279641 SRR8492053  SRR8492054 \
SRR8492055  SRR8492062  SRR8492063  SRR8492064  SRR8492050  SRR8492051 \
SRR8492052  SRR8492041  SRR8492042  SRR8492043  SRR8492059  SRR8492060 \
SRR8492061  SRR8062745  SRR8062746  SRR8062747  SRR8062748  SRR8062749 \
SRR8062750  SRR8062751  SRR8062752  SRR10917274 SRR10917285 SRR10917286 \
SRR10533482 SRR10533483 SRR6713589  SRR6713590  SRR6713591  SRR13428037 \
SRR13428038 SRR13428039 SRR13428040 SRR13428041 SRR13428042 -O ./

#convert SRA to fastq
mkdir -p fastq
fasterq-dump \
SRR11916725 SRR11916726 SRR8062740  SRR8062741  SRR10917264 SRR10917265 \
SRR10917266 SRR10059734 SRR10059735 SRR10059736 SRR6713581  SRR6713583 \
SRR6713584  SRR6713582  SRR6713587  SRR6713588  SRR6713579  SRR6713585 \
SRR6713586  SRR10059737 SRR10059738 SRR10059739 SRR6713573  SRR6713574 \
SRR6713580  SRR6713571  SRR6713572  SRR6713577  SRR10059740 SRR10059741 \
SRR10059742 SRR6713575  SRR6713576  SRR6713578  SRR6713569  SRR6713570 \
SRR6713592  SRR11878466 SRR11878467 SRR6782016  SRR6782017  SRR6782018 \
SRR6782019  SRR6782020  SRR6782021  SRR8492056  SRR8492057  SRR8492058 \
SRR8492065  SRR8492066  SRR8492067  SRR8492044  SRR8492045  SRR8492046 \
SRR8062742  SRR8062743  SRR8062744  SRR8492047  SRR8492048  SRR8492049 \
SRR10279638 SRR10279639 SRR10279640 SRR10279641 SRR8492053  SRR8492054 \
SRR8492055  SRR8492062  SRR8492063  SRR8492064  SRR8492050  SRR8492051 \
SRR8492052  SRR8492041  SRR8492042  SRR8492043  SRR8492059  SRR8492060 \
SRR8492061  SRR8062745  SRR8062746  SRR8062747  SRR8062748  SRR8062749 \
SRR8062750  SRR8062751  SRR8062752  SRR10917274 SRR10917285 SRR10917286 \
SRR10533482 SRR10533483 SRR6713589  SRR6713590  SRR6713591  SRR13428037 \
SRR13428038 SRR13428039 SRR13428040 SRR13428041 SRR13428042 -O ./fastq -e 10

2.2 Remove adapters

We use NGmerge to remove adapters. You can get NGmerge at https://github.com/jsh58/NGmerge.

#bash

#We created a file acc_list.txt that lists all the accessions, one per line
mkdir -p fastq_ngmerge
while IFS= read -r line; do
  NGmerge -a -1 ./fastq/${line}_1.fastq -2 ./fastq/${line}_2.fastq -o ./fastq_ngmerge/${line} -v -u 73 -n 6
done < "acc_list.txt"

2.3 Map to galgal7

We used bwa v0.7.17 and samtools v1.9 to map all the files to the galgal7b genome.

#bash
bwa index galgal7b.fna
samtools faidx galgal7b.fna
samtools dict galgal7b.fna -o galgal7b.dict

mkdir -p bams
while IFS= read -r line; do
    bwa mem -t 10 galgal7b.fna ./fastq_ngmerge/${line}_1.fastq ./fastq_ngmerge/${line}_2.fastq | \
    samtools view -b -u | \
    samtools sort --threads 8 -n -o ./bams/${line}.bam
done < "acc_list.txt"

2.4 Call ATAC peaks

We called peaks two ways, one for each tissue type and one for all the files altogether. We used a -e filter to remove mitochondrial sequences.

#bash
mkdir -p beds

Genrich -t ./bams/SRR11916725.bam,./bams/SRR11916726.bam \
-o ./beds/cone_cell_e5_-.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10533482.bam,./bams/SRR10533483.bam \
-o ./beds/retinal_cell_e5-6.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR13428042.bam,./bams/SRR13428041.bam \
-o ./beds/wing_bud_HH24.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR13428040.bam,./bams/SRR13428039.bam \
-o ./beds/wing_bud_HH22.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR13428038.bam,./bams/SRR13428037.bam \
-o ./beds/wing_bud_HH20.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR11878467.bam,./bams/SRR11878466.bam \
-o ./beds/limb_bud_HH24.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10917286.bam,./bams/SRR10917285.bam,./bams/SRR10917274.bam \
-o ./beds/presomitic_mesoderm_HH14.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10917266.bam,./bams/SRR10917265.bam,./bams/SRR10917264.bam \
-o ./beds/epithelial_somites_HH14.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10059742.bam,./bams/SRR10059741.bam,./bams/SRR10059740.bam \
-o ./beds/hindlimb_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10059737.bam,./bams/SRR10059738.bam,./bams/SRR10059739.bam \
-o ./beds/forelimb_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10059734.bam,./bams/SRR10059735.bam,./bams/SRR10059736.bam \
-o ./beds/flank_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10279641.bam,./bams/SRR10279640.bam \
-o ./beds/neural_crest_HH9.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR10279639.bam,./bams/SRR10279638.bam \
-o ./beds/neural_crest_HH6.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492067.bam,./bams/SRR8492066.bam,./bams/SRR8492065.bam \
-o ./beds/negative_control_NC2-_E2-_HH25.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492064.bam,./bams/SRR8492063.bam,./bams/SRR8492062.bam \
-o ./beds/neural_crest_NC2-_HH25.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492061.bam,./bams/SRR8492060.bam,./bams/SRR8492059.bam \
-o ./beds/neural_crest_NC2+_HH25.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492058.bam,./bams/SRR8492057.bam,./bams/SRR8492056.bam \
-o ./beds/negative_control_NC2-_E2-_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492055.bam,./bams/SRR8492054.bam,./bams/SRR8492053.bam \
-o ./beds/neural_crest_NC2-_E2+_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492052.bam,./bams/SRR8492051.bam,./bams/SRR8492050.bam \
-o ./beds/neural_crest_NC2+_E2+_HH18.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492049.bam,./bams/SRR8492048.bam,./bams/SRR8492047.bam \
-o ./beds/neural_crest_E2+_HH12.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492046.bam,./bams/SRR8492045.bam,./bams/SRR8492044.bam \
-o ./beds/negative_control_NC2-_HH12.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8492043.bam,./bams/SRR8492042.bam,./bams/SRR8492041.bam \
-o ./beds/neural_crest_NC2+_HH12_.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8062752.bam,./bams/SRR8062751.bam,./bams/SRR8062750.bam,./bams/SRR8062749.bam \
-o ./beds/non-neural_crest_8-10ss.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR8062741.bam,./bams/SRR8062740.bam \
-o ./beds/epiblast_HH4.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6782021.bam,./bams/SRR6782020.bam \
-o ./beds/metatarsal_skin_e12_retinoic_acid.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6782019.bam,./bams/SRR6782018.bam \
-o ./beds/metatarsal_skin_e12_RCAS-GFP.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6782017.bam,./bams/SRR6782016.bam \
-o ./beds/metatarsal_skin_e12_RCAS-CTNNB1.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713569.bam,./bams/SRR6713570.bam,./bams/SRR6713592.bam \
-o ./beds/keel_e10.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713571.bam,./bams/SRR6713572.bam,./bams/SRR6713577.bam \
-o ./beds/hindlimb_e4.5.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713573.bam,./bams/SRR6713574.bam,./bams/SRR6713580.bam \
-o ./beds/full_sternum_e10.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713575.bam,./bams/SRR6713576.bam,./bams/SRR6713578.bam \
-o ./beds/inferior_sternum_e9.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713579.bam,./bams/SRR6713585.bam,./bams/SRR6713586.bam \
-o ./beds/forelimb_e4.5.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713581.bam,./bams/SRR6713583.bam,./bams/SRR6713584.bam \
-o ./beds/flight_muscle_e10.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713582.bam,./bams/SRR6713587.bam,./bams/SRR6713588.bam \
-o ./beds/flight_muscle_e9.bed -j -e NC_053523.1 -r

Genrich -t ./bams/SRR6713589.bam,./bams/SRR6713590.bam,./bams/SRR6713591.bam \
-o ./beds/superior_sternum_e9.bed -j -e NC_053523.1 -r

#Call all bams at once
Genrich -t \
./bams/SRR10059734.bam,./bams/SRR10059735.bam,./bams/SRR10059736.bam,\
./bams/SRR10059737.bam,./bams/SRR10059738.bam,./bams/SRR10059739.bam,\
./bams/SRR10059740.bam,./bams/SRR10059741.bam,./bams/SRR10059742.bam,\
./bams/SRR10279638.bam,./bams/SRR10279639.bam,./bams/SRR10279640.bam,\
./bams/SRR10279641.bam,./bams/SRR10533482.bam,./bams/SRR10533483.bam,\
./bams/SRR10917264.bam,./bams/SRR10917265.bam,./bams/SRR10917266.bam,\
./bams/SRR10917274.bam,./bams/SRR10917285.bam,./bams/SRR10917286.bam,\
./bams/SRR11878466.bam,./bams/SRR11878467.bam,./bams/SRR11916725.bam,\
./bams/SRR11916726.bam,./bams/SRR13428037.bam,./bams/SRR13428038.bam,\
./bams/SRR13428039.bam,./bams/SRR13428040.bam,./bams/SRR13428041.bam,\
./bams/SRR13428042.bam,./bams/SRR6713569.bam,./bams/SRR6713570.bam,\
./bams/SRR6713571.bam,./bams/SRR6713572.bam,./bams/SRR6713573.bam,\
./bams/SRR6713574.bam,./bams/SRR6713575.bam,./bams/SRR6713576.bam,\
./bams/SRR6713577.bam,./bams/SRR6713578.bam,./bams/SRR6713579.bam,\
./bams/SRR6713580.bam,./bams/SRR6713581.bam,./bams/SRR6713582.bam,\
./bams/SRR6713583.bam,./bams/SRR6713584.bam,./bams/SRR6713585.bam,\
./bams/SRR6713586.bam,./bams/SRR6713587.bam,./bams/SRR6713588.bam,\
./bams/SRR6713589.bam,./bams/SRR6713590.bam,./bams/SRR6713591.bam,\
./bams/SRR6713592.bam,./bams/SRR6782016.bam,./bams/SRR6782017.bam,\
./bams/SRR6782018.bam,./bams/SRR6782019.bam,./bams/SRR6782020.bam,\
./bams/SRR6782021.bam,./bams/SRR8062740.bam,./bams/SRR8062741.bam,\
./bams/SRR8062742.bam,./bams/SRR8062743.bam,./bams/SRR8062744.bam,\
./bams/SRR8062745.bam,./bams/SRR8062746.bam,./bams/SRR8062747.bam,\
./bams/SRR8062748.bam,./bams/SRR8062749.bam,./bams/SRR8062750.bam,\
./bams/SRR8062751.bam,./bams/SRR8062752.bam,./bams/SRR8492041.bam,\
./bams/SRR8492042.bam,./bams/SRR8492043.bam,./bams/SRR8492044.bam,\
./bams/SRR8492045.bam,./bams/SRR8492046.bam,./bams/SRR8492047.bam,\
./bams/SRR8492048.bam,./bams/SRR8492049.bam,./bams/SRR8492050.bam,\
./bams/SRR8492051.bam,./bams/SRR8492052.bam,./bams/SRR8492053.bam,\
./bams/SRR8492054.bam,./bams/SRR8492055.bam,./bams/SRR8492056.bam,\
./bams/SRR8492057.bam,./bams/SRR8492058.bam,./bams/SRR8492059.bam,\
./bams/SRR8492060.bam,./bams/SRR8492061.bam,./bams/SRR8492062.bam,\
./bams/SRR8492063.bam,./bams/SRR8492064.bam,./bams/SRR8492065.bam,\
./bams/SRR8492066.bam,./bams/SRR8492067.bam \
-o ./beds/all_qc.bed -j -e NC_053523.1 -r

2.5 Generate combined dataset

To generate a combined dataset, we used the Genrich call file using all files at once and added all the peaks that were unique to each of the specific tissue type.

#bash
bedtools intersect -v -b beds/all_qc.bed -a beds/cone_cell_e5_-.bed > \
beds_intersect/cone_cell_e5_-_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/retinal_cell_e5-6.bed > \
beds_intersect/retinal_cell_e5-6_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/wing_bud_HH24.bed > \
beds_intersect/wing_bud_HH24_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/wing_bud_HH22.bed > \
beds_intersect/wing_bud_HH22_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/wing_bud_HH20.bed > \
beds_intersect/wing_bud_HH20_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/limb_bud_HH24.bed > \
beds_intersect/limb_bud_HH24_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/presomitic_mesoderm_HH14.bed > \
beds_intersect/presomitic_mesoderm_HH14_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/epithelial_somites_HH14.bed > \
beds_intersect/epithelial_somites_HH14_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/hindlimb_HH18.bed > \
beds_intersect/hindlimb_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/forelimb_HH18.bed > \
beds_intersect/forelimb_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/flank_HH18.bed > \
beds_intersect/flank_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_HH9.bed > \
beds_intersect/neural_crest_HH9_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_HH6.bed > \
beds_intersect/neural_crest_HH6_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/negative_control_NC2-_E2-_HH25.bed > \
beds_intersect/negative_control_NC2-_E2-_HH25_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2-_HH25.bed > \
beds_intersect/neural_crest_NC2-_HH25_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2+_HH25.bed > \
beds_intersect/neural_crest_NC2+_HH25_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/negative_control_NC2-_E2-_HH18.bed > \
beds_intersect/negative_control_NC2-_E2-_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2-_E2+_HH18.bed > \
beds_intersect/neural_crest_NC2-_E2+_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2+_E2+_HH18.bed > \
beds_intersect/neural_crest_NC2+_E2+_HH18_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_E2+_HH12.bed > \
beds_intersect/neural_crest_E2+_HH12_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/negative_control_NC2-_HH12.bed > \
beds_intersect/negative_control_NC2-_HH12_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/neural_crest_NC2+_HH12_.bed > \
beds_intersect/neural_crest_NC2+_HH12__intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/non-neural_crest_8-10ss.bed > \
beds_intersect/non-neural_crest_8-10ss_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/non-neural_crest_5-6ss.bed > \
beds_intersect/non-neural_crest_5-6ss_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/epiblast_HH4.bed > \
beds_intersect/epiblast_HH4_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/metatarsal_skin_e12_retinoic_acid.bed > \
beds_intersect/metatarsal_skin_e12_retinoic_acid_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/metatarsal_skin_e12_RCAS-GFP.bed > \
beds_intersect/metatarsal_skin_e12_RCAS-GFP_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/metatarsal_skin_e12_RCAS-bTNNB1.bed > beds_intersect/metatarsal_skin_e12_RCAS-bTNNB1_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/keel_e10.bed > \
beds_intersect/keel_e10_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/hindlimb_e4.5.bed > \
beds_intersect/hindlimb_e4.5_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/full_sternum_e10.bed > \
beds_intersect/full_sternum_e10_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/inferior_sternum_e9.bed > \
beds_intersect/inferior_sternum_e9_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/forelimb_e4.5.bed > \
beds_intersect/forelimb_e4.5_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/flight_muscle_e10.bed > \
beds_intersect/flight_muscle_e10_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/flight_muscle_e9.bed > \
beds_intersect/flight_muscle_e9_intersect.bed
bedtools intersect -v -b beds/all_qc.bed -a beds/superior_sternum_e9.bed > \
beds_intersect/superior_sternum_e9_intersect.bed

#cat all the uniques into one and then add to main file
cat beds_intersect/*.bed | bedtools sort -i "stdin" | \
bedtools merge -i "stdin" > beds/all_uniques.bed

cat beds/all_qc.bed beds/all_uniques.bed | bedtools sort -i "stdin" > atac_all.bed

2.6 Annotate atac file

To generate annotation we used bedtools annotate.

#bash

bedtools annotate -counts -i atac_all.bed -files \
beds/cone_cell_e5_-.bed beds/epiblast_HH4.bed beds/epithelial_somites_HH14.bed \
beds/flank_HH18.bed beds/flight_muscle_e10.bed beds/flight_muscle_e9.bed \
beds/forelimb_e4.5.bed beds/forelimb_HH18.bed beds/full_sternum_e10.bed \
beds/hindlimb_e4.5.bed beds/hindlimb_HH18.bed beds/inferior_sternum_e9.bed \
beds/keel_e10.bed beds/limb_bud_HH24.bed beds/metatarsal_skin_e12_RCAS-CTNNB1.bed \
beds/metatarsal_skin_e12_RCAS-GFP.bed beds/metatarsal_skin_e12_retinoic_acid.bed \
beds/negative_control_NC2-_E2-_HH18.bed beds/negative_control_NC2-_E2-_HH25.bed \
beds/negative_control_NC2-_HH12.bed beds/neural_crest_E2+_HH12.bed \
beds/neural_crest_HH6.bed beds/neural_crest_HH9.bed beds/neural_crest_NC2-_E2+_HH18.bed \
beds/neural_crest_NC2-_HH25.bed beds/neural_crest_NC2+_E2+_HH18.bed \
beds/neural_crest_NC2+_HH12_.bed beds/neural_crest_NC2+_HH25.bed \
beds/non-neural_crest_5-6ss.bed beds/non-neural_crest_8-10ss.bed \
beds/presomitic_mesoderm_HH14.bed beds/retinal_cell_e5-6.bed \
beds/superior_sternum_e9.bed beds/wing_bud_HH20.bed beds/wing_bud_HH22.bed \
beds/wing_bud_HH24.bed > atac_all_annot.txt

#Clean up in R to include header line

2.7 Stats of annotated ATAC

To generate some statistics for annotated ATACseq file.

library(data.table)
library(ggplot2)
library(gridExtra)

atac_data <- fread("atac_all_annot.txt")

atac_data$length <- atac_data$Stop - atac_data$Start
atac_data$length2 <- atac_data$length
atac_data$length2[which(atac_data$length >= 2000)] <- 2000

#Drop elements not called in individual dataset
atac_data$tissues <- rowSums(atac_data[,5:40])
atac_data <- subset(atac_data, tissues > 0)

g1 <- ggplot(atac_data, aes(x = atac_data$length2)) +
  geom_histogram(bins = 50) +
  theme_classic() +
  geom_vline(xintercept = mean(atac_data$length), col = "red", linetype = "dashed") +
  labs(title="Distribution of length of ATACseq elements",x="Length of Elements (bp)", y = "Count")+
  scale_x_continuous(limits = c(0,2000), expand = c(0, 0)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE), expand = expansion(mult = c(0, .1)))

g2 <- ggplot(atac_data, aes(x = atac_data$tissues)) +
  geom_histogram(bins = 36) +
  theme_classic() +
  labs(x="Shared ATACseq elements", y = "Count")+
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE), expand = expansion(mult = c(0, .1)))

gs <- arrangeGrob(g1, g2, ncol = 2)
grid.arrange(gs)

2.7.0.1 Relationship of loci

We used the annotations to create a presence (1)/absence (0) matrix and use IQTree2 to analyze the matrix in a tree-building framework. The matrix was subsequently formatted into a phylip like format for analysis with IQTree.

library(pheatmap)
atac_data_cor <- atac_data[,5:40]
#corr_map <- cor(atac_data_cor)
corr_map <- read.table("ATAC_corr_map.txt", header = T, row.names = 1, sep = "\t")
pheatmap(log(corr_map,base = 10))

To run the tree:

#bash
iqtree2-mpi -s all_tissues.phylip.varsites.phy --seqtype MORPH -m MK+ASC -T 12

Heatmap of elements

3 Predict ATAC-seq in non chicken

We hope to use these open chromatin regions for genomes on birds other than chicken. It is not well known whether a region that is open in a chicken is open in other avian species. An approach called TACIT was developed by Kaplow et al. (2022) that uses machine learning to predict open chromatin regions in different species using a training dataset. We adopted this approach to predict open chromatin regions in different birds species using data from chicken and emu.

3.1 The code

The code is a python code modified from Kaplow et al. (2022) that takes a config file as input and train a neural network on ATACseq peaks get code here (machinations.py).

3.2 Preparing the datasets

Here we use ATACseq calls for Hindlimbs at HH18 stage of chicken (Gallus gallus) and emu (Dromaius novaehollandiae) and use calls for tinamou (Eudromia elegans) to validate the dataset. First, we need to create a dataset of positive (ATAC peak open) and negative (ATAC peak closed). For this we compared elements between chicken and emu and characterized any element that overlapped with one another and present in both datasets as positive (open). For any elements that were open in one species and absent in the other species we binned the open elements in the positive category and the homologous sequence of the absent element in the negative (closed) category. Then we divided this into a training, validation, and evaluation datasets (binned based on scaffold location). These files were then used to train the model.

These steps include jumping from R to bash and back. Please make sure you run the code in the appropriate framework.

First we need to create a bed file for the two species that include sequences of all the same size. The python code can accommodate different sizes but the missing data gets accounted as information for the neural network and the resulting assignments are segregated by size of element (without Ns) rather than the patterns within it.

#R
#getting pos and neg datasets

###inputs; i use a and b to designate two datasets. Replace a and b with respective bed files 
#to run for any group without having to change any other info
bed_a <- "emu_hindlimb_HH18.bed"
bed_a_id <- "emu"
bed_b <- "galgal_hindlimb_HH18.bed"
bed_b_id <- "gal"

###outputs
working_folder <- "./"

###We need to make sure all elements are the same length. Here we chose 500 bp
ele_size <- 500

#A function to parse bed files from Genrich to output unique elements
#that are of the size we need
process_bed <- function(bed_file, id, size){
  bed <- read.table(bed_file, sep="\t", header = F, 
                    col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Score", 
                                  "Strand", "SignalValue", "p", "q", "Summit"))
  bed$Peak_ID <- paste(id, bed$Peak_ID, sep="_")
  bed$Length <- bed$Stop - bed$Start
  bed$Summit_Base <- bed$Start + bed$Summit
  bed$Summit_Start <- bed$Summit_Base - size/2
  bed$Summit_Stop <- bed$Summit_Base + size/2 - 1
  bed <- subset(bed, Summit_Start >= 0)
  return(bed)
}

#Get bed files for the two species
bed_a.data <- process_bed(bed_a, bed_a_id, ele_size)
bed_b.data <- process_bed(bed_b, bed_b_id, ele_size)

#Write the output as a bed file
write.table(bed_a.data[,c("Scaffold", "Summit_Base", "Summit_Base", "Peak_ID")], 
            paste(working_folder, "bed_a_summit.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep="\t")
write.table(bed_b.data[,c("Scaffold", "Summit_Base", "Summit_Base", "Peak_ID")], 
            paste(working_folder, "bed_b_summit.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep="\t")

Next we will clean up the bed files a bit

#bash

bedtools merge -o count,distinct -d 500 -i bed_a_summit.bed > bed_a_unique_summit.bed
bedtools merge -o count,distinct -d 500 -i bed_b_summit.bed > bed_b_unique_summit.bed

This step removes closeby peaks that overlapped when we extended a peak size to 500 bp.

#R
bed_a.uniq <- read.table(paste(working_folder, "bed_a_unique_summit.bed", sep=""), sep="\t", header = F, 
                         col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Count", "Peak_ID"))
bed_b.uniq <- read.table(paste(working_folder, "bed_b_unique_summit.bed", sep=""), sep="\t", header = F,
                         col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Count", "Peak_ID"))

#if peaks overlapped, only choose the first peak among the mix
bed_a.uniq$Peak_ID <- gsub(",.*", "", bed_a.uniq$Peak_ID)
bed_b.uniq$Peak_ID <- gsub(",.*", "", bed_b.uniq$Peak_ID)

write.table(bed_a.data[bed_a.data$Peak_ID %in% bed_a.uniq$Peak_ID, 
            c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")],
            paste(working_folder, "bed_a_peaks_500.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(bed_b.data[bed_b.data$Peak_ID %in% bed_b.uniq$Peak_ID, 
            c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")],
            paste(working_folder, "bed_b_peaks_500.bed", sep=""), 
            quote = F, row.names = F, col.names = F, sep = "\t")

To compare the two datasets, we need it to be in the same genetic coordinate system. For this we will use the hal alignment to liftover the emu sequences into galgal7b and vice versa.

#bash

#using hal to convert each to the other species coordinates and then use sort and merge to get merged peaks
halLiftover all_birds_new.hal Gallus_gallus7 bed_b_peaks_500.bed Dromaius_novaehollandiae bed_b2a_peaks_500.bed
halLiftover all_birds_new.hal Dromaius_novaehollandiae bed_a_peaks_500.bed Gallus_gallus7 bed_a2b_peaks_500.bed

#clean up the liftover files, we'll consider anything 100 bp nearby as being a convergent peak
bedtools sort -i bed_b2a_peaks_500.bed | bedtools merge -o distinct -d 100 -i stdin > bed_b2a_peaks_merged.bed
bedtools sort -i bed_a2b_peaks_500.bed | bedtools merge -o distinct -d 100 -i stdin > bed_a2b_peaks_merged.bed

#get intersects
bedtools intersect -a bed_a_peaks_500.bed -b bed_b2a_peaks_merged.bed -wa -wb > bed_a_intersect.bed
bedtools intersect -a bed_b_peaks_500.bed -b bed_a2b_peaks_merged.bed -wa -wb > bed_b_intersect.bed

More cleanup in R to make all the sequences in the lifted over files again 500 bp

#R

bed_a2b_merged <- read.table(paste(working_folder, "bed_a2b_peaks_merged.bed", sep=""), sep="\t", header = F,
                             col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID"))
bed_b2a_merged <- read.table(paste(working_folder, "bed_b2a_peaks_merged.bed", sep=""), sep="\t", header = F,
                             col.names = c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID"))

#remove peaks that span big gaps, or merge two original peaks
bed_a2b_merged <- bed_a2b_merged[!(duplicated(bed_a2b_merged$Peak_ID)|duplicated(bed_a2b_merged$Peak_ID, fromLast=TRUE)),]
bed_b2a_merged <- bed_b2a_merged[!(duplicated(bed_b2a_merged$Peak_ID)|duplicated(bed_b2a_merged$Peak_ID, fromLast=TRUE)),]
bed_a2b_merged <- bed_a2b_merged[!grepl(",", bed_a2b_merged$Peak_ID),]
bed_b2a_merged <- bed_b2a_merged[!grepl(",", bed_b2a_merged$Peak_ID),]

#get length of converted elements and keep elements that are between 400 and 600 bps
bed_a2b_merged$Length <- bed_a2b_merged$Summit_Stop - bed_a2b_merged$Summit_Start
bed_b2a_merged$Length <- bed_b2a_merged$Summit_Stop - bed_b2a_merged$Summit_Start
bed_a2b_merged <- bed_a2b_merged[which(bed_a2b_merged$Length > 400 & 
                                         bed_a2b_merged$Length < 600),]
bed_b2a_merged <- bed_b2a_merged[which(bed_b2a_merged$Length > 400 & 
                                         bed_b2a_merged$Length < 600),]

#readjust lengths of elements to equal 500 bps, remove peaks that are less than 500 bp
bed_a2b_merged$Chop_Left <- ceiling((bed_a2b_merged$Length - 500)/2)+bed_a2b_merged$Summit_Start
bed_a2b_merged$Chop_Right <- bed_a2b_merged$Summit_Stop - floor((bed_a2b_merged$Length - 500)/2) - 1
bed_b2a_merged$Chop_Left <- ceiling((bed_b2a_merged$Length - 500)/2)+bed_b2a_merged$Summit_Start
bed_b2a_merged$Chop_Right <- bed_b2a_merged$Summit_Stop - floor((bed_b2a_merged$Length - 500)/2) - 1
bed_a2b_merged <- subset(bed_a2b_merged, Chop_Left >= 0)
bed_b2a_merged <- subset(bed_b2a_merged, Chop_Left >= 0)

#intersects
bed_a_intersects <- read.table(paste(working_folder, "bed_a_intersect.bed", sep=""), sep="\t", header = F)
bed_b_intersects <- read.table(paste(working_folder, "bed_b_intersect.bed", sep=""), sep="\t", header = F)

#get files in order and get samtools formatted output to get fasta files
#some fasta files will be truncated because we do not account for the size of scaffolds
positives_a <- bed_a.data[bed_a.data$Peak_ID %in% bed_a.uniq$Peak_ID, 
                          c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")]
positives_b <- bed_b.data[bed_b.data$Peak_ID %in% bed_b.uniq$Peak_ID, 
                          c("Scaffold", "Summit_Start", "Summit_Stop", "Peak_ID")]
positives_a$Samtool <- paste(positives_a$Scaffold, ":", positives_a$Summit_Start, "-", positives_a$Summit_Stop, sep="")
positives_b$Samtool <- paste(positives_b$Scaffold, ":", positives_b$Summit_Start, "-", positives_b$Summit_Stop, sep="")

#for negatives we want all the things that did not intersect
negatives_a_id <- setdiff(bed_a.data$Peak_ID, bed_a_intersects$V4)
negatives_b_id <- setdiff(bed_b.data$Peak_ID, bed_b_intersects$V4)

negatives_a <- bed_a2b_merged[bed_a2b_merged$Peak_ID %in% negatives_a_id, 
                          c("Scaffold", "Chop_Left", "Chop_Right", "Peak_ID")]
negatives_b <- bed_b2a_merged[bed_b2a_merged$Peak_ID %in% negatives_b_id, 
                          c("Scaffold", "Chop_Left", "Chop_Right", "Peak_ID")]
negatives_a$Samtool <- paste(negatives_a$Scaffold, ":", negatives_a$Chop_Left, "-", negatives_a$Chop_Right, sep="")
negatives_b$Samtool <- paste(negatives_b$Scaffold, ":", negatives_b$Chop_Left, "-", negatives_b$Chop_Right, sep="")

#write these positives and negatives bed file
write.table(positives_a, paste(working_folder, "positives_a.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(positives_b, paste(working_folder, "positives_b.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(negatives_a, paste(working_folder, "negatives_a.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")
write.table(negatives_b, paste(working_folder, "negatives_b.bed", sep=""),
            quote = F, row.names = F, col.names = F, sep = "\t")

Since ATAC peaks within in coding loci might have different patterns than those in intragenic regions, we will filter out all the peaks that overlap a coding loci. We will use a gff file containing only coding cds for each species.

#bash

bedtools intersect -a positives_a.bed -b dronov_cds.bed -v > positives_a_nocds.bed
bedtools intersect -a positives_b.bed -b galgal_cds.bed -v > positives_b_nocds.bed
bedtools intersect -a negatives_a.bed -b dronov_cds.bed -v > negatives_a_nocds.bed
bedtools intersect -a negatives_b.bed -b galgal_cds.bed -v > negatives_b_nocds.bed

Filter out these regions that overlap a coding region

#R

positives_a_cds <- read.table(paste(working_folder, "positives_a_nocds.bed", sep=""), header = F, 
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))
positives_b_cds <- read.table(paste(working_folder, "positives_b_nocds.bed", sep=""), header = F,
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))
negatives_a_cds <- read.table(paste(working_folder, "negatives_a_nocds.bed", sep=""), header = F,
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))
negatives_b_cds <- read.table(paste(working_folder, "negatives_b_nocds.bed", sep=""), header = F,
                              col.names = c("Scaffold", "Start", "Stop", "Peak_ID", "Samtools"))

#At the same time I will also divvy up the data into training, validation, and evaluation datsets. For this I will use the samtools fai index file to get the size of each scaffold for that species, sort it by size and distribute it into training, validation, and evalutaion datasets.

chrom_sizes_a <- read.table("Dromaius_novaehollandiae_droNov1.fna.fai", sep="\t", header = F, stringsAsFactors = F)
chrom_sizes_a <- chrom_sizes_a[,c(1,2)]
chrom_sizes_a <- chrom_sizes_a[order(chrom_sizes_a$V2, decreasing = T),]
chrom_sizes_a$order <- seq(1,nrow(chrom_sizes_a))

#First two will go for training
positives_a_train <- subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(1,60,4)]])
positives_a_train <- rbind(positives_a_train, subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(2,60,4)]]))

#Third will go for validation
positives_a_valid <- subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(3,60,4)]])

#Fourth will go for evaluation
positives_a_eval <- subset(positives_a_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(4,60,4)]])

#Same for the negative datasets
negatives_b_train <- subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(1,60,4)]])
negatives_b_train <- rbind(negatives_b_train, subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(2,60,4)]]))
negatives_b_valid <- subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(3,60,4)]])
negatives_b_eval <- subset(negatives_b_cds, Scaffold %in% chrom_sizes_a$V1[chrom_sizes_a$order[seq(4,60,4)]])

#Write the output in samtools format
write.table(positives_a_train$Samtools, paste(working_folder, "positives_a_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_a_valid$Samtools, paste(working_folder, "positives_a_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_a_eval$Samtools, paste(working_folder, "positives_a_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

write.table(negatives_b_train$Samtools, paste(working_folder, "negatives_b_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_b_valid$Samtools, paste(working_folder, "negatives_b_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_b_eval$Samtools, paste(working_folder, "negatives_b_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

#Repeat for the second species
chrom_sizes_b <- read.table("galgal7b.fna.fai", sep="\t", header = F, stringsAsFactors = F)
chrom_sizes_b <- chrom_sizes_b[,c(1,2)]
chrom_sizes_b <- chrom_sizes_b[order(chrom_sizes_b$V2, decreasing = T),]
chrom_sizes_b$order <- seq(1,nrow(chrom_sizes_b))

positives_b_train <- subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(1,60,4)]])
positives_b_train <- rbind(positives_b_train, subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(2,60,4)]]))
positives_b_valid <- subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(3,60,4)]])
positives_b_eval <- subset(positives_b_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(4,60,4)]])

negatives_a_train <- subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(1,60,4)]])
negatives_a_train <- rbind(negatives_a_train, subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(2,60,4)]]))
negatives_a_valid <- subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(3,60,4)]])
negatives_a_eval <- subset(negatives_a_cds, Scaffold %in% chrom_sizes_b$V1[chrom_sizes_b$order[seq(4,60,4)]])

write.table(positives_b_train$Samtools, paste(working_folder, "positives_b_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_b_valid$Samtools, paste(working_folder, "positives_b_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(positives_b_eval$Samtools, paste(working_folder, "positives_b_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

write.table(negatives_a_train$Samtools, paste(working_folder, "negatives_a_train.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_a_valid$Samtools, paste(working_folder, "negatives_a_valid.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)
write.table(negatives_a_eval$Samtools, paste(working_folder, "negatives_a_eval.txt", sep=""), row.names = F, col.names = F, sep="\t", quote = F)

Get the fasta files for each grouping and then combine the different groups for the two species into one. We can finally use this for in the machinations.py file.

#bash

samtools faidx Dromaius_novaehollandiae_droNov1.fna -r positives_a_train.txt > positives_a_train.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r positives_a_valid.txt > positives_a_valid.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r positives_a_eval.txt > positives_a_eval.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r negatives_b_train.txt > negatives_b_train.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r negatives_b_valid.txt > negatives_b_valid.fa
samtools faidx Dromaius_novaehollandiae_droNov1.fna -r negatives_b_eval.txt > negatives_b_eval.fa

samtools faidx galgal7b.fna -r positives_b_train.txt > positives_b_train.fa
samtools faidx galgal7b.fna -r positives_b_valid.txt > positives_b_valid.fa
samtools faidx galgal7b.fna -r positives_b_eval.txt > positives_b_eval.fa
samtools faidx galgal7b.fna -r negatives_a_train.txt > negatives_a_train.fa
samtools faidx galgal7b.fna -r negatives_a_valid.txt > negatives_a_valid.fa
samtools faidx galgal7b.fna -r negatives_a_eval.txt > negatives_a_eval.fa

##combine a and b into one
cat positives_a_train.fa positives_b_train.fa > positives_train.fa
cat positives_a_valid.fa positives_b_valid.fa > positives_valid.fa
cat positives_a_eval.fa positives_b_eval.fa > positives_eval.fa

cat negatives_a_train.fa negatives_b_train.fa > negatives_train.fa
cat negatives_a_valid.fa negatives_b_valid.fa > negatives_valid.fa
cat negatives_a_eval.fa negatives_b_eval.fa > negatives_eval.fa

3.3 Run the python script

Use the config file to run the python script. Once the model is made and evaluated you can use the same file to evaluate any other fasta file.

4 Evaluating comparative ATAC-seq results

4.1 Setting up the comparative data

To check for ATAC-seq peaks, we used the center of the accelerated elements sequences to get a 500 bp region around the element and then used that as an evaluation dataset. We did this for all 79 species.

#for atacseq get 500 bp regions from center of CE
final_data_filtered <- fread("final_data_filtered.txt", sep = "\t", header = T)
phyloacc_subset <- fread("accelerated_elements_all.txt", sep = "\t")
atac_ce <- final_data_filtered[which(final_data_filtered$id %in% phyloacc_subset$original.id), 1:4]

atac_ce$center <- floor((atac_ce$start+atac_ce$end)/2)
atac_ce$center2 <- atac_ce$center + 1

fwrite(atac_ce[,c("scaffold", "center", "center2", "id")], "ce_accl_center_15k.bed", quote = F, sep = "\t", row.names = F, col.names = F)

Liftover the center region to all other species, add 250 bp to the front and back and then get sequences for each species.

#bash

species=("Gavialis_gangeticus" "Struthio_camelus" "Rhea_americana" "Eudromia_elegans" "Dromaius_novaehollandiae" "Anas_platyrhynchos" "Gallus_gallus7" "Phoenicopterus_ruber" "Columba_livia" "Tauraco_erythrolophus" "Cuculus_canorus" "Caprimulgus_europaeus" "Apus_apus" "Calypte_anna" "Porphyrio_hochstetteri" "Pluvialis_apricaria" "Sterna_hirundo" "Aptenodytes_patagonicus" "Aptenodytes_forsteri" "Pygoscelis_adeliae" "Pygoscelis_papua" "Pygoscelis_antarcticus" "Megadyptes_antipodes" "Eudyptula_novaehollandiae" "Eudyptula_albosignata" "Eudyptula_minor" "Spheniscus_demersus" "Spheniscus_humboldti" "Spheniscus_magellanicus" "Eudyptes_pachyrhynchus" "Eudyptes_sclateri" "Eudyptes_chrysolophus" "Eudyptes_schlegeli" "Eudyptes_chrysocome" "Eudyptes_filholi" "Eudyptes_moseleyi" "Macronectes_giganteus" "Theristicus_caerulescens" "Aquila_chrysaetos" "Athene_cunicularia" "Trogon_surrucura" "Bucorvus_abyssinicus" "Halcyon_senegalensis" "Chloroceryle_aenea" "Actenoides_hombroni" "Merops_nubicus" "Pogoniulus_pusillus" "Colaptes_auratus" "Falco_peregrinus" "Melopsittacus_undulatus" "Acanthisitta_chloris" "Myiozetetes_cayanensis" "Chiroxiphia_lanceolata" "Malurus_cyaneus" "Lichenostomus_cassidix" "Lanius_collurio" "Corvus_cornix" "Eremophila_alpestris" "Parus_major" "Acrocephalus_scirpaceus" "Progne_subis" "Riparia_riparia" "Hirundo_rustica" "Brachypodius_melanocephalos" "Pycnonotus_jocosus" "Phylloscopus_trochilus" "Sylvia_atricapilla" "Certhia_americana" "Sturnus_vulgaris" "Ficedula_albicollis" "Taeniopygia_guttata" "Passer_domesticus" "Motacilla_alba" "Serinus_canaria" "Emberiza_elegans" "Molothrus_ater" "Setophaga_coronata" "Diglossa_brunneiventris")

for i in ${species[@]};
do
      halLiftover all_birds_new.hal Gallus_gallus7 ce_accl_center_15k.bed $i regions_new/ce_accl_center_${i}.bed
done

# clean the bed files
for file in regions_new/*.bed
do
        file="$(basename file .bed)"
        bedtools merge -d 20 -c 4 -o "distinct" $file |
        awk 'BEGIN{ FS="\t"; OFS=FS }{ if(($3 -$2) <= 500)) print $1,$2,$3,$4 }'
done

#Add 250bp to either side of the center
while IFS= read -r line
do
        awk '{ if ($2-250>0) print $1":"$2-250"-"$3+248 }' ./regions_new/ce_accl_center_${line}.bed > ./regions/ce_accl_center_${line}.txt
done < "all_species.txt"

#Get fasta sequences for each species, line1 is the species name and line2 is the genome location
while read -r line1 line2
do
        samtools faidx $line2 -r regions/ce_accl_center_${line1}.txt > fastas/ce_accl_center_${line1}.fa
done < "new_bird_aln.txt"

#Concatenate the sequences
cat fastas/*.fa > all_fastas_15k.fa

#Use the fasta as an evaluation dataset in the python script above

4.2 Parsing the results

#R
#parse atac probs for elements
atac_probs <- fread("hindlimb_HH18_epoch20_lay450_batch64_lr1e7_eval_all_atac_prob.txt", sep = "\t")

species=c("Gavialis_gangeticus", "Struthio_camelus", "Rhea_americana", "Eudromia_elegans", 
          "Dromaius_novaehollandiae", "Anas_platyrhynchos", "Gallus_gallus7", "Phoenicopterus_ruber",
          "Columba_livia", "Tauraco_erythrolophus", "Cuculus_canorus", "Caprimulgus_europaeus", 
          "Apus_apus", "Calypte_anna", "Porphyrio_hochstetteri", "Pluvialis_apricaria", 
          "Sterna_hirundo", "Aptenodytes_patagonicus", "Aptenodytes_forsteri", "Pygoscelis_adeliae", 
          "Pygoscelis_papua", "Pygoscelis_antarcticus", "Megadyptes_antipodes", "Eudyptula_novaehollandiae", 
          "Eudyptula_albosignata", "Eudyptula_minor", "Spheniscus_demersus", "Spheniscus_humboldti", 
          "Spheniscus_magellanicus", "Eudyptes_pachyrhynchus", "Eudyptes_sclateri", "Eudyptes_chrysolophus", 
          "Eudyptes_schlegeli", "Eudyptes_chrysocome", "Eudyptes_filholi", "Eudyptes_moseleyi", 
          "Macronectes_giganteus", "Theristicus_caerulescens", "Aquila_chrysaetos", "Athene_cunicularia", 
          "Trogon_surrucura", "Bucorvus_abyssinicus", "Halcyon_senegalensis", "Chloroceryle_aenea", 
          "Actenoides_hombroni", "Merops_nubicus", "Pogoniulus_pusillus", "Colaptes_auratus", 
          "Falco_peregrinus", "Melopsittacus_undulatus", "Acanthisitta_chloris", "Myiozetetes_cayanensis", 
          "Chiroxiphia_lanceolata", "Malurus_cyaneus", "Lichenostomus_cassidix", "Lanius_collurio", 
          "Corvus_cornix", "Eremophila_alpestris", "Parus_major", "Acrocephalus_scirpaceus", 
          "Progne_subis", "Riparia_riparia", "Hirundo_rustica", "Brachypodius_melanocephalos", 
          "Pycnonotus_jocosus", "Phylloscopus_trochilus", "Sylvia_atricapilla", "Certhia_americana", 
          "Sturnus_vulgaris", "Ficedula_albicollis", "Taeniopygia_guttata", "Passer_domesticus", 
          "Motacilla_alba", "Serinus_canaria", "Emberiza_elegans", "Molothrus_ater", 
          "Setophaga_coronata", "Diglossa_brunneiventris")

galgal_elems <- read.table("ce_accl_center_15k.bed", sep = "\t")

for (aspecies in species){
  elem_file <- read.table(paste(aspecies, ".bed", sep = ""), sep = "\t")
  elem_file$samtools <- paste(elem_file$V1, ":", elem_file$V2 - 250, "-", elem_file$V3 + 248, sep="")
  atac_probs_new <- merge(atac_probs, elem_file, by.x = "V1", by.y = "samtools", all = F)
  atac_probs_new <- atac_probs_new[!duplicated(atac_probs_new$V4)]
  colnames(atac_probs_new)[2] <- aspecies
  galgal_elems <- merge(galgal_elems, atac_probs_new[,c(2,6)], by.x = "V4", by.y = "V4", all.x = T, all.y = F)
}


#code to get four categories
#chicken ATAC yes; chicken yes; accl yes + chicken ATAC no; chicken no; accl no
#chicken ATAC yes; chicken yes; accl no + chicken ATAC no; chicken no; accl yes
#chicken ATAC yes; chicken no; accl no + chicken ATAC no; chicken yes; accl yes
#chicken ATAC yes; chicken no; accl yes + chicken ATAC no; chicken yes; accl no

#atac main data
atac_hl <- read.table("ce_accl_center_15k_hl_atac_annot.bed", header = F)
atac_hl$atac <- atac_hl$V5 > 0 
atac_df <- atac_hl[which(atac_hl$V4 %in% galgal_elems$V4), c("V4", "atac")]
colnames(atac_df) <- c("id", "atac")
atac_df <- merge(atac_df, phyloacc_subset[,c(1:3)], by.x = "id", by.y = "original.id")

rate_data <- fread("M2_elem_Z.txt", header = T)
results <- fread("rate_postZ_M2.txt", header = T, sep = "\t")
results <- results[,c(1:6,seq(10,322,4))]

colnames(results) <- gsub("_3", "", colnames(results))

#using 0.5 as the barrier
for (elem in 1:nrow(atac_df)){
  element <- atac_df$id[elem]
  if (galgal_elems[elem, "Gallus_gallus7"] > 0.5){
    atac_df[elem, "atac_pred"] <- TRUE
  }else{
    atac_df[elem, "atac_pred"] <- FALSE
  }
  accl_taxa <- intersect(colnames(rate_data)[rate_data[which(rate_data$`Locus ID` %in% atac_df[elem, "phyloacc.id"]),] == 2],
                         colnames(results)[results[which(results$`Locus ID` %in% atac_df[elem, "phyloacc.id"]),] >= 0.9])
  accl_taxa <- accl_taxa[accl_taxa != "Gallus_gallus4"]
  accl_atac <- table(galgal_elems[elem, accl_taxa] > 0.5)
  if (length(accl_atac)==2){
    atac_df[elem, "accl_atac_pred"] <- unname(accl_atac[2]/(accl_atac[1] + accl_atac[2]))
  }
  else if (length(accl_atac)==0){
    atac_df[elem, "accl_atac_pred"] <- NA
  }
  else if (names(accl_atac)=="TRUE"){
    atac_df[elem, "accl_atac_pred"] <- 1
  }
  else {
    atac_df[elem, "accl_atac_pred"] <- 0
  }
  all_atac <- table(galgal_elems[elem,] > 0.5)
  atac_df[elem, "all_atac_pred"] <- unname(all_atac[2]/(all_atac[1] + all_atac[2]))
}

atac_df$accl_atac_pred_bool <- atac_df$accl_atac_pred > 0.5

table(atac_df[,c("atac_pred", "accl_atac_pred_bool", "atac")])

#where atac shifts
shifted <- atac_df[which(!atac_df$atac & !atac_df$atac_pred & atac_df$accl_atac_pred_bool),]
shifted <- rbind(shifted, atac_df[which(atac_df$atac & atac_df$atac_pred & !atac_df$accl_atac_pred_bool),])

shifted_elems <- subset(final_data[,c(1:4)], id %in% shifted$id)
write.table(shifted_elems, "atac_shift_elems.txt", sep = "\t", col.names = F, row.names = F, quote = F)

#get closeby genes
library(stringr)
shifted_genes <- fread("atac_shift_genes.txt", sep = "\t")
shifted_genes <- subset(shifted_genes, V14 < 10000)

shifted_genes$id <- str_extract(shifted_genes$V13, "Name=[A-Za-z0-9-_.]*")
shifted_genes$id <- gsub("Name=", "", shifted_genes$id)

References

Crispatzu, Giuliano, Rizwan Rehimi, Tomas Pachano, Tore Bleckwehl, Sara Cruz-Molina, Cally Xiao, Esther Mahabir, Hisham Bazzi, and Alvaro Rada-Iglesias. 2021. “The Chromatin, Topological and Regulatory Properties of Pluripotency-Associated Poised Enhancers Are Conserved in Vivo.” Nature Communications 12 (1): 1–17.

Jhanwar, Shalu, Jonas Malkmus, Jens Stolte, Olga Romashkina, Aimée Zuniga, and Rolf Zeller. 2021. “Conserved and Species-Specific Chromatin Remodeling and Regulatory Dynamics During Mouse and Chicken Limb Bud Development.” Nature Communications 12 (1): 1–17.

Kaplow, Irene M, Alyssa J Lawler, Daniel E Schaffer, Chaitanya Srinivasan, Morgan E Wirthlin, BaDoi N Phan, Xiaomeng Zhang, et al. 2022. “Relating Enhancer Genetic Variation Across Mammals to Complex Phenotypes Using Machine Learning.” bioRxiv.

Lai, Yung-Chih, Ya-Chen Liang, Ting-Xin Jiang, Randall B Widelitz, Ping Wu, and Cheng-Ming Chuong. 2018. “Transcriptome Analyses of Reprogrammed Feather/Scale Chimeric Explants Revealed Co-Expressed Epithelial Gene Networks During Organ Specification.” BMC Genomics 19 (1): 1–12.

Laugsch, Magdalena, Michaela Bartusel, Hafiza Alirzayeva, Agathi Karaolidou, Rizwan Rehimi, Giuliano Crispatzu, Milos Nikolic, et al. 2018. “Disruption of the TFAP2A Regulatory Domain Causes Banchio-Oculo-Facial Syndrome (BOFS) and Illuminates Pathomechanisms for Other Human Neurocristopathies.”

Ling, Irving TC, and Tatjana Sauka-Spengler. 2019. “Early Chromatin Shaping Predetermines Multipotent Vagal Neural Crest into Neural, Neuronal and Mesenchymal Lineages.” Nature Cell Biology 21 (12): 1504–17.

Lonfat, Nicolas, Su Wang, ChangHee Lee, Mauricio Garcia, Jiho Choi, Peter J Park, and Connie Cepko. 2021. “Cis-Regulatory Dissection of Cone Development Reveals a Broad Role for Otx2 and Oc Transcription Factors.” Development 148 (9): dev198549.

Malkmus, Jonas, Laurène Ramos Martins, Shalu Jhanwar, Bonnie Kircher, Victorio Palacio, Rushikesh Sheth, Francisca Leal, et al. 2021. “Spatial Regulation by Multiple Gremlin1 Enhancers Provides Digit Development with Cis-Regulatory Robustness and Evolutionary Plasticity.” Nature Communications 12 (1): 1–17.

Mok, Gi Fay, Leighton Folkes, Shannon A Weldon, Eirini Maniou, Victor Martinez-Heredia, Alice M Godden, Ruth M Williams, et al. 2021. “Characterising Open Chromatin in Chick Embryos Identifies Cis-Regulatory Elements Important for Paraxial Mesoderm Formation and Axis Extension.” Nature Communications 12 (1): 1–15.

Patoori, Sruti, Nathalie Jean-Charles, Ariana Gopal, Sacha Sulaiman, Sneha Gopal, Brian Wang, Benjamin Souferi, and Mark M Emerson. 2020. “Cis-Regulatory Analysis of Onecut1 Expression in Fate-Restricted Retinal Progenitor Cells.” Neural Development 15 (1): 1–20.

Rothstein, Megan, and Marcos Simoes-Costa. 2020. “Heterodimerization of TFAP2 Pioneer Factors Drives Epigenomic Remodeling During Neural Crest Specification.” Genome Research 30 (1): 35–48.

Sackton, Timothy B, Phil Grayson, Alison Cloutier, Zhirui Hu, Jun S Liu, Nicole E Wheeler, Paul P Gardner, et al. 2019. “Convergent Regulatory Evolution and Loss of Flight in Paleognathous Birds.” Science 364 (6435): 74–78.

Williams, Ruth M, Ivan Candido-Ferreira, Emmanouela Repapi, Daria Gavriouchkina, Upeka Senanayake, Irving TC Ling, Jelena Telenius, Stephen Taylor, Jim Hughes, and Tatjana Sauka-Spengler. 2019. “Reconstruction of the Global Neural Crest Gene Regulatory Network in Vivo.” Developmental Cell 51 (2): 255–76.

Young, John J, Phil Grayson, Scott V Edwards, and Clifford J Tabin. 2019. “Attenuated Fgf Signaling Underlies the Forelimb Heterochrony in the Emu Dromaius Novaehollandiae.” Current Biology 29 (21): 3681–91.

ATAC

2024-04-29