ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Fst

From angsd
Revision as of 19:21, 30 July 2015 by Thorfinn (talk | contribs)
Jump to navigation Jump to search

Our program can estimate fst between populations. And has been generalized to give all pairwise fst estimates if you supply the command with multiple populations.

if you supply 3 populations, the program will also output the pbs statistic.

The procedure is

- Use angsd for calculating saf files for each population - Use realSFS to calculate 2d sfs for each pair - Use the above calculated 2dsfs as priors jointly with all safs from step1 to calculate fat files


Two Populations real data

#this is with 2pops
#first calculate per pop saf for each populatoin
../angsd -b list1  -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1
../angsd -b list2  -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1
#calculate the 2dsfs prior
../misc/realSFS pop1.saf.idx pop2.saf.idx >pop1.pop2.ml
#prepare the fst for easy window analysis etc
../misc/realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1.pop2.ml -fstout here
#get the global estimate
../misc/realSFS fst stats here.fst.idx 
-> FST.Unweight:0.069395 Fst.Weight:0.042349
#below is not tested that much, but seems to work
../misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow

3 Populations real data

In commands below im using 24 threads, because this is what I have. Adjust accordingly

#this is with 2pops
#first calculate per pop saf for each populatoin
./angsd -b list10  -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1
./angsd -b list11  -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1
./angsd -b list12  -anc hg19ancNoChr.fa -out pop3 -dosaf 1 -gl 1
#calculate all pairwise 2dsfs's
./misc/realSFS pop1.saf.idx pop2.saf.idx -P 24 >pop1.pop2.ml
./misc/realSFS pop1.saf.idx pop3.saf.idx -P 24 >pop1.pop3.ml
./misc/realSFS pop2.saf.idx pop3.saf.idx -P 24 >pop2.pop3.ml
#prepare the fst for easy analysis etc
./misc/realSFS fst index pop1.saf.idx pop2.saf.idx pop3.saf.idx -sfs pop1.pop2.ml -sfs pop1.pop3.ml -sfs pop2.pop3.ml -fstout here
#get the global estimate
	-> Assuming idxname:here.fst.idx
	-> Assuming .fst.gz file: here.fst.gz
	-> FST.Unweight[nObs:1666245]:0.022063 Fst.Weight:0.034513
0.022063 0.034513
	-> FST.Unweight[nObs:1666245]:0.026867 Fst.Weight:0.031989
0.026867 0.031989
	-> FST.Unweight[nObs:1666245]:0.025324 Fst.Weight:0.021118
0.025324 0.021118
	-> pbs.pop1	0.023145
	-> pbs.pop2	0.005088
	-> pbs.pop3	0.009367
#below is not tested that much, but seems to work
../misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow

In the presence of 3 populations, the program will also calculate the population branch statistics

Sliding Window output

The sliding window seems to work so we have documented it here:

Second column is chromosome, third is center of window followed by
fst.unweight(pop1,pop2) fst.weight(pop1,pop2) fst.unweight(pop1,pop3) fst.weight(pop1,pop3) fst.unweight(pop2,pop3) fst.weight(pop2,pop3)
 
(9133,58895)(14010000,14059999)(14010000,14060000)	1	14035000	0.022099	0.016387	0.026686	0.027731	0.025311	0.047920	-0.002231	0.035045	0.030353
(19114,68881)(14020000,14069999)(14020000,14070000)	1	14045000	0.022096	0.019076	0.026777	0.024238	0.025290	0.052793	-0.005220	0.041969	0.029757
(28951,78655)(14030000,14079999)(14030000,14080000)	1	14055000	0.022043	0.021025	0.026915	0.023368	0.025342	0.056975	-0.006884	0.046840	0.030530
(38928,88632)(14040000,14089999)(14040000,14090000)	1	14065000	0.022083	0.016525	0.026846	0.029560	0.025345	0.053421	-0.004116	0.039898	0.034122
(48917,98170)(14050000,14099999)(14050000,14100000)	1	14075000	0.022132	0.022082	0.026742	0.025564	0.025262	0.037071	0.005226	0.024827	0.020671
(74,49191)(14000000,14049999)(14000000,14050000)	10	14025000	0.022704	0.101955	0.026479	0.095713	0.025102	0.001924	0.103108	-0.048378	-0.002500
(9734,58555)(14010000,14059999)(14010000,14060000)	10	14035000	0.022779	0.102670	0.026425	0.088015	0.025118	0.002721	0.098870	-0.043342	-0.006738

The last 3 columns are the populations branch statistic for population1, popultion2 and population3