ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Fst: Difference between revisions

From angsd
Jump to navigation Jump to search
Line 85: Line 85:


;-type 0 Split out the genome into blocks. And use the first window that have data for the entire window. Then we will have the same windowcenters across datasets.
;-type 0 Split out the genome into blocks. And use the first window that have data for the entire window. Then we will have the same windowcenters across datasets.
==realSFS fst print==

Revision as of 22:01, 19 January 2016

Our program can estimate fst between populations. And has been generalized to give all pairwise fst estimates if you supply the command with multiple populations.

if you supply 3 populations, the program will also output the pbs statistic.
NB we have removed the very unusefull unweighted fst estimator in the output, and have included a header. The output example below will be updated at some point.

The procedure is

- Use angsd for calculating saf files for each population

- Use realSFS to calculate 2d sfs for each pair

- Use the above calculated 2dsfs as priors jointly with all safs from step1 to calculate fst binary files

- Use realSFS to extract the the fst values from the fst


Two Populations real data

#this is with 2pops
#first calculate per pop saf for each populatoin
../angsd -b list1  -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1
../angsd -b list2  -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1
#calculate the 2dsfs prior
../misc/realSFS pop1.saf.idx pop2.saf.idx >pop1.pop2.ml
#prepare the fst for easy window analysis etc
../misc/realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1.pop2.ml -fstout here
#get the global estimate
../misc/realSFS fst stats here.fst.idx 
-> FST.Unweight:0.069395 Fst.Weight:0.042349
#below is not tested that much, but seems to work
../misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow

3 Populations real data

In commands below im using 24 threads, because this is what I have. Adjust accordingly

#this is with 2pops
#first calculate per pop saf for each populatoin
./angsd -b list10  -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1
./angsd -b list11  -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1
./angsd -b list12  -anc hg19ancNoChr.fa -out pop3 -dosaf 1 -gl 1
#calculate all pairwise 2dsfs's
./misc/realSFS pop1.saf.idx pop2.saf.idx -P 24 >pop1.pop2.ml
./misc/realSFS pop1.saf.idx pop3.saf.idx -P 24 >pop1.pop3.ml
./misc/realSFS pop2.saf.idx pop3.saf.idx -P 24 >pop2.pop3.ml
#prepare the fst for easy analysis etc
./misc/realSFS fst index pop1.saf.idx pop2.saf.idx pop3.saf.idx -sfs pop1.pop2.ml -sfs pop1.pop3.ml -sfs pop2.pop3.ml -fstout here
#get the global estimate
	-> Assuming idxname:here.fst.idx
	-> Assuming .fst.gz file: here.fst.gz
	-> FST.Unweight[nObs:1666245]:0.022063 Fst.Weight:0.034513
0.022063 0.034513
	-> FST.Unweight[nObs:1666245]:0.026867 Fst.Weight:0.031989
0.026867 0.031989
	-> FST.Unweight[nObs:1666245]:0.025324 Fst.Weight:0.021118
0.025324 0.021118
	-> pbs.pop1	0.023145
	-> pbs.pop2	0.005088
	-> pbs.pop3	0.009367
#below is not tested that much, but seems to work
../misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow

In the presence of 3 populations, the program will also calculate the population branch statistics

Sliding Window output

The sliding window seems to work so we have documented it here:

Second column is chromosome, third is center of window followed by
fst.unweight(pop1,pop2) fst.weight(pop1,pop2) fst.unweight(pop1,pop3) fst.weight(pop1,pop3) fst.unweight(pop2,pop3) fst.weight(pop2,pop3)
 
(9133,58895)(14010000,14059999)(14010000,14060000)	1	14035000	0.022099	0.016387	0.026686	0.027731	0.025311	0.047920	-0.002231	0.035045	0.030353
(19114,68881)(14020000,14069999)(14020000,14070000)	1	14045000	0.022096	0.019076	0.026777	0.024238	0.025290	0.052793	-0.005220	0.041969	0.029757
(28951,78655)(14030000,14079999)(14030000,14080000)	1	14055000	0.022043	0.021025	0.026915	0.023368	0.025342	0.056975	-0.006884	0.046840	0.030530
(38928,88632)(14040000,14089999)(14040000,14090000)	1	14065000	0.022083	0.016525	0.026846	0.029560	0.025345	0.053421	-0.004116	0.039898	0.034122
(48917,98170)(14050000,14099999)(14050000,14100000)	1	14075000	0.022132	0.022082	0.026742	0.025564	0.025262	0.037071	0.005226	0.024827	0.020671
(74,49191)(14000000,14049999)(14000000,14050000)	10	14025000	0.022704	0.101955	0.026479	0.095713	0.025102	0.001924	0.103108	-0.048378	-0.002500
(9734,58555)(14010000,14059999)(14010000,14060000)	10	14035000	0.022779	0.102670	0.026425	0.088015	0.025118	0.002721	0.098870	-0.043342	-0.006738

The last 3 columns are the populations branch statistic for population1, popultion2 and population3

Relative window positions?

We allow for 3 different ways of defining window positions, these are chosen with the -type argument in realSFS


-type 2 Use pos=1 as the leftmost position of first window. Even though there isn't any data.
-type 1 Use first position with data, as leftmost position for the first window.
-type 0 Split out the genome into blocks. And use the first window that have data for the entire window. Then we will have the same windowcenters across datasets.

realSFS fst print