ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

SFS Estimation: Difference between revisions

From angsd
Jump to navigation Jump to search
No edit summary
Line 2: Line 2:


From version 0.20 we can now estimate the joint likelihood of being in frequency j.
From version 0.20 we can now estimate the joint likelihood of being in frequency j.
From version 0.24 the tajima is being estimated
From version 0.24 the tajima is being estimated in sfstools


-realSFS 1  an sfs file will be generated.
;-realSFS 1: an sfs file will be generated.


-realSFS 2  the tajima will be printed to standard output
;-realSFS 4: snpcalling (not implemented)


-realSFS 4 snpcalling (not implemented)
;-realSFS 8: genotypecalling (not implemented)
 
-realSFS 8 genotypecalling (not implemented)


= Full example for simulating nextgen data and estimating SFS (somewhat deprecated should use sfstools now) =
= Full example for simulating nextgen data and estimating SFS (somewhat deprecated should use sfstools now) =

Revision as of 11:33, 19 June 2012

realSFS

From version 0.20 we can now estimate the joint likelihood of being in frequency j. From version 0.24 the tajima is being estimated in sfstools

-realSFS 1
an sfs file will be generated.
-realSFS 4
snpcalling (not implemented)
-realSFS 8
genotypecalling (not implemented)

Full example for simulating nextgen data and estimating SFS (somewhat deprecated should use sfstools now)

First we simulate a small dataset with very high probability of being variabe.

Number of individuals =15,number of sites=10000, pvar=0.9

./simnextgen.gcc -nind 15 -nsites 10000 -outfiles small -pvar 0.9 
->Using args: -nind 15 -errate 0.007500 -depth 5.000000 -pvar 0.900000 -nsites 10000 -F 0.000000 -model 1
->Dumping files: sequencefile: small.seq.gz	glffile: small.glf.gz	truefreq: small.frq args:small.args geno:small.geno

The .frq file contains the folded spectra, we use getUnfolded on the "true" genotypes, to get the unfolded spectra

./getUnfolded.g++ -infile small.geno -nChr 30
-> opening filehandle for: small.geno
-> opening filehandle for: small.geno.var
-> opening filehandle for: small.geno.unfolded

The second line in small.geno.unfolded now contains the unfolded spectra.

We now run dirty on the glf.gz file.

./dirty.g++ -sim1 misc/small.glf.gz -nInd 15 -outfiles results -realSFS 1
##screen output removed

And we now run the optimSFS program to get the spectra, we supply the filename, the number of chromosomes, and the number of threads we want to use

./optimSFS.gcc -binput ../results.sfs -ncat 30 -nThread 4
##screen output removed

The final output is in results.sfs.ml

We start R and plot our estimated SFS along with the true SFS

mytrue <- as.numeric(read.table("small.geno.unfolded",skip=1))
myest <-scan("../results.sfs.ml")
barplot(rbind(mytrue,myest),beside=T)

File:SfsExample.png