ANGSD: Analysis of Next Generation Sequencing Data

Latest tar.gz version: 0.934/0.935 (on GitHub); see Change_log for changes, and download it here.



The method is described in kim2011 and works by simultaneously estimating allele frequencies, genotype likelihoods and error rates.

The likelihood of the sequencing data D of n individuals at M sites can be written in terms of the allele frequencies f and the type-specific error rates e:

 \begin{align} p(D|f,e) &= \prod_{i=1}^n \prod_{j=1}^M p(D_j^i|f_j,e)\\
&= \prod_{i=1}^n \prod_{j=1}^M \sum_{g=0}^2 p(g|f_j)\,p(D_j^i|g,e) \end{align}

by summing over the unknown genotypes g. The genotype likelihood p(D_j^i|g,e) depends on the type-specific error rates (see kim2011, p. 14, for details). The type-specific error rates are estimated jointly with the allele frequencies by maximizing the likelihood:

 \hat{e},\hat{f} = \arg\max_{f,e}\, p(D|f,e)
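A minimal Python sketch of the likelihood above and its maximization follows. It rests on simplifying assumptions that are not taken from kim2011 or from ANGSD's code: each site is diallelic and summarized per individual as (major, minor) read counts; the genotype prior is Hardy-Weinberg, p(g|f) = C(2,g) f^g (1-f)^(2-g); and a single symmetric error rate e stands in for the type-specific error rates. All function names are hypothetical. For a fixed e, each f_j is updated by EM; e is then picked by a crude grid search over the profile likelihood, a stand-in for a proper joint optimizer.

```python
import math

# Simplifying assumptions (NOT the exact kim2011/ANGSD model):
#   * each site is diallelic; reads[i][j] = (n_major, n_minor) read counts for
#     individual i at site j (hypothetical input layout);
#   * genotype g = number of minor alleles, with a Hardy-Weinberg prior p(g|f);
#   * one symmetric error rate e replaces the type-specific error rates.


def genotype_prior(g, f):
    """p(g | f) under Hardy-Weinberg equilibrium (assumed)."""
    return math.comb(2, g) * f**g * (1.0 - f) ** (2 - g)


def genotype_likelihood(n_major, n_minor, g, e):
    """p(D | g, e) up to a constant factor (the binomial coefficient is dropped).

    A read shows the minor allele with probability q: it comes from a
    minor-allele chromosome (probability g/2) and is read correctly, or from a
    major-allele chromosome and is misread."""
    q = (g / 2.0) * (1.0 - e) + (1.0 - g / 2.0) * e
    return q**n_minor * (1.0 - q) ** n_major


def site_terms(n_major, n_minor, f, e):
    """The three summands p(g | f_j) p(D_j^i | g, e) for g = 0, 1, 2."""
    return [genotype_prior(g, f) * genotype_likelihood(n_major, n_minor, g, e)
            for g in range(3)]


def log_likelihood(reads, freqs, e):
    """log p(D | f, e) = sum_i sum_j log sum_g p(g | f_j) p(D_j^i | g, e)."""
    return sum(math.log(sum(site_terms(n_major, n_minor, freqs[j], e)))
               for individual in reads
               for j, (n_major, n_minor) in enumerate(individual))


def em_frequencies(reads, e, n_iter=100):
    """For fixed e, maximize over f by EM: f_j <- E[minor alleles at j] / (2n)."""
    n_ind, n_sites = len(reads), len(reads[0])
    freqs = [0.25] * n_sites  # arbitrary interior starting point
    for _ in range(n_iter):
        updated = []
        for j in range(n_sites):
            expected = 0.0
            for individual in reads:
                w = site_terms(*individual[j], freqs[j], e)
                expected += (w[1] + 2.0 * w[2]) / sum(w)  # posterior mean of g
            # Clamp to [0, 1] to guard against floating-point overshoot.
            updated.append(min(1.0, max(0.0, expected / (2.0 * n_ind))))
        freqs = updated
    return freqs


def estimate(reads, error_grid):
    """Approximate (e_hat, f_hat) = argmax_{f,e} p(D | f, e).

    Crude profile likelihood: run EM for f at each candidate e (each must lie
    strictly between 0 and 0.5) and keep the pair with the highest likelihood."""
    best = None
    for e in error_grid:
        freqs = em_frequencies(reads, e)
        ll = log_likelihood(reads, freqs, e)
        if best is None or ll > best[0]:
            best = (ll, e, freqs)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy data: 4 individuals x 2 sites; one stray read per site.
    reads = [
        [(10, 0), (0, 10)],
        [(10, 0), (0, 10)],
        [(9, 1), (0, 10)],
        [(10, 0), (1, 9)],
    ]
    e_hat, f_hat = estimate(reads, [k / 200 for k in range(1, 41)])
    print("e_hat =", e_hat, "f_hat =", [round(f, 3) for f in f_hat])
```

On the toy data the single stray read at each site is attributed to sequencing error, so the estimated frequency approaches 0 at the first site and 1 at the second, while the error rate lands near the observed mismatch rate of 2/80 = 0.025. EM suits the frequency update because the genotypes are latent; the grid over e only keeps the sketch short.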