ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Genotype Distribution

Available from version 0.913 onward. The latest development version can be found on [https://github.com/ANGSD/angsd GitHub].
 
 
This method estimates the expected genotype counts or fractions for one or two individuals based on genotype likelihoods. Examples of genotype fractions for a single individual:
 
 
all 10 possible genotypes
{| class="wikitable" style="text-align: center"
!| pAA || pAC || pAG || pAT || pCC || pCG || pCT || pGG || pGT || pTT
|-
| 0.293 || 9.3e-05 || 0.000331 || 7.3e-05 || 0.2 || 7.7e-05 || 0.000411 || 0.204 || 7e-05 || 0.302
|}
 
number of derived alleles
{| class="wikitable" style="text-align: center"
!| pAA || pAD || pDD
|-
| 0.9986 || 0.0003168 || 0.001127
|}
 
 
or homozygous vs. heterozygous
{| class="wikitable" style="text-align: center"
!| pHO || pHE
|-
| 0.9987 || 0.0003168
|}
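As an illustration of how the categories relate, the homozygous and heterozygous fractions are sums over the corresponding entries of a ten-genotype table. The minimal sketch below uses the values from the ten-genotype example on this page; the function name <code>collapse_to_ho_he</code> is made up for this illustration and is not part of ANGSD. The example tables on this page do not necessarily come from the same data, so the result need not match the pHO/pHE table.

```python
# Genotype fractions copied from the ten-genotype example table on this page.
genotype_fractions = {
    "AA": 0.293, "AC": 9.3e-05, "AG": 0.000331, "AT": 7.3e-05,
    "CC": 0.2, "CG": 7.7e-05, "CT": 0.000411,
    "GG": 0.204, "GT": 7e-05, "TT": 0.302,
}


def collapse_to_ho_he(fractions):
    """Sum genotype fractions into homozygous (pHO) and heterozygous (pHE)."""
    # A genotype is homozygous when both of its alleles are identical.
    p_ho = sum(p for g, p in fractions.items() if g[0] == g[1])
    p_he = sum(p for g, p in fractions.items() if g[0] != g[1])
    return p_ho, p_he


p_ho, p_he = collapse_to_ho_he(genotype_fractions)
print(f"pHO={p_ho:.6f} pHE={p_he:.6f}")
```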
 
 
For two individuals, it could be the full 10x10 table of possible genotype combinations
 
or the number of derived alleles
{| class="wikitable" style="text-align: center"
!|  ||  || ind2 ||
|-
!|  ind1  || pAA || pAD || pDD
|-
|  pAA   ||  0.6561 || 0.1458 || 0.0081
|-
|  pAD   || 0.1458 || 0.0324 || 0.0018
|-
|  pDD  || 0.0081 || 0.0018 || 0.0001
|}
 
or the heterozygous and homozygous categories
{| class="wikitable" style="text-align: center"
!| HO HO || HO HE || HE HO || HE HE || HO altHO
|-
| 0.6562 || 0.1476 || 0.1476 || 0.0324 || 0.0162
|}
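The five categories in this table can be recovered from the 3x3 derived-allele table by summing cells. The sketch below is only an illustration: the function name <code>collapse_pair</code> is hypothetical, and reading "HO altHO" as "the two individuals are homozygous for different alleles" is an assumption that happens to reproduce the numbers shown.

```python
# Joint fractions over the number of derived alleles, copied from the 3x3
# example table on this page. Rows are ind1, columns are ind2 (AA, AD, DD).
joint = [
    [0.6561, 0.1458, 0.0081],
    [0.1458, 0.0324, 0.0018],
    [0.0081, 0.0018, 0.0001],
]


def collapse_pair(table):
    """Collapse a 3x3 joint genotype table into five HO/HE categories."""
    het = 1  # index of the heterozygous state (AD)
    out = {"HO HO": 0.0, "HO HE": 0.0, "HE HO": 0.0, "HE HE": 0.0, "HO altHO": 0.0}
    for i in range(3):          # genotype state of ind1
        for j in range(3):      # genotype state of ind2
            p = table[i][j]
            if i == het and j == het:
                out["HE HE"] += p
            elif i == het:
                out["HE HO"] += p
            elif j == het:
                out["HO HE"] += p
            elif i == j:
                out["HO HO"] += p      # both homozygous for the same allele
            else:
                out["HO altHO"] += p   # assumed: homozygous for different alleles
    return out


categories = collapse_pair(joint)
for name, value in categories.items():
    print(f"{name}\t{value:.4f}")
```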
 
 
 
 
__TOC__
 
 
=Brief Overview=
<pre>
./angsd -HWE_pval
-> angsd version: 0.911-12-gddb6f5f-dirty (htslib: 1.3-1-gc72ae90) build(Apr 10 2016 16:36:30)
-> Analysis helpbox/synopsis information:
-> Command:
../angsd/angsd -HWE_pval -> Sun Apr 10 16:53:24 2016
-------------
abcHWE.cpp:
-HWE_pval 0.000000
</pre>
 
 
==Options==
;-HWE_pval [float]
P-value threshold. The value must be greater than 0 and at most 1.
 
;-doMajorMinor [int]
The method only works for diallelic sites, so you must choose a method for selecting the major and minor alleles (see [[Inferring_Major_and_Minor_alleles]]).
 
==Use as a filter==
 
Sites with a p-value below the p-value threshold will be removed.
 
==Output==
 
This function also prints the results for the selected sites. If you choose -HWE_pval 1, all sites (that pass other filters) are written to the output.
<div class="toccolours mw-collapsible mw-collapsed">
Example of output *.hwe.gz
<pre class="mw-collapsible-content">
Chromo  Position        Major  Minor  hweFreq Freq    F      LRT    p-value
1      14000873        G      A      0.282473        0.263594        0.674624        3.140936e+00    7.634997e-02
1      14015890        A      G      0.283119        0.300032        0.999762        8.207572e+00    4.171594e-03
1      14018430        A      C      0.276112        0.299817        0.675018        2.780118e+00    9.544113e-02
1      14033343        A      G      0.295368        0.299442        0.999762        6.473824e+00    1.094747e-02
1      14037881        T      A      0.306003        0.341598        -0.518384      3.178415e+00    7.461710e-02
1      14038946        T      C      0.329113        0.333424        0.999775        6.925424e+00    8.497884e-03
</pre>
</div>
 
 
'''Chromo''' is the chromosome
 
'''Position''' is the position

'''Major''' is the major allele
 
'''Minor''' is the minor allele
 
'''hweFreq''' is the allele frequency assuming HWE (same as -doMaf 1)
 
'''Freq''' is the allele frequency without HWE assumption
 
'''F''' is the scaled departure from HWE (inbreeding coefficient - see model)
 
'''LRT''' is the likelihood ratio statistic
 
'''p-value''' is the p-value based on a likelihood ratio test
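
The LRT and p-value columns in the example output appear consistent with comparing the LRT statistic to a chi-square distribution with one degree of freedom. This is an inference from the listed numbers, not something stated on this page, and the helper name <code>lrt_to_pvalue</code> is made up for this sketch.

```python
import math


def lrt_to_pvalue(lrt):
    """Upper-tail probability of a chi-square variable with 1 df at `lrt`.

    Assumption: the test has one degree of freedom. For 1 df the tail
    probability has the closed form erfc(sqrt(lrt / 2)).
    """
    if lrt <= 0:
        return 1.0
    return math.erfc(math.sqrt(lrt / 2.0))


# (LRT, reported p-value) pairs copied from the example *.hwe.gz output.
rows = [
    (3.140936e+00, 7.634997e-02),
    (8.207572e+00, 4.171594e-03),
]
for lrt, reported in rows:
    computed = lrt_to_pvalue(lrt)
    print(f"LRT={lrt:.6f} computed={computed:.6e} reported={reported:.6e}")
```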
 
==Model==
 
 
Probability of genotypes without assumption of HWE
 
<math>
\begin{align}
p(G=0|f,F) &= (1-f)^2+f(1-f)F \\
p(G=1|f,F) &= 2f(1-f)-2f(1-f)F  \\
p(G=2|f,F) &= f^2 +f(1-f)F
\end{align}
</math>
 
;n: total number of individuals
;X: all sequencing data for a site
;f: allele frequency
;F: inbreeding coefficient*
;G: true unobserved genotype
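
The three genotype probabilities can be written as a short function, shown here as a minimal sketch of the formulas above. The name <code>genotype_probs</code> is illustrative only. With F = 0 the expressions reduce to the usual HWE proportions, and for any f and F the three values sum to 1. For strongly negative F some values can fall below zero, which the sketch does not guard against.

```python
def genotype_probs(f, F):
    """Return (p(G=0), p(G=1), p(G=2)) for allele frequency f and inbreeding F."""
    shift = f * (1.0 - f) * F              # mass moved from heterozygotes to each homozygote
    p0 = (1.0 - f) ** 2 + shift            # p(G=0|f,F)
    p1 = 2.0 * f * (1.0 - f) - 2.0 * shift # p(G=1|f,F)
    p2 = f ** 2 + shift                    # p(G=2|f,F)
    return p0, p1, p2


# F = 0 gives the HWE proportions; F = 1 removes all heterozygotes.
print(genotype_probs(0.1, 0.0))
print(genotype_probs(0.1, 1.0))
```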
 
total likelihood
<math>
p(X|f,F)\sim\prod_{i=1}^{n}p(X_i|f,F)=\prod_{i=1}^{n}\sum_{G\in \{0,1,2\}}p(X_i|G)p(G|f,F)
</math>
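
A minimal sketch of evaluating this likelihood on the log scale is given below. It assumes the genotype likelihoods p(X_i|G) are available as one triple per individual; the function names and the input values are hypothetical and are not ANGSD output or ANGSD code.

```python
import math


def genotype_probs(f, F):
    """Return (p(G=0), p(G=1), p(G=2)) given allele frequency f and inbreeding F."""
    shift = f * (1.0 - f) * F
    return (1.0 - f) ** 2 + shift, 2.0 * f * (1.0 - f) - 2.0 * shift, f ** 2 + shift


def log_likelihood(gls, f, F):
    """Log of prod_i sum_G p(X_i|G) p(G|f,F).

    gls: one (p(X_i|G=0), p(X_i|G=1), p(X_i|G=2)) triple per individual.
    """
    priors = genotype_probs(f, F)
    total = 0.0
    for gl in gls:
        # Marginalise over the unobserved genotype G for this individual.
        per_individual = sum(l * p for l, p in zip(gl, priors))
        total += math.log(per_individual)
    return total


# Hypothetical genotype likelihoods for three individuals.
example_gls = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.1, 0.9)]
print(log_likelihood(example_gls, 0.4, 0.0))
print(log_likelihood(example_gls, 0.4, 0.2))
```

Comparing twice the difference in maximised log-likelihood between a model with F free and one with F fixed at 0 gives a likelihood ratio statistic of the kind reported in the LRT column.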
 
 
 
*NB! We allow negative values of F in order to detect any deviation from HWE.

Revision as of 10:38, 14 July 2016
