ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Thetas,Tajima,Neutrality tests: Difference between revisions

From angsd
Jump to navigation Jump to search
No edit summary
No edit summary
Line 1: Line 1:
Angsd can estimate the Tajimas D and other neutrality test.
This method will estimate different Thetas (population scaled mutation rate) and can based on these thetas calculate Tajima's D and various other neutrality test statistics. Method is described in [[Korneliussen2013]]
Method works by using the posterior allele frequencies for every site, and uses these for calculating different per-site estimates of theta.
 
=Example=
<pre>
(estimate an SFS)
../angsd0.551/angsd -bam pop1.list -out bingo -realSFS 1
../angsd0.551/misc/emOptim2 bingo.sfs 40 -P 24 >bingo.em.ml
(calculate thetas)
../angsd0.551/angsd -bam pop1.list -out bongo -doThetas 1 -realSFS 1 -pest bingo.em.ml
../angsd0.551/misc/bgid make_bed taj/aout2.thetas.gz
(calculate Tajimas.)
../angsd0.551/misc/bgid do_stat taj/aout2.thetas.gz -nChr 40
</pre>


Due to the linearity the population scaled mutation rate is therefore the sum of the per-site estimates


To estimate the thetats do:


<code>
<code>

Revision as of 17:09, 3 October 2013

This method will estimate different Thetas (population scaled mutation rate) and can based on these thetas calculate Tajima's D and various other neutrality test statistics. Method is described in Korneliussen2013

Example

(estimate an SFS)
../angsd0.551/angsd -bam pop1.list -out bingo -realSFS 1 
../angsd0.551/misc/emOptim2 bingo.sfs 40 -P 24 >bingo.em.ml
(calculate thetas)
../angsd0.551/angsd -bam pop1.list -out bongo -doThetas 1 -realSFS 1 -pest bingo.em.ml 
../angsd0.551/misc/bgid make_bed taj/aout2.thetas.gz 
(calculate Tajimas.)
../angsd0.551/misc/bgid do_stat taj/aout2.thetas.gz -nChr 40


./angsd [other input] -realSFS 1 -doThetas 1 -pest prior.sfs.ml

The prior is an estimate of the .sfs which can be found using the [RealSFS] method in angsd The output is a file called .thetas.gz To perform the Tajimas D and other related neutralitet tests you should feed the .thetas.gz file into a seperate program called calcstat which can be found in the misc dir of the sourcecode


./calcStat24 -tinput file.thetas.gz -step 50000 -win 50000

NB: There is a minor bug in version24 that might cause an entire chromosome to be skipped if there are no datapoints in the first physical window. Use a higher version than 25

NB: Use the bgid program found in the misc folder.

Example Output

.thetas.gz is

chr10   1       -7.041327       -7.362126       -6.399318       -8.788662       -7.840050
chr10   2       -8.337345       -8.921354       -7.358819       -11.052126      -9.502293
chr10   3       -9.283203       -9.945923       -8.207280       -12.281002      -10.546671
chr10   4       -1.105216       -0.617477       -35.602360      -0.730974       -0.672616
chr10   5       -7.705427       -8.374287       -6.622234       -10.726430      -8.976529
chr10   6       -11.683662      -12.369866      -10.578860      -14.766857      -12.975926
chr10   7       -11.688400      -12.374647      -10.583545      -14.771753      -12.980717
chr10   8       -12.989104      -13.675391      -11.884201      -16.072602      -14.281470
chr10   9       -13.682495      -14.368910      -12.577435      -16.766463      -14.975017
chr10   10      -1.105216       -0.655518       -37.649835      -0.982402       -0.805662
chr10   11      -14.433381      -15.119890      -13.328206      -17.517696      -15.726019
chr10   12      -14.410507      -15.097004      -13.305348      -17.494774      -15.703129
chr10   13      -14.963982      -15.650467      -13.858836      -18.048209      -16.256590
chr10   14      -15.657183      -16.343698      -14.552002      -18.741516      -16.949827
chr10   15      -15.762558      -16.449085      -14.657361      -18.846938      -17.055218
chr10   16      -1.105216       -0.766882       -12.373455      -1.539395       -1.080326
chr10   17      -15.876465      -16.563007      -14.771249      -18.960901      -17.169143
chr10   18      -17.256944      -17.943487      -16.151728      -20.341382      -18.549623
1. chromosome
2. position
3. ThetaWatterson
4. ThetaD (nucleotide diversity)
5. Theta? (singleton category)
6. ThetaL
7. ThetaH

.thetas.gz.pestPG

(979699,1029444)(1000000,1050000)       chr10   1025000 -0.111223       -0.414967       -0.395241       -0.036406       -0.012747       183.897443      179.566460
(1029444,1078839)(1050000,1100000)      chr10   1075000 0.007024        -0.229418       -0.191991       -0.077881       0.047612        152.483006      152.709929
(1078839,1128837)(1100000,1150000)      chr10   1125000 -0.723711       -0.855712       -1.006656       -0.322342       -0.033026       197.304744      167.075081
(1128837,1175623)(1150000,1200000)      chr10   1175000 -0.806392       -1.301255       -1.416623       0.113205        -0.312597       147.828056      122.568361
(1175623,1225623)(1200000,1250000)      chr10   1225000 -0.691517       -0.808718       -0.954130       -0.016786       -0.201603       151.001106      128.876508
(1225623,1271896)(1250000,1300000)      chr10   1275000 -0.338848       -0.482030       -0.540336       0.007672        -0.108077       165.844115      153.941040
(1271896,1320928)(1300000,1349963)      chr10   1324981 -0.387835       -0.070543       -0.210244       -0.245692       0.024921        215.743283      198.033441
(1320928,1368926)(1349963,1400000)      chr10   1374981 -0.176277       -0.131296       -0.179671       -0.104470       0.007090        140.876813      135.613769
(1368926,1418197)(1400000,1450000)      chr10   1425000 0.499577        0.217479        0.378439        0.233619        0.016264        328.809655      363.547699
(1418197,1466420)(1450000,1500000)      chr10   1475000 0.305267        0.467985        0.515463        0.272164        -0.065638       220.966737      235.242873
(1466420,1516380)(1500000,1550000)      chr10   1525000 0.391637        0.610786        0.670051        0.146180        0.034386        163.579256      177.149431
(1516380,1564813)(1550000,1600000)      chr10   1575000 -0.176713       -0.113138       -0.164435       0.043673        -0.079513       145.362704      139.919271
(1564813,1614333)(1600000,1650000)      chr10   1625000 0.463855        0.914256        0.955478        -0.376847       0.361592        113.119044      124.249808
(1614333,1662807)(1650000,1700000)      chr10   1675000 0.124714        0.972551        0.874178        -0.489383       0.323134        73.666310       75.620133
(1662807,1710569)(1700000,1750000)      chr10   1725000 -0.095232       0.493424        0.382792        -0.088652       0.022494        55.189016       54.068586
(1710569,1756421)(1750000,1800000)      chr10   1775000 0.527467        0.554005        0.674397        0.052715        0.130473        143.169177      159.172914

Format is:

(indexStart,indexStop)(posStart,posStop) chrname wincenter tajD fuliD fuliF fayH zengsE thetaW thetaPi

Citation

Korneliussen2013