ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Thetas,Tajima,Neutrality tests

From angsd
Jump to navigation Jump to search

This method will estimate different thetas (population scaled mutation rate) and can based on these thetas calculate Tajima's D and various other neutrality test statistics. Method is described in Korneliussen2013.

  • NB Information on this website is for version 0.551 or higher.
  • NB The Korneliussen2013 covers two methods,
  1. using an ML method
  2. using the emperical bayes method. The information on this page relates to the EB method.

For performing the ML method, you should the realSFS method and define the region under interest.

Example

Below is a chain of commands used for caculating statistics. Its a 3 step procedure

  1. Estimate an site frequency spectrum. Output is .em.ml file.
  2. Calculate per-site thetas. Output is a .thetas.gz file.
  3. Calculate neutrality tests statistics. Output is a .thetas.gz.pestPG file.
#(estimate an SFS)
../angsd0.551/angsd -bam pop1.list -out bingo -realSFS 1 
../angsd0.551/misc/emOptim2 bingo.sfs 40 -P 24 >bingo.em.ml
#(calculate thetas)
../angsd0.551/angsd -bam pop1.list -out bongo -doThetas 1 -realSFS 1 -pest bingo.em.ml 
#(calculate Tajimas.)
../angsd0.551/misc/bgid make_bed taj/aout2.thetas.gz 
../angsd0.551/misc/bgid do_stat taj/aout2.thetas.gz -nChr 40

Sliding Window example

We can easily do a sliding window analysis by adding -win/-step arguments to the last command

../angsd0.551/misc/bgid do_stat taj/aout2.thetas.gz -nChr 40 -win 50000 -step 10000

This will calculate the test statistic using a window size of 50kb and a step size of 10kb.

Example Output

.thetas.gz is

chr10   1       -7.041327       -7.362126       -6.399318       -8.788662       -7.840050
chr10   2       -8.337345       -8.921354       -7.358819       -11.052126      -9.502293
chr10   3       -9.283203       -9.945923       -8.207280       -12.281002      -10.546671
chr10   4       -1.105216       -0.617477       -35.602360      -0.730974       -0.672616
chr10   5       -7.705427       -8.374287       -6.622234       -10.726430      -8.976529
chr10   6       -11.683662      -12.369866      -10.578860      -14.766857      -12.975926
chr10   7       -11.688400      -12.374647      -10.583545      -14.771753      -12.980717
chr10   8       -12.989104      -13.675391      -11.884201      -16.072602      -14.281470
chr10   9       -13.682495      -14.368910      -12.577435      -16.766463      -14.975017
chr10   10      -1.105216       -0.655518       -37.649835      -0.982402       -0.805662
chr10   11      -14.433381      -15.119890      -13.328206      -17.517696      -15.726019
chr10   12      -14.410507      -15.097004      -13.305348      -17.494774      -15.703129
chr10   13      -14.963982      -15.650467      -13.858836      -18.048209      -16.256590
chr10   14      -15.657183      -16.343698      -14.552002      -18.741516      -16.949827
chr10   15      -15.762558      -16.449085      -14.657361      -18.846938      -17.055218
chr10   16      -1.105216       -0.766882       -12.373455      -1.539395       -1.080326
chr10   17      -15.876465      -16.563007      -14.771249      -18.960901      -17.169143
chr10   18      -17.256944      -17.943487      -16.151728      -20.341382      -18.549623
1. chromosome
2. position
3. ThetaWatterson
4. ThetaD (nucleotide diversity)
5. Theta? (singleton category)
6. ThetaL
7. ThetaH

.thetas.gz.pestPG

(59999,69999)(60000,70000)(60000,70000) chr1  65000   2349.039592     2008.865974     2791.401569     3817.828656     2913.347320     -0.545594       -0.626967       -0.486984       -0.617873       0.195337        10000
(69999,79999)(70000,80000)(70000,80000) chr1  75000   2349.113388     1993.792014     2764.051812     3979.987797     2986.889940     -0.569871       -0.617112       -0.456779       -0.678388       0.220762        10000
(79999,89999)(80000,90000)(80000,90000) chr1  85000   2349.154140     2035.577279     2649.132059     3902.254435     2968.915852     -0.502912       -0.491556       -0.330221       -0.637555       0.214522        10000
(89999,99999)(90000,100000)(90000,100000)       chr1  95000   2349.462773     2048.143641     2533.193917     3881.554872     2964.849262     -0.483190       -0.388552       -0.202228       -0.626111       0.212980        10000
(99999,109999)(100000,110000)(100000,110000)    chr1  105000  2349.306947     2103.402129     2608.611593     3738.658529     2921.030347     -0.394355       -0.404727       -0.285429       -0.558478       0.197881        10000
(109999,119999)(110000,120000)(110000,120000)   chr1  115000  2348.965451     1867.325681     2725.815492     4491.310734     3179.318214     -0.772512       -0.687843       -0.414876       -0.896283       0.287438        10000
(119999,129999)(120000,130000)(120000,130000)   chr1  125000  2349.437816     2077.636124     2623.517860     3755.631838     2916.633993     -0.435861       -0.437286       -0.301676       -0.573043       0.196304        10000

Format is:

(indexStart,indexStop)(posStart,posStop)(regStat,regStop) chrname wincenter tW tP tF tH tL tajD fulif fuliD fayH zengsE

Most likely you are just interest in the wincenter (column 3) and the column 9 which is the Tajima's D statistic.

The first ()()() er mainly used for debugging the sliding window program. The interpretation is

Citation

Korneliussen2013