ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Genotype calling: Difference between revisions

From angsd
Jump to navigation Jump to search
Line 41: Line 41:
==Example==
==Example==
<pre>
<pre>
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doMajorMinor 1 -SNP_pval-doGeno 5 -doPost 1 -postCutoff 0.95
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doMajorMinor 1 -SNP_pval 0.000001 -doGeno 5 -doPost 1 -postCutoff 0.95
</pre>
</pre>



Revision as of 11:44, 22 January 2016

We really don't recommend doing analysis based on called genotypes, but incorporate the uncertainty directly into the analysis you want to perform. But we recognise that many methods are still relying on called genotypes, and have therefore implemented a basic genotype caller into angsd.

The program can do genotype calling based either on the genotype til the highest likelihood or by using the frequency as a prior(recommended see Kim2011).

angsd can also use the full information of the sample allele frequencies for calling genotypes see SFS Estimation.

options

-doGeno [int]

1: print out major minor

2: print the called genotype as -1,0,1,2

4: print the called genotype as AA, AC, AG, ...

8: print all 3 posts (major,major),(major,minor),(minor,minor)

16: print the posterior of the called genotype

32: somewhat different dumps the binary posterior for all samples, encoded as 3*nind double

Use the sum of the above to give the output you want. Forexample -doGeno 5 (1+4) prins the major and minor allele followed by the genotype (AA, AC ...) for each individual

-doPost [int]

1: estimate the posterior genotype probability based on the allele frequency as a prior

2: estimate the posterior genotype probability assuming a uniform prior

-geno_minDepth [int]

set genotypes to missing if the individual depth is less than [int]

-geno_maxDepth [int]

set genotypes to missing if the individual depth is larger than [int]

-geno_minMM [float]

set genotypes to missing if less than [float] of the bases and the major or minor (likely a triallic site)

-postCutoff [float]

Call only a genotype with a posterior above this threshold.

NB if the raw posterior dump is requested the -postCutoff is not used

Example

./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doMajorMinor 1 -SNP_pval 0.000001 -doGeno 5 -doPost 1 -postCutoff 0.95

gives a output like this:

1       14000202        G       A       GG      NN      NN      GA      NN      
1       14000873        G       A       GG      GG      GG      AA      GA      
1       14001018        T       C       NN      NN      NN      CC      NN      
1       14001867        A       G       NN      AA      AA      NN      NN      
1       14002342        C       T       CC      CC      CC      CC      CC      
1       14002422        A       T       AA      NN      NN      NN      NN      
1       14002474        T       C       TC      TT      TT      TT      TT      
1       14003581        C       T       CC      CC      NN      NN      CT      
1       14004623        T       C       TT      TT      TT      NN      TC      
1       14005069        A       G       AA      AA      AA      AA      AA