ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.934/0.935 on github), see Change_log for changes, and download it here.

# SNP calling

## Likelihood ratio test

SNPs are called based on their allele frequencies. If a site has a minor allele frequency (MAF) significantly different from 0, the site is called as polymorphic. The MAF estimate(s) given by -doMaf (see Allele_Frequency_estimation) are used in a likelihood ratio test. For -doMaf 1 and -doMaf 2 the test statistic is compared against a chi-square distribution with one degree of freedom.
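The test described above can be sketched in a few lines of Python. This is only an illustration of a likelihood ratio test against a chi-square distribution with one degree of freedom, not ANGSD's internal code. The function names `lrt_pvalue` and `call_snp`, and the idea of passing in two precomputed log-likelihoods (one with the frequency fixed at 0, one at the estimated MAF), are assumptions made for the example.

```python
import math


def lrt_pvalue(loglik_null, loglik_alt):
    """P-value of a likelihood ratio test with one degree of freedom.

    loglik_null: log-likelihood with the minor allele frequency fixed at 0
    loglik_alt:  log-likelihood at the estimated minor allele frequency
    (both are hypothetical inputs for this sketch)
    """
    # Test statistic: twice the log-likelihood difference, clamped at 0
    # to guard against tiny negative values from rounding.
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def call_snp(loglik_null, loglik_alt, snp_pval=1e-6):
    """Return True if the site would pass a p-value threshold such as -SNP_pval."""
    return lrt_pvalue(loglik_null, loglik_alt) < snp_pval
```

A larger gap between the two log-likelihoods gives a smaller p-value, so stricter thresholds keep only sites with strong evidence of being variable.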

### options

- -SNP_pval [float]

The p-value threshold used for calling SNPs. See Allele_Frequency_estimation for additional options.

### example

In this example we analyse data from BAM files (-bam bam.filelist), calculate the genotype likelihoods using the GATK method (-GL 2), infer the major and minor alleles (-doMajorMinor 1), estimate the allele frequencies assuming a known major and unknown minor allele (-doMaf 2), and keep only those sites whose p-value for being variable is less than 1e-6 (-SNP_pval 1e-6).

./angsd -bam bam.filelist -GL 2 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1

### output

The results are given in the file outfile.mafs.gz:

| chromo | position | major | minor | unknownEM | pu-EM | nInd |
|---|---|---|---|---|---|---|
| 1 | 14000873 | G | A | 0.282476 | 0.000000e+00 | 10 |
| 1 | 14001018 | T | C | 0.259890 | 7.494005e-14 | 9 |
| 1 | 14001867 | A | G | 0.272099 | 6.361578e-14 | 10 |
| 1 | 14002422 | A | T | 0.377890 | 0.000000e+00 | 9 |
| 1 | 14003581 | C | T | 0.194393 | 5.551115e-16 | 9 |
| 1 | 14004623 | T | C | 0.259172 | 2.424727e-13 | 10 |
| 1 | 14007493 | A | G | 0.297176 | 5.114086e-07 | 9 |
| 1 | 14007558 | C | T | 0.381770 | 0.000000e+00 | 8 |
| 1 | 14007649 | G | A | 0.220547 | 1.054967e-11 | 9 |
| 1 | 14008734 | T | A | 0.242852 | 0.000000e+00 | 10 |
| 1 | 14009723 | G | C | 0.255063 | 2.470836e-07 | 10 |
| 1 | 14010597 | G | A | 0.315430 | 0.000000e+00 | 10 |
| 1 | 14010851 | C | A | 0.276936 | 0.000000e+00 | 10 |
| 1 | 14012240 | C | T | 0.297956 | 0.000000e+00 | 10 |

The columns are the chromosome, the position, the major allele, the minor allele, the estimated minor allele frequency (unknownEM), the p-value (pu-EM) and the number of individuals with information (nInd).
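For downstream work it can be handy to load this file and apply further filters. The following is a minimal sketch, assuming the file is gzip-compressed, whitespace-delimited, and has the header shown in the example output. The helper names `read_mafs` and `filter_sites` are hypothetical and not part of ANGSD. The column names `pu-EM` and `nInd` are taken from the example and may differ for other -doMaf settings.

```python
import gzip


def read_mafs(path):
    """Read a .mafs.gz file into a list of dicts keyed by the header names."""
    sites = []
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            sites.append(dict(zip(header, fields)))
    return sites


def filter_sites(sites, max_pval=1e-6, min_ind=1, pval_col="pu-EM", ind_col="nInd"):
    """Keep sites with p-value below max_pval and at least min_ind individuals."""
    kept = []
    for site in sites:
        # Values are read as strings, so convert before comparing.
        if float(site[pval_col]) < max_pval and int(site[ind_col]) >= min_ind:
            kept.append(site)
    return kept
```

For instance, `filter_sites(read_mafs("outfile.mafs.gz"), max_pval=1e-10, min_ind=9)` would apply a stricter p-value cutoff than the one used in the ANGSD run and also require data from at least 9 individuals.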