ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Tutorial

Revision as of 01:36, 3 May 2013

Prepare files

angsd can work on remote BAM files, so first download a list of 23 unrelated Europeans from the 1000 Genomes Project.

wget http://popgen.dk/netstuff/files.list
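The commands further down pass -nInd 10 or -nInd 20, so it can help to check how many BAM files the downloaded list actually names. A minimal sketch, assuming files.list holds one BAM path or URL per line; count_entries is a hypothetical helper and not part of angsd:

```shell
# Hypothetical helper: count the non-empty lines of a file list.
# Assumes one BAM path or URL per line.
count_entries() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Demo on a small made-up list (these file names are placeholders).
demo_list=$(mktemp)
printf '%s\n' sample1.bam sample2.bam sample3.bam > "$demo_list"
count_entries "$demo_list"
rm -f "$demo_list"
```

After the wget above, `count_entries files.list` would print the number of entries in the real list.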


Contents of the file 'files.list'

Understanding angsd options

As a quick reference, the help for most of the methods within angsd can be viewed by typing the associated parameter on its own. All options are given as

-parameter value

It is important that there is no space between the dash and the parameter, and that there is a space between the parameter and the value. Furthermore, the parameter is case sensitive.
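The spacing rule can be illustrated with a small shell check. This is only a sketch: check_option_pairs is a hypothetical helper, it does not know which parameter names angsd accepts, and it does not test case sensitivity. It covers the "-parameter value" form only, not the help form where a parameter is typed without a value.

```shell
# Hypothetical checker for the "-parameter value" convention.
# Returns 0 if the arguments come in pairs where each parameter is a dash
# directly followed by a letter, and each parameter has a value after it.
check_option_pairs() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -[A-Za-z]*) ;;   # dash attached to the parameter name
            *) echo "not a parameter: '$1'" >&2; return 1 ;;
        esac
        if [ "$#" -lt 2 ]; then
            echo "missing value for $1" >&2
            return 1
        fi
        shift 2
    done
    return 0
}

# A well-formed argument list is accepted.
check_option_pairs -bam files.list -doCounts 1 && echo "ok"
# A space between the dash and the name is rejected.
check_option_pairs - bam files.list 2>/dev/null || echo "rejected"
```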

Simply running angsd without any arguments will give you the help screen.

./angsd

An explanation of every parameter is shown beside it, and for each of these parameters we can get additional information by typing that parameter on its own, without a value. Below is an example for the methods relating to genotype likelihood calculation.

./angsd -GL

This tells you that there are four different genotype likelihood models implemented, and you can choose one by writing, for example, -GL 1 for the SAMtools model. We also see that we can dump the genotype likelihoods in four different ways.

Understanding angsd output

The program catches system signals. If you press ctrl+c it will stop reading files, but it will let the threads that are already running finish their jobs. You can therefore press ctrl+c at any time and expect to get proper output files. After a run has completed, the program prints a list of the generated files.

An example is below.

./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf -doMajorMinor 1 -GL 1 -nInd 20 -minMaf 0.005 -nThreads 10 -doCounts 1 -dumpCounts 3

Getting simple Counts/depth

For some analyses, simply getting the sequencing depth for all sites could be of interest. This kind of analysis is grouped in the '-doCounts' methods.

./angsd -doCounts

So if we wanted the counts of A, C, G and T summed across all samples we could write

./angsd -bam files.list -doCounts 1 -dumpCounts 3 -out tstCounts -nInd 10
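Once the counts are written, a quick sanity check is to add up the four base counts per site. The sketch below assumes a seven-column table (chr, pos, totDepth, totA, totC, totG, totT, with a header line), such as one obtained by pasting the position and counts files side by side. That layout is an assumption here, and check_depth is a hypothetical helper:

```shell
# Assumed layout (header plus one row per site):
#   chr pos totDepth totA totC totG totT
# check_depth prints each site with the sum of its four base counts and
# flags rows where that sum differs from the totDepth column.
check_depth() {
    awk 'NR > 1 {
        sum = $4 + $5 + $6 + $7
        flag = (sum == $3) ? "ok" : "mismatch"
        print $1, $2, sum, flag
    }'
}

# Demo on two rows in the assumed layout.
check_depth <<'EOF'
chr pos totDepth totA totC totG totT
chr1 13032 21 20 0 0 1
chr1 13309 8 0 0 7 1
EOF
```

With real output this could be run as `paste tstCounts.pos tstCounts.counts | check_depth`, provided the files follow the assumed layout.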


Or if we wanted the sequencing depth per sample but only for the good quality data

./angsd -bam files.list -doCounts 1 -dumpCounts 2 -out tstCounts -nInd 10 -minQ 20 -minMapQ 30

Frequencies

We can also estimate the allele frequencies. We do this with the -doMaf option.

./angsd -doMaf

If we try to use -doMaf 2 on its own, angsd will complain!

./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf

So let us also estimate the major and minor alleles. What are the options?

./angsd -doMajorMinor

Let us infer the major and minor using the genotype likelihoods.

./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf -doMajorMinor 1

So now we need to specify which genotype likelihood model we want to use. Let us see what our options are

./angsd0.530/angsd -GL

We pick the same model as used in SAMtools, '-GL 1'.

./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf -doMajorMinor 1 -GL 1 -nInd 20

These sites are all invariable, so let's filter out the sites with a MAF below 0.5%

./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf -doMajorMinor 1 -GL 1 -nInd 20 -minMaf 0.005 -nThreads 10

The filters work across the different analysis classes, so if we also supply -dumpCounts we will only get the sites with a MAF > 0.5%

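./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf -doMajorMinor 1 -GL 1 -nInd 20 -minMaf 0.005 -nThreads 10 -doCounts 1 -dumpCounts 3

paste tstMaf.pos tstMaf.counts |head
chr pos totDepth totA totC totG totT
chr1 13032 21 20 0 0 1
chr1 13038 21 0 1 0 20
chr1 13309 8 0 0 7 1
chr1 13396 34 1 0 0 33
chr1 13482 30 0 1 29 0
chr1 13502 25 1 0 24 0
chr1 13519 26 0 1 0 25
chr1 14933 2 1 0 1 0
chr1 16259 9 0 0 1 8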
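To get a feel for what a 0.5% threshold means in terms of read counts, the fraction of reads carrying the second most common base can be computed per site. This is a count-based approximation only: angsd estimates frequencies from genotype likelihoods, which the sketch does not reproduce. It assumes a table with the columns chr, pos, totDepth, totA, totC, totG, totT and a header line; minor_fraction is a hypothetical helper.

```shell
# Count-based approximation only; not the estimator used by -doMaf.
# Assumed layout: chr pos totDepth totA totC totG totT (header on line 1).
# Prints chr, pos and the read fraction of the second most common base
# for sites where that fraction exceeds the threshold (default 0.005).
minor_fraction() {
    awk -v min="${1:-0.005}" 'NR > 1 {
        # track the largest and second-largest of the four base counts
        first = 0; second = 0
        for (i = 4; i <= 7; i++) {
            if ($i > first) { second = first; first = $i }
            else if ($i > second) { second = $i }
        }
        total = $4 + $5 + $6 + $7
        if (total > 0 && second / total > min)
            printf "%s %s %.4f\n", $1, $2, second / total
    }'
}

# Demo on two rows in the assumed layout.
minor_fraction 0.005 <<'EOF'
chr pos totDepth totA totC totG totT
chr1 13032 21 20 0 0 1
chr1 14933 2 1 0 1 0
EOF
```

The second demo row shows why raw counts at low depth are a poor guide: one read of each of two bases gives a fraction of 0.5 from only two reads.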

Estimating the SFS

./angsd0.530/angsd -bam files.list -doMaf 2 -out tstMaf -doMajorMinor 1 -GL 1 -nInd 20 -minMaf 0.005 -nThreads 10 -doCounts 1 -dumpCounts 3