ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Filters

Revision as of 13:15, 24 September 2012 by Line (talk | contribs) (→‎Selected Sites)

In most analyses you are interested in only a subset of sites rather than all of them. The following filter options are currently available.

Selected Sites

-filter [bimfile.bim]
Return likelihoods for the positions listed in the .bim file. The options -doMaf 2 -GL 1 must also be provided. With -doMajorMinor 3 the major/minor alleles from the .bim file are used.

Allele frequencies

-minMaf [float]
Only work with sites with a minor allele frequency (MAF) above [float].

Polymorphic sites

-minLRT [float]
Only work with sites with a likelihood ratio test (LRT) statistic above [float].

Major minor

Number of non missing individuals

-minKeepInd [int]
Only work with sites with information from at least [int] individuals.



First we do a run with no filters

./angsd  -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1:
...
head TSK.mafs 
chromo	position	major	minor	knownEM	nInd
1	13999919	A	C	0.000008	1
1	13999920	G	A	0.000008	1
1	13999921	G	A	0.000008	1
1	13999922	C	A	0.000008	1
1	13999923	A	C	0.000008	1
1	13999924	G	A	0.000008	1
1	13999925	G	A	0.000008	1
1	13999926	A	C	0.000008	1
1	13999927	G	A	0.000008	1

Now we apply a MAF cutoff of 1%

../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minMaf 0.01
head TSK.mafs 
chromo	position	major	minor	knownEM	nInd
1	13999950	T	G	0.495291	2
1	14000019	G	T	0.047247	9
1	14000056	C	T	0.055851	10
1	14000127	G	T	0.060760	10
1	14000170	C	T	0.052388	9
1	14000176	G	A	0.047928	10
1	14000202	G	A	0.279722	9
1	14000262	C	T	0.058555	9
1	14000322	A	G	0.040471	8

Similarly, if we only want sites with information from at least 5 samples

../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minKeepInd 5
head TSK.mafs 
chromo	position	major	minor	knownEM	nInd
1	13999971	T	A	0.000007	6
1	13999972	G	A	0.000007	6
1	13999973	C	A	0.000005	5
1	13999974	G	A	0.000006	6
1	13999975	C	A	0.000002	5
1	13999976	C	A	0.000004	7
1	13999977	A	C	0.000005	8
1	13999978	C	A	0.000005	8
1	13999979	T	A	0.000005	8
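
The effect of -minMaf and -minKeepInd can be illustrated by applying the same thresholds to an existing .mafs table after the fact. The following is a minimal Python sketch, not how ANGSD implements the filters internally. The function name filter_sites and the exact comparison operators (strictly above for the MAF, at least for the number of individuals) are assumptions based on the option descriptions on this page; the rows are copied from the outputs shown above.

```python
import csv
import io

# Excerpt of a .mafs table; rows copied from the outputs shown on this page.
MAFS_TEXT = (
    "chromo\tposition\tmajor\tminor\tknownEM\tnInd\n"
    "1\t13999919\tA\tC\t0.000008\t1\n"
    "1\t13999950\tT\tG\t0.495291\t2\n"
    "1\t13999973\tC\tA\t0.000005\t5\n"
    "1\t14000019\tG\tT\t0.047247\t9\n"
    "1\t14000056\tC\tT\t0.055851\t10\n"
)


def filter_sites(text, min_maf=None, min_ind=None):
    """Keep rows with knownEM above min_maf and nInd of at least min_ind.

    Hypothetical helper: a threshold left as None is not applied.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    kept = []
    for row in reader:
        if min_maf is not None and float(row["knownEM"]) <= min_maf:
            continue  # MAF not above the cutoff (assumed strict comparison)
        if min_ind is not None and int(row["nInd"]) < min_ind:
            continue  # too few individuals with data
        kept.append(row)
    return kept


maf_sites = filter_sites(MAFS_TEXT, min_maf=0.01)  # analogous to -minMaf 0.01
ind_sites = filter_sites(MAFS_TEXT, min_ind=5)     # analogous to -minKeepInd 5

print([row["position"] for row in maf_sites])
print([row["position"] for row in ind_sites])
```

Filtering inside ANGSD is still preferable for large regions, since sites that fail the filter are never written to disk.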

If we are only interested in sites that are variable with a p-value below roughly 10^(-6)

../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minLRT 24 -doSNP 1 
head TSK.mafs 
chromo	position	major	minor	knownEM	pK-EM	nInd
1	14000202	G	A	0.279722	42.623150	9
1	14000873	G	A	0.212120	79.118476	10
1	14001018	T	C	0.333736	89.040311	8
1	14001867	A	G	0.200232	47.195423	10
1	14002422	A	T	0.167692	43.196259	9
1	14003581	C	T	0.207404	58.593208	9
1	14004623	T	C	0.219838	102.856433	10
1	14007493	A	G	0.453217	28.398647	9
1	14007558	C	T	0.395670	80.236777	7
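
The link between the -minLRT 24 cutoff and a p-value of about 10^(-6) can be checked with a short calculation. This Python sketch assumes that the LRT statistic is asymptotically chi-square distributed with one degree of freedom; that assumption is not stated on this page. Under it, the upper-tail probability has the closed form erfc(sqrt(LRT/2)). The function name lrt_to_pvalue is hypothetical.

```python
import math


def lrt_to_pvalue(lrt):
    """Upper-tail probability of a chi-square variable with 1 df exceeding lrt.

    Assumes the statistic follows a chi-square distribution with one
    degree of freedom, for which P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lrt / 2.0))


p = lrt_to_pvalue(24.0)
print(f"LRT = 24 corresponds to p = {p:.2e}")
```

Under the same assumption, a cutoff of 24 gives a p-value just under 10^(-6), which matches the threshold used in the command above.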


Deprecated options

These options should either be included (as is) or be discarded

-minDepth
-maxDepth