ANGSD: Analysis of next generation Sequencing Data
Allele Counts
Revision as of 15:14, 29 November 2013
Sometimes we want or need the frequency of the different bases. This is what -doCounts does.
You can refine which bases are included using the filter parameters -minMapQ, -minQ and -trim. You can discard sites from further analysis if the total depth is below or above a threshold (-setMinDepth/-setMaxDepth), and you can discard a site if the effective sample size is below a threshold (-minInd).
You can dump summary statistics such as the qscore distribution (-doQsDist), the depth distribution (-doDepth), or various per-site counts (-dumpCounts).
Brief Overview
./angsd -doCounts
    -> angsd version: 0.560 build(Nov 28 2013 16:47:03)
    -> Analysis helpbox/synopsis information:
---------------
analysisCount.cpp:
    -doCounts     0    (Count the number A,C,G,T. All sites, All samples)
    -minQ         13   (remove bases with qscore<minQ)
    -setMaxDepth  -1   (If total depth is larger then site is removed from analysis. -1 indicates no filtering)
    -setMinDepth  -1   (If total depth is smaller then site is removed from analysis. -1 indicates no filtering)
    -trim         0    (trim ends of reads)
    -minInd       0    (Discard site if effective sample size below value. 0 indicates no filtering)
Filedumping:
    -doDepth      0    (dump distribution of seqdepth)   .depthSample,.depthGlobal
    -maxDepth     100  (bin together high depths)
    -doQsDist     0    (dump distribution of qscores)    .qs
    -dumpCounts   0
      1: total seqdepth for site     .pos.gz
      2: seqdepth persample          .pos.gz,.counts.gz
      3: A,C,G,T sum all samples     .pos.gz,.counts.gz
      4: A,C,G,T sum every sample    .pos.gz,.counts.gz
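The -doDepth and -maxDepth entries in the help text can be illustrated with a small sketch. The Python below is not ANGSD code. It assumes that every depth at or above the cap lands in the final bin, which is only one reading of "bin together high depths"; the function name and the exact cap rule are illustrative assumptions.

```python
from collections import Counter

def depth_distribution(depths, max_depth=100):
    """Histogram of sequencing depths with high depths pooled.

    Assumption (not confirmed by the ANGSD documentation): depths
    greater than or equal to max_depth are all counted in the last
    bin, at index max_depth.
    """
    counts = Counter(min(d, max_depth) for d in depths)
    return [counts.get(d, 0) for d in range(max_depth + 1)]

# Total depths taken from the -dumpCounts 1 example further down this page.
site_depths = [3, 3, 3, 3, 4, 5, 6, 6, 6, 6]
hist = depth_distribution(site_depths, max_depth=5)
print(hist)
```

Index i of the returned list holds the number of sites with depth i, with the last index collecting everything at or above the cap.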
Options
Filtering
- -minQ [int]
Default 13. Discard bases with a qscore below this threshold.
- -trim [int]
Default 0. Trim the ends of the reads. Useful for aDNA.
- -setMinDepth [int]
Default -1 (no filtering). If the total depth is below this value, the site is discarded from all analyses.
- -setMaxDepth [int]
Default -1 (no filtering). If the total depth is above this value, the site is discarded from all analyses.
- -dumpCounts [int]
see below
- -minInd [int]
Default 0 (no filtering). Remove sites where fewer than 'minInd' individuals have at least one read.
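How the site-level filters combine can be sketched as follows. This is a conceptual illustration in Python, not ANGSD's implementation: the function `keep_site` is hypothetical, and it assumes the per-individual depths have already been computed after the base-level filters (-minQ, -trim) were applied.

```python
def keep_site(per_individual_depth, set_min_depth=-1, set_max_depth=-1, min_ind=0):
    """Return True if a site passes the depth and sample-size filters.

    per_individual_depth: one read depth per individual at this site.
    set_min_depth / set_max_depth: -1 means no filtering.
    min_ind: 0 means no filtering.
    """
    total = sum(per_individual_depth)
    if set_min_depth != -1 and total < set_min_depth:
        return False  # total depth below the -setMinDepth threshold
    if set_max_depth != -1 and total > set_max_depth:
        return False  # total depth above the -setMaxDepth threshold
    # Effective sample size: individuals with at least one read.
    n_with_data = sum(1 for d in per_individual_depth if d > 0)
    if min_ind > 0 and n_with_data < min_ind:
        return False
    return True

# First row of the -dumpCounts 2 example on this page:
# total depth 3, two individuals with data.
depths = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(keep_site(depths))                    # no filters set
print(keep_site(depths, set_min_depth=4))   # total depth too low
print(keep_site(depths, min_ind=3))         # too few individuals with data
```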
Printing counts
- -dumpCounts [int]
1: prints the overall depth in the .pos.gz file. This depth is the sum of the reads covering a site across all individuals. The first column is the chromosome, the second is the position, and the third is the total depth.
1 13999959 3
1 13999960 3
1 13999961 3
1 13999962 3
1 13999963 4
1 13999964 5
1 13999965 6
1 13999966 6
1 13999967 6
1 13999968 6
2: prints the depth of each individual. The example shows the depth of 10 individuals. Each line corresponds to the same line in the position file.
0 0 0 0 0 0 0 1 0 2
0 0 0 0 0 0 0 1 0 2
0 0 0 0 0 0 0 1 0 2
0 0 0 0 0 0 0 1 0 2
0 0 0 0 0 0 0 1 0 3
0 0 0 0 0 1 0 1 0 3
1 0 0 0 0 1 0 1 0 3
1 0 0 0 0 1 0 1 0 3
1 0 0 0 0 1 0 1 0 3
1 0 0 0 0 1 0 1 0 3
4: prints the counts of each of the four bases for each individual at each site (according to the help text above, option 3 prints the same counts summed over all samples). In the example, the first four columns are the counts of A, C, G and T for the first individual. Only two individuals are shown. Each line corresponds to the same line in the position file.
0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 ...
0 1 0 0 0 0 0 0 ...
0 0 1 0 0 0 0 0 ...
1 0 0 0 0 0 0 0 ...
0 0 1 0 0 0 0 0 ...
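The per-individual base-count layout (four columns per individual, in the order A, C, G, T) can be unpacked with a few lines of Python. This is a minimal sketch, not part of ANGSD: the helper names are hypothetical, and the input line contains only the two individuals shown in the example, since the remaining columns are elided in the source.

```python
BASES = ("A", "C", "G", "T")

def split_base_counts(line):
    """Split one counts line into a list of per-individual base counts."""
    fields = [int(x) for x in line.split()]
    if len(fields) % 4 != 0:
        raise ValueError("expected a multiple of four columns")
    return [dict(zip(BASES, fields[i:i + 4])) for i in range(0, len(fields), 4)]

def sum_over_individuals(per_individual):
    """Collapse per-individual counts into one A, C, G, T total per site."""
    return {b: sum(ind[b] for ind in per_individual) for b in BASES}

# The two individuals shown in one row of the example above.
line = "0 1 0 0 0 0 0 0"
per_ind = split_base_counts(line)
totals = sum_over_individuals(per_ind)
print(per_ind)
print(totals)
```

Summing over individuals gives, for these two individuals only, the kind of per-site A, C, G, T total that the help text describes as "sum all samples".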
Required
To print the counts, the option -doCounts has to be used and the input needs to be sequence data.
Example
Print the depth of each individual from BAM files
./angsd -out out -doCounts 1 -dumpCounts 2 -bam bam.filelist
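The output of the command above could then be loaded along these lines. This is a hedged sketch: the file names out.pos.gz and out.counts.gz are assumed from the -out prefix and the suffixes listed in the help text, and the code assumes the two files either both start with a header line or both lack one. The function names are hypothetical.

```python
import gzip

def read_rows(path):
    """Yield whitespace-split rows from a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                yield fields

def is_header(fields):
    """Treat a row as a header if its last field is not an integer."""
    try:
        int(fields[-1])
    except ValueError:
        return True
    return False

def per_individual_depths(pos_path, counts_path):
    """Pair each site with its per-individual depths (line i matches line i)."""
    sites = []
    for pos, counts in zip(read_rows(pos_path), read_rows(counts_path)):
        if is_header(pos) or is_header(counts):
            continue  # assumption: headers, if present, are in both files
        chrom, position = pos[0], int(pos[1])
        sites.append((chrom, position, [int(c) for c in counts]))
    return sites
```

With the assumed file names, a call would look like `per_individual_depths("out.pos.gz", "out.counts.gz")`, returning one (chromosome, position, depths) tuple per site.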