ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Haploid calling
Simple haploid output based on sampling or consensus.
Version 0.911 has a major bug (versions before 0.911 are not affected).
Use the development version from GitHub instead.
Workflow: BAM files -> sequence data (random base or consensus base) -> *.haplo.gz (single base file)
Brief Overview
> ./angsd -doHaploCall
-> angsd version: 0.910-45-g2b2b4f0-dirty (htslib: 1.2.1-192-ge7e2b3d) build(Jan 3 2016 14:45:41)
-> Analysis helpbox/synopsis information:
-> Command: ./angsd -doHaploCall -> Sun Jan 3 15:18:15 2016
--------------
abcHaploCall.cpp:
    -doHaploCall  0  (Sampling strategies)
        0: no haploid calling
        1: (Sample single base)
        2: (Concensus base)
    -doCounts     0  Must choose -doCount 1
Optional
    -minMinor     0  Minimum observed minor alleles
    -maxMis      -1  Maximum missing bases (per site)
For each site, this function outputs one base per individual.
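The two strategies listed in the help box (-doHaploCall 1 and 2) can be illustrated with a minimal Python sketch for one individual at one site. This is not ANGSD's C++ code. The helper `call_haploid` is hypothetical, and it assumes the individual's filtered read bases are available as a string. It also assumes that strategy 1 picks a base with probability proportional to its read count, which is equivalent to picking one read at random.

```python
import random
from collections import Counter

VALID_BASES = "ACGT"


def call_haploid(read_bases, strategy, rng=random):
    """Call one base for a single individual at a single site (illustrative only).

    read_bases: bases from this individual's reads that passed filtering,
                e.g. "AAGA"; anything other than A/C/G/T is ignored.
    strategy:   1 = sample a random base, 2 = most frequent base (random on ties).
    rng:        anything with choice()/choices(), e.g. random.Random(seed).
    Returns "N" when no usable base is present.
    """
    counts = Counter(b for b in read_bases.upper() if b in VALID_BASES)
    if not counts:
        return "N"
    bases = sorted(counts)
    if strategy == 1:
        # Assumption: weight each base by its read count (= pick one read at random).
        return rng.choices(bases, weights=[counts[b] for b in bases])[0]
    if strategy == 2:
        top = max(counts.values())
        # Ties are broken uniformly at random among the most frequent bases.
        return rng.choice([b for b in bases if counts[b] == top])
    raise ValueError("strategy must be 1 (sample) or 2 (consensus)")
```

For example, `call_haploid("AAAG", 2)` always returns "A". `call_haploid("AAAG", 1)` returns "A" with probability 3/4 and "G" otherwise.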
Options
- -doHaploCall [int]
1: sample a random base. 2: use the most frequent (consensus) base; ties are broken by picking a random base.
- -doCounts 1
Use -doCounts 1 to count the bases at each site after filtering. This is required for -doHaploCall.
- -minMinor [int]
Minimum number of observed minor alleles (default 0). Only sites with at least minMinor observed minor alleles (across individuals) are printed.
- -maxMis [int]
Maximum number of missing alleles allowed per site, counted across individuals (default -1, which disables the filter). For example, -maxMis 0 prints only sites without missing alleles.
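To re-filter an existing output row on its sampled calls, a minimal Python sketch is shown below. The function `passes_filters` is hypothetical. It counts a "minor allele" as any non-missing call that differs from the major allele and treats "N" as missing. These are assumptions, and this is a downstream filter only. ANGSD's own -minMinor may count observed minor alleles differently, for example from read counts rather than from the sampled calls.

```python
def passes_filters(major, calls, min_minor=0, max_mis=-1):
    """Hypothetical downstream filter on one row of sampled calls.

    major:     major allele for the site (column 3 of .haplo.gz).
    calls:     one sampled base per individual; "N" marks a missing call.
    min_minor: keep the site only if at least this many calls differ from major.
    max_mis:   keep the site only if at most this many calls are "N";
               a negative value disables this check (mirroring the -1 default).
    """
    missing = sum(1 for c in calls if c == "N")
    minor = sum(1 for c in calls if c not in ("N", major))
    if max_mis >= 0 and missing > max_mis:
        return False
    return minor >= min_minor
```

For instance, the row for position 14000170 (major C; calls T T C N C C C) has two minor calls and one missing call. It therefore passes with `min_minor=1` but fails once `max_mis=0` is added.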
Output
- .haplo.gz
Each line represents one site: chromosome name (column 1), position (column 2) and major allele (column 3), followed by one column per individual containing the sampled allele.
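A minimal Python sketch for reading the .haplo.gz file follows. It assumes whitespace-separated columns and a single header line (chr pos major ind0 ind1 ...), as in the example output below. The function name `read_haplo` is hypothetical.

```python
import gzip


def read_haplo(path):
    """Yield (chrom, pos, major, calls) for each site in a .haplo.gz file.

    Assumes whitespace-separated columns and one header line
    (chr pos major ind0 ind1 ...), as in the example output.
    """
    with gzip.open(path, "rt") as fh:
        next(fh, None)  # skip the header line
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            chrom, pos, major, *calls = fields
            yield chrom, int(pos), major, calls
```

Usage: `for chrom, pos, major, calls in read_haplo("out.haplo.gz"): ...`, where `out.haplo.gz` is a placeholder file name. `calls[i]` belongs to the i-th individual in the input file list.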
Example
Sample a random base for each individual at every site on chromosome 1 (-r 1:), printing only sites with at least one observed minor allele (-minMinor 1):
./angsd -bam bam.filelist -dohaplocall 1 -doCounts 1 -r 1: -minMinor 1
Output
chr pos major ind0 ind1 ind2 ind3 ind4 ind5 ind6
1 14000170 C T T C N C C C
1 14000202 A A N G A N N G
1 14000457 G G G G G G N A
1 14000459 G G G G G A N N
1 14000774 G T G G G G G T
1 14002083 C G N C C C C C
1 14002351 A A C C A C N A
1 14002950 A T A A A T N T
1 14004832 G G G A G G A G
1 14006543 G T G G G G G G
1 14006631 A C N A N A N A
1 14007068 G T T T G G G N
1 14009284 A A C C C N A N
1 14009775 G G G G G C G C
1 14009787 T T T G T G T T
1 14009791 A G G A G A G A
1 14009794 A A A A N N A A
1 14009800 A G A A G N G A
1 14010748 A G N A G A A A
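The columns are:
- chr: chromosome
- pos: position
- major: major allele, described as the most common of the sampled alleles; note that in the example above this does not always hold (at position 14009791, G is sampled four times and A three times, yet A is reported as major)
- ind0, ind1, ...: the sampled allele for each individual (N where missing), in the same order as the input files; ind0 is the first individual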
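The example's aim of turning sampled bases into FASTA-style sequences can be sketched in Python as below. The helper `haplo_to_fasta` is hypothetical and is not an ANGSD option. It concatenates each individual's calls over the printed sites only, so the resulting sequences are not contiguous genomic sequence, and "N" marks missing calls.

```python
def haplo_to_fasta(rows, names):
    """Concatenate each individual's called bases into one FASTA record.

    rows:  iterable of (chrom, pos, major, calls) tuples, e.g. parsed from
           a .haplo.gz file; calls[i] belongs to names[i].
    names: one label per individual, in the same order as the input files.
    Only the printed sites are included; "N" marks missing calls.
    """
    seqs = [[] for _ in names]
    for _chrom, _pos, _major, calls in rows:
        if len(calls) != len(names):
            raise ValueError("number of calls does not match number of names")
        for i, base in enumerate(calls):
            seqs[i].append(base)
    return "".join(f">{name}\n{''.join(seq)}\n" for name, seq in zip(names, seqs))
```

Applied to the first three example rows with names ind0 to ind6, the record for ind0 is `>ind0` followed by `TAG`.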