ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.918/0.919 on github), see Change_log for changes, and download it here.

Haploid calling


Simple haploid output based on sampling or consensus. The latest GitHub version of angsd has a small utility program in the misc folder that converts this output to plink format (tfam/tped).


Major bug in version 0.911 (not present in versions before 0.911).

Use the development version from GitHub instead.




Brief Overview

> ./angsd -doHaploCall
	-> angsd version: 0.910-45-g2b2b4f0-dirty (htslib: 1.2.1-192-ge7e2b3d) build(Jan  3 2016 14:45:41)
	-> Analysis helpbox/synopsis information:
	-> Command: 
./angsd -doHaploCall 	-> Sun Jan  3 15:18:15 2016
--------------
abcHaploCall.cpp:
	-doHaploCall	0
	(Sampling strategies)
	 0:	 no haploid calling 
	 1:	 (Sample single base)
	 2:	 (Concensus base)
	-doCounts	0	Must choose -doCount 1
Optional
	-minMinor	0	Minimum observed minor alleles
	-maxMis	-1	Maximum missing bases (per site)


This function outputs one base for each individual at each site.
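The two sampling strategies listed in the help output can be illustrated with a minimal Python sketch. This is not the code in abcHaploCall.cpp. The function name call_haploid, the dictionary of per-base counts, and the use of N for an individual with no data are assumptions made for illustration.

```python
import random

def call_haploid(counts, mode, rng=random):
    """Pick one base for one individual at one site.

    counts: dict mapping base -> number of reads showing that base
            (hypothetical input, standing in for what -doCounts 1 tallies).
    mode:   1 = sample a single base, 2 = consensus (most frequent) base.
    """
    observed = {base: n for base, n in counts.items() if n > 0}
    if not observed:
        return "N"  # assumption: no reads means a missing call
    if mode == 1:
        # Sampling one read at random is equivalent to drawing a base
        # with probability proportional to its count.
        bases = list(observed)
        weights = [observed[b] for b in bases]
        return rng.choices(bases, weights=weights, k=1)[0]
    if mode == 2:
        # Most frequent base; ties are broken at random.
        top = max(observed.values())
        tied = [b for b, n in observed.items() if n == top]
        return rng.choice(tied)
    raise ValueError("mode must be 1 or 2")
```

For example, call_haploid({"A": 5, "C": 1}, 2) always returns A, while mode 1 returns A or C in proportion to the read counts.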

Options

-doHaploCall [int]

1: sample a random base. 2: use the most frequent base; ties are broken by picking a random base.

-doCounts 1

Use -doCounts 1 to count the bases at each site after filtering.

-minMinor [int]

Minimum observed minor alleles; only prints sites with more than minMinor sampled alleles (across individuals).

-maxMis [int]

Maximum allowed missing alleles (across individuals). -maxMis 0 means that only sites without missing alleles are printed.
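The effect of -maxMis on a single site can be sketched as follows. This is an illustration only, not the angsd implementation. The function name passes_max_mis is hypothetical, missing calls are assumed to be written as N, and the default of -1 is assumed to disable the filter.

```python
def passes_max_mis(calls, max_mis=-1):
    """Return True if a site has at most max_mis missing calls.

    calls:   per-individual bases at one site, with "N" marking missing data.
    max_mis: assumed semantics: a negative value (the default, -1)
             means the filter is switched off.
    """
    if max_mis < 0:
        return True
    n_missing = sum(1 for base in calls if base == "N")
    return n_missing <= max_mis
```

With -maxMis 0, a site with calls A N G A N N G would be dropped under these assumptions, while a site with calls G G G A G G A would be kept.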


Output

  • .haplo.gz

Each line represents one site: chromosome name (column 1), position (column 2), major allele (column 3), followed by one column per individual with the sampled allele.
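A file in this layout can be read with a few lines of Python. The sketch below assumes the file is gzip-compressed, whitespace-delimited text with a single header line, as in the example output on this page. The function name read_haplo and the dictionary keys are hypothetical.

```python
import gzip

def read_haplo(path):
    """Yield one dict per site from a .haplo.gz file.

    Assumes a header line followed by rows of:
    chromosome, position, major allele, then one base per individual.
    """
    with gzip.open(path, "rt") as handle:
        handle.readline()  # skip the header line (chr, pos, major, ind0, ...)
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or truncated lines
            yield {
                "chr": fields[0],
                "pos": int(fields[1]),
                "major": fields[2],
                "calls": fields[3:],  # one entry per individual
            }
```

Each yielded record keeps the per-individual calls in file order, so calls[0] corresponds to the first individual column.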

Example

Sample a random base for each individual at each site on chromosome 1 (-r 1:), with -minMinor 1.

./angsd -bam bam.filelist -dohaplocall 1 -doCounts 1 -r 1: -minMinor 1

Output

chr	pos	major	ind0	ind1	ind2	ind3	ind4	ind5	ind6
1	14000170	C	T	T	C	N	C	C	C
1	14000202	A	A	N	G	A	N	N	G
1	14000457	G	G	G	G	G	G	N	A
1	14000459	G	G	G	G	G	A	N	N
1	14000774	G	T	G	G	G	G	G	T
1	14002083	C	G	N	C	C	C	C	C
1	14002351	A	A	C	C	A	C	N	A
1	14002950	A	T	A	A	A	T	N	T
1	14004832	G	G	G	A	G	G	A	G
1	14006543	G	T	G	G	G	G	G	G
1	14006631	A	C	N	A	N	A	N	A
1	14007068	G	T	T	T	G	G	G	N
1	14009284	A	A	C	C	C	N	A	N
1	14009775	G	G	G	G	G	C	G	C
1	14009787	T	T	T	G	T	G	T	T
1	14009791	A	G	G	A	G	A	G	A
1	14009794	A	A	A	A	N	N	A	A
1	14009800	A	G	A	A	G	N	G	A
1	14010748	A	G	N	A	G	A	A	A

The columns are:

chr

chromosome

pos

position

major

major allele (most common of the sampled alleles)

ind0

first individual - same order as in the input files
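Because the individual columns follow the order of the input files, the ind0, ind1, ... labels can be mapped back to BAM paths. The sketch below assumes the file list passed to -bam contains one path per line. The function name label_individuals is hypothetical.

```python
def label_individuals(filelist_lines):
    """Map column labels (ind0, ind1, ...) to the BAM paths in a file list.

    filelist_lines: an iterable of lines, for example an open bam.filelist.
    Blank lines are ignored; the order of the remaining lines is preserved.
    """
    paths = [line.strip() for line in filelist_lines if line.strip()]
    return {f"ind{i}": path for i, path in enumerate(paths)}
```

Usage: with open("bam.filelist") as fh: labels = label_individuals(fh), after which labels["ind0"] is the first path in the list.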