 <?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.popgen.dk/angsd/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Albrecht</id>
	<title>angsd - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.popgen.dk/angsd/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Albrecht"/>
	<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php/Special:Contributions/Albrecht"/>
	<updated>2026-04-07T21:27:32Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.40.1</generator>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Test&amp;diff=3184</id>
		<title>Test</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Test&amp;diff=3184"/>
		<updated>2023-10-24T09:05:26Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Created page with &amp;quot;# hello a sdf a fa sdfa&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;# hello&lt;br /&gt;
a&lt;br /&gt;
sdf&lt;br /&gt;
a&lt;br /&gt;
fa&lt;br /&gt;
sdfa&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3175</id>
		<title>Fasta</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3175"/>
		<updated>2023-02-06T16:45:55Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Ancestral fasta using multiple outgroup */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This option creates a fasta.gz file from a sequencing data file (BAM file). The function uses genome information in the BAM header to determine the length and chromosome names. For the sites without data an &amp;quot;N&amp;quot; is written. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Single BAM file{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);Highest EBD (-doFasta 3); write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Multiple BAM files{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The resulting fasta file can be used as input for the following ANGSD analyses:&lt;br /&gt;
# [[Error estimation]]&lt;br /&gt;
# [[ABBA-BABA]]&lt;br /&gt;
&lt;br /&gt;
The iupac output code was kindly provided by Kristian Ullrich.&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dofasta 	-&amp;gt; Tue Sep 26 17:02:07 2017&lt;br /&gt;
--------------&lt;br /&gt;
abcWriteFasta.cpp:&lt;br /&gt;
	-doFasta	0&lt;br /&gt;
	1: use a random (non N) base (needs -doCounts 1)&lt;br /&gt;
	2: use the most common (non N) base (needs -doCounts 1)&lt;br /&gt;
	3: use the base with highest ebd (under development) &lt;br /&gt;
	4: output iupac codes (under development) &lt;br /&gt;
	-basesPerLine	50	(Number of bases perline in output file)&lt;br /&gt;
	-explode	0	 print chromosome where we have no data (0:no,1:yes)&lt;br /&gt;
	-rmTrans	0	 remove transitions as different from -ref bases (0:no,1:yes)&lt;br /&gt;
	-ref	(null)	 reference fasta, only used with -rmTrans 1&lt;br /&gt;
	-seed	0	 use non random seed of value 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This function writes a fasta file using the full header information from the SAM/BAM file. This means that a fasta will be generated for the entire chromosome even if '-r/-rf -sites' is used.&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
;-doFasta 1: sample a random base at each position. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is required. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum count. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed in order to determine the most common base. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 3: use the base with the highest effective base depth (EBD). This only works for one individual.&lt;br /&gt;
&lt;br /&gt;
;-basesPerLine	[INT]&lt;br /&gt;
Number of bases per line in the output fasta file (default is 50)&lt;br /&gt;
&lt;br /&gt;
;-explode	[INT]	&lt;br /&gt;
0 (default) only output chromosomes with data. 1: write out all chromosomes &lt;br /&gt;
;-rmTrans [INT]&lt;br /&gt;
0 (default) all sites are used. 1: remove transitions. Here transitions are determined using a fasta file such as a reference genome. &lt;br /&gt;
;-ref [fileName]&lt;br /&gt;
A fasta file used to determine whether a site is a transition (needed when -rmTrans 1 is used)&lt;br /&gt;
;-seed [INT]&lt;br /&gt;
Use a seed in order to replicate results (relevant when sampling random bases with -doFasta 1)&lt;br /&gt;
&lt;br /&gt;
For filters see [[Filters]]&lt;br /&gt;
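&lt;br /&gt;
The tie-breaking rule described for -doFasta 2 can be illustrated with a minimal Python sketch. This is not ANGSD's implementation: the function name consensus_base and the use of Python's random module are assumptions made for illustration only.&lt;br /&gt;

```python
import random
from collections import Counter

VALID_BASES = set("ACGT")  # Ns and any other symbols are ignored


def consensus_base(bases, rng):
    """Return the most common base; ties are broken at random using rng.

    Illustrative only, not ANGSD's code. Returns "N" when no usable base is seen.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    if not counts:
        return "N"
    top = max(counts.values())
    # Sort the tied bases so the outcome depends only on the seed, not on input order.
    tied = sorted(base for base, n in counts.items() if n == top)
    return rng.choice(tied)


rng = random.Random(1)  # hypothetical seed value
print(consensus_base(list("AACGT"), rng))  # A is the unique most common base here
print(consensus_base(list("NN"), rng))     # no usable data at this site
```

Passing a seeded generator makes the tie-breaking reproducible, which mirrors the purpose of the -seed option.&lt;br /&gt;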
&lt;br /&gt;
=Output=&lt;br /&gt;
The output is an ordinary fasta file. For -doFasta 1 the sequence contains both upper-case and lower-case letters, because the bases are copied directly from the sequencing data and the case indicates which strand the original read mapped to. For the consensus fasta all letters are capital letters.&lt;br /&gt;
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a fasta file from a random sample of bases.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -i bams/smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1 -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=EBD=&lt;br /&gt;
Each of the four bases has its own EBD. For a given base, the EBD is the sum, over the reads carrying that base, of the product of the transformed mapping quality and the transformed base quality score.&lt;br /&gt;
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
EBD_b = \sum_{i=1}^{N_b} (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} \qquad b \in \{A,C,G,T\}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
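where &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a certain base, &amp;lt;math&amp;gt;N_b&amp;lt;/math&amp;gt; is the number of reads with base &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;mapq_i&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;qscore_i&amp;lt;/math&amp;gt; are the mapping quality and base quality score of read &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;.&lt;br /&gt;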
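&lt;br /&gt;
As a worked illustration, the following Python sketch computes the four EBD values for one site. It follows the formula exactly as written on this page; the function names and the (base, mapq, qscore) tuple layout are assumptions for illustration and do not reflect ANGSD's internal data structures.&lt;br /&gt;

```python
def phred(q):
    """Transform a Phred-scaled score q as in the formula: 10^(-q/10)."""
    return 10 ** (-q / 10)


def effective_base_depth(reads):
    """Sum phred(mapq) * phred(qscore) per base over (base, mapq, qscore) tuples."""
    ebd = {base: 0.0 for base in "ACGT"}
    for base, mapq, qscore in reads:
        base = base.upper()
        if base in ebd:  # skip N and other symbols
            ebd[base] += phred(mapq) * phred(qscore)
    return ebd


# Hypothetical reads covering a single site: (base, mapq, qscore)
reads = [("A", 30, 20), ("A", 20, 10), ("c", 10, 10), ("N", 40, 40)]
ebd = effective_base_depth(reads)
print(ebd)
```

Each read contributes one term to the sum for its own base, so bases that are never observed keep an EBD of zero.&lt;br /&gt;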
&lt;br /&gt;
&lt;br /&gt;
=Ancestral fasta using multiple outgroups=&lt;br /&gt;
If you have outgroup species mapped to your reference genome, you can use them to make a fasta file with ancestral alleles. You can use one or more outgroup individuals; e.g. for humans you could have four outgroup BAM files from a chimpanzee, a bonobo, a gorilla and an orangutan. Assuming you want a fasta file that only contains alleles that are the same for all outgroup species, you can use a command like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b fourOutgroup.bamlist -out myFasta -doFasta 2 -doCounts 1 -snp_pval 0.01 -domaf 1 -domajorminor 1 -gl 2 -rmSNPs 1 -minind 4 -setMinDepthInd 10 -explode 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''-b fourOutgroup.bamlist''' a list of the BAM files for the four outgroup individuals&lt;br /&gt;
&lt;br /&gt;
'''-out myFasta''' the output name&lt;br /&gt;
&lt;br /&gt;
'''-doCounts 1''' count bases across individuals to determine the consensus allele&lt;br /&gt;
&lt;br /&gt;
'''-snp_pval 0.01''' p-value threshold for defining a SNP. A lower threshold requires more evidence to call a SNP. &lt;br /&gt;
&lt;br /&gt;
'''-domaf 1''' estimate allele frequencies (used to call SNPs), with the major and minor alleles inferred from the data&lt;br /&gt;
&lt;br /&gt;
'''-domajorminor 1''' infer the major and minor allele from the data&lt;br /&gt;
&lt;br /&gt;
'''-gl 2''' use genotype likelihoods based on the GATK model&lt;br /&gt;
&lt;br /&gt;
'''-rmSNPs 1''' remove polymorphic sites. Instead of keeping the sites that are polymorphic, they are removed so that all outgroups have the same allele at the remaining sites.&lt;br /&gt;
&lt;br /&gt;
'''-minind 4''' remove sites where you do not have data for all four individuals&lt;br /&gt;
&lt;br /&gt;
'''-setMinDepthInd 10''' require at least 10 reads for each individual&lt;br /&gt;
&lt;br /&gt;
'''-explode 1''' make the fasta file for the whole genome, not just the chromosomes/scaffolds where you have data&lt;br /&gt;
&lt;br /&gt;
The sites that are polymorphic or do not have enough data will be labeled as 'N' in the fasta file&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3174</id>
		<title>Fasta</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3174"/>
		<updated>2023-02-06T16:45:47Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Ancestral fasta */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This option creates a fasta.gz file from a sequencing data file (BAM file). The function uses genome information in the BAM header to determine the length and chromosome names. For the sites without data an &amp;quot;N&amp;quot; is written. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Single BAM file{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);Highest EBD (-doFasta 3); write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Multiple BAM files{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This can be used as input for the ANGSD analysis:&lt;br /&gt;
# [[Error estimation]]&lt;br /&gt;
# [[ABBA-BABA]]&lt;br /&gt;
&lt;br /&gt;
The iupac output code was kindly provided by Kristian Ullrich.&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dofasta 	-&amp;gt; Tue Sep 26 17:02:07 2017&lt;br /&gt;
--------------&lt;br /&gt;
abcWriteFasta.cpp:&lt;br /&gt;
	-doFasta	0&lt;br /&gt;
	1: use a random (non N) base (needs -doCounts 1)&lt;br /&gt;
	2: use the most common (non N) base (needs -doCounts 1)&lt;br /&gt;
	3: use the base with highest ebd (under development) &lt;br /&gt;
	4: output iupac codes (under development) &lt;br /&gt;
	-basesPerLine	50	(Number of bases perline in output file)&lt;br /&gt;
	-explode	0	 print chromosome where we have no data (0:no,1:yes)&lt;br /&gt;
	-rmTrans	0	 remove transitions as different from -ref bases (0:no,1:yes)&lt;br /&gt;
	-ref	(null)	 reference fasta, only used with -rmTrans 1&lt;br /&gt;
	-seed	0	 use non random seed of value 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This function writes a fasta file using the full header information from the SAM/BAM file. This means that a fasta will be generated for the entire chromosome even if '-r/-rf -sites' is used.&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
;-doFasta 1: sample a random base at each position. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is required. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum count. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed in order to determine the most common base. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 3: use the base with the highest effective base depth (EBD). This only works for one individual.&lt;br /&gt;
&lt;br /&gt;
;-basesPerLine	[INT]&lt;br /&gt;
Number of bases per line in the output fasta file (default is 50)&lt;br /&gt;
&lt;br /&gt;
;-explode	[INT]	&lt;br /&gt;
0 (default) only output chromosomes with data. 1: write out all chromosomes &lt;br /&gt;
;-rmTrans [INT]&lt;br /&gt;
0 (default) all sites are used. 1: remove transitions. Here transitions are determined using a fasta file such as a reference genome. &lt;br /&gt;
;-ref [fileName]&lt;br /&gt;
A fasta file used to determine whether a site is a transition (needed when -rmTrans 1 is used)&lt;br /&gt;
;-seed [INT]&lt;br /&gt;
Use a seed in order to replicate results (relevant when sampling random bases with -doFasta 1)&lt;br /&gt;
&lt;br /&gt;
For filters see [[Filters]]&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
The output is an ordinary fasta file. For -doFasta 1 the sequence contains both upper-case and lower-case letters, because the bases are copied directly from the sequencing data and the case indicates which strand the original read mapped to. For the consensus fasta all letters are capital letters.&lt;br /&gt;
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a fasta file from a random sample of bases.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -i bams/smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=EBD=&lt;br /&gt;
Each of the four bases has its own EBD. For a given base, the EBD is the sum, over the reads carrying that base, of the product of the transformed mapping quality and the transformed base quality score.&lt;br /&gt;
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
EBD_b = \sum_{i=1}^{N_b} (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} \qquad b \in \{A,C,G,T\}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a certain base, &amp;lt;math&amp;gt;N_b&amp;lt;/math&amp;gt; is the number of reads with base &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Ancestral fasta using multiple outgroup=&lt;br /&gt;
If you have outgroup species mapped to your reference genome, you can use them to make a fasta file with ancestral alleles. You can use one or more outgroup individuals; e.g. for humans you could have four outgroup BAM files from a chimpanzee, a bonobo, a gorilla and an orangutan. Assuming you want a fasta file that only contains alleles that are the same for all outgroup species, you can use a command like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b fourOutgroup.bamlist -out myFasta -doCounts 1 -snp_pval 0.01 -domaf 1 -domajorminor 1 -gl 2 -rmSNPs 1 -minind 4 -setMinDepthInd 10 -explode 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''-b fourOutgroup.bamlist''' a list of the BAM files for the four outgroup individuals&lt;br /&gt;
&lt;br /&gt;
'''-out myFasta''' the output name&lt;br /&gt;
&lt;br /&gt;
'''-doCounts 1''' count bases across individuals to determine the consensus allele&lt;br /&gt;
&lt;br /&gt;
'''-snp_pval 0.01''' p-value threshold for defining a SNP. A lower threshold requires more evidence to call a SNP. &lt;br /&gt;
&lt;br /&gt;
'''-domaf 1''' estimate allele frequencies (used to call SNPs), with the major and minor alleles inferred from the data&lt;br /&gt;
&lt;br /&gt;
'''-domajorminor 1''' infer the major and minor allele from the data&lt;br /&gt;
&lt;br /&gt;
'''-gl 2''' use genotype likelihoods based on the GATK model&lt;br /&gt;
&lt;br /&gt;
'''-rmSNPs 1''' remove polymorphic sites. Instead of keeping the sites that are polymorphic, they are removed so that all outgroups have the same allele at the remaining sites.&lt;br /&gt;
&lt;br /&gt;
'''-minind 4''' remove sites where you do not have data for all four individuals&lt;br /&gt;
&lt;br /&gt;
'''-setMinDepthInd 10''' require at least 10 reads for each individual&lt;br /&gt;
&lt;br /&gt;
'''-explode 1''' make the fasta file for the whole genome, not just the chromosomes/scaffolds where you have data&lt;br /&gt;
&lt;br /&gt;
The sites that are polymorphic or do not have enough data will be labeled as 'N' in the fasta file&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3173</id>
		<title>Fasta</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3173"/>
		<updated>2023-02-06T16:44:52Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Options */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This option creates a fasta.gz file from a sequencing data file (BAM file). The function uses genome information in the BAM header to determine the length and chromosome names. For the sites without data an &amp;quot;N&amp;quot; is written. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Single BAM file{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);Highest EBD (-doFasta 3); write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Multiple BAM files{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This can be used as input for the ANGSD analysis:&lt;br /&gt;
# [[Error estimation]]&lt;br /&gt;
# [[ABBA-BABA]]&lt;br /&gt;
&lt;br /&gt;
The iupac output code was kindly provided by Kristian Ullrich.&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dofasta 	-&amp;gt; Tue Sep 26 17:02:07 2017&lt;br /&gt;
--------------&lt;br /&gt;
abcWriteFasta.cpp:&lt;br /&gt;
	-doFasta	0&lt;br /&gt;
	1: use a random (non N) base (needs -doCounts 1)&lt;br /&gt;
	2: use the most common (non N) base (needs -doCounts 1)&lt;br /&gt;
	3: use the base with highest ebd (under development) &lt;br /&gt;
	4: output iupac codes (under development) &lt;br /&gt;
	-basesPerLine	50	(Number of bases perline in output file)&lt;br /&gt;
	-explode	0	 print chromosome where we have no data (0:no,1:yes)&lt;br /&gt;
	-rmTrans	0	 remove transitions as different from -ref bases (0:no,1:yes)&lt;br /&gt;
	-ref	(null)	 reference fasta, only used with -rmTrans 1&lt;br /&gt;
	-seed	0	 use non random seed of value 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This function writes a fasta file using the full header information from the SAM/BAM file. This means that a fasta will be generated for the entire chromosome even if '-r/-rf -sites' is used.&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
;-doFasta 1: sample a random base at each position. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is required. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum count. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed in order to determine the most common base. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 3: use the base with the highest effective base depth (EBD). This only works for one individual.&lt;br /&gt;
&lt;br /&gt;
;-basesPerLine	[INT]&lt;br /&gt;
Number of bases per line in the output fasta file (default is 50)&lt;br /&gt;
&lt;br /&gt;
;-explode	[INT]	&lt;br /&gt;
0 (default) only output chromosomes with data. 1: write out all chromosomes &lt;br /&gt;
;-rmTrans [INT]&lt;br /&gt;
0 (default) all sites are used. 1: remove transitions. Here transitions are determined using a fasta file such as a reference genome. &lt;br /&gt;
;-ref [fileName]&lt;br /&gt;
A fasta file used to determine whether a site is a transition (needed when -rmTrans 1 is used)&lt;br /&gt;
;-seed [INT]&lt;br /&gt;
Use a seed in order to replicate results (relevant when sampling random bases with -doFasta 1)&lt;br /&gt;
&lt;br /&gt;
For filters see [[Filters]]&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
The output is an ordinary fasta file. For -doFasta 1 the sequence contains both upper-case and lower-case letters, because the bases are copied directly from the sequencing data and the case indicates which strand the original read mapped to. For the consensus fasta all letters are capital letters.&lt;br /&gt;
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a fasta file from a random sample of bases.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -i bams/smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=EBD=&lt;br /&gt;
Each of the four bases has its own EBD. For a given base, the EBD is the sum, over the reads carrying that base, of the product of the transformed mapping quality and the transformed base quality score.&lt;br /&gt;
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
EBD_b = \sum_{i=1}^{N_b} (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} \qquad b \in \{A,C,G,T\}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a certain base, &amp;lt;math&amp;gt;N_b&amp;lt;/math&amp;gt; is the number of reads with base &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Ancestral fasta=&lt;br /&gt;
If you have outgroup species mapped to your reference genome, you can use them to make a fasta file with ancestral alleles. You can use one or more outgroup individuals; e.g. for humans you could have four outgroup BAM files from a chimpanzee, a bonobo, a gorilla and an orangutan. Assuming you want a fasta file that only contains alleles that are the same for all outgroup species, you can use a command like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b fourOutgroup.bamlist -out myFasta -doCounts 1 -snp_pval 0.01 -domaf 1 -domajorminor 1 -gl 2 -rmSNPs 1 -minind 4 -setMinDepthInd 10 -explode 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''-b fourOutgroup.bamlist''' a list of the BAM files for the four outgroup individuals&lt;br /&gt;
&lt;br /&gt;
'''-out myFasta''' the output name&lt;br /&gt;
&lt;br /&gt;
'''-doCounts 1''' count bases across individuals to determine the consensus allele&lt;br /&gt;
&lt;br /&gt;
'''-snp_pval 0.01''' p-value threshold for defining a SNP. A lower threshold requires more evidence to call a SNP. &lt;br /&gt;
&lt;br /&gt;
'''-domaf 1''' estimate allele frequencies (used to call SNPs), with the major and minor alleles inferred from the data&lt;br /&gt;
&lt;br /&gt;
'''-domajorminor 1''' infer the major and minor allele from the data&lt;br /&gt;
&lt;br /&gt;
'''-gl 2''' use genotype likelihoods based on the GATK model&lt;br /&gt;
&lt;br /&gt;
'''-rmSNPs 1''' remove polymorphic sites. Instead of keeping the sites that are polymorphic, they are removed so that all outgroups have the same allele at the remaining sites.&lt;br /&gt;
&lt;br /&gt;
'''-minind 4''' remove sites where you do not have data for all four individuals&lt;br /&gt;
&lt;br /&gt;
'''-setMinDepthInd 10''' require at least 10 reads for each individual&lt;br /&gt;
&lt;br /&gt;
'''-explode 1''' make the fasta file for the whole genome, not just the chromosomes/scaffolds where you have data&lt;br /&gt;
&lt;br /&gt;
The sites that are polymorphic or do not have enough data will be labeled as 'N' in the fasta file&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3172</id>
		<title>Fasta</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3172"/>
		<updated>2023-02-06T16:43:52Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Ancestral fasta */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This option creates a fasta.gz file from a sequencing data file (BAM file). The function uses genome information in the BAM header to determine the length and chromosome names. For the sites without data an &amp;quot;N&amp;quot; is written. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Single BAM file{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);Highest EBD (-doFasta 3); write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Multiple BAM files{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This can be used as input for the ANGSD analysis:&lt;br /&gt;
# [[Error estimation]]&lt;br /&gt;
# [[ABBA-BABA]]&lt;br /&gt;
&lt;br /&gt;
The iupac output code was kindly provided by Kristian Ullrich.&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dofasta 	-&amp;gt; Tue Sep 26 17:02:07 2017&lt;br /&gt;
--------------&lt;br /&gt;
abcWriteFasta.cpp:&lt;br /&gt;
	-doFasta	0&lt;br /&gt;
	1: use a random (non N) base (needs -doCounts 1)&lt;br /&gt;
	2: use the most common (non N) base (needs -doCounts 1)&lt;br /&gt;
	3: use the base with highest ebd (under development) &lt;br /&gt;
	4: output iupac codes (under development) &lt;br /&gt;
	-basesPerLine	50	(Number of bases perline in output file)&lt;br /&gt;
	-explode	0	 print chromosome where we have no data (0:no,1:yes)&lt;br /&gt;
	-rmTrans	0	 remove transitions as different from -ref bases (0:no,1:yes)&lt;br /&gt;
	-ref	(null)	 reference fasta, only used with -rmTrans 1&lt;br /&gt;
	-seed	0	 use non random seed of value 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This function writes a fasta file using the full header information from the SAM/BAM file. This means that a fasta will be generated for the entire chromosome even if '-r/-rf -sites' is used.&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
;-doFasta 1: sample a random base at each position. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is required. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum count. Ns and filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed in order to determine the most common base. If multiple individuals are used the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 3: use the base with the highest effective base depth (EBD). This only works for one individual.&lt;br /&gt;
&lt;br /&gt;
;-basesPerLine	[INT]&lt;br /&gt;
Number of bases per line in the output fasta file (default is 50)&lt;br /&gt;
&lt;br /&gt;
;-explode	[INT]	&lt;br /&gt;
0 (default) only output chromosomes with data. 1: write out all chromosomes &lt;br /&gt;
;-rmTrans [INT]&lt;br /&gt;
0 (default) all sites are used. 1: remove transitions. Here transitions are determined using a fasta file such as a reference genome. &lt;br /&gt;
;-ref [fileName]&lt;br /&gt;
A fasta file used to determine whether a site is a transition (needed when -rmTrans 1 is used)&lt;br /&gt;
;-seed [INT]&lt;br /&gt;
Use a seed in order to replicate results&lt;br /&gt;
&lt;br /&gt;
For filters see [[Filters]]&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
The output is an ordinary fasta file. For -doFasta 1 the sequence contains both upper-case and lower-case letters, because the bases are copied directly from the sequencing data and the case indicates which strand the original read mapped to. For the consensus fasta all letters are capital letters.&lt;br /&gt;
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a fasta file from a random sample of bases.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -i bams/smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=EBD=&lt;br /&gt;
Each of the four bases has its own EBD. For a given base, the EBD is the sum, over the reads carrying that base, of the product of the transformed mapping quality and the transformed base quality score.&lt;br /&gt;
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
EBD_b = \sum_{i=1}^{N_b} (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} \qquad b \in \{A,C,G,T\}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a certain base, &amp;lt;math&amp;gt;N_b&amp;lt;/math&amp;gt; is the number of reads with base &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Ancestral fasta=&lt;br /&gt;
If you have outgroup species mapped to your reference genome, you can use them to make a fasta file with ancestral alleles. You can use one or more outgroup individuals; e.g. for humans you could have four outgroup BAM files from a chimpanzee, a bonobo, a gorilla and an orangutan. Assuming you want a fasta file that only contains alleles that are the same for all outgroup species, you can use a command like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b fourOutgroup.bamlist -out myFasta -doCounts 1 -snp_pval 0.01 -domaf 1 -domajorminor 1 -gl 2 -rmSNPs 1 -minind 4 -setMinDepthInd 10 -explode 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''-b fourOutgroup.bamlist''' a list of the BAM files for the four outgroup individuals&lt;br /&gt;
&lt;br /&gt;
'''-out myFasta''' the output name&lt;br /&gt;
&lt;br /&gt;
'''-doCounts 1''' count bases across individuals to determine the consensus allele&lt;br /&gt;
&lt;br /&gt;
'''-snp_pval 0.01''' p-value threshold for defining a SNP. A lower threshold requires more evidence to call a SNP. &lt;br /&gt;
&lt;br /&gt;
'''-domaf 1''' estimate allele frequencies (used to call SNPs), with the major and minor alleles inferred from the data&lt;br /&gt;
&lt;br /&gt;
'''-domajorminor 1''' infer the major and minor allele from the data&lt;br /&gt;
&lt;br /&gt;
'''-gl 2''' use genotype likelihoods based on the GATK model&lt;br /&gt;
&lt;br /&gt;
'''-rmSNPs 1''' remove polymorphic sites. Instead of keeping the sites that are polymorphic, they are removed so that all outgroups have the same allele at the remaining sites.&lt;br /&gt;
&lt;br /&gt;
'''-minind 4''' remove sites where you do not have data for all four individuals&lt;br /&gt;
&lt;br /&gt;
'''-setMinDepthInd 10''' require at least 10 reads for each individual&lt;br /&gt;
&lt;br /&gt;
'''-explode 1''' make the fasta file for the whole genome, not just the chromosomes/scaffolds where you have data&lt;br /&gt;
&lt;br /&gt;
The sites that are polymorphic or do not have enough data will be labeled as 'N' in the fasta file&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3171</id>
		<title>Fasta</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3171"/>
		<updated>2023-02-06T16:41:43Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Ancestral fasta */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This option creates a fasta.gz file from a sequencing data file (BAM file). The function uses genome information in the BAM header to determine the length and chromosome names. For the sites without data an &amp;quot;N&amp;quot; is written. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Single BAM file{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);Highest EBD (-doFasta 3); write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Multiple BAM files{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This can be used as input for the ANGSD analysis:&lt;br /&gt;
# [[Error estimation]]&lt;br /&gt;
# [[ABBA-BABA]]&lt;br /&gt;
&lt;br /&gt;
The iupac output code was kindly provided by Kristian Ullrich.&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dofasta 	-&amp;gt; Tue Sep 26 17:02:07 2017&lt;br /&gt;
--------------&lt;br /&gt;
abcWriteFasta.cpp:&lt;br /&gt;
	-doFasta	0&lt;br /&gt;
	1: use a random (non N) base (needs -doCounts 1)&lt;br /&gt;
	2: use the most common (non N) base (needs -doCounts 1)&lt;br /&gt;
	3: use the base with highest ebd (under development) &lt;br /&gt;
	4: output iupac codes (under development) &lt;br /&gt;
	-basesPerLine	50	(Number of bases perline in output file)&lt;br /&gt;
	-explode	0	 print chromosome where we have no data (0:no,1:yes)&lt;br /&gt;
	-rmTrans	0	 remove transitions as different from -ref bases (0:no,1:yes)&lt;br /&gt;
	-ref	(null)	 reference fasta, only used with -rmTrans 1&lt;br /&gt;
	-seed	0	 use non random seed of value 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This function will dump a fasta file using the full header information from the SAM/BAM file. This means that a fasta will be generated for the entire chromosome even if '-r/-rf -sites' is used.&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
;-doFasta 1: sample a random base at each position. N's or filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed to provide the base counts. If multiple individuals are used, the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum count. N's or filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed in order to determine the most common base. If multiple individuals are used, the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 3: use the base with the highest effective base depth (EBD). This only works for one individual.&lt;br /&gt;
&lt;br /&gt;
;-basesPerLine	[INT]&lt;br /&gt;
Number of bases per line in the output fasta file (default is 50)&lt;br /&gt;
&lt;br /&gt;
;-explode	[INT]	&lt;br /&gt;
0 (default) only output chromosomes with data. 1: write out all chromosomes &lt;br /&gt;
;-rmTrans [INT]&lt;br /&gt;
0 (default): all sites are used. 1: remove transitions. Here transitions are determined using a fasta file such as a reference genome. &lt;br /&gt;
;-ref [fileName]&lt;br /&gt;
a fasta file used to determine if a site is a transition (needed when -rmTrans 1 is used)&lt;br /&gt;
;-seed [INT]&lt;br /&gt;
Use a seed in order to replicate results&lt;br /&gt;
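&lt;br /&gt;
As an illustration of what -rmTrans treats as a transition: a transition is a difference between two purines (A and G) or between two pyrimidines (C and T), here judged against the -ref base. The sketch below is not ANGSD's code; the function name and the upper-casing of the input are assumptions for this example.&lt;br /&gt;

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref_base, obs_base):
    """True when ref_base and obs_base differ within the same chemical class."""
    ref_base, obs_base = ref_base.upper(), obs_base.upper()
    if ref_base == obs_base:
        return False
    pair = {ref_base, obs_base}
    return pair == PURINES or pair == PYRIMIDINES
```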
&lt;br /&gt;
For filters see [[Filters]]&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Output is a normal-looking fasta file; there is nothing special about it. For -doFasta 1 the letters are sometimes upper case and sometimes lower case. This is because the bases are copied directly from the sequencing data, so the case corresponds to the strand of the original read. For the consensus fasta all letters are upper case.&lt;br /&gt;
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a fasta file from a random sample of bases.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -i bams/smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=EBD=&lt;br /&gt;
There is one EBD for each of the four bases. Each EBD is a sum, over the reads carrying that base, of the product of the transformed mapping quality and base quality score.&lt;br /&gt;
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
EBD_b = \sum_{i=1}^{N_b} (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} \qquad b \in \{A,C,G,T\}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a certain base, &amp;lt;math&amp;gt;N_b&amp;lt;/math&amp;gt; is the number of reads with base &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Ancestral fasta=&lt;br /&gt;
If you have outgroup species mapped to your reference genome, you can use them to make a fasta file with ancestral alleles. You can use one or more outgroup individuals, e.g. for human you could have a bam list with four outgroup bam files from a chimp, a bonobo, a gorilla and an orangutan. Assuming you want a fasta file that only contains sites where the allele is the same in all outgroup species, you can use a command like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b fourOutgroup.bamlist -out myFasta -doCounts 1 -snp_pval 0.01 -domaf 1 -domajorminor 1 -gl 2 -rmSNPs 1 -minind 4 -setMinDepthInd 10 -explode 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''-b fourOutgroup.bamlist''' contains the bam files for four outgroup individuals.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''-out myFasta''' the output name&lt;br /&gt;
&lt;br /&gt;
'''-doCounts 1''' counts bases across individuals to determine the consensus allele&lt;br /&gt;
&lt;br /&gt;
'''-snp_pval 0.01''' p-value threshold for defining a SNP. A lower threshold requires more evidence to call a SNP. &lt;br /&gt;
&lt;br /&gt;
''' -domaf 1''' estimate allele frequencies (used to call SNPs), with the major and minor alleles inferred from the data&lt;br /&gt;
&lt;br /&gt;
'''-domajorminor 1''' infer the major and minor allele from data&lt;br /&gt;
&lt;br /&gt;
'''-gl 2'''  use genotype likelihoods based on the GATK model&lt;br /&gt;
&lt;br /&gt;
''' -rmSNPs 1''' remove polymorphic sites. Instead of keeping the sites that are polymorphic, we remove them, so that all outgroups have the same allele at the remaining sites.  &lt;br /&gt;
&lt;br /&gt;
''' -minind 4''' remove sites where you do not have data for all four individuals&lt;br /&gt;
 &lt;br /&gt;
'''-setMinDepthInd 10''' require at least 10 reads for each individual&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3170</id>
		<title>Fasta</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Fasta&amp;diff=3170"/>
		<updated>2023-02-06T16:32:53Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
This option creates a fasta.gz file from a sequencing data file (BAM file). The function uses genome information in the BAM header to determine the length and chromosome names. For the sites without data an &amp;quot;N&amp;quot; is written. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Single BAM file{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);Highest EBD (-doFasta 3); write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
 [Multiple BAM files{bg:orange}]-&amp;gt;[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2);write iupac (-doFasta 4)]&lt;br /&gt;
[sequence data]-&amp;gt;doFasta[fasta file{bg:blue}]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This can be used as input for the ANGSD analysis:&lt;br /&gt;
# [[Error estimation]]&lt;br /&gt;
# [[ABBA-BABA]]&lt;br /&gt;
&lt;br /&gt;
The iupac output code was kindly provided by Kristian Ullrich.&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dofasta 	-&amp;gt; Tue Sep 26 17:02:07 2017&lt;br /&gt;
--------------&lt;br /&gt;
abcWriteFasta.cpp:&lt;br /&gt;
	-doFasta	0&lt;br /&gt;
	1: use a random (non N) base (needs -doCounts 1)&lt;br /&gt;
	2: use the most common (non N) base (needs -doCounts 1)&lt;br /&gt;
	3: use the base with highest ebd (under development) &lt;br /&gt;
	4: output iupac codes (under development) &lt;br /&gt;
	-basesPerLine	50	(Number of bases perline in output file)&lt;br /&gt;
	-explode	0	 print chromosome where we have no data (0:no,1:yes)&lt;br /&gt;
	-rmTrans	0	 remove transitions as different from -ref bases (0:no,1:yes)&lt;br /&gt;
	-ref	(null)	 reference fasta, only used with -rmTrans 1&lt;br /&gt;
	-seed	0	 use non random seed of value 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This function will dump a fasta file using the full header information from the SAM/BAM file. This means that a fasta will be generated for the entire chromosome even if '-r/-rf -sites' is used.&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
;-doFasta 1: sample a random base at each position. N's or filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed to provide the base counts. If multiple individuals are used, the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum count. N's or filtered bases are ignored. The &amp;quot;-doCounts 1&amp;quot; option for [[Alleles_counts|allele counts]] is needed in order to determine the most common base. If multiple individuals are used, the four bases are counted across individuals. &lt;br /&gt;
&lt;br /&gt;
;-doFasta 3: use the base with the highest effective base depth (EBD). This only works for one individual.&lt;br /&gt;
&lt;br /&gt;
;-basesPerLine	[INT]&lt;br /&gt;
Number of bases per line in the output fasta file (default is 50)&lt;br /&gt;
&lt;br /&gt;
;-explode	[INT]	&lt;br /&gt;
0 (default) only output chromosomes with data. 1: write out all chromosomes &lt;br /&gt;
;-rmTrans [INT]&lt;br /&gt;
0 (default): all sites are used. 1: remove transitions. Here transitions are determined using a fasta file such as a reference genome. &lt;br /&gt;
;-ref [fileName]&lt;br /&gt;
a fasta file used to determine if a site is a transition (needed when -rmTrans 1 is used)&lt;br /&gt;
;-seed [INT]&lt;br /&gt;
Use a seed in order to replicate results&lt;br /&gt;
&lt;br /&gt;
For filters see [[Filters]]&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Output is a normal-looking fasta file; there is nothing special about it. For -doFasta 1 the letters are sometimes upper case and sometimes lower case. This is because the bases are copied directly from the sequencing data, so the case corresponds to the strand of the original read. For the consensus fasta all letters are upper case.&lt;br /&gt;
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a fasta file from a random sample of bases.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -i bams/smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=EBD=&lt;br /&gt;
There is one EBD for each of the four bases. Each EBD is a sum, over the reads carrying that base, of the product of the transformed mapping quality and base quality score.&lt;br /&gt;
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
EBD_b = \sum_{i=1}^{N_b} (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} \qquad b \in \{A,C,G,T\}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a certain base, &amp;lt;math&amp;gt;N_b&amp;lt;/math&amp;gt; is the number of reads with base &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Ancestral fasta=&lt;br /&gt;
If you have outgroup species mapped to your reference genome, you can use them to make a fasta file with ancestral alleles. You can use one or more outgroup individuals, e.g. for human you could have a bam list with four outgroup bam files from a chimp, a bonobo, a gorilla and an orangutan. Assuming you want a fasta file that only contains sites where the allele is the same in all outgroup species, you can use a command like&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b fourOutgroup.bamlist -out myFasta -doCounts 1 -snp_pval 0.01 -domaf 1 -domajorminor 1 -gl 2 -rmSNPs 1 -minind 4 -setMinDepthInd 10 -explode 1&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
here fourOutgroup.bamlist contains the bam files for four outgroup individuals.&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3169</id>
		<title>Allele Frequencies</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3169"/>
		<updated>2023-02-06T16:22:40Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Brief Overview */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div class=&amp;quot;keywords&amp;quot;&amp;gt; -domaf,-domaf,-domaf,-domaf,-domaf, domaf, domaf, domaf, domaf, domaf, domaf, dopost, SNP_pval &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The allele frequency is the relative frequency of an allele at a site. It can be polarized according to major/minor, reference/non-reference or ancestral/derived alleles. Therefore the choice of allele frequency estimator is closely related to choosing which alleles are segregating (see [[Inferring_Major_and_Minor_alleles]]). &lt;br /&gt;
&lt;br /&gt;
We allow for frequency estimation from different input data:&lt;br /&gt;
&lt;br /&gt;
# Genotype Likelihoods&lt;br /&gt;
# Genotype posterior probabilities&lt;br /&gt;
# Counts of bases&lt;br /&gt;
&lt;br /&gt;
The allele frequency estimator based on genotype likelihoods is from this  [[suYeon | publication]], and the base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
For the case of the genotype likelihood based methods we allow for deviations from Hardy-Weinberg, namely we allow for users to supply a file containing inbreeding coefficients for each individual.&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 ./angsd -doMaf&lt;br /&gt;
abcFreq.cpp:&lt;br /&gt;
-doMaf	0 (Calculate persite frequencies '.mafs.gz')&lt;br /&gt;
	1: Frequency (fixed major and minor)&lt;br /&gt;
	2: Frequency (fixed major unknown minor)&lt;br /&gt;
	4: Frequency from genotype probabilities&lt;br /&gt;
	8: AlleleCounts based method (known major minor)&lt;br /&gt;
	NB. Filedumping is supressed if value is negative&lt;br /&gt;
-doPost	0	(Calculate posterior prob 3xgprob)&lt;br /&gt;
	1: Using frequency as prior&lt;br /&gt;
	2: Using uniform prior&lt;br /&gt;
	3: Using SFS as prior (still in development)&lt;br /&gt;
	4: Using reference panel as prior (still in development), requires a site file with chr pos major minor af ac an&lt;br /&gt;
Filters:&lt;br /&gt;
	-minMaf  	-1.000000	(Remove sites with MAF below)&lt;br /&gt;
	-SNP_pval	0.317311	(Remove sites with a pvalue larger)&lt;br /&gt;
	-rmSNPs 	0	(Remove infered SNPs instead of keeping them (pval &amp;gt; SNP_pval)&lt;br /&gt;
	-rmTriallelic	0.000000	(Remove sites with a pvalue lower)&lt;br /&gt;
	-forceMaf	0	(Write .mafs file when running -doAsso (by default does not output .mafs file with -doAsso))&lt;br /&gt;
	-skipMissing	1	(Set post to 0.33 if missing (do not use freq as prior))&lt;br /&gt;
Extras:&lt;br /&gt;
	-ref	(null)	(Filename for fasta reference)&lt;br /&gt;
	-anc	(null)	(Filename for fasta ancestral)&lt;br /&gt;
	-eps	0.001000 [Only used for -doMaf &amp;amp;8]&lt;br /&gt;
	-beagleProb	0 (Dump beagle style postprobs)&lt;br /&gt;
	-indFname	(null) (file containing individual inbreedcoeficients)&lt;br /&gt;
	-underFlowProtect	0 (file containing individual inbreedcoeficients)&lt;br /&gt;
NB These frequency estimators requires major/minor -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Allele Frequency estimation=&lt;br /&gt;
The major and minor alleles are first inferred from the data or given by the user (see [[Inferring_Major_and_Minor_alleles]]). They can be based on both alleles inferred from the data, on a reference genome (for the major allele) or on an ancestral genome. &lt;br /&gt;
&lt;br /&gt;
; -doMaf [int]&lt;br /&gt;
&lt;br /&gt;
1:  Known major and known minor. Here both the major and minor alleles are assumed to be known (inferred or given by the user). The allele frequency is then estimated from the genotype likelihoods. The estimator is from this [[suYeon | publication]], but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
2:  Known major, unknown minor. Here the major allele is assumed to be known (inferred or given by the user), but the minor allele is not determined. Instead we sum over the 3 possible minor alleles weighted by their probabilities. The estimator is from this [[suYeon | publication]], but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
4: Frequency based on genotype posterior probabilities. If genotype probabilities are used as input to ANGSD, the allele frequency is estimated directly from these by [[postFreq|summing over the probabilities]]. &lt;br /&gt;
&lt;br /&gt;
8: Frequency based on base counts. This method does not rely on genotype likelihoods or probabilities but instead infers the allele frequency directly from the base counts. The base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
Multiple estimators can be used simultaneously by summing the above numbers. Thus -doMaf 7 (1+2+4) will use the first three estimators. If the allele frequencies are estimated from the genotype likelihoods, you need to infer the major and minor alleles (-doMajorMinor).&lt;br /&gt;
&lt;br /&gt;
;NB: -doMaf 4 is only supported if the posteriors are supplied as external files, since the estimation of genotype posteriors itself requires a maf estimator.&lt;br /&gt;
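&lt;br /&gt;
Because the values 1, 2, 4 and 8 are distinct powers of two, a summed -doMaf value identifies exactly one set of estimators. The sketch below only illustrates that arithmetic; the dictionary and function name are made up for this example and are not part of ANGSD.&lt;br /&gt;

```python
# Labels paraphrased from the -doMaf help text above
ESTIMATORS = {
    1: "known major, known minor (genotype likelihoods)",
    2: "known major, unknown minor (genotype likelihoods)",
    4: "genotype posterior probabilities",
    8: "base counts",
}


def decode_domaf(value):
    """Return the estimators selected by a summed -doMaf value."""
    value = abs(value)  # per the help text, a negative value only suppresses file dumping
    return [name for flag, name in ESTIMATORS.items() if (value // flag) % 2 == 1]


selected = decode_domaf(7)  # 1 + 2 + 4
```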
&lt;br /&gt;
=Example=&lt;br /&gt;
&lt;br /&gt;
==From genotype likelihood==&lt;br /&gt;
Example of estimating the allele frequencies both while assuming known major and minor alleles and while taking the uncertainty of the minor allele inference into account. The [[Inferring_Major_and_Minor_alleles|inference of the major and minor]] allele is done directly from the genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 1 -doMaf 3 -bam bam.filelist -GL 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==From genotype probabilities==&lt;br /&gt;
Example using a genotype probability file, for example the output from beagle. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMaf 4 -beagle beagle.file.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Estimator from base counts==&lt;br /&gt;
&lt;br /&gt;
The allele frequencies can be inferred directly from the sequencing data [[Li2010|citation]].&lt;br /&gt;
This works by using &amp;quot;counts&amp;quot; of alleles, and should be invoked like&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 2 -doMaf 8 -bam bam.filelist -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Output data=&lt;br /&gt;
==.mafs.gz==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chromo	position	major	minor	ref	knownEM	unknownEM	nInd&lt;br /&gt;
21      9719788 T       A       0.000001        -0.000012       3&lt;br /&gt;
21      9719789 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719790 A       C       0.000000        -0.000004       3&lt;br /&gt;
21      9719791 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719792 G       A       0.000000        -0.000002       3&lt;br /&gt;
21      9719793 G       T       0.498277        41.932766       3&lt;br /&gt;
21      9719794 T       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719795 T       A       0.000000        -0.000001       3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;chromo &lt;br /&gt;
chromosome name&lt;br /&gt;
;position&lt;br /&gt;
position&lt;br /&gt;
;major &lt;br /&gt;
major allele&lt;br /&gt;
;minor &lt;br /&gt;
minor allele&lt;br /&gt;
;knownEM &lt;br /&gt;
frequency using -doMaf 1&lt;br /&gt;
;unknownEM &lt;br /&gt;
frequency using -doMaf 2&lt;br /&gt;
;phat &lt;br /&gt;
frequency using -doMaf 8&lt;br /&gt;
;nInd &lt;br /&gt;
is the number of individuals with data&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3152</id>
		<title>Allele Frequencies</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3152"/>
		<updated>2021-09-28T09:00:16Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div class=&amp;quot;keywords&amp;quot;&amp;gt; -domaf,-domaf,-domaf,-domaf,-domaf, domaf, domaf, domaf, domaf, domaf, domaf, dopost, SNP_pval &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The allele frequency is the relative frequency of an allele at a site. It can be polarized according to major/minor, reference/non-reference or ancestral/derived alleles. Therefore the choice of allele frequency estimator is closely related to choosing which alleles are segregating (see [[Inferring_Major_and_Minor_alleles]]). &lt;br /&gt;
&lt;br /&gt;
We allow for frequency estimation from different input data:&lt;br /&gt;
&lt;br /&gt;
# Genotype Likelihoods&lt;br /&gt;
# Genotype posterior probabilities&lt;br /&gt;
# Counts of bases&lt;br /&gt;
&lt;br /&gt;
The allele frequency estimator based on genotype likelihoods is from this  [[suYeon | publication]], and the base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
For the case of the genotype likelihood based methods we allow for deviations from Hardy-Weinberg, namely we allow for users to supply a file containing inbreeding coefficients for each individual.&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 ./angsd -doMaf&lt;br /&gt;
        -&amp;gt; angsd version: 0.910-76-gad32889 (htslib: 1.3-32-gecdc348) build(Mar  2 2016 12:38:33)&lt;br /&gt;
        -&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
        -&amp;gt; Command: &lt;br /&gt;
./angsd -doMaf  -&amp;gt; Wed Mar  2 12:45:40 2016&lt;br /&gt;
------------------------&lt;br /&gt;
abcFreq.cpp:&lt;br /&gt;
-doMaf  0 (Calculate persite frequencies '.mafs.gz')&lt;br /&gt;
        1: Frequency (fixed major and minor)&lt;br /&gt;
        2: Frequency (fixed major unknown minor)&lt;br /&gt;
        4: Frequency from genotype probabilities&lt;br /&gt;
        8: AlleleCounts based method (known major minor)&lt;br /&gt;
        NB. Filedumping is supressed if value is negative&lt;br /&gt;
-doPost 0       (Calculate posterior prob 3xgprob)&lt;br /&gt;
        1: Using frequency as prior&lt;br /&gt;
        2: Using uniform prior&lt;br /&gt;
        3: Using SFS as prior (still in development)&lt;br /&gt;
Filters:&lt;br /&gt;
        -minMaf         -1.000000       (Remove sites with MAF below)&lt;br /&gt;
        -SNP_pval       1.000000        (Remove sites with a pvalue larger)&lt;br /&gt;
        -rmTriallelic   0.000000        (Remove sites with a pvalue lower)&lt;br /&gt;
Extras:&lt;br /&gt;
        -ref    (null)  (Filename for fasta reference)&lt;br /&gt;
        -anc    (null)  (Filename for fasta ancestral)&lt;br /&gt;
        -eps    0.001000 [Only used for -doMaf &amp;amp;8]&lt;br /&gt;
        -beagleProb     0 (Dump beagle style postprobs)&lt;br /&gt;
        -indFname       (null) (file containing individual inbreedcoeficients)&lt;br /&gt;
NB These frequency estimators requires major/minor -doMajorMinor&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Allele Frequency estimation=&lt;br /&gt;
The major and minor alleles are first inferred from the data or given by the user (see [[Inferring_Major_and_Minor_alleles]]). They can be based on both alleles inferred from the data, on a reference genome (for the major allele) or on an ancestral genome. &lt;br /&gt;
&lt;br /&gt;
; -doMaf [int]&lt;br /&gt;
&lt;br /&gt;
1:  Known major and known minor. Here both the major and minor alleles are assumed to be known (inferred or given by the user). The allele frequency is then estimated from the genotype likelihoods. The estimator is from this [[suYeon | publication]], but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
2:  Known major, unknown minor. Here the major allele is assumed to be known (inferred or given by the user), but the minor allele is not determined. Instead we sum over the 3 possible minor alleles weighted by their probabilities. The estimator is from this [[suYeon | publication]], but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
4: Frequency based on genotype posterior probabilities. If genotype probabilities are used as input to ANGSD, the allele frequency is estimated directly from these by [[postFreq|summing over the probabilities]]. &lt;br /&gt;
&lt;br /&gt;
8: Frequency based on base counts. This method does not rely on genotype likelihoods or probabilities but instead infers the allele frequency directly from the base counts. The base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
Multiple estimators can be used simultaneously by summing the above numbers. Thus -doMaf 7 (1+2+4) will use the first three estimators. If the allele frequencies are estimated from the genotype likelihoods, you need to infer the major and minor alleles (-doMajorMinor).&lt;br /&gt;
&lt;br /&gt;
;NB: -doMaf 4 is only supported if the posteriors are supplied as external files, since the estimation of genotype posteriors itself requires a maf estimator.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
&lt;br /&gt;
==From genotype likelihood==&lt;br /&gt;
Example of estimating the allele frequencies both while assuming known major and minor alleles and while taking the uncertainty of the minor allele inference into account. The [[Inferring_Major_and_Minor_alleles|inference of the major and minor]] allele is done directly from the genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 1 -doMaf 3 -bam bam.filelist -GL 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==From genotype probabilities==&lt;br /&gt;
Example using a genotype probability file, for example the output from beagle. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMaf 4 -beagle beagle.file.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Estimator from base counts==&lt;br /&gt;
&lt;br /&gt;
The allele frequencies can be inferred directly from the sequencing data [[Li2010|citation]].&lt;br /&gt;
This works by using &amp;quot;counts&amp;quot; of alleles, and should be invoked like&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 2 -doMaf 8 -bam bam.filelist -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Output data=&lt;br /&gt;
==.mafs.gz==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chromo	position	major	minor	ref	knownEM	unknownEM	nInd&lt;br /&gt;
21      9719788 T       A       0.000001        -0.000012       3&lt;br /&gt;
21      9719789 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719790 A       C       0.000000        -0.000004       3&lt;br /&gt;
21      9719791 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719792 G       A       0.000000        -0.000002       3&lt;br /&gt;
21      9719793 G       T       0.498277        41.932766       3&lt;br /&gt;
21      9719794 T       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719795 T       A       0.000000        -0.000001       3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;chromo &lt;br /&gt;
chromosome name&lt;br /&gt;
;position&lt;br /&gt;
position&lt;br /&gt;
;major &lt;br /&gt;
major allele&lt;br /&gt;
;minor &lt;br /&gt;
minor allele&lt;br /&gt;
;knownEM &lt;br /&gt;
frequency using -doMaf 1&lt;br /&gt;
;unknownEM &lt;br /&gt;
frequency using -doMaf 2&lt;br /&gt;
;phat &lt;br /&gt;
frequency using -doMaf 8&lt;br /&gt;
;nInd &lt;br /&gt;
is the number of individuals with data&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=SNP_calling&amp;diff=3151</id>
		<title>SNP calling</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=SNP_calling&amp;diff=3151"/>
		<updated>2021-09-28T08:59:02Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=SNP Calling=&lt;br /&gt;
&lt;br /&gt;
==Likelihood ratio test==&lt;br /&gt;
SNPs are called based on their allele frequencies. If a site has a minor allele frequency significantly different from 0, the site is called as polymorphic. The MAF estimate(s) given by -doMaf (see [[Allele_Frequency_estimation]]) are used in a likelihood ratio test, using a chi-square distribution with one degree of freedom for -doMaf 1 and -doMaf 2.&lt;br /&gt;
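&lt;br /&gt;
The p-value of such a test follows from the upper tail of a chi-square distribution with one degree of freedom, which has the closed form erfc(sqrt(x/2)). The sketch below illustrates that step only and is not ANGSD's code; the function names are assumptions, and lrt stands for a likelihood ratio statistic that has already been computed.&lt;br /&gt;

```python
import math


def chi2_1df_pvalue(lrt):
    """Upper tail probability of a chi-square distribution with 1 degree of freedom."""
    if lrt > 0:
        return math.erfc(math.sqrt(lrt / 2.0))
    return 1.0  # a statistic of zero (or below, from rounding) gives no evidence


def is_snp(lrt, snp_pval):
    """Call a site polymorphic when the p-value is below the chosen threshold."""
    return snp_pval > chi2_1df_pvalue(lrt)


p = chi2_1df_pvalue(3.841458820694124)  # close to 0.05
```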
&lt;br /&gt;
===options===&lt;br /&gt;
; -SNP_pval [float]&lt;br /&gt;
The p-value threshold used for calling SNPs.&lt;br /&gt;
See [[Allele_Frequency_estimation]] for additional options.&lt;br /&gt;
&lt;br /&gt;
===example===&lt;br /&gt;
In this example we analyse data from BAM files (-bam bam.filelist), calculate the genotype likelihoods using the GATK method (-GL 2), infer the major and minor alleles (-doMajorMinor 1), estimate the allele frequencies with the minor allele treated as unknown (-doMaf 2) and only keep those sites that have a p-value of less than 1e-6 for being variable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 2 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===output===&lt;br /&gt;
The results are written to the file outfile.mafs.gz:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chromo  position        major   minor   unknownEM       pu-EM   nInd&lt;br /&gt;
1       14000873        G       A       0.282476        0.000000e+00    10&lt;br /&gt;
1       14001018        T       C       0.259890        7.494005e-14    9&lt;br /&gt;
1       14001867        A       G       0.272099        6.361578e-14    10&lt;br /&gt;
1       14002422        A       T       0.377890        0.000000e+00    9&lt;br /&gt;
1       14003581        C       T       0.194393        5.551115e-16    9&lt;br /&gt;
1       14004623        T       C       0.259172        2.424727e-13    10&lt;br /&gt;
1       14007493        A       G       0.297176        5.114086e-07    9&lt;br /&gt;
1       14007558        C       T       0.381770        0.000000e+00    8&lt;br /&gt;
1       14007649        G       A       0.220547        1.054967e-11    9&lt;br /&gt;
1       14008734        T       A       0.242852        0.000000e+00    10&lt;br /&gt;
1       14009723        G       C       0.255063        2.470836e-07    10&lt;br /&gt;
1       14010597        G       A       0.315430        0.000000e+00    10&lt;br /&gt;
1       14010851        C       A       0.276936        0.000000e+00    10&lt;br /&gt;
1       14012240        C       T       0.297956        0.000000e+00    10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The columns are the chromosome, the position, the major allele, the minor allele, the estimated allele frequency, the p-value and the number of individuals with information.&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Genotype_likelihoods_from_alignments_new&amp;diff=3150</id>
		<title>Genotype likelihoods from alignments new</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Genotype_likelihoods_from_alignments_new&amp;diff=3150"/>
		<updated>2021-09-28T08:57:20Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Genotype likelihoods are the likelihood of the data given the genotype. In angsd we have implemented four different genotype likelihood models.&lt;br /&gt;
#SAMtools&lt;br /&gt;
#GATK (Simplified)&lt;br /&gt;
#SOAPsnp&lt;br /&gt;
#Su Yeon Kim&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
calcGL.cpp:&lt;br /&gt;
	-calcGL=0: &lt;br /&gt;
	1: SAMtools&lt;br /&gt;
	2: GATK&lt;br /&gt;
	3: SOAPsnp&lt;br /&gt;
	4: SYK&lt;br /&gt;
	-minQ		13		(remove bases with qscore&amp;lt;minQ)&lt;br /&gt;
	-trim		0		(zero means no trimming)&lt;br /&gt;
	-tmpdir		angsd_tmpdir/	(used by SOAPsnp)&lt;br /&gt;
	-errors		(null)		(used by SYK)&lt;br /&gt;
	-minInd		-1		(-1 indicates no filtering)&lt;br /&gt;
&lt;br /&gt;
Filedumping:&lt;br /&gt;
	-writeGL	0&lt;br /&gt;
	1: binary glf (10 log likes)	.glf&lt;br /&gt;
	2: beagle likelihood file	.beagle.gz&lt;br /&gt;
	3: binary 3 times likelihood	.glf&lt;br /&gt;
	4: text version (10 log likes)	.glf&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Genotype likelihoods from alignments=&lt;br /&gt;
&amp;lt;classdiagram&amp;gt;&lt;br /&gt;
// [input|bam files;SOAP files{bg:orange}]-&amp;gt;[sequence data]&lt;br /&gt;
 [sequence data]-&amp;gt;[genotype likelihoods|SAMtools;GATK;SOAPsnp;Kim et.al]&lt;br /&gt;
 &amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
; -GL [int]&lt;br /&gt;
If your input is sequencing data you can estimate genotype likelihoods from the mapped reads. Four different methods are available. &lt;br /&gt;
==Samtools==&lt;br /&gt;
-GL 1&lt;br /&gt;
&lt;br /&gt;
This method has a stochastic component in SAMtools, so to get exactly the same results as SAMtools use -nThreads 1. The method is the same with multiple threads, but some sites will show small differences compared to the SAMtools output because of the stochastic component.&lt;br /&gt;
&lt;br /&gt;
===options===&lt;br /&gt;
; -minQ [int]&lt;br /&gt;
default 13. The minimum allowed base quality score.&lt;br /&gt;
&lt;br /&gt;
===example===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 1 -out outfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==GATK==&lt;br /&gt;
-GL 2&lt;br /&gt;
===options===&lt;br /&gt;
; -minQ [int]&lt;br /&gt;
default 13. The minimum allowed base quality score.&lt;br /&gt;
&lt;br /&gt;
===example===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 2 -out outfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==soapSNP==&lt;br /&gt;
-GL 3&lt;br /&gt;
When estimating GL with soapSNP we need to generate a calibration matrix. This is done automatically if the calibration files do not exist. They are located in angsd_tmpdir/basenameNUM.count and angsd_tmpdir/basenameNUM.qual.&lt;br /&gt;
&lt;br /&gt;
===options===&lt;br /&gt;
; -minQ [int]&lt;br /&gt;
default 13. The minimum allowed base quality score. &lt;br /&gt;
; -tmpdir [dirname]&lt;br /&gt;
default angsd_tmpdir; The directory of the recalibration matrix.&lt;br /&gt;
&lt;br /&gt;
===example===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 3 -out outfile -ref hg19.fa &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first run only estimates the calibration matrix.&lt;br /&gt;
We can now run the analysis we want:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 3 -out outfile -ref hg19.fa -doGlf 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
NB: internally the maximum read length is not allowed to exceed 256.&lt;br /&gt;
&lt;br /&gt;
==Kim et al.==&lt;br /&gt;
-GL 4&lt;br /&gt;
[[Kim10|Citation]] [[Kim11|Citation]]&lt;br /&gt;
&lt;br /&gt;
===options===&lt;br /&gt;
; -error [filename]&lt;br /&gt;
A file with the estimated type specific error rates (see [[Error_estimation]]).&lt;br /&gt;
&lt;br /&gt;
===example===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 4 -out outfile -error error.file &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=output genotype likelihoods=&lt;br /&gt;
; -doGlf [int]&lt;br /&gt;
Output the log genotype likelihoods to a file&lt;br /&gt;
&lt;br /&gt;
;0. don't dump anything (default)&lt;br /&gt;
&lt;br /&gt;
;1.  binary all 10 llh&lt;br /&gt;
&lt;br /&gt;
;2. beagle text&lt;br /&gt;
&lt;br /&gt;
;3. beagle binary&lt;br /&gt;
&lt;br /&gt;
;4. textoutput of all 10 llhs.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==binary==&lt;br /&gt;
GLF file in binary doubles. All 10 genotype likelihoods are printed to a file. For each printed site there are 10*N doubles, where N is the number of individuals. The order of the 10 genotypes is alphabetical: AA AC AG AT CC CG CT GG GT TT. &lt;br /&gt;
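&lt;br /&gt;
A minimal Python sketch of reading this layout is given below. It assumes that the file has already been decompressed into a bytes object, that each double is 8 bytes in the byte order of the machine that wrote the file, and that there is no header. The names read_glf_sites and GENOTYPES are illustrative and not part of angsd.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import struct&lt;br /&gt;
&lt;br /&gt;
# Alphabetical genotype order used for the 10 values of each individual.&lt;br /&gt;
GENOTYPES = ['AA', 'AC', 'AG', 'AT', 'CC', 'CG', 'CT', 'GG', 'GT', 'TT']&lt;br /&gt;
&lt;br /&gt;
def read_glf_sites(raw, n_ind):&lt;br /&gt;
    # One site record holds 10 doubles (8 bytes each) per individual.&lt;br /&gt;
    site_size = 8 * 10 * n_ind&lt;br /&gt;
    if len(raw) % site_size != 0:&lt;br /&gt;
        raise ValueError('byte count is not a multiple of one site record')&lt;br /&gt;
    sites = []&lt;br /&gt;
    for offset in range(0, len(raw), site_size):&lt;br /&gt;
        values = struct.unpack_from('=%dd' % (10 * n_ind), raw, offset)&lt;br /&gt;
        # Map each block of 10 values to the genotype labels of one individual.&lt;br /&gt;
        site = [dict(zip(GENOTYPES, values[10 * i:10 * i + 10])) for i in range(n_ind)]&lt;br /&gt;
        sites.append(site)&lt;br /&gt;
    return sites&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The number of individuals has to be known in advance because the record size depends on it.&lt;br /&gt;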
&lt;br /&gt;
&lt;br /&gt;
==Beagle format==&lt;br /&gt;
Beagle haplotype imputation can be performed directly on genotype likelihoods. To generate a beagle input file use&lt;br /&gt;
&lt;br /&gt;
; -doGlf 2&lt;br /&gt;
&lt;br /&gt;
In order to make this file the major and minor alleles have to be inferred (-doMajorMinor). It is also a good idea to only use the polymorphic sites.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
In this example our input files are BAM files. We use the SAMtools genotype likelihood method and 10 threads. We infer the major and minor alleles from the likelihoods and estimate the allele frequencies. We test for polymorphic sites and only output the ones with a likelihood ratio test statistic of at least 24 (ca. p-value&amp;lt;1e-6). &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 2 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===output===&lt;br /&gt;
The above command generates the file genolike.beagle.gz, which can be used as input for the beagle software.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker  allele1 allele2 Ind0    Ind0    Ind0    Ind1    Ind1    Ind1    Ind2    Ind2    Ind2    Ind3    Ind3    Ind3 &lt;br /&gt;
1_14000023      1       0       0.941177        0.058822        0.000001        0.799685        0.199918        0.000397        0.666316        0.333155        0.000529 &lt;br /&gt;
1_14000072      2       3       0.709983        0.177493        0.112525        0.941178        0.058822        0.000000        0.665554        0.332774        0.001672&lt;br /&gt;
1_14000113      0       2       0.855993        0.106996        0.037010        0.333333        0.333333        0.333333        0.799971        0.199989        0.000040 &lt;br /&gt;
1_14000202      2       0       0.835380        0.104420        0.060201        0.799685        0.199918        0.000397        0.333333        0.333333        0.333333&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the above values sum to one per site for each individual. This is just a normalization of the genotype likelihoods in order to avoid underflow problems in the beagle software; it does not mean that they are genotype probabilities.&lt;br /&gt;
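&lt;br /&gt;
The normalization can be sketched in Python as follows. The sketch assumes that the three input values are natural-log likelihoods for the major/major, major/minor and minor/minor genotypes of one individual; normalize_likelihoods is a hypothetical helper and not an angsd function.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def normalize_likelihoods(log_likes):&lt;br /&gt;
    # Subtract the maximum before exponentiating so the largest term is exp(0) = 1.&lt;br /&gt;
    top = max(log_likes)&lt;br /&gt;
    scaled = [math.exp(x - top) for x in log_likes]&lt;br /&gt;
    total = sum(scaled)&lt;br /&gt;
    # Rescale so that the values sum to one.&lt;br /&gt;
    return [s / total for s in scaled]&lt;br /&gt;
&lt;br /&gt;
# An individual without data has three equal likelihoods, giving one third each.&lt;br /&gt;
print(normalize_likelihoods([0.0, 0.0, 0.0]))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;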
&lt;br /&gt;
==simple text format==&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Input&amp;diff=3149</id>
		<title>Input</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Input&amp;diff=3149"/>
		<updated>2021-09-28T08:57:07Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;ANGSD currently supports various input formats&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
[sequence data|BAM;CRAM;mpileup{bg:orange}]-[genotype;likelihoods|VCF;GLF;beagle{bg:orange}]&lt;br /&gt;
[genotype;likelihoods|VCF;GLF;beagle{bg:orange}]-[genotype;probability|beagle{bg:orange}]&lt;br /&gt;
&amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Below is a short description of those we believe are of most use. Note that CRAM files can be used interchangeably with BAM files, so use -bam to supply a list of CRAM files, BAM files or both.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Sequence data (BAM/CRAM/mpileup)=&lt;br /&gt;
==BAM/CRAM==&lt;br /&gt;
&lt;br /&gt;
ANGSD accepts BAM/CRAM files for mapped sequences and both are handled using the same -bam option. For information on the file specification and file creation see the samtools website [http://samtools.sourceforge.net/]. The files are required to be sorted according to the reference. To see the options for BAM/CRAM use the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
./angsd -bam&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
	-&amp;gt; angsd version: 0.910-14-g5e2711f (htslib: 1.2.1-252-ga2656aa) build(Dec  4 2015 10:40:24)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
./angsd -bam &lt;br /&gt;
	-&amp;gt; angsd version: 0.910-14-g5e2711f (htslib: 1.2.1-252-ga2656aa) build(Dec  4 2015 10:40:28)&lt;br /&gt;
	-&amp;gt; Fri Dec  4 10:43:27 2015&lt;br /&gt;
---------------&lt;br /&gt;
parseArgs_bambi.cpp: bam reader:&lt;br /&gt;
	-r		(null)	Supply a single region in commandline (see examples below)&lt;br /&gt;
	-rf		(null)	Supply multiple regions in a file (see examples below)&lt;br /&gt;
	-remove_bads	1	Discard 'bad' reads, (flag &amp;gt;=256) &lt;br /&gt;
	-uniqueOnly	0	Discards reads that doesn't map uniquely&lt;br /&gt;
	-show		0	Mimic 'samtools mpileup' also supply -ref fasta for printing reference column&lt;br /&gt;
	-minMapQ	0	Discard reads with mapping quality below&lt;br /&gt;
	-minQ		13	Discard bases with base quality below&lt;br /&gt;
	-trim		0	Number of based to discard at both ends of the reads&lt;br /&gt;
	-only_proper_pairs	1	Only use reads where the mate could be mapped&lt;br /&gt;
	-C		0	adjust mapQ for excessive mismatches (as SAMtools), supply -ref&lt;br /&gt;
	-baq		0	adjust qscores around indels (as SAMtools), supply -ref&lt;br /&gt;
	-if		2	include flags for each read&lt;br /&gt;
	-df		4	discard flags for each read&lt;br /&gt;
	-checkBamHeaders	1	Exit if difference in BAM headers&lt;br /&gt;
	-doCheck	1	Keep going even if datafile is not suffixed with .bam/.cram&lt;br /&gt;
	-downSample	0.000000	Downsample to the fraction of original data&lt;br /&gt;
	-minChunkSize	250	Minimum size of chunk sent to analyses&lt;br /&gt;
&lt;br /&gt;
Examples for region specification:&lt;br /&gt;
		chr:		Use entire chromosome: chr&lt;br /&gt;
		chr:start-	Use region from start to end of chr&lt;br /&gt;
		chr:-stop	Use region from beginning of chromosome: chr to stop&lt;br /&gt;
		chr:start-stop	Use region from start to stop from chromosome: chr&lt;br /&gt;
		chr:site	Use single site on chromosome: chr&lt;br /&gt;
Will include read if:&lt;br /&gt;
	includeflag:[2] (beta)each segment properly aligned according to the aligner, &lt;br /&gt;
Will discard read if:&lt;br /&gt;
	discardflag:[4] (beta)segment unmapped, &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
Example of estimating allele frequencies from bam files&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMaf 2 -bam bam.filelist -doMajorMinor 1 -GL 1 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Arguments===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;-bam [filelist]&lt;br /&gt;
;-b [filelist]&lt;br /&gt;
&lt;br /&gt;
The filelist is a file containing the full path for each bam file with one filename per row. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
filelist with 6 individuals&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
/home/software/angsd/test/smallBam/smallNA12763.bam&lt;br /&gt;
/home/software/angsd/test/smallBam/smallNA11830.bam&lt;br /&gt;
/home/software/angsd/test/smallBam/smallNA12004.bam&lt;br /&gt;
/home/software/angsd/test/smallBam/smallNA06985.bam&lt;br /&gt;
/home/software/angsd/test/smallBam/smallNA11993.bam&lt;br /&gt;
/home/software/angsd/test/smallBam/smallNA12761.bam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
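&lt;br /&gt;
Such a filelist can be created with any tool that writes one path per line. The Python sketch below is one way to do it; the function name write_bam_filelist and the assumption that all BAM files sit in a single directory are made for illustration only.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
from pathlib import Path&lt;br /&gt;
&lt;br /&gt;
def write_bam_filelist(bam_dir, out_path):&lt;br /&gt;
    # Collect the absolute path of every .bam file in bam_dir, sorted by path.&lt;br /&gt;
    paths = sorted(str(p.resolve()) for p in Path(bam_dir).glob('*.bam'))&lt;br /&gt;
    # Write one filename per row, as expected by -bam.&lt;br /&gt;
    Path(out_path).write_text(''.join(p + '\n' for p in paths))&lt;br /&gt;
    return paths&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;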
&lt;br /&gt;
;-r [region]&lt;br /&gt;
Specify a region within a chromosome using the syntax [chr]:[start-stop]. Examples:&lt;br /&gt;
 chr1:1-10000             \\ first 10000 bases of chr1&lt;br /&gt;
 chr2:50000-              \\ chr2 but exclude the first 50000 bases&lt;br /&gt;
 chr11:1-                 \\all of chr11&lt;br /&gt;
 chr11:                   \\all of chr11&lt;br /&gt;
 chr7:123456              \\position 123456 of chr7&lt;br /&gt;
;-rf [region file] &lt;br /&gt;
Specify multiple regions in a file using the same syntax as -r&lt;br /&gt;
;-remove_bads [int]=1&lt;br /&gt;
Same as the samtools flag -x, which removes reads with a flag above 255 (not primary, failure and duplicate reads). 0: keep, 1: remove (default).&lt;br /&gt;
;-uniqueOnly [int]=0&lt;br /&gt;
Remove reads that have multiple best hits. 0 no (default), 1 remove.&lt;br /&gt;
;-minMapQ [int]=0&lt;br /&gt;
Minimum mapQ quality.&lt;br /&gt;
;-trim [int]=0&lt;br /&gt;
Number of bases to remove from both ends of the read. &lt;br /&gt;
;-only_proper_pairs [int]=1&lt;br /&gt;
Include only proper pairs (pairs of reads with both mates mapped correctly). 1: include only proper pairs (default), 0: use all reads. Only relevant for paired-end data.&lt;br /&gt;
;-C [int] =0&lt;br /&gt;
Adjust mapQ for excessive mismatches (as SAMtools), supply -ref.&lt;br /&gt;
;-baq [int]=0&lt;br /&gt;
Perform BAQ computation; remember to cite the [http://bioinformatics.oxfordjournals.org/content/early/2011/02/13/bioinformatics.btr076 BAQ paper] for this.&lt;br /&gt;
0: No BAQ calculation&lt;br /&gt;
&lt;br /&gt;
1:normal BAQ (same as default in SAMtools).&lt;br /&gt;
2:extended BAQ (same as default in SAMtools).&lt;br /&gt;
&lt;br /&gt;
;-redo-baq=0&lt;br /&gt;
If zero, the existing record is used.&lt;br /&gt;
&lt;br /&gt;
You will need to supply your reference (-ref) for BAQ options.&lt;br /&gt;
;-checkBamHeaders [int]=1&lt;br /&gt;
Exits if the headers are not compatible for all files. 0: do not check, 1: check (default). Not performing this check is not advisable.&lt;br /&gt;
;-downSample [float]=0&lt;br /&gt;
Randomly remove reads to downsample your data. 0.25 will on average keep 25% of the reads&lt;br /&gt;
;-setMinChunkSize [int]=250&lt;br /&gt;
Minimum number of sites to read in before starting to analyze - larger number will use more RAM&lt;br /&gt;
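&lt;br /&gt;
To make the -r region syntax listed above concrete, the following Python sketch splits a region string into chromosome, start and stop. It illustrates the syntax only and is not the parser used by angsd; parse_region is a hypothetical name, and open-ended bounds are returned as None.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def parse_region(region):&lt;br /&gt;
    # Everything before the first colon is the chromosome name.&lt;br /&gt;
    chrom, _, span = region.partition(':')&lt;br /&gt;
    if not span:&lt;br /&gt;
        # 'chr11:' means the entire chromosome.&lt;br /&gt;
        return chrom, None, None&lt;br /&gt;
    if '-' not in span:&lt;br /&gt;
        # 'chr7:123456' is a single site.&lt;br /&gt;
        site = int(span)&lt;br /&gt;
        return chrom, site, site&lt;br /&gt;
    start, _, stop = span.partition('-')&lt;br /&gt;
    # A missing start or stop leaves that side of the region open.&lt;br /&gt;
    return chrom, int(start) if start else None, int(stop) if stop else None&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;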
&lt;br /&gt;
==Pileup files==&lt;br /&gt;
Pileup files are the output files that are generated by SAMtools mpileup.&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
../angsd/angsd -pileup&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
	-&amp;gt; angsd version: 0.910-20-g553b991 (htslib: 1.2.1-192-ge7e2b3d) build(Dec  4 2015 12:17:14)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -pileup 	-&amp;gt; Fri Dec  4 12:17:53 2015&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
	-nLines	50	(Number of lines to read)&lt;br /&gt;
	-bpl	33554432 (bytesPerLine)&lt;br /&gt;
	-beagle	(null)	(Beagle Filename (can be .gz))&lt;br /&gt;
	-vcf-GL	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-vcf-GP	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-glf	(null)	(glf Filename (can be .gz))&lt;br /&gt;
	-pileup	(null)	(pileup Filename (can be .gz))&lt;br /&gt;
	-intName 1	(Assume First column is chr_position)&lt;br /&gt;
	-isSim	0	(Simulated data assumes ancestral is A)&lt;br /&gt;
	-nInd	0		(Number of individuals)&lt;br /&gt;
	-minQ	13	(minimum base quality; only used in pileupreader)&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -pileup sam.mpileup -nInd 10 -fai hg19.fa.gz.fai&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Arguments===&lt;br /&gt;
&lt;br /&gt;
;-pileup [filename]&lt;br /&gt;
name of the pileup file. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
A pileup file&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
1	13999999	N	3	ggg	I&amp;lt;B	2	Gg	FF	2	Gg	F7	6	ggGgGg	DBA@=2&lt;br /&gt;
1	14000000	N	3	ggg	8EG	2	Gg	BF	1	G	B	7	ggGgGgg	C&amp;gt;B=?:&amp;lt;&lt;br /&gt;
1	14000001	N	2	gg	&amp;lt;@	2	Gg	AC	2	Gg	:&amp;lt;	7	ggGgGgg	DBB?832&lt;br /&gt;
1	14000002	N	0			2	Cc	C1	1	C	B	7	ccCcCcc	=;A7485&lt;br /&gt;
1	14000003	N	2	gg	&amp;lt;/	2	Gg	&amp;lt;I	2	Gg	&amp;lt;/	7	ggGgGgg	C&amp;lt;;A84.&lt;br /&gt;
1	14000004	N	3	aaa	6C=	2	Aa	A9	2	Aa	BB	7	aaAaAaa	CBA7951&lt;br /&gt;
1	14000005	N	2	cc	4;	2	Cc	CC	2	Cc	@@	7	ccCcCcc	CBAB930&lt;br /&gt;
1	14000006	N	3	aaa	A9&amp;gt;	2	Aa	E&amp;lt;	2	Aa	;C	7	aa$AaAaa	D&amp;gt;BC6;:&lt;br /&gt;
1	14000007	N	3	ggg	43&amp;gt;	2	Gg	BI	2	Gg	D@	6	gGgGgg	BB?A.7&lt;br /&gt;
1	14000008	N	3	aaa	776	3	Aa^/A	:&amp;lt;?	2	Aa	BC	6	aAaAaa	D&amp;gt;C;:5&lt;br /&gt;
1	14000009	N	2	gg	96	3	GgG	BFD	2	Gg	A&amp;lt;	6	gGgGgg	CCA882&lt;br /&gt;
1	14000010	N	2	cc	54	3	CcC	&amp;gt;;A	2	Cc	A:	4	cCcC	=A69&lt;br /&gt;
1	14000011	N	2	gg	:0	3	GgG	9I&amp;lt;	2	Gg	&amp;lt;A	6	gGgGgg	C6A864&lt;br /&gt;
1	14000012	N	3	aaa	&amp;gt;F?	3	AaA	?&amp;lt;?	2	Aa	BC	5	aAaAa	D&amp;gt;B99&lt;br /&gt;
1	14000013	N	3	ggg	2==	3	GgG	AHD	2	Gg	EA	6	gGgGgg	C;A@63&lt;br /&gt;
1	14000014	N	3	aaa	8.6	3	AaA	?8A	2	Aa	2C	6	aAaAaa	C3A88&amp;lt;&lt;br /&gt;
1	14000015	N	2	cc	CD	3	CcC	CEB	2	Cc	?=	6	cCcCcc	D4&amp;lt;:=&amp;lt;&lt;br /&gt;
1	14000016	N	1	t	5	3	TtT	BGC	2	Tt	C@	6	tT$tTtt	C38A9&amp;gt;&lt;br /&gt;
1	14000017	N	3	ccc	17J	3	CcC	BB3	2	Cc	B7	5	ccCcc	D::B?&lt;br /&gt;
1	14000018	N	3	ccc	.:.	3	CcC	B:B	2	Cc	2;	5	ccCcc	&amp;lt;9956&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;-nInd [int]&lt;br /&gt;
Number of individuals must be specified. &lt;br /&gt;
;-fai [filename]&lt;br /&gt;
The index to the reference genome.&lt;br /&gt;
;-bpl [int]=33554432&lt;br /&gt;
maximum bytes per line. Increase if the pileup has many individuals.&lt;br /&gt;
;-nLines [int]=50&lt;br /&gt;
Number of lines to read at a time. Increasing this number will affect the RAM use.&lt;br /&gt;
;-minQ [int]=0&lt;br /&gt;
Minimum base quality score.&lt;br /&gt;
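&lt;br /&gt;
The pileup layout shown above (chromosome, position, reference base, then depth, bases and qualities for each individual) can be split per individual with a sketch like the one below. It assumes tab-separated columns as in the example file; pileup_depths is a hypothetical helper and not part of angsd.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
def pileup_depths(line, n_ind):&lt;br /&gt;
    # Split on tabs only: an individual without reads has empty base and quality columns.&lt;br /&gt;
    fields = line.rstrip('\n').split('\t')&lt;br /&gt;
    if len(fields) != 3 + 3 * n_ind:&lt;br /&gt;
        raise ValueError('expected 3 + 3 * n_ind columns')&lt;br /&gt;
    chrom, pos = fields[0], int(fields[1])&lt;br /&gt;
    # The depth is the first of the three columns belonging to each individual.&lt;br /&gt;
    depths = [int(fields[3 + 3 * i]) for i in range(n_ind)]&lt;br /&gt;
    return chrom, pos, depths&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The column count check also shows why -nInd must match the number of individuals in the file.&lt;br /&gt;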
&lt;br /&gt;
===Tutorial===&lt;br /&gt;
&lt;br /&gt;
Various software can generate the pileup format, but the most commonly used is samtools:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
samtools mpileup -b bam.filelist &amp;gt; sam.mpileup&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can then use it as input to angsd:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -pileup sam.mpileup -nInd 10 -fai hg19.fa.gz.fai -domaf 1 -domajorminor 1 -gl 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=BCF/VCF files=&lt;br /&gt;
BCF/VCF files are supported as input, but with some limitations. Only the chr, pos and PL tags are used, and indels are discarded.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
./angsd -vcf-gl&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
	-&amp;gt; angsd version: 0.910-20-g553b991 (htslib: 1.2.1-192-ge7e2b3d) build(Dec  4 2015 12:17:14)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
./angsd -vcf-gl 	-&amp;gt; Fri Dec  4 14:35:51 2015&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
	-nLines	50	(Number of lines to read)&lt;br /&gt;
	-bpl	33554432 (bytesPerLine)&lt;br /&gt;
	-beagle	(null)	(Beagle Filename (can be .gz))&lt;br /&gt;
	-vcf-GL	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-vcf-GP	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-glf	(null)	(glf Filename (can be .gz))&lt;br /&gt;
	-pileup	(null)	(pileup Filename (can be .gz))&lt;br /&gt;
	-intName 1	(Assume First column is chr_position)&lt;br /&gt;
	-isSim	0	(Simulated data assumes ancestral is A)&lt;br /&gt;
	-nInd	0		(Number of individuals)&lt;br /&gt;
	-minQ	13	(minimum base quality; only used in pileupreader)&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
angsd -vcf-gl ../smallBam/small2.bcf -domajorminor 1 -domaf 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Arguments===&lt;br /&gt;
&lt;br /&gt;
;-vcf-gl [filename]&lt;br /&gt;
name of the vcf file. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
A vcf file&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
##fileformat=VCFv4.2(angsd version)&lt;br /&gt;
##FORMAT=&amp;lt;ID=GT,Number=1,Type=Integer,Description=&amp;quot;Genotype&amp;quot;&amp;gt;&lt;br /&gt;
##FORMAT=&amp;lt;ID=GP,Number=G,Type=Float,Description=&amp;quot;Genotype Probabilities&amp;quot;&amp;gt;&lt;br /&gt;
##FORMAT=&amp;lt;ID=PL,Number=G,Type=Float,Description=&amp;quot;Phred-scaled Genotype Likelihoods&amp;quot;&amp;gt;&lt;br /&gt;
##FORMAT=&amp;lt;ID=GL,Number=G,Type=Float,Description=&amp;quot;scaled Genotype Likelihoods (loglikeratios to the most likely (in log10))&amp;quot;&amp;gt;&lt;br /&gt;
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	ind0	ind1&lt;br /&gt;
1	14000873	.	G	A	.	PASS	.	GP:GL	0.000000,0.137003,0.862997:-15.128970,-1.505169,0.000000	0.716266,0.281975,0.001759:0.000000,-0.301034,-1.800000&lt;br /&gt;
1	14001018	.	T	C	.	PASS	.	GP:GL	0.000000,0.081718,0.918282:-13.701492,-1.806203,0.000000	0.850652,0.149348,0.000000:0.000000,-0.602068,-5.699627&lt;br /&gt;
1	14001867	.	A	G	.	PASS	.	GP:GL	0.000489,0.727550,0.271961:-3.600000,-0.301034,0.000000	0.914538,0.085462,0.000000:0.000000,-0.903101,-8.859124&lt;br /&gt;
1	14002422	.	A	T	.	PASS	.	GP:GL	0.000000,0.291570,0.708430:-9.777061,-0.903101,0.000000	0.767047,0.232952,0.000001:0.000000,-0.602068,-5.499530&lt;br /&gt;
1	14002474	.	T	C	.	PASS	.	GP:GL	0.995488,0.004512,0.000000:0.000000,-1.505169,-15.068561	0.965008,0.034992,0.000000:0.000000,-0.602068,-5.899399&lt;br /&gt;
1	14003581	.	C	T	.	PASS	.	GP:GL	0.000000,0.674489,0.325510:-7.200000,-0.602068,0.000000	0.992516,0.007484,0.000000:0.000000,-1.806203,-13.447742&lt;br /&gt;
1	14004623	.	T	C	.	PASS	.	GP:GL	0.000000,0.588345,0.411654:-6.999968,-0.602068,0.000000	0.989186,0.010814,0.000000:0.000000,-1.806203,-12.574310&lt;br /&gt;
1	14007493	.	A	G	.	PASS	.	GP:GL	0.000013,0.541811,0.458176:-5.286503,-0.602068,0.000000	0.398233,0.422941,0.178826:-0.400000,-0.301034,0.000000&lt;br /&gt;
1	14007558	.	C	T	.	PASS	.	GP:GL	0.000000,0.091908,0.908092:-15.524007,-1.505169,0.000000	0.284993,0.442044,0.272964:-0.400000,-0.301034,0.000000&lt;br /&gt;
1	14007649	.	G	A	.	PASS	.	GP:GL	0.000000,0.638610,0.361390:-7.340205,-0.602068,0.000000	0.779442,0.220538,0.000020:0.000000,-0.301034,-3.500000&lt;br /&gt;
1	14008734	.	T	A	.	PASS	.	GP:GL	0.000000,0.280425,0.719575:-13.909454,-1.204135,0.000000	0.757059,0.242817,0.000123:0.000000,-0.301034,-2.800000&lt;br /&gt;
1	14009723	.	G	C	.	PASS	.	GP:GL	0.000345,0.744684,0.254971:-3.800000,-0.301034,0.000000	0.744903,0.255042,0.000055:0.000000,-0.301034,-3.200000&lt;br /&gt;
1	14010597	.	G	A	.	PASS	.	GP:GL	0.000000,0.063511,0.936489:-17.446187,-1.806203,0.000000	0.684326,0.315309,0.000365:0.000000,-0.301034,-2.600000&lt;br /&gt;
1	14010654	.	T	C	.	PASS	.	GP:GL	0.600538,0.348812,0.050650:0.000000,0.000000,0.000000	0.600538,0.348812,0.050650:0.000000,0.000000,0.000000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;-nLines [int]=50&lt;br /&gt;
Number of lines to read at a time. Increasing this number will affect the RAM use.&lt;br /&gt;
&lt;br /&gt;
===Tutorial===&lt;br /&gt;
&lt;br /&gt;
Create a VCF file using your favorite software or using angsd:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -b bam.filelist -dovcf 1 -gl 1 -dopost 1 -domajorminor 1 -domaf 1 -snp_pval 1e-6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can then use it as input to angsd if you have the GL info:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -vcf-gl angsdput.vcf.gz -nind 10 -fai hg19.fa.gz.fai -domaf 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Genotype Likelihood Files=&lt;br /&gt;
&lt;br /&gt;
==-glf==&lt;br /&gt;
A simple format for genotype likelihoods. This is the format used by the ''supersim'' subprogram and the ''-doglf 1'' option in angsd.&lt;br /&gt;
The format is binary, with 10 doubles per individual, so -nInd needs to be supplied.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
../angsd/angsd -glf&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
	-&amp;gt; angsd version: 0.910-20-g553b991 (htslib: 1.2.1-192-ge7e2b3d) build(Dec  4 2015 12:17:14)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -pileup 	-&amp;gt; Fri Dec  4 12:17:53 2015&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
	-nLines	50	(Number of lines to read)&lt;br /&gt;
	-bpl	33554432 (bytesPerLine)&lt;br /&gt;
	-beagle	(null)	(Beagle Filename (can be .gz))&lt;br /&gt;
	-vcf-GL	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-vcf-GP	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-glf	(null)	(glf Filename (can be .gz))&lt;br /&gt;
	-pileup	(null)	(pileup Filename (can be .gz))&lt;br /&gt;
	-intName 1	(Assume First column is chr_position)&lt;br /&gt;
	-isSim	0	(Simulated data assumes ancestral is A)&lt;br /&gt;
	-nInd	0		(Number of individuals)&lt;br /&gt;
	-minQ	13	(minimum base quality; only used in pileupreader)&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -glf data.glf.gz -nInd 10 -fai hg19.fa.gz.fai&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Arguments===&lt;br /&gt;
&lt;br /&gt;
;-glf [filename]:&lt;br /&gt;
name of the glf file (gunzipped). &lt;br /&gt;
Every genotype likelihood is saved as a log-scaled binary double, in the following order for each individual: AA,AC,AG,AT,...&lt;br /&gt;
;-nInd [int]&lt;br /&gt;
Number of individuals must be specified. &lt;br /&gt;
;-fai [filename]&lt;br /&gt;
The index to the reference genome.&lt;br /&gt;
;-bpl [int]=33554432&lt;br /&gt;
maximum bytes per line. Increase if the pileup has many individuals.&lt;br /&gt;
;-nLines [int]=50&lt;br /&gt;
Number of lines to read at a time. Increasing this number will affect the RAM use.&lt;br /&gt;
;-minQ [int]=0&lt;br /&gt;
Minimum base quality score.&lt;br /&gt;
&lt;br /&gt;
===Tutorial===&lt;br /&gt;
&lt;br /&gt;
Simulate genotype likelihoods &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
supersim -outfiles data -nind 10 -nsites 100000 -errate 0.01 -depth 4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then use it as input to angsd:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -glf data.glf.gz -nInd 10 -fai hg19.fa.gz.fai -domaf 1 -domajorminor 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make a GLF file from chromosome 1:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genolike -doGlf 1 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist -r 1:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Recalculate the allele frequencies:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -glf genolike.glf.gz -nInd 10 -fai hg19.fa.gz.fai -domaf 2 -domajorminor 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==-glf10_text==&lt;br /&gt;
-glf10_text was added in commit https://github.com/ANGSD/angsd/commit/46fc3edc181e80c4ad5e6bd644a64d23a5012e0e (Nov 2, 2017).&lt;br /&gt;
This allows reading files in the format written by -doglf 4.&lt;br /&gt;
This is a simple text file in which columns 1 and 2 are the chromosome/scaffold and the position. Then for each individual there are 10 log-scaled genotype likelihoods in the order: AA,AC,AG,AT,CC,CG,CT,GG,GT,TT.&lt;br /&gt;
First generate an example file in this format:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -gl 1 -doglf 4 -bam list -out first -domajorminor 1 -domaf 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates the file first.glf.gz, which we can then use as input:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 ./angsd -glf10_text first.glf.gz -nind 33 -domaf 1 -domajorminor 1 -fai fai.fai &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that -nInd and -fai need to be supplied.&lt;br /&gt;
&lt;br /&gt;
=Genotype Probability Files=&lt;br /&gt;
==Beagle format==&lt;br /&gt;
Genotype probabilities in gzipped Beagle format can be used as input. The format is the haplotype imputation format output by Beagle [http://faculty.washington.edu/browning/beagle/beagle.html]. Newer versions of Beagle use VCF files.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
./angsd -beagle &lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
	-&amp;gt; angsd version: 0.910-20-g553b991 (htslib: 1.2.1-192-ge7e2b3d) build(Dec  4 2015 12:17:14)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
./angsd -beagle 	-&amp;gt; Fri Dec  4 14:03:22 2015&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
	-nLines	50	(Number of lines to read)&lt;br /&gt;
	-bpl	33554432 (bytesPerLine)&lt;br /&gt;
	-beagle	(null)	(Beagle Filename (can be .gz))&lt;br /&gt;
	-vcf-GL	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-vcf-GP	(null)	(vcf Filename (can be .gz))&lt;br /&gt;
	-glf	(null)	(glf Filename (can be .gz))&lt;br /&gt;
	-pileup	(null)	(pileup Filename (can be .gz))&lt;br /&gt;
	-intName 1	(Assume First column is chr_position)&lt;br /&gt;
	-isSim	0	(Simulated data assumes ancestral is A)&lt;br /&gt;
	-nInd	0		(Number of individuals)&lt;br /&gt;
	-minQ	13	(minimum base quality; only used in pileupreader)&lt;br /&gt;
----------------&lt;br /&gt;
multiReader.cpp:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
===Example===&lt;br /&gt;
&lt;br /&gt;
Example of estimating allele frequencies from beagle files&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMaf 4 -beagle file.beagle.gprobs.gz -fai ref.fai&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Arguments===&lt;br /&gt;
&lt;br /&gt;
; -beagle [fileName]&lt;br /&gt;
Beagle file name. The file must be gzipped.&lt;br /&gt;
The file has a single line per site. The first 3 columns are&lt;br /&gt;
* markerName&lt;br /&gt;
* alleleA&lt;br /&gt;
* alleleB&lt;br /&gt;
&lt;br /&gt;
For each individual 3 columns are added. These three columns should sum to one. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
 file with two individuals&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
marker alleleA alleleB NA06984 NA06984 NA06984 NA06986 NA06986 NA06986&lt;br /&gt;
chr9_95759065 G A 0.6563 0.3078 0.0358 0.5357 0.4016 0.0627&lt;br /&gt;
chr9_95759152 C A 1 0 0 0 1 0&lt;br /&gt;
chr9_95762332 G A 0.925 0.0734 0.0015 0.894 0.1031 0.0029&lt;br /&gt;
chr9_95762333 A T 0.8903 0.1067 0.003 0.811 0.1797 0.0093&lt;br /&gt;
chr9_95762343 G T 0.9149 0.0835 0.0017 0.8396 0.1541 0.0064&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
; -intName [int]=1&lt;br /&gt;
Default 1. If the SNP names are written as chr_position, this information will be parsed. If the SNP names are in another format, use -intName 0.&lt;br /&gt;
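&lt;br /&gt;
As a rough illustration of the layout described above, the following Python sketch splits one line into the marker, the two alleles and one triple per individual, and checks that each triple sums to roughly one. The function name parse_beagle_line and the rule of splitting the marker at the last underscore are assumptions made for this example; they are not taken from the ANGSD source.&lt;br /&gt;

```python
import math

def parse_beagle_line(line, int_name=True):
    """Split one Beagle line into marker info, alleles and per-individual triples."""
    fields = line.split()
    marker, allele_a, allele_b = fields[:3]
    values = [float(x) for x in fields[3:]]
    if len(values) % 3 != 0:
        raise ValueError("expected 3 values per individual")
    triples = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    chrom, pos = None, None
    if int_name:
        # Assumption: the position is the part after the LAST underscore,
        # so chromosome names that contain underscores still work.
        chrom, _, pos_text = marker.rpartition("_")
        pos = int(pos_text)
    return chrom, pos, allele_a, allele_b, triples

# First data row of the two-individual example file shown above.
line = "chr9_95759065 G A 0.6563 0.3078 0.0358 0.5357 0.4016 0.0627"
chrom, pos, a, b, triples = parse_beagle_line(line)
# Rounded values rarely sum to exactly one, hence the tolerance.
sums_ok = all(math.isclose(sum(t), 1.0, abs_tol=1e-3) for t in triples)
print(chrom, pos, a, b, len(triples), sums_ok)
```

With int_name=False the marker is left unparsed, which mirrors the idea behind -intName 0.&lt;br /&gt;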
;-fai [filename]&lt;br /&gt;
The index to the reference genome&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
can also be obtained from the bam header&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
samtools view -H  file.bam | grep SN |cut -f2,3 | sed 's/SN\://g' |  sed 's/LN\://g' &amp;gt; ref.fai&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
;-bpl [int]=33554432&lt;br /&gt;
maximum bytes per line. Increase if the pileup has many individuals&lt;br /&gt;
;-nLines [int]=50&lt;br /&gt;
Number of lines to read at a time. Increasing this number will affect the RAM use&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Genotype_Likelihoods&amp;diff=3148</id>
		<title>Genotype Likelihoods</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Genotype_Likelihoods&amp;diff=3148"/>
		<updated>2021-09-28T08:56:35Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Many methods in ANGSD are based on genotype likelihoods, and ANGSD has 4 different genotype likelihood models implemented.&lt;br /&gt;
&lt;br /&gt;
Genotype likelihoods and the four models are described in the [[#Theory | Theory]] section at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
The SOAPsnp model requires that a reference is supplied. Preferably the recalibration should only be performed on non-variable sites, so we recommend modifying the reference fasta such that all SNP sites have an 'N'.&lt;br /&gt;
&lt;br /&gt;
We also allow for output of the calculated genotype likelihoods in various formats that might be handy for some users.&lt;br /&gt;
&lt;br /&gt;
;NB the GATK model described and implemented in this program is the one described in the first GATK paper. This might be drastically different from the one used in newer versions of GATK.&lt;br /&gt;
__TOC__&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL &lt;br /&gt;
	-&amp;gt; angsd version: 0.567	 build(Dec  7 2013 14:56:25)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
---------------------&lt;br /&gt;
analysisEstLikes.cpp:&lt;br /&gt;
	-GL=0: &lt;br /&gt;
	1: SAMtools&lt;br /&gt;
	2: GATK&lt;br /&gt;
	3: SOAPsnp&lt;br /&gt;
	4: SYK&lt;br /&gt;
	-trim		0		(zero means no trimming)&lt;br /&gt;
	-tmpdir		angsd_tmpdir/	(used by SOAPsnp)&lt;br /&gt;
	-errors		(null)		(used by SYK)&lt;br /&gt;
	-minInd		0		(0 indicates no filtering)&lt;br /&gt;
&lt;br /&gt;
Filedumping:&lt;br /&gt;
	-doGlf	0&lt;br /&gt;
	1: binary glf (10 log likes)	.glf.gz&lt;br /&gt;
	2: beagle likelihood file	.beagle.gz&lt;br /&gt;
	3: binary 3 times likelihood	.glf.gz&lt;br /&gt;
	4: text version (10 log likes)	.glf.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=Options=&lt;br /&gt;
; -GL [int]&lt;br /&gt;
If your input is sequencing data, you can estimate genotype likelihoods from the mapped reads. Four different methods are available. &lt;br /&gt;
# SAMtools model&lt;br /&gt;
# GATK model&lt;br /&gt;
# SOAPsnp model&lt;br /&gt;
# SYK model&lt;br /&gt;
&lt;br /&gt;
; NB&lt;br /&gt;
When estimating GL with SOAPsnp we need to generate a calibration matrix. This is done automatically if the calibration files do not exist. They are located in angsd_tmpdir/basenameNUM.count and angsd_tmpdir/basenameNUM.qual. The read length is not allowed to exceed 256 base pairs.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;-trim [int]&lt;br /&gt;
This discards [int] bases at both ends of the reads when calculating the genotype likelihoods. &lt;br /&gt;
&lt;br /&gt;
;-tmpdir [directoryPath]&lt;br /&gt;
Default is `angsd_tmpdir`. SOAPsnp generates a mismatch matrix for each BAM file and, based on this mismatch matrix, calculates the type-specific errors for each position in the read. It therefore generates two files for each BAM file; to avoid cluttering up the working directory you can specify a folder that should be used. SOAPsnp assumes that all reads have the same length. If this is not the case this model might not be suited (also true for other recalibration tools).&lt;br /&gt;
&lt;br /&gt;
;-errors [fileName]&lt;br /&gt;
The SYK model requires a file containing the type-specific errors, as estimated from [[Error estimation | -doError 1]].&lt;br /&gt;
&lt;br /&gt;
;-minInd [int]&lt;br /&gt;
Discard the sites where we don't have data from at least '''-minInd''' individuals. If you have 100 individuals, and you only want to base your downstream analysis on the sites where you have data for at least half your samples, then set '''-minInd 50'''.&lt;br /&gt;
&lt;br /&gt;
==Filtering==&lt;br /&gt;
See  [[Input#BAM_files]] for BAM-specific filters.&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
The SAMtools and GATK likelihood models are chosen simply with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 #SAMtools&lt;br /&gt;
./angsd -GL 2 #GATK&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SOAPsnp and SYK require some extra arguments as shown below.&lt;br /&gt;
==SOAPsnp==&lt;br /&gt;
First run through the BAM files once to generate the calibration matrix&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 3 -out outfile -ref hg19.fa -minQ 0&lt;br /&gt;
#NB important to set -minQ to zero, ANGSD defaults to minQ 13&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first pass doesn't estimate anything other than the calibration matrix.&lt;br /&gt;
&lt;br /&gt;
After this run we can estimate the genotype likelihoods and perform any further analysis we desire. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 3 -out outfile -doGlf 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==SYK (Kim et al.)==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 4 -out outfile -errors error.file -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This model is based on counts of bases and therefore needs [[Alleles_counts]] &amp;quot;-doCounts 1&amp;quot;. The error file is one line of 16 values as output by -doError.&lt;br /&gt;
&lt;br /&gt;
=Output genotype likelihoods=&lt;br /&gt;
; -doGlf [int]&lt;br /&gt;
Output the log genotype likelihoods to a file&lt;br /&gt;
&lt;br /&gt;
;0. don't output the genotype likelihoods (default)&lt;br /&gt;
&lt;br /&gt;
;1. binary output of all 10 log genotype likelihoods&lt;br /&gt;
&lt;br /&gt;
;2. beagle genotype likelihood format (use directly for imputation)&lt;br /&gt;
&lt;br /&gt;
;3. beagle binary&lt;br /&gt;
&lt;br /&gt;
;4. text output of all 10 log genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Binary==&lt;br /&gt;
GLF file in binary doubles. All 10 genotype likelihoods are printed to a file. For each printed site there are 10*N doubles, where N is the number of individuals. The order of the 10 genotypes is alphabetical: AA AC AG AT CC CG CT GG GT TT. These are log-scaled likelihood ratios relative to the most likely genotype.&lt;br /&gt;
&lt;br /&gt;
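Pseudocode for parsing these files in '''c/c++'''.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
// assumes the .glf.gz output has been decompressed to genotypelikelihood.bin&lt;br /&gt;
FILE *fp = fopen(&amp;quot;genotypelikelihood.bin&amp;quot;,&amp;quot;rb&amp;quot;);&lt;br /&gt;
int nInd = 5;&lt;br /&gt;
double gls[5*10]; // 10 log likelihoods per individual for one site&lt;br /&gt;
fread(gls,sizeof(double),nInd*10,fp); // read the values for one site&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;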
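&lt;br /&gt;
The same record layout (10*N doubles per site) can be read in Python. The sketch below is only an illustration: it assumes the .glf.gz file can be opened with the standard gzip module and that the doubles are stored in the byte order of the machine reading them. The function name read_glf_sites and the toy file with made-up numbers are hypothetical and exist only to make the example runnable.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile

GENOTYPES = ["AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT"]

def read_glf_sites(path, n_ind):
    """Yield one list per site holding n_ind lists of 10 log likelihoods."""
    site_bytes = 8 * 10 * n_ind  # 8 bytes per double
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(site_bytes)
            if len(chunk) != site_bytes:
                break  # end of file (or a truncated final record)
            # "=" means native byte order without padding (an assumption)
            values = struct.unpack("=%dd" % (10 * n_ind), chunk)
            yield [list(values[i * 10:(i + 1) * 10]) for i in range(n_ind)]

# Build a tiny synthetic file (made-up numbers) so the sketch is runnable.
n_ind = 2
fake_site = [0.0] + [-1.5] * 9 + [-0.7] * 4 + [0.0] + [-2.0] * 5
tmp_path = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
with gzip.open(tmp_path, "wb") as out:
    out.write(struct.pack("=%dd" % (10 * n_ind), *fake_site))
    out.write(struct.pack("=%dd" % (10 * n_ind), *fake_site))

sites = list(read_glf_sites(tmp_path, n_ind))
# Most likely genotype per individual at the first site.
best = [GENOTYPES[ind.index(max(ind))] for ind in sites[0]]
print(len(sites), best)
```

The binary file carries no chromosome or position information, so the number of individuals has to be known in advance to split the stream into sites.&lt;br /&gt;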
&lt;br /&gt;
==Beagle format==&lt;br /&gt;
Beagle haplotype imputation can be performed directly on genotype likelihoods. To generate a beagle input file use&lt;br /&gt;
&lt;br /&gt;
; -doGlf 2&lt;br /&gt;
&lt;br /&gt;
In order to make this file the major and minor alleles have to be inferred [[Inferring Major and Minor alleles | -doMajorMinor]]. It is also a good idea to only use the polymorphic sites.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
In this example our input files are BAM files. We use the SAMtools genotype likelihood method. We use 10 threads. We infer the major and minor allele from the likelihoods and estimate the allele frequencies. We test for polymorphic sites and only output the ones with a likelihood ratio test statistic of at least 24 (ca. p-value&amp;lt;1e-6). &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
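&lt;br /&gt;
The correspondence between a test statistic of 24 and a p-value of roughly 1e-6 can be checked with a few lines of Python. This sketch assumes that the likelihood ratio statistic is compared with a chi-square distribution with 1 degree of freedom; that assumption and the helper name chi2_sf_1df belong to this example only.&lt;br /&gt;

```python
import math

def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 degree of freedom the tail equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))

p_value = chi2_sf_1df(24.0)
print("%.3g" % p_value)
```

Under that assumption the printed value lies slightly below 1e-6, in line with the approximate threshold quoted above.&lt;br /&gt;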
&lt;br /&gt;
===Output===&lt;br /&gt;
The above command generates the file genolike.beagle.gz that can be used as input for the beagle software&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker  allele1 allele2 Ind0    Ind0    Ind0    Ind1    Ind1    Ind1    Ind2    Ind2    Ind2    Ind3    Ind3    Ind3 &lt;br /&gt;
1_14000023      1       0       0.941177        0.058822        0.000001        0.799685        0.199918        0.000397        0.666316        0.333155        0.000529 &lt;br /&gt;
1_14000072      2       3       0.709983        0.177493        0.112525        0.941178        0.058822        0.000000        0.665554        0.332774        0.001672&lt;br /&gt;
1_14000113      0       2       0.855993        0.106996        0.037010        0.333333        0.333333        0.333333        0.799971        0.199989        0.000040 &lt;br /&gt;
1_14000202      2       0       0.835380        0.104420        0.060201        0.799685        0.199918        0.000397        0.333333        0.333333        0.333333&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the above values sum to one per site for each individual. This is just a normalization of the genotype likelihoods in order to avoid underflow problems in the beagle software; it does not mean that they are genotype probabilities.&lt;br /&gt;
&lt;br /&gt;
; column 1 (marker)&lt;br /&gt;
the chromosome and position&lt;br /&gt;
; column 2 (allele 1)&lt;br /&gt;
the major allele coded as 0=A, 1=C, 2=G, 3=T&lt;br /&gt;
; column 3 (allele 2)&lt;br /&gt;
the minor allele coded as 0=A, 1=C, 2=G, 3=T&lt;br /&gt;
; column 4 (Ind0)&lt;br /&gt;
Genotype likelihood for the major/major genotype for the first individual&lt;br /&gt;
; column 5 (Ind0)&lt;br /&gt;
Genotype likelihood for the major/minor genotype for the first individual&lt;br /&gt;
; column 6 (Ind0)&lt;br /&gt;
Genotype likelihood for the minor/minor genotype for the first individual&lt;br /&gt;
; column 7 (Ind1)&lt;br /&gt;
Genotype likelihood for the major/major genotype for the second individual&lt;br /&gt;
...&lt;br /&gt;
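&lt;br /&gt;
To make the normalization concrete, the Python sketch below picks the major/major, major/minor and minor/minor entries out of 10 log-scaled genotype likelihoods and rescales them so that they sum to one. It illustrates the idea only and is not ANGSD's own code; the helper names genotype_index and beagle_triple, and the choice of C as major and A as minor allele, are assumptions made for this example.&lt;br /&gt;

```python
import math

OFFSETS = [0, 4, 7, 9]  # index of AA, CC, GG, TT in the 10-genotype ordering

def genotype_index(a, b):
    """Index of the unordered genotype {a, b}, alleles coded 0=A, 1=C, 2=G, 3=T."""
    lo, hi = min(a, b), max(a, b)
    return OFFSETS[lo] + (hi - lo)

def beagle_triple(log_gls, major, minor):
    """Rescale three log likelihoods so that the three values sum to one."""
    picked = [log_gls[genotype_index(major, major)],
              log_gls[genotype_index(major, minor)],
              log_gls[genotype_index(minor, minor)]]
    top = max(picked)
    weights = [math.exp(x - top) for x in picked]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]

# One row of 10 log likelihoods taken from the Simple Text Format example on this page.
log_gls = [-2.072327, -0.693156, -2.072327, -2.072327, 0.000000,
           -0.693156, -0.693156, -2.072327, -2.072327, -2.072327]
triple = beagle_triple(log_gls, major=1, minor=0)  # major C, minor A
print(["%.6f" % v for v in triple])
```

Because only the ratios between the three values matter, the rescaling keeps all the information in the likelihoods while avoiding very small numbers.&lt;br /&gt;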
&lt;br /&gt;
==Simple Text Format==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -bam bam.filelist -doGlf 4 -nInd 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
We use SAMtools genotype likelihoods from the first sample ('''-nInd 1''') in the file list called '''bam.filelist'''.&lt;br /&gt;
&lt;br /&gt;
Generates '''angsdput.glf.gz''', which looks like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 13999965 -2.072327 -0.693156 -2.072327 -2.072327 0.000000 -0.693156 -0.693156 -2.072327 -2.072327 -2.072327&lt;br /&gt;
1 13999966 -2.072327 -2.072327 -0.693156 -2.072327 -2.072327 -0.693156 -2.072327 0.000000 -0.693156 -2.072327&lt;br /&gt;
1 13999967 0.000000 -0.693156 -0.693156 -0.693156 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327&lt;br /&gt;
1 13999968 -2.072327 -2.072327 -0.693156 -2.072327 -2.072327 -0.693156 -2.072327 0.000000 -0.693156 -2.072327&lt;br /&gt;
1 13999969 0.000000 -0.693156 -0.693156 -0.693156 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327&lt;br /&gt;
1 13999970 -2.072327 -2.072327 -2.072327 -0.693156 -2.072327 -2.072327 -0.693156 -2.072327 -0.693156 0.000000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The first 2 columns are the chromosome and position, and the final 10 values are the genotype likelihoods in the usual ordering.&lt;br /&gt;
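&lt;br /&gt;
A line in this text format is easy to parse by hand. The Python sketch below assumes whitespace-separated columns and the genotype order AA, AC, AG, AT, CC, CG, CT, GG, GT, TT; the function name parse_glf_text_line is made up for this example.&lt;br /&gt;

```python
GENOTYPES = ["AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT"]

def parse_glf_text_line(line, n_ind):
    """Return (chromosome, position, per-individual dicts of log likelihoods)."""
    fields = line.split()
    if len(fields) != 2 + 10 * n_ind:
        raise ValueError("unexpected number of columns")
    chrom, pos = fields[0], int(fields[1])
    values = [float(x) for x in fields[2:]]
    per_ind = [dict(zip(GENOTYPES, values[i * 10:(i + 1) * 10]))
               for i in range(n_ind)]
    return chrom, pos, per_ind

# First line of the example output above (one individual).
line = ("1 13999965 -2.072327 -0.693156 -2.072327 -2.072327 0.000000 "
        "-0.693156 -0.693156 -2.072327 -2.072327 -2.072327")
chrom, pos, per_ind = parse_glf_text_line(line, n_ind=1)
best = max(per_ind[0], key=per_ind[0].get)  # genotype with the highest value
print(chrom, pos, best)
```

For this line the largest value (0.000000) sits in the fifth genotype column, so the most likely genotype is CC.&lt;br /&gt;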
=Which genotype likelihood model should I choose ?=&lt;br /&gt;
It depends on the data. As shown in the example [[Glcomparison]], there was a huge difference between '''-GL 1''' and '''-GL 2''' for older 1000genomes BAM files, but little difference for newer BAM files.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Theory=&lt;br /&gt;
In this context, a genotype likelihood is the likelihood of the data given a genotype. That is, we take all the information from our data for a specific position for a single individual, and we use this information to calculate the likelihood of each of the possible genotypes. Since we assume diploid individuals, it follows that we have 10 different genotypes.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: center; color: green;&amp;quot;&lt;br /&gt;
|0&lt;br /&gt;
|1&lt;br /&gt;
|2&lt;br /&gt;
|3&lt;br /&gt;
|4&lt;br /&gt;
|5&lt;br /&gt;
|6&lt;br /&gt;
|7&lt;br /&gt;
|8&lt;br /&gt;
|9&lt;br /&gt;
|-&lt;br /&gt;
|AA&lt;br /&gt;
|AC&lt;br /&gt;
|AG&lt;br /&gt;
|AT&lt;br /&gt;
|CC&lt;br /&gt;
|CG&lt;br /&gt;
|CT&lt;br /&gt;
|GG&lt;br /&gt;
|GT&lt;br /&gt;
|TT&lt;br /&gt;
|}&lt;br /&gt;
And we write the genotype likelihood as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
L(G=\{A_1 ,A_2\}|D ) \propto Pr (D|G=\{A_1 ,A_2\} ),\qquad A_1 ,A_2 \in \{A,C,G,T\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==GATK genotype likelihoods==&lt;br /&gt;
In angsd we use the direct method of the first version of GATK (dragon). This is simply&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Pr(D|G=\{A_1,A_2\})=\prod_{i=1}^M Pr \left ( b_i|G=\{A_1,A_2\} \right) = \prod_{i=1}^M  (\frac{1}{2}Pr( b_i|A_1)  + \frac{1}{2}Pr( b_i|A_2)  ) &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Pr(b|A) =\left\{&lt;br /&gt;
  \begin{array}{lr}&lt;br /&gt;
    \frac{e}{3} &amp;amp; : b \neq A\\&lt;br /&gt;
   1-e &amp;amp; : b = A&lt;br /&gt;
  \end{array}&lt;br /&gt;
\right.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the sequencing depth, &amp;lt;math&amp;gt;b_i&amp;lt;/math&amp;gt; is the observed base in read ''i'', and ''e'' is the probability of error calculated from the phred-scaled quality score ''q'', i.e. &amp;lt;math&amp;gt; e=10^{-q/10} &amp;lt;/math&amp;gt;&lt;br /&gt;
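&lt;br /&gt;
The two formulas above translate almost directly into code. The Python sketch below evaluates them in log space for all 10 genotypes; the function names and the four-read pileup are invented for the illustration, and the code is not ANGSD's implementation.&lt;br /&gt;

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def base_prob(observed, allele, error):
    """Pr(b|A): 1-e when the read matches the allele, e/3 otherwise."""
    return 1.0 - error if observed == allele else error / 3.0

def gatk_log_likelihoods(bases, quals):
    """Log Pr(D|G) for the 10 genotypes, in the order AA, AC, ..., TT."""
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        log_like = 0.0
        for base, qual in zip(bases, quals):
            error = 10.0 ** (-qual / 10.0)  # phred score to error probability
            log_like += math.log(0.5 * base_prob(base, a1, error)
                                 + 0.5 * base_prob(base, a2, error))
        result[a1 + a2] = log_like
    return result

# Hypothetical pileup at one site: four reads with phred scores.
log_likes = gatk_log_likelihoods("AAGA", [30, 30, 20, 30])
top = max(log_likes.values())
scaled = {g: v - top for g, v in log_likes.items()}  # ratio to the most likely
print(max(scaled, key=scaled.get))
```

Summing logarithms instead of multiplying probabilities avoids underflow at high depth, and subtracting the maximum gives values relative to the most likely genotype.&lt;br /&gt;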
&lt;br /&gt;
==SAMtools genotype likelihoods==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
This subsection on the SAMtools GL model is preliminary&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Define:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
fk_i = 0.83^i*0.97+0.03 &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
lhet_{n,k} = \log  \frac{\binom{n}{k}}{2^n} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\beta_{n,k} = \frac{\beta_{n,k-1}}{\beta_{n,k-1}+\binom{n}{k}\cdot k \cdot log(prob(e))+(n-k)*log(1-prob(e))}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==SOAPsnp genotype likelihoods==&lt;br /&gt;
==SYK genotype likelihoods==&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3116</id>
		<title>PCA MDS</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3116"/>
		<updated>2020-04-20T15:26:24Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Options */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= single read sampling approach for PCA or MDS =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This function is new and works from version '''0.912''' and in the latest developmental version from [https://github.com/ANGSD/angsd github]&lt;br /&gt;
&lt;br /&gt;
For the PCA / MDS methods you should call SNP sites (use [[PCA]] if you do not want to call SNPs). SNPs can be called based on genotype likelihoods (see [[SNP_calling]]) or you can give the variable sites you want to analyse using the [[Sites|-sites]] option. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doIBS&lt;br /&gt;
	-&amp;gt; angsd version: 0.911-26-gf1cb0e0-dirty (htslib: 1.3-1-gc72ae90) build(Apr 27 2016 11:15:33)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -doIBS 	-&amp;gt; Wed Apr 27 12:38:35 2016&lt;br /&gt;
--------------&lt;br /&gt;
abcIBS.cpp:&lt;br /&gt;
	-doIBS	0&lt;br /&gt;
	(Sampling strategies)&lt;br /&gt;
	 0:	 no IBS &lt;br /&gt;
	 1:	 (Sample single base)&lt;br /&gt;
	 2:	 (Concensus base)&lt;br /&gt;
	-doCounts	0	Must choose -doCount 1&lt;br /&gt;
Optional&lt;br /&gt;
	-minMinor	0	Minimum observed minor alleles&lt;br /&gt;
	-minFreq	0.000	Minimum minor allele frequency&lt;br /&gt;
	-output01	0	output 0 and 1s instead of based&lt;br /&gt;
	-maxMis		-1	Maximum missing bases (per site)&lt;br /&gt;
	-doMajorMinor	0	use input files or data to select major and minor alleles&lt;br /&gt;
	-makeMatrix	0	print out the ibs matrix &lt;br /&gt;
	-doCov		0	print out the cov matrix &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
;-doIBS [int] &lt;br /&gt;
Print a single base from each individual at each position. 1: randomly sampled read. 2: consensus base. If you do not want to print out every site then use -1 or -2. &lt;br /&gt;
&lt;br /&gt;
;-doCounts [int]&lt;br /&gt;
The method requires counting the different bases at each position. Therefore, -doCounts 1 must be used&lt;br /&gt;
&lt;br /&gt;
;-doMajorMinor [int]&lt;br /&gt;
The covariance matrix can only be calculated for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see [[Inferring_Major_and_Minor_alleles]]). This can also be used if you only want to make this assumption for the IBS matrix or only want to print out bases that are either the major or minor. &lt;br /&gt;
&lt;br /&gt;
;-minMinor [int]&lt;br /&gt;
Minimum observed minor alleles. The default is 0. If you do not use -doMajorMinor then the number of minor alleles is the sum of the 3 most uncommon alleles. &lt;br /&gt;
&lt;br /&gt;
;-minFreq [float]	&lt;br /&gt;
Minimum minor allele frequency based on the sampled bases. The default is 0. If you do not use -doMajorMinor then the frequency is the sum of the frequencies of the 3 most uncommon alleles. &lt;br /&gt;
&lt;br /&gt;
;-output01 [int]	&lt;br /&gt;
Output the sampled reads as 0 (for major) and 1 (for non-major) instead of the actual base&lt;br /&gt;
&lt;br /&gt;
;-maxMis [int]&lt;br /&gt;
Maximum missing bases (per site), i.e. the maximum number of allowed non-major/minor sampled bases&lt;br /&gt;
&lt;br /&gt;
;-makeMatrix [int]&lt;br /&gt;
1 prints out the pairwise IBS matrix. This is the average distance between pairs of individuals. The distance is zero if the base is the same and 1 otherwise. You can use this for MDS (see below)&lt;br /&gt;
&lt;br /&gt;
;-doCov [int]		&lt;br /&gt;
1 prints out the covariance matrix which can be used for PCA (see below). You should use the -minFreq option to avoid sites with low allele frequency.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== run example ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam all.files -minMapQ 30 -minQ 20 -GL 2  -doMajorMinor 1 -doMaf 1 -SNP_pval 2e-6 -doIBS 1 -doCounts 1 -doCov 1 -makeMatrix 1 -minMaf 0.05 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will produce the output (see below) which includes pairwise differences (.ibsMat) and the covariance matrix (.covMat). These can be used for MDS and PCA respectively (see R example below). Note that only the PCA method requires SNP calling and allele frequency estimation.&lt;br /&gt;
&lt;br /&gt;
==Output==&lt;br /&gt;
&lt;br /&gt;
=== sampled bases *ibs.gz ===&lt;br /&gt;
This function will print the sampled bases to *.ibs.gz. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       14000873        A       G       0       1       1       1       1       1       1&lt;br /&gt;
1       14001018        C       T       0       1       1       1       1       1       1&lt;br /&gt;
1       14001867        G       A       0       1       1       1       1       0       1&lt;br /&gt;
1       14002342        T       C       1       1       1       1       1       -1      1&lt;br /&gt;
1       14002422        T       A       0       1       1       1       1       0       -1&lt;br /&gt;
1       14003581        T       C       0       1       1       1       1       1       1&lt;br /&gt;
1       14004623        C       T       0       1       1       1       1       0       1&lt;br /&gt;
1       14006543        T       G       0       -1      1       1       1       0       1&lt;br /&gt;
1       14007493        G       A       0       0       1       -1      1       0       1&lt;br /&gt;
1       14007558        T       C       0       0       1       1       -1      -1      1&lt;br /&gt;
1       14007649        A       G       0       1       1       1       1       0       1&lt;br /&gt;
1       14008269        A       G       1       1       0       -1      1       -1      1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       13116   G       T       N       G       T       T       N       G       N       T&lt;br /&gt;
1       13118   G       A       N       G       A       A       N       G       N       A&lt;br /&gt;
1       14930   A       G       G       G       G       A       N       N       A       N&lt;br /&gt;
1       15211   T       G       N       G       T       G       N       N       N       G&lt;br /&gt;
1       54490   A       G       N       G       N       G       N       N       N       N&lt;br /&gt;
1       54716   T       C       T       C       C       C       T       N       N       N&lt;br /&gt;
1       58814   A       G       N       G       N       G       G       G       N       N&lt;br /&gt;
1       62777   T       A       N       N       A       N       A       A       A       N&lt;br /&gt;
1       63268   C       T       N       T       N       T       C       N       T       N&lt;br /&gt;
1       63671   A       G       N       G       N       N       G       G       G       N&lt;br /&gt;
1       69428   G       T       N       G       T       N       N       T       T       N&lt;br /&gt;
1       69761   T       A       A       A       T       A       N       A       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor 0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7    ind8&lt;br /&gt;
1       13116   T       N       G       T       T       N       G       N       T       T&lt;br /&gt;
1       13118   A       N       G       A       A       N       G       N       A       A&lt;br /&gt;
1       14930   A       G       G       G       A       N       N       A       N       G&lt;br /&gt;
1       15211   G       N       G       T       G       N       N       N       G       G&lt;br /&gt;
1       54490   G       N       G       N       G       N       N       N       N       A&lt;br /&gt;
1       54716   C       T       C       C       C       T       N       N       N       C&lt;br /&gt;
1       58814   G       N       G       N       G       G       G       N       N       G&lt;br /&gt;
1       62777   A       N       N       A       N       A       A       A       N       A&lt;br /&gt;
1       63268   T       N       T       N       T       C       N       T       N       N&lt;br /&gt;
1       63336   C       C       C       C       C       C       N       C       N       N&lt;br /&gt;
1       63671   G       N       G       N       N       G       G       G       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''chr''' is the chromosome&lt;br /&gt;
&lt;br /&gt;
'''pos''' is the position&lt;br /&gt;
&lt;br /&gt;
'''major''' is the major allele&lt;br /&gt;
&lt;br /&gt;
'''minor''' is the minor allele. Needs -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
'''indX''' is the sampled base for individual number X. if -output01 1 then it is 1 for major, 0 for non major and -1 for missing&lt;br /&gt;
&lt;br /&gt;
=== sample based IBS matrix *.ibsMat ===&lt;br /&gt;
This function will print the pairwise IBS distance &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibsMat with -makeMatrix 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
0.000000        0.510638        0.606383        0.595745        0.545455        0.428571&lt;br /&gt;
0.510638        0.000000        0.154639        0.154639        0.108911        0.408602&lt;br /&gt;
0.606383        0.154639        0.000000        0.121212        0.137255        0.489362&lt;br /&gt;
0.595745        0.154639        0.121212        0.000000        0.106796        0.484211&lt;br /&gt;
0.545455        0.108911        0.137255        0.106796        0.000000        0.404040&lt;br /&gt;
0.428571        0.408602        0.489362        0.484211        0.404040        0.000000&lt;br /&gt;
0.577320        0.121212        0.181818        0.171717        0.097087        0.473684&lt;br /&gt;
0.536082        0.090000        0.138614        0.118812        0.047619        0.428571&lt;br /&gt;
0.262500        0.571429        0.702381        0.694118        0.632184        0.353659&lt;br /&gt;
0.458333        0.383838        0.484848        0.494949        0.398058        0.368421&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind matrix with pairwise IBS distance&lt;br /&gt;
&lt;br /&gt;
=== sample based covariance matrix *.covMat ===&lt;br /&gt;
This function will print the covariance matrix based on a single sampled read&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.covMat with -doCov 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
1.098251        -0.026225       -0.005617       -0.014726       -0.022438       -0.021786&lt;br /&gt;
-0.026225       1.115986        -0.017167       0.000735        -0.017163       -0.016899&lt;br /&gt;
-0.005617       -0.017167       1.074779        -0.015685       -0.019819       -0.015473&lt;br /&gt;
-0.014726       0.000735        -0.015685       1.072853        -0.013641       -0.007789&lt;br /&gt;
-0.022438       -0.017163       -0.019819       -0.013641       1.094612        -0.016045&lt;br /&gt;
-0.021786       -0.016899       -0.015473       -0.007789       -0.016045       1.059264&lt;br /&gt;
-0.005831       -0.009854       -0.001269       -0.002362       -0.018479       -0.011942&lt;br /&gt;
-0.015399       -0.020010       -0.001296       -0.022947       -0.006515       -0.003938&lt;br /&gt;
-0.001730       -0.040534       -0.002295       -0.017442       -0.024194       -0.007469&lt;br /&gt;
-0.016094       -0.015303       -0.018302       -0.022502       -0.030503       -0.001208&lt;br /&gt;
-0.122045       -0.106068       -0.103089       -0.104443       -0.110237       -0.103610&lt;br /&gt;
-0.106553       -0.100202       -0.104754       -0.109399       -0.107645       -0.111665&lt;br /&gt;
-0.108945       -0.102440       -0.105292       -0.101372       -0.107110       -0.106639&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind covariance matrix&lt;br /&gt;
&lt;br /&gt;
==Model==&lt;br /&gt;
&lt;br /&gt;
=== IBS ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
pairwise distance between individuals&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{\sum_m^M 1-I_{b_j}(b_i)}{M}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; I_{b_j}(b_i) &amp;lt;/math&amp;gt; is the indicator function, which is equal to one when the two individuals i and j have the same base at site m and zero otherwise, so each term &amp;lt;math&amp;gt; 1-I_{b_j}(b_i) &amp;lt;/math&amp;gt; counts a mismatch.&lt;br /&gt;
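&lt;br /&gt;
As a rough illustration of the distance above, the following R sketch computes d_ij from a small made-up matrix of sampled bases (sites in rows, individuals in columns, NA for a site without a read). The function name ibsDist and the toy data are assumptions for illustration and are not part of ANGSD.&lt;br /&gt;

```r
# Toy example (hypothetical data): sampled bases, sites in rows, individuals in columns
bases = rbind(
    c("A", "A", "G"),
    c("C", "T", "T"),
    c("G", "G", NA),
    c("T", "T", "T")
)

# Pairwise IBS distance: fraction of shared sites where the sampled bases differ
ibsDist = function(b) {
    n = ncol(b)
    d = matrix(0, n, n)
    for (i in 1:n) {
        for (j in 1:n) {
            # keep only sites where both individuals have a sampled base
            keep = !(is.na(b[, i]) | is.na(b[, j]))
            d[i, j] = mean(b[keep, i] != b[keep, j])
        }
    }
    d
}

d = ibsDist(bases)
print(d)
```

The result has the same Nind x Nind layout as the *.ibsMat file, so it could be passed to cmdscale(as.dist(d)) in the same way.&lt;br /&gt;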
&lt;br /&gt;
=== Covariance ===&lt;br /&gt;
&lt;br /&gt;
Allele frequency based on single reads. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f_{m} = \frac{N_{minor}}{N_{major} + N_{minor}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
cov(ij) = \frac{1}{M}\sum_m^M \frac{ (h^i_m-f_m)(h^j_m-f_m) }{f_m(1-f_m)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; h^i_m&amp;lt;/math&amp;gt; is 1 if individual i has the major allele at site m and zero otherwise.&lt;br /&gt;
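&lt;br /&gt;
The covariance above can be sketched in R on a small made-up 0/1 matrix. This is only an illustration under stated assumptions: the data are invented, the function name covSingleRead is hypothetical, and h and f are taken to refer to the same allele (swapping both to the other allele leaves every term of the sum unchanged).&lt;br /&gt;

```r
# Toy example (hypothetical data): one sampled allele per individual and site,
# coded 1/0 for the two alleles, NA for missing; sites in rows, individuals in columns
h = rbind(
    c(1, 1, 0, 0),
    c(1, 0, 1, 0),
    c(0, 0, 1, NA),
    c(1, 1, 1, 0)
)

# Frequency per site of the allele coded 1, estimated from the sampled bases
f = rowMeans(h, na.rm = TRUE)

# Standardised deviations per site; a missing base stays NA
z = (h - f) / sqrt(f * (1 - f))

covSingleRead = function(z) {
    n = ncol(z)
    cv = matrix(NA, n, n)
    for (i in 1:n) {
        for (j in 1:n) {
            # average over the sites where both individuals have a base
            cv[i, j] = mean(z[, i] * z[, j], na.rm = TRUE)
        }
    }
    cv
}

covMat = covSingleRead(z)
print(covMat)
```

Sites where f is 0 or 1 would give a division by zero, which is one reason to filter on allele frequency before computing the matrix.&lt;br /&gt;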
&lt;br /&gt;
=MDS/PCA using R=&lt;br /&gt;
&lt;br /&gt;
[[File:PCA_MDS.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## MDS&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
mds &amp;lt;- cmdscale(as.dist(m))&lt;br /&gt;
plot(mds,lwd=2,ylab=&amp;quot;Dist&amp;quot;,xlab=&amp;quot;Dist&amp;quot;,main=&amp;quot;multidimensional scaling&amp;quot;,col=rep(1:3,each=10))&lt;br /&gt;
&lt;br /&gt;
## PCA&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.covMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
e &amp;lt;- eigen(m)&lt;br /&gt;
plot(e$vectors[,1:2],lwd=2,ylab=&amp;quot;PC 2&amp;quot;,xlab=&amp;quot;PC 1&amp;quot;,main=&amp;quot;Principal components&amp;quot;,col=rep(1:3,each=10),pch=16)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=other fun stuff=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## heatmap / clustering / trees&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot; # or covMat&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
#heat map&lt;br /&gt;
heatmap(m)&lt;br /&gt;
#neighbour joining&lt;br /&gt;
plot(ape::nj(m))&lt;br /&gt;
#hierarchical clustering (average linkage)&lt;br /&gt;
plot(hclust(dist(m), &amp;quot;ave&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Relatedness&amp;diff=3107</id>
		<title>Relatedness</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Relatedness&amp;diff=3107"/>
		<updated>2019-08-29T07:21:44Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== NGSrelate - estimation of IBD probabilities ==&lt;br /&gt;
&lt;br /&gt;
To estimate kinship coefficients, population allele frequencies are needed. These can be estimated from the data if you have multiple individuals. For some populations, for example most human populations, publicly available data exist.&lt;br /&gt;
If you can obtain population allele frequencies or have many samples from your population, we recommend that you use NGSrelate, which works with ANGSD output. From the estimated IBD probabilities you can then infer the relationship. Below is a table of the expected IBD sharing probabilities assuming no inbreeding.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: center&amp;quot;&lt;br /&gt;
!|  Relationship || &amp;lt;math&amp;gt;K_0&amp;lt;/math&amp;gt;|| &amp;lt;math&amp;gt;K_1&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;K_2&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|   mono-zygotic twin   ||    &amp;lt;math&amp;gt;0 &amp;lt;/math&amp;gt;    ||     &amp;lt;math&amp;gt;0&amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt;1 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   Parent-Offspring    ||    &amp;lt;math&amp;gt;0 &amp;lt;/math&amp;gt;    ||     &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt;0 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   Full siblings  || &amp;lt;math&amp;gt;0.25  &amp;lt;/math&amp;gt;    ||   &amp;lt;math&amp;gt;  0.5  &amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt; 0.25 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   Half siblings  || &amp;lt;math&amp;gt; 0.5  &amp;lt;/math&amp;gt;   ||     &amp;lt;math&amp;gt;0.5 &amp;lt;/math&amp;gt; ||   &amp;lt;math&amp;gt; 0 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   First cousins   || &amp;lt;math&amp;gt;0.75  &amp;lt;/math&amp;gt;    ||     &amp;lt;math&amp;gt;0.25 &amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt; 0  &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   Unrelated   || &amp;lt;math&amp;gt;1 &amp;lt;/math&amp;gt;    ||     &amp;lt;math&amp;gt;0 &amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt; 0  &amp;lt;/math&amp;gt; &lt;br /&gt;
|}&lt;br /&gt;
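&lt;br /&gt;
The IBD sharing probabilities in the table can be summarised into a single kinship coefficient using the standard relation theta = K1/4 + K2/2. The R sketch below applies this relation to the rows of the table; the object names are chosen here for illustration only.&lt;br /&gt;

```r
# Expected IBD sharing probabilities (K0, K1, K2) copied from the table above
k = rbind(
    "mono-zygotic twin" = c(0,    0,    1),
    "Parent-Offspring"  = c(0,    1,    0),
    "Full siblings"     = c(0.25, 0.5,  0.25),
    "Half siblings"     = c(0.5,  0.5,  0),
    "First cousins"     = c(0.75, 0.25, 0),
    "Unrelated"         = c(1,    0,    0)
)
colnames(k) = c("K0", "K1", "K2")

# Kinship coefficient: theta = K1/4 + K2/2
kinship = k[, "K1"] / 4 + k[, "K2"] / 2
print(kinship)
```

Parent-offspring and full siblings share the same kinship coefficient, which is why the full set of three probabilities is needed to tell them apart.&lt;br /&gt;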
&lt;br /&gt;
&lt;br /&gt;
NGSrelate has its very own website http://www.popgen.dk/software/index.php/NgsRelate&lt;br /&gt;
&lt;br /&gt;
== IBS/genotype distribution ==&lt;br /&gt;
&lt;br /&gt;
If you do not have population allele frequencies then you cannot estimate kinship coefficients. However, you can still make some claims about the relationship of your samples based on IBS patterns. Below is an example of IBS patterns between two individuals where we ignore the allele types. G is the genotype, counted as, for example, the number of derived or non-reference alleles. Basically it is the 2D SFS where there is just 1 individual in each of the two populations.&lt;br /&gt;
The full description of the method can be seen here: [http://www.popgen.dk/software/index.php/IBSrelate IBSrelate]&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: center&amp;quot;&lt;br /&gt;
!|  ||  || ind2 ||&lt;br /&gt;
|-&lt;br /&gt;
!|  ind1  || &amp;lt;math&amp;gt;G=0 &amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;G=1 &amp;lt;/math&amp;gt;  ||&amp;lt;math&amp;gt;G=2 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|  &amp;lt;math&amp;gt;G=0 &amp;lt;/math&amp;gt;     ||    &amp;lt;math&amp;gt;A &amp;lt;/math&amp;gt;    ||     &amp;lt;math&amp;gt;D&amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt;G &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   &amp;lt;math&amp;gt;G=1 &amp;lt;/math&amp;gt;   || &amp;lt;math&amp;gt; B  &amp;lt;/math&amp;gt;    ||   &amp;lt;math&amp;gt;  E  &amp;lt;/math&amp;gt;  ||   &amp;lt;math&amp;gt; H &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   &amp;lt;math&amp;gt;G=2 &amp;lt;/math&amp;gt;   || &amp;lt;math&amp;gt; C  &amp;lt;/math&amp;gt;   ||     &amp;lt;math&amp;gt;F &amp;lt;/math&amp;gt; ||   &amp;lt;math&amp;gt; I &amp;lt;/math&amp;gt; &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Here are some useful ratios of the IBS patterns that can be used to say something about relatedness. Here we assume no inbreeding. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: center&amp;quot;&lt;br /&gt;
!|  Relationship || Expected ratio || Expected ratio (R1) || Expected ratio (R1)&lt;br /&gt;
|-&lt;br /&gt;
|   mono-zygotic twin   ||    &amp;lt;math&amp;gt;B,C,D,F,G,H=0 &amp;lt;/math&amp;gt;    ||    &amp;lt;math&amp;gt; \frac{E}{B+C+D+F+G+H}= \infty  &amp;lt;/math&amp;gt;  ||  -&lt;br /&gt;
|-&lt;br /&gt;
|   Parent-Offspring    ||    &amp;lt;math&amp;gt;C,G=0 &amp;lt;/math&amp;gt;   ||  &amp;lt;math&amp;gt;  \frac{E}{B+C+D+F+G+H}=0.5 &amp;lt;/math&amp;gt;  ||   -&lt;br /&gt;
|-&lt;br /&gt;
|   Full siblings  || &amp;lt;math&amp;gt; \frac{E}{C+G}&amp;gt;10 &amp;lt;/math&amp;gt;    ||  &amp;lt;math&amp;gt;  \frac{E}{B+C+D+F+G+H}&amp;gt;0.5 &amp;lt;/math&amp;gt;  ||  &amp;lt;math&amp;gt; \frac{E}{B+C+D+F+G+H}&amp;lt;10/13 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   Half siblings  || &amp;lt;math&amp;gt; \frac{E}{C+G}&amp;gt;4 &amp;lt;/math&amp;gt;   ||  &amp;lt;math&amp;gt;  \frac{E}{B+C+D+F+G+H}&amp;lt;4/9&amp;lt;/math&amp;gt;  ||  &amp;lt;math&amp;gt; \frac{E}{B+C+D+F+G+H}&amp;gt; 1/6&amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   First cousins   ||  &amp;lt;math&amp;gt; \frac{E}{C+G}&amp;gt;8/3 &amp;lt;/math&amp;gt;    ||   &amp;lt;math&amp;gt;  \frac{E}{B+C+D+F+G+H}&amp;lt;8/19&amp;lt;/math&amp;gt;  ||  &amp;lt;math&amp;gt; \frac{E}{B+C+D+F+G+H}&amp;gt; 1/14 &amp;lt;/math&amp;gt; &lt;br /&gt;
|-&lt;br /&gt;
|   Unrelated   || &amp;lt;math&amp;gt; \frac{E}{C+G}=2 &amp;lt;/math&amp;gt;    ||   &amp;lt;math&amp;gt;  \frac{E}{B+C+D+F+G+H}&amp;lt;4/10&amp;lt;/math&amp;gt;  ||  &amp;lt;math&amp;gt; \frac{E}{B+C+D+F+G+H}&amp;gt; 0 &amp;lt;/math&amp;gt; &lt;br /&gt;
|}&lt;br /&gt;
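&lt;br /&gt;
A minimal R sketch of how the ratios in the table could be computed from an estimated 3x3 IBS pattern. It assumes the matrix is laid out as in the table of cells A to I (rows are the genotype of ind1, columns the genotype of ind2, so reading column by column gives A to I). The numbers are the expectation for unrelated individuals at allele frequency 0.5, and the function name ibsRatios is hypothetical.&lt;br /&gt;

```r
# 3x3 IBS pattern: rows = genotype of ind1, columns = genotype of ind2
# Values: expectation for unrelated individuals with allele frequency 0.5
m = rbind(
    c(0.0625, 0.125, 0.0625),
    c(0.1250, 0.250, 0.1250),
    c(0.0625, 0.125, 0.0625)
)

ibsRatios = function(m) {
    # Column-major order of the matrix matches the cell labels A..I
    x = setNames(as.vector(m), LETTERS[1:9])
    c(
        E_over_CG = unname(x["E"] / (x["C"] + x["G"])),
        R1 = unname(x["E"] / sum(x[c("B", "C", "D", "F", "G", "H")]))
    )
}

r = ibsRatios(m)
print(r)
```

For this input the first ratio equals 2, matching the table entry for unrelated individuals, and the second equals 0.4, the 4/10 bound listed in that row.&lt;br /&gt;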
&lt;br /&gt;
&lt;br /&gt;
[[File:ratioRel.png|800px]]&lt;br /&gt;
&lt;br /&gt;
=== How to get the IBS pattern ===&lt;br /&gt;
&lt;br /&gt;
You can get the estimate by using the [[2d_SFS_Estimation| 2D SFS method]] or the [[Genotype_Distribution| genotype distribution method]], both of which are available in ANGSD. &lt;br /&gt;
&lt;br /&gt;
The two methods are very similar but with a very small difference. The SFS method uses ancestral information or a reference in order to infer the 2 alleles for each position. The genotype distribution does not infer either the major or the minor allele but uses all 10 possible genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
=== Rcode to get expectations===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# R code to get the expected IBS pattern&lt;br /&gt;
## k is the 3 IBD sharing probabilities (K0, K1, K2)&lt;br /&gt;
## f is the allele frequency &lt;br /&gt;
getEst&amp;lt;-function(k=c(1,0,0),f=0.5){&lt;br /&gt;
    p&amp;lt;-f&lt;br /&gt;
    q&amp;lt;-1-f&lt;br /&gt;
    m0&amp;lt;-rbind(&lt;br /&gt;
        c(p^4,2*p^3*q,p^2*q^2),&lt;br /&gt;
        c(2*p^3*q,4*p^2*q^2,2*p*q^3),&lt;br /&gt;
        c(p^2*q^2,2*q^3*p,q^4)&lt;br /&gt;
        )&lt;br /&gt;
   m1&amp;lt;-rbind(&lt;br /&gt;
        c(p^3,p^2*q,0),&lt;br /&gt;
        c(p^2*q,p^2*q+q^2*p,p*q^2),&lt;br /&gt;
        c(0,q^2*p,q^3)&lt;br /&gt;
        )&lt;br /&gt;
    m2&amp;lt;-rbind(&lt;br /&gt;
        c(p^2,0,0),&lt;br /&gt;
        c(0,2*p*q,0),&lt;br /&gt;
        c(0,0,q^2)&lt;br /&gt;
        )&lt;br /&gt;
&lt;br /&gt;
return(k[1]*m0+k[2]*m1+k[3]*m2)&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
## example: unrelated individuals (k=c(1,0,0)) with allele frequency 0.5&lt;br /&gt;
getEst(k=c(1,0,0),f=0.5)&lt;br /&gt;
##        [,1]  [,2]   [,3]&lt;br /&gt;
## [1,] 0.0625 0.125 0.0625&lt;br /&gt;
## [2,] 0.1250 0.250 0.1250&lt;br /&gt;
## [3,] 0.0625 0.125 0.0625&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=LD&amp;diff=3096</id>
		<title>LD</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=LD&amp;diff=3096"/>
		<updated>2019-07-17T08:57:30Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
For estimation of LD based on genotype likelihoods you can use [https://github.com/fgvieira/ngsLD ngsLD]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=LD&amp;diff=3095</id>
		<title>LD</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=LD&amp;diff=3095"/>
		<updated>2019-07-17T08:57:11Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Created page with &amp;quot; For estimation of LD based on genotype likelihoods you can use [(https://github.com/fgvieira/ngsLD ngsLD]&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
For estimation of LD based on genotype likelihoods you can use [(https://github.com/fgvieira/ngsLD ngsLD]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=MediaWiki:Sidebar&amp;diff=3094</id>
		<title>MediaWiki:Sidebar</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=MediaWiki:Sidebar&amp;diff=3094"/>
		<updated>2019-07-17T08:54:25Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Pages&lt;br /&gt;
** Main_Page#Overview|ANGSD overview&lt;br /&gt;
** Download_and_installation|Installation&lt;br /&gt;
** Quick_Start|Quick Start/Testdata&lt;br /&gt;
** Input|Input data&lt;br /&gt;
** filters | Filters&lt;br /&gt;
** snpFilters | snpFilters&lt;br /&gt;
&lt;br /&gt;
* Population genetics&lt;br /&gt;
** SFS Estimation|SFS Estimation&lt;br /&gt;
**tajima|Thetas,Tajima,Neutrality test&lt;br /&gt;
** 2d SFS Estimation |(Multi) SFS Estimation&lt;br /&gt;
** Direct Ancestry | Direct Ancestry &lt;br /&gt;
*  Population structure&lt;br /&gt;
** NGSadmix | Admixture&lt;br /&gt;
** Fst |Fst&lt;br /&gt;
** Abbababa |ABBABABA (D-stat)&lt;br /&gt;
** Abbababa2 |ABBABABA (multipop)&lt;br /&gt;
** Pbs | Population branch statistics (pbs)&lt;br /&gt;
** PCA | PCA &lt;br /&gt;
** PCA_MDS | PCA (sampling approach)&lt;br /&gt;
** LD | Linkage disequilibrium &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Medical genetics&lt;br /&gt;
** Association|Association&lt;br /&gt;
&lt;br /&gt;
* IBD/IBS&lt;br /&gt;
** Relatedness | Relatedness&lt;br /&gt;
** HWE_and_Inbreeding_estimates|HWE and inbreeding with ngsF&lt;br /&gt;
** HWE_test | HWE test&lt;br /&gt;
** Genotype_Distribution | Genotype distribution&lt;br /&gt;
** Heterozygosity | Heterozygosity&lt;br /&gt;
&lt;br /&gt;
* Summaries&lt;br /&gt;
** Contamination|Contamination&lt;br /&gt;
** Error estimation|Error estimation&lt;br /&gt;
** alleles_counts|Allele counts&lt;br /&gt;
** depth|Depth&lt;br /&gt;
** base_quality|Base quality&lt;br /&gt;
** fasta | Create Fasta file&lt;br /&gt;
** Mismatch | Mismatch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* SNPs and genotypes&lt;br /&gt;
** Genotype_likelihoods|Genotype likelihoods&lt;br /&gt;
** Inferring_Major_and_Minor_alleles|Major and Minor&lt;br /&gt;
** Allele_Frequency_estimation|Allele frequencies&lt;br /&gt;
** Genotype_calling|Genotype calling&lt;br /&gt;
** Haploid_calling|Haploid calling&lt;br /&gt;
** SNP_calling|SNP Calling&lt;br /&gt;
&amp;lt;!-- ** SNP_Calling|SNP Calling --&amp;gt;&lt;br /&gt;
&amp;lt;!-- ** Covariance_matrix_for_PCA|PCA --&amp;gt;&lt;br /&gt;
&amp;lt;!-- ** Heterozygosity|Heterozogosity --&amp;gt;&lt;br /&gt;
&amp;lt;!-- ** HWE_and_Inbreeding_estimates|HWE and inbreeding --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Output&lt;br /&gt;
** beagle_input|Beagle inputation&lt;br /&gt;
** Genotype_likelihoods#Output_genotype_likelihoods|Genotype likelihood files&lt;br /&gt;
** Plink |Plink&lt;br /&gt;
&lt;br /&gt;
*Misc/util programs&lt;br /&gt;
** realSFS | realSFS&lt;br /&gt;
** msToGlf | msToGlf&lt;br /&gt;
** thetaStat | thetaStat&lt;br /&gt;
** supersim | supersim&lt;br /&gt;
&lt;br /&gt;
* Program structure&lt;br /&gt;
** angsd structure |Introduction&lt;br /&gt;
** angsd_class | overview of class&lt;br /&gt;
** custom_start | getting started &lt;br /&gt;
** data_access | accessing core data&lt;br /&gt;
** custom_data | custom data containers&lt;br /&gt;
** print | printing results &lt;br /&gt;
&lt;br /&gt;
* About ANGSD&lt;br /&gt;
** change_log|Version log&lt;br /&gt;
** citing_angsd|Citing angsd&lt;br /&gt;
** authors|Authors&lt;br /&gt;
** Bugs | Bugs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* navigation&lt;br /&gt;
** mainpage|mainpage-description&lt;br /&gt;
** portal-url|portal&lt;br /&gt;
** currentevents-url|currentevents&lt;br /&gt;
** recentchanges-url|recentchanges&lt;br /&gt;
** randompage-url|randompage&lt;br /&gt;
** helppage|help&lt;br /&gt;
* SEARCH&lt;br /&gt;
* TOOLBOX&lt;br /&gt;
* LANGUAGES&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3054</id>
		<title>PCA MDS</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3054"/>
		<updated>2018-10-04T07:37:24Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Undo revision 3053 by Albrecht (talk)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= single read sampling approach for PCA or MDS =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This function is available from version '''0.912''' and in the latest development version on [https://github.com/ANGSD/angsd GitHub]&lt;br /&gt;
&lt;br /&gt;
For the PCA / MDS methods you should use called SNP sites (use [[PCA]] if you do not want to call SNPs). SNPs can be called based on genotype likelihoods (see [[SNP_calling]]) or you can give the variable sites you want to analyse using the [[Sites|-sites]] option. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doIBS&lt;br /&gt;
	-&amp;gt; angsd version: 0.911-26-gf1cb0e0-dirty (htslib: 1.3-1-gc72ae90) build(Apr 27 2016 11:15:33)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -doIBS 	-&amp;gt; Wed Apr 27 12:38:35 2016&lt;br /&gt;
--------------&lt;br /&gt;
abcIBS.cpp:&lt;br /&gt;
	-doIBS	0&lt;br /&gt;
	(Sampling strategies)&lt;br /&gt;
	 0:	 no IBS &lt;br /&gt;
	 1:	 (Sample single base)&lt;br /&gt;
	 2:	 (Concensus base)&lt;br /&gt;
	-doCounts	0	Must choose -doCount 1&lt;br /&gt;
Optional&lt;br /&gt;
	-minMinor	0	Minimum observed minor alleles&lt;br /&gt;
	-minFreq	0.000	Minimum minor allele frequency&lt;br /&gt;
	-output01	0	output 0 and 1s instead of based&lt;br /&gt;
	-maxMis		-1	Maximum missing bases (per site)&lt;br /&gt;
	-doMajorMinor	0	use input files or data to select major and minor alleles&lt;br /&gt;
	-makeMatrix	0	print out the ibs matrix &lt;br /&gt;
	-doCov		0	print out the cov matrix &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
;-doIBS [int] &lt;br /&gt;
Print a single base from each individual at each position. 1: randomly sampled read. 2: consensus base.&lt;br /&gt;
&lt;br /&gt;
;-doCounts [int]&lt;br /&gt;
The method requires counting the different bases at each position. Therefore, -doCounts 1 must be used.&lt;br /&gt;
&lt;br /&gt;
;-doMajorMinor [int]&lt;br /&gt;
The covariance matrix can only be calculated for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see [[Inferring_Major_and_Minor_alleles]]). This can also be used if you only want to make this assumption for the IBS matrix or only want to print out bases that are either the major or the minor. &lt;br /&gt;
&lt;br /&gt;
;-minMinor [int]&lt;br /&gt;
Minimum observed minor alleles. The default is 0. If you do not use -doMajorMinor then the number of minor alleles is the sum of the counts of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-minFreq [float]	&lt;br /&gt;
Minimum minor allele frequency based on the sampled bases. The default is 0. If you do not use -doMajorMinor then the frequency is the sum of the frequencies of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-output01 [int]	&lt;br /&gt;
Output the sampled reads as 0 (for major) and 1 (for non-major) instead of the actual base.&lt;br /&gt;
&lt;br /&gt;
;-maxMis [int]&lt;br /&gt;
Maximum missing bases (per site), i.e. the maximum number of allowed non-major/minor sampled bases.&lt;br /&gt;
&lt;br /&gt;
;-makeMatrix [int]&lt;br /&gt;
1 prints out the pairwise IBS matrix. This is the average distance between pairs of individuals. The distance is zero if the base is the same and 1 otherwise. You can use this for MDS (see below).&lt;br /&gt;
&lt;br /&gt;
;-doCov [int]		&lt;br /&gt;
1 prints out the covariance matrix, which can be used for PCA (see below). You should use the -minFreq option to avoid sites with low allele frequency.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== run example ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam all.files -minMapQ 30 -minQ 20 -GL 2  -doMajorMinor 1 -doMaf 1 -SNP_pval 2e-6 -doIBS 1 -doCounts 1 -doCov 1 -makeMatrix 1 -minMaf 0.05 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will produce the output (see below), which includes the pairwise differences (.ibsMat) and the covariance matrix (.covMat). These can be used for MDS and PCA respectively (see R example below). Note that only the PCA method requires SNP calling and allele frequency estimation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Output==&lt;br /&gt;
&lt;br /&gt;
=== sampled bases *ibs.gz ===&lt;br /&gt;
This function prints the sampled bases to *.ibs.gz. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       14000873        A       G       0       1       1       1       1       1       1&lt;br /&gt;
1       14001018        C       T       0       1       1       1       1       1       1&lt;br /&gt;
1       14001867        G       A       0       1       1       1       1       0       1&lt;br /&gt;
1       14002342        T       C       1       1       1       1       1       -1      1&lt;br /&gt;
1       14002422        T       A       0       1       1       1       1       0       -1&lt;br /&gt;
1       14003581        T       C       0       1       1       1       1       1       1&lt;br /&gt;
1       14004623        C       T       0       1       1       1       1       0       1&lt;br /&gt;
1       14006543        T       G       0       -1      1       1       1       0       1&lt;br /&gt;
1       14007493        G       A       0       0       1       -1      1       0       1&lt;br /&gt;
1       14007558        T       C       0       0       1       1       -1      -1      1&lt;br /&gt;
1       14007649        A       G       0       1       1       1       1       0       1&lt;br /&gt;
1       14008269        A       G       1       1       0       -1      1       -1      1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       13116   G       T       N       G       T       T       N       G       N       T&lt;br /&gt;
1       13118   G       A       N       G       A       A       N       G       N       A&lt;br /&gt;
1       14930   A       G       G       G       G       A       N       N       A       N&lt;br /&gt;
1       15211   T       G       N       G       T       G       N       N       N       G&lt;br /&gt;
1       54490   A       G       N       G       N       G       N       N       N       N&lt;br /&gt;
1       54716   T       C       T       C       C       C       T       N       N       N&lt;br /&gt;
1       58814   A       G       N       G       N       G       G       G       N       N&lt;br /&gt;
1       62777   T       A       N       N       A       N       A       A       A       N&lt;br /&gt;
1       63268   C       T       N       T       N       T       C       N       T       N&lt;br /&gt;
1       63671   A       G       N       G       N       N       G       G       G       N&lt;br /&gt;
1       69428   G       T       N       G       T       N       N       T       T       N&lt;br /&gt;
1       69761   T       A       A       A       T       A       N       A       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor 0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7    ind8&lt;br /&gt;
1       13116   T       N       G       T       T       N       G       N       T       T&lt;br /&gt;
1       13118   A       N       G       A       A       N       G       N       A       A&lt;br /&gt;
1       14930   A       G       G       G       A       N       N       A       N       G&lt;br /&gt;
1       15211   G       N       G       T       G       N       N       N       G       G&lt;br /&gt;
1       54490   G       N       G       N       G       N       N       N       N       A&lt;br /&gt;
1       54716   C       T       C       C       C       T       N       N       N       C&lt;br /&gt;
1       58814   G       N       G       N       G       G       G       N       N       G&lt;br /&gt;
1       62777   A       N       N       A       N       A       A       A       N       A&lt;br /&gt;
1       63268   T       N       T       N       T       C       N       T       N       N&lt;br /&gt;
1       63336   C       C       C       C       C       C       N       C       N       N&lt;br /&gt;
1       63671   G       N       G       N       N       G       G       G       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''chr''' is the chromosome&lt;br /&gt;
&lt;br /&gt;
'''pos''' is the position&lt;br /&gt;
&lt;br /&gt;
'''major''' is the major allele&lt;br /&gt;
&lt;br /&gt;
'''minor''' is the minor allele. Needs -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
'''indX''' is the sampled base for individual number X. If -output01 1 is used then it is 1 for major, 0 for non-major and -1 for missing.&lt;br /&gt;
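&lt;br /&gt;
A small R sketch of how a file in the -output01 1 layout described above might be loaded for further summaries. The text block stands in for a real *.ibs.gz file and its values are made up; treating -1 as missing follows the column description above, and the object names are assumptions for illustration.&lt;br /&gt;

```r
# Hypothetical excerpt in the -output01 1 layout (values made up)
txt = "chr pos major minor ind0 ind1 ind2
1 100 A G 0 1 1
1 200 C T 1 -1 1
1 300 G A 0 0 -1"

# -1 marks a missing base, so read it as NA
ibs = read.table(text = txt, header = TRUE, na.strings = "-1")

# Keep only the per-individual columns (drop chr, pos, major, minor)
g = as.matrix(ibs[, -(1:4)])

# Number of individuals without a sampled base at each site
nMis = rowSums(is.na(g))

# Fraction of sampled bases coded 1 at each site
frac1 = rowMeans(g, na.rm = TRUE)
print(cbind(nMis, frac1))
```

For a real run the text argument would be replaced by the file name, since read.table can read gzip-compressed files directly.&lt;br /&gt;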
&lt;br /&gt;
=== sample based IBS matrix *.ibsMat ===&lt;br /&gt;
This function will print the pairwise IBS distance &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibsMat with -makeMatrix 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
0.000000        0.510638        0.606383        0.595745        0.545455        0.428571&lt;br /&gt;
0.510638        0.000000        0.154639        0.154639        0.108911        0.408602&lt;br /&gt;
0.606383        0.154639        0.000000        0.121212        0.137255        0.489362&lt;br /&gt;
0.595745        0.154639        0.121212        0.000000        0.106796        0.484211&lt;br /&gt;
0.545455        0.108911        0.137255        0.106796        0.000000        0.404040&lt;br /&gt;
0.428571        0.408602        0.489362        0.484211        0.404040        0.000000&lt;br /&gt;
0.577320        0.121212        0.181818        0.171717        0.097087        0.473684&lt;br /&gt;
0.536082        0.090000        0.138614        0.118812        0.047619        0.428571&lt;br /&gt;
0.262500        0.571429        0.702381        0.694118        0.632184        0.353659&lt;br /&gt;
0.458333        0.383838        0.484848        0.494949        0.398058        0.368421&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind matrix with pairwise IBS distance&lt;br /&gt;
&lt;br /&gt;
=== sample based covariance matrix *.covMat ===&lt;br /&gt;
This function will print the covariance matrix based on a single sampled read&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.covMat with -doCov 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
1.098251        -0.026225       -0.005617       -0.014726       -0.022438       -0.021786&lt;br /&gt;
-0.026225       1.115986        -0.017167       0.000735        -0.017163       -0.016899&lt;br /&gt;
-0.005617       -0.017167       1.074779        -0.015685       -0.019819       -0.015473&lt;br /&gt;
-0.014726       0.000735        -0.015685       1.072853        -0.013641       -0.007789&lt;br /&gt;
-0.022438       -0.017163       -0.019819       -0.013641       1.094612        -0.016045&lt;br /&gt;
-0.021786       -0.016899       -0.015473       -0.007789       -0.016045       1.059264&lt;br /&gt;
-0.005831       -0.009854       -0.001269       -0.002362       -0.018479       -0.011942&lt;br /&gt;
-0.015399       -0.020010       -0.001296       -0.022947       -0.006515       -0.003938&lt;br /&gt;
-0.001730       -0.040534       -0.002295       -0.017442       -0.024194       -0.007469&lt;br /&gt;
-0.016094       -0.015303       -0.018302       -0.022502       -0.030503       -0.001208&lt;br /&gt;
-0.122045       -0.106068       -0.103089       -0.104443       -0.110237       -0.103610&lt;br /&gt;
-0.106553       -0.100202       -0.104754       -0.109399       -0.107645       -0.111665&lt;br /&gt;
-0.108945       -0.102440       -0.105292       -0.101372       -0.107110       -0.106639&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind covariance matrix&lt;br /&gt;
&lt;br /&gt;
==Model==&lt;br /&gt;
&lt;br /&gt;
=== IBS ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
pairwise distance between individuals&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{\sum_m^M 1-I_{b_j}(b_i)}{M}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; I_{b_j}(b_i) &amp;lt;/math&amp;gt; is the indicator function, which is equal to one when the two individuals i and j have the same base at site m and zero otherwise, so each term &amp;lt;math&amp;gt; 1-I_{b_j}(b_i) &amp;lt;/math&amp;gt; counts a mismatch.&lt;br /&gt;
&lt;br /&gt;
=== Covariance ===&lt;br /&gt;
&lt;br /&gt;
Allele frequency based on single reads. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f_{m} = \frac{N_{minor}}{N_{major} + N_{minor}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
cov(ij) = \frac{1}{M}\sum_m^M \frac{ (h^i_m-f_m)(h^j_m-f_m) }{f_m(1-f_m)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; h^i_m&amp;lt;/math&amp;gt; is 1 if individual i has the major allele at site m and zero otherwise.&lt;br /&gt;
&lt;br /&gt;
=MDS/PCA using R=&lt;br /&gt;
&lt;br /&gt;
[[File:PCA_MDS.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## MDS&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
mds &amp;lt;- cmdscale(as.dist(m))&lt;br /&gt;
plot(mds,lwd=2,ylab=&amp;quot;Dist&amp;quot;,xlab=&amp;quot;Dist&amp;quot;,main=&amp;quot;multidimensional scaling&amp;quot;,col=rep(1:3,each=10))&lt;br /&gt;
&lt;br /&gt;
## PCA&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.covMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
e &amp;lt;- eigen(m)&lt;br /&gt;
plot(e$vectors[,1:2],lwd=2,ylab=&amp;quot;PC 2&amp;quot;,xlab=&amp;quot;PC 1&amp;quot;,main=&amp;quot;Principal components&amp;quot;,col=rep(1:3,each=10),pch=16)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
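&lt;br /&gt;
When plotting principal components it is often useful to label the axes with the proportion of variance each component explains, which can be derived from the eigenvalues. The sketch below uses a made-up symmetric matrix standing in for a matrix read from *.covMat; the matrix and the object names are assumptions for illustration.&lt;br /&gt;

```r
# Made-up 3x3 symmetric matrix standing in for a covariance matrix from *.covMat
m = rbind(
    c( 1.0, -0.4, -0.6),
    c(-0.4,  1.0, -0.6),
    c(-0.6, -0.6,  1.2)
)
e = eigen(m)

# Share of the summed eigenvalues carried by each component, in percent
pve = 100 * e$values / sum(e$values)

# Axis labels such as "PC 1 (56.2%)"
labs = sprintf("PC %d (%.1f%%)", 1:2, pve[1:2])

plot(e$vectors[, 1:2], xlab = labs[1], ylab = labs[2], pch = 16,
     main = "Principal components")
```

A covariance matrix estimated from single sampled reads can have small negative eigenvalues, in which case these percentages are only approximate.&lt;br /&gt;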
&lt;br /&gt;
&lt;br /&gt;
=other fun stuff=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## heatmap / clustering / trees&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot; # or covMat&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
#heat map&lt;br /&gt;
heatmap(m)&lt;br /&gt;
#neighbour joining&lt;br /&gt;
plot(ape::nj(m))&lt;br /&gt;
#hierarchical clustering (average linkage)&lt;br /&gt;
plot(hclust(dist(m), &amp;quot;ave&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3053</id>
		<title>PCA MDS</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3053"/>
		<updated>2018-10-04T07:36:21Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Model */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= single read sampling approach for PCA or MDS =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This function is new; it is available from version '''0.912''' and in the latest development version on [https://github.com/ANGSD/angsd GitHub].&lt;br /&gt;
&lt;br /&gt;
The PCA / MDS methods should be run on called SNP sites (use [[PCA]] if you do not want to call SNPs). SNPs can be called based on genotype likelihoods (see [[SNP_calling]]), or you can supply the variable sites you want to analyse using the [[Sites|-sites]] option. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doIBS&lt;br /&gt;
	-&amp;gt; angsd version: 0.911-26-gf1cb0e0-dirty (htslib: 1.3-1-gc72ae90) build(Apr 27 2016 11:15:33)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -doIBS 	-&amp;gt; Wed Apr 27 12:38:35 2016&lt;br /&gt;
--------------&lt;br /&gt;
abcIBS.cpp:&lt;br /&gt;
	-doIBS	0&lt;br /&gt;
	(Sampling strategies)&lt;br /&gt;
	 0:	 no IBS &lt;br /&gt;
	 1:	 (Sample single base)&lt;br /&gt;
	 2:	 (Concensus base)&lt;br /&gt;
	-doCounts	0	Must choose -doCount 1&lt;br /&gt;
Optional&lt;br /&gt;
	-minMinor	0	Minimum observed minor alleles&lt;br /&gt;
	-minFreq	0.000	Minimum minor allele frequency&lt;br /&gt;
	-output01	0	output 0 and 1s instead of based&lt;br /&gt;
	-maxMis		-1	Maximum missing bases (per site)&lt;br /&gt;
	-doMajorMinor	0	use input files or data to select major and minor alleles&lt;br /&gt;
	-makeMatrix	0	print out the ibs matrix &lt;br /&gt;
	-doCov		0	print out the cov matrix &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
;-doIBS [int] &lt;br /&gt;
Print a single base from each individual at each position. 1: randomly sampled read. 2: consensus base.&lt;br /&gt;
&lt;br /&gt;
;-doCounts [int]&lt;br /&gt;
The method requires counting the different bases at each position. Therefore, -doCounts 1 must be used.&lt;br /&gt;
&lt;br /&gt;
;-doMajorMinor [int]&lt;br /&gt;
The covariance matrix can only be calculated for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see [[Inferring_Major_and_Minor_alleles]]). This option can also be used if you only want to make this assumption for the IBS matrix, or only want to print out bases that are either the major or the minor allele. &lt;br /&gt;
&lt;br /&gt;
;-minMinor [int]&lt;br /&gt;
Minimum number of observed minor alleles. The default is 0. If you do not use -doMajorMinor, the number of minor alleles is the sum of the counts of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-minFreq [float]	&lt;br /&gt;
Minimum minor allele frequency based on the sampled bases. The default is 0. If you do not use -doMajorMinor, the frequency is the sum of the frequencies of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-output01 [int]	&lt;br /&gt;
Output the sampled reads as 1 (major allele), 0 (non-major) and -1 (missing) instead of the actual bases.&lt;br /&gt;
&lt;br /&gt;
;-maxMis [int]&lt;br /&gt;
Maximum number of missing bases per site, i.e. the maximum number of allowed non-major/minor sampled bases.&lt;br /&gt;
&lt;br /&gt;
;-makeMatrix [int]&lt;br /&gt;
1 prints out the pairwise IBS matrix. This is the average distance between pairs of individuals. The distance at a site is 0 if the bases are the same and 1 otherwise. You can use this for MDS (see below).&lt;br /&gt;
&lt;br /&gt;
;-doCov [int]		&lt;br /&gt;
1 prints out the covariance matrix, which can be used for PCA (see below). You should use the -minFreq option to avoid sites with low allele frequency.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== run example ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam all.files -minMapQ 30 -minQ 20 -GL 2  -doMajorMinor 1 -doMaf 1 -SNP_pval 2e-6 -doIBS 1 -doCounts 1 -doCov 1 -makeMatrix 1 -minMaf 0.05 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will produce the output described below, which includes the pairwise differences (.ibsMat) and the covariance matrix (.covMat). These can be used for MDS and PCA, respectively (see the R example below). Note that only the PCA method requires SNP calling and allele frequency estimation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Output==&lt;br /&gt;
&lt;br /&gt;
=== sampled bases *ibs.gz ===&lt;br /&gt;
This function prints the sampled bases to *.ibs.gz. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       14000873        A       G       0       1       1       1       1       1       1&lt;br /&gt;
1       14001018        C       T       0       1       1       1       1       1       1&lt;br /&gt;
1       14001867        G       A       0       1       1       1       1       0       1&lt;br /&gt;
1       14002342        T       C       1       1       1       1       1       -1      1&lt;br /&gt;
1       14002422        T       A       0       1       1       1       1       0       -1&lt;br /&gt;
1       14003581        T       C       0       1       1       1       1       1       1&lt;br /&gt;
1       14004623        C       T       0       1       1       1       1       0       1&lt;br /&gt;
1       14006543        T       G       0       -1      1       1       1       0       1&lt;br /&gt;
1       14007493        G       A       0       0       1       -1      1       0       1&lt;br /&gt;
1       14007558        T       C       0       0       1       1       -1      -1      1&lt;br /&gt;
1       14007649        A       G       0       1       1       1       1       0       1&lt;br /&gt;
1       14008269        A       G       1       1       0       -1      1       -1      1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       13116   G       T       N       G       T       T       N       G       N       T&lt;br /&gt;
1       13118   G       A       N       G       A       A       N       G       N       A&lt;br /&gt;
1       14930   A       G       G       G       G       A       N       N       A       N&lt;br /&gt;
1       15211   T       G       N       G       T       G       N       N       N       G&lt;br /&gt;
1       54490   A       G       N       G       N       G       N       N       N       N&lt;br /&gt;
1       54716   T       C       T       C       C       C       T       N       N       N&lt;br /&gt;
1       58814   A       G       N       G       N       G       G       G       N       N&lt;br /&gt;
1       62777   T       A       N       N       A       N       A       A       A       N&lt;br /&gt;
1       63268   C       T       N       T       N       T       C       N       T       N&lt;br /&gt;
1       63671   A       G       N       G       N       N       G       G       G       N&lt;br /&gt;
1       69428   G       T       N       G       T       N       N       T       T       N&lt;br /&gt;
1       69761   T       A       A       A       T       A       N       A       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor 0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7    ind8&lt;br /&gt;
1       13116   T       N       G       T       T       N       G       N       T       T&lt;br /&gt;
1       13118   A       N       G       A       A       N       G       N       A       A&lt;br /&gt;
1       14930   A       G       G       G       A       N       N       A       N       G&lt;br /&gt;
1       15211   G       N       G       T       G       N       N       N       G       G&lt;br /&gt;
1       54490   G       N       G       N       G       N       N       N       N       A&lt;br /&gt;
1       54716   C       T       C       C       C       T       N       N       N       C&lt;br /&gt;
1       58814   G       N       G       N       G       G       G       N       N       G&lt;br /&gt;
1       62777   A       N       N       A       N       A       A       A       N       A&lt;br /&gt;
1       63268   T       N       T       N       T       C       N       T       N       N&lt;br /&gt;
1       63336   C       C       C       C       C       C       N       C       N       N&lt;br /&gt;
1       63671   G       N       G       N       N       G       G       G       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''chr''' is the chromosome&lt;br /&gt;
&lt;br /&gt;
'''pos''' is the position&lt;br /&gt;
&lt;br /&gt;
'''major''' is the major allele&lt;br /&gt;
&lt;br /&gt;
'''minor''' is the minor allele. Needs -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
'''indX''' is the sampled base for individual number X. If -output01 1 is used, it is 1 for major, 0 for non-major and -1 for missing.&lt;br /&gt;
&lt;br /&gt;
=== sample based IBS matrix *.ibsMat ===&lt;br /&gt;
This function will print the pairwise IBS distance &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibsMat with -makeMatrix 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
0.000000        0.510638        0.606383        0.595745        0.545455        0.428571&lt;br /&gt;
0.510638        0.000000        0.154639        0.154639        0.108911        0.408602&lt;br /&gt;
0.606383        0.154639        0.000000        0.121212        0.137255        0.489362&lt;br /&gt;
0.595745        0.154639        0.121212        0.000000        0.106796        0.484211&lt;br /&gt;
0.545455        0.108911        0.137255        0.106796        0.000000        0.404040&lt;br /&gt;
0.428571        0.408602        0.489362        0.484211        0.404040        0.000000&lt;br /&gt;
0.577320        0.121212        0.181818        0.171717        0.097087        0.473684&lt;br /&gt;
0.536082        0.090000        0.138614        0.118812        0.047619        0.428571&lt;br /&gt;
0.262500        0.571429        0.702381        0.694118        0.632184        0.353659&lt;br /&gt;
0.458333        0.383838        0.484848        0.494949        0.398058        0.368421&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind matrix with pairwise IBS distance&lt;br /&gt;
&lt;br /&gt;
=== sample based covariance matrix *.covMat ===&lt;br /&gt;
This function will print the covariance matrix based on a single sampled read&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.covMat with -doCov 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
1.098251        -0.026225       -0.005617       -0.014726       -0.022438       -0.021786&lt;br /&gt;
-0.026225       1.115986        -0.017167       0.000735        -0.017163       -0.016899&lt;br /&gt;
-0.005617       -0.017167       1.074779        -0.015685       -0.019819       -0.015473&lt;br /&gt;
-0.014726       0.000735        -0.015685       1.072853        -0.013641       -0.007789&lt;br /&gt;
-0.022438       -0.017163       -0.019819       -0.013641       1.094612        -0.016045&lt;br /&gt;
-0.021786       -0.016899       -0.015473       -0.007789       -0.016045       1.059264&lt;br /&gt;
-0.005831       -0.009854       -0.001269       -0.002362       -0.018479       -0.011942&lt;br /&gt;
-0.015399       -0.020010       -0.001296       -0.022947       -0.006515       -0.003938&lt;br /&gt;
-0.001730       -0.040534       -0.002295       -0.017442       -0.024194       -0.007469&lt;br /&gt;
-0.016094       -0.015303       -0.018302       -0.022502       -0.030503       -0.001208&lt;br /&gt;
-0.122045       -0.106068       -0.103089       -0.104443       -0.110237       -0.103610&lt;br /&gt;
-0.106553       -0.100202       -0.104754       -0.109399       -0.107645       -0.111665&lt;br /&gt;
-0.108945       -0.102440       -0.105292       -0.101372       -0.107110       -0.106639&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind covariance matrix&lt;br /&gt;
&lt;br /&gt;
==Model==&lt;br /&gt;
&lt;br /&gt;
=== IBS ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pairwise distance between individuals i and j:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{\sum_{m=1}^{M} \left(1-I_{b_j}(b_i)\right)}{M}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where M is the number of sites with a read for both individuals and &amp;lt;math&amp;gt; I_{b_j}(b_i) &amp;lt;/math&amp;gt; is the indicator function, which is equal to one when the two individuals i and j have the same base at site m and zero otherwise.&lt;br /&gt;
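&lt;br /&gt;
As a small worked illustration (the bases here are made up, not taken from real output): suppose individuals i and j both have a sampled base at M = 5 sites, with bases (A, C, G, T, A) and (A, C, T, T, G). They differ at the third and fifth site, so&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{0+0+1+0+1}{5} = 0.4&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
Sites where either individual has no sampled base (N) do not count towards M.&lt;br /&gt;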
&lt;br /&gt;
=== Covariance ===&lt;br /&gt;
&lt;br /&gt;
Allele frequency based on single reads. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f_{m} = \frac{N_{minor}}{N_{major} + N_{minor}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
cov(ij) = \frac{1}{M}\sum_m^M \frac{ (h^i_m-f_m)(h^j_m-f_m) }{f_m(1-f_m)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the number of sites with a read for both individuals, and &amp;lt;math&amp;gt; h^i_m&amp;lt;/math&amp;gt; is 1 if individual i has the major allele at site m and zero otherwise.&lt;br /&gt;
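&lt;br /&gt;
To make the per-site term in the sum concrete, here is a sketch with hypothetical values: take a single site with &amp;lt;math&amp;gt; f_m = 0.25 &amp;lt;/math&amp;gt;. If &amp;lt;math&amp;gt; h^i_m = 1 &amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt; h^j_m = 0 &amp;lt;/math&amp;gt;, the site contributes&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{(1-0.25)(0-0.25)}{0.25(1-0.25)} = \frac{-0.1875}{0.1875} = -1&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
and if &amp;lt;math&amp;gt; h^i_m = h^j_m = 0 &amp;lt;/math&amp;gt; it contributes&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\frac{(0-0.25)(0-0.25)}{0.25(1-0.25)} = \frac{0.0625}{0.1875} \approx 0.333&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The denominator &amp;lt;math&amp;gt; f_m(1-f_m) &amp;lt;/math&amp;gt; approaches zero as &amp;lt;math&amp;gt; f_m &amp;lt;/math&amp;gt; approaches zero, so single sites with very low frequency can dominate the average. This is consistent with the advice to filter with -minFreq.&lt;br /&gt;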
&lt;br /&gt;
=MDS/PCA using R=&lt;br /&gt;
&lt;br /&gt;
[[File:PCA_MDS.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## MDS&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
mds &amp;lt;- cmdscale(as.dist(m))&lt;br /&gt;
plot(mds,lwd=2,ylab=&amp;quot;Dist&amp;quot;,xlab=&amp;quot;Dist&amp;quot;,main=&amp;quot;multidimensional scaling&amp;quot;,col=rep(1:3,each=10))&lt;br /&gt;
&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.covMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
e &amp;lt;- eigen(m)&lt;br /&gt;
plot(e$vectors[,1:2],lwd=2,ylab=&amp;quot;PC 2&amp;quot;,xlab=&amp;quot;PC 1&amp;quot;,main=&amp;quot;Principal components&amp;quot;,col=rep(1:3,each=10),pch=16)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=other fun stuff=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## heatmap / clustering / trees&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot; # or covMat&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
#heat map&lt;br /&gt;
heatmap(m)&lt;br /&gt;
#neighbour joining&lt;br /&gt;
plot(ape::nj(m))&lt;br /&gt;
plot(hclust(dist(m), &amp;quot;ave&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3052</id>
		<title>PCA MDS</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3052"/>
		<updated>2018-10-04T07:35:19Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* sampled bases *ibs.gz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= single read sampling approach for PCA or MDS =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This function is new; it is available from version '''0.912''' and in the latest development version on [https://github.com/ANGSD/angsd GitHub].&lt;br /&gt;
&lt;br /&gt;
The PCA / MDS methods should be run on called SNP sites (use [[PCA]] if you do not want to call SNPs). SNPs can be called based on genotype likelihoods (see [[SNP_calling]]), or you can supply the variable sites you want to analyse using the [[Sites|-sites]] option. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doIBS&lt;br /&gt;
	-&amp;gt; angsd version: 0.911-26-gf1cb0e0-dirty (htslib: 1.3-1-gc72ae90) build(Apr 27 2016 11:15:33)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -doIBS 	-&amp;gt; Wed Apr 27 12:38:35 2016&lt;br /&gt;
--------------&lt;br /&gt;
abcIBS.cpp:&lt;br /&gt;
	-doIBS	0&lt;br /&gt;
	(Sampling strategies)&lt;br /&gt;
	 0:	 no IBS &lt;br /&gt;
	 1:	 (Sample single base)&lt;br /&gt;
	 2:	 (Concensus base)&lt;br /&gt;
	-doCounts	0	Must choose -doCount 1&lt;br /&gt;
Optional&lt;br /&gt;
	-minMinor	0	Minimum observed minor alleles&lt;br /&gt;
	-minFreq	0.000	Minimum minor allele frequency&lt;br /&gt;
	-output01	0	output 0 and 1s instead of based&lt;br /&gt;
	-maxMis		-1	Maximum missing bases (per site)&lt;br /&gt;
	-doMajorMinor	0	use input files or data to select major and minor alleles&lt;br /&gt;
	-makeMatrix	0	print out the ibs matrix &lt;br /&gt;
	-doCov		0	print out the cov matrix &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
;-doIBS [int] &lt;br /&gt;
Print a single base from each individual at each position. 1: randomly sampled read. 2: consensus base.&lt;br /&gt;
&lt;br /&gt;
;-doCounts [int]&lt;br /&gt;
The method requires counting the different bases at each position. Therefore, -doCounts 1 must be used.&lt;br /&gt;
&lt;br /&gt;
;-doMajorMinor [int]&lt;br /&gt;
The covariance matrix can only be calculated for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see [[Inferring_Major_and_Minor_alleles]]). This option can also be used if you only want to make this assumption for the IBS matrix, or only want to print out bases that are either the major or the minor allele. &lt;br /&gt;
&lt;br /&gt;
;-minMinor [int]&lt;br /&gt;
Minimum number of observed minor alleles. The default is 0. If you do not use -doMajorMinor, the number of minor alleles is the sum of the counts of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-minFreq [float]	&lt;br /&gt;
Minimum minor allele frequency based on the sampled bases. The default is 0. If you do not use -doMajorMinor, the frequency is the sum of the frequencies of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-output01 [int]	&lt;br /&gt;
Output the sampled reads as 1 (major allele), 0 (non-major) and -1 (missing) instead of the actual bases.&lt;br /&gt;
&lt;br /&gt;
;-maxMis [int]&lt;br /&gt;
Maximum number of missing bases per site, i.e. the maximum number of allowed non-major/minor sampled bases.&lt;br /&gt;
&lt;br /&gt;
;-makeMatrix [int]&lt;br /&gt;
1 prints out the pairwise IBS matrix. This is the average distance between pairs of individuals. The distance at a site is 0 if the bases are the same and 1 otherwise. You can use this for MDS (see below).&lt;br /&gt;
&lt;br /&gt;
;-doCov [int]		&lt;br /&gt;
1 prints out the covariance matrix, which can be used for PCA (see below). You should use the -minFreq option to avoid sites with low allele frequency.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== run example ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam all.files -minMapQ 30 -minQ 20 -GL 2  -doMajorMinor 1 -doMaf 1 -SNP_pval 2e-6 -doIBS 1 -doCounts 1 -doCov 1 -makeMatrix 1 -minMaf 0.05 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will produce the output described below, which includes the pairwise differences (.ibsMat) and the covariance matrix (.covMat). These can be used for MDS and PCA, respectively (see the R example below). Note that only the PCA method requires SNP calling and allele frequency estimation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Output==&lt;br /&gt;
&lt;br /&gt;
=== sampled bases *ibs.gz ===&lt;br /&gt;
This function prints the sampled bases to *.ibs.gz. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       14000873        A       G       0       1       1       1       1       1       1&lt;br /&gt;
1       14001018        C       T       0       1       1       1       1       1       1&lt;br /&gt;
1       14001867        G       A       0       1       1       1       1       0       1&lt;br /&gt;
1       14002342        T       C       1       1       1       1       1       -1      1&lt;br /&gt;
1       14002422        T       A       0       1       1       1       1       0       -1&lt;br /&gt;
1       14003581        T       C       0       1       1       1       1       1       1&lt;br /&gt;
1       14004623        C       T       0       1       1       1       1       0       1&lt;br /&gt;
1       14006543        T       G       0       -1      1       1       1       0       1&lt;br /&gt;
1       14007493        G       A       0       0       1       -1      1       0       1&lt;br /&gt;
1       14007558        T       C       0       0       1       1       -1      -1      1&lt;br /&gt;
1       14007649        A       G       0       1       1       1       1       0       1&lt;br /&gt;
1       14008269        A       G       1       1       0       -1      1       -1      1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       13116   G       T       N       G       T       T       N       G       N       T&lt;br /&gt;
1       13118   G       A       N       G       A       A       N       G       N       A&lt;br /&gt;
1       14930   A       G       G       G       G       A       N       N       A       N&lt;br /&gt;
1       15211   T       G       N       G       T       G       N       N       N       G&lt;br /&gt;
1       54490   A       G       N       G       N       G       N       N       N       N&lt;br /&gt;
1       54716   T       C       T       C       C       C       T       N       N       N&lt;br /&gt;
1       58814   A       G       N       G       N       G       G       G       N       N&lt;br /&gt;
1       62777   T       A       N       N       A       N       A       A       A       N&lt;br /&gt;
1       63268   C       T       N       T       N       T       C       N       T       N&lt;br /&gt;
1       63671   A       G       N       G       N       N       G       G       G       N&lt;br /&gt;
1       69428   G       T       N       G       T       N       N       T       T       N&lt;br /&gt;
1       69761   T       A       A       A       T       A       N       A       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor 0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7    ind8&lt;br /&gt;
1       13116   T       N       G       T       T       N       G       N       T       T&lt;br /&gt;
1       13118   A       N       G       A       A       N       G       N       A       A&lt;br /&gt;
1       14930   A       G       G       G       A       N       N       A       N       G&lt;br /&gt;
1       15211   G       N       G       T       G       N       N       N       G       G&lt;br /&gt;
1       54490   G       N       G       N       G       N       N       N       N       A&lt;br /&gt;
1       54716   C       T       C       C       C       T       N       N       N       C&lt;br /&gt;
1       58814   G       N       G       N       G       G       G       N       N       G&lt;br /&gt;
1       62777   A       N       N       A       N       A       A       A       N       A&lt;br /&gt;
1       63268   T       N       T       N       T       C       N       T       N       N&lt;br /&gt;
1       63336   C       C       C       C       C       C       N       C       N       N&lt;br /&gt;
1       63671   G       N       G       N       N       G       G       G       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''chr''' is the chromosome&lt;br /&gt;
&lt;br /&gt;
'''pos''' is the position&lt;br /&gt;
&lt;br /&gt;
'''major''' is the major allele&lt;br /&gt;
&lt;br /&gt;
'''minor''' is the minor allele. Needs -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
'''indX''' is the sampled base for individual number X. If -output01 1 is used, it is 1 for major, 0 for non-major and -1 for missing.&lt;br /&gt;
&lt;br /&gt;
=== sample based IBS matrix *.ibsMat ===&lt;br /&gt;
This function will print the pairwise IBS distance &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibsMat with -makeMatrix 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
0.000000        0.510638        0.606383        0.595745        0.545455        0.428571&lt;br /&gt;
0.510638        0.000000        0.154639        0.154639        0.108911        0.408602&lt;br /&gt;
0.606383        0.154639        0.000000        0.121212        0.137255        0.489362&lt;br /&gt;
0.595745        0.154639        0.121212        0.000000        0.106796        0.484211&lt;br /&gt;
0.545455        0.108911        0.137255        0.106796        0.000000        0.404040&lt;br /&gt;
0.428571        0.408602        0.489362        0.484211        0.404040        0.000000&lt;br /&gt;
0.577320        0.121212        0.181818        0.171717        0.097087        0.473684&lt;br /&gt;
0.536082        0.090000        0.138614        0.118812        0.047619        0.428571&lt;br /&gt;
0.262500        0.571429        0.702381        0.694118        0.632184        0.353659&lt;br /&gt;
0.458333        0.383838        0.484848        0.494949        0.398058        0.368421&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind matrix with pairwise IBS distance&lt;br /&gt;
&lt;br /&gt;
=== sample based covariance matrix *.covMat ===&lt;br /&gt;
This function will print the covariance matrix based on a single sampled read&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.covMat with -doCov 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
1.098251        -0.026225       -0.005617       -0.014726       -0.022438       -0.021786&lt;br /&gt;
-0.026225       1.115986        -0.017167       0.000735        -0.017163       -0.016899&lt;br /&gt;
-0.005617       -0.017167       1.074779        -0.015685       -0.019819       -0.015473&lt;br /&gt;
-0.014726       0.000735        -0.015685       1.072853        -0.013641       -0.007789&lt;br /&gt;
-0.022438       -0.017163       -0.019819       -0.013641       1.094612        -0.016045&lt;br /&gt;
-0.021786       -0.016899       -0.015473       -0.007789       -0.016045       1.059264&lt;br /&gt;
-0.005831       -0.009854       -0.001269       -0.002362       -0.018479       -0.011942&lt;br /&gt;
-0.015399       -0.020010       -0.001296       -0.022947       -0.006515       -0.003938&lt;br /&gt;
-0.001730       -0.040534       -0.002295       -0.017442       -0.024194       -0.007469&lt;br /&gt;
-0.016094       -0.015303       -0.018302       -0.022502       -0.030503       -0.001208&lt;br /&gt;
-0.122045       -0.106068       -0.103089       -0.104443       -0.110237       -0.103610&lt;br /&gt;
-0.106553       -0.100202       -0.104754       -0.109399       -0.107645       -0.111665&lt;br /&gt;
-0.108945       -0.102440       -0.105292       -0.101372       -0.107110       -0.106639&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind covariance matrix&lt;br /&gt;
&lt;br /&gt;
==Model==&lt;br /&gt;
&lt;br /&gt;
=== IBS ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pairwise distance between individuals i and j:&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{\sum_{m=1}^{M} \left(1-I_{b_j}(b_i)\right)}{M}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where M is the number of sites with a read for both individuals and &amp;lt;math&amp;gt; I_{b_j}(b_i) &amp;lt;/math&amp;gt; is the indicator function, which is equal to one when the two individuals i and j have the same base at site m and zero otherwise.&lt;br /&gt;
&lt;br /&gt;
=== Covariance ===&lt;br /&gt;
&lt;br /&gt;
Allele frequency based on single reads. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f_{m} = \frac{N_{minor}}{N_{major} + N_{minor}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
cov(ij) = \frac{1}{M}\sum_m^M \frac{ (h^i_m-f_m)(h^j_m-f_m) }{f_m(1-f_m)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the number of sites with a read for both individuals, and &amp;lt;math&amp;gt; h^i_m&amp;lt;/math&amp;gt; is 1 if individual i has the major allele at site m and zero otherwise.&lt;br /&gt;
&lt;br /&gt;
=MDS/PCA using R=&lt;br /&gt;
&lt;br /&gt;
[[File:PCA_MDS.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## MDS&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
mds &amp;lt;- cmdscale(as.dist(m))&lt;br /&gt;
plot(mds,lwd=2,ylab=&amp;quot;Dist&amp;quot;,xlab=&amp;quot;Dist&amp;quot;,main=&amp;quot;multidimensional scaling&amp;quot;,col=rep(1:3,each=10))&lt;br /&gt;
&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.covMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
e &amp;lt;- eigen(m)&lt;br /&gt;
plot(e$vectors[,1:2],lwd=2,ylab=&amp;quot;PC 2&amp;quot;,xlab=&amp;quot;PC 1&amp;quot;,main=&amp;quot;Principal components&amp;quot;,col=rep(1:3,each=10),pch=16)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=other fun stuff=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## heatmap / clustering / trees&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot; # or covMat&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
#heat map&lt;br /&gt;
heatmap(m)&lt;br /&gt;
#neighbour joining&lt;br /&gt;
plot(ape::nj(m))&lt;br /&gt;
plot(hclust(dist(m), &amp;quot;ave&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3051</id>
		<title>PCA MDS</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3051"/>
		<updated>2018-10-04T07:31:29Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* sampled bases *ibs.gz */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= single read sampling approach for PCA or MDS =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This function is available from version '''0.912''' and in the latest development version on [https://github.com/ANGSD/angsd github].&lt;br /&gt;
&lt;br /&gt;
The PCA / MDS methods should be run on called SNP sites (use [[PCA]] if you do not want to call SNPs). SNPs can be called based on genotype likelihoods (see [[SNP_calling]]), or you can supply the variable sites you want to analyse using the [[Sites|-sites]] option. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doIBS&lt;br /&gt;
	-&amp;gt; angsd version: 0.911-26-gf1cb0e0-dirty (htslib: 1.3-1-gc72ae90) build(Apr 27 2016 11:15:33)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -doIBS 	-&amp;gt; Wed Apr 27 12:38:35 2016&lt;br /&gt;
--------------&lt;br /&gt;
abcIBS.cpp:&lt;br /&gt;
	-doIBS	0&lt;br /&gt;
	(Sampling strategies)&lt;br /&gt;
	 0:	 no IBS &lt;br /&gt;
	 1:	 (Sample single base)&lt;br /&gt;
	 2:	 (Concensus base)&lt;br /&gt;
	-doCounts	0	Must choose -doCount 1&lt;br /&gt;
Optional&lt;br /&gt;
	-minMinor	0	Minimum observed minor alleles&lt;br /&gt;
	-minFreq	0.000	Minimum minor allele frequency&lt;br /&gt;
	-output01	0	output 0 and 1s instead of based&lt;br /&gt;
	-maxMis		-1	Maximum missing bases (per site)&lt;br /&gt;
	-doMajorMinor	0	use input files or data to select major and minor alleles&lt;br /&gt;
	-makeMatrix	0	print out the ibs matrix &lt;br /&gt;
	-doCov		0	print out the cov matrix &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
;-doIBS [int] &lt;br /&gt;
Print a single base from each individual at each position. 1: randomly sampled read. 2: consensus base.&lt;br /&gt;
&lt;br /&gt;
;-doCounts [int]&lt;br /&gt;
The method requires counting the different bases at each position. Therefore, -doCounts 1 must be used.&lt;br /&gt;
&lt;br /&gt;
;-doMajorMinor [int]&lt;br /&gt;
The covariance matrix can only be calculated for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see [[Inferring_Major_and_Minor_alleles]]). This option can also be used if you only want to make this assumption for the IBS matrix or only want to print out bases that are either the major or the minor allele. &lt;br /&gt;
&lt;br /&gt;
;-minMinor [int]&lt;br /&gt;
Minimum number of observed minor alleles. The default is 0. If you do not use -doMajorMinor, the number of minor alleles is the sum of the counts of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-minFreq [float]	&lt;br /&gt;
Minimum minor allele frequency based on the sampled bases. The default is 0. If you do not use -doMajorMinor, the frequency is the sum of the frequencies of the 3 least common alleles. &lt;br /&gt;
&lt;br /&gt;
;-output01 [int]	&lt;br /&gt;
Output the sampled reads as 0 (for major) and 1 (for non-major) instead of the actual base.&lt;br /&gt;
&lt;br /&gt;
;-maxMis [int]&lt;br /&gt;
Maximum missing bases (per site), i.e. the maximum number of allowed non-major/minor sampled bases.&lt;br /&gt;
&lt;br /&gt;
;-makeMatrix [int]&lt;br /&gt;
1 prints out the pairwise IBS matrix. This is the average distance between pairs of individuals. The distance is zero if the base is the same and 1 otherwise. You can use this for MDS (see below).&lt;br /&gt;
&lt;br /&gt;
;-doCov [int]		&lt;br /&gt;
1 prints out the covariance matrix, which can be used for PCA (see below). You should use the -minFreq option to avoid sites with low allele frequency.&lt;br /&gt;
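&lt;br /&gt;
As a rough illustration of the -minMinor and -minFreq descriptions above, the Python sketch below counts the minor alleles at one site when no major/minor allele has been chosen, i.e. as everything except the most common base. The function name minor_count_and_freq and the input format (one sampled base per individual, N for missing) are assumptions made for this example and are not part of ANGSD.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter

def minor_count_and_freq(bases):
    """Return (minor count, minor frequency) for one site.

    bases: one sampled base per individual, 'N' meaning missing.
    Without a major/minor call, every base except the most common
    one counts as minor (the 3 least common of A, C, G, T).
    """
    counts = Counter(b for b in bases if b in 'ACGT')
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    minor = total - max(counts.values())
    return minor, minor / total

# hypothetical site with six individuals, one of them missing
print(minor_count_and_freq(['G', 'G', 'T', 'N', 'G', 'A']))
```
&lt;br /&gt;
A site would then be kept if the count is at least -minMinor and the frequency is at least -minFreq.&lt;br /&gt;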
&lt;br /&gt;
&lt;br /&gt;
=== run example ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam all.files -minMapQ 30 -minQ 20 -GL 2  -doMajorMinor 1 -doMaf 1 -SNP_pval 2e-6 -doIBS 1 -doCounts 1 -doCov 1 -makeMatrix 1 -minMaf 0.05 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will produce the output (see below), which includes the pairwise differences (.ibsMat) and the covariance matrix (.covMat). These can be used for MDS and PCA respectively (see R example below). Note that only the PCA method requires SNP calling and allele frequency estimation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Output==&lt;br /&gt;
&lt;br /&gt;
=== sampled bases *ibs.gz ===&lt;br /&gt;
This function prints the sampled bases to *.ibs.gz. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       14000873        A       G       0       1       1       1       1       1       1&lt;br /&gt;
1       14001018        C       T       0       1       1       1       1       1       1&lt;br /&gt;
1       14001867        G       A       0       1       1       1       1       0       1&lt;br /&gt;
1       14002342        T       C       1       1       1       1       1       -1      1&lt;br /&gt;
1       14002422        T       A       0       1       1       1       1       0       -1&lt;br /&gt;
1       14003581        T       C       0       1       1       1       1       1       1&lt;br /&gt;
1       14004623        C       T       0       1       1       1       1       0       1&lt;br /&gt;
1       14006543        T       G       0       -1      1       1       1       0       1&lt;br /&gt;
1       14007493        G       A       0       0       1       -1      1       0       1&lt;br /&gt;
1       14007558        T       C       0       0       1       1       -1      -1      1&lt;br /&gt;
1       14007649        A       G       0       1       1       1       1       0       1&lt;br /&gt;
1       14008269        A       G       1       1       0       -1      1       -1      1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       13116   G       T       N       G       T       T       N       G       N       T&lt;br /&gt;
1       13118   G       A       N       G       A       A       N       G       N       A&lt;br /&gt;
1       14930   A       G       G       G       G       A       N       N       A       N&lt;br /&gt;
1       15211   T       G       N       G       T       G       N       N       N       G&lt;br /&gt;
1       54490   A       G       N       G       N       G       N       N       N       N&lt;br /&gt;
1       54716   T       C       T       C       C       C       T       N       N       N&lt;br /&gt;
1       58814   A       G       N       G       N       G       G       G       N       N&lt;br /&gt;
1       62777   T       A       N       N       A       N       A       A       A       N&lt;br /&gt;
1       63268   C       T       N       T       N       T       C       N       T       N&lt;br /&gt;
1       63671   A       G       N       G       N       N       G       G       G       N&lt;br /&gt;
1       69428   G       T       N       G       T       N       N       T       T       N&lt;br /&gt;
1       69761   T       A       A       A       T       A       N       A       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor 0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7    ind8&lt;br /&gt;
1       13116   T       N       G       T       T       N       G       N       T       T&lt;br /&gt;
1       13118   A       N       G       A       A       N       G       N       A       A&lt;br /&gt;
1       14930   A       G       G       G       A       N       N       A       N       G&lt;br /&gt;
1       15211   G       N       G       T       G       N       N       N       G       G&lt;br /&gt;
1       54490   G       N       G       N       G       N       N       N       N       A&lt;br /&gt;
1       54716   C       T       C       C       C       T       N       N       N       C&lt;br /&gt;
1       58814   G       N       G       N       G       G       G       N       N       G&lt;br /&gt;
1       62777   A       N       N       A       N       A       A       A       N       A&lt;br /&gt;
1       63268   T       N       T       N       T       C       N       T       N       N&lt;br /&gt;
1       63336   C       C       C       C       C       C       N       C       N       N&lt;br /&gt;
1       63671   G       N       G       N       N       G       G       G       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''chr''' is the chromosome&lt;br /&gt;
&lt;br /&gt;
'''pos''' is the position&lt;br /&gt;
&lt;br /&gt;
'''major''' is the major allele&lt;br /&gt;
&lt;br /&gt;
'''minor''' is the minor allele. Needs -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
'''indX''' is the sampled base for individual number X. If -output01 1 is used then it is 1 for major, 0 for non-major and -1 for missing.&lt;br /&gt;
&lt;br /&gt;
=== sample based IBS matrix *.ibsMat ===&lt;br /&gt;
This function will print the pairwise IBS distance &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibsMat with -makeMatrix 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
0.000000        0.510638        0.606383        0.595745        0.545455        0.428571&lt;br /&gt;
0.510638        0.000000        0.154639        0.154639        0.108911        0.408602&lt;br /&gt;
0.606383        0.154639        0.000000        0.121212        0.137255        0.489362&lt;br /&gt;
0.595745        0.154639        0.121212        0.000000        0.106796        0.484211&lt;br /&gt;
0.545455        0.108911        0.137255        0.106796        0.000000        0.404040&lt;br /&gt;
0.428571        0.408602        0.489362        0.484211        0.404040        0.000000&lt;br /&gt;
0.577320        0.121212        0.181818        0.171717        0.097087        0.473684&lt;br /&gt;
0.536082        0.090000        0.138614        0.118812        0.047619        0.428571&lt;br /&gt;
0.262500        0.571429        0.702381        0.694118        0.632184        0.353659&lt;br /&gt;
0.458333        0.383838        0.484848        0.494949        0.398058        0.368421&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind matrix with pairwise IBS distance&lt;br /&gt;
&lt;br /&gt;
=== sample based covariance matrix *.covMat ===&lt;br /&gt;
This function will print the covariance matrix based on a single sampled read&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.covMat with -doCov 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
1.098251        -0.026225       -0.005617       -0.014726       -0.022438       -0.021786&lt;br /&gt;
-0.026225       1.115986        -0.017167       0.000735        -0.017163       -0.016899&lt;br /&gt;
-0.005617       -0.017167       1.074779        -0.015685       -0.019819       -0.015473&lt;br /&gt;
-0.014726       0.000735        -0.015685       1.072853        -0.013641       -0.007789&lt;br /&gt;
-0.022438       -0.017163       -0.019819       -0.013641       1.094612        -0.016045&lt;br /&gt;
-0.021786       -0.016899       -0.015473       -0.007789       -0.016045       1.059264&lt;br /&gt;
-0.005831       -0.009854       -0.001269       -0.002362       -0.018479       -0.011942&lt;br /&gt;
-0.015399       -0.020010       -0.001296       -0.022947       -0.006515       -0.003938&lt;br /&gt;
-0.001730       -0.040534       -0.002295       -0.017442       -0.024194       -0.007469&lt;br /&gt;
-0.016094       -0.015303       -0.018302       -0.022502       -0.030503       -0.001208&lt;br /&gt;
-0.122045       -0.106068       -0.103089       -0.104443       -0.110237       -0.103610&lt;br /&gt;
-0.106553       -0.100202       -0.104754       -0.109399       -0.107645       -0.111665&lt;br /&gt;
-0.108945       -0.102440       -0.105292       -0.101372       -0.107110       -0.106639&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind covariance matrix&lt;br /&gt;
&lt;br /&gt;
==Model==&lt;br /&gt;
&lt;br /&gt;
=== IBS ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
pairwise distance between individuals&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{\sum_m^M 1-I_{b_j}(b_i)}{M}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; I_{b_j}(b_i) &amp;lt;/math&amp;gt; is the indicator function, which is equal to one when the two individuals i and j have the same base and zero otherwise, so that &amp;lt;math&amp;gt; 1-I_{b_j}(b_i) &amp;lt;/math&amp;gt; is one when the bases differ.&lt;br /&gt;
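&lt;br /&gt;
The distance can be sketched in a few lines of Python. This is only an illustration of the formula, not the ANGSD implementation; the function name ibs_distance and the use of N for a missing base are assumptions taken from the *.ibs.gz examples on this page.&lt;br /&gt;
&lt;br /&gt;
```python
def ibs_distance(bases_i, bases_j, missing='N'):
    """Average mismatch between two individuals over shared sites."""
    shared = 0      # M: sites where both individuals have a base
    mismatches = 0  # running sum of 1 - I(b_i, b_j)
    for b_i, b_j in zip(bases_i, bases_j):
        if b_i == missing or b_j == missing:
            continue
        shared += 1
        if b_i != b_j:
            mismatches += 1
    return mismatches / shared if shared else float('nan')

# ind1 and ind2 at the first four sites of the
# -doMajorMinor>0 -output01 0 example output
ind1 = ['G', 'G', 'G', 'G']
ind2 = ['T', 'A', 'G', 'T']
print(ibs_distance(ind1, ind2))
```
&lt;br /&gt;
Sites where either individual is missing do not count towards M, so each pair of individuals can have a different M.&lt;br /&gt;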
&lt;br /&gt;
=== Covariance ===&lt;br /&gt;
&lt;br /&gt;
Allele frequency based on single reads. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f_{m} = \frac{N_{minor}}{N_{major} + N_{minor}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
cov(ij) = \frac{1}{M}\sum_m^M \frac{ (h^i_m-f_m)(h^j_m-f_m) }{f_m(1-f_m)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; h^i_m&amp;lt;/math&amp;gt; is 1 if individual i has the major allele at site m and zero otherwise.&lt;br /&gt;
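&lt;br /&gt;
A minimal Python sketch of the covariance formula for one pair of individuals follows. It is not the ANGSD code. It assumes that the indicator h and the frequency f_m are coded for the same allele, that missing reads are given as None, and that sites with f_m equal to 0 or 1 have already been removed (for example with -minFreq); the function name covariance is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def covariance(h_i, h_j, freqs):
    """Covariance between two individuals from single sampled reads.

    h_i, h_j: per-site indicators (1, 0, or None when missing).
    freqs: per-site allele frequency f_m, strictly between 0 and 1.
    """
    total = 0.0
    shared = 0  # M: sites with a read for both individuals
    for a, b, f in zip(h_i, h_j, freqs):
        if a is None or b is None:
            continue
        total += (a - f) * (b - f) / (f * (1 - f))
        shared += 1
    return total / shared if shared else float('nan')

# hypothetical data: four sites, the first individual is missing at the last
print(covariance([1, 0, 1, None], [1, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))
```
&lt;br /&gt;
The division by f_m(1-f_m) gives rare alleles a large weight, which is one reason to filter on allele frequency before computing the matrix.&lt;br /&gt;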
&lt;br /&gt;
=MDS/PCA using R=&lt;br /&gt;
&lt;br /&gt;
[[File:PCA_MDS.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## MDS on the pairwise IBS distances&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
mds &amp;lt;- cmdscale(as.dist(m))&lt;br /&gt;
## col=rep(1:3,each=10) colours 3 groups of 10 individuals; adjust to your data&lt;br /&gt;
plot(mds,lwd=2,ylab=&amp;quot;Dist&amp;quot;,xlab=&amp;quot;Dist&amp;quot;,main=&amp;quot;multidimensional scaling&amp;quot;,col=rep(1:3,each=10))&lt;br /&gt;
&lt;br /&gt;
## PCA on the covariance matrix&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.covMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
e &amp;lt;- eigen(m)&lt;br /&gt;
plot(e$vectors[,1:2],lwd=2,ylab=&amp;quot;PC 2&amp;quot;,xlab=&amp;quot;PC 1&amp;quot;,main=&amp;quot;Principal components&amp;quot;,col=rep(1:3,each=10),pch=16)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=other fun stuff=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## heatmap / clustering / trees&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot; # or covMat&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
#heat map&lt;br /&gt;
heatmap(m)&lt;br /&gt;
#neighbour joining&lt;br /&gt;
plot(ape::nj(m))&lt;br /&gt;
plot(hclust(dist(m), &amp;quot;ave&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Genotype_Likelihoods&amp;diff=3043</id>
		<title>Genotype Likelihoods</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Genotype_Likelihoods&amp;diff=3043"/>
		<updated>2018-05-18T12:14:11Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* SYK (Kim et al.) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Many methods in ANGSD are based on genotype likelihoods, and ANGSD has 4 different genotype likelihood models implemented.&lt;br /&gt;
&lt;br /&gt;
Genotype likelihoods and the four models are described in the [[#Theory | Theory]] section at the bottom.&lt;br /&gt;
&lt;br /&gt;
The SOAPsnp model requires that a reference is supplied. Preferably the recalibration should only be performed on non-variable sites, so we recommend modifying the reference fasta such that all SNP sites have an 'N'.&lt;br /&gt;
&lt;br /&gt;
We also allow for output of the calculated genotype likelihoods in various formats that might be handy for some users.&lt;br /&gt;
&lt;br /&gt;
;NB the GATK model described and implemented in this program is the one described in the first GATK paper. It might be drastically different from the one used in newer versions of GATK.&lt;br /&gt;
__TOC__&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL &lt;br /&gt;
	-&amp;gt; angsd version: 0.567	 build(Dec  7 2013 14:56:25)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
---------------------&lt;br /&gt;
analysisEstLikes.cpp:&lt;br /&gt;
	-GL=0: &lt;br /&gt;
	1: SAMtools&lt;br /&gt;
	2: GATK&lt;br /&gt;
	3: SOAPsnp&lt;br /&gt;
	4: SYK&lt;br /&gt;
	-trim		0		(zero means no trimming)&lt;br /&gt;
	-tmpdir		angsd_tmpdir/	(used by SOAPsnp)&lt;br /&gt;
	-errors		(null)		(used by SYK)&lt;br /&gt;
	-minInd		0		(0 indicates no filtering)&lt;br /&gt;
&lt;br /&gt;
Filedumping:&lt;br /&gt;
	-doGlf	0&lt;br /&gt;
	1: binary glf (10 log likes)	.glf.gz&lt;br /&gt;
	2: beagle likelihood file	.beagle.gz&lt;br /&gt;
	3: binary 3 times likelihood	.glf.gz&lt;br /&gt;
	4: text version (10 log likes)	.glf.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=Options=&lt;br /&gt;
; -GL [int]&lt;br /&gt;
If your input is a sequencing file you can estimate genotype likelihoods from the mapped reads. Four different methods are available. &lt;br /&gt;
# SAMtools model&lt;br /&gt;
# GATK model&lt;br /&gt;
# SOAPsnp model&lt;br /&gt;
# SYK model&lt;br /&gt;
&lt;br /&gt;
; NB&lt;br /&gt;
When estimating GL with SOAPsnp we need to generate a calibration matrix. This is done automatically if the calibration files do not exist. They are located in angsd_tmpdir/basenameNUM.count and angsd_tmpdir/basenameNUM.qual. The read length is not allowed to exceed 256 base pairs.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;-trim [int]&lt;br /&gt;
This discards [int] bases at both ends of the reads when calculating the genotype likelihoods. &lt;br /&gt;
&lt;br /&gt;
;-tmpdir [directoryPath]&lt;br /&gt;
The default is '''angsd_tmpdir'''. SOAPsnp generates a mismatch matrix for each BAM file, and based on this mismatch matrix it calculates the type specific errors for each position in the read. It therefore generates two files for each BAM file. To avoid cluttering up the working directory you can specify a folder that should be used. SOAPsnp assumes that all reads have the same length; if this is not the case this model might not be suited (also true for other recalibration tools).&lt;br /&gt;
&lt;br /&gt;
;-errors [fileName]&lt;br /&gt;
SYK model requires a file containing the type specific errors, as estimated from [[Error estimation | -doError 1]].&lt;br /&gt;
&lt;br /&gt;
;-minInd [int]&lt;br /&gt;
Discard the sites where we don't have data from at least '''-minInd''' individuals. If you have 100 individuals, and you only want to base your downstream analysis on the sites where you have data for at least half your samples, then set '''-minInd 50'''.&lt;br /&gt;
&lt;br /&gt;
==Filtering==&lt;br /&gt;
See  [[Input#BAM_files]] for BAM specific filters.&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
The SAMtools and GATK likelihoods are chosen simply with&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 #SAMtools&lt;br /&gt;
./angsd -GL 2 #GATK&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
SOAPsnp and SYK require some extra arguments as shown below.&lt;br /&gt;
==SOAPsnp==&lt;br /&gt;
First run through the BAM files once to generate the calibration matrix&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 3 -out outfile -ref hg19.fa -minQ 0&lt;br /&gt;
#NB important to set -minQ to zero, ANGSD defaults to minQ 13&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This first run doesn't estimate anything other than the calibration matrix.&lt;br /&gt;
&lt;br /&gt;
After this run we can estimate the genotype likelihoods and perform any further analysis we desire. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 3 -out outfile -doGlf 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==SYK (Kim et al.)==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 4 -out outfile -errors error.file -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This model is based on counts of bases and therefore needs [[Alleles_counts]] &amp;quot;-doCounts 1&amp;quot;. The error file is one line of 16 values as output by -doError.&lt;br /&gt;
&lt;br /&gt;
=Output genotype likelihoods=&lt;br /&gt;
; -doGlf [int]&lt;br /&gt;
Output the log genotype likelihoods to a file&lt;br /&gt;
&lt;br /&gt;
;0. don't output the genotype likelihoods (default)&lt;br /&gt;
&lt;br /&gt;
;1. binary output of all 10 log genotype likelihoods &lt;br /&gt;
&lt;br /&gt;
;2. beagle genotype likelihood format (use directly for imputation)&lt;br /&gt;
&lt;br /&gt;
;3. beagle binary&lt;br /&gt;
&lt;br /&gt;
;4. text output of all 10 log genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Binary==&lt;br /&gt;
Glf file in binary doubles. All 10 genotype likelihoods are printed to a file. For each printed site there are 10*N doubles where N is the number of individuals. The order of the 10 genotypes is alphabetical: AA AC AG AT CC CG CT GG GT TT. These are log scaled likelihood ratios to the most likely genotype.&lt;br /&gt;
&lt;br /&gt;
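A minimal example for parsing these files in '''c/c++'''. It assumes an uncompressed copy of the binary glf file and 5 individuals.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;
int main(void){&lt;br /&gt;
  enum { nInd = 5 };   /* number of individuals */&lt;br /&gt;
  double gls[nInd*10]; /* 10 values per individual for one site */&lt;br /&gt;
  FILE *fp = fopen(&amp;quot;genotypelikelihood.bin&amp;quot;,&amp;quot;rb&amp;quot;);&lt;br /&gt;
  if(fp == NULL) return 1;&lt;br /&gt;
  size_t nRead = fread(gls,sizeof(double),nInd*10,fp); /* read one site */&lt;br /&gt;
  fclose(fp);&lt;br /&gt;
  return nRead == nInd*10 ? 0 : 1;&lt;br /&gt;
}&lt;br /&gt;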
&amp;lt;/pre&amp;gt;&lt;br /&gt;
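&lt;br /&gt;
The same record layout can also be read from Python with the standard struct module. The sketch below is an illustration only: the function name read_sites is hypothetical, and it assumes 8-byte doubles in native byte order and a stream that delivers the uncompressed bytes. The demo uses made-up values held in memory instead of a real ANGSD file.&lt;br /&gt;
&lt;br /&gt;
```python
import io
import struct

def read_sites(stream, n_ind):
    """Yield one list per site holding n_ind lists of 10 doubles."""
    n_values = 10 * n_ind
    record_size = 8 * n_values  # assumes 8-byte doubles, native byte order
    while True:
        chunk = stream.read(record_size)
        if len(chunk) != record_size:
            break  # end of file, or a truncated final record
        values = struct.unpack('=%dd' % n_values, chunk)
        yield [list(values[10 * k:10 * k + 10]) for k in range(n_ind)]

# In-memory stand-in for a decompressed file: 1 site, 2 individuals.
demo = struct.pack('=20d', *[float(v) for v in range(20)])
for site in read_sites(io.BytesIO(demo), n_ind=2):
    print(site[1][:3])  # first three values of the second individual
```
&lt;br /&gt;
For a real file the stream could be opened with gzip.open(name, 'rb'), assuming the .glf.gz output can be read by Python's gzip module.&lt;br /&gt;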
&lt;br /&gt;
==Beagle format==&lt;br /&gt;
Beagle haplotype imputation can be performed directly on genotype likelihoods. To generate a beagle input file use&lt;br /&gt;
&lt;br /&gt;
; -doGlf 2&lt;br /&gt;
&lt;br /&gt;
In order to make this file the major and minor allele have to be inferred [[Inferring Major and Minor alleles | -doMajorMinor]]. It is also a good idea to only use the polymorphic sites.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example===&lt;br /&gt;
In this example our input files are BAM files. We use the SAMtools genotype likelihood method and 10 threads. We infer the major and minor allele from the likelihoods and estimate the allele frequencies. We test for polymorphic sites and only output the ones with a likelihood ratio test p-value below 2e-6. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 2e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Output===&lt;br /&gt;
The above command generates the file genolike.beagle.gz that can be used as input for the beagle software&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker  allele1 allele2 Ind0    Ind0    Ind0    Ind1    Ind1    Ind1    Ind2    Ind2    Ind2    Ind3    Ind3    Ind3 &lt;br /&gt;
1_14000023      1       0       0.941177        0.058822        0.000001        0.799685        0.199918        0.000397        0.666316        0.333155        0.000529 &lt;br /&gt;
1_14000072      2       3       0.709983        0.177493        0.112525        0.941178        0.058822        0.000000        0.665554        0.332774        0.001672&lt;br /&gt;
1_14000113      0       2       0.855993        0.106996        0.037010        0.333333        0.333333        0.333333        0.799971        0.199989        0.000040 &lt;br /&gt;
1_14000202      2       0       0.835380        0.104420        0.060201        0.799685        0.199918        0.000397        0.333333        0.333333        0.333333&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that the above values sum to one per site for each individual. This is just a normalization of the genotype likelihoods in order to avoid underflow problems in the beagle software; it does not mean that they are genotype probabilities.&lt;br /&gt;
&lt;br /&gt;
; column 1 (marker)&lt;br /&gt;
the chromosome and position&lt;br /&gt;
; column 2 (allele 1)&lt;br /&gt;
the major allele coded as 0=A, 1=C, 2=G, 3=T&lt;br /&gt;
; column 3 (allele 2)&lt;br /&gt;
the minor allele coded as 0=A, 1=C, 2=G, 3=T&lt;br /&gt;
; column 4 (Ind0)&lt;br /&gt;
Genotype likelihood for the major/major genotype for the first individual&lt;br /&gt;
; column 5 (Ind0)&lt;br /&gt;
Genotype likelihood for the major/minor genotype for the first individual&lt;br /&gt;
; column 6 (Ind0)&lt;br /&gt;
Genotype likelihood for the minor/minor genotype for the first individual&lt;br /&gt;
; column 7 (Ind1)&lt;br /&gt;
Genotype likelihood for the major/major genotype for the second individual&lt;br /&gt;
...&lt;br /&gt;
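&lt;br /&gt;
The column layout described above can be unpacked with a short Python sketch. The function name parse_beagle_line is hypothetical, and splitting the marker at its last underscore is an assumption so that chromosome names containing underscores would still work. The input line is the first data row of the example output with only the first two individuals.&lt;br /&gt;
&lt;br /&gt;
```python
BASES = 'ACGT'  # allele codes 0, 1, 2, 3

def parse_beagle_line(line):
    """Split one data line of a beagle likelihood file into its parts."""
    fields = line.split()
    chrom, pos = fields[0].rsplit('_', 1)
    major = BASES[int(fields[1])]
    minor = BASES[int(fields[2])]
    values = [float(v) for v in fields[3:]]
    # three values per individual: major/major, major/minor, minor/minor
    per_ind = [values[k:k + 3] for k in range(0, len(values), 3)]
    return chrom, int(pos), major, minor, per_ind

line = '1_14000023 1 0 0.941177 0.058822 0.000001 0.799685 0.199918 0.000397'
print(parse_beagle_line(line))
```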
&lt;br /&gt;
==Simple Text Format==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -bam bam.filelist -doGlf 4 -nInd 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
We use SAMtools genotype likelihoods from the first sample ('''-nInd 1''') in the file list called '''bam.filelist'''.&lt;br /&gt;
&lt;br /&gt;
Generates '''angsdput.glf.gz''', which looks like:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 13999965 -2.072327 -0.693156 -2.072327 -2.072327 0.000000 -0.693156 -0.693156 -2.072327 -2.072327 -2.072327&lt;br /&gt;
1 13999966 -2.072327 -2.072327 -0.693156 -2.072327 -2.072327 -0.693156 -2.072327 0.000000 -0.693156 -2.072327&lt;br /&gt;
1 13999967 0.000000 -0.693156 -0.693156 -0.693156 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327&lt;br /&gt;
1 13999968 -2.072327 -2.072327 -0.693156 -2.072327 -2.072327 -0.693156 -2.072327 0.000000 -0.693156 -2.072327&lt;br /&gt;
1 13999969 0.000000 -0.693156 -0.693156 -0.693156 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327 -2.072327&lt;br /&gt;
1 13999970 -2.072327 -2.072327 -2.072327 -0.693156 -2.072327 -2.072327 -0.693156 -2.072327 -0.693156 0.000000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The first 2 columns are the chromosome and position, and the final 10 values are the genotype likelihoods in the usual ordering.&lt;br /&gt;
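&lt;br /&gt;
As an illustration of this layout, the Python sketch below reads one line of the text output for a single individual and reports the genotype with the highest value. The function name best_genotype is hypothetical; the input is the first line of the example output above.&lt;br /&gt;
&lt;br /&gt;
```python
GENOTYPES = ['AA', 'AC', 'AG', 'AT', 'CC', 'CG', 'CT', 'GG', 'GT', 'TT']

def best_genotype(line):
    """Return (chromosome, position, most likely genotype) for one line."""
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    logl = [float(v) for v in fields[2:12]]
    best = max(range(10), key=lambda k: logl[k])
    return chrom, pos, GENOTYPES[best]

line = ('1 13999965 -2.072327 -0.693156 -2.072327 -2.072327 0.000000 '
        '-0.693156 -0.693156 -2.072327 -2.072327 -2.072327')
print(best_genotype(line))
```
&lt;br /&gt;
With several individuals each line would hold 10 values per individual, so the slice would have to be repeated for each block of 10.&lt;br /&gt;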
=Which genotype likelihood model should I choose ?=&lt;br /&gt;
It depends on the data. As shown in this example [[Glcomparison]], there was a huge difference between '''-GL 1''' and '''-GL 2''' for older 1000genomes BAM files, but little difference for newer BAM files.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Theory=&lt;br /&gt;
Genotype likelihoods are in this context the likelihood of the data given a genotype. That is, we take all the information from our data for a specific position for a single individual, and we use this information to calculate the likelihood of each of the different genotypes. Since we assume diploid individuals it follows that we have 10 different genotypes.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: center; color: green;&amp;quot;&lt;br /&gt;
|0&lt;br /&gt;
|1&lt;br /&gt;
|2&lt;br /&gt;
|3&lt;br /&gt;
|4&lt;br /&gt;
|5&lt;br /&gt;
|6&lt;br /&gt;
|7&lt;br /&gt;
|8&lt;br /&gt;
|9&lt;br /&gt;
|-&lt;br /&gt;
|AA&lt;br /&gt;
|AC&lt;br /&gt;
|AG&lt;br /&gt;
|AT&lt;br /&gt;
|CC&lt;br /&gt;
|CG&lt;br /&gt;
|CT&lt;br /&gt;
|GG&lt;br /&gt;
|GT&lt;br /&gt;
|TT&lt;br /&gt;
|}&lt;br /&gt;
And we write the genotype likelihood as&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
L(G=\{A_1 ,A_2\}|D ) \propto Pr (D|G=\{A_1 ,A_2\} ),\qquad A_1 ,A_2 \in \{A,C,G,T\}.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
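&lt;br /&gt;
The numbering in the table follows from listing all unordered pairs of the four bases in alphabetical order. A small Python sketch of this, using hypothetical names GENOTYPES and INDEX:&lt;br /&gt;
&lt;br /&gt;
```python
from itertools import combinations_with_replacement

# unordered pairs of bases in alphabetical order: AA, AC, ..., TT
GENOTYPES = [a + b for a, b in combinations_with_replacement('ACGT', 2)]
INDEX = {g: k for k, g in enumerate(GENOTYPES)}

print(GENOTYPES)
print(INDEX['CG'])
```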
&lt;br /&gt;
==GATK genotype likelihoods==&lt;br /&gt;
In angsd we use the direct method of the first version of GATK (dragon). This is simply&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Pr(D|G=\{A_1,A_2\})=\prod_{i=1}^M Pr \left ( b_i|G=\{A_1,A_2\} \right) = \prod_{i=1}^M  (\frac{1}{2}Pr( b_i|A_1)  + \frac{1}{2}Pr( b_i|A_2)  ) &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
Pr(b|A) =\left\{&lt;br /&gt;
  \begin{array}{lr}&lt;br /&gt;
    \frac{e}{3} &amp;amp; : b \neq A\\&lt;br /&gt;
   1-e &amp;amp; : b = A&lt;br /&gt;
  \end{array}&lt;br /&gt;
\right.&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the sequencing depth, &amp;lt;math&amp;gt;b_i&amp;lt;/math&amp;gt; is the observed base in read ''i'', and ''e'' is the probability of error calculated from the phred scaled quality score ''q'', i.e. &amp;lt;math&amp;gt; e=10^{-q/10} &amp;lt;/math&amp;gt;&lt;br /&gt;
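&lt;br /&gt;
The two formulas above can be sketched directly in Python. This is a plain transcription of the equations for one individual at one site, not the code used in ANGSD; the function names base_prob and gatk_log_likelihoods are hypothetical, and the input is assumed to be a list of observed bases with their phred scaled quality scores.&lt;br /&gt;
&lt;br /&gt;
```python
import math
from itertools import combinations_with_replacement

def base_prob(b, allele, e):
    """Pr(b|A): 1-e when the read base matches the allele, else e/3."""
    return 1.0 - e if b == allele else e / 3.0

def gatk_log_likelihoods(bases, quals):
    """Return {genotype: log Pr(D|G)} for the 10 diploid genotypes."""
    result = {}
    for a1, a2 in combinations_with_replacement('ACGT', 2):
        logl = 0.0
        for b, q in zip(bases, quals):
            e = 10.0 ** (-q / 10.0)  # error probability from the qscore
            logl += math.log(0.5 * base_prob(b, a1, e)
                             + 0.5 * base_prob(b, a2, e))
        result[a1 + a2] = logl
    return result

# hypothetical data: a single read showing C with qscore 20
gl = gatk_log_likelihoods(['C'], [20])
best = max(gl.values())
print({g: round(v - best, 6) for g, v in gl.items()})
```
&lt;br /&gt;
Subtracting the largest value gives log likelihood ratios to the most likely genotype, the scaling used in the glf output formats.&lt;br /&gt;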
&lt;br /&gt;
==SAMtools genotype likelihoods==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
This subsection with SAMtools gl is preliminary&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Define:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
fk_i = 0.83^i*0.97+0.03 &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
lhet_{n,k} = \log  \frac{\binom{n}{k}}{2^n} &lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\beta_{n,k} = \frac{\beta_{n,k-1}}{\beta_{n,k-1}+\binom{n}{k}\cdot k \cdot log(prob(e))+(n-k)*log(1-prob(e))}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==SOAPsnp genotype likelihoods==&lt;br /&gt;
==SYK genotype likelihoods==&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Dodepth&amp;diff=3042</id>
		<title>Dodepth</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Dodepth&amp;diff=3042"/>
		<updated>2018-04-30T11:37:17Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Counts&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Allele_Counts]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Docounts&amp;diff=3041</id>
		<title>Docounts</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Docounts&amp;diff=3041"/>
		<updated>2018-04-30T11:36:26Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Counts&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Allele_Counts]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Docount&amp;diff=3039</id>
		<title>Docount</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Docount&amp;diff=3039"/>
		<updated>2018-04-02T10:02:47Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Counts&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
#REDIRECT [[Allele_Counts]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Dogeno&amp;diff=3038</id>
		<title>Dogeno</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Dogeno&amp;diff=3038"/>
		<updated>2018-04-02T10:01:44Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Genotype calling&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Genotype_calling]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Domajorminor&amp;diff=3037</id>
		<title>Domajorminor</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Domajorminor&amp;diff=3037"/>
		<updated>2018-04-02T10:00:55Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Major Minor&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Major_Minor]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Rmtriallelic&amp;diff=3036</id>
		<title>Rmtriallelic</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Rmtriallelic&amp;diff=3036"/>
		<updated>2018-04-02T09:59:21Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Frequencies&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Allele_Frequencies]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Snp_pval&amp;diff=3035</id>
		<title>Snp pval</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Snp_pval&amp;diff=3035"/>
		<updated>2018-04-02T09:58:37Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Frequencies&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Allele_Frequencies]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Dopost&amp;diff=3034</id>
		<title>Dopost</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Dopost&amp;diff=3034"/>
		<updated>2018-04-02T09:57:41Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Frequencies&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Allele_Frequencies]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Domaf&amp;diff=3032</id>
		<title>Domaf</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Domaf&amp;diff=3032"/>
		<updated>2018-04-02T09:56:11Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Redirected page to Allele Frequencies&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[Allele_Frequencies]]&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3031</id>
		<title>Allele Frequencies</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3031"/>
		<updated>2018-04-02T09:53:41Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div class=&amp;quot;keywords&amp;quot;&amp;gt; -domaf,-domaf,-domaf,-domaf,-domaf, domaf, domaf, domaf, domaf, domaf, domaf, dopost, SNP_pval &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The allele frequency is the relative frequency of an allele at a site. It can be polarized according to major/minor, reference/non-reference or ancestral/derived alleles. Therefore the choice of allele frequency estimator is closely related to choosing which alleles are segregating (see [[Inferring_Major_and_Minor_alleles]]). &lt;br /&gt;
&lt;br /&gt;
We allow for frequency estimation from different input data:&lt;br /&gt;
&lt;br /&gt;
# Genotype Likelihoods&lt;br /&gt;
# Genotype posterior probabilities&lt;br /&gt;
# Counts of bases&lt;br /&gt;
&lt;br /&gt;
The allele frequency estimator from genotype likelihoods is from this [[suYeon | publication]], and the base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
For the genotype likelihood based methods we allow for deviations from Hardy-Weinberg equilibrium: users can supply a file containing inbreeding coefficients for each individual.&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 ./angsd -doMaf&lt;br /&gt;
        -&amp;gt; angsd version: 0.910-76-gad32889 (htslib: 1.3-32-gecdc348) build(Mar  2 2016 12:38:33)&lt;br /&gt;
        -&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
        -&amp;gt; Command: &lt;br /&gt;
./angsd -doMaf  -&amp;gt; Wed Mar  2 12:45:40 2016&lt;br /&gt;
------------------------&lt;br /&gt;
abcFreq.cpp:&lt;br /&gt;
-doMaf  0 (Calculate persite frequencies '.mafs.gz')&lt;br /&gt;
        1: Frequency (fixed major and minor)&lt;br /&gt;
        2: Frequency (fixed major unknown minor)&lt;br /&gt;
        4: Frequency from genotype probabilities&lt;br /&gt;
        8: AlleleCounts based method (known major minor)&lt;br /&gt;
        NB. Filedumping is supressed if value is negative&lt;br /&gt;
-doPost 0       (Calculate posterior prob 3xgprob)&lt;br /&gt;
        1: Using frequency as prior&lt;br /&gt;
        2: Using uniform prior&lt;br /&gt;
        3: Using SFS as prior (still in development)&lt;br /&gt;
Filters:&lt;br /&gt;
        -minMaf         -1.000000       (Remove sites with MAF below)&lt;br /&gt;
        -SNP_pval       1.000000        (Remove sites with a pvalue larger)&lt;br /&gt;
        -rmTriallelic   0.000000        (Remove sites with a pvalue lower)&lt;br /&gt;
Extras:&lt;br /&gt;
        -ref    (null)  (Filename for fasta reference)&lt;br /&gt;
        -anc    (null)  (Filename for fasta ancestral)&lt;br /&gt;
        -eps    0.001000 [Only used for -doMaf &amp;amp;8]&lt;br /&gt;
        -beagleProb     0 (Dump beagle style postprobs)&lt;br /&gt;
        -indFname       (null) (file containing individual inbreedcoeficients)&lt;br /&gt;
NB These frequency estimators requires major/minor -doMajorMinor&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Allele Frequency estimation=&lt;br /&gt;
The major and minor alleles are first inferred from the data or given by the user (see [[Inferring_Major_and_Minor_alleles]]). User-supplied information can specify both the major and minor allele, a reference genome (for the major allele) or an ancestral genome. &lt;br /&gt;
&lt;br /&gt;
; -doMaf [int]&lt;br /&gt;
&lt;br /&gt;
1:  Known major and known minor. Here both the major and minor alleles are assumed to be known (inferred or given by the user). The allele frequency is then obtained from the genotype likelihoods. The estimator is from this [[suYeon | publication]] but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
2:  Known major, unknown minor. Here the major allele is assumed to be known (inferred or given by the user), but the minor allele is not determined. Instead we sum over the 3 possible minor alleles weighted by their probabilities. The estimator is from this [[suYeon | publication]] but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
4: Frequency based on genotype posterior probabilities. If genotype probabilities are used as input to ANGSD, the allele frequency is estimated directly from these by [[postFreq|summing over the probabilities]]. &lt;br /&gt;
&lt;br /&gt;
8: Frequency based on base counts. This method does not rely on genotype likelihoods or probabilities but instead infers the allele frequency directly from the base counts. The base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
Multiple estimators can be used simultaneously by summing the above numbers. Thus -doMaf 7 (1+2+4) will use the first three estimators. If the allele frequencies are estimated from the genotype likelihoods then you need to infer the major and minor allele (-doMajorMinor).&lt;br /&gt;
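&lt;br /&gt;
The way a summed -doMaf value maps back to individual estimators can be sketched in Python. This is only an illustration: the helper name decode_domaf and the ESTIMATORS table are hypothetical and not part of ANGSD; the flag values and the note about negative values are taken from the help text above.&lt;br /&gt;

```python
# Hypothetical helper for illustration only; none of these names exist in ANGSD.
ESTIMATORS = {
    1: "known major, known minor (genotype likelihoods)",
    2: "known major, unknown minor (genotype likelihoods)",
    4: "genotype posterior probabilities",
    8: "base counts",
}


def decode_domaf(value):
    """Return the estimator flags whose sum gives the -doMaf value."""
    # Per the help text, a negative value only suppresses file dumping.
    value = abs(value)
    # Integer division and modulo test whether each power-of-two flag is set.
    return [flag for flag in sorted(ESTIMATORS) if (value // flag) % 2 == 1]


for flag in decode_domaf(7):
    print(flag, ESTIMATORS[flag])
```

Because every flag is a distinct power of two, any sum of flags decodes back to exactly one set of estimators.&lt;br /&gt;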
&lt;br /&gt;
;NB using -doMaf 4 is only supported if the posteriors are supplied as external files, since the estimation of genotype posteriors in itself requires a maf estimator.&lt;br /&gt;
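&lt;br /&gt;
As a rough sketch of what summing over genotype posteriors means for -doMaf 4, the Python snippet below computes the expected number of minor alleles and divides by the number of chromosomes. It assumes diploid individuals and three posteriors per individual ordered as major/major, major/minor, minor/minor. The function name is hypothetical and the exact ANGSD implementation may differ.&lt;br /&gt;

```python
def maf_from_posteriors(posteriors):
    """Estimate the minor allele frequency from genotype posteriors.

    posteriors: one (P(major/major), P(major/minor), P(minor/minor)) tuple
    per individual. Assumes diploid individuals (illustrative sketch only).
    """
    # Expected minor allele count: one copy for a heterozygote, two for a
    # minor homozygote, each weighted by its posterior probability.
    expected_minor = sum(p_het + 2.0 * p_hom for _, p_het, p_hom in posteriors)
    return expected_minor / (2.0 * len(posteriors))


# Made-up posteriors for three individuals with certain genotypes.
example = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(maf_from_posteriors(example))
```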
&lt;br /&gt;
=Example=&lt;br /&gt;
&lt;br /&gt;
==From genotype likelihood==&lt;br /&gt;
Example of estimating the allele frequencies both while assuming known major and minor alleles and while taking the uncertainty of the minor allele inference into account. The [[Inferring_Major_and_Minor_alleles|inference of the major and minor]] allele is done directly from the genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 1 -doMaf 3 -bam bam.filelist -GL 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
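&lt;br /&gt;
To give an idea of what an EM estimator with known major and minor allele does, here is a minimal Python sketch. It assumes Hardy-Weinberg proportions (no inbreeding coefficients), per-individual genotype likelihoods on the natural scale, and a fixed number of iterations. The function name and its parameters are hypothetical; this is a textbook-style illustration and not the ANGSD code.&lt;br /&gt;

```python
def em_maf(likelihoods, start=0.25, iterations=100):
    """EM estimate of the minor allele frequency from genotype likelihoods.

    likelihoods: one (L0, L1, L2) tuple per individual, the likelihood of
    carrying 0, 1 or 2 copies of the minor allele (natural scale).
    Assumes Hardy-Weinberg proportions; illustrative sketch only.
    """
    f = start
    for _ in range(iterations):
        expected_minor = 0.0
        for l0, l1, l2 in likelihoods:
            # E-step: weight each genotype likelihood by its Hardy-Weinberg
            # prior under the current frequency estimate f.
            w0 = l0 * (1.0 - f) ** 2
            w1 = l1 * 2.0 * f * (1.0 - f)
            w2 = l2 * f ** 2
            # Posterior expected number of minor alleles for this individual.
            expected_minor += (w1 + 2.0 * w2) / (w0 + w1 + w2)
        # M-step: the new frequency is the expected minor allele count
        # divided by the number of chromosomes.
        f = expected_minor / (2.0 * len(likelihoods))
    return f


# Made-up likelihoods for three individuals with unambiguous genotypes.
print(em_maf([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]))
```

A fixed iteration count keeps the sketch short; a real implementation would normally stop once the estimate has converged.&lt;br /&gt;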
&lt;br /&gt;
==From genotype probabilities==&lt;br /&gt;
Example using a genotype probability file, for example the output from beagle. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMaf 4 -beagle beagle.file.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Estimator from base counts==&lt;br /&gt;
&lt;br /&gt;
The allele frequencies can be inferred directly from the sequencing data [[Li2010|citation]].&lt;br /&gt;
This works by using &amp;quot;counts&amp;quot; of alleles, and should be invoked like this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 2 -doMaf 8 -bam bam.filelist -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Output data=&lt;br /&gt;
==.mafs.gz==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chromo	position	major	minor	ref	knownEM	unknownEM	nInd&lt;br /&gt;
21      9719788 T       A       0.000001        -0.000012       3&lt;br /&gt;
21      9719789 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719790 A       C       0.000000        -0.000004       3&lt;br /&gt;
21      9719791 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719792 G       A       0.000000        -0.000002       3&lt;br /&gt;
21      9719793 G       T       0.498277        41.932766       3&lt;br /&gt;
21      9719794 T       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719795 T       A       0.000000        -0.000001       3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;chromo &lt;br /&gt;
chromosome name&lt;br /&gt;
;position&lt;br /&gt;
position&lt;br /&gt;
;major &lt;br /&gt;
major allele&lt;br /&gt;
;minor &lt;br /&gt;
minor allele&lt;br /&gt;
;knownEM &lt;br /&gt;
frequency using -doMaf 1&lt;br /&gt;
;unknownEM &lt;br /&gt;
frequency using -doMaf 2&lt;br /&gt;
;phat &lt;br /&gt;
frequency using -doMaf 8&lt;br /&gt;
;nInd &lt;br /&gt;
is the number of individuals with data&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=MediaWiki:Common.css&amp;diff=3030</id>
		<title>MediaWiki:Common.css</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=MediaWiki:Common.css&amp;diff=3030"/>
		<updated>2018-04-02T09:52:01Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;/* CSS placed here will be applied to all skins */&lt;br /&gt;
.keywords {&lt;br /&gt;
   display: none;&lt;br /&gt;
}&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=MediaWiki:Common.css&amp;diff=3029</id>
		<title>MediaWiki:Common.css</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=MediaWiki:Common.css&amp;diff=3029"/>
		<updated>2018-04-02T09:50:14Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;/* CSS placed here will be applied to all skins */&lt;br /&gt;
div.keywords {&lt;br /&gt;
   display: none;&lt;br /&gt;
}&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3028</id>
		<title>Allele Frequencies</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Allele_Frequencies&amp;diff=3028"/>
		<updated>2018-04-02T09:49:05Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div class=&amp;quot;keywords&amp;quot;&amp;gt; domaf, dopost, SNP_pval &amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The allele frequency is the relative frequency of an allele at a site. It can be polarized according to major/minor, reference/non-reference or ancestral/derived alleles. Therefore the choice of allele frequency estimator is closely related to choosing which alleles are segregating (see [[Inferring_Major_and_Minor_alleles]]). &lt;br /&gt;
&lt;br /&gt;
We allow for frequency estimation from different input data:&lt;br /&gt;
&lt;br /&gt;
# Genotype Likelihoods&lt;br /&gt;
# Genotype posterior probabilities&lt;br /&gt;
# Counts of bases&lt;br /&gt;
&lt;br /&gt;
The allele frequency estimator from genotype likelihoods is from this [[suYeon | publication]], and the base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
For the genotype likelihood based methods we allow for deviations from Hardy-Weinberg equilibrium: users can supply a file containing inbreeding coefficients for each individual.&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 ./angsd -doMaf&lt;br /&gt;
        -&amp;gt; angsd version: 0.910-76-gad32889 (htslib: 1.3-32-gecdc348) build(Mar  2 2016 12:38:33)&lt;br /&gt;
        -&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
        -&amp;gt; Command: &lt;br /&gt;
./angsd -doMaf  -&amp;gt; Wed Mar  2 12:45:40 2016&lt;br /&gt;
------------------------&lt;br /&gt;
abcFreq.cpp:&lt;br /&gt;
-doMaf  0 (Calculate persite frequencies '.mafs.gz')&lt;br /&gt;
        1: Frequency (fixed major and minor)&lt;br /&gt;
        2: Frequency (fixed major unknown minor)&lt;br /&gt;
        4: Frequency from genotype probabilities&lt;br /&gt;
        8: AlleleCounts based method (known major minor)&lt;br /&gt;
        NB. Filedumping is supressed if value is negative&lt;br /&gt;
-doPost 0       (Calculate posterior prob 3xgprob)&lt;br /&gt;
        1: Using frequency as prior&lt;br /&gt;
        2: Using uniform prior&lt;br /&gt;
        3: Using SFS as prior (still in development)&lt;br /&gt;
Filters:&lt;br /&gt;
        -minMaf         -1.000000       (Remove sites with MAF below)&lt;br /&gt;
        -SNP_pval       1.000000        (Remove sites with a pvalue larger)&lt;br /&gt;
        -rmTriallelic   0.000000        (Remove sites with a pvalue lower)&lt;br /&gt;
Extras:&lt;br /&gt;
        -ref    (null)  (Filename for fasta reference)&lt;br /&gt;
        -anc    (null)  (Filename for fasta ancestral)&lt;br /&gt;
        -eps    0.001000 [Only used for -doMaf &amp;amp;8]&lt;br /&gt;
        -beagleProb     0 (Dump beagle style postprobs)&lt;br /&gt;
        -indFname       (null) (file containing individual inbreedcoeficients)&lt;br /&gt;
NB These frequency estimators requires major/minor -doMajorMinor&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Allele Frequency estimation=&lt;br /&gt;
The major and minor alleles are first inferred from the data or given by the user (see [[Inferring_Major_and_Minor_alleles]]). User-supplied information can specify both the major and minor allele, a reference genome (for the major allele) or an ancestral genome. &lt;br /&gt;
&lt;br /&gt;
; -doMaf [int]&lt;br /&gt;
&lt;br /&gt;
1:  Known major and known minor. Here both the major and minor alleles are assumed to be known (inferred or given by the user). The allele frequency is then obtained from the genotype likelihoods. The estimator is from this [[suYeon | publication]] but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
2:  Known major, unknown minor. Here the major allele is assumed to be known (inferred or given by the user), but the minor allele is not determined. Instead we sum over the 3 possible minor alleles weighted by their probabilities. The estimator is from this [[suYeon | publication]] but uses the EM algorithm, and is briefly described [[SYKmaf|here]]. &lt;br /&gt;
&lt;br /&gt;
4: Frequency based on genotype posterior probabilities. If genotype probabilities are used as input to ANGSD, the allele frequency is estimated directly from these by [[postFreq|summing over the probabilities]]. &lt;br /&gt;
&lt;br /&gt;
8: Frequency based on base counts. This method does not rely on genotype likelihoods or probabilities but instead infers the allele frequency directly from the base counts. The base counts method is from this [[Li2010 |publication]]. &lt;br /&gt;
&lt;br /&gt;
Multiple estimators can be used simultaneously by summing the above numbers. Thus -doMaf 7 (1+2+4) will use the first three estimators. If the allele frequencies are estimated from the genotype likelihoods then you need to infer the major and minor allele (-doMajorMinor).&lt;br /&gt;
&lt;br /&gt;
;NB using -doMaf 4 is only supported if the posteriors are supplied as external files, since the estimation of genotype posteriors in itself requires a maf estimator.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
&lt;br /&gt;
==From genotype likelihood==&lt;br /&gt;
Example of estimating the allele frequencies both while assuming known major and minor alleles and while taking the uncertainty of the minor allele inference into account. The [[Inferring_Major_and_Minor_alleles|inference of the major and minor]] allele is done directly from the genotype likelihoods.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 1 -doMaf 3 -bam bam.filelist -GL 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==From genotype probabilities==&lt;br /&gt;
Example using a genotype probability file, for example the output from beagle. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMaf 4 -beagle beagle.file.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Estimator from base counts==&lt;br /&gt;
&lt;br /&gt;
The allele frequencies can be inferred directly from the sequencing data [[Li2010|citation]].&lt;br /&gt;
This works by using &amp;quot;counts&amp;quot; of alleles, and should be invoked like this:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -out out -doMajorMinor 2 -doMaf 8 -bam bam.filelist -doCounts 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Output data=&lt;br /&gt;
==.mafs.gz==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chromo	position	major	minor	ref	knownEM	unknownEM	nInd&lt;br /&gt;
21      9719788 T       A       0.000001        -0.000012       3&lt;br /&gt;
21      9719789 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719790 A       C       0.000000        -0.000004       3&lt;br /&gt;
21      9719791 G       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719792 G       A       0.000000        -0.000002       3&lt;br /&gt;
21      9719793 G       T       0.498277        41.932766       3&lt;br /&gt;
21      9719794 T       A       0.000000        -0.000001       3&lt;br /&gt;
21      9719795 T       A       0.000000        -0.000001       3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
;chromo &lt;br /&gt;
chromosome name&lt;br /&gt;
;position&lt;br /&gt;
position&lt;br /&gt;
;major &lt;br /&gt;
major allele&lt;br /&gt;
;minor &lt;br /&gt;
minor allele&lt;br /&gt;
;knownEM &lt;br /&gt;
frequency using -doMaf 1&lt;br /&gt;
;unknownEM &lt;br /&gt;
frequency using -doMaf 2&lt;br /&gt;
;phat &lt;br /&gt;
frequency using -doMaf 8&lt;br /&gt;
;nInd &lt;br /&gt;
is the number of individuals with data&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=MediaWiki:Common.css&amp;diff=3027</id>
		<title>MediaWiki:Common.css</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=MediaWiki:Common.css&amp;diff=3027"/>
		<updated>2018-04-02T09:48:46Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: Created page with &amp;quot;/* CSS placed here will be applied to all skins */ .keywords {    display: none; }&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;/* CSS placed here will be applied to all skins */&lt;br /&gt;
.keywords {&lt;br /&gt;
   display: none;&lt;br /&gt;
}&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3026</id>
		<title>PCA MDS</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA_MDS&amp;diff=3026"/>
		<updated>2018-03-12T15:39:59Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= single read sampling approach for PCA or MDS =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This function is new: it is available from version '''0.912''' and in the latest development version on [https://github.com/ANGSD/angsd github]&lt;br /&gt;
&lt;br /&gt;
For the PCA / MDS methods you should use called SNP sites (use [[PCA]] if you do not want to call SNPs). SNPs can be called based on genotype likelihoods (see [[SNP_calling]]) or you can specify the variable sites you want to analyse using the [[Sites|-sites]] option. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doIBS&lt;br /&gt;
	-&amp;gt; angsd version: 0.911-26-gf1cb0e0-dirty (htslib: 1.3-1-gc72ae90) build(Apr 27 2016 11:15:33)&lt;br /&gt;
	-&amp;gt; Analysis helpbox/synopsis information:&lt;br /&gt;
	-&amp;gt; Command: &lt;br /&gt;
../angsd/angsd -doIBS 	-&amp;gt; Wed Apr 27 12:38:35 2016&lt;br /&gt;
--------------&lt;br /&gt;
abcIBS.cpp:&lt;br /&gt;
	-doIBS	0&lt;br /&gt;
	(Sampling strategies)&lt;br /&gt;
	 0:	 no IBS &lt;br /&gt;
	 1:	 (Sample single base)&lt;br /&gt;
	 2:	 (Concensus base)&lt;br /&gt;
	-doCounts	0	Must choose -doCount 1&lt;br /&gt;
Optional&lt;br /&gt;
	-minMinor	0	Minimum observed minor alleles&lt;br /&gt;
	-minFreq	0.000	Minimum minor allele frequency&lt;br /&gt;
	-output01	0	output 0 and 1s instead of based&lt;br /&gt;
	-maxMis		-1	Maximum missing bases (per site)&lt;br /&gt;
	-doMajorMinor	0	use input files or data to select major and minor alleles&lt;br /&gt;
	-makeMatrix	0	print out the ibs matrix &lt;br /&gt;
	-doCov		0	print out the cov matrix &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
;-doIBS [int] &lt;br /&gt;
Print a single base from each individual at each position. 1: randomly sampled read. 2: consensus base&lt;br /&gt;
&lt;br /&gt;
;-doCounts [int]&lt;br /&gt;
The method requires counting the different bases at each position. Therefore, -doCounts 1 must be used.&lt;br /&gt;
&lt;br /&gt;
;-doMajorMinor [int]&lt;br /&gt;
The covariance matrix can only be calculated for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see [[Inferring_Major_and_Minor_alleles]]). This can also be used if you only want to make this assumption for the IBS matrix or only want to print out bases that are either the major or minor. &lt;br /&gt;
&lt;br /&gt;
;-minMinor [int]&lt;br /&gt;
Minimum observed minor alleles. The default is 0. If you do not use -doMajorMinor then the number of minor alleles is the sum of the 3 most uncommon alleles. &lt;br /&gt;
&lt;br /&gt;
;-minFreq [float]	&lt;br /&gt;
Minimum minor allele frequency based on the sampled bases. The default is 0. If you do not use -doMajorMinor then the frequency is the sum of the frequencies of the 3 most uncommon alleles. &lt;br /&gt;
&lt;br /&gt;
;-output01 [int]	&lt;br /&gt;
Output the sampled reads as 0 (for major) and 1 (for non-major) instead of the actual base&lt;br /&gt;
&lt;br /&gt;
;-maxMis [int]&lt;br /&gt;
Maximum missing bases (per site), i.e. the maximum number of allowed non-major/minor sampled bases&lt;br /&gt;
&lt;br /&gt;
;-makeMatrix [int]&lt;br /&gt;
1 prints out the pairwise IBS matrix. This is the average distance between pairs of individuals. The distance is zero if the base is the same and 1 otherwise. You can use this for MDS (see below)&lt;br /&gt;
&lt;br /&gt;
;-doCov [int]		&lt;br /&gt;
1 prints out the covariance matrix, which can be used for PCA (see below). You should use the -minFreq option to avoid sites with low allele frequency.&lt;br /&gt;
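&lt;br /&gt;
The pairwise distance described for -makeMatrix (zero if the sampled bases are the same, 1 otherwise, averaged over sites) can be sketched in Python as follows. The function names are hypothetical, and skipping sites where either individual has a missing base (written as N) is an assumption of this sketch rather than a documented ANGSD behaviour.&lt;br /&gt;

```python
def ibs_distance(bases_a, bases_b, missing="N"):
    """Average mismatch between the sampled bases of two individuals.

    bases_a, bases_b: sequences with one sampled base per site.
    Sites where either base is missing are skipped (assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(bases_a, bases_b) if missing not in (a, b)]
    if not pairs:
        return float("nan")  # no shared sites with data
    # Distance is 0 for identical bases and 1 otherwise, averaged over sites.
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def ibs_matrix(individuals):
    """Nind x Nind matrix of pairwise IBS distances."""
    return [[ibs_distance(x, y) for y in individuals] for x in individuals]


# Made-up sampled bases for three individuals at four sites.
samples = ["GGAT", "GTAN", "TTAT"]
for row in ibs_matrix(samples):
    print("\t".join(format(value, ".6f") for value in row))
```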
&lt;br /&gt;
&lt;br /&gt;
=== run example ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam all.files -minMapQ 30 -minQ 20 -GL 2  -doMajorMinor 1 -doMaf 1 -SNP_pval 2e-6 -doIBS 1 -doCounts 1 -doCov 1 -makeMatrix 1 -minMaf 0.05 -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This will produce the output (see below), which includes pairwise differences (.ibsMat) and the covariance matrix (.covMat). These can be used for MDS and PCA respectively (see R example below). Note that only the PCA method requires SNP calling and allele frequency estimation. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Output==&lt;br /&gt;
&lt;br /&gt;
=== sampled bases *ibs.gz ===&lt;br /&gt;
This function will print the sampled bases to *.ibs.gz. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       14000873        A       G       0       1       1       1       1       1       1&lt;br /&gt;
1       14001018        C       T       0       1       1       1       1       1       1&lt;br /&gt;
1       14001867        G       A       0       1       1       1       1       0       1&lt;br /&gt;
1       14002342        T       C       1       1       1       1       1       -1      1&lt;br /&gt;
1       14002422        T       A       0       1       1       1       1       0       -1&lt;br /&gt;
1       14003581        T       C       0       1       1       1       1       1       1&lt;br /&gt;
1       14004623        C       T       0       1       1       1       1       0       1&lt;br /&gt;
1       14006543        T       G       0       -1      1       1       1       0       1&lt;br /&gt;
1       14007493        G       A       0       0       1       -1      1       0       1&lt;br /&gt;
1       14007558        T       C       0       0       1       1       -1      -1      1&lt;br /&gt;
1       14007649        A       G       0       1       1       1       1       0       1&lt;br /&gt;
1       14008269        A       G       1       1       0       -1      1       -1      1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor&amp;gt;0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   minor   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7&lt;br /&gt;
1       13116   G       T       N       G       T       T       N       G       N       T&lt;br /&gt;
1       13118   G       A       N       G       A       A       N       G       N       A&lt;br /&gt;
1       14930   A       G       G       G       G       A       N       N       A       N&lt;br /&gt;
1       15211   T       G       N       G       T       G       N       N       N       G&lt;br /&gt;
1       54490   A       G       N       G       N       G       N       N       N       N&lt;br /&gt;
1       54716   T       C       T       C       C       C       T       N       N       N&lt;br /&gt;
1       58814   A       G       N       G       N       G       G       G       N       N&lt;br /&gt;
1       62777   T       A       N       N       A       N       A       A       A       N&lt;br /&gt;
1       63268   C       T       N       T       N       T       C       N       T       N&lt;br /&gt;
1       63671   A       G       N       G       N       N       G       G       G       N&lt;br /&gt;
1       69428   G       T       N       G       T       N       N       T       T       N&lt;br /&gt;
1       69761   T       A       A       A       T       A       N       A       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibs.gz with -doMajorMinor 0 and -output01 0&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
chr     pos     major   ind0    ind1    ind2    ind3    ind4    ind5    ind6    ind7    ind8&lt;br /&gt;
1       13116   T       N       G       T       T       N       G       N       T       T&lt;br /&gt;
1       13118   A       N       G       A       A       N       G       N       A       A&lt;br /&gt;
1       14930   A       G       G       G       A       N       N       A       N       G&lt;br /&gt;
1       15211   G       N       G       T       G       N       N       N       G       G&lt;br /&gt;
1       54490   G       N       G       N       G       N       N       N       N       A&lt;br /&gt;
1       54716   C       T       C       C       C       T       N       N       N       C&lt;br /&gt;
1       58814   G       N       G       N       G       G       G       N       N       G&lt;br /&gt;
1       62777   A       N       N       A       N       A       A       A       N       A&lt;br /&gt;
1       63268   T       N       T       N       T       C       N       T       N       N&lt;br /&gt;
1       63336   C       C       C       C       C       C       N       C       N       N&lt;br /&gt;
1       63671   G       N       G       N       N       G       G       G       N       N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''chr''' is the chromosome&lt;br /&gt;
&lt;br /&gt;
'''pos''' is the position&lt;br /&gt;
&lt;br /&gt;
'''major''' is the major allele&lt;br /&gt;
&lt;br /&gt;
'''minor''' is the minor allele. Needs -doMajorMinor&lt;br /&gt;
&lt;br /&gt;
'''indX''' is the sampled base for individual number X. If -output01 1 is used then it is 1 for major, 0 for non-major and -1 for missing&lt;br /&gt;
&lt;br /&gt;
=== sample based IBS matrix *.ibsMat ===&lt;br /&gt;
This function will print the pairwise IBS distance &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.ibsMat with -makeMatrix 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
0.000000        0.510638        0.606383        0.595745        0.545455        0.428571&lt;br /&gt;
0.510638        0.000000        0.154639        0.154639        0.108911        0.408602&lt;br /&gt;
0.606383        0.154639        0.000000        0.121212        0.137255        0.489362&lt;br /&gt;
0.595745        0.154639        0.121212        0.000000        0.106796        0.484211&lt;br /&gt;
0.545455        0.108911        0.137255        0.106796        0.000000        0.404040&lt;br /&gt;
0.428571        0.408602        0.489362        0.484211        0.404040        0.000000&lt;br /&gt;
0.577320        0.121212        0.181818        0.171717        0.097087        0.473684&lt;br /&gt;
0.536082        0.090000        0.138614        0.118812        0.047619        0.428571&lt;br /&gt;
0.262500        0.571429        0.702381        0.694118        0.632184        0.353659&lt;br /&gt;
0.458333        0.383838        0.484848        0.494949        0.398058        0.368421&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind matrix with pairwise IBS distance&lt;br /&gt;
&lt;br /&gt;
=== sample based covariance matrix *.covMat ===&lt;br /&gt;
This function will print the covariance matrix based on a single sampled read&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of output *.covMat with -doCov 1&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
1.098251        -0.026225       -0.005617       -0.014726       -0.022438       -0.021786&lt;br /&gt;
-0.026225       1.115986        -0.017167       0.000735        -0.017163       -0.016899&lt;br /&gt;
-0.005617       -0.017167       1.074779        -0.015685       -0.019819       -0.015473&lt;br /&gt;
-0.014726       0.000735        -0.015685       1.072853        -0.013641       -0.007789&lt;br /&gt;
-0.022438       -0.017163       -0.019819       -0.013641       1.094612        -0.016045&lt;br /&gt;
-0.021786       -0.016899       -0.015473       -0.007789       -0.016045       1.059264&lt;br /&gt;
-0.005831       -0.009854       -0.001269       -0.002362       -0.018479       -0.011942&lt;br /&gt;
-0.015399       -0.020010       -0.001296       -0.022947       -0.006515       -0.003938&lt;br /&gt;
-0.001730       -0.040534       -0.002295       -0.017442       -0.024194       -0.007469&lt;br /&gt;
-0.016094       -0.015303       -0.018302       -0.022502       -0.030503       -0.001208&lt;br /&gt;
-0.122045       -0.106068       -0.103089       -0.104443       -0.110237       -0.103610&lt;br /&gt;
-0.106553       -0.100202       -0.104754       -0.109399       -0.107645       -0.111665&lt;br /&gt;
-0.108945       -0.102440       -0.105292       -0.101372       -0.107110       -0.106639&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Nind x Nind covariance matrix&lt;br /&gt;
&lt;br /&gt;
==Model==&lt;br /&gt;
&lt;br /&gt;
=== IBS ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
pairwise distance between individuals&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
d_{ij} = \frac{\sum_m^M 1-I_{b_j}(b_i)}{M}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; I_{b_j}(b_i) &amp;lt;/math&amp;gt; is the indicator function, which is equal to one when the two individuals i and j have the same base and zero otherwise, so each site where the bases differ contributes one to the sum&lt;br /&gt;
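&lt;br /&gt;
As a rough illustration of the distance above, the following Python sketch computes it for two individuals from one sampled base per site. The function name ibs_distance and the use of N for missing data (as in the *.ibs output) are assumptions made for this example; it is not ANGSD code.&lt;br /&gt;
&lt;br /&gt;
```python
def ibs_distance(bases_i, bases_j, missing='N'):
    # Keep only sites where both individuals have a sampled base.
    shared = [(a, b) for a, b in zip(bases_i, bases_j)
              if a != missing and b != missing]
    if not shared:
        return float('nan')  # no overlapping sites, distance undefined
    # Each site with different bases contributes one to the sum.
    mismatches = sum(1 for a, b in shared if a != b)
    return mismatches / len(shared)


# Five sites; three are covered in both individuals and one of those differs.
print(ibs_distance('TCGNA', 'TCNGC'))
```
&lt;br /&gt;
Sites where either individual is missing are dropped before dividing, so M can differ between pairs of individuals.&lt;br /&gt;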
&lt;br /&gt;
=== Covariance ===&lt;br /&gt;
&lt;br /&gt;
Allele frequency based on single reads. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
f_{m} = \frac{N_{minor}}{N_{major} + N_{minor}}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
cov(ij) = \frac{1}{M}\sum_m^M \frac{ (h^i_m-f_m)(h^j_m-f_m) }{f_m(1-f_m)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where M is the number of sites with a read for both individuals. &amp;lt;math&amp;gt; h^i_m&amp;lt;/math&amp;gt; is 1 if individual i has the major allele at site m and zero otherwise&lt;br /&gt;
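&lt;br /&gt;
The covariance above can be sketched in Python as follows. This is a minimal illustration, not ANGSD code: the function names are hypothetical, missing data is coded as -1 (as with -output01 1), the frequency is taken for whichever allele is coded as 1, and sites that are monomorphic among the sampled reads are skipped to avoid dividing by zero. All of these are assumptions of the example.&lt;br /&gt;
&lt;br /&gt;
```python
def site_frequency(column):
    # Frequency of the allele coded as 1 among individuals with data (-1 = missing).
    observed = [h for h in column if h != -1]
    return sum(observed) / len(observed) if observed else float('nan')


def covariance(sites, i, j):
    # sites: list of per-site lists holding one 0/1/-1 entry per individual.
    total, used = 0.0, 0
    for column in sites:
        hi, hj = column[i], column[j]
        if hi == -1 or hj == -1:
            continue  # only sites with a read for both individuals count
        f = site_frequency(column)
        if f == 0.0 or f == 1.0:
            continue  # assumption: skip sites without variation
        total += (hi - f) * (hj - f) / (f * (1.0 - f))
        used += 1
    return total / used if used else float('nan')


sites = [[1, 0, 1, 0], [1, 1, 0, 0], [1, -1, 0, 0]]
print(covariance(sites, 0, 1), covariance(sites, 0, 0))
```
&lt;br /&gt;
Dividing by f(1-f) puts every site on the same scale, so rare and common variants contribute comparably.&lt;br /&gt;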
&lt;br /&gt;
=MDS/PCA using R=&lt;br /&gt;
&lt;br /&gt;
[[File:PCA_MDS.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## MDS&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
mds &amp;lt;- cmdscale(as.dist(m))&lt;br /&gt;
plot(mds,lwd=2,ylab=&amp;quot;Dist&amp;quot;,xlab=&amp;quot;Dist&amp;quot;,main=&amp;quot;multidimensional scaling&amp;quot;,col=rep(1:3,each=10))&lt;br /&gt;
&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.covMat&amp;quot;&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
e &amp;lt;- eigen(m)&lt;br /&gt;
plot(e$vectors[,1:2],lwd=2,ylab=&amp;quot;PC 2&amp;quot;,xlab=&amp;quot;PC 1&amp;quot;,main=&amp;quot;Principal components&amp;quot;,col=rep(1:3,each=10),pch=16)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=other fun stuff=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
## heatmap / clustering / trees&lt;br /&gt;
name &amp;lt;- &amp;quot;angsdput.ibsMat&amp;quot; # or covMat&lt;br /&gt;
m &amp;lt;- as.matrix(read.table(name))&lt;br /&gt;
#heat map&lt;br /&gt;
heatmap(m)&lt;br /&gt;
#neighbour joining&lt;br /&gt;
plot(ape::nj(m))&lt;br /&gt;
plot(hclust(dist(m), &amp;quot;ave&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=HWE_and_Inbreeding_estimates&amp;diff=3025</id>
		<title>HWE and Inbreeding estimates</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=HWE_and_Inbreeding_estimates&amp;diff=3025"/>
		<updated>2018-02-07T14:59:33Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Site estimate of inbreeding coefficient (HWE test) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Site estimate of inbreeding coefficient (HWE test)=&lt;br /&gt;
&lt;br /&gt;
;-HWE_pval_F&lt;br /&gt;
&lt;br /&gt;
By choosing '''-HWE_pval_F 1''', no sites will be filtered, but the p-value, the frequency and F for each site will be written to the '''.hweF.gz''' file.&lt;br /&gt;
&lt;br /&gt;
=Estimation of individual inbreeding coefficients=&lt;br /&gt;
Filipe G. Vieira has extended the ANGSD feature set with a very fancy method for estimating individual inbreeding coefficients. Please see his website https://github.com/fgvieira/ngsF for the latest version, which uses ANGSD output&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=MediaWiki:Sidebar&amp;diff=3024</id>
		<title>MediaWiki:Sidebar</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=MediaWiki:Sidebar&amp;diff=3024"/>
		<updated>2018-02-07T14:58:32Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Pages&lt;br /&gt;
** Main_Page#Overview|ANGSD overview&lt;br /&gt;
** Download_and_installation|Installation&lt;br /&gt;
** Quick_Start|Quick Start/Testdata&lt;br /&gt;
** Input|Input data&lt;br /&gt;
** filters | Filters&lt;br /&gt;
** snpFilters | snpFilters&lt;br /&gt;
* Population genetics&lt;br /&gt;
** SFS Estimation|SFS Estimation&lt;br /&gt;
**tajima|Thetas,Tajima,Neutrality test&lt;br /&gt;
** 2d SFS Estimation |(Multi) SFS Estimation&lt;br /&gt;
** Direct Ancestry | Direct Ancestry &lt;br /&gt;
*  Population structure&lt;br /&gt;
** NGSadmix | Admixture&lt;br /&gt;
** Fst |Fst&lt;br /&gt;
** Abbababa |ABBABABA (D-stat)&lt;br /&gt;
** Abbababa2 |ABBABABA (multipop)&lt;br /&gt;
** Pbs | Population branch statistics (pbs)&lt;br /&gt;
** PCA | PCA &lt;br /&gt;
** PCA_MDS | PCA (sampling approach)&lt;br /&gt;
* Medical genetics&lt;br /&gt;
** Association|Association&lt;br /&gt;
&lt;br /&gt;
* IBD/IBS&lt;br /&gt;
** Relatedness | Relatedness&lt;br /&gt;
** HWE_and_Inbreeding_estimates|HWE and inbreeding with ngsF&lt;br /&gt;
** HWE_test | HWE test&lt;br /&gt;
** Genotype_Distribution | Genotype distribution&lt;br /&gt;
** Heterozygosity | Heterozygosity&lt;br /&gt;
&lt;br /&gt;
* Summaries&lt;br /&gt;
** Contamination|Contamination&lt;br /&gt;
** Error estimation|Error estimation&lt;br /&gt;
** alleles_counts|Allele counts&lt;br /&gt;
** depth|Depth&lt;br /&gt;
** base_quality|Base quality&lt;br /&gt;
** fasta | Create Fasta file&lt;br /&gt;
** Mismatch | Mismatch&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* SNPs and genotypes&lt;br /&gt;
** Genotype_likelihoods|Genotypes likelihoods&lt;br /&gt;
** Inferring_Major_and_Minor_alleles|Major and Minor&lt;br /&gt;
** Allele_Frequency_estimation|Allele frequencies&lt;br /&gt;
** Genotype_calling|Genotype calling&lt;br /&gt;
** Haploid_calling|Haploid calling&lt;br /&gt;
** SNP_calling|SNP Calling&lt;br /&gt;
&amp;lt;!-- ** SNP_Calling|SNP Calling --&amp;gt;&lt;br /&gt;
&amp;lt;!-- ** Covariance_matrix_for_PCA|PCA --&amp;gt;&lt;br /&gt;
&amp;lt;!-- ** Heterozygosity|Heterozogosity --&amp;gt;&lt;br /&gt;
&amp;lt;!-- ** HWE_and_Inbreeding_estimates|HWE and inbreeding --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Output&lt;br /&gt;
** beagle_input|Beagle inputation&lt;br /&gt;
** Genotype_likelihoods#Output_genotype_likelihoods|Genotype likelihood files&lt;br /&gt;
** Plink |Plink&lt;br /&gt;
&lt;br /&gt;
*Misc/util programs&lt;br /&gt;
** realSFS | realSFS&lt;br /&gt;
** msToGlf | msToGlf&lt;br /&gt;
** thetaStat | thetaStat&lt;br /&gt;
** supersim | supersim&lt;br /&gt;
&lt;br /&gt;
* Program structure&lt;br /&gt;
** angsd structure |Introduction&lt;br /&gt;
** angsd_class | overview of class&lt;br /&gt;
** custom_start | getting started &lt;br /&gt;
** data_access | accessing core data&lt;br /&gt;
** custom_data | custom data containers&lt;br /&gt;
** print | printing results &lt;br /&gt;
&lt;br /&gt;
* About ANGSD&lt;br /&gt;
** change_log|Version log&lt;br /&gt;
** citing_angsd|Citing angsd&lt;br /&gt;
** authors|Authors&lt;br /&gt;
** Bugs | Bugs&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* navigation&lt;br /&gt;
** mainpage|mainpage-description&lt;br /&gt;
** portal-url|portal&lt;br /&gt;
** currentevents-url|currentevents&lt;br /&gt;
** recentchanges-url|recentchanges&lt;br /&gt;
** randompage-url|randompage&lt;br /&gt;
** helppage|help&lt;br /&gt;
* SEARCH&lt;br /&gt;
* TOOLBOX&lt;br /&gt;
* LANGUAGES&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA&amp;diff=3023</id>
		<title>PCA</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA&amp;diff=3023"/>
		<updated>2018-02-07T14:56:42Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* single read sampling */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Genotype likelihood approach =&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
==PCAngsd==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For PCA analysis we recommend using [http://www.popgen.dk/software/index.php/PCAngsd PCAngsd], which is based on genotype likelihoods from variable sites. This works well for low/medium depth sequencing even when sequencing depth varies between samples. &lt;br /&gt;
&lt;br /&gt;
You can generate the input files in ANGSD with the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 2 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1  -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which will output genotype likelihoods for variable sites in the beagle format. This file can then be used in [http://www.popgen.dk/software/index.php/PCAngsd PCAngsd]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==NGS tools==&lt;br /&gt;
ngsTools methods for doing PCA/Covariance based on genotype likelihoods files: &lt;br /&gt;
&lt;br /&gt;
Fumagalli, M, Vieira, FG, Korneliussen, TS, Linderoth, T, Huerta-Sánchez, E, Albrechtsen, A, Nielsen, R (2013). Quantifying population genetic differentiation from next-generation sequencing data. Genetics, 195, 3:979-92.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This works even without the need to call SNPs or genotypes based on genotype likelihoods. &lt;br /&gt;
&lt;br /&gt;
NB! If you have very different depths for the different samples, i.e. some very low and others medium and high, then you might want to use PCAngsd or the single base sampling approach [[PCA_MDS]]&lt;br /&gt;
&lt;br /&gt;
The main documentation for this is found here:&lt;br /&gt;
https://github.com/mfumagalli/ngsTools and here https://github.com/mfumagalli/ngsTools#ngscovar&lt;br /&gt;
&lt;br /&gt;
= single read sampling =&lt;br /&gt;
Both PCA and MDS can be performed based on sampling of a single read at each site. This can work even with very low depth data, e.g. &amp;lt;1X. This method can be found here: [[PCA_MDS]]. However, it requires a low error rate, and polymorphic sites need to be inferred (or provided by the user, for example from reference data such as the 1000G for humans)&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA&amp;diff=3022</id>
		<title>PCA</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA&amp;diff=3022"/>
		<updated>2018-02-07T14:55:46Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* NGS tools */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Genotype likelihood approach =&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
==PCAngsd==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For PCA analysis we recommend using [http://www.popgen.dk/software/index.php/PCAngsd PCAngsd], which is based on genotype likelihoods from variable sites. This works well for low/medium depth sequencing even when sequencing depth varies between samples. &lt;br /&gt;
&lt;br /&gt;
You can generate the input files in ANGSD with the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 2 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1  -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which will output genotype likelihoods for variable sites in the beagle format. This file can then be used in [http://www.popgen.dk/software/index.php/PCAngsd PCAngsd]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==NGS tools==&lt;br /&gt;
ngsTools methods for doing PCA/Covariance based on genotype likelihoods files: &lt;br /&gt;
&lt;br /&gt;
Fumagalli, M, Vieira, FG, Korneliussen, TS, Linderoth, T, Huerta-Sánchez, E, Albrechtsen, A, Nielsen, R (2013). Quantifying population genetic differentiation from next-generation sequencing data. Genetics, 195, 3:979-92.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This works even without the need to call SNPs or genotypes based on genotype likelihoods. &lt;br /&gt;
&lt;br /&gt;
NB! If you have very different depths for the different samples, i.e. some very low and others medium and high, then you might want to use PCAngsd or the single base sampling approach [[PCA_MDS]]&lt;br /&gt;
&lt;br /&gt;
The main documentation for this is found here:&lt;br /&gt;
https://github.com/mfumagalli/ngsTools and here https://github.com/mfumagalli/ngsTools#ngscovar&lt;br /&gt;
&lt;br /&gt;
= single read sampling =&lt;br /&gt;
Both PCA and MDS can be performed based on sampling of a single read at each site. This can work even with very low depth data, e.g. &amp;lt;1X. This method can be found here: [[PCA_MDS]]. However, it requires a low error rate, and polymorphic sites need to be inferred (or provided by the user)&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=PCA&amp;diff=3021</id>
		<title>PCA</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=PCA&amp;diff=3021"/>
		<updated>2018-02-07T14:54:45Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Genotype likelihood approach =&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
==PCAngsd==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For PCA analysis we recommend using [http://www.popgen.dk/software/index.php/PCAngsd PCAngsd], which is based on genotype likelihoods from variable sites. This works well for low/medium depth sequencing even when sequencing depth varies between samples. &lt;br /&gt;
&lt;br /&gt;
You can generate the input files in ANGSD with the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 2 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1  -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
which will output genotype likelihoods for variable sites in the beagle format. This file can then be used in [http://www.popgen.dk/software/index.php/PCAngsd PCAngsd]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==NGS tools==&lt;br /&gt;
ngsTools methods for doing PCA/Covariance based on genotype likelihoods files: &lt;br /&gt;
&lt;br /&gt;
Fumagalli, M, Vieira, FG, Korneliussen, TS, Linderoth, T, Huerta-Sánchez, E, Albrechtsen, A, Nielsen, R (2013). Quantifying population genetic differentiation from next-generation sequencing data. Genetics, 195, 3:979-92.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This works without the need to call SNPs or genotypes based on genotype likelihoods. &lt;br /&gt;
&lt;br /&gt;
NB! If you have very different depths for the different samples, i.e. some very low and others medium and high, then you might want to use the single base sampling approach [[PCA_MDS]]&lt;br /&gt;
&lt;br /&gt;
The main documentation for this is found here:&lt;br /&gt;
https://github.com/mfumagalli/ngsTools and here https://github.com/mfumagalli/ngsTools#ngscovar&lt;br /&gt;
&lt;br /&gt;
= single read sampling =&lt;br /&gt;
Both PCA and MDS can be performed based on sampling of a single read at each site. This can work even with very low depth data, e.g. &amp;lt;1X. This method can be found here: [[PCA_MDS]]. However, it requires a low error rate, and polymorphic sites need to be inferred (or provided by the user)&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=File:Pcangsd_pca.png&amp;diff=3020</id>
		<title>File:Pcangsd pca.png</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=File:Pcangsd_pca.png&amp;diff=3020"/>
		<updated>2018-02-07T14:54:36Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=NGSadmix&amp;diff=3019</id>
		<title>NGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=NGSadmix&amp;diff=3019"/>
		<updated>2018-02-07T14:44:52Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;To estimate admixture proportions from sequencing data you can use NGSadmix [[File:admixtureNGS.png|thumb]]&lt;br /&gt;
NGSadmix has its very own webpage, because we like it so much.&lt;br /&gt;
[http://www.popgen.dk/software/index.php/NgsAdmix NGSadmix webpage]&lt;br /&gt;
&lt;br /&gt;
===Quick run===&lt;br /&gt;
You can generate input files for NGSadmix easily in ANGSD; see [[Beagle_input]].&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1  -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and then run NGSadmix (found in the misc folder in the angsd folder)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
NGSadmix -likes input.gz -K 3 -P 4 -o myoutfiles -minMaf 0.05 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=NGSadmix&amp;diff=3018</id>
		<title>NGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=NGSadmix&amp;diff=3018"/>
		<updated>2018-02-07T14:15:56Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;To estimate admixture proportions from sequencing data you can use NGSadmix.&lt;br /&gt;
NGSadmix has its very own webpage, because we like it so much.&lt;br /&gt;
[[File:admixtureNGS.png|thumb]]&lt;br /&gt;
http://www.popgen.dk/software/index.php/NgsAdmix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can generate input files for NGSadmix easily in ANGSD; see [[Beagle_input]].&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=File:AdmixtureNGS.png&amp;diff=3017</id>
		<title>File:AdmixtureNGS.png</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=File:AdmixtureNGS.png&amp;diff=3017"/>
		<updated>2018-02-07T14:15:34Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Genotype_calling&amp;diff=3004</id>
		<title>Genotype calling</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Genotype_calling&amp;diff=3004"/>
		<updated>2018-01-19T12:59:11Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Options */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We really don't recommend doing analysis based on called genotypes; instead, incorporate the uncertainty directly into the analysis you want to perform. But we recognise that many methods still rely on called genotypes, and have therefore implemented a basic genotype caller in angsd.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Genotype calling in ANGSD is based on calculating the posterior probability of the genotypes. The '''-doGeno''' is therefore a simple wrapper around the '''-doPost''' along with some extra filtering options. See [[Allele Frequencies]] for more information.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dogeno         -&amp;gt; Wed Mar  2 12:39:19 2016&lt;br /&gt;
-----------------&lt;br /&gt;
abcCallGenotypes.cpp:&lt;br /&gt;
&lt;br /&gt;
-doGeno 0&lt;br /&gt;
        1: write major and minor&lt;br /&gt;
        2: write the called genotype encoded as -1,0,1,2, -1=not called&lt;br /&gt;
        4: write the called genotype directly: eg AA,AC etc &lt;br /&gt;
        8: write the posterior probability of all possible genotypes&lt;br /&gt;
        16: write the posterior probability of called genotype&lt;br /&gt;
        32: write the posterior probabilities of the 3 gentypes as binary&lt;br /&gt;
        -&amp;gt; A combination of the above can be choosen by summing the values, EG write 0,1,2 types with majorminor as -doGeno 3&lt;br /&gt;
        -postCutoff=0.333333 (Only genotype to missing if below this threshold)&lt;br /&gt;
        -geno_minDepth=-1       (-1 indicates no cutof)&lt;br /&gt;
        -geno_maxDepth=-1       (-1 indicates no cutof)&lt;br /&gt;
        -geno_minMM=-1.000000   (minimum fraction af major-minor bases)&lt;br /&gt;
        -minInd=0       (only keep sites if you call genotypes from this number of individuals)&lt;br /&gt;
&lt;br /&gt;
        NB When writing the posterior the -postCutoff is not used&lt;br /&gt;
        NB geno_minDepth requires -doCounts&lt;br /&gt;
        NB geno_maxDepth requires -doCounts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
angsd can also use the full information of the sample allele frequencies for calling genotypes; see [[SFS Estimation]].&lt;br /&gt;
==Options==&lt;br /&gt;
;-doGeno [int]&lt;br /&gt;
1: print out major minor&lt;br /&gt;
&lt;br /&gt;
2: print the called genotype as -1,0,1,2&lt;br /&gt;
&lt;br /&gt;
4: print the called genotype as AA, AC, AG, ...&lt;br /&gt;
&lt;br /&gt;
8: print all 3 posts (major,major),(major,minor),(minor,minor)&lt;br /&gt;
&lt;br /&gt;
16: print the posterior of the called genotype&lt;br /&gt;
&lt;br /&gt;
32: unlike the other options, this dumps the posteriors for all samples in binary, encoded as 3*nind doubles&lt;br /&gt;
&lt;br /&gt;
Use the sum of the above to give the output you want. For example, -doGeno 5 (1+4) prints the major and minor allele followed by the genotype (AA, AC ...) for each individual&lt;br /&gt;
&lt;br /&gt;
; -doPost [int]&lt;br /&gt;
1: estimate the posterior genotype probability based on the allele frequency as a prior&lt;br /&gt;
&lt;br /&gt;
2: estimate the posterior genotype probability assuming a uniform prior&lt;br /&gt;
&lt;br /&gt;
; -geno_minDepth [int]&lt;br /&gt;
set genotypes to missing if the individual depth is less than [int] &lt;br /&gt;
&lt;br /&gt;
; -geno_maxDepth [int]&lt;br /&gt;
set genotypes to missing if the individual depth is larger than [int] &lt;br /&gt;
&lt;br /&gt;
; -geno_minMM [float]&lt;br /&gt;
set genotypes to missing if less than [float] of the bases are the major or minor (likely a triallelic site). e.g. 0.1 means that the genotype is set to missing if less than 10% of the reads in this individual are either the major or the minor&lt;br /&gt;
&lt;br /&gt;
; -postCutoff [float]&lt;br /&gt;
Call only a genotype with a posterior above this threshold.&lt;br /&gt;
&lt;br /&gt;
NB if the raw posterior dump is requested the -postCutoff is not used&lt;br /&gt;
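&lt;br /&gt;
Because the -doGeno values listed above are combined by summing, a given value can be split back into its parts. The Python sketch below is only an illustration of that arithmetic; the table DOGENO_FLAGS and the function describe_dogeno are hypothetical names and are not part of ANGSD.&lt;br /&gt;
&lt;br /&gt;
```python
# Output types taken from the -doGeno list above; each value is a power of two.
DOGENO_FLAGS = {
    1: 'major and minor allele',
    2: 'called genotype as -1,0,1,2',
    4: 'called genotype as bases (AA, AC, ...)',
    8: 'posterior probability of all three genotypes',
    16: 'posterior probability of the called genotype',
    32: 'binary dump of the three genotype posteriors',
}


def describe_dogeno(value):
    # A flag is part of the sum when the matching binary digit of value is 1.
    return [text for flag, text in DOGENO_FLAGS.items()
            if value // flag % 2 == 1]


for part in describe_dogeno(5):
    print(part)
```
&lt;br /&gt;
For -doGeno 5 this lists the major and minor allele (1) and the genotype written as bases (4).&lt;br /&gt;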
&lt;br /&gt;
==Examples==&lt;br /&gt;
===Allele frequency as prior===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doMajorMinor 1 -SNP_pval 0.000001 -doGeno 5 -doPost 1 -postCutoff 0.95&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gives output like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1       14000202        G       A       GG      NN      NN      GA      NN      &lt;br /&gt;
1       14000873        G       A       GG      GG      GG      AA      GA      &lt;br /&gt;
1       14001018        T       C       NN      NN      NN      CC      NN      &lt;br /&gt;
1       14001867        A       G       NN      AA      AA      NN      NN      &lt;br /&gt;
1       14002342        C       T       CC      CC      CC      CC      CC      &lt;br /&gt;
1       14002422        A       T       AA      NN      NN      NN      NN      &lt;br /&gt;
1       14002474        T       C       TC      TT      TT      TT      TT      &lt;br /&gt;
1       14003581        C       T       CC      CC      NN      NN      CT      &lt;br /&gt;
1       14004623        T       C       TT      TT      TT      NN      TC      &lt;br /&gt;
1       14005069        A       G       AA      AA      AA      AA      AA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
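&lt;br /&gt;
The idea behind calling a genotype with the allele frequency as prior and a posterior cutoff can be sketched in Python as below. This is a simplified illustration and not the ANGSD implementation: the function name call_genotype is hypothetical, the prior is assumed to follow Hardy-Weinberg proportions, and genotypes are assumed to be coded as the number of copies of the minor allele.&lt;br /&gt;
&lt;br /&gt;
```python
def call_genotype(likes, freq, post_cutoff=1.0 / 3.0):
    # likes: genotype likelihoods for 0, 1 and 2 copies of the minor allele.
    # freq: minor allele frequency, used for a Hardy-Weinberg prior (assumption).
    prior = [(1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2]
    weighted = [like * p for like, p in zip(likes, prior)]
    norm = sum(weighted)
    if norm == 0.0:
        return -1, [0.0, 0.0, 0.0]  # no information, genotype stays missing
    post = [w / norm for w in weighted]
    best = max(range(3), key=lambda g: post[g])
    # Set the genotype to missing (-1) when the best posterior is below the cutoff.
    called = best if post[best] >= post_cutoff else -1
    return called, post


print(call_genotype([0.9, 0.1, 0.0], freq=0.1, post_cutoff=0.95))
print(call_genotype([0.5, 0.5, 0.0], freq=0.5, post_cutoff=0.95))
```
&lt;br /&gt;
With a strict cutoff such as 0.95, sites with little data for an individual tend to end up as missing (NN in the output above) instead of being called.&lt;br /&gt;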
===Sample allele frequency with SFS as prior===&lt;br /&gt;
1. First get an estimate of the site frequency spectrum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dosaf 1 -anc ../hg19ancNoChr.fa.gz -gl 1 -b list&lt;br /&gt;
./realSFS angsdput.saf.idx &amp;gt;angsdput.saf.idx.ml&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
2. Now calculate the diallelic genotype posterior probability with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dopost 3 -b list -gl 1 -domajorminor 1 -domaf 1 -pest angsdput.saf.idx.ml -dogeno 2 -r 1 -out angsdput2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Genotype_calling&amp;diff=3003</id>
		<title>Genotype calling</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Genotype_calling&amp;diff=3003"/>
		<updated>2018-01-19T12:58:38Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Options */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;We really don't recommend doing analysis based on called genotypes; instead, incorporate the uncertainty directly into the analysis you want to perform. But we recognise that many methods still rely on called genotypes, and have therefore implemented a basic genotype caller in angsd.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Genotype calling in ANGSD is based on calculating the posterior probability of the genotypes. The '''-doGeno''' is therefore a simple wrapper around the '''-doPost''' along with some extra filtering options. See [[Allele Frequencies]] for more information.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dogeno         -&amp;gt; Wed Mar  2 12:39:19 2016&lt;br /&gt;
-----------------&lt;br /&gt;
abcCallGenotypes.cpp:&lt;br /&gt;
&lt;br /&gt;
-doGeno 0&lt;br /&gt;
        1: write major and minor&lt;br /&gt;
        2: write the called genotype encoded as -1,0,1,2, -1=not called&lt;br /&gt;
        4: write the called genotype directly: eg AA,AC etc &lt;br /&gt;
        8: write the posterior probability of all possible genotypes&lt;br /&gt;
        16: write the posterior probability of called genotype&lt;br /&gt;
        32: write the posterior probabilities of the 3 gentypes as binary&lt;br /&gt;
        -&amp;gt; A combination of the above can be choosen by summing the values, EG write 0,1,2 types with majorminor as -doGeno 3&lt;br /&gt;
        -postCutoff=0.333333 (Only genotype to missing if below this threshold)&lt;br /&gt;
        -geno_minDepth=-1       (-1 indicates no cutof)&lt;br /&gt;
        -geno_maxDepth=-1       (-1 indicates no cutof)&lt;br /&gt;
        -geno_minMM=-1.000000   (minimum fraction af major-minor bases)&lt;br /&gt;
        -minInd=0       (only keep sites if you call genotypes from this number of individuals)&lt;br /&gt;
&lt;br /&gt;
        NB When writing the posterior the -postCutoff is not used&lt;br /&gt;
        NB geno_minDepth requires -doCounts&lt;br /&gt;
        NB geno_maxDepth requires -doCounts&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
angsd can also use the full information of the sample allele frequencies for calling genotypes; see [[SFS Estimation]].&lt;br /&gt;
==Options==&lt;br /&gt;
;-doGeno [int]&lt;br /&gt;
1: print out major minor&lt;br /&gt;
&lt;br /&gt;
2: print the called genotype as -1,0,1,2&lt;br /&gt;
&lt;br /&gt;
4: print the called genotype as AA, AC, AG, ...&lt;br /&gt;
&lt;br /&gt;
8: print all 3 posts (major,major),(major,minor),(minor,minor)&lt;br /&gt;
&lt;br /&gt;
16: print the posterior of the called genotype&lt;br /&gt;
&lt;br /&gt;
32: unlike the other options, this dumps the posteriors for all samples in binary, encoded as 3*nind doubles&lt;br /&gt;
&lt;br /&gt;
Use the sum of the above to give the output you want. For example, -doGeno 5 (1+4) prints the major and minor allele followed by the genotype (AA, AC ...) for each individual&lt;br /&gt;
&lt;br /&gt;
; -doPost [int]&lt;br /&gt;
1: estimate the posterior genotype probability based on the allele frequency as a prior&lt;br /&gt;
&lt;br /&gt;
2: estimate the posterior genotype probability assuming a uniform prior&lt;br /&gt;
&lt;br /&gt;
; -geno_minDepth [int]&lt;br /&gt;
set genotypes to missing if the individual depth is less than [int] &lt;br /&gt;
&lt;br /&gt;
; -geno_maxDepth [int]&lt;br /&gt;
set genotypes to missing if the individual depth is larger than [int] &lt;br /&gt;
&lt;br /&gt;
; -geno_minMM [float]&lt;br /&gt;
set genotypes to missing if less than [float] of the bases are the major or minor (likely a triallelic site). e.g. 0.1 means that the genotype is set to missing if less than 10% of the reads in this individual are either the major or the minor&lt;br /&gt;
&lt;br /&gt;
; -postCutoff [float]&lt;br /&gt;
Call only a genotype with a posterior above this threshold.&lt;br /&gt;
&lt;br /&gt;
NB if the raw posterior dump is requested the -postCutoff is not used&lt;br /&gt;
&lt;br /&gt;
==Examples==&lt;br /&gt;
===Allele frequency as prior===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doMajorMinor 1 -SNP_pval 0.000001 -doGeno 5 -doPost 1 -postCutoff 0.95&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gives output like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1       14000202        G       A       GG      NN      NN      GA      NN      &lt;br /&gt;
1       14000873        G       A       GG      GG      GG      AA      GA      &lt;br /&gt;
1       14001018        T       C       NN      NN      NN      CC      NN      &lt;br /&gt;
1       14001867        A       G       NN      AA      AA      NN      NN      &lt;br /&gt;
1       14002342        C       T       CC      CC      CC      CC      CC      &lt;br /&gt;
1       14002422        A       T       AA      NN      NN      NN      NN      &lt;br /&gt;
1       14002474        T       C       TC      TT      TT      TT      TT      &lt;br /&gt;
1       14003581        C       T       CC      CC      NN      NN      CT      &lt;br /&gt;
1       14004623        T       C       TT      TT      TT      NN      TC      &lt;br /&gt;
1       14005069        A       G       AA      AA      AA      AA      AA&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
===Sample allele frequency with SFS as prior===&lt;br /&gt;
1. First get an estimate of the site frequency spectrum&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dosaf 1 -anc ../hg19ancNoChr.fa.gz -gl 1 -b list&lt;br /&gt;
./realSFS angsdput.saf.idx &amp;gt;angsdput.saf.idx.ml&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
2. Now calculate the diallelic genotype posterior probability with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -dopost 3 -b list -gl 1 -domajorminor 1 -domaf 1 -pest angsdput.saf.idx.ml -dogeno 2 -r 1 -out angsdput2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/angsd/index.php?title=Association&amp;diff=2977</id>
		<title>Association</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/angsd/index.php?title=Association&amp;diff=2977"/>
		<updated>2017-08-10T08:32:11Z</updated>

		<summary type="html">&lt;p&gt;Albrecht: /* Output */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Association can be performed using two approaches.&lt;br /&gt;
# Based on testing differences in allele frequencies between cases and controls, using genotype likelihoods&lt;br /&gt;
# Based on a generalized linear framework which allows for both quantitative and binary traits and for including additional covariates, using genotype posteriors. &lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
We recommend that users don't perform association analysis on all sites, but limit the analysis to informative sites. In the case of alignment data (BAM), we advise that users filter away reads with low mapping quality and bases with a low qscore.&lt;br /&gt;
&lt;br /&gt;
The filtering of the alignment data is described in [[Input]], and filtering based on frequencies/polymorphic sites is described [[Filters#Allele_frequencies| here]].&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
This can be done easily at the command line by adding the below commands&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
-minQ 20 -minMapQ 30 -SNP_pval 1e-6 #Use polymorphic sites with a p-value of 10^-6&lt;br /&gt;
-minQ 20 -minMapQ 30 -minMaf 0.05 #Use sites with a MAF &amp;gt;0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
=Brief Overview=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doAsso&lt;br /&gt;
analysisAsso.cpp:&lt;br /&gt;
	-doAsso	0&lt;br /&gt;
	1: Frequency Test (Known Major and Minor)&lt;br /&gt;
	2: Score Test&lt;br /&gt;
	3: Frequency Test (Unknown Minor)	&lt;br /&gt;
  Frequency Test Options:&lt;br /&gt;
	-yBin		(null)	(File containing disease status)	&lt;br /&gt;
&lt;br /&gt;
  Score Test Options:&lt;br /&gt;
	-yBin		(null)	(File containing disease status)&lt;br /&gt;
	-yQuant		(null)	(File containing phenotypes)&lt;br /&gt;
	-minHigh	10	(Require atleast minHigh number of high credible genotypes)&lt;br /&gt;
	-minCount	10	(Require this number of minor alleles, estimated from MAF)&lt;br /&gt;
	-cov		(null)	(File containing additional covariates)&lt;br /&gt;
	-model	1&lt;br /&gt;
	1: Additive/Log-Additive (Default)&lt;br /&gt;
	2: Dominant&lt;br /&gt;
	3: Recessive&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
=Case control association using allele frequencies=&lt;br /&gt;
To test for differences in the allele frequencies, genotype likelihoods need to be provided or [[Genotype_likelihoods_from_alignments | estimated]]. The test is an implementation of the likelihood ratio test for differences between cases and controls described in detail in [[Kim2011]].&lt;br /&gt;
&lt;br /&gt;
;-doAsso [int] &lt;br /&gt;
'''1''': The test is performed assuming the minor allele is known. &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;-yBin [Filename]&lt;br /&gt;
A file containing the case control status. 0 being the controls, 1 being the cases and -999 being missing phenotypes. The file should contain a single phenotype entry per line.&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of a case-control phenotype file&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
1&lt;br /&gt;
1&lt;br /&gt;
1&lt;br /&gt;
1&lt;br /&gt;
0&lt;br /&gt;
-999&lt;br /&gt;
1&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
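Before running the test it can be useful to check that a phenotype file only contains the allowed codes. The following Python sketch does this for a list of lines; the function name summarise_ybin is hypothetical and the check is not something angsd provides.&lt;br /&gt;

```python
MISSING = -999  # code for a missing phenotype, as described above


def summarise_ybin(lines):
    """Count controls (0), cases (1) and missing (-999) entries, one entry per line."""
    counts = {"controls": 0, "cases": 0, "missing": 0}
    for line_number, line in enumerate(lines, start=1):
        value = int(line.strip())
        if value == 0:
            counts["controls"] += 1
        elif value == 1:
            counts["cases"] += 1
        elif value == MISSING:
            counts["missing"] += 1
        else:
            raise ValueError(f"line {line_number}: unexpected value {value}")
    return counts


# The 16 entries from the example phenotype file above.
example = "1 0 0 0 1 1 1 1 0 -999 1 0 0 0 0 1".split()
print(summarise_ybin(example))  # {'controls': 8, 'cases': 7, 'missing': 1}
```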
&lt;br /&gt;
==Example==&lt;br /&gt;
&lt;br /&gt;
Create a large number of individuals (500) by recycling the example files and simulate some phenotypes (case/control) using R&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for i in `seq 1 50`;do cat bam.filelist&amp;gt;&amp;gt;large.filelist;done&lt;br /&gt;
Rscript -e &amp;quot;write.table(cbind(rbinom(500,1,0.5)),'pheno.ybin',row=F,col=F)&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -yBin pheno.ybin -doAsso 1 -GL 1 -out out -doMajorMinor 1 -doMaf 1 -SNP_pval 1e-6 -bam large.filelist -r 1: -P 5&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that this takes a little while because you are reading 500 BAM files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
gunzip -c out.lrt0.gz | head&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
Chromosome	Position	Major	Minor	Frequency	LRT&lt;br /&gt;
1	14000003	G	A	0.057070	0.016684&lt;br /&gt;
1	14000013	G	A	0.067886	0.029014&lt;br /&gt;
1	14000019	G	T	0.052904	0.569061&lt;br /&gt;
1	14000023	C	A	0.073336	0.184060&lt;br /&gt;
1	14000053	T	C	0.038903	0.604695&lt;br /&gt;
1	14000170	C	T	0.050756	0.481033&lt;br /&gt;
1	14000176	G	A	0.053157	0.424910&lt;br /&gt;
1	14000200	C	A	0.085332	0.485030&lt;br /&gt;
1	14000202	G	A	0.257132	0.025047&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The LRT is the likelihood ratio statistic, which is chi-square distributed with one degree of freedom. &lt;br /&gt;
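Since the statistic has one degree of freedom, a p-value can be obtained from the LRT column with the complementary error function. The Python sketch below uses only the standard library; the function name lrt_to_pvalue is hypothetical, and treating -999 as a failed site follows the output description further down this page.&lt;br /&gt;

```python
import math

FAILED_SITE = -999.0  # value reported for sites that fail one of the filters


def lrt_to_pvalue(lrt):
    """P-value for a chi-square statistic with one degree of freedom.

    For one degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    Returns None for sites flagged as failed.
    """
    if lrt == FAILED_SITE:
        return None
    return math.erfc(math.sqrt(lrt / 2.0))


print(lrt_to_pvalue(0.0))      # 1.0
print(lrt_to_pvalue(-999.0))   # None
print(lrt_to_pvalue(0.569061))
```

An LRT of about 3.84 corresponds to a p-value of 0.05.&lt;br /&gt;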
&lt;br /&gt;
==Dependency Chain==&lt;br /&gt;
The method is based on estimating frequencies from genotype likelihoods. If alignment data has been supplied you need to specify the following.&lt;br /&gt;
&lt;br /&gt;
# [[Genotype_likelihoods_from_alignments | Genotype likelihood model (-GL)]].&lt;br /&gt;
#[[Inferring_Major_and_Minor_alleles  |Determine Major/Minor (-doMajorMinor)]].&lt;br /&gt;
#[[Allele_Frequency_estimation| Maf estimator (-doMaf)]].&lt;br /&gt;
&lt;br /&gt;
If you have supplied genotype likelihood files as input for angsd you can skip 1.&lt;br /&gt;
&lt;br /&gt;
=Score statistic=&lt;br /&gt;
To perform the test in a generalized linear framework posterior genotype probabilities must be provided or [[Genotype_calling|estimated]]. The approach is published here [[skotte2012]].&lt;br /&gt;
;-doAsso 2&lt;br /&gt;
&lt;br /&gt;
;-yBin [Filename]&lt;br /&gt;
A file containing the case control status. 0 being the controls, 1 being the cases and -999 being missing phenotypes. &lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of a case-control phenotype file&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
1&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
1&lt;br /&gt;
1&lt;br /&gt;
1&lt;br /&gt;
1&lt;br /&gt;
0&lt;br /&gt;
-999&lt;br /&gt;
1&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
0&lt;br /&gt;
1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
;-yQuant [Filename]&lt;br /&gt;
File containing the phenotype values, with -999 denoting missing phenotypes. The file should contain a single phenotype entry per line.&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of quantitative phenotype file&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
-999&lt;br /&gt;
2.06164722761138&lt;br /&gt;
-0.091935218675602&lt;br /&gt;
-0.287527686061831&lt;br /&gt;
-999&lt;br /&gt;
-999&lt;br /&gt;
-1.20996664036026&lt;br /&gt;
0.0188541092307412&lt;br /&gt;
-2.1122713873334&lt;br /&gt;
-999&lt;br /&gt;
-1.32920529536579&lt;br /&gt;
-1.10582299663753&lt;br /&gt;
-0.391773417823766&lt;br /&gt;
-0.501400984567535&lt;br /&gt;
-999&lt;br /&gt;
1.06014677976046&lt;br /&gt;
-1.10582299663753&lt;br /&gt;
-999&lt;br /&gt;
0.223156127557052&lt;br /&gt;
-0.189660869820135&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
;-cov [Filename]&lt;br /&gt;
File containing additional covariates for the analysis. Each line should contain the additional covariates for a single individual. Thus the number of lines should match the number of individuals and the number of columns should match the number of additional covariates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div class=&amp;quot;toccolours mw-collapsible mw-collapsed&amp;quot;&amp;gt;&lt;br /&gt;
Example of covariate file&lt;br /&gt;
&amp;lt;pre class=&amp;quot;mw-collapsible-content&amp;quot;&amp;gt;&lt;br /&gt;
1 0 0 1 &lt;br /&gt;
1 0.1 0 0 &lt;br /&gt;
2 0 1 0 &lt;br /&gt;
2 0 1 0 &lt;br /&gt;
2 0.1 0 1 &lt;br /&gt;
1 0 0 1 &lt;br /&gt;
1 0.3 0 0 &lt;br /&gt;
2 0 0 0 &lt;br /&gt;
1 0 0 0 &lt;br /&gt;
2 0.2 0 1 &lt;br /&gt;
1 0 1 0 &lt;br /&gt;
1 0 0 0 &lt;br /&gt;
1 0.1 0 0 &lt;br /&gt;
1 0 0 0 &lt;br /&gt;
2 0 0 1 &lt;br /&gt;
2 0 0 0 &lt;br /&gt;
2 0 0 0 &lt;br /&gt;
1 0 0 1 &lt;br /&gt;
1 0.5 0 0 &lt;br /&gt;
2 0 0 0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
;-minHigh [int]&lt;br /&gt;
default = 10 &amp;lt;br&amp;gt;&lt;br /&gt;
This approach needs a certain amount of variability in the genotype probabilities. minHigh filters out sites that do not have enough high-confidence homozygous major, heterozygous and homozygous minor genotypes: at least two of the three genotype categories need at least [int] individuals with a genotype probability above 0.9. This filter avoids the scenario where all individuals have genotypes with the same probability, e.g. all are heterozygous with a high probability or all have 0.33333333 probability for all three genotypes. &lt;br /&gt;
;-minCount [int] &lt;br /&gt;
default = 10 &amp;lt;br&amp;gt;&lt;br /&gt;
The minimum expected number of minor alleles in the sample. This is the frequency multiplied by two times the number of individuals. Performing association on extremely low minor allele frequencies does not make sense.&lt;br /&gt;
;-model [int]&lt;br /&gt;
# Additive/Log-additive for Linear/Logistic Regression (Default).&lt;br /&gt;
# Dominant.&lt;br /&gt;
# Recessive.&lt;br /&gt;
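The -minCount and -minHigh filters described above can be sketched in Python as follows. This is an illustration written from the descriptions on this page, not a copy of the angsd implementation; the function names are hypothetical, and the sketch assumes the reading that at least two of the three genotype categories must reach the threshold.&lt;br /&gt;

```python
def expected_minor_alleles(frequency, n_individuals):
    """Expected minor allele count: frequency times two times the number of individuals."""
    return frequency * 2 * n_individuals


def passes_min_high(genotype_probs, min_high=10, threshold=0.9):
    """Check that at least two genotype categories have min_high confident individuals.

    genotype_probs: one (major/major, major/minor, minor/minor) tuple per individual.
    """
    confident = [0, 0, 0]
    for probs in genotype_probs:
        for category, p in enumerate(probs):
            if p > threshold:
                confident[category] += 1
    categories_ok = sum(1 for count in confident if count >= min_high)
    return categories_ok >= 2


# A site with frequency 0.02 in 330 individuals has about 13 expected minor alleles.
print(expected_minor_alleles(0.02, 330) >= 10)  # True

uninformative = [(1 / 3, 1 / 3, 1 / 3)] * 500
informative = [(0.98, 0.01, 0.01)] * 300 + [(0.02, 0.97, 0.01)] * 50
print(passes_min_high(uninformative))  # False
print(passes_min_high(informative))    # True
```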
&lt;br /&gt;
==Example==&lt;br /&gt;
Create a large number of individuals (500) by recycling the example files and simulate some phenotypes (case/control) using R&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
rm large.filelist&lt;br /&gt;
for i in `seq 1 50`;do cat bam.filelist&amp;gt;&amp;gt;large.filelist;done&lt;br /&gt;
Rscript -e &amp;quot;write.table(cbind(rbinom(500,1,0.5)),'pheno.ybin',row=F,col=F)&amp;quot;&lt;br /&gt;
Rscript -e &amp;quot;write.table(cbind(rnorm(500)),'pheno.yquant',row=F,col=F)&amp;quot;&lt;br /&gt;
Rscript -e &amp;quot;set.seed(1);write.table(cbind(rbinom(500,1,0.5),rnorm(500)),'cov.file',row=F,col=F)&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For case-control data for polymorphic sites (p-value &amp;lt; 1e-6)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -yBin pheno.ybin -doAsso 2 -GL 1 -doPost 1 -out out -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1 -bam large.filelist -P 5 -r 1:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For quantitative traits (normally distributed errors) for polymorphic sites (p-value &amp;lt; 1e-6) and additional covariates&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -yQuant pheno.yquant -doAsso 2 -cov cov.file -GL 1 -doPost 1 -out out -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1  -bam large.filelist -P 5  -r 1:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Example with imputation (using BEAGLE)==&lt;br /&gt;
&lt;br /&gt;
First the polymorphic sites to be analysed need to be selected (-doMaf 1 -SNP_pval -doMajorMinor) and the genotype likelihoods estimated (-GL 1) for use in [http://faculty.washington.edu/browning/beagle/beagle.html the Beagle software] (-doGlf 2).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out input -doMajorMinor 1 -SNP_pval 1e-6 -doMaf 1  -bam large.filelist -P 5  -r 1: -doGlf 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Perform the imputation &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
java -Xmx15000m -jar beagle.jar like=input.beagle.gz out=beagleOut&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference fai can be obtained by indexing the reference genome or by using a BAM file's header &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
samtools view -H  bams/smallNA11830.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam | grep SN |cut -f2,3 | sed 's/SN\://g' |  sed 's/LN\://g' &amp;gt; ref.fai&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The association can then be performed on the genotype probabilities using the score statistic&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -doMaf 4 -beagle beagleOut.impute.beagle.gz.gprobs.gz -fai ref.fai  -yBin pheno.ybin -doAsso 2 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Dependency Chain==&lt;br /&gt;
The method is based on genotype probabilities. If alignment data has been supplied you need to specify the following.&lt;br /&gt;
&lt;br /&gt;
# [[Genotype_likelihoods_from_alignments | Genotype likelihood model (-GL)]].&lt;br /&gt;
#[[Inferring_Major_and_Minor_alleles  |Determine Major/Minor (-doMajorMinor)]].&lt;br /&gt;
#[[Allele_Frequency_estimation| Maf estimator (-doMaf)]].&lt;br /&gt;
#[[Genotype_calling| Calculate posterior genotype probability (-doPost)]]. If you use the score statistics -doAsso 2 then calculate the posterior using the allele frequency as prior (-doPost 1). &lt;br /&gt;
&lt;br /&gt;
If you have supplied genotype likelihoods for angsd, then you should skip 1.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you have supplied genotype probabilities (as beagle output format), there are no dependencies.&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
==Output format==&lt;br /&gt;
The output from the association analysis is a list of files called '''prefix.lrt'''. These are tab separated plain text files, with nine columns. &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Chromosome&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Position&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Major&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Minor&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Frequency&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| N*&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| LRT&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| highHe*&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| highHo*&lt;br /&gt;
|}&lt;br /&gt;
'''*''' Indicates that these columns are only used for the score test.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; &lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Field&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Description&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| Chromosome&lt;br /&gt;
|  Chromosome.&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| Position&lt;br /&gt;
| Physical Position.&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| Major&lt;br /&gt;
| The Major allele as determined by [[MajorMinor |-doMajorMinor]]. If posterior genotype files have been supplied as input, this column is not defined.&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| Minor&lt;br /&gt;
| The Minor allele as determined by [[MajorMinor |-doMajorMinor]]. If posterior genotype files have been supplied as input, this column is not defined.&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| Frequency&lt;br /&gt;
| The Minor allele frequency as determined by [[Maf|-doMaf]].&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| N*&lt;br /&gt;
| Number of individuals. That is the number of samples that have both sequencing data and phenotypic data.&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| LRT&lt;br /&gt;
| The likelihood ratio statistic. This statistic is chi-square distributed with one degree of freedom. Sites that fail one of the filters are given the value -999.000000.&lt;br /&gt;
|- &lt;br /&gt;
! scope=&amp;quot;row&amp;quot;| high_WT/HE/HO*&lt;br /&gt;
| Number of individuals with a WT/HE/HO genotype posterior probability above 0.9. WT=major/major, HE=major/minor, HO=minor/minor.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Chromosome	Position	Major	Minor	Frequency	N	LRT	high_WT/HE/HO&lt;br /&gt;
1	14000023	C	A	0.052976	330	2.863582	250/10/0&lt;br /&gt;
1	14000072	G	T	0.020555	330	1.864555	320/10/0&lt;br /&gt;
1	14000113	A	G	0.019543	330	0.074985	320/10/0&lt;br /&gt;
1	14000202	G	A	0.270106	330	0.181530	50/90/0&lt;br /&gt;
1	14000375	T	C	0.020471	330	1.845881	320/10/0&lt;br /&gt;
1	14000851	T	C	0.016849	330	0.694058	320/10/0&lt;br /&gt;
1	14000873	G	A	0.305990	330	0.684507	140/60/10&lt;br /&gt;
1	14001008	T	C	0.018434	330	0.031631	320/10/0&lt;br /&gt;
1	14001018	T	C	0.296051	330	0.761196	110/40/10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;!--=Citations=&lt;br /&gt;
For '''-doAsso 1' and '''-doAsso 3'&lt;br /&gt;
{{:Skotte2012}}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Problems with inflation of p-values==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can evaluate the behavior of the tests by making a QQ plot of the LRT. There are several reasons why it might show signs of inflation:&lt;br /&gt;
; -doPost (when using -doAsso 2 without posterior input via -beagle)&lt;br /&gt;
If you estimate the posterior genotype probability using a uniform prior (-doPost 2), then small differences in depth between samples will inflate the test statistics (see [[Skotte2012]]). Use the allele frequency as a prior (-doPost 1) instead. &lt;br /&gt;
; -minCount/-minHigh&lt;br /&gt;
If you set these too low then it will result in inflation of the test statistics.&lt;br /&gt;
; -yQuant (when using -doAsso 2 with a quantitative trait)&lt;br /&gt;
If your trait is not continuous, or the distribution of the trait is skewed or has outliers, then you will get inflation of p-values. The same rules apply as for a standard regression. Consider transforming your trait to a normal distribution.&lt;br /&gt;
; Population structure&lt;br /&gt;
If you have population structure then you will have to adjust for it in the regression model (-doAsso 2). Consider using NGSadmix or PCAngsd and use the results as covariates. Note that the model will still have some issues because it uses the allele frequency as a prior. For the adventurous, you can use PCAngsd or NGSadmix to estimate the individual allele frequencies and calculate your own genotype probabilities that take structure into account. These can then be used in angsd using the -beagle input format.&lt;br /&gt;
; low N&lt;br /&gt;
Usually a GWAS is performed on thousands of samples and we have only tested the use of the score statistics on hundreds of samples. If you have a low number of samples then try to figure out what minor allele frequency you would need in order to have some power. Also be careful with reducing -minCount/-minHigh.&lt;/div&gt;</summary>
		<author><name>Albrecht</name></author>
	</entry>
</feed>