<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.popgen.dk/software/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jonas2</id>
	<title>software - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.popgen.dk/software/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Jonas2"/>
	<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php/Special:Contributions/Jonas2"/>
	<updated>2026-04-30T14:16:24Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.40.1</generator>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=1457</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=1457"/>
		<updated>2021-01-20T14:38:24Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PCAngsd is a program that estimates the covariance matrix and individual allele frequencies for low-depth next-generation sequencing (NGS) data in structured/heterogeneous populations using principal component analysis (PCA) to perform multiple population genetic analyses using genotype likelihoods. Since version 0.98, PCAngsd was re-written to be based on Cython for computational bottlenecks and parallelization.&lt;br /&gt;
&lt;br /&gt;
The main method was published in 2018 and can be found here: [https://www.genetics.org/content/210/2/719]&lt;br /&gt;
&lt;br /&gt;
The HWE test was published in 2019 and can be found here: [https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019]&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
Framework for analyzing low-depth next-generation sequencing (NGS) data in heterogeneous/structured populations using principal component analysis (PCA). Population structure is inferred by estimating individual allele frequencies in an iterative approach using a truncated SVD model. The covariance matrix is estimated using the estimated individual allele frequencies as prior information for the unobserved genotypes in low-depth NGS data.&lt;br /&gt;
&lt;br /&gt;
The estimated individual allele frequencies can further be used to account for population structure in other probabilistic methods. PCAngsd can perform the following analyses:&lt;br /&gt;
*Covariance matrix&lt;br /&gt;
*Admixture estimations&lt;br /&gt;
*Inbreeding coefficients (both per-individual and per-site)&lt;br /&gt;
*HWE test&lt;br /&gt;
*Genome-wide selection scan&lt;br /&gt;
*Genotype calling&lt;br /&gt;
*Estimate NJ tree of samples&lt;br /&gt;
&lt;br /&gt;
Older versions of PCAngsd can be found here [https://github.com/Rosemeis/pcangsd/releases/].&lt;br /&gt;
&lt;br /&gt;
=Download and Installation=&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended. Installation has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
Get PCAngsd and build&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
python setup.py build_ext --inplace&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Install dependencies:&lt;br /&gt;
&lt;br /&gt;
The required Python packages can be installed with pip using the 'requirements.txt' file included in the 'pcangsd' folder.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r requirements.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Quick start=&lt;br /&gt;
&lt;br /&gt;
PCAngsd is run through the main caller file pcangsd.py. The commands below list all available options and run PCAngsd on Beagle or PLINK input:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Genotype likelihoods using 64 threads&lt;br /&gt;
python pcangsd.py -beagle input.beagle.gz -out output -threads 64&lt;br /&gt;
&lt;br /&gt;
# PLINK files (using file-prefix, *.bed, *.bim, *.fam)&lt;br /&gt;
python pcangsd.py -plink input.plink -out output -threads 64&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd accepts either genotype likelihoods in Beagle format or PLINK genotype files. Beagle files can be generated from BAM files using [http://popgen.dk/angsd ANGSD]. For inference of population structure in genotype data with non-random missingness, we recommend our [http://www.popgen.dk/software/index.php/EMU EMU] software, which performs accelerated EM-PCA but offers fewer functionalities than PCAngsd (#soon).&lt;br /&gt;
&lt;br /&gt;
PCAngsd outputs most files in binary NumPy format (.npy), with a few exceptions such as the covariance matrix, which is written as text. To read the files in Python:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
C = np.genfromtxt(&amp;quot;output.cov&amp;quot;) # Reads in estimated covariance matrix (text)&lt;br /&gt;
D = np.load(&amp;quot;output.selection.npy&amp;quot;) # Reads PC based selection statistics&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R can also read Numpy matrices using the &amp;quot;RcppCNPy&amp;quot; R library:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
library(RcppCNPy)&lt;br /&gt;
C &amp;lt;- as.matrix(read.table(&amp;quot;output.cov&amp;quot;)) # Reads in estimated covariance matrix&lt;br /&gt;
D &amp;lt;- npyLoad(&amp;quot;output.selection.npy&amp;quot;) # Reads PC based selection statistics&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
An example of generating genotype likelihoods in [http://popgen.dk/angsd ANGSD] and outputting them in the required Beagle text format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 2 -out input -nThreads 4 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
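To make the Beagle layout concrete, the Python sketch below parses genotype likelihood lines. It assumes the usual whitespace-separated layout: one header line, then one line per site holding a marker ID, two allele columns and three likelihoods per individual. The function names (parse_beagle_lines, read_beagle) and the example data are hypothetical and are not part of PCAngsd or ANGSD.&lt;br /&gt;
```python
import gzip


def parse_beagle_lines(lines):
    """Parse Beagle genotype-likelihood lines (assumed layout: marker,
    allele1, allele2, then three likelihoods per individual)."""
    rows = iter(lines)
    header = next(rows).split()
    # Each individual occupies three consecutive columns after the first three.
    n_ind = (len(header) - 3) // 3
    sites = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        values = [float(x) for x in fields[3:]]
        # Group the flat list into one likelihood triplet per individual.
        triplets = [tuple(values[3 * i:3 * i + 3]) for i in range(n_ind)]
        sites.append((fields[0], triplets))
    return n_ind, sites


def read_beagle(path):
    """Hypothetical helper: read a gzipped Beagle file from disk."""
    with gzip.open(path, "rt") as handle:
        return parse_beagle_lines(handle)


# Made-up example with two individuals and one site.
example = [
    "marker\tallele1\tallele2\tInd0\tInd0\tInd0\tInd1\tInd1\tInd1",
    "chr1_100\t0\t2\t0.9\t0.1\t0.0\t0.333\t0.333\t0.333",
]
n_ind, sites = parse_beagle_lines(example)
print(n_ind, sites[0][0], sites[0][1][0])
```
For a real file, read_beagle would be called with the path to the .beagle.gz file produced by the ANGSD command above.&lt;br /&gt;
&lt;br /&gt;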
=Tutorial=&lt;br /&gt;
&lt;br /&gt;
Please refer to the tutorial page: [http://www.popgen.dk/software/index.php/PCAngsdTutorial]&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==General usage==&lt;br /&gt;
; -beagle [Beagle file]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -filter [Text file]&lt;br /&gt;
Input file of 1's and 0's indicating whether to keep each individual or not.&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using ONLY their prefix (.bed, .bim, .fam).&lt;br /&gt;
; -plink_error [float]&lt;br /&gt;
Incorporate errors into genotypes by specifying rate as argument.&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -maf_iter [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies (Default: 200).&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation (Default: 1e-4).&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies (Default: 100).&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies (Default: 1e-5).&lt;br /&gt;
; -hwe [.lrt.npy file]&lt;br /&gt;
Input file of LRT binary file from previous PCAngsd run to filter based on HWE.&lt;br /&gt;
; -hwe_tole [float]&lt;br /&gt;
Threshold for HWE filtering of sites. &lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies (Default: Automatically tested using MAP test).&lt;br /&gt;
; -pi [.pi.npy file]&lt;br /&gt;
Load previous estimation of individual allele frequencies to skip covariance estimation.&lt;br /&gt;
; -maf_save&lt;br /&gt;
Choose to save estimated population allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -pi_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy). Can be used with the '-pi' command.&lt;br /&gt;
; -dosage_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -post_save&lt;br /&gt;
Choose to save the posterior genotype probabilities. Beagle format (.beagle).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the kept sites after filtering which is useful for downstream analysis. Outputs a file of 1's and 0's for keeping a site or not, respectively.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use (Default: 1).&lt;br /&gt;
; -out [output prefix]&lt;br /&gt;
Fileprefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
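As an illustration of the '-filter' input, the Python sketch below builds a file of 1's and 0's from a list of sample names. It assumes one flag per line, listed in the same order as the individuals in the input file; this layout, the sample names and the function names are assumptions made for the example and are not taken from the PCAngsd source.&lt;br /&gt;
```python
def keep_flags(samples, excluded):
    """Return 1 for samples to keep and 0 for samples to drop."""
    drop = set(excluded)
    return [0 if name in drop else 1 for name in samples]


def write_filter(path, flags):
    """Write one flag per line (assumed layout for the -filter file)."""
    with open(path, "w") as handle:
        for flag in flags:
            handle.write(f"{flag}\n")


# Hypothetical sample order, matching the order of individuals in the input.
samples = ["ind1", "ind2", "ind3", "ind4"]
flags = keep_flags(samples, excluded=["ind3"])
print(flags)
```
Calling write_filter with a file name and these flags would write the file that is then passed to '-filter'.&lt;br /&gt;
&lt;br /&gt;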
==Selection==&lt;br /&gt;
Perform PC-based genome-wide selection scans using posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome-wide selection scan along all significant PCs. Outputs the selection statistics, which must be converted to p-values by the user. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
&lt;br /&gt;
; -pcadapt&lt;br /&gt;
Using an extended model of [https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.12592 pcadapt]. Performs a genome-wide selection scan across all significant PCs. Outputs the z-scores, which must be converted to test statistics with the provided script 'pcangsd/scripts/pcadapt.R'. The test statistics are χ²-distributed with K degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
; -snp_weights&lt;br /&gt;
Output the SNP weights of the significant K eigenvectors.&lt;br /&gt;
&lt;br /&gt;
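Since the '-selection' statistics are χ²-distributed with 1 degree of freedom, converting them to p-values amounts to taking the upper-tail probability. The Python sketch below does this with the standard library only, using the identity P(X > x) = erfc(sqrt(x / 2)) for 1 degree of freedom. The function names and the example statistics are made up for illustration.&lt;br /&gt;
```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def selection_pvalues(stats):
    """Convert a sites-by-PCs table of selection statistics to p-values."""
    return [[chi2_sf_1df(s) for s in row] for row in stats]


# Hypothetical statistics for two sites along two PCs.
stats = [[0.0, 3.841], [10.83, 1.0]]
for row in selection_pvalues(stats):
    print(["%.4g" % p for p in row])
```
For a full matrix loaded with np.load, an equivalent vectorised call would be scipy.stats.chi2.sf(D, 1), assuming SciPy is installed.&lt;br /&gt;
&lt;br /&gt;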
==Inbreeding==&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
Estimate per-site inbreeding coefficients accounting for population structure and perform a likelihood ratio test for detecting sites deviating from HWE [https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019].&lt;br /&gt;
&lt;br /&gt;
; -inbreedSamples&lt;br /&gt;
Estimate per-individual inbreeding coefficients accounting for population structure which is based on an extension of [http://genome.cshlp.org/content/23/11/1852.full ngsF] for structured populations. &lt;br /&gt;
&lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for inbreeding EM algorithm. (Default: 200)&lt;br /&gt;
&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for inbreeding EM algorithm in estimating inbreeding coefficients. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities by incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '-inbreedSamples' must also be called for using this option.&lt;br /&gt;
&lt;br /&gt;
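The idea of calling genotypes with the individual allele frequency as prior can be sketched in Python as follows. This is a minimal illustration and not PCAngsd's own code: it assumes a Hardy-Weinberg prior built from the individual allele frequency, multiplies it with the genotype likelihoods, and calls the most probable genotype only if its posterior reaches the threshold. The function name and the use of -9 as the missing code are assumptions.&lt;br /&gt;
```python
def call_genotype(likes, pi, threshold):
    """Call one genotype from likelihoods and an individual allele frequency.

    likes: likelihoods for 0, 1 and 2 copies of the minor allele.
    pi: individual allele frequency used for the Hardy-Weinberg prior.
    Returns 0, 1 or 2, or -9 (assumed missing code) when no posterior
    probability reaches the threshold.
    """
    prior = ((1.0 - pi) ** 2, 2.0 * pi * (1.0 - pi), pi ** 2)
    joint = [like * p for like, p in zip(likes, prior)]
    total = sum(joint)
    if total == 0.0:
        return -9  # no information at this site
    post = [j / total for j in joint]
    best = max(range(3), key=lambda g: post[g])
    return best if post[best] >= threshold else -9


# Same likelihoods, two thresholds: only the lenient one yields a call.
print(call_genotype((0.2, 0.6, 0.2), pi=0.1, threshold=0.5))
print(call_genotype((0.2, 0.6, 0.2), pi=0.1, threshold=0.9))
```
In the example, the low individual allele frequency shifts the call away from the heterozygote that the likelihoods alone would favour, which is the effect of using the frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;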
==Admixture==&lt;br /&gt;
Individual admixture proportions and ancestral allele frequencies can be estimated assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations. Estimates admixture proportions and ancestral allele frequencies.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Override the number of ancestry components (K) to use, instead of using K=e-1.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 200)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5) &lt;br /&gt;
; -admix_alpha [float]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). (Default: 0)&lt;br /&gt;
; -admix_auto [float]&lt;br /&gt;
Enable automatic search for optimal alpha using likelihood measure, by giving soft upper search bound of alpha.&lt;br /&gt;
; -admix_seed [int]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations.&lt;br /&gt;
&lt;br /&gt;
==Tree==&lt;br /&gt;
; -tree&lt;br /&gt;
Construct a neighbour-joining tree of samples from the covariance matrix estimated based on individual allele frequencies.&lt;br /&gt;
; -tree_samples&lt;br /&gt;
Provide a list of sample names of all individuals to construct a beautiful tree.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
Our methods for inferring population structure have been published in GENETICS:&lt;br /&gt;
&lt;br /&gt;
[http://www.genetics.org/content/early/2018/08/21/genetics.118.301336 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our method for testing for HWE in structured populations has been published in Molecular Ecology Resources:&lt;br /&gt;
&lt;br /&gt;
[https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019 Testing for Hardy‐Weinberg Equilibrium in Structured Populations using Genotype or Low‐Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=1456</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=1456"/>
		<updated>2021-01-20T14:13:28Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
PCAngsd is a program that estimates the covariance matrix and individual allele frequencies for low-depth next-generation sequencing (NGS) data in structured/heterogeneous populations using principal component analysis (PCA) to perform multiple population genetic analyses using genotype likelihoods. Since version 0.98, PCAngsd was re-written to be based on Cython for computational bottlenecks and parallelization.&lt;br /&gt;
&lt;br /&gt;
The main method was published in 2018 and can be found here: [https://www.genetics.org/content/210/2/719]&lt;br /&gt;
&lt;br /&gt;
The HWE test was published in 2019 and can be found here: [https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019]&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
Framework for analyzing low-depth next-generation sequencing (NGS) data in heterogeneous/structured populations using principal component analysis (PCA). Population structure is inferred by estimating individual allele frequencies in an iterative approach using a truncated SVD model. The covariance matrix is estimated using the estimated individual allele frequencies as prior information for the unobserved genotypes in low-depth NGS data.&lt;br /&gt;
&lt;br /&gt;
The estimated individual allele frequencies can further be used to account for population structure in other probabilistic methods. PCAngsd can perform the following analyses:&lt;br /&gt;
*Covariance matrix&lt;br /&gt;
*Admixture estimations&lt;br /&gt;
*Inbreeding coefficients (both per-individual and per-site)&lt;br /&gt;
*HWE test&lt;br /&gt;
*Genome-wide selection scan&lt;br /&gt;
*Genotype calling&lt;br /&gt;
*Estimate NJ tree of samples&lt;br /&gt;
&lt;br /&gt;
Older versions of PCAngsd can be found here [https://github.com/Rosemeis/pcangsd/releases/].&lt;br /&gt;
&lt;br /&gt;
=Download and Installation=&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended. Installation has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
Get PCAngsd and build&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
python setup.py build_ext --inplace&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Install dependencies:&lt;br /&gt;
&lt;br /&gt;
The required Python packages can be installed with pip using the 'requirements.txt' file included in the 'pcangsd' folder.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r requirements.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Quick start=&lt;br /&gt;
&lt;br /&gt;
PCAngsd is run through the main caller file pcangsd.py. The commands below list all available options and run PCAngsd on Beagle or PLINK input:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Genotype likelihoods using 64 threads&lt;br /&gt;
python pcangsd.py -beagle input.beagle.gz -out output -threads 64&lt;br /&gt;
&lt;br /&gt;
# PLINK files (using file-prefix, *.bed, *.bim, *.fam)&lt;br /&gt;
python pcangsd.py -plink input.plink -out output -threads 64&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd accepts either genotype likelihoods in Beagle format or PLINK genotype files. Beagle files can be generated from BAM files using [http://popgen.dk/angsd ANGSD]. For inference of population structure in genotype data with non-random missingness, we recommend our [http://www.popgen.dk/software/index.php/EMU EMU] software, which performs accelerated EM-PCA but offers fewer functionalities than PCAngsd (#soon).&lt;br /&gt;
&lt;br /&gt;
PCAngsd outputs most files in binary NumPy format (.npy), with a few exceptions such as the covariance matrix, which is written as text. To read the files in Python:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
C = np.genfromtxt(&amp;quot;output.cov&amp;quot;) # Reads in estimated covariance matrix (text)&lt;br /&gt;
D = np.load(&amp;quot;output.selection.npy&amp;quot;) # Reads PC based selection statistics&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
R can also read Numpy matrices using the &amp;quot;RcppCNPy&amp;quot; R library:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
library(RcppCNPy)&lt;br /&gt;
C &amp;lt;- as.matrix(read.table(&amp;quot;output.cov&amp;quot;)) # Reads in estimated covariance matrix&lt;br /&gt;
D &amp;lt;- npyLoad(&amp;quot;output.selection.npy&amp;quot;) # Reads PC based selection statistics&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
An example of generating genotype likelihoods in [http://popgen.dk/angsd ANGSD] and outputting them in the required Beagle text format:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 2 -out input -nThreads 4 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Tutorial=&lt;br /&gt;
&lt;br /&gt;
Please refer to the tutorial page: [http://www.popgen.dk/software/index.php/PCAngsdTutorial]&lt;br /&gt;
&lt;br /&gt;
=Options=&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==General usage==&lt;br /&gt;
; -beagle [Beagle file]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -filter [Text file]&lt;br /&gt;
Input file of 1's and 0's indicating whether to keep each individual or not.&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using ONLY their prefix (.bed, .bim, .fam).&lt;br /&gt;
; -plink_error [float]&lt;br /&gt;
Incorporate errors into genotypes by specifying rate as argument.&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -maf_iter [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies (Default: 200).&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation (Default: 1e-4).&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies (Default: 100).&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies (Default: 1e-5).&lt;br /&gt;
; -hwe [.lrt.npy file]&lt;br /&gt;
Input file of LRT binary file from previous PCAngsd run to filter based on HWE.&lt;br /&gt;
; -hwe_tole [float]&lt;br /&gt;
Threshold for HWE filtering of sites. &lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies (Default: Automatically tested using MAP test).&lt;br /&gt;
; -pi [.pi.npy file]&lt;br /&gt;
Load previous estimation of individual allele frequencies to skip covariance estimation.&lt;br /&gt;
; -maf_save&lt;br /&gt;
Choose to save estimated population allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -pi_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy). Can be used with the '-pi' command.&lt;br /&gt;
; -dosage_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -post_save&lt;br /&gt;
Choose to save the posterior genotype probabilities. Beagle format (.beagle).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the kept sites after filtering which is useful for downstream analysis. Outputs a file of 1's and 0's for keeping a site or not, respectively.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use (Default: 1).&lt;br /&gt;
; -out [output prefix]&lt;br /&gt;
Fileprefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
Perform PC-based genome-wide selection scans using posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome-wide selection scan along all significant PCs. Outputs the selection statistics, which must be converted to p-values by the user. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
&lt;br /&gt;
; -pcadapt&lt;br /&gt;
Using an extended model of [https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.12592 pcadapt]. Performs a genome-wide selection scan across all significant PCs. Outputs the z-scores, which must be converted to test statistics with the provided script 'pcangsd/scripts/pcadapt.R'. The test statistics are χ²-distributed with K degrees of freedom.&lt;br /&gt;
&lt;br /&gt;
; -snp_weights&lt;br /&gt;
Output the SNP weights of the significant K eigenvectors.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
Estimate per-site inbreeding coefficients accounting for population structure and perform a likelihood ratio test for detecting sites deviating from HWE [https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019].&lt;br /&gt;
&lt;br /&gt;
; -inbreedSamples&lt;br /&gt;
Estimate per-individual inbreeding coefficients accounting for population structure which is based on an extension of [http://genome.cshlp.org/content/23/11/1852.full ngsF] for structured populations. &lt;br /&gt;
&lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for inbreeding EM algorithm. (Default: 200)&lt;br /&gt;
&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for inbreeding EM algorithm in estimating inbreeding coefficients. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities by incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '-inbreedSamples' must also be called for using this option.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and ancestral allele frequencies can be estimated assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations. Estimates admixture proportions and ancestral allele frequencies.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Override the number of ancestry components (K) to use, instead of using K=e-1.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 200)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5) &lt;br /&gt;
; -admix_alpha [float]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). (Default: 0)&lt;br /&gt;
; -admix_auto [float]&lt;br /&gt;
Enable automatic search for optimal alpha using likelihood measure, by giving soft upper search bound of alpha.&lt;br /&gt;
; -admix_seed [int]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
Our methods for inferring population structure have been published in GENETICS:&lt;br /&gt;
&lt;br /&gt;
[http://www.genetics.org/content/early/2018/08/21/genetics.118.301336 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our method for testing for HWE in structured populations has been published in Molecular Ecology Resources:&lt;br /&gt;
&lt;br /&gt;
[https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019 Testing for Hardy‐Weinberg Equilibrium in Structured Populations using Genotype or Low‐Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=HaploNet&amp;diff=1455</id>
		<title>HaploNet</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=HaploNet&amp;diff=1455"/>
		<updated>2021-01-20T09:24:13Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: Created page with &amp;quot;HaploNet is a program that performs dimensionality reduction and clustering of haplotypes using neural networks. We utilize a variational autoencoder framework using a Gaussia...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;HaploNet is a program that performs dimensionality reduction and clustering of haplotypes using neural networks. We utilize a variational autoencoder framework with a Gaussian mixture prior to model haplotypes in windows along the genome. The learnt encodings and clusterings can be combined to infer population structure using PCA and to estimate ancestry proportions using haplotype information.&lt;br /&gt;
&lt;br /&gt;
The preprint for the method can be found here: [https://www.biorxiv.org/content/10.1101/2020.12.28.424587v1]&lt;br /&gt;
&lt;br /&gt;
Get HaploNet and build&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/HaploNet.git&lt;br /&gt;
cd HaploNet&lt;br /&gt;
python setup.py build_ext --inplace&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
HaploNet is based on the PyTorch library (v.1.7) and therefore has GPU support which we also recommend for faster training of the windows. PyTorch can be installed using either 'conda' or 'pip' [https://pytorch.org/get-started/locally/]. OpenMP is assumed to be installed.&lt;br /&gt;
HaploNet has the following requirements:&lt;br /&gt;
*Python (&amp;gt;3.6)&lt;br /&gt;
*PyTorch&lt;br /&gt;
*Cython&lt;br /&gt;
*NumPy&lt;br /&gt;
*scikit-allel&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=EMU&amp;diff=1440</id>
		<title>EMU</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=EMU&amp;diff=1440"/>
		<updated>2020-04-29T10:56:42Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about EMU (EM-PCA for Ultra-low Coverage Sequencing Data). EMU infers population structure in the presence of missingness and works for haploid, pseudo-haploid and diploid genotype datasets. Due to EMU's iterative nature, it can infer population structure even in ultra-low coverage sequencing datasets with very high missingness rates, and it can handle non-random missingness patterns where other existing methods fail. We use a procedure of low-rank approximations based on randomized PCA to iteratively update population structure in a very efficient manner.&lt;br /&gt;
&lt;br /&gt;
EMU is written in Python and Cython and is freely available on Github. We have also implemented a very memory-efficient variant of EMU (EMU-mem) for large-scale datasets that uses the 2-bit data structures of PLINK binary file formats.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/emu&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/emu.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See github for more information regarding installation.&lt;br /&gt;
Server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in EMU&lt;br /&gt;
python emu.py -h&lt;br /&gt;
&lt;br /&gt;
# Infer population structure using 2 eigenvectors and 64 threads from binary PLINK files (.bed, .bim, .fam)&lt;br /&gt;
python emu.py -plink plink_prefix -e 2 -t 64 -accel -o plink_emu&lt;br /&gt;
&lt;br /&gt;
# Or directly from NumPy array input&lt;br /&gt;
python emu.py -npy matrix.npy -e 2 -t 64 -accel -o npy_emu&lt;br /&gt;
&lt;br /&gt;
# Use EMU-mem variant&lt;br /&gt;
python emu_mem.py -plink plink_prefix -e 2 -t 64 -accel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
EMU uses either binary PLINK files (RECOMMENDED!) or saved NumPy genotype matrices in 8-bit format (numpy.int8) as input. EMU-mem only accepts PLINK files as input due to the 2-bit data structures. If the NumPy format is preferred, you can use the conversion script provided on GitHub (convertMat.py).&lt;br /&gt;
&lt;br /&gt;
=Using EMU=&lt;br /&gt;
We highly recommend using EM acceleration at all times (default); it can be turned off using &amp;quot;-no_accel&amp;quot;. You can save factor matrices (-indf_save) from a run to use as the starting point in a new run (-w, -s, -u). For convenience, we have also implemented the PC-based selection scan of Galinsky et al. 2016 (-selection). MAF filtering is possible, but it is recommended (and ASSUMED!) that it is done beforehand.&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix (.bed, .bim, .fam).&lt;br /&gt;
; -npy [Numpy.int8 matrix format]&lt;br /&gt;
Path to NumPy matrix (.npy).&lt;br /&gt;
; -e [int]&lt;br /&gt;
Number of eigenvectors to use in optimization.&lt;br /&gt;
; -k [int]&lt;br /&gt;
Number of eigenvectors to output if user wants different than -e.&lt;br /&gt;
; -m [int]&lt;br /&gt;
Maximum number of iterations (Default: 100).&lt;br /&gt;
; -m_tole [float]&lt;br /&gt;
Tolerance for convergence of iterative procedure (Default: 5e-7).&lt;br /&gt;
; -t [int]&lt;br /&gt;
Number of threads to use (Default: 1).&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Minimum minor allele frequency threshold (Default: 0.00).&lt;br /&gt;
; -selection&lt;br /&gt;
Perform genome-wide PC-based selection scan (Galinsky et al. 2016).&lt;br /&gt;
; -maf_save&lt;br /&gt;
Save the estimated minor allele frequencies.&lt;br /&gt;
; -bool_save&lt;br /&gt;
Save boolean vector of filtered sites based on MAF.&lt;br /&gt;
; -indf_save&lt;br /&gt;
Save estimated factor matrices (W, S, U).&lt;br /&gt;
; -index [file]&lt;br /&gt;
Provide index of individuals for guiding initialization (np.int8 format).&lt;br /&gt;
; -svd [string]&lt;br /&gt;
Select which low-rank SVD method to use, halko/arpack (Default: 'halko').&lt;br /&gt;
; -svd_power [int]&lt;br /&gt;
Number of power iterations to use in low-rank SVD (Default: 3).&lt;br /&gt;
; -w [file]&lt;br /&gt;
Provide starting point, left singular matrix (.w.npy).&lt;br /&gt;
; -s [file]&lt;br /&gt;
Provide starting point, singular values (.s.npy).&lt;br /&gt;
; -u [file]&lt;br /&gt;
Provide starting point, right singular matrix (.u.npy).&lt;br /&gt;
; -no_accel&lt;br /&gt;
Turn off EM acceleration.&lt;br /&gt;
; -o [string]&lt;br /&gt;
Prefix for all output files (Default: 'emu').&lt;br /&gt;
; -cost&lt;br /&gt;
Output the Frobenius norm at each iteration (DEBUG).&lt;br /&gt;
; -cost_step&lt;br /&gt;
Use acceleration based on the Frobenius norm (DEBUG).&lt;br /&gt;
&lt;br /&gt;
==Options in EMU-mem==&lt;br /&gt;
The -maf, -bool_save, -svd, -cost and -cost_step options are not available in EMU-mem. MAF filtering must be performed beforehand, which is easily done in PLINK (--maf 0.05).&lt;br /&gt;
&lt;br /&gt;
=Run example=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download data&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Run&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python emu.py -plink data/humanOrigins_7worldPops -e 4 -t 4 -o plink_emu&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Plot&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
vec &amp;lt;- read.table(&amp;quot;plink_emu.eigenvecs&amp;quot;) # Reads in eigenvectors&lt;br /&gt;
fam &amp;lt;- read.table(&amp;quot;data/humanOrigins_7worldPops.fam&amp;quot;,header=FALSE)&lt;br /&gt;
pop &amp;lt;- factor(fam[,1]) # Population labels as a factor (R 4.0 and later reads strings as characters)&lt;br /&gt;
plot(vec[,1:2],col=pop,xlab=&amp;quot;PC1&amp;quot;,ylab=&amp;quot;PC2&amp;quot;)&lt;br /&gt;
legend(&amp;quot;center&amp;quot;,fill=seq_along(levels(pop)),legend=levels(pop))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
TBA&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=EMU&amp;diff=1430</id>
		<title>EMU</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=EMU&amp;diff=1430"/>
		<updated>2020-04-17T11:25:40Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: Created page with &amp;quot;This page contains information about EMU (EM-PCA for Ultra-low Coverage Sequencing Data). EMU infers population structure in the presence of missingness and works for both hap...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about EMU (EM-PCA for Ultra-low Coverage Sequencing Data). EMU infers population structure in the presence of missingness and works for haploid, pseudo-haploid and diploid genotype datasets. Owing to its iterative nature, EMU can infer population structure even in ultra-low coverage sequencing datasets with very high missingness rates, and it can handle non-random missingness patterns where other existing methods fail. We use a procedure of low-rank approximations based on randomized PCA to iteratively update the population structure in a very efficient manner.&lt;br /&gt;
&lt;br /&gt;
EMU is written in Python and Cython and is freely available on GitHub. We have also implemented a very memory-efficient variant of EMU (EMU-mem) for large-scale datasets that uses the 2-bit data structures of PLINK binary file formats.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/emu&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/emu.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the GitHub repository for more information regarding installation.&lt;br /&gt;
Server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in EMU&lt;br /&gt;
python emu.py -h&lt;br /&gt;
&lt;br /&gt;
# Infer population structure using 2 eigenvectors and 64 threads from binary PLINK files (.bed, .bim, .fam)&lt;br /&gt;
python emu.py -plink plink_prefix -e 2 -t 64 -accel -o plink_emu&lt;br /&gt;
&lt;br /&gt;
# Or directly from NumPy array input&lt;br /&gt;
python emu.py -npy matrix.npy -e 2 -t 64 -accel -o npy_emu&lt;br /&gt;
&lt;br /&gt;
# Use EMU-mem variant&lt;br /&gt;
python emu_mem.py -plink plink_prefix -e 2 -t 64 -accel&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
EMU uses either binary PLINK files (RECOMMENDED!) or saved NumPy genotype matrices in 8-bit format (numpy.int8) as input. EMU-mem only accepts PLINK files as input because it relies on their 2-bit data structures. If you prefer the NumPy format, you can use the conversion script provided on GitHub (convertMat.py).&lt;br /&gt;
&lt;br /&gt;
=Using EMU=&lt;br /&gt;
We highly recommend using EM acceleration at all times (-accel). You can save the factor matrices (-indf_save) from a run to use as the starting point in a new run (-w, -s, -u). For convenience, we have also implemented the PC-based selection scan of Galinsky et al. 2016 (-selection). MAF filtering is possible, but it is recommended (and ASSUMED!) that you filter beforehand.&lt;br /&gt;
&lt;br /&gt;
==Options==&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix (.bed, .bim, .fam).&lt;br /&gt;
; -npy [Numpy.int8 matrix format]&lt;br /&gt;
Path to NumPy matrix (.npy).&lt;br /&gt;
; -e [int]&lt;br /&gt;
Number of eigenvectors to use in optimization.&lt;br /&gt;
; -k [int]&lt;br /&gt;
Number of eigenvectors to output, if different from -e.&lt;br /&gt;
; -m [int]&lt;br /&gt;
Maximum number of iterations (Default: 100).&lt;br /&gt;
; -m_tole [float]&lt;br /&gt;
Tolerance for convergence of the iterative procedure (Default: 5e-7).&lt;br /&gt;
; -t [int]&lt;br /&gt;
Number of threads to use (Default: 1).&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Minimum minor allele frequency threshold (Default: 0.00).&lt;br /&gt;
; -selection&lt;br /&gt;
Perform genome-wide PC-based selection scan (Galinsky et al. 2016).&lt;br /&gt;
; -maf_save&lt;br /&gt;
Save the estimated minor allele frequencies.&lt;br /&gt;
; -bool_save&lt;br /&gt;
Save boolean vector of filtered sites based on MAF.&lt;br /&gt;
; -indf_save&lt;br /&gt;
Save estimated factor matrices (W, S, U).&lt;br /&gt;
; -index [file]&lt;br /&gt;
Provide index of individuals for guiding initialization (np.int8 format).&lt;br /&gt;
; -svd [string]&lt;br /&gt;
Select which low-rank SVD method to use, halko/arpack (Default: 'halko').&lt;br /&gt;
; -svd_power [int]&lt;br /&gt;
Number of power iterations to use in low-rank SVD (Default: 3).&lt;br /&gt;
; -w [file]&lt;br /&gt;
Provide starting point, left singular matrix (.w.npy).&lt;br /&gt;
; -s [file]&lt;br /&gt;
Provide starting point, singular values (.s.npy).&lt;br /&gt;
; -u [file]&lt;br /&gt;
Provide starting point, right singular matrix (.u.npy).&lt;br /&gt;
; -accel&lt;br /&gt;
Use EM acceleration (Highly recommended!).&lt;br /&gt;
; -o [string]&lt;br /&gt;
Prefix for all output files (Default: 'emu').&lt;br /&gt;
; -cost&lt;br /&gt;
Output the Frobenius norm at each iteration (DEBUG).&lt;br /&gt;
; -cost_step&lt;br /&gt;
Use acceleration based on the Frobenius norm (DEBUG).&lt;br /&gt;
&lt;br /&gt;
==Options in EMU-mem==&lt;br /&gt;
The -maf, -bool_save, -svd, -cost and -cost_step options are not available in EMU-mem. MAF filtering must be performed beforehand, which is easily done in PLINK (--maf 0.05).&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
TBA&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsdv2&amp;diff=1368</id>
		<title>PCAngsdv2</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsdv2&amp;diff=1368"/>
		<updated>2019-08-16T11:34:33Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: /* Citation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd is a program that estimates the covariance matrix for low depth next-generation sequencing (NGS) data in structured/heterogeneous populations using principal component analysis (PCA) to perform multiple population genetic analyses using an iterative procedure based on genotype likelihoods. &lt;br /&gt;
&lt;br /&gt;
Since version 0.98, PCAngsd was re-written to be based on Cython for computational bottlenecks and parallelization and is now compatible with any newer Python version.&lt;br /&gt;
&lt;br /&gt;
The method was published in 2018 and can be found here: [https://www.genetics.org/content/210/2/719]&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
Based on population structure inference, PCAngsd is able to detect the number of significant principal components, which is then used to estimate individual allele frequencies using genotype dosages in an SVD model. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate the covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components.&lt;br /&gt;
The estimated individual allele frequencies and principal components can be used as prior knowledge in other probabilistic methods based on the same Bayesian principle. PCAngsd can perform the following analyses:&lt;br /&gt;
*Covariance matrix&lt;br /&gt;
*Genotype calling&lt;br /&gt;
*Admixture&lt;br /&gt;
*Inbreeding coefficients (both per-individual and per-site)&lt;br /&gt;
*HWE test&lt;br /&gt;
*Genome selection scan&lt;br /&gt;
*Kinship matrix&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The older version, based on the Numba library (only working with Python 2.7), is still available as version 0.973 and can be found here [https://github.com/Rosemeis/pcangsd/releases/tag/0.973].&lt;br /&gt;
&lt;br /&gt;
=Download and Installation=&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended. &lt;br /&gt;
Installation has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
It is assumed that OpenMP is installed [https://www.openmp.org/].&lt;br /&gt;
&lt;br /&gt;
1. Log in to your server using ssh in a terminal window.&lt;br /&gt;
&lt;br /&gt;
2. Create the directory where you will install your software and enter it, such as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir ~/Software&lt;br /&gt;
cd ~/Software&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Download the source code:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. Configure, Compile and Install:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
python setup.py build_ext --inplace&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Install dependencies:&lt;br /&gt;
&lt;br /&gt;
The required Python packages are easily installed using the pip command and the requirements.txt file included in the pcangsd folder.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;pip install --user -r requirements.txt&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Quick start=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
# Download the input beagle file with genotype likelihoods&lt;br /&gt;
wget popgen.dk/software/download/NGSadmix/data/input.gz &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle input.gz -o test1 -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle input.gz -admix -o test2 -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle input.gz -inbreed 2 -o test3 -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle input.gz -selection -o test4 -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Detailed Examples and Tutorial==&lt;br /&gt;
&lt;br /&gt;
Please refer to the tutorial page [http://www.popgen.dk/software/index.php/PCAngsdTutorial]&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
PCAngsd takes genotype likelihoods in  [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format as input. Since version 0.9, binary PLINK files are also accepted: the genotypes are automatically converted into a genotype likelihood matrix, where the user can incorporate an error model.&lt;br /&gt;
&lt;br /&gt;
[http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
&lt;br /&gt;
Since version 0.98, PCAngsd writes its output in binary NumPy format (.npy), except for the covariance matrix, which is saved as plain text (.cov).&lt;br /&gt;
&lt;br /&gt;
To read the files in Python:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
C = np.genfromtxt(&amp;quot;output.cov&amp;quot;) # Reads in estimated covariance matrix&lt;br /&gt;
S = np.load(&amp;quot;output.selection.npy&amp;quot;) # Reads results from selection scan&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
R can also read NumPy matrices using the &amp;quot;RcppCNPy&amp;quot; library:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
library(RcppCNPy)&lt;br /&gt;
C &amp;lt;- as.matrix(read.table(&amp;quot;output.cov&amp;quot;)) # Reads in estimated covariance matrix&lt;br /&gt;
S &amp;lt;- npyLoad(&amp;quot;output.selection.npy&amp;quot;) # Reads results from selection scan&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix (.bed, .bim, .fam).&lt;br /&gt;
; -plink_error [float]&lt;br /&gt;
Incorporate error model for PLINK genotypes.&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies (Default: 100).&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies (Default: 1e-5).&lt;br /&gt;
; -maf_iter [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies (Default: 100).&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation (Default: 1e-4).&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies (Default: Automatically tested using MAP test).&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -dosage_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
; -post_save&lt;br /&gt;
Choose to save the posterior genotype probabilities. Beagle format (.beagle).&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use (Default: 1).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
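&lt;br /&gt;
As a rough illustration of threshold-based calling, the sketch below picks the genotype with the highest posterior probability and leaves it uncalled when that probability falls below the threshold. The function name, the missing code (-9) and the exact comparison are assumptions made for this example and are not taken from PCAngsd itself.&lt;br /&gt;

```python
# Hypothetical sketch of threshold-based genotype calling; not PCAngsd's own code.
MISSING = -9  # assumed placeholder for an uncalled genotype


def call_genotype(posteriors, threshold):
    '''Return the genotype (0, 1 or 2) with the highest posterior probability,
    or MISSING when that probability is below the threshold.'''
    best = max(range(3), key=lambda g: posteriors[g])
    if posteriors[best] >= threshold:
        return best
    return MISSING


# Posterior genotype probabilities for one individual at two sites (made-up numbers)
print(call_genotype([0.02, 0.95, 0.03], 0.9))
print(call_genotype([0.40, 0.35, 0.25], 0.9))
```

A stricter threshold leaves more genotypes uncalled but makes the remaining calls more reliable.&lt;br /&gt;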
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations. Individual ancestry proportions are saved (Binary). Numpy format (.npy).&lt;br /&gt;
; -admix_alpha [float-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha's in a single run (Default: 0).&lt;br /&gt;
; -admix_auto [float]&lt;br /&gt;
Enable automatic search for optimal alpha using likelihood measure, by giving soft upper search bound of alpha.&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K. It is recommended to adjust '''-e''' instead of '''-admix_K'''.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 200)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 10)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 1 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
Simple estimator computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
A maximum likelihood estimator also computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
&lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -hwe [LRT filename]&lt;br /&gt;
; -hwe_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which must be converted to p-values by the user. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
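&lt;br /&gt;
The conversion of selection statistics to p-values can be sketched as follows, using only the Python standard library. For a χ² statistic with 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)). The function names and the toy numbers are made up for this example; if SciPy is installed, scipy.stats.chi2.sf(S, 1) applied to the loaded array should give the same values.&lt;br /&gt;

```python
import math


def chi2_1df_pvalue(stat):
    '''Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    If X = Z**2 with Z standard normal, the tail probability is erfc(sqrt(x / 2)).
    '''
    return math.erfc(math.sqrt(stat / 2.0))


def selection_pvalues(stats):
    '''Convert a matrix of statistics (rows = sites, columns = PCs) to p-values.'''
    return [[chi2_1df_pvalue(s) for s in row] for row in stats]


# Toy statistics for two sites along two PCs (made-up numbers)
toy = [[0.5, 3.84], [12.0, 0.1]]
for row in selection_pvalues(toy):
    print(row)
```

Large statistics map to small p-values, so sites under putative selection are found in the lower tail of the resulting p-value distribution.&lt;br /&gt;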
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
Remove related individuals based on the kinship matrix from a previous run:&lt;br /&gt;
; -relate [Kinship filename]&lt;br /&gt;
; -relate_tole [float]&lt;br /&gt;
Threshold for kinship coefficients for removing individuals (Default: 0.0625).&lt;br /&gt;
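&lt;br /&gt;
How PCAngsd decides which individual of a related pair to drop is not described on this page. The sketch below shows one possible approach under stated assumptions: a greedy procedure that repeatedly removes the individual with the most relatives above the threshold. The function name, the greedy strategy and the strict comparison are all hypothetical; only the default threshold (0.0625) is taken from the option above.&lt;br /&gt;

```python
# Hypothetical greedy pruning on a kinship matrix; not PCAngsd's own procedure.
def prune_related(kinship, threshold=0.0625):
    '''Return the sorted indices of the individuals kept after pruning.'''
    keep = set(range(len(kinship)))
    while keep:
        # Number of remaining relatives above the threshold for each individual
        counts = {
            i: sum(1 for j in keep if j != i and kinship[i][j] > threshold)
            for i in keep
        }
        # Most related individual; ties are broken by the smallest index
        worst = max(counts, key=lambda i: (counts[i], -i))
        if counts[worst] == 0:
            break
        keep.remove(worst)
    return sorted(keep)


# Toy kinship matrix: individual 1 is related to both 0 and 2 (made-up numbers)
toy = [
    [0.50, 0.25, 0.00],
    [0.25, 0.50, 0.10],
    [0.00, 0.10, 0.50],
]
print(prune_related(toy))
```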
&lt;br /&gt;
=Citation=&lt;br /&gt;
Our methods for inferring population structure have been published in GENETICS:&lt;br /&gt;
&lt;br /&gt;
[http://www.genetics.org/content/early/2018/08/21/genetics.118.301336 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our method for testing for HWE in structured populations has been published in Molecular Ecology Resources:&lt;br /&gt;
&lt;br /&gt;
[https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019 Testing for Hardy‐Weinberg Equilibrium in Structured Populations using Genotype or Low‐Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=1240</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=1240"/>
		<updated>2019-07-26T08:07:18Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about PCAngsd, which estimates the covariance matrix for low-depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in structured/heterogeneous populations. Based on iterative population structure inference, PCAngsd estimates individual allele frequencies. These individual allele frequencies can be used in various analyses to account for population structure, so that PCAngsd can perform PCA (estimate the covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-site and per-individual) and perform a genomic selection scan using principal components. The entire program is written in Python and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
Since version ''0.98'', PCAngsd has been re-written in Cython for computational bottlenecks and parallelization.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See the GitHub repository for more information regarding installation.&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended. Installation has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in  [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute and output the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix (.bed, .bim, .fam).&lt;br /&gt;
; -plink_error [float]&lt;br /&gt;
Incorporate error model for PLINK genotypes.&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies (Default: 100).&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies (Default: 1e-5).&lt;br /&gt;
; -maf_iter [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies (Default: 100).&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation (Default: 1e-4).&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies (Default: Automatically tested using MAP test).&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -dosage_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
; -post_save&lt;br /&gt;
Choose to save the posterior genotype probabilities. Beagle format (.beagle).&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use (Default: 1).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold (Binary). Numpy format (.npy).&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations. Individual ancestry proportions are saved (Binary). Numpy format (.npy).&lt;br /&gt;
; -admix_alpha [float-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha's in a single run (Default: 0).&lt;br /&gt;
; -admix_auto [float]&lt;br /&gt;
Enable automatic search for optimal alpha using likelihood measure, by giving soft upper search bound of alpha.&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K. It is recommended to adjust '''-e''' instead of '''-admix_K'''.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 200)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 10)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 1 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
Simple estimator computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
A maximum likelihood estimator also computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -hwe [LRT filename]&lt;br /&gt;
; -hwe_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
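&lt;br /&gt;
A minimal sketch of the conversion to p-values, using only the Python standard library. It relies on the identity that, for a χ² variable with 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x/2)). The function name and the example statistics are hypothetical and are not part of PCAngsd.&lt;br /&gt;

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical selection statistics: one row per site, one column per tested PC.
selection_stats = [
    [0.0, 3.841],
    [10.83, 1.0],
]

# Convert every statistic to a p-value, keeping the site-by-PC layout.
p_values = [[chi2_sf_1df(s) for s in row] for row in selection_stats]
```

Each entry of p_values corresponds to one site and one tested PC; correction for multiple testing is left to the user.&lt;br /&gt;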
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
Remove related individuals based on the kinship matrix of a previous run:&lt;br /&gt;
; -relate [Kinship filename]&lt;br /&gt;
; -relate_tole [float]&lt;br /&gt;
Threshold for kinship coefficients for removing individuals (Default: 0.0625).&lt;br /&gt;
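&lt;br /&gt;
To illustrate what thresholding a kinship matrix can look like, here is a minimal sketch of a greedy pruning strategy. This is an assumption for illustration only and is not necessarily the procedure PCAngsd uses for '''-relate'''; the function name and the example matrix are hypothetical.&lt;br /&gt;

```python
def prune_related(kinship, threshold=0.0625):
    """Greedily drop individuals until no remaining pair exceeds the threshold."""
    keep = set(range(len(kinship)))
    while True:
        # For each kept individual, count kept partners above the threshold.
        counts = {
            i: sum(1 for j in keep if j != i and kinship[i][j] > threshold)
            for i in sorted(keep)
        }
        # Pick the individual involved in the most related pairs.
        worst = max(counts, key=counts.get, default=None)
        if worst is None or counts[worst] == 0:
            return sorted(keep)
        keep.remove(worst)


# Hypothetical symmetric kinship matrix for four individuals.
example_kinship = [
    [0.5, 0.25, 0.0, 0.0],
    [0.25, 0.5, 0.125, 0.0],
    [0.0, 0.125, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]

kept = prune_related(example_kinship, threshold=0.0625)
```

In this example individual 1 is related to both 0 and 2, so removing it alone leaves individuals 0, 2 and 3 with no pair above the threshold.&lt;br /&gt;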
&lt;br /&gt;
=Citation=&lt;br /&gt;
Our methods for inferring population structure have been published in GENETICS:&lt;br /&gt;
[http://www.genetics.org/content/early/2018/08/21/genetics.118.301336 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;br /&gt;
&lt;br /&gt;
Our method for estimating per-site inbreeding coefficients and testing for HWE has been published in Molecular Ecology Resources:&lt;br /&gt;
[https://onlinelibrary.wiley.com/doi/abs/10.1111/1755-0998.13019 Testing for Hardy‐Weinberg Equilibrium in Structured Populations using Genotype or Low‐Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=859</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=859"/>
		<updated>2018-08-23T14:51:41Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.95&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 2 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
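&lt;br /&gt;
A minimal sketch of threshold-based genotype calling for a single individual at a single site. It assumes a genotype prior derived from the individual allele frequency, with an optional inbreeding coefficient F, and that a genotype counts copies of the allele whose frequency is given. The function name, the argument layout and the use of None for an uncalled genotype are hypothetical and do not describe PCAngsd's output format.&lt;br /&gt;

```python
def call_genotype(likes, freq, threshold, inbreeding=0.0):
    """Return the called genotype (0, 1 or 2), or None if below the threshold."""
    f = inbreeding
    # Genotype prior from the individual allele frequency; F = 0 gives HWE proportions.
    prior = [
        (1 - freq) ** 2 + freq * (1 - freq) * f,
        2 * freq * (1 - freq) * (1 - f),
        freq ** 2 + freq * (1 - freq) * f,
    ]
    # Posterior is proportional to genotype likelihood times prior.
    weighted = [like * p for like, p in zip(likes, prior)]
    total = sum(weighted)
    posterior = [w / total for w in weighted]
    # Call the most probable genotype only if it reaches the threshold.
    best = max(range(3), key=posterior.__getitem__)
    return best if posterior[best] >= threshold else None


# Hypothetical genotype likelihoods for genotypes 0, 1 and 2.
confident_call = call_genotype([0.1, 0.8, 0.1], freq=0.5, threshold=0.8)
uncalled = call_genotype([0.1, 0.8, 0.1], freq=0.5, threshold=0.9)
inbred_call = call_genotype([0.7, 0.2, 0.1], freq=0.5, threshold=0.8, inbreeding=1.0)
```

A higher threshold leaves more genotypes uncalled, and a positive F shifts prior mass from the heterozygote to the two homozygotes.&lt;br /&gt;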
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha's in a single run. Fully compatible with -admix_seed. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K. It is recommended to adjust '''-e''' instead of '''-admix_K'''.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. Outputs the selection statistics, which the user must convert to p-values. The selection statistics are χ²-distributed with '''-e''' degrees of freedom (number of significant eigenvectors).&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
Our methods for inferring population structure have been published in GENETICS:&lt;br /&gt;
&lt;br /&gt;
[http://www.genetics.org/content/early/2018/08/21/genetics.118.301336 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=858</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=858"/>
		<updated>2018-08-23T14:51:07Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.95&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 2 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha's in a single run. Fully compatible with -admix_seed. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K. It is recommended to adjust '''-e''' instead of '''-admix_K'''.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. Outputs the selection statistics, which the user must convert to p-values. The selection statistics are χ²-distributed with '''-e''' degrees of freedom (number of significant eigenvectors).&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
Our methods for inferring population structure have been published in GENETICS:&lt;br /&gt;
[http://www.genetics.org/content/early/2018/08/21/genetics.118.301336 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=857</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=857"/>
		<updated>2018-08-20T11:35:46Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: /* Admixture */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.95&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 2 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
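The idea behind thresholded genotype calling can be sketched as follows. This is a minimal illustration and not PCAngsd's implementation: it assumes a Hardy-Weinberg prior built from the individual allele frequency, that the frequency refers to the allele counted by the genotype, and that -1 marks a missing call. The function name call_genotype is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def call_genotype(likelihoods, freq, threshold):
    """Return genotype 0, 1 or 2, or -1 if no posterior reaches the threshold."""
    # Prior genotype probabilities from the individual allele frequency
    # (assumption: Hardy-Weinberg proportions given freq).
    prior = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
    weighted = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weighted)
    if total == 0.0:
        return -1  # no information at this site
    posterior = [w / total for w in weighted]
    best = max(range(3), key=lambda g: posterior[g])
    return best if posterior[best] >= threshold else -1


# Hypothetical likelihoods for one individual at two sites.
confident = call_genotype((0.9, 0.1, 0.0), freq=0.2, threshold=0.9)
uncertain = call_genotype((0.4, 0.35, 0.25), freq=0.5, threshold=0.9)
```
&lt;br /&gt;
In this sketch the first site is called as genotype 0, while the second is left missing because no posterior probability reaches 0.9.&lt;br /&gt;
&lt;br /&gt;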
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K. It is recommended to adjust '''-e''' instead of '''-admix_K'''.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Uses an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column contains the selection statistics along one tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Uses an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. Outputs the selection statistics, which the user must convert to p-values. The selection statistics are χ²-distributed with '''-e''' degrees of freedom (number of significant eigenvectors).&lt;br /&gt;
&lt;br /&gt;
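Since the conversion to p-values is left to the user, the sketch below shows one way to do it for '''-selection 1''' using only the Python standard library. It relies on the identity that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(sqrt(x / 2)). The function name and the example statistics are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical selection statistics for three sites along one PC.
stats = [0.0, 3.841458820694124, 30.0]
pvalues = [chi2_sf_1df(s) for s in stats]
```
&lt;br /&gt;
For '''-selection 2''' the degrees of freedom equal the number of eigenvectors, so a general χ² survival function would be needed instead, for example scipy.stats.chi2.sf(stat, df) if SciPy is available.&lt;br /&gt;
&lt;br /&gt;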
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/05/23/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=856</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=856"/>
		<updated>2018-08-20T11:14:34Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: /* Inbreeding */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.95&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 2 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Uses an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column contains the selection statistics along one tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Uses an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. Outputs the selection statistics, which the user must convert to p-values. The selection statistics are χ²-distributed with '''-e''' degrees of freedom (number of significant eigenvectors).&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/05/23/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=855</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=855"/>
		<updated>2018-08-20T11:14:07Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.95&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 2 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies may therefore not reflect the manually chosen K.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Allows for F-values between -1 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between -1 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].&lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Uses an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column contains the selection statistics along one tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Uses an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. Outputs the selection statistics, which the user must convert to p-values. The selection statistics are χ²-distributed with '''-e''' degrees of freedom (number of significant eigenvectors).&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/05/23/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=854</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=854"/>
		<updated>2018-08-20T11:12:14Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.95&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -inbreed 2 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format (.beagle.gz).&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
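&lt;br /&gt;
The thresholded call described above can be illustrated with a minimal sketch. It assumes a Hardy-Weinberg prior built from the individual allele frequency and treats genotypes as 0, 1 or 2 copies of the minor allele; the function name '''call_genotype''' and the use of None for an uncalled genotype are illustrative choices, not PCAngsd's own code or output encoding.&lt;br /&gt;

```python
def call_genotype(likelihoods, pi, threshold):
    """Call one genotype from its posterior probabilities.

    likelihoods: (P(data | G=0), P(data | G=1), P(data | G=2))
    pi: individual allele frequency of the minor allele (assumed already estimated)
    threshold: minimum posterior probability required to make a call
    Returns (called genotype or None, posterior probabilities).
    """
    # Assumed prior: Hardy-Weinberg proportions given the individual allele frequency.
    prior = ((1.0 - pi) ** 2, 2.0 * pi * (1.0 - pi), pi ** 2)
    joint = [l * p for l, p in zip(likelihoods, prior)]
    total = sum(joint)
    if total == 0.0:
        # No information at all: leave the genotype uncalled.
        return None, [0.0, 0.0, 0.0]
    posterior = [j / total for j in joint]
    best = max(range(3), key=lambda g: posterior[g])
    # Only call the most probable genotype if it reaches the threshold.
    called = best if posterior[best] >= threshold else None
    return called, posterior


called, posterior = call_genotype((0.1, 0.8, 0.1), pi=0.5, threshold=0.8)
print(called, posterior)
```

A higher threshold leaves more genotypes uncalled but makes the remaining calls more reliable.&lt;br /&gt;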
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int]&lt;br /&gt;
Not recommended. Manually specify the number of ancestral populations to use in admixture estimations (overrides number chosen from '''-e'''). Structure explained by individual allele frequencies might therefore not reflect the manually chosen K.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Allows for F-values between 0 and 1. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
Estimator using an estimated kinship matrix. Allows for F-values between 0 and 1. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate].  &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) generated from '''-inbreedSites''' to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
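&lt;br /&gt;
As a rough illustration of the filtering idea, the sketch below converts likelihood ratio statistics to p-values and keeps the sites whose p-value is not below the tolerance. It assumes that the statistics are χ²-distributed with 1 degree of freedom and that they have already been read into a list of floats; the helper names are hypothetical and the parsing of the .lrt.sites.gz file is deliberately left out.&lt;br /&gt;

```python
import math


def lrt_to_pvalue(lrt):
    """P-value of a likelihood ratio statistic, assuming chi-square with 1 degree of freedom."""
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(max(lrt, 0.0) / 2.0))


def keep_sites(lrt_values, tolerance=1e-6):
    """Return the indices of sites whose HWE p-value is not below the tolerance."""
    return [i for i, lrt in enumerate(lrt_values) if lrt_to_pvalue(lrt) >= tolerance]


# Three hypothetical sites; only the last one deviates strongly from HWE.
print(keep_sites([0.0, 3.84, 40.0]))
```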
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs. Outputs the selection statistics, which the user must convert to p-values. Each column reflects the selection statistics along a tested PC, and they are χ²-distributed with 1 degree of freedom.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. Outputs the selection statistics, which the user must convert to p-values. The selection statistics are χ²-distributed with '''-e''' degrees of freedom (the number of significant eigenvectors).&lt;br /&gt;
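&lt;br /&gt;
Since both methods report test statistics rather than p-values, a conversion step is needed. The following is a minimal sketch of that step using only the Python standard library. It assumes the statistics have already been loaded as plain floats; the function name '''chi2_sf''' is illustrative and not part of PCAngsd. Use 1 degree of freedom per column for '''-selection 1''' and the number of eigenvectors for '''-selection 2'''.&lt;br /&gt;

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with a positive integer df."""
    if not x > 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(half))  # tail probability for 1 degree of freedom
    else:
        a = 1.0
        q = math.exp(-half)             # tail probability for 2 degrees of freedom
    # Step up two degrees of freedom at a time using the recurrence
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with z = x / 2.
    while df / 2.0 > a:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


# Hypothetical statistics for three sites along one PC (1 degree of freedom).
statistics = [0.5, 3.84, 25.0]
print([chi2_sf(s, 1) for s in statistics])
```

If SciPy is installed, scipy.stats.chi2.sf(statistic, df) computes the same tail probability.&lt;br /&gt;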
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/05/23/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=840</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=840"/>
		<updated>2018-06-14T08:41:54Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.9&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format.&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify the number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-4)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/05/23/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=839</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=839"/>
		<updated>2018-05-30T14:32:42Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.9&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format.&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary). Numpy format (.indf.npy).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 1e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -indf_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
; -expg_save&lt;br /&gt;
Choose to save estimated genotype dosages (Binary). Numpy format (.npy).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify the number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-4)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary). Numpy format (.npy).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
Use likelihood ratio tests (.lrt.sites.gz) to filter out variable sites using a given threshold for HWE test p-value:&lt;br /&gt;
&lt;br /&gt;
; -HWE_filter [LRT filename]&lt;br /&gt;
; -HWE_tole [float]&lt;br /&gt;
Tolerance value for HWE test. (Default: 1e-6)&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/04/17/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=838</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=838"/>
		<updated>2018-04-20T07:29:28Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.9&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy'''[http://www.numpy.org], '''scipy'''[https://www.scipy.org], '''pandas'''[https://pandas.pydata.org], '''numba'''[https://numba.pydata.org] and '''pysnptools'''[https://github.com/MicrosoftGenomics/PySnpTools].&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
PCAngsd only needs and accepts genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format as input. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format.&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -n [int] '''(Required)'''&lt;br /&gt;
Specify the number of individuals in dataset.&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
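&lt;br /&gt;
As a minimal sketch of how the two requirements above combine, the command below estimates per-individual inbreeding coefficients and then calls genotypes that take them into account. The input file, the number of individuals and the output prefix are reused from the Quick start examples; the threshold of 0.9 is an illustrative assumption, not a recommended value.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Estimate inbreeding coefficients (-inbreed 2) and call genotypes using them&lt;br /&gt;
# 0.9 is a hypothetical threshold chosen only for illustration&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 2 -genoInbreed 0.9 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;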
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary).&lt;br /&gt;
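&lt;br /&gt;
A possible invocation combining the admixture options above is sketched here. The input file, the number of individuals and the output prefix are reused from the Quick start examples; the alpha and seed values are illustrative assumptions, and only a single value is given for each so that no particular list syntax is assumed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Estimate admixture proportions with a fixed seed and a hypothetical alpha,&lt;br /&gt;
# and save the population-specific allele frequencies&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -admix_alpha 50 -admix_seed 1 -admix_save -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;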
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/04/17/302463 Inferring Population Structure and Admixture Proportions in Low Depth NGS Data]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=837</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=837"/>
		<updated>2018-04-20T07:26:20Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.9&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''numba''' and '''pysnptools'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
PCAngsd only needs and accepts genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format as input. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format.&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -n [int] '''(Required)'''&lt;br /&gt;
Specify the number of individuals in dataset.&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/04/17/302463]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=836</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=836"/>
		<updated>2018-04-20T07:20:55Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: /* Citation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.9&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''numba''' and '''pysnptools'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
PCAngsd only needs and accepts genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format as input. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format.&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -n [int] '''(Required)'''&lt;br /&gt;
Specify the number of individuals in dataset.&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
BioRxiv pre-print for population structure and admixture estimation:&lt;br /&gt;
[https://www.biorxiv.org/content/early/2018/04/17/302463]&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=835</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=835"/>
		<updated>2018-04-20T07:18:57Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.9&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''numba''' and '''pysnptools'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle data.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is a file of genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out data -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Input file of genotype likelihoods in Beagle format.&lt;br /&gt;
; -indf [Individual allele frequencies filename]&lt;br /&gt;
Input file of individual allele frequencies (binary).&lt;br /&gt;
; -plink [Prefix for binary PLINK files]&lt;br /&gt;
Path to PLINK files using their prefix. (.bed, .bim, .fam)&lt;br /&gt;
; -n [int] '''(Required)'''&lt;br /&gt;
Specify the number of individuals in dataset.&lt;br /&gt;
; -epsilon [float]&lt;br /&gt;
Include error assumption in PLINK genotypes. (Default: 0.00)&lt;br /&gt;
; -minMaf [float]&lt;br /&gt;
Minimum minor allele frequency threshold. (Default: 0.05)&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated individual allele frequencies (Binary).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify the number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 5e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of batches to use in NMF method. (Default: 5)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies (Binary).&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=818</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=818"/>
		<updated>2018-01-12T13:32:05Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods and is able to perform multiple population genetic analyses in heterogeneous populations. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.8&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''sklearn''' and '''numba'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is a file of genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename] '''(Required)'''&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -n [int] '''(Required)'''&lt;br /&gt;
Specify the number of individuals in dataset.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated allele frequencies (both individual and population).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify the number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of mini-batches to use in NMF method. (Default: 20)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=817</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=817"/>
		<updated>2018-01-12T13:23:25Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.8&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''sklearn''' and '''numba'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is a file of genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename] '''(Required)'''&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -n [int] '''(Required)'''&lt;br /&gt;
Specify the number of individuals in dataset.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated allele frequencies (both individual and population).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs after performing filtering using population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated based on assuming K ancestral populations using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alpha values in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify the number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several values of K in a single run. Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of mini-batches to use in NMF method. (Default: 20)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=816</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=816"/>
		<updated>2018-01-12T13:22:56Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.8&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''sklearn''' and '''numba'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
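&lt;br /&gt;
Since '''-n''' has to match the number of individuals in the Beagle file, it can be convenient to read that number off the header. The following is a minimal sketch and not part of PCAngsd. It assumes the layout ANGSD writes with -doGlf 2, that is three leading columns (marker, allele1, allele2) followed by three genotype likelihood columns per individual. The function names count_individuals and count_individuals_in_file are hypothetical.&lt;br /&gt;

```python
import gzip


def count_individuals(header_line):
    """Return the number of individuals implied by a Beagle header line."""
    fields = header_line.split()
    # Assumed layout: marker, allele1, allele2, then 3 columns per individual.
    n_gl_columns = len(fields) - 3
    if n_gl_columns % 3 != 0 or not n_gl_columns >= 3:
        raise ValueError("unexpected number of columns in Beagle header")
    return n_gl_columns // 3


def count_individuals_in_file(path):
    """Read only the first line of a gzipped Beagle file."""
    with gzip.open(path, "rt") as handle:
        return count_individuals(handle.readline())


example_header = "marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1"
print(count_individuals(example_header))
```

For a real file, count_individuals_in_file("test.beagle.gz") would give the value to pass as '''-n''', provided the file follows the assumed layout.&lt;br /&gt;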
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename] '''Required'''&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -n [int] '''Required'''&lt;br /&gt;
Specify the number of individuals in the dataset.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated allele frequencies (both individual and population).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
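&lt;br /&gt;
To illustrate what the '''-maf''' and '''-maf_tole''' options control, the sketch below runs a textbook EM update for the allele frequency at a single site from genotype likelihoods. It is an illustration under stated assumptions and not PCAngsd's actual code: the function name em_allele_frequency, the starting value 0.25 and the per-site stopping rule are all hypothetical, and only the defaults 200 and 5e-5 are taken from the option list above.&lt;br /&gt;

```python
def em_allele_frequency(likes, max_iter=200, tole=5e-5):
    """Estimate the frequency of one allele at a single site.

    likes holds one (L0, L1, L2) tuple per individual, where Lg is the
    genotype likelihood for carrying g copies of the allele.
    """
    f = 0.25  # arbitrary starting value (assumption)
    for _ in range(max_iter):
        expected_copies = 0.0
        for l0, l1, l2 in likes:
            # Posterior genotype weights under a Hardy-Weinberg prior.
            w0 = l0 * (1.0 - f) ** 2
            w1 = l1 * 2.0 * f * (1.0 - f)
            w2 = l2 * f ** 2
            expected_copies += (w1 + 2.0 * w2) / (w0 + w1 + w2)
        f_new = expected_copies / (2.0 * len(likes))
        # Stop once the update is no larger than the tolerance.
        converged = tole >= abs(f_new - f)
        f = f_new
        if converged:
            break
    return f


site = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
print(em_allele_frequency(site))
```

Raising the iteration limit or lowering the tolerance trades run time for a tighter estimate, which is the same trade-off the two options expose.&lt;br /&gt;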
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
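&lt;br /&gt;
The idea behind thresholded genotype calling can be sketched as follows. This is a simplified illustration, not PCAngsd's implementation: it assumes a Hardy-Weinberg prior built from the individual allele frequency (no inbreeding term), and the function name call_genotype and the missing-value code -9 are hypothetical.&lt;br /&gt;

```python
def call_genotype(likes, pi, threshold):
    """Call one genotype, or return -9 when no posterior reaches threshold.

    likes is (L0, L1, L2) for 0, 1 or 2 copies of the allele, pi is the
    individual allele frequency at the site, and -9 is a hypothetical
    missing-value code.
    """
    # Prior genotype probabilities from the individual allele frequency.
    prior = ((1.0 - pi) ** 2, 2.0 * pi * (1.0 - pi), pi ** 2)
    joint = [l * p for l, p in zip(likes, prior)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    best = max(range(3), key=lambda g: posterior[g])
    return best if posterior[best] >= threshold else -9


print(call_genotype((0.05, 0.90, 0.05), pi=0.5, threshold=0.9))
print(call_genotype((0.30, 0.40, 0.30), pi=0.5, threshold=0.9))
```

With a threshold of 0.9, the first example is called as a heterozygote, while the second is left missing because no genotype has a sufficiently high posterior probability.&lt;br /&gt;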
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated, assuming K ancestral populations, using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alphas in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several K's in a single run.  Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of mini-batches to use in NMF method. (Default: 20)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=815</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=815"/>
		<updated>2018-01-12T13:21:09Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.8&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''sklearn''' and '''numba'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename] '''Required'''&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -n [int] '''Required'''&lt;br /&gt;
Specify the number of individuals in the dataset.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated allele frequencies (both individual and population).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated, assuming K ancestral populations, using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alphas in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several K's in a single run.  Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of mini-batches to use in NMF method. (Default: 20)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=814</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=814"/>
		<updated>2018-01-12T13:20:24Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on population structure inference, PCAngsd is able to estimate individual allele frequencies. These individual allele frequencies can be used in various population genetic methods for heterogeneous populations, such that PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate individual admixture proportions, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components. The entire program is written in Python 2.7 and is multithreaded to take advantage of several CPUs.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.8&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd: &lt;br /&gt;
'''numpy''', '''scipy''', '''pandas''', '''sklearn''' and '''numba'''.&lt;br /&gt;
&lt;br /&gt;
The packages and their dependencies can easily be installed using the following command inside the pcangsd folder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install --user -r python_packages.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is highly recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix using 10 threads&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and individual admixture proportions&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -admix -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -inbreed 1 -o test -threads 10&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -n 100 -selection 1 -o test -threads 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more information on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. PCAngsd will always compute the covariance matrix, where it uses principal components to estimate individual allele frequencies in an iterative procedure. The estimated individual allele frequencies will then be used in any of the other specified options of PCAngsd.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Estimation of individual allele frequencies==&lt;br /&gt;
; -beagle [Beagle filename] '''Required'''&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -n [int] '''Required'''&lt;br /&gt;
Specify the number of individuals in the dataset.&lt;br /&gt;
; -threads [int]&lt;br /&gt;
Specify the number of thread(s) to use. (Default: 1)&lt;br /&gt;
; -iter [int]&lt;br /&gt;
Maximum number of iterations for estimation of individual allele frequencies. (Default: 100)&lt;br /&gt;
; -tole [float]&lt;br /&gt;
Tolerance value for update in estimation of individual allele frequencies. (Default: 5e-5)&lt;br /&gt;
; -maf [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -maf_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 5e-5)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested using MAP test)&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
; -freq_save&lt;br /&gt;
Choose to save estimated allele frequencies (both individual and population).&lt;br /&gt;
; -sites_save&lt;br /&gt;
Choose to save the marker IDs that remain after filtering on population allele frequencies. Especially useful for selection scans and per-site inbreeding coefficients.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required, since individual inbreeding coefficients must have been estimated prior to calling genotypes using that information.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Admixture==&lt;br /&gt;
Individual admixture proportions and population-specific allele frequencies can be estimated, assuming K ancestral populations, using an accelerated mini-batch NMF method.&lt;br /&gt;
&lt;br /&gt;
; -admix&lt;br /&gt;
Toggles admixture estimations.&lt;br /&gt;
; -admix_alpha [int-list]&lt;br /&gt;
Specify alpha (sparseness regularization parameter). Can be specified as a sequence to try several alphas in a single run. Fully compatible with -admix_seed and -admix_K. (Default: 0)&lt;br /&gt;
; -admix_seed [int-list]&lt;br /&gt;
Specify seed for random initializations of factor matrices in admixture estimations. Can be specified as a sequence to try several different seeds in a single run. Fully compatible with -admix_alpha and -admix_K.&lt;br /&gt;
; -admix_K [int-list]&lt;br /&gt;
Not recommended. Specify number of ancestral populations to use in admixture estimations. Can be specified as a sequence to try several K's in a single run.  Fully compatible with -admix_alpha and -admix_seed.&lt;br /&gt;
; -admix_iter [int]&lt;br /&gt;
Maximum number of iterations for admixture estimations using NMF. (Default: 100)&lt;br /&gt;
; -admix_tole [float]&lt;br /&gt;
Tolerance value for update in admixture estimations using NMF. (Default: 1e-5)&lt;br /&gt;
; -admix_batch [int]&lt;br /&gt;
Specify the number of mini-batches to use in NMF method. (Default: 20)&lt;br /&gt;
; -admix_save&lt;br /&gt;
Choose to save the population-specific allele frequencies.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods. However, -inbreed 2 is recommended for low depth cases.&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 5e-5)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes (genotype dosages):&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=805</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=805"/>
		<updated>2017-09-21T14:13:49Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the inferred population structure, PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''', '''scipy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
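&lt;br /&gt;
For readers who want to inspect the genotype likelihoods before running PCAngsd, the following is a minimal sketch of loading a Beagle file with numpy. It assumes the usual layout of one header line, three leading columns (marker, allele1, allele2) and three likelihood columns per individual. The function name '''read_beagle''' is hypothetical and is not part of PCAngsd.&lt;br /&gt;

```python
import gzip

import numpy as np


def read_beagle(path):
    """Read a Beagle genotype-likelihood file (hypothetical helper, not PCAngsd code).

    Returns the site identifiers and an array of shape
    (sites, individuals, 3) holding the likelihoods per genotype.
    """
    opener = gzip.open if path.endswith(".gz") else open
    sites, rows = [], []
    with opener(path, "rt") as handle:
        next(handle)  # skip the header line
        for line in handle:
            fields = line.split()
            sites.append(fields[0])  # marker column
            rows.append([float(x) for x in fields[3:]])  # drop allele1 and allele2
    likes = np.array(rows)
    # Three likelihood columns per individual (assumed Beagle layout)
    return sites, likes.reshape(len(sites), -1, 3)
```

Calling '''read_beagle("genoLikes.beagle.gz")''' on the ANGSD output above should give one row per site that passed the SNP filter, assuming the file is not empty.&lt;br /&gt;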
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless it is run in chunk-mode). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
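&lt;br /&gt;
To illustrate what '''-EM''' and '''-EM_tole''' control, the following is a minimal sketch of a standard EM estimator of population allele frequencies from genotype likelihoods under Hardy-Weinberg proportions. It is not taken from the PCAngsd source: the function name, the starting value of 0.25 and the RMSE-based stopping rule are assumptions made for illustration, and only the default limits (200 iterations, tolerance 1e-4) mirror the options above.&lt;br /&gt;

```python
import numpy as np


def em_allele_freq(likes, max_iter=200, tol=1e-4):
    """Estimate per-site allele frequencies by EM (illustrative sketch only).

    likes: array of shape (sites, individuals, 3) with genotype
    likelihoods for 0, 1 and 2 copies of the minor allele.
    """
    freq = np.full(likes.shape[0], 0.25)  # assumed starting value
    for _ in range(max_iter):
        # Hardy-Weinberg genotype prior for each site, shape (sites, 1, 3)
        prior = np.stack(
            [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2], axis=1
        )[:, None, :]
        # E-step: posterior genotype probabilities per individual
        post = likes * prior
        post = post / post.sum(axis=2, keepdims=True)
        # M-step: mean expected allele count divided by two
        new_freq = (post[:, :, 1] + 2 * post[:, :, 2]).mean(axis=1) / 2
        rmse = np.sqrt(np.mean((new_freq - freq) ** 2))
        freq = new_freq
        if np.less(rmse, tol):  # assumed convergence criterion
            break
    return freq
```

A looser tolerance stops the loop earlier at the cost of less precise frequencies, which is the trade-off the two options expose.&lt;br /&gt;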
&lt;br /&gt;
LD can also be taken into account when computing the covariance matrix. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
Select the number of preceding sites to use in LD regression.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required.&lt;br /&gt;
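&lt;br /&gt;
The idea behind '''-geno''' can be sketched as follows, assuming a Hardy-Weinberg prior built from each individual allele frequency and a threshold applied to the highest posterior probability. The function name and the use of -9 as the missing-genotype code are assumptions for illustration and may differ from what PCAngsd does.&lt;br /&gt;

```python
import numpy as np


def call_genotypes(likes, indiv_freq, threshold):
    """Call genotypes from posterior probabilities (illustrative sketch only).

    likes: (sites, individuals, 3) genotype likelihoods.
    indiv_freq: (sites, individuals) individual allele frequencies.
    threshold: minimum posterior probability required to make a call.
    """
    pi = indiv_freq
    # Prior genotype probabilities from the individual allele frequencies
    prior = np.stack([(1 - pi) ** 2, 2 * pi * (1 - pi), pi ** 2], axis=2)
    post = likes * prior
    post = post / post.sum(axis=2, keepdims=True)
    calls = post.argmax(axis=2)
    # Keep a call only if its posterior reaches the threshold;
    # -9 is an assumed code for a missing genotype
    confident = np.greater_equal(post.max(axis=2), threshold)
    return np.where(confident, calls, -9)
```

A higher threshold gives fewer but more reliable calls, so the value should be chosen with the sequencing depth in mind.&lt;br /&gt;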
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. In order to run PCAngsd in chunk-mode, a pre-estimated covariance matrix must be provided. If estimating the covariance matrix from the full data set is not feasible, it can be estimated from a representative subset of the data set. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, in the same way as the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=804</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=804"/>
		<updated>2017-09-20T08:42:24Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''', '''scipy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless it is run in chunk-mode). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when computing the covariance matrix. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
Select the number of preceding sites to use in LD regression.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. '''-inbreed [int]''' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if '''-inbreed 3''' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. In order to run chunk-mode, a pre-estimated covariance matrix must be provided. If estimating the covariance matrix from the full data set is not feasible, it can be estimated from a representative subset of the data set. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, in the same way as the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=803</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=803"/>
		<updated>2017-09-20T08:39:34Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px|Simulated low depth NGS data of 3 populations]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless it is run in chunk-mode). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when computing the covariance matrix. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
Select the number of preceding sites to use in LD regression.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. In order to run chunk-mode, a pre-estimated covariance matrix must be provided. If estimating the covariance matrix from the full data set is not feasible, it can be estimated from a representative subset of the data set. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, in the same way as the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=802</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=802"/>
		<updated>2017-09-19T15:16:12Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless it is run in chunk-mode). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term to the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
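&lt;br /&gt;
The ''-EM'' and ''-EM_tole'' options control an EM algorithm for the population allele frequencies. As an illustration only, the Python sketch below runs a standard EM update for a single site under a Hardy-Weinberg prior. It is not PCAngsd's own code: the starting value, the stopping rule (absolute change below the tolerance) and the function name ''em_allele_frequency'' are assumptions, and only the default values 200 and 1e-4 are taken from the option list above.&lt;br /&gt;

```python
import math


def em_allele_frequency(likes, max_iter=200, tole=1e-4):
    """Estimate one site's minor allele frequency from genotype likelihoods.

    likes: one (L0, L1, L2) tuple per individual for 0, 1 and 2 minor alleles.
    """
    f = 0.25  # arbitrary starting value (assumption)
    for _ in range(max_iter):
        expected = 0.0
        for l0, l1, l2 in likes:
            # E-step: posterior genotype weights under a Hardy-Weinberg prior.
            w0 = l0 * (1 - f) ** 2
            w1 = l1 * 2 * f * (1 - f)
            w2 = l2 * f ** 2
            expected += (w1 + 2 * w2) / (w0 + w1 + w2)
        # M-step: expected number of minor alleles divided by 2n chromosomes.
        f_new = expected / (2 * len(likes))
        # Assumed stopping rule: absolute change no larger than the tolerance.
        converged = math.isclose(f_new, f, rel_tol=0.0, abs_tol=tole)
        f = f_new
        if converged:
            break
    return f
```

The sketch does not guard against a frequency of exactly 0 or 1, where the weights of some individuals can all become zero. A production implementation would need to handle that case.&lt;br /&gt;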
&lt;br /&gt;
LD can also be taken into account when computing the covariance matrix. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
Select the number of preceding sites to use in LD regression.&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
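&lt;br /&gt;
The idea behind ''-geno'' can be sketched in a few lines of Python. The example is a simplified illustration, not PCAngsd's implementation. It assumes a Hardy-Weinberg prior built from the individual allele frequency, that the threshold applies to the highest posterior probability, and that uncalled genotypes are coded as -9. The function name ''call_genotype'' and the missing-value code are hypothetical.&lt;br /&gt;

```python
def call_genotype(likes, pi, threshold, missing=-9):
    """Call one genotype from its likelihoods and an individual allele frequency.

    likes: (L0, L1, L2) for 0, 1 and 2 copies of the minor allele.
    pi: individual allele frequency of the minor allele at this site.
    """
    # Hardy-Weinberg prior built from the individual allele frequency (assumption).
    prior = ((1 - pi) ** 2, 2 * pi * (1 - pi), pi ** 2)
    weights = [like * p for like, p in zip(likes, prior)]
    total = sum(weights)
    posterior = [w / total for w in weights]
    # Genotype with the highest posterior probability.
    best = max(range(3), key=posterior.__getitem__)
    # Keep the call only if its posterior probability reaches the threshold.
    return best if posterior[best] >= threshold else missing
```

With likelihoods (0.1, 0.8, 0.1) and an individual allele frequency of 0.5, the heterozygote has a posterior probability of about 0.89, so it is called at a threshold of 0.8 but set to missing at 0.95.&lt;br /&gt;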
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods based on posterior expectations of the genotypes:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
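&lt;br /&gt;
The general idea of chunked processing can be illustrated with a short Python generator. This is a generic sketch of reading a fixed number of sites at a time, not PCAngsd's own reader. The function name ''iter_chunks'' is hypothetical, and ''chunksize'' plays the role of the ''-chunksize'' value.&lt;br /&gt;

```python
from itertools import islice


def iter_chunks(lines, chunksize):
    """Yield successive lists of at most chunksize site lines."""
    iterator = iter(lines)
    while True:
        # Take the next chunksize items without loading the rest into memory.
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk
```

Because only one chunk is held in memory at a time, per-site quantities can be computed chunk by chunk as long as the shared information (here the covariance matrix) has been estimated beforehand.&lt;br /&gt;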
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=801</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=801"/>
		<updated>2017-09-19T15:12:41Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame|Including admixed individuals]]&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term to the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=800</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=800"/>
		<updated>2017-09-19T15:11:13Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px]]&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame|Including admixed individuals]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term to the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=799</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=799"/>
		<updated>2017-09-19T15:10:06Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px]]&lt;br /&gt;
[[File:Pcangsd_admix.gif|right|frame|Including admixed individuals]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term to the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=798</id>
		<title>File:Pcangsd admix.gif</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=798"/>
		<updated>2017-09-19T15:08:28Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: Jonas2 uploaded a new version of File:Pcangsd admix.gif&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=797</id>
		<title>File:Pcangsd admix.gif</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=797"/>
		<updated>2017-09-19T14:34:28Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: Jonas2 uploaded a new version of File:Pcangsd admix.gif&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=796</id>
		<title>File:Pcangsd admix.gif</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=796"/>
		<updated>2017-09-19T14:30:21Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: Jonas2 uploaded a new version of File:Pcangsd admix.gif&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=795</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=795"/>
		<updated>2017-09-19T14:26:03Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|400px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
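&lt;br /&gt;
For orientation, the layout of such a file can be read with a few lines of Python. This is a sketch under assumptions, not PCAngsd's reader: it assumes a header line followed by one line per site, with three leading columns (marker, allele1, allele2) and then three likelihoods per individual. The names ''read_beagle'' and ''open_beagle'' and the example values are made up for illustration.&lt;br /&gt;

```python
import gzip
import io

def read_beagle(handle):
    """Yield (marker, allele1, allele2, likes) for each site.

    likes is a list with one (L0, L1, L2) tuple per individual.
    """
    header = next(handle).split()
    n_ind = (len(header) - 3) // 3
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        values = [float(x) for x in fields[3:]]
        if len(values) != 3 * n_ind:
            raise ValueError("unexpected number of columns at " + fields[0])
        likes = [tuple(values[3 * i:3 * i + 3]) for i in range(n_ind)]
        yield fields[0], fields[1], fields[2], likes

def open_beagle(path):
    """Open a gzipped Beagle file in text mode."""
    return gzip.open(path, "rt")

# Tiny in-memory example with two individuals and two sites (made-up values).
example = io.StringIO(
    "marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1\n"
    "chr1_100 0 2 0.8 0.2 0.0 0.1 0.6 0.3\n"
    "chr1_250 1 3 0.3 0.3 0.4 0.0 0.1 0.9\n"
)
sites = list(read_beagle(example))
```

A real file would be read the same way by passing the handle returned by ''open_beagle'' to ''read_beagle''.&lt;br /&gt;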
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add regularization term in the modelling of individual allele frequencies to perform ridge regression. May help on convergence for individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
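&lt;br /&gt;
To make the roles of ''-EM'' and ''-EM_tole'' concrete, the following sketch shows the textbook EM update for a population allele frequency at a single site, estimated from genotype likelihoods under Hardy-Weinberg proportions. It is an illustration under assumptions rather than PCAngsd's implementation: the starting value, the absolute-change stopping rule and the function name ''em_allele_frequency'' are all chosen for the example. The defaults of 200 iterations and a tolerance of 1e-4 mirror the option defaults listed above.&lt;br /&gt;

```python
def em_allele_frequency(likes, max_iter=200, tol=1e-4, start=0.25):
    """Estimate one site's allele frequency from genotype likelihoods.

    likes: list of (L0, L1, L2) tuples, one per individual.
    max_iter and tol play the roles of an iteration cap and a tolerance.
    """
    f = start
    for _ in range(max_iter):
        expected = 0.0  # expected number of minor alleles over individuals
        n_used = 0      # individuals with usable likelihoods
        for l0, l1, l2 in likes:
            p0 = l0 * (1.0 - f) ** 2
            p1 = l1 * 2.0 * f * (1.0 - f)
            p2 = l2 * f ** 2
            total = p0 + p1 + p2
            if total > 0.0:
                expected += (p1 + 2.0 * p2) / total
                n_used += 1
        if n_used == 0:
            return f
        f_new = expected / (2.0 * n_used)
        converged = tol > abs(f_new - f)
        f = f_new
        if converged:
            break
    return f
```

Raising the iteration cap or lowering the tolerance only matters when the update is still moving by more than the tolerance at the point where the cap is reached.&lt;br /&gt;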
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=794</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=794"/>
		<updated>2017-09-19T14:25:26Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb|200px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add regularization term in the modelling of individual allele frequencies to perform ridge regression. May help on convergence for individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
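&lt;br /&gt;
The idea of handling a fixed number of sites at a time can be sketched generically in Python. This only illustrates the chunking pattern and says nothing about how PCAngsd reads its input internally. The name ''iter_chunks'' is made up for the example.&lt;br /&gt;

```python
from itertools import islice

def iter_chunks(sites, chunksize):
    """Yield lists of at most chunksize items from any iterable of sites."""
    iterator = iter(sites)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk

# Example: 10 sites processed 4 at a time give chunks of 4, 4 and 2 sites.
chunk_sizes = [len(chunk) for chunk in iter_chunks(range(10), 4)]
```

Because only one chunk is held in memory at a time, per-site quantities can be computed and written out chunk by chunk, which is why a covariance matrix estimated beforehand is needed.&lt;br /&gt;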
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=793</id>
		<title>File:Pcangsd admix.gif</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=793"/>
		<updated>2017-09-19T14:23:51Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: Jonas2 uploaded a new version of File:Pcangsd admix.gif&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=792</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=792"/>
		<updated>2017-09-19T14:23:15Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add regularization term in the modelling of individual allele frequencies to perform ridge regression. May help on convergence for individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
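&lt;br /&gt;
As an illustration of how an EM estimator can keep F between 0 and 1, the sketch below treats F as a mixture weight: at each site the two alleles are either identical by descent (only homozygotes possible) or drawn in Hardy-Weinberg proportions. This is a standard mixture-model EM written under assumptions and is not claimed to be the exact estimator behind ''-inbreed 1'' or ''-inbreed 2''. The starting value, the stopping rule and the name ''em_inbreeding'' are chosen for the example.&lt;br /&gt;

```python
def em_inbreeding(likes, freqs, max_iter=200, tol=1e-4, start=0.25):
    """Estimate one individual's inbreeding coefficient F.

    likes: list of (L0, L1, L2) tuples, one per site.
    freqs: the individual's allele frequency at each site.
    """
    F = start
    for _ in range(max_iter):
        ibd_sum = 0.0
        n_used = 0
        for (l0, l1, l2), f in zip(likes, freqs):
            # Alleles identical by descent: only homozygous genotypes occur.
            ibd = F * ((1.0 - f) * l0 + f * l2)
            # Alleles not identical by descent: Hardy-Weinberg proportions.
            hwe = (1.0 - F) * (
                (1.0 - f) ** 2 * l0 + 2.0 * f * (1.0 - f) * l1 + f ** 2 * l2
            )
            total = ibd + hwe
            if total > 0.0:
                ibd_sum += ibd / total
                n_used += 1
        if n_used == 0:
            return F
        # New F is the average posterior probability of identity by descent.
        F_new = ibd_sum / n_used
        converged = tol > abs(F_new - F)
        F = F_new
        if converged:
            break
    return F
```

Since each update is an average of posterior probabilities, the estimate cannot leave the interval from 0 to 1, so an excess of heterozygotes is reported as F equal to 0 rather than as a negative value.&lt;br /&gt;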
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure alongside likelihood ratio tests for HWE can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=791</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=791"/>
		<updated>2017-09-19T14:18:06Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frameless|500px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. This may help the individual allele frequencies converge. It must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
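&lt;br /&gt;
The options ''-EM'' and ''-EM_tole'' above control an EM algorithm for the population allele frequencies. A minimal Python sketch of a standard EM update for allele frequencies from genotype likelihoods is given here for orientation. It is an illustration under stated assumptions rather than PCAngsd's implementation: the function name, the array layout, the starting value of 0.25 and the RMSE-based stopping rule are hypothetical; only the defaults of 200 iterations and a tolerance of 1e-4 are taken from the option descriptions.&lt;br /&gt;

```python
import numpy as np


def em_allele_freq(likes, max_iter=200, tole=1e-4):
    """Estimate per-site allele frequencies by EM (illustrative sketch).

    likes: array (n_ind, n_sites, 3) of genotype likelihoods
           for 0, 1 or 2 copies of the allele being counted.
    """
    n_sites = likes.shape[1]
    f = np.full(n_sites, 0.25)  # assumed starting value

    for _ in range(max_iter):
        # E-step: genotype posteriors under a Hardy-Weinberg prior.
        prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=-1)
        post = likes * prior  # prior broadcasts over individuals
        post = post / post.sum(axis=-1, keepdims=True)

        # M-step: mean expected allele count divided by two.
        f_new = (post[..., 1] + 2 * post[..., 2]).mean(axis=0) / 2

        # Assumed stopping rule: RMSE between successive estimates.
        diff = np.sqrt(np.mean((f_new - f) ** 2))
        f = f_new
        if np.less(diff, tole):
            break
    return f
```

With fully informative likelihoods the sketch converges after two iterations to the observed allele frequency; with low-depth data more iterations are typically needed.&lt;br /&gt;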
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=790</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=790"/>
		<updated>2017-09-19T14:17:03Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|right|200px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. This may help the individual allele frequencies converge. It must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
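&lt;br /&gt;
The idea of evaluating a fixed number of sites at a time can be sketched in Python as follows. This is a simplified illustration, not PCAngsd's code: it assumes the genotype likelihoods are already held in an in-memory array, whereas chunk-mode reads them from file, and the names ''iter_chunks'', ''per_site_in_chunks'' and ''per_site_fn'' are hypothetical.&lt;br /&gt;

```python
import numpy as np


def iter_chunks(n_sites, chunksize):
    """Yield (start, stop) site index pairs covering all sites in order."""
    for start in range(0, n_sites, chunksize):
        yield start, min(start + chunksize, n_sites)


def per_site_in_chunks(likes, chunksize, per_site_fn):
    """Apply a per-site estimator chunk by chunk (illustrative sketch).

    likes:       array (n_ind, n_sites, 3) of genotype likelihoods.
    chunksize:   number of sites evaluated at a time.
    per_site_fn: hypothetical function mapping a chunk of shape
                 (n_ind, n_chunk_sites, 3) to one value per site.
    """
    n_sites = likes.shape[1]
    out = np.empty(n_sites)
    for start, stop in iter_chunks(n_sites, chunksize):
        # Only this slice of sites is handled in the current step.
        out[start:stop] = per_site_fn(likes[:, start:stop])
    return out
```

Because per-site statistics depend only on the sites in the current chunk (given quantities shared across sites, such as the covariance matrix), the chunked result matches what a single pass over all sites would give.&lt;br /&gt;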
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=789</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=789"/>
		<updated>2017-09-19T14:16:19Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|thumb|500px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. This may help the individual allele frequencies converge. It must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=788</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=788"/>
		<updated>2017-09-19T14:16:06Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|thumb|200px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
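To make the input layout concrete, here is a minimal Python sketch that parses Beagle-format genotype likelihoods into an array. It is not taken from PCAngsd itself: the function name read_beagle and the inline example values are hypothetical, and the layout is assumed to be three leading columns (marker, allele1, allele2) followed by three likelihoods per individual.

```python
import io
import numpy as np

# Tiny inline example in the assumed Beagle genotype-likelihood layout:
# three leading columns, then three likelihoods per individual.
# All values are made up for illustration.
EXAMPLE = """marker\tallele1\tallele2\tInd0\tInd0\tInd0\tInd1\tInd1\tInd1
chr1_100\t0\t2\t0.90\t0.09\t0.01\t0.33\t0.33\t0.34
chr1_250\t1\t3\t0.05\t0.90\t0.05\t0.00\t0.10\t0.90
"""


def read_beagle(handle):
    """Return (markers, likes) with likes shaped (sites, individuals, 3)."""
    header = handle.readline().split()
    n_ind = (len(header) - 3) // 3
    markers, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        markers.append(fields[0])
        rows.append([float(x) for x in fields[3:]])
    likes = np.array(rows, dtype=float).reshape(len(rows), n_ind, 3)
    return markers, likes


markers, likes = read_beagle(io.StringIO(EXAMPLE))
print(markers, likes.shape)
```

For a gzipped file such as the one written by the ANGSD command above, the same function could be given a handle from gzip.open(path, "rt").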
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All options in PCAngsd are listed here. Usually all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
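The threshold-based calling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PCAngsd's implementation: the function name call_genotypes, the prior built from Hardy-Weinberg proportions of each individual allele frequency, and the use of -9 for uncalled genotypes are all assumptions made for the example.

```python
import numpy as np


def call_genotypes(likes, indiv_freq, threshold):
    """Call genotypes from posteriors; -9 marks calls below the threshold.

    likes      : (sites, individuals, 3) genotype likelihoods
    indiv_freq : (sites, individuals) individual allele frequencies
    threshold  : minimum posterior probability required to make a call
    """
    p = indiv_freq
    # Assumed prior for 0, 1 and 2 allele copies: HWE proportions given p.
    prior = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=-1)
    post = likes * prior
    post = post / post.sum(axis=-1, keepdims=True)
    best = post.argmax(axis=-1)
    # Keep the most probable genotype only where it reaches the threshold.
    return np.where(post.max(axis=-1) >= threshold, best, -9)


# One site, two individuals; all numbers are made up for illustration.
likes = np.array([[[0.90, 0.09, 0.01], [0.33, 0.33, 0.34]]])
indiv_freq = np.array([[0.1, 0.5]])
calls = call_genotypes(likes, indiv_freq, threshold=0.9)
print(calls)
```

In this example the first individual is called as genotype 0, while the second has no genotype with posterior probability of at least 0.9 and is left uncalled.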
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, in the same way as the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=787</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=787"/>
		<updated>2017-09-19T14:15:23Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame|50px]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All options in PCAngsd are listed here. Usually all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, in the same way as the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=786</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=786"/>
		<updated>2017-09-19T14:14:39Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frame]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All options in PCAngsd are listed here. Usually all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help the individual allele frequencies converge. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that use the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where one chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, in the same way as the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=785</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=785"/>
		<updated>2017-09-19T14:14:17Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|frameless]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All options in PCAngsd are listed here. Usually all desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on one chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless running chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help convergence of the individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
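&lt;br /&gt;
The idea of thresholded genotype calling can be sketched as follows. This is a minimal illustration, not PCAngsd's own code: it assumes a Hardy-Weinberg prior built from the individual allele frequency of the second allele, and the names call_genotype and MISSING (with its value) are hypothetical.&lt;br /&gt;

```python
MISSING = -9  # placeholder code for "no call"; an assumption for this sketch


def call_genotype(likes, freq, threshold):
    """Return 0, 1 or 2 copies of the second allele, or MISSING.

    likes     -- three genotype likelihoods for one individual at one site
    freq      -- individual allele frequency of the second allele
    threshold -- minimum posterior probability required to make a call
    """
    # Hardy-Weinberg prior for 0, 1 and 2 copies (assumed for illustration).
    prior = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
    joint = [like * p for like, p in zip(likes, prior)]
    total = sum(joint)
    if total == 0.0:
        return MISSING  # no information at this site
    posterior = [j / total for j in joint]
    best = max(range(3), key=lambda g: posterior[g])
    # Only call the most probable genotype if it reaches the threshold.
    return best if posterior[best] >= threshold else MISSING


confident = call_genotype((0.9, 0.1, 0.0), 0.1, 0.9)
uncertain = call_genotype((0.3, 0.4, 0.3), 0.5, 0.9)
print(confident, uncertain)
```

In this sketch a higher threshold gives fewer but more confident calls, and sites that do not reach it are left as missing.&lt;br /&gt;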
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using the method of [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
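&lt;br /&gt;
Reading a fixed number of sites at a time can be sketched in Python as below. This only illustrates the chunking idea under the assumption of one site per line after a single header line; the name iter_chunks is hypothetical and the sketch is not how PCAngsd itself is implemented.&lt;br /&gt;

```python
import io
from itertools import islice


def iter_chunks(handle, chunksize):
    """Yield lists of at most chunksize site lines, skipping the header."""
    handle.readline()  # header line (assumed to be a single line)
    while True:
        chunk = list(islice(handle, chunksize))
        if not chunk:
            return  # end of file
        yield chunk


# Five made-up site lines processed two at a time.
data = io.StringIO("header\n" + "".join("site%d\n" % i for i in range(5)))
sizes = [len(chunk) for chunk in iter_chunks(data, 2)]
print(sizes)
```

Only one chunk is held in memory at a time, which is what makes per-site estimations feasible when the full data set is too large to load.&lt;br /&gt;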
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=784</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=784"/>
		<updated>2017-09-19T14:13:29Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help convergence of the individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using the method of [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=783</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=783"/>
		<updated>2017-09-19T14:11:12Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_admix.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help convergence of the individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, alongside likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans. LD regression has been implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using the method of [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. To run chunk-mode, a pre-estimated covariance matrix must be provided; it can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, as is done with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=782</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=782"/>
		<updated>2017-09-19T14:10:31Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the population structure inference PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site) and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb]]&lt;br /&gt;
[[File:Pcangsd_admix.gif|thumb]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term in the modelling of individual allele frequencies to perform ridge regression. May help convergence of the individual allele frequencies. Must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies in the prior.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, along with likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans, using the LD regression implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, just as with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=781</id>
		<title>File:Pcangsd admix.gif</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=File:Pcangsd_admix.gif&amp;diff=781"/>
		<updated>2017-09-19T14:09:55Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=780</id>
		<title>PCAngsd</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=PCAngsd&amp;diff=780"/>
		<updated>2017-09-19T13:35:47Z</updated>

		<summary type="html">&lt;p&gt;Jonas2: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program PCAngsd, which estimates the covariance matrix for low depth NGS data in an iterative procedure based on genotype likelihoods. Based on the inferred population structure, PCAngsd is able to estimate individual allele frequencies. By incorporating these allele frequencies in Empirical Bayes approaches, PCAngsd can perform PCA (estimate the covariance matrix), call genotypes, estimate inbreeding coefficients (per-individual and per-site), and perform a genome selection scan using principal components in structured populations. The entire program is written in Python 2.7.&lt;br /&gt;
&lt;br /&gt;
[[File:Pcangsd_pca.png|thumb]]&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from Github:&lt;br /&gt;
https://github.com/Rosemeis/pcangsd&lt;br /&gt;
&lt;br /&gt;
Latest release of PCAngsd: 0.3&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/Rosemeis/pcangsd.git;&lt;br /&gt;
cd pcangsd/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following Python packages are needed to run PCAngsd (found in all popular distributions): &lt;br /&gt;
'''numpy''' and '''pandas'''.&lt;br /&gt;
&lt;br /&gt;
PCAngsd should work on all platforms meeting the requirements but server-side usage is recommended.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# See all options in PCAngsd&lt;br /&gt;
python pcangsd.py -h&lt;br /&gt;
&lt;br /&gt;
# Only estimate covariance matrix &lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and inbreeding coefficients&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -inbreed 1 -o test&lt;br /&gt;
&lt;br /&gt;
# Estimate covariance matrix and perform selection scan&lt;br /&gt;
python pcangsd.py -beagle test.beagle.gz -selection 1 -o test&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
The only input PCAngsd needs and accepts is genotype likelihoods in [http://faculty.washington.edu/browning/beagle/beagle.html Beagle] format. [http://popgen.dk/angsd ANGSD] can easily be used to compute genotype likelihoods and output them in the required Beagle format.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./angsd -GL 1 -out genoLikes -nThreads 10 -doGlf 2 -doMajorMinor 1  -doMaf 2 -SNP_pval 1e-6 -bam bam.filelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
See [http://popgen.dk/angsd ANGSD] for more info on how to compute the genotype likelihoods and call SNPs.&lt;br /&gt;
&lt;br /&gt;
=Using PCAngsd=&lt;br /&gt;
&lt;br /&gt;
All the different options in PCAngsd are listed here. Usually, all the desired analyses must be run in the same command. However, PCAngsd can also be run in chunk-mode, where per-site estimations are performed on a chunk of the data at a time using a pre-estimated covariance matrix. More information on chunk-mode estimations can be found [[#Chunk-mode estimations|here]].&lt;br /&gt;
&lt;br /&gt;
PCAngsd will always compute the covariance matrix (unless performing chunk-mode estimations). It uses the computed principal components to estimate individual allele frequencies in an iterative procedure, which is repeated until the individual allele frequencies have converged.&lt;br /&gt;
&lt;br /&gt;
; -beagle [Beagle filename]&lt;br /&gt;
Path to file of the genotype likelihoods in Beagle format.&lt;br /&gt;
; -beaglelist [filelist]&lt;br /&gt;
Parse a file with a list of multiple Beagle files, e.g. if the genotype likelihoods have been computed separately for each chromosome.&lt;br /&gt;
; -M [int]&lt;br /&gt;
Maximum number of iterations for covariance estimation. Only needed in rare cases. (Default: 100)&lt;br /&gt;
; -M_tole [float]&lt;br /&gt;
Tolerance value for the iterative covariance matrix estimation. (Default: 1e-4)&lt;br /&gt;
; -EM [int]&lt;br /&gt;
Maximum number of EM iterations for computing the population allele frequencies. (Default: 200)&lt;br /&gt;
; -EM_tole [float]&lt;br /&gt;
Tolerance value in EM algorithm for population allele frequencies estimation. (Default: 1e-4)&lt;br /&gt;
; -e [int]&lt;br /&gt;
Manually select the number of eigenvalues to use in the modelling of individual allele frequencies. (Default: Automatically tested)&lt;br /&gt;
; -reg [float]&lt;br /&gt;
Add a regularization term to the modelling of individual allele frequencies (ridge regression). This may help the individual allele frequencies converge. It must be used when scaling principal components prior to the modelling of individual allele frequencies.&lt;br /&gt;
; -scaled&lt;br /&gt;
Scale significant principal components in relation to the top principal component using their corresponding eigenvalues prior to modelling individual allele frequencies.&lt;br /&gt;
; -o [prefix]&lt;br /&gt;
Set the prefix for all output files created by PCAngsd (Default: &amp;quot;pcangsd&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
==Call genotypes==&lt;br /&gt;
Genotypes can be called from posterior genotype probabilities that incorporate the individual allele frequencies as prior information.&lt;br /&gt;
&lt;br /&gt;
; -geno [float]&lt;br /&gt;
Call genotypes with defined threshold.&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
Call genotypes with defined threshold also taking inbreeding into account. ''-inbreed'' is required.&lt;br /&gt;
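&lt;br /&gt;
The idea behind ''-geno'' can be illustrated with a minimal Python sketch. It is not the PCAngsd implementation. It assumes a Hardy-Weinberg prior built from each individual's allele frequency, genotype likelihoods ordered as 0, 1 and 2 copies of the allele, and a missing-genotype code of -9; the function name and all three of these choices are assumptions made for illustration only.&lt;br /&gt;

```python
import numpy as np


def call_genotypes(likes, freqs, threshold, missing=-9):
    """Call genotypes from posteriors built with a per-individual prior.

    likes: array (n_sites, n_ind, 3), genotype likelihoods for 0, 1, 2 copies.
    freqs: array (n_sites, n_ind), individual allele frequencies.
    The missing code (-9) is an assumption of this sketch.
    """
    likes = np.asarray(likes, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Assumed prior: Hardy-Weinberg proportions per site and individual
    prior = np.stack(
        [(1 - freqs) ** 2, 2 * freqs * (1 - freqs), freqs ** 2], axis=-1
    )
    # Posterior is proportional to likelihood times prior
    post = likes * prior
    post = post / post.sum(axis=-1, keepdims=True)
    best = post.argmax(axis=-1)
    # Keep a call only if its posterior probability reaches the threshold
    keep = np.greater_equal(post.max(axis=-1), threshold)
    return np.where(keep, best, missing)


# One site, two individuals: a confident call and an uninformative one
likes = [[[0.9, 0.09, 0.01], [1 / 3, 1 / 3, 1 / 3]]]
freqs = [[0.2, 0.5]]
print(call_genotypes(likes, freqs, threshold=0.9).tolist())  # [[0, -9]]
```

In this sketch the second individual has flat likelihoods, so its posterior equals the prior and no genotype reaches the threshold; it is therefore set to missing.&lt;br /&gt;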
&lt;br /&gt;
==Inbreeding==&lt;br /&gt;
Per-individual inbreeding coefficients incorporating population structure can be computed using three different methods:&lt;br /&gt;
&lt;br /&gt;
; -inbreed 1&lt;br /&gt;
A maximum likelihood estimator computed by an EM algorithm. Only allows for F-values between 0 and 1. Based on [https://www.cambridge.org/core/journals/genetics-research/article/maximum-likelihood-estimation-of-individual-inbreeding-coefficients-and-null-allele-frequencies/2DEBA0C0C2B92DF0EE89BD27DFCAD3FB].&lt;br /&gt;
; -inbreed 2&lt;br /&gt;
Simple estimator also computed by an EM algorithm. Based on [http://genome.cshlp.org/content/23/11/1852.full ngsF].&lt;br /&gt;
; -inbreed 3&lt;br /&gt;
(Not recommended for low depth NGS data!) Estimator using the kinship matrix. Based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]. &lt;br /&gt;
; -inbreed_iter [int]&lt;br /&gt;
Maximum number of iterations for the EM algorithm methods. (Default: 200)&lt;br /&gt;
; -inbreed_tole [float]&lt;br /&gt;
Tolerance value for the EM algorithms for inbreeding coefficients estimation. (Default: 1e-4)&lt;br /&gt;
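&lt;br /&gt;
As an illustration of how an EM estimator of a per-individual inbreeding coefficient can work, here is a minimal Python sketch. It is not the PCAngsd code. It assumes the standard model in which, at each site, the two alleles are identical by descent with probability F (so no heterozygotes) and follow Hardy-Weinberg proportions otherwise. The function name, the starting value and the argument names are assumptions; the default iteration limit and tolerance mirror the defaults listed above.&lt;br /&gt;

```python
import numpy as np


def inbreeding_em(likes, freqs, max_iter=200, tole=1e-4, start=0.5):
    """EM estimate of one individual's inbreeding coefficient F in [0, 1].

    likes: array (n_sites, 3), genotype likelihoods for 0, 1, 2 copies.
    freqs: array (n_sites,), the individual's allele frequency at each site.
    """
    likes = np.asarray(likes, dtype=float)
    p = np.asarray(freqs, dtype=float)
    q = 1.0 - p
    # Site likelihood if the two alleles are identical by descent
    lik_ibd = likes[:, 0] * q + likes[:, 2] * p
    # Site likelihood under Hardy-Weinberg proportions
    lik_hwe = likes[:, 0] * q ** 2 + likes[:, 1] * 2 * p * q + likes[:, 2] * p ** 2
    f = start
    for _ in range(max_iter):
        # E-step: posterior probability of identity by descent at each site
        z = f * lik_ibd / (f * lik_ibd + (1.0 - f) * lik_hwe)
        # M-step: F is the mean of those posterior probabilities
        f_new = float(z.mean())
        if np.less(abs(f_new - f), tole):
            return f_new
        f = f_new
    return f


# Ten sites with allele frequency 0.5 and only heterozygous genotypes
het = np.tile([0.0, 1.0, 0.0], (10, 1))
print(inbreeding_em(het, np.full(10, 0.5)))
```

Because F is updated as an average of probabilities, the estimate stays between 0 and 1 by construction, which matches the restriction noted for ''-inbreed 1''.&lt;br /&gt;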
&lt;br /&gt;
&lt;br /&gt;
Per-site inbreeding coefficients incorporating population structure, along with likelihood ratio tests for HWE, can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
&lt;br /&gt;
==Selection==&lt;br /&gt;
A genome selection scan can be computed using two different methods:&lt;br /&gt;
&lt;br /&gt;
; -selection 1&lt;br /&gt;
Using an extended model of [http://www.cell.com/ajhg/abstract/S0002-9297(16)00003-3 FastPCA]. Performs a genome selection scan along all significant PCs.&lt;br /&gt;
; -selection 2&lt;br /&gt;
(Not fully tested!) Using an extended model of [http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12592/abstract PCAdapt]. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
LD can also be taken into account when performing selection scans, using the LD regression implemented in PCAngsd.&lt;br /&gt;
; -LD [int]&lt;br /&gt;
(Not fully tested!) Select the window (in bases) of preceding sites to use in regression.&lt;br /&gt;
&lt;br /&gt;
==Relatedness==&lt;br /&gt;
'''Work in progress...'''&lt;br /&gt;
&lt;br /&gt;
Estimate the kinship matrix using a method based on [http://www.cell.com/ajhg/abstract/S0002-9297(15)00493-0 PC-Relate]:&lt;br /&gt;
&lt;br /&gt;
; -kinship&lt;br /&gt;
Automatically estimated if ''-inbreed 3'' has been selected.&lt;br /&gt;
&lt;br /&gt;
==Chunk-mode estimations==&lt;br /&gt;
PCAngsd can also be run in chunk-mode, where a chunk of the data is processed at a time. This makes estimation of per-site parameters feasible on very large data sets. Chunk-mode requires a pre-estimated covariance matrix, which can be estimated from a representative subset of the data set that is small enough for covariance estimation to be feasible. Chunk-mode estimations are enabled by specifying the number of sites to evaluate at a time:&lt;br /&gt;
&lt;br /&gt;
; -chunksize [int]&lt;br /&gt;
Number of sites to read in at a time for chunk-mode estimations.&lt;br /&gt;
; -cov [file]&lt;br /&gt;
Covariance matrix file needed in order to perform chunk-mode estimations.&lt;br /&gt;
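&lt;br /&gt;
The general pattern behind chunk-mode can be sketched in Python as follows. This is only an illustration of processing sites block by block, not how PCAngsd reads its input; both function names are hypothetical, and ''site_func'' stands in for any per-site estimation that needs only the sites in the current chunk plus fixed, pre-estimated quantities.&lt;br /&gt;

```python
import numpy as np


def iter_site_chunks(n_sites, chunksize):
    """Yield (start, stop) index pairs covering all sites in order."""
    for start in range(0, n_sites, chunksize):
        yield start, min(start + chunksize, n_sites)


def per_site_in_chunks(data, chunksize, site_func):
    """Apply site_func to consecutive blocks of rows (sites) and join the results."""
    bounds = iter_site_chunks(data.shape[0], chunksize)
    parts = [site_func(data[start:stop]) for start, stop in bounds]
    return np.concatenate(parts)


# Six sites (rows), two individuals (columns), processed four sites at a time
data = np.arange(12.0).reshape(6, 2)
chunked = per_site_in_chunks(data, 4, lambda block: block.mean(axis=1))
print(np.allclose(chunked, data.mean(axis=1)))
```

Since each per-site result depends only on its own row, the chunked result matches the one computed on the full matrix, while only one chunk needs to be held in memory at a time.&lt;br /&gt;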
&lt;br /&gt;
The following estimations can be performed in chunk-mode (individual allele frequencies are estimated and saved for all sites automatically):&lt;br /&gt;
; -selection 1&lt;br /&gt;
; -selection 2&lt;br /&gt;
; -inbreedSites&lt;br /&gt;
; -geno [float]&lt;br /&gt;
&lt;br /&gt;
Note: Genotypes can also be called incorporating both individual allele frequencies and inbreeding coefficients. In that case, pre-estimated per-individual inbreeding coefficients must also be provided, just as with the covariance matrix:&lt;br /&gt;
&lt;br /&gt;
; -F [file]&lt;br /&gt;
; -genoInbreed [float]&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Jonas2</name></author>
	</entry>
</feed>