<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.popgen.dk/software/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Emil</id>
	<title>software - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.popgen.dk/software/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Emil"/>
	<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php/Special:Contributions/Emil"/>
	<updated>2026-04-30T14:16:07Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.40.1</generator>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=912</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=912"/>
		<updated>2019-03-23T09:08:53Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program '''asaMap''', a tool for ancestry-specific association mapping in large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. If you are on a Mac, use curl to download the source.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
To be added...&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its '''.Q and .P files''', respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual - '''it should NOT have a column of 1s for the intercept (the intercept is included automatically)'''. This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
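&lt;br /&gt;
The row-count requirements above can be checked before running asaMap. The following Python sketch is only an illustration and is not part of asaMap: the function names read_rows and check_inputs are hypothetical, and it assumes all files are whitespace-separated plain text with numeric covariates.&lt;br /&gt;

```python
def read_rows(path):
    """Return the non-empty lines of a whitespace-separated text file as lists of fields."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems; an empty list means the files look consistent.

    Hypothetical helper, not part of asaMap.
    """
    problems = []
    n_ind = len(read_rows(fam_path))
    if len(read_rows(pheno_path)) != n_ind:
        problems.append("phenotype file and .fam file differ in number of rows")
    if cov_path is not None:
        cov = read_rows(cov_path)
        if len(cov) != n_ind:
            problems.append("covariate file and .fam file differ in number of rows")
        n_cols = min((len(row) for row in cov), default=0)
        for j in range(n_cols):
            # A column where every value equals 1 would duplicate the intercept.
            if all(float(row[j]) == 1.0 for row in cov):
                problems.append(f"covariate column {j + 1} is all 1s")
    return problems
```

Calling check_inputs with the .fam file, the phenotype file and, optionally, the covariate file returns a list of messages, which is empty when the row counts agree and no covariate column consists only of 1s.&lt;br /&gt;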
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotype file; has to be a plain text file with as many rows as the .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates; plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model: whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression: whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
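&lt;br /&gt;
As a rough illustration of how the options above combine, the following Python sketch assembles an asaMap command line. The function build_asamap_command, its argument names and the file name cov.file are hypothetical and not part of asaMap; only the flags (-p, -o, -y, -Q, -f, -c, -m, -l, -P) are taken from the list above. The sketch only builds the argument list and does not run the program.&lt;br /&gt;

```python
def build_asamap_command(plink_prefix, out, pheno, q_file, p_file,
                         cov=None, recessive=False, logistic=False, threads=None):
    """Build an asaMap argument list (hypothetical helper, not part of asaMap)."""
    cmd = ["./asaMap", "-p", plink_prefix, "-o", out, "-y", pheno,
           "-Q", q_file, "-f", p_file]
    if cov is not None:
        cmd += ["-c", cov]           # optional covariates, no intercept column
    if recessive:
        cmd += ["-m", "1"]           # 0: additive (default), 1: recessive
    if logistic:
        cmd += ["-l", "1"]           # 0: linear (default), 1: logistic
    if threads is not None:
        cmd += ["-P", str(threads)]  # number of threads
    return cmd


# "cov.file" is a placeholder name for a covariate file.
cmd = build_asamap_command("plinkFile", "out", "pheno.files",
                           "plinkFile.2.Q", "plinkFile.2.P",
                           cov="cov.file", recessive=True, threads=4)
print(" ".join(cmd))
```

Once the input files exist, the resulting list could be passed to subprocess.run(cmd, check=True) from the Python standard library.&lt;br /&gt;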
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A '''.res''' file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be generated with a likelihood ratio test between the two desired models.&lt;br /&gt;
An R script '''getPvalues.R''' is provided that makes it easy to obtain P-values from the '''.res''' file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix '''.Pvalues''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
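&lt;br /&gt;
The likelihood ratio test behind these P-values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in getPvalues.R: the function name lrt_pvalue is hypothetical, the llh columns are assumed to hold negative log-likelihoods (in the example rows the null model M5 has the largest value), and the degrees of freedom are assumed to be the difference in the number of effect parameters between the two models. The test statistic is twice the difference between the two values and is compared to a chi-square distribution.&lt;br /&gt;

```python
import math


def lrt_pvalue(llh_null, llh_alt, df):
    """P-value of a likelihood ratio test, assuming both inputs are negative log-likelihoods."""
    stat = max(2.0 * (llh_null - llh_alt), 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    if df == 2:
        return math.exp(-stat / 2.0)             # chi-square survival function, 2 df
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


# First site of the additive example: llh(M1), llh(M4) and llh(M5).
llh_m1, llh_m4, llh_m5 = 3242.214834, 3243.019888, 3243.115326
print("M1vM5", lrt_pvalue(llh_m5, llh_m1, df=2))
print("M4vM5", lrt_pvalue(llh_m5, llh_m4, df=1))
```

Under these assumptions the two printed values agree with the M1vM5 and M4vM5 entries for the first site in the '''.Pvalues''' example to about five decimal places; the small remaining difference comes from the rounding of the likelihoods in the '''.res''' file.&lt;br /&gt;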
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the allele, one from population 1 and one from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=911</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=911"/>
		<updated>2019-03-23T05:40:46Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Input Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program '''asaMap''', a tool for ancestry-specific association mapping in large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from GitHub:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. If you are on a Mac, use curl to download the source.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its '''.Q and .P files''', respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual - '''it should NOT have a column of 1s for the intercept (the intercept is included automatically)'''. This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotype file; has to be a plain text file with as many rows as the .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates; plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model: whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression: whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A '''.res''' file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be generated with a likelihood ratio test between the two desired models.&lt;br /&gt;
An R script '''getPvalues.R''' is provided that makes it easy to obtain P-values from the '''.res''' file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix '''.Pvalues''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the allele, one from population 1 and one from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=Main_Page&amp;diff=910</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=Main_Page&amp;diff=910"/>
		<updated>2019-03-23T05:13:41Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* fastNGSAdmix */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= [http://www.popgen.dk/angsd ANGSD] =&lt;br /&gt;
Analysis of Next Generation Sequencing Data&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
[sequence data]-&amp;gt;[genotype;likelihoods]&lt;br /&gt;
[genotype;likelihoods]-&amp;gt;[genotype;probabilities]&lt;br /&gt;
[sequence files|bam files;SOAP files{bg:orange}]-&amp;gt;[sequence data]&lt;br /&gt;
[glf files|glfv3;soapSNP{bg:orange}]-&amp;gt;[genotype;likelihoods]&lt;br /&gt;
[genotype prob|beagle output{bg:orange}]-&amp;gt;[genotype;probabilities]&lt;br /&gt;
&amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= [[NgsAdmix]] =&lt;br /&gt;
[[File:NgsAdmix.png|thumb|NGSadmix]]&lt;br /&gt;
Infer the ancestry proportions from low depth NGS data. The principle is the same as in other software such as FRAPPE and ADMIXTURE; however, ngsAdmix also works when there is uncertainty in your data. This makes it ideal for medium and low depth sequencing data where many genotypes cannot be called without introducing errors or ascertainment bias.&lt;br /&gt;
&lt;br /&gt;
= [[fastNGSadmix]] =&lt;br /&gt;
&lt;br /&gt;
Program for estimating admixture proportions and doing principal component analysis of a single NGS sample.&lt;br /&gt;
&lt;br /&gt;
= [[asaMap]] =&lt;br /&gt;
&lt;br /&gt;
Program for doing ancestry-specific association mapping in admixed populations, working with genotypes.&lt;br /&gt;
&lt;br /&gt;
= [[NgsRelate]] =&lt;br /&gt;
A software package for inferring relatedness and inbreeding between pairs of individuals from NGS data. This is a new version.&lt;br /&gt;
&lt;br /&gt;
= [[Relate]]=&lt;br /&gt;
[[File:relate.png|thumb|left|Relate. Inferred IBD sharing across a chromosome for a sib pair estimated using affy 500k data]]&lt;br /&gt;
&lt;br /&gt;
This method estimates the probability of sharing alleles identical by descent (IBD) across the genome and can also be used for mapping disease loci using distantly related individuals. To accommodate LD, the method needs SNP data for several individuals in order to estimate the allele frequencies and the pairwise LD. The method returns the posterior probabilities of the IBD states across the genome and the overall IBD sharing.&lt;br /&gt;
&lt;br /&gt;
= [[RelateAdmix]] =&lt;br /&gt;
A software package for inferring relatedness between pairs of individuals even if the individuals are admixed.&lt;br /&gt;
&lt;br /&gt;
= [[BAMSE]] =&lt;br /&gt;
&lt;br /&gt;
= [[CATS]] =&lt;br /&gt;
&lt;br /&gt;
[[File:catsFig2.png|thumb|left|joint power of a non-symmetric two stage GWA design]]&lt;br /&gt;
R package for power estimation for a two-stage genome-wide association design. This is a modification of the code from Skol et al. 2006, Nat Genet, so that the relative risk, case-control ratios and allele frequencies are allowed to vary between stages.&lt;br /&gt;
&lt;br /&gt;
= [[IBSrelate]] =&lt;br /&gt;
= [[LDadmix]] = &lt;br /&gt;
= [[R scripts]] =&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=Main_Page&amp;diff=909</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=Main_Page&amp;diff=909"/>
		<updated>2019-03-23T05:12:17Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
&lt;br /&gt;
= [http://www.popgen.dk/angsd ANGSD] =&lt;br /&gt;
Analysis of Next Generation Sequencing Data&lt;br /&gt;
&lt;br /&gt;
&amp;lt;classdiagram type=&amp;quot;dir:LR&amp;quot;&amp;gt;&lt;br /&gt;
[sequence data]-&amp;gt;[genotype;likelihoods]&lt;br /&gt;
[genotype;likelihoods]-&amp;gt;[genotype;probabilities]&lt;br /&gt;
[sequence files|bam files;SOAP files{bg:orange}]-&amp;gt;[sequence data]&lt;br /&gt;
[glf files|glfv3;soapSNP{bg:orange}]-&amp;gt;[genotype;likelihoods]&lt;br /&gt;
[genotype prob|beagle output{bg:orange}]-&amp;gt;[genotype;probabilities]&lt;br /&gt;
&amp;lt;/classdiagram&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= [[NgsAdmix]] =&lt;br /&gt;
[[File:NgsAdmix.png|thumb|NGSadmix]]&lt;br /&gt;
Infer the ancestry proportions from low depth NGS data. The principle is the same as in other software such as FRAPPE and ADMIXTURE; however, ngsAdmix also works when you have uncertainty in your data. This makes it ideal for medium and low depth sequencing data where many genotypes cannot be called without introducing errors or ascertainment bias.&lt;br /&gt;
&lt;br /&gt;
= [[fastNGSAdmix]] =&lt;br /&gt;
&lt;br /&gt;
Program for estimating admixture proportions and doing principal component analysis of a single NGS sample.&lt;br /&gt;
&lt;br /&gt;
= [[asaMap]] =&lt;br /&gt;
&lt;br /&gt;
Program for doing ancestry-specific association mapping in admixed populations, working with genotypes.&lt;br /&gt;
&lt;br /&gt;
= [[NgsRelate]] =&lt;br /&gt;
A software package for inferring relatedness and inbreeding between pairs of individuals from NGS data. This is a new version.&lt;br /&gt;
&lt;br /&gt;
= [[Relate]]=&lt;br /&gt;
[[File:relate.png|thumb|left|Relate. Inferred IBD sharing across a chromosome for a sib pair estimated using affy 500k data]]&lt;br /&gt;
&lt;br /&gt;
This method estimates the probability of sharing alleles identical by descent (IBD) across the genome and can also be used for mapping disease loci using distantly related individuals. To accommodate LD, the method needs SNP data for several individuals in order to estimate the allele frequencies and the pairwise LD. The method returns the posterior probabilities of the IBD states across the genome and the overall IBD sharing.&lt;br /&gt;
&lt;br /&gt;
= [[RelateAdmix]] =&lt;br /&gt;
A software package for inferring relatedness between pairs of individuals even if the individuals are admixed.&lt;br /&gt;
&lt;br /&gt;
= [[BAMSE]] =&lt;br /&gt;
&lt;br /&gt;
= [[CATS]] =&lt;br /&gt;
&lt;br /&gt;
[[File:catsFig2.png|thumb|left|joint power of a non-symmetric two stage GWA design]]&lt;br /&gt;
R package for power estimation for a two-stage genome-wide association design. This is a modification of the code from Skol et al. 2006, Nat Genet, so that the relative risk, case-control ratios and allele frequencies are allowed to vary between stages.&lt;br /&gt;
&lt;br /&gt;
= [[IBSrelate]] =&lt;br /&gt;
= [[LDadmix]] = &lt;br /&gt;
= [[R scripts]] =&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=908</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=908"/>
		<updated>2019-03-03T19:15:39Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program '''asaMap''', a tool for doing ancestry-specific association mapping for large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose '''.Q and .P files''', respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
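&lt;br /&gt;
The row-count and intercept requirements described above can be checked before running asaMap. The following is a minimal Python sketch, not part of asaMap; the function names and the file paths passed to them are hypothetical, and it assumes whitespace-delimited text files as in the examples above.&lt;br /&gt;
&lt;br /&gt;
```python
from pathlib import Path


def read_rows(path):
    """Return the non-empty lines of a whitespace-delimited text file as lists of fields."""
    return [line.split() for line in Path(path).read_text().splitlines() if line.strip()]


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems found; an empty list means the files look consistent."""
    problems = []
    n_fam = len(read_rows(fam_path))

    # The phenotype file needs one row per individual in the .fam file.
    n_pheno = len(read_rows(pheno_path))
    if n_pheno != n_fam:
        problems.append(f"phenotype file has {n_pheno} rows, .fam file has {n_fam}")

    if cov_path is not None:
        cov = read_rows(cov_path)
        if len(cov) != n_fam:
            problems.append(f"covariate file has {len(cov)} rows, .fam file has {n_fam}")
        n_cols = len(cov[0]) if cov else 0
        if any(len(row) != n_cols for row in cov):
            problems.append("covariate file has rows with differing numbers of columns")
        else:
            # An all-ones column would duplicate the automatically added intercept.
            for j in range(n_cols):
                if all(float(row[j]) == 1.0 for row in cov):
                    problems.append(f"covariate column {j + 1} is all 1s (intercept is added automatically)")
    return problems
```
&lt;br /&gt;
The check cannot verify that individuals are sorted in the same order as in the .fam file, since the phenotype and covariate files carry no identifiers.&lt;br /&gt;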
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate, same number of rows as .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Give seed, for generation of starting values of coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
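When asaMap is called from a script, the options listed above can be assembled programmatically. The following Python sketch is only an illustration: the function name and its keyword arguments are hypothetical, and it uses only flags documented in the list above (-p, -o, -y, -Q, -f, -c, -m, -l, -P).&lt;br /&gt;
&lt;br /&gt;
```python
def build_asamap_command(plink_prefix, out_prefix, pheno, q_file, p_file,
                         cov=None, recessive=False, logistic=False, threads=None):
    """Assemble an asaMap argument list from the documented command-line flags."""
    # Required arguments: plink prefix, output prefix, phenotypes,
    # admixture proportions (.Q) and allele frequencies (.P).
    cmd = ["./asaMap", "-p", plink_prefix, "-o", out_prefix,
           "-y", pheno, "-Q", q_file, "-f", p_file]
    # Optional arguments are only appended when they differ from the defaults.
    if cov is not None:
        cmd += ["-c", cov]
    if recessive:
        cmd += ["-m", "1"]
    if logistic:
        cmd += ["-l", "1"]
    if threads is not None:
        cmd += ["-P", str(threads)]
    return cmd
```
&lt;br /&gt;
The returned list can be passed to subprocess.run(cmd, check=True) from the Python standard library; passing a list rather than a single string avoids shell quoting problems with file names.&lt;br /&gt;
&lt;br /&gt;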
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A '''.res''' file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script '''getPvalues.R''' is provided that makes it easy to obtain P-values from the '''.res''' file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix '''.Pvalues''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
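&lt;br /&gt;
To illustrate the likelihood ratio test, the following Python sketch computes a P-value from two llh columns of the '''.res''' file. It is not the code of getPvalues.R. It assumes that the llh columns are negative log-likelihoods (the example values above suggest this, since the model with the fewest parameters has the largest value) and that the degrees of freedom equal the difference in the number of effect parameters between the two models. The function names are hypothetical, and only 1 or 2 degrees of freedom are handled.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution (closed forms for df 1 and 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is handled in this sketch")


def lrt_pvalue(llh_null, llh_alt, df):
    """Likelihood ratio test P-value, assuming both inputs are negative log-likelihoods."""
    # Twice the drop in negative log-likelihood from the null to the alternative model.
    stat = 2.0 * (llh_null - llh_alt)
    # Guard against tiny negative values caused by numerical noise.
    return chi2_sf(max(stat, 0.0), df)


# M1 against M5 for the first site in the example above (2 degrees of freedom).
p_m1_vs_m5 = lrt_pvalue(3243.115326, 3242.214834, 2)
```
&lt;br /&gt;
For the first site this gives approximately 0.4064, in agreement with the M1vM5 column of the example output above.&lt;br /&gt;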
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=907</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=907"/>
		<updated>2019-03-03T19:15:14Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program asaMap, a tool for doing ancestry-specific association mapping for large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose '''.Q and .P files''', respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate, same number of rows as .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Give seed, for generation of starting values of coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A '''.res''' file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script '''getPvalues.R''' is provided that makes it easy to obtain P-values from the '''.res''' file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix '''.Pvalues''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=906</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=906"/>
		<updated>2019-03-03T19:12:03Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose '''.Q and .P files''', respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
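&lt;br /&gt;
Because the phenotype file, the covariate file and the .fam file must all have the same number of rows, it can be worth checking this before a run. The following is a minimal Python sketch and not part of asaMap; the function names and the file names in the usage comment are hypothetical:&lt;br /&gt;

```python
def count_rows(path):
    """Count the non-empty lines of a plain text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_row_counts(fam, pheno, cov=None):
    """Return the shared row count, or raise ValueError on a mismatch."""
    counts = {fam: count_rows(fam), pheno: count_rows(pheno)}
    if cov is not None:
        counts[cov] = count_rows(cov)
    if len(set(counts.values())) != 1:
        raise ValueError("row counts differ: %r" % counts)
    return counts[fam]


# Hypothetical usage:
# check_row_counts("plinkFile.fam", "pheno.files", "covs.txt")
```

The check only compares row counts; it cannot detect that the individuals are listed in a different order than in the .fam file.&lt;br /&gt;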
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data; linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A '''.res''' file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
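&lt;br /&gt;
For further processing outside R, the '''.res''' file can be read with a few lines of Python. This is only a sketch: it assumes the file is whitespace-separated with a single header line, as in the examples above, and that the literal "..." line shown there is not present in a real file. The function name is hypothetical:&lt;br /&gt;

```python
def read_res(path):
    """Read a whitespace-separated .res file into a list of dicts."""
    rows = []
    header = None
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if header is None:
                header = fields  # first non-blank line holds the column names
                continue
            row = {}
            for i, (key, value) in enumerate(zip(header, fields)):
                # Chromo and Position stay as text; the rest become floats.
                # float() also accepts the string "nan".
                row[key] = value if i in (0, 1) else float(value)
            rows.append(row)
    return rows
```

Each row is then a dictionary keyed by the column names, for example row["llh(M1)"] for the additive model or row["llh(R1)"] for the recessive model.&lt;br /&gt;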
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script '''getPvalues.R''' is provided that makes it easy to obtain P-values from the '''.res''' file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix '''.Pvalues''':&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
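&lt;br /&gt;
The P-values in the example rows can be reproduced by hand as a sanity check. The sketch below is not the provided getPvalues.R. It assumes, judging from the example numbers, that the llh columns hold negative log-likelihoods (smaller means a better fit), so that the test statistic is twice the difference between the llh of the restricted model and the llh of the fuller model, compared against a chi-square distribution with as many degrees of freedom as there are constrained parameters. Only 1 and 2 degrees of freedom are handled, because these have simple closed forms; the function name is hypothetical:&lt;br /&gt;

```python
import math


def lrt_pvalue(llh_full, llh_restricted, df):
    """Chi-square tail probability of a likelihood ratio test.

    Both llh arguments are taken to be negative log-likelihoods,
    an assumption based on the example .res rows.
    """
    stat = 2.0 * (llh_restricted - llh_full)
    stat = max(stat, 0.0)  # guard against tiny negative differences
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


# Third example row: M0 against M1 (one extra parameter in M0)
p_m0_m1 = lrt_pvalue(3234.025418, 3241.930891, df=1)
# First example row: M1 against M5 (two extra parameters in M1)
p_m1_m5 = lrt_pvalue(3242.214834, 3243.115326, df=2)
print(p_m0_m1, p_m1_m5)
```

With the rounded likelihoods from the table, the two values come out close to 7.0e-05 and 0.406, in line with the M0vM1 and M1vM5 columns of the example '''.Pvalues''' file.&lt;br /&gt;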
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=905</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=905"/>
		<updated>2019-03-03T19:10:11Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data; linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
beta_1 and beta_2 are the effects of the assumed effect allele in population 1 and 2, respectively. beta_m is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. delta_1 and delta_2 are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=904</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=904"/>
		<updated>2019-03-03T19:09:43Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Prefix of the binary PLINK files, i.e. without the .bed/.bim/.fam suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;&lt;br /&gt;
Output prefix: results are written to a .res file and the log to a .log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;&lt;br /&gt;
Phenotype file: a plain text file with as many rows as the .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)&lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)&lt;br /&gt;
Admixture proportions for source population 1, i.e. the first column of the .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates: a plain text file with one column per covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; the intercept is added automatically.&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;&lt;br /&gt;
Genotype model: additive or recessive (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;&lt;br /&gt;
Regression type. Logistic regression is for binary phenotypes, linear regression is for quantitative phenotypes (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;&lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;&lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;&lt;br /&gt;
Convergence tolerance: the EM algorithm stops when the change in likelihood between iterations falls below this value (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;&lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;&lt;br /&gt;
Number of threads to be used for the analysis. Each thread writes to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;&lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;&lt;br /&gt;
Whether to run the M0/R0 model, which models the effect of the other allele. Analyses are faster when M0/R0 is skipped (0: no, 1: yes - default: 1).&lt;br /&gt;
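&lt;br /&gt;
The row-count and intercept requirements above can be checked before starting a run. The following is only a minimal sketch in Python and is not part of asaMap; the function names data_rows and check_inputs are hypothetical, and the files are assumed to be whitespace-delimited plain text without header lines:&lt;br /&gt;

```python
def data_rows(text):
    """Return the non-empty lines of a plain text file as lists of fields."""
    return [line.split() for line in text.splitlines() if line.strip()]


def check_inputs(fam_text, pheno_text, cov_text=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it only mirrors the requirements stated above
    (same number of rows as the .fam file, no intercept column).
    """
    problems = []
    n_fam = len(data_rows(fam_text))
    n_pheno = len(data_rows(pheno_text))
    if n_pheno != n_fam:
        problems.append(f"phenotype file has {n_pheno} rows, .fam has {n_fam}")
    if cov_text is not None:
        cov = data_rows(cov_text)
        if len(cov) != n_fam:
            problems.append(f"covariate file has {len(cov)} rows, .fam has {n_fam}")
        # A column consisting only of 1s would duplicate the intercept
        # that asaMap adds automatically.
        n_cols = min((len(row) for row in cov), default=0)
        for j in range(n_cols):
            if all(float(row[j]) == 1.0 for row in cov):
                problems.append(f"covariate column {j + 1} is all 1s (intercept)")
    return problems
```

The function takes file contents as strings, so it can be called as check_inputs(open("plinkFile.fam").read(), open("pheno.files").read()) with the file names from the example command.&lt;br /&gt;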
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
asaMap writes a .res file with the likelihood of each model and the estimated coefficients of each model. For the additive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
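&lt;br /&gt;
For downstream analysis the .res file can be read into any scripting language. The following is a minimal Python sketch, assuming the file is whitespace-delimited with a single header line as in the excerpts above; the function name parse_res and the column typing are assumptions, not part of asaMap:&lt;br /&gt;

```python
INT_COLUMNS = {"Position", "nInd"}  # assumed to hold integers
TEXT_COLUMNS = {"Chromo"}           # kept as text, in case of names like "X"


def parse_res(text):
    """Parse the contents of a .res file into a list of dicts, one per site.

    All remaining columns are converted with float(), so "nan" entries
    (models that could not be fitted) become float NaN values.
    """
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            raise ValueError("row does not match header: " + line)
        row = {}
        for name, value in zip(header, fields):
            if name in TEXT_COLUMNS:
                row[name] = value
            elif name in INT_COLUMNS:
                row[name] = int(value)
            else:
                row[name] = float(value)
        rows.append(row)
    return rows
```

Each site is then available by column name, for example row["llh(M1)"] for the additive output or row["llh(R1)"] for the recessive output.&lt;br /&gt;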
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between two of the models.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
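&lt;br /&gt;
The P-values in the example can be reproduced by hand. The example values are consistent with the llh columns holding negative log-likelihoods and with a chi-square test whose degrees of freedom equal the difference in free parameters between the two models (for instance 2 for M1vM5 and 1 for M1vM2). The following Python sketch rests on those two assumptions; it is not the code of getPvalues.R, and the function names are hypothetical:&lt;br /&gt;

```python
import math


def chi2_sf(stat, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    if df not in range(1, 101):
        raise ValueError("df must be an integer between 1 and 100")
    h = max(stat, 0.0) / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(h))  # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-h)             # closed form for df = 2
    # Incomplete gamma recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for _ in range((df - 1) // 2):
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt_pvalue(nll_larger, nll_smaller, df):
    """Likelihood ratio test between a larger model and a model nested in it.

    Both arguments are assumed to be negative log-likelihoods, so the
    statistic is 2 * (nll_smaller - nll_larger). NaN inputs give NaN.
    """
    if math.isnan(nll_larger) or math.isnan(nll_smaller):
        return float("nan")
    return chi2_sf(2.0 * (nll_smaller - nll_larger), df)
```

For the first site in the example, lrt_pvalue(3242.214834, 3243.115326, 2) gives approximately 0.40637, which agrees with the M1vM5 column.&lt;br /&gt;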
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (beta_1, beta_2, delta_1) in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (beta_1, beta_2) in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| beta_1=0, beta_2 in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| beta_1 in R, beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| beta_1=beta_2 in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| beta_1=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (beta_1, beta_m, beta_2) in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| beta_1 in R, beta_m=beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| beta_1=beta_m in R, beta_2 in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| beta_1 in R, beta_m=beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| beta_1=beta_m=0, beta_2 in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| beta_1=beta_m=beta_2 in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| beta_1=beta_m=beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
beta_1 and beta_2 are the effects of the assumed effect allele in population 1 and 2, respectively. beta_m is the recessive effect of carrying two copies of the effect allele, one from population 1 and one from population 2. delta_1 and delta_2 are the effects of the assumed non-effect allele in population 1 and 2, respectively.&lt;br /&gt;
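&lt;br /&gt;
When two of these models are compared with a likelihood ratio test, the degrees of freedom can be taken as the difference in free parameters, which can be read off the Parameters column. The following Python sketch only transcribes those counts from the two tables; the names FREE_PARAMETERS and lrt_df are hypothetical, and the function does not verify that one model is actually nested in the other:&lt;br /&gt;

```python
# Free parameters per model, transcribed from the Parameters column of the
# two tables above (e.g. "(beta_1, beta_2, delta_1) in R^3" gives 3).
FREE_PARAMETERS = {
    "M0": 3, "M1": 2, "M2": 1, "M3": 1, "M4": 1, "M5": 0,
    "R0": 5, "R1": 3, "R2": 2, "R3": 2, "R4": 1, "R5": 1, "R6": 1, "R7": 0,
}


def lrt_df(larger, smaller):
    """Degrees of freedom for testing `smaller` against `larger`.

    Only the parameter counts are compared; whether the smaller model is
    truly nested in the larger one has to be checked against the tables.
    """
    if larger[0] != smaller[0]:
        raise ValueError("additive (M) and recessive (R) models cannot be mixed")
    df = FREE_PARAMETERS[larger] - FREE_PARAMETERS[smaller]
    if df not in range(1, 6):
        raise ValueError(larger + " has no more free parameters than " + smaller)
    return df
```

For example, lrt_df("M1", "M5") is 2 and lrt_df("R1", "R7") is 3, while a pair such as M2 and M3, which have the same number of free parameters, is rejected.&lt;br /&gt;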
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=903</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=903"/>
		<updated>2019-03-03T18:58:49Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a MAC.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary PLINK format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file must also be provided. It should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admixture proportions ($COV is a shell variable holding the path to the covariate file)&lt;br /&gt;
./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Prefix of the binary PLINK files, i.e. without the .bed/.bim/.fam suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;&lt;br /&gt;
Output prefix: results are written to a .res file and the log to a .log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;&lt;br /&gt;
Phenotype file: a plain text file with as many rows as the .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)&lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)&lt;br /&gt;
Admixture proportions for source population 1, i.e. the first column of the .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates: a plain text file with one column per covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; the intercept is added automatically.&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;&lt;br /&gt;
Genotype model: additive or recessive (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;&lt;br /&gt;
Regression type. Logistic regression is for binary phenotypes, linear regression is for quantitative phenotypes (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;&lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;&lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;&lt;br /&gt;
Convergence tolerance: the EM algorithm stops when the change in likelihood between iterations falls below this value (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;&lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;&lt;br /&gt;
Number of threads to be used for the analysis. Each thread writes to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;&lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;&lt;br /&gt;
Whether to run the M0/R0 model, which models the effect of the other allele. Analyses are faster when M0/R0 is skipped (0: no, 1: yes - default: 1).&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
asaMap writes a .res file with the likelihood of each model and the estimated coefficients of each model. For the additive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between two of the models.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes &lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0&lt;br /&gt;
| (\beta_1,\beta_2,\delta_1)\in R^3&lt;br /&gt;
| effect of non-assumed effect allele&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M1&lt;br /&gt;
| (\beta_1,\beta_2)\in R^2&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| M2&lt;br /&gt;
| \beta_1=0,\beta_2\in R&lt;br /&gt;
| no effect in population 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M3&lt;br /&gt;
| \beta_1\in R, \beta_2=0&lt;br /&gt;
| no effect in population 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M4&lt;br /&gt;
| \beta_1=\beta_2\in R&lt;br /&gt;
| same effect in both populations&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| M5&lt;br /&gt;
| \beta_1=\beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
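{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect Parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0&lt;br /&gt;
| (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5&lt;br /&gt;
| recessive effect of non-assumed effect alleles&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R1&lt;br /&gt;
| (\beta_1,\beta_m,\beta_2)\in R^3&lt;br /&gt;
| population specific effects&lt;br /&gt;
| 3&lt;br /&gt;
|-&lt;br /&gt;
| R2&lt;br /&gt;
| \beta_1\in R,\beta_m=\beta_2\in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R3&lt;br /&gt;
| \beta_1=\beta_m\in R,\beta_2\in R&lt;br /&gt;
| same effect when one or both variant alleles are from pop 1&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
| R4&lt;br /&gt;
| \beta_1\in R,\beta_m=\beta_2=0&lt;br /&gt;
| only an effect when both variant alleles are from pop 1&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R5&lt;br /&gt;
| \beta_1=\beta_m=0, \beta_2\in R&lt;br /&gt;
| only an effect when both variant alleles are from pop 2&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R6&lt;br /&gt;
| \beta_1=\beta_m=\beta_2\in R&lt;br /&gt;
| same effect regardless of ancestry&lt;br /&gt;
| 1&lt;br /&gt;
|-&lt;br /&gt;
| R7&lt;br /&gt;
| \beta_1=\beta_m=\beta_2=0&lt;br /&gt;
| no effect in any population&lt;br /&gt;
| 0&lt;br /&gt;
|}&lt;br /&gt;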
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=902</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=902"/>
		<updated>2019-03-03T18:56:05Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a MAC.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary PLINK format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file must also be provided. It should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admixture proportions ($COV is a shell variable holding the path to the covariate file)&lt;br /&gt;
./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Prefix of the binary PLINK files, i.e. without the .bed/.bim/.fam suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;&lt;br /&gt;
Output prefix: results are written to a .res file and the log to a .log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;&lt;br /&gt;
Phenotype file: a plain text file with as many rows as the .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)&lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)&lt;br /&gt;
Admixture proportions for source population 1, i.e. the first column of the .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates: a plain text file with one column per covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; the intercept is added automatically.&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;&lt;br /&gt;
Genotype model: additive or recessive (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;&lt;br /&gt;
Regression type. Logistic regression is for binary phenotypes, linear regression is for quantitative phenotypes (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;&lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;&lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;&lt;br /&gt;
Convergence tolerance: the EM algorithm stops when the change in likelihood between iterations falls below this value (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;&lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;&lt;br /&gt;
Number of threads to be used for the analysis. Each thread writes to a temporary file in the path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;&lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;&lt;br /&gt;
Whether to run the M0/R0 model, which models the effect of the other allele. Analyses are faster when M0/R0 is skipped (0: no, 1: yes - default: 1).&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
asaMap writes a .res file with the likelihood of each model and the estimated coefficients of each model. For the additive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models being compared.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
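&lt;br /&gt;
The likelihood ratio test described above can be sketched in Python. This is not the getPvalues.R script itself. It is a minimal sketch that assumes the llh columns hold negative log-likelihoods and that the degrees of freedom equal the difference in the number of effect parameters between the two models. The function name lrt_pvalue is made up for illustration. Under these assumptions the sketch reproduces the M1vM5 and M4vM5 values of the first example row.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def lrt_pvalue(llh_null, llh_alt, df):
    """P-value of a likelihood ratio test between two nested models.

    Assumes both inputs are negative log-likelihoods (smaller is better).
    That is an inference from the example rows, not a documented fact.
    """
    stat = 2.0 * (llh_null - llh_alt)
    if math.isnan(stat):
        return float("nan")
    stat = max(stat, 0.0)  # guard against tiny negative values
    if df == 1:
        # chi-square survival function with 1 degree of freedom
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # chi-square survival function with 2 degrees of freedom
        return math.exp(-stat / 2.0)
    raise ValueError("only df of 1 or 2 is handled in this sketch")

# First example site: M1 (2 effect parameters) against M5 (0 effect parameters).
p_m1_v_m5 = lrt_pvalue(3243.115326, 3242.214834, df=2)
# Same site: M4 (1 effect parameter) against M5.
p_m4_v_m5 = lrt_pvalue(3243.115326, 3243.019888, df=1)
print(round(p_m1_v_m5, 4), round(p_m4_v_m5, 4))
```
&lt;br /&gt;
Sites where a model did not converge (nan in the .res file) give nan here, which corresponds to the NA entries in the .Pvalues file.&lt;br /&gt;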
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| model | parameters | notes | #effect Parameters |&lt;br /&gt;
| --- | --- | --- | --- |&lt;br /&gt;
| M0 | (\beta_1,\beta_2,\delta_1)\in R^3 | effect of non-assumed effect allele | 1 |&lt;br /&gt;
| M1 | (\beta_1,\beta_2)\in R^2  | population specific effects | 2 |&lt;br /&gt;
| M2 | \beta_1=0,\beta_2\in R | no effect in population 1 | 1 |&lt;br /&gt;
| M3 | \beta_1\in R, \beta_2=0 | no effect in population 2 | 1 |&lt;br /&gt;
| M4 | \beta_1=\beta_2\in R | same effect in both populations | 1 |&lt;br /&gt;
| M5 | \beta_1=\beta_2=0 | no effect in any population | 0 |&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
| model | parameters | notes | #effect Parameters |&lt;br /&gt;
| --- | --- | --- | --- |&lt;br /&gt;
| R0 | (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5 | recessive effect of non-assumed effect alleles | 2 |&lt;br /&gt;
| R1 | (\beta_1,\beta_m,\beta_2)\in R^3 | population specific effects | 3 |&lt;br /&gt;
| R2 | \beta_1\in R,\beta_m=\beta_2\in R | same effect when one or both variant alleles are from pop 2 | 2 |&lt;br /&gt;
| R3 | \beta_1=\beta_m\in R,\beta_2\in R | same effect when one or both variant alleles are from pop 1 | 2 |&lt;br /&gt;
| R4 | \beta_1\in R,\beta_m=\beta_2=0 | only an effect when both variant alleles are from pop 1 | 1 |&lt;br /&gt;
| R5 | \beta_1=\beta_m=0, \beta_2\in R | only an effect when both variant alleles are from pop 2 | 1 |&lt;br /&gt;
| R6 | \beta_1=\beta_m=\beta_2\in R | same effect regardless of ancestry | 1 |&lt;br /&gt;
| R7 | \beta_1=\beta_m=\beta_2=0 | no effect in any population | 0 |&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=901</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=901"/>
		<updated>2019-03-03T18:55:34Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
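&lt;br /&gt;
The row-count and intercept requirements above can be checked before running asaMap. The following is a minimal Python sketch, not part of asaMap. The function names and the tiny inputs are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def count_rows(lines):
    """Count non-empty lines in an iterable of text lines."""
    return sum(1 for line in lines if line.strip())

def has_intercept_column(rows):
    """Return True if any column consists only of the value 1."""
    columns = list(zip(*rows))
    return any(all(float(v) == 1.0 for v in col) for col in columns)

def check_inputs(fam_lines, pheno_lines, cov_lines):
    """Return a list of problems found; an empty list means the files agree."""
    problems = []
    n_fam = count_rows(fam_lines)
    if count_rows(pheno_lines) != n_fam:
        problems.append("phenotype rows differ from .fam rows")
    cov_rows = [line.split() for line in cov_lines if line.strip()]
    if len(cov_rows) != n_fam:
        problems.append("covariate rows differ from .fam rows")
    if cov_rows and has_intercept_column(cov_rows):
        problems.append("covariate file contains a column of 1s")
    return problems

# Tiny made-up inputs; with real data pass open(path) for each argument.
fam = ["f1 i1 0 0 1 -9", "f2 i2 0 0 2 -9"]
pheno = ["-0.71", "0.30"]
cov = ["1 0.0127", "1 0.0109"]
print(check_inputs(fam, pheno, cov))
```
&lt;br /&gt;
The made-up covariate lines deliberately include a column of 1s, so the check reports it.&lt;br /&gt;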
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept), IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
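&lt;br /&gt;
When asaMap is run from a script, the options above can be assembled into an argument list. The following Python sketch uses only the options listed above. The function name asamap_command and its keyword names are made up for illustration, and the file names are the ones from the example command.&lt;br /&gt;
&lt;br /&gt;
```python
import shlex

def asamap_command(plink_prefix, out_prefix, pheno, q_file, p_file,
                   cov=None, recessive=False, logistic=False):
    """Build an asaMap argument list from the options documented on this page."""
    args = ["./asaMap", "-p", plink_prefix, "-o", out_prefix,
            "-y", pheno, "-Q", q_file, "-f", p_file]
    if cov is not None:
        args += ["-c", cov]      # optional covariate file
    if recessive:
        args += ["-m", "1"]      # 0: additive (default), 1: recessive
    if logistic:
        args += ["-l", "1"]      # 0: linear (default), 1: logistic
    return args

cmd = asamap_command("plinkFile", "out", "pheno.files",
                     "plinkFile.2.Q", "plinkFile.2.P", recessive=True)
print(shlex.join(cmd))
```
&lt;br /&gt;
The resulting list can be passed to subprocess.run(cmd, check=True) to start the analysis.&lt;br /&gt;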
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
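&lt;br /&gt;
The .res layout shown above can be read with a few lines of Python. This is a minimal sketch, assuming the file is whitespace-separated with a single header line as in the examples. The function name parse_res is made up, and the embedded text is a shortened column subset of the additive example.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def parse_res(lines):
    """Parse whitespace-separated .res lines into one dict per site."""
    rows = [line.split() for line in lines if line.strip()]
    header, body = rows[0], rows[1:]
    sites = []
    for fields in body:
        site = {"Chromo": fields[0], "Position": int(fields[1]), "nInd": int(fields[2])}
        # float() accepts the text nan, so models without a result become NaN values
        for name, value in zip(header[3:], fields[3:]):
            site[name] = float(value)
        sites.append(site)
    return sites

# Column subset of the additive example, shortened for readability.
example = """\
Chromo  Position  nInd  f1        f2        llh(M1)      llh(M5)      b1(M1)
1       9855422    1237  0.935997  0.537511  3242.214834  3243.115326  0.093018
1       10684283   1217  0.999990  0.509715  nan          3215.569371  nan
"""

sites = parse_res(example.splitlines())
# Keep only the positions where model M1 produced a likelihood.
usable = [s["Position"] for s in sites if not math.isnan(s["llh(M1)"])]
print(usable)
```
&lt;br /&gt;
With a real file, parse_res(open(path)) works the same way for both the additive and the recessive output, since the column names are taken from the header.&lt;br /&gt;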
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models being compared.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| model | parameters | notes | #effect Parameters |&lt;br /&gt;
| --- | --- | --- | --- |&lt;br /&gt;
| M0 | (\beta_1,\beta_2,\delta_1)\in R^3 | effect of non-assumed effect allele | 1 |&lt;br /&gt;
| M1 | (\beta_1,\beta_2)\in R^2  | population specific effects | 2 |&lt;br /&gt;
| M2 | \beta_1=0,\beta_2\in R | no effect in population 1 | 1 |&lt;br /&gt;
| M3 | \beta_1\in R, \beta_2=0 | no effect in population 2 | 1 |&lt;br /&gt;
| M4 | \beta_1=\beta_2\in R | same effect in both populations | 1 |&lt;br /&gt;
| M5 | \beta_1=\beta_2=0 | no effect in any population | 0 |&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
| model | parameters | notes | #effect Parameters |&lt;br /&gt;
| --- | --- | --- | --- |&lt;br /&gt;
| R0 | (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5 | recessive effect of non-assumed effect alleles | 2 |&lt;br /&gt;
| R1 | (\beta_1,\beta_m,\beta_2)\in R^3 | population specific effects | 3 |&lt;br /&gt;
| R2 | \beta_1\in R,\beta_m=\beta_2\in R | same effect when one or both variant alleles are from pop 2 | 2 |&lt;br /&gt;
| R3 | \beta_1=\beta_m\in R,\beta_2\in R | same effect when one or both variant alleles are from pop 1 | 2 |&lt;br /&gt;
| R4 | \beta_1\in R,\beta_m=\beta_2=0 | only an effect when both variant alleles are from pop 1 | 1 |&lt;br /&gt;
| R5 | \beta_1=\beta_m=0, \beta_2\in R | only an effect when both variant alleles are from pop 2 | 1 |&lt;br /&gt;
| R6 | \beta_1=\beta_m=\beta_2\in R | same effect regardless of ancestry | 1 |&lt;br /&gt;
| R7 | \beta_1=\beta_m=\beta_2=0 | no effect in any population | 0 |&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=900</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=900"/>
		<updated>2019-03-03T18:54:44Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options is printed when asaMap is run without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept), IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models being compared.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175   6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
| model | parameters | notes | #effect Parameters |&lt;br /&gt;
| --- | --- | --- | --- |&lt;br /&gt;
| M0 | (\beta_1,\beta_2,\delta_1)\in R^3 | effect of non-assumed effect allele | 1 |&lt;br /&gt;
| M1 | (\beta_1,\beta_2)\in R^2  | population specific effects | 2 |&lt;br /&gt;
| M2 | \beta_1=0,\beta_2\in R | no effect in population 1 | 1 |&lt;br /&gt;
| M3 | \beta_1\in R, \beta_2=0 | no effect in population 2 | 1 |&lt;br /&gt;
| M4 | \beta_1=\beta_2\in R | same effect in both populations | 1 |&lt;br /&gt;
| M5 | \beta_1=\beta_2=0 | no effect in any population | 0 |&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
| model | parameters | notes | #effect Parameters |&lt;br /&gt;
| --- | --- | --- | --- |&lt;br /&gt;
| R0 | (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5 | recessive effect of non-assumed effect alleles | 2 |&lt;br /&gt;
| R1 | (\beta_1,\beta_m,\beta_2)\in R^3 | population specific effects | 3 |&lt;br /&gt;
| R2 | \beta_1\in R,\beta_m=\beta_2\in R | same effect when one or both variant alleles are from pop 2 | 2 |&lt;br /&gt;
| R3 | \beta_1=\beta_m\in R,\beta_2\in R | same effect when one or both variant alleles are from pop 1 | 2 |&lt;br /&gt;
| R4 | \beta_1\in R,\beta_m=\beta_2=0 | only an effect when both variant alleles are from pop 1 | 1 |&lt;br /&gt;
| R5 | \beta_1=\beta_m=0, \beta_2\in R | only an effect when both variant alleles are from pop 2 | 1 |&lt;br /&gt;
| R6 | \beta_1=\beta_m=\beta_2\in R | same effect regardless of ancestry | 1 |&lt;br /&gt;
| R7 | \beta_1=\beta_m=\beta_2=0 | no effect in any population | 0 |&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=899</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=899"/>
		<updated>2019-03-03T18:54:33Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
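&lt;br /&gt;
Because the phenotype and covariate files must have the same number of rows as the .fam file, it can be worth checking this before starting a run. The following is a minimal sketch and not part of asaMap: the function names count_rows and same_rows_as_fam are hypothetical, and it assumes each file has one row per line.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of asaMap): print the number of rows in a text file.
count_rows() {
  awk 'END { print NR }' "$1"
}

# Succeeds only if every listed file has as many rows as the .fam file.
# Usage: same_rows_as_fam plinkFile.fam pheno.files [covariates.file]
same_rows_as_fam() {
  fam_rows=$(count_rows "$1")
  shift
  for f in "$@"; do
    if [ "$(count_rows "$f")" -ne "$fam_rows" ]; then
      echo "row count mismatch: $f (expected $fam_rows rows)"
      return 1
    fi
  done
  return 0
}
```
&lt;br /&gt;
The check only compares row counts; it cannot tell whether the individuals are sorted in the same order as in the .fam file.&lt;br /&gt;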
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options can be printed by running asaMap without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
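&lt;br /&gt;
The -a option expects only the admixture proportions for source pop1, that is the first column of the .Q file. A minimal sketch of extracting that column is given here; the function name q_first_column is hypothetical, and the sketch assumes the .Q file is whitespace-separated with one individual per row.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of asaMap): print the first column
# of a whitespace-separated .Q file, one value per individual.
q_first_column() {
  awk '{ print $1 }' "$1"
}
```
&lt;br /&gt;
The printed column can be saved to a file, which is then given to -a instead of giving the full .Q file to -Q.&lt;br /&gt;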
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept), IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be generated by performing a likelihood ratio test between the two desired models.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
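&lt;br /&gt;
To illustrate what the likelihood ratio test does, the sketch below computes the P-value for a comparison with two degrees of freedom, such as M1 against M5. It is not the code of getPvalues.R: the function name lrt_pvalue_df2 is hypothetical, and it assumes that the llh columns hold negative log-likelihoods, which is consistent with the example values above, where the less restricted model has the smaller value. Under that assumption the test statistic is 2 * (llh of the null model - llh of the alternative model), and for a chi-square distribution with two degrees of freedom the P-value is exp(-statistic / 2).&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of asaMap): P-value of a likelihood ratio
# test with 2 degrees of freedom.
# Arguments: llh value of the null model, then llh value of the alternative model.
# Assumes the llh values are negative log-likelihoods.
lrt_pvalue_df2() {
  awk -v l0="$1" -v l1="$2" 'BEGIN {
    stat = 2 * (l0 - l1)             # likelihood ratio test statistic
    printf "%.4f\n", exp(-stat / 2)  # chi-square upper tail for 2 degrees of freedom
  }'
}

# M1 against M5 for the site at position 9855422 in the additive example
lrt_pvalue_df2 3243.115326 3242.214834
```
&lt;br /&gt;
This prints 0.4064, which agrees with the M1vM5 entry for that site. Comparisons with one degree of freedom need a chi-square distribution function, for example pchisq in R.&lt;br /&gt;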
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0 || (\beta_1,\beta_2,\delta_1)\in R^3 || effect of non-assumed effect allele || 1&lt;br /&gt;
|-&lt;br /&gt;
| M1 || (\beta_1,\beta_2)\in R^2 || population specific effects || 2&lt;br /&gt;
|-&lt;br /&gt;
| M2 || \beta_1=0,\beta_2\in R || no effect in population 1 || 1&lt;br /&gt;
|-&lt;br /&gt;
| M3 || \beta_1\in R, \beta_2=0 || no effect in population 2 || 1&lt;br /&gt;
|-&lt;br /&gt;
| M4 || \beta_1=\beta_2\in R || same effect in both populations || 1&lt;br /&gt;
|-&lt;br /&gt;
| M5 || \beta_1=\beta_2=0 || no effect in any population || 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0 || (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5 || recessive effect of non-assumed effect alleles || 2&lt;br /&gt;
|-&lt;br /&gt;
| R1 || (\beta_1,\beta_m,\beta_2)\in R^3 || population specific effects || 3&lt;br /&gt;
|-&lt;br /&gt;
| R2 || \beta_1\in R,\beta_m=\beta_2\in R || same effect when one or both variant alleles are from pop 2 || 2&lt;br /&gt;
|-&lt;br /&gt;
| R3 || \beta_1=\beta_m\in R,\beta_2\in R || same effect when one or both variant alleles are from pop 1 || 2&lt;br /&gt;
|-&lt;br /&gt;
| R4 || \beta_1\in R,\beta_m=\beta_2=0 || only an effect when both variant alleles are from pop 1 || 1&lt;br /&gt;
|-&lt;br /&gt;
| R5 || \beta_1=\beta_m=0, \beta_2\in R || only an effect when both variant alleles are from pop 2 || 1&lt;br /&gt;
|-&lt;br /&gt;
| R6 || \beta_1=\beta_m=\beta_2\in R || same effect regardless of ancestry || 1&lt;br /&gt;
|-&lt;br /&gt;
| R7 || \beta_1=\beta_m=\beta_2=0 || no effect in any population || 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=898</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=898"/>
		<updated>2019-03-03T18:54:11Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options can be printed by running asaMap without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept), IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be generated by performing a likelihood ratio test between the two desired models.&lt;br /&gt;
An Rscript &amp;quot;getPvalues.R&amp;quot; is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0 || (\beta_1,\beta_2,\delta_1)\in R^3 || effect of non-assumed effect allele || 1&lt;br /&gt;
|-&lt;br /&gt;
| M1 || (\beta_1,\beta_2)\in R^2 || population specific effects || 2&lt;br /&gt;
|-&lt;br /&gt;
| M2 || \beta_1=0,\beta_2\in R || no effect in population 1 || 1&lt;br /&gt;
|-&lt;br /&gt;
| M3 || \beta_1\in R, \beta_2=0 || no effect in population 2 || 1&lt;br /&gt;
|-&lt;br /&gt;
| M4 || \beta_1=\beta_2\in R || same effect in both populations || 1&lt;br /&gt;
|-&lt;br /&gt;
| M5 || \beta_1=\beta_2=0 || no effect in any population || 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Effect parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0 || (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5 || recessive effect of non-assumed effect alleles || 2&lt;br /&gt;
|-&lt;br /&gt;
| R1 || (\beta_1,\beta_m,\beta_2)\in R^3 || population specific effects || 3&lt;br /&gt;
|-&lt;br /&gt;
| R2 || \beta_1\in R,\beta_m=\beta_2\in R || same effect when one or both variant alleles are from pop 2 || 2&lt;br /&gt;
|-&lt;br /&gt;
| R3 || \beta_1=\beta_m\in R,\beta_2\in R || same effect when one or both variant alleles are from pop 1 || 2&lt;br /&gt;
|-&lt;br /&gt;
| R4 || \beta_1\in R,\beta_m=\beta_2=0 || only an effect when both variant alleles are from pop 1 || 1&lt;br /&gt;
|-&lt;br /&gt;
| R5 || \beta_1=\beta_m=0, \beta_2\in R || only an effect when both variant alleles are from pop 2 || 1&lt;br /&gt;
|-&lt;br /&gt;
| R6 || \beta_1=\beta_m=\beta_2\in R || same effect regardless of ancestry || 1&lt;br /&gt;
|-&lt;br /&gt;
| R7 || \beta_1=\beta_m=\beta_2=0 || no effect in any population || 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=897</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=897"/>
		<updated>2019-03-03T18:53:34Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Models */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options can be printed by running asaMap without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept), IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script, &amp;quot;getPvalues.R&amp;quot;, is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
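&lt;br /&gt;
The P-values in the example above can be reproduced by hand from the llh columns. The following is a minimal Python sketch, not the code of getPvalues.R. It assumes that the llh columns hold negative log-likelihoods (in the example output, models with more parameters have smaller values) and that the degrees of freedom equal the difference in the number of free parameters. The function name lrt_pvalue is made up for illustration.&lt;br /&gt;

```python
import math


def lrt_pvalue(llh_null, llh_alt, df):
    """Likelihood ratio test P-value from two negative log-likelihoods.

    llh_null: value for the restricted model (e.g. llh(M5))
    llh_alt:  value for the model with more parameters (e.g. llh(M1))
    df:       difference in number of free parameters (1 or 2 here)
    """
    if math.isnan(llh_null) or math.isnan(llh_alt):
        return float("nan")  # model not fitted at this site
    # Assumption: llh columns are negative log-likelihoods, so the
    # statistic is 2 * (null - alternative); clip tiny negative values.
    stat = max(2.0 * (llh_null - llh_alt), 0.0)
    if df == 1:
        # Chi-square survival function with 1 degree of freedom.
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # Chi-square survival function with 2 degrees of freedom.
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# First site of the additive example: M1 (two effects) vs M5 (no effect).
print(lrt_pvalue(3243.115326, 3242.214834, df=2))
# Same site: M4 (shared effect) vs M5 (no effect).
print(lrt_pvalue(3243.115326, 3243.019888, df=1))
```

For the first example site these evaluate to roughly 0.406 and 0.662, in line with the M1vM5 and M4vM5 columns of the .Pvalues example. For other degrees of freedom a general chi-square survival function would be needed.&lt;br /&gt;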
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
asaMap implements a range of linear models, making it possible to test specific hypotheses.&lt;br /&gt;
For the additive model there are 6 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| # Effect parameters&lt;br /&gt;
|-&lt;br /&gt;
| M0 || (\beta_1,\beta_2,\delta_1)\in R^3 || effect of non-assumed effect allele || 1&lt;br /&gt;
|-&lt;br /&gt;
| M1 || (\beta_1,\beta_2)\in R^2 || population specific effects || 2&lt;br /&gt;
|-&lt;br /&gt;
| M2 || \beta_1=0,\beta_2\in R || no effect in population 1 || 1&lt;br /&gt;
|-&lt;br /&gt;
| M3 || \beta_1\in R, \beta_2=0 || no effect in population 2 || 1&lt;br /&gt;
|-&lt;br /&gt;
| M4 || \beta_1=\beta_2\in R || same effect in both populations || 1&lt;br /&gt;
|-&lt;br /&gt;
| M5 || \beta_1=\beta_2=0 || no effect in any population || 0&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
For the recessive model there are 8 different models:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Model&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Parameters&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| Notes&lt;br /&gt;
! scope=&amp;quot;col&amp;quot;| # Effect parameters&lt;br /&gt;
|-&lt;br /&gt;
| R0 || (\beta_1,\beta_m,\beta_2,\delta_1,\delta_2)\in R^5 || recessive effect of non-assumed effect alleles || 2&lt;br /&gt;
|-&lt;br /&gt;
| R1 || (\beta_1,\beta_m,\beta_2)\in R^3 || population specific effects || 3&lt;br /&gt;
|-&lt;br /&gt;
| R2 || \beta_1\in R,\beta_m=\beta_2\in R || same effect when one or both variant alleles are from pop 2 || 2&lt;br /&gt;
|-&lt;br /&gt;
| R3 || \beta_1=\beta_m\in R,\beta_2\in R || same effect when one or both variant alleles are from pop 1 || 2&lt;br /&gt;
|-&lt;br /&gt;
| R4 || \beta_1\in R,\beta_m=\beta_2=0 || only an effect when both variant alleles are from pop 1 || 1&lt;br /&gt;
|-&lt;br /&gt;
| R5 || \beta_1=\beta_m=0, \beta_2\in R || only an effect when both variant alleles are from pop 2 || 1&lt;br /&gt;
|-&lt;br /&gt;
| R6 || \beta_1=\beta_m=\beta_2\in R || same effect regardless of ancestry || 1&lt;br /&gt;
|-&lt;br /&gt;
| R7 || \beta_1=\beta_m=\beta_2=0 || no effect in any population || 0&lt;br /&gt;
|}&lt;br /&gt;
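&lt;br /&gt;
The degrees of freedom for a likelihood ratio test between two nested models follow from the parameter columns above. A small Python sketch, with the free-parameter counts read off the tables; the dictionary and function names are illustrative, and the sketch does not check that the two models are actually nested:&lt;br /&gt;

```python
# Number of free effect-related parameters per model, read off the
# "parameters" column (e.g. M0 lies in R^3, M4 has one shared beta).
FREE_PARAMETERS = {
    "M0": 3, "M1": 2, "M2": 1, "M3": 1, "M4": 1, "M5": 0,
    "R0": 5, "R1": 3, "R2": 2, "R3": 2, "R4": 1, "R5": 1, "R6": 1, "R7": 0,
}


def degrees_of_freedom(full_model, restricted_model):
    """Difference in free parameters between two models of one family."""
    if full_model[0] != restricted_model[0]:
        raise ValueError("compare additive (M) or recessive (R) models only")
    df = FREE_PARAMETERS[full_model] - FREE_PARAMETERS[restricted_model]
    if not df > 0:
        raise ValueError("first model must have more free parameters")
    return df


print(degrees_of_freedom("M1", "M5"))  # 2
print(degrees_of_freedom("M0", "M1"))  # 1
print(degrees_of_freedom("R1", "R7"))  # 3
```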
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=896</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=896"/>
		<updated>2019-03-03T17:38:03Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Outputs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file must also be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
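&lt;br /&gt;
The row-count requirements above can be checked before running asaMap. A minimal Python sketch, assuming whitespace-separated plain text files with numeric covariates; the function name and the file names in the usage comment are hypothetical:&lt;br /&gt;

```python
def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems found in the phenotype/covariate files."""
    def rows(path):
        # One list of fields per non-empty line.
        with open(path) as handle:
            return [line.split() for line in handle if line.strip()]

    problems = []
    n_ind = len(rows(fam_path))
    if len(rows(pheno_path)) != n_ind:
        problems.append("phenotype file and .fam file differ in row count")
    if cov_path is not None:
        cov = rows(cov_path)
        if len(cov) != n_ind:
            problems.append("covariate file and .fam file differ in row count")
        n_cols = min((len(r) for r in cov), default=0)
        for j in range(n_cols):
            # asaMap adds the intercept itself, so flag a column of 1s.
            if all(float(r[j]) == 1.0 for r in cov):
                problems.append("covariate column %d is all 1s" % (j + 1))
    return problems


# Hypothetical usage:
# print(check_inputs("plinkFile.fam", "pheno.files", "covs.txt"))
```

An empty list means that none of the checked problems were found.&lt;br /&gt;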
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options can be shown by running asaMap without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script, &amp;quot;getPvalues.R&amp;quot;, is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=895</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=895"/>
		<updated>2019-03-03T17:37:22Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Outputs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file must also be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options can be shown by running asaMap without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two models of interest.&lt;br /&gt;
An R script, &amp;quot;getPvalues.R&amp;quot;, is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces a file with the suffix .Pvalues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5&lt;br /&gt;
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713&lt;br /&gt;
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114&lt;br /&gt;
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=894</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=894"/>
		<updated>2019-03-03T17:28:23Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Outputs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example!!&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its .Q and .P files, respectively, can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file must also be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The full list of options can be shown by running asaMap without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced. Here is an example for the additive model:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)&lt;br /&gt;
1       980552    2737  0.935997  0.937511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093&lt;br /&gt;
1       1068883   2717  0.999990  0.809715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084&lt;br /&gt;
1       1124663   2737  0.886692  0.388175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838&lt;br /&gt;
1       1171417   2736  0.999990  0.445701  nan          nan          nan          3239.320653  3239.524956  3239.641824  nan        nan        nan        -0.033530  -0.015845&lt;br /&gt;
1       1366830   2735  0.999990  0.374078  nan          nan          nan          3241.698019  3241.675158  3241.696793  nan        nan        nan        0.002135   0.007140&lt;br /&gt;
1       1450947   2738  0.659605  0.906222  3240.054094  3243.544587  3243.770254  3243.708934  3243.777517  3243.800524  -0.026101  0.044039   0.016671   -0.014242  -0.005544&lt;br /&gt;
1       1995211   2737  0.856699  0.982350  3235.516404  3242.070487  3242.928680  3242.571223  3242.756177  3242.941750  0.074805   -0.142018  -0.020892  0.039110   0.021462&lt;br /&gt;
1       2004098   2738  0.443711  0.815725  3241.253250  3242.382033  3243.741660  3242.955646  3243.532476  3243.800524  0.058767   -0.055806  -0.016451  0.041228   0.016158&lt;br /&gt;
1       2040898   2738  0.676808  0.610463  3242.664546  3243.371593  3243.574375  3243.801527  3243.787426  3243.800524  -0.024109  0.081087   0.047793   -0.001765  0.004108&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For the recessive model it looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)&lt;br /&gt;
1       980552    2737  0.935997  0.937511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338&lt;br /&gt;
1       1068883   2717  0.999990  0.809715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317&lt;br /&gt;
1       1124663   2737  0.886692  0.388175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227&lt;br /&gt;
1       1171417   2736  0.999990  0.445701  nan          nan          nan          nan          3238.750760  3239.274351  3238.288964  3239.641824  nan        nan        nan        nan        nan        nan        nan        -0.210643  -0.267111  -0.144645&lt;br /&gt;
1       1366830   2735  0.999990  0.374078  nan          nan          nan          nan          3241.645871  3241.199416  3241.338290  3241.696793  nan        nan        nan        nan        nan        nan        nan        -0.045970  -0.273382  -0.070305&lt;br /&gt;
1       1450947   2738  0.659605  0.906222  3240.883715  3242.545834  3243.515375  3243.627600  3243.713843  3243.659336  3243.802228  3243.800524  0.047735   0.291966   -0.216232  0.044591   -0.069851  -0.016796  0.170637   0.032325   0.146528   0.002457&lt;br /&gt;
1       1995211   2737  0.856699  0.982350  3234.731598  3241.839632  3241.919398  3241.997812  3242.204980  3242.750902  3242.000261  3242.941750  0.072845   0.113462   0.601882   0.114683   0.366807   0.175891   0.261334   0.209120   0.516155   0.181162&lt;br /&gt;
1       2004098   2738  0.443711  0.815725  3238.336234  3238.488951  3241.228881  3243.661958  3242.407555  3243.783839  3243.676693  3243.800524  0.133629   0.236260   -0.298383  0.122912   -0.100454  0.025324   -0.013486  0.097341   0.030391   0.019042&lt;br /&gt;
1       2040898   2738  0.676808  0.610463  3241.442146  3242.449918  3242.502684  3243.202847  3243.802047  3243.233496  3243.496321  3243.800524  -0.065485  0.095602   0.207722   -0.057787  0.165752   0.014559   0.205258   0.003543   0.221293   0.037588&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
P-values can be obtained with a likelihood ratio test between the two desired models.&lt;br /&gt;
An R script, &amp;quot;getPvalues.R&amp;quot;, is provided that makes it easy to obtain P-values from the .res file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/getPvalues.R out.res&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
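As a rough illustration of the likelihood ratio test, the following Python sketch computes a P-value from two llh values. It is not the provided getPvalues.R script. It assumes that the llh columns hold negative log-likelihoods and that M5 is nested in M1 with two fewer coefficients (b1 and b2); both assumptions should be checked before relying on the result. The function names are hypothetical.&lt;br /&gt;
```python
import math


def chi2_sf(stat, df):
    """Upper tail of the chi-square distribution, for df = 1 or df = 2 only."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def lrt_pvalue(nll_null, nll_alt, df):
    """Likelihood ratio test from two negative log-likelihoods (assumed)."""
    if math.isnan(nll_null) or math.isnan(nll_alt):
        return float("nan")
    stat = 2.0 * (nll_null - nll_alt)
    return chi2_sf(stat, df)


def read_res(path):
    """Yield one dict per site, keyed by the header names of a .res file."""
    with open(path) as handle:
        header = handle.readline().split()
        for line in handle:
            fields = line.split()
            if fields:
                yield dict(zip(header, fields))


# First site of the additive example: llh(M5) against llh(M1), two coefficients.
p_value = lrt_pvalue(3243.115326, 3242.214834, df=2)
print(p_value)
```
Sites where one of the two models reports nan give a nan P-value in this sketch.&lt;br /&gt;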
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=893</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=893"/>
		<updated>2019-03-03T10:16:56Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Running asaMap */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its .Q file (admixture proportions) and .P file (allele frequencies) can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
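The row-count requirements described above can be checked before running asaMap. The following Python sketch is one way to do it; it is not part of asaMap, and the function names are hypothetical.&lt;br /&gt;
```python
def count_rows(path):
    """Count the non-empty lines of a plain text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Raise ValueError if row counts differ or a covariate column is all 1s."""
    n_fam = count_rows(fam_path)
    if count_rows(pheno_path) != n_fam:
        raise ValueError("phenotype file and .fam file differ in number of rows")
    if cov_path is None:
        return n_fam
    with open(cov_path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    if len(rows) != n_fam:
        raise ValueError("covariate file and .fam file differ in number of rows")
    n_cols = len(rows[0]) if rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("covariate rows differ in number of columns")
    for col in range(n_cols):
        # A constant column of 1s would duplicate the automatic intercept.
        if all(float(row[col]) == 1.0 for row in rows):
            raise ValueError("covariate column %d is all 1s" % (col + 1))
    return n_fam
```
A call such as check_inputs("plinkFile.fam", "pheno.files", "covs.txt") returns the number of individuals when everything is consistent; the covariate file name here is only a placeholder.&lt;br /&gt;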
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running asaMap without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; the intercept is added automatically.&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=892</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=892"/>
		<updated>2019-03-03T10:14:36Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its .Q file (admixture proportions) and .P file (allele frequencies) can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Running asaMap=&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running asaMap without any input:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./asaMap&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -p &amp;lt;filename&amp;gt;       &lt;br /&gt;
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.&lt;br /&gt;
; -o &amp;lt;filename&amp;gt;       &lt;br /&gt;
Output filename - a .res file will be written with the results and a .log log file.&lt;br /&gt;
; -y &amp;lt;filename&amp;gt;       &lt;br /&gt;
Phenotypes file, has to be plain text file - with as many rows as .fam file.&lt;br /&gt;
; -Q &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.&lt;br /&gt;
; -a &amp;lt;filename&amp;gt; (either -a or -Q)      &lt;br /&gt;
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.&lt;br /&gt;
; -f &amp;lt;filename&amp;gt;       &lt;br /&gt;
Allele frequencies, .P file from ADMIXTURE.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
&lt;br /&gt;
; -c &amp;lt;filename&amp;gt;       &lt;br /&gt;
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; the intercept is added automatically.&lt;br /&gt;
; -m &amp;lt;INT&amp;gt;        &lt;br /&gt;
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).&lt;br /&gt;
; -l &amp;lt;INT&amp;gt;        &lt;br /&gt;
Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).&lt;br /&gt;
; -b &amp;lt;filename&amp;gt;       &lt;br /&gt;
Text file containing a starting guess of the estimated coefficients.&lt;br /&gt;
; -i &amp;lt;INT&amp;gt;       &lt;br /&gt;
The maximum number of iterations to run for the EM algorithm (default: 80).&lt;br /&gt;
; -t &amp;lt;FLOAT&amp;gt;          &lt;br /&gt;
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).&lt;br /&gt;
; -r &amp;lt;INT&amp;gt;          &lt;br /&gt;
Seed for generating the starting values of the coefficients.&lt;br /&gt;
; -P &amp;lt;INT&amp;gt;            &lt;br /&gt;
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by &amp;quot;-o&amp;quot;.&lt;br /&gt;
; -e &amp;lt;INT&amp;gt;            &lt;br /&gt;
Estimate standard error of coefficients (0: no, 1: yes - default: 0).&lt;br /&gt;
; -w &amp;lt;INT&amp;gt;            &lt;br /&gt;
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Outputs=&lt;br /&gt;
&lt;br /&gt;
=Models=&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=891</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=891"/>
		<updated>2019-03-03T09:44:23Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Input files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both; its .Q file (admixture proportions) and .P file (allele frequencies) can be given directly to asaMap.&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-0.712027291121767&lt;br /&gt;
-0.158413122435864&lt;br /&gt;
-1.77167888612947&lt;br /&gt;
-0.800940619551485&lt;br /&gt;
0.3016297021294&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file has to have the same number of rows as the phenotype file and the .fam file.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973&lt;br /&gt;
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705&lt;br /&gt;
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881&lt;br /&gt;
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436&lt;br /&gt;
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of how to run asaMap with covariates included, after first running ADMIXTURE:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#run admixture&lt;br /&gt;
admixture plinkFile.bed 2&lt;br /&gt;
&lt;br /&gt;
#run asaMap with admix proportions&lt;br /&gt;
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This produces an out.log log file and an out.res file with results for each site (after filtering).&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=890</id>
		<title>AsaMap</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=AsaMap&amp;diff=890"/>
		<updated>2019-03-03T09:28:18Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/asaMap&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/asaMap.git;&lt;br /&gt;
cd asaMap &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems. Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
=Example=&lt;br /&gt;
 &lt;br /&gt;
This is an example.&lt;br /&gt;
&lt;br /&gt;
=Input files=&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=834</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=834"/>
		<updated>2018-04-15T10:48:53Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Quick start - 1000 genomes reference panel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file to use when generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05 -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from each of the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture. &lt;br /&gt;
The sites were filtered so that MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01, and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with genotype data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the beagle genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes CEU.low_coverage.beagle.gz -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops CEU,CHB,YRI,PEL&lt;br /&gt;
&lt;br /&gt;
#do PCA based on the calculated admixture proportions from the previous command, using the same beagle genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle.gz -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that the PCA part might consume a lot of RAM, which can be a problem if you are not on a server with plenty of memory,&lt;br /&gt;
especially if you include all reference sites in the PCA (by changing the setting to -overlapRef 0).&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The marker column is '''recommended to be chr_pos''' rather than an rs ID, but custom ids can be used.&lt;br /&gt;
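&lt;br /&gt;
As an illustration of the layout shown above, here is a minimal Python sketch that splits one beagle row into its marker, alleles and per-individual likelihood triples. The function name parse_beagle_line is hypothetical and not part of fastNGSadmix, and the sketch assumes the usual beagle convention of three likelihood columns per individual (homozygous allele1, heterozygous, homozygous allele2).&lt;br /&gt;

```python
def parse_beagle_line(line):
    """Split one data row of a beagle genotype likelihood file.

    Returns (marker, allele1, allele2, likes), where likes holds one
    (hom allele1, het, hom allele2) triple per individual. The column
    order is an assumption based on the usual beagle convention.
    """
    fields = line.split()
    marker, allele1, allele2 = fields[:3]
    values = [float(x) for x in fields[3:]]
    if len(values) % 3 != 0:
        raise ValueError("expected three likelihood columns per individual")
    likes = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return marker, allele1, allele2, likes


# First data row of the example file above
row = "1_14000023      1       0       0.941    0.058    0.000"
marker, a1, a2, likes = parse_beagle_line(row)

# With the recommended chr_pos ids, chromosome and position can be recovered
chrom, pos = marker.split("_")
```

With chr_pos markers, the chromosome and position are available directly from the id, which is what makes the overlap with the reference panel easy to compute.&lt;br /&gt;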
&lt;br /&gt;
To analyse a haploid organism, create a regular beagle file (with 6 columns), &lt;br /&gt;
but fill out only the first two genotype likelihood columns, which specify the genotype likelihoods of allele1 and allele2 respectively.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
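&lt;br /&gt;
The matching rule above (id_A0_A1 with sorted alleles) and the duplicate check can be sketched in Python as follows. This only illustrates the idea and is not the program's own code; the helper names site_key and find_duplicates are hypothetical.&lt;br /&gt;

```python
from collections import Counter


def site_key(site_id, allele_a, allele_b):
    """Build an id_A0_A1 style key with the two alleles sorted."""
    first, second = sorted([allele_a, allele_b])
    return "_".join([site_id, first, second])


def find_duplicates(sites):
    """Return the keys that occur more than once, in first-seen order."""
    counts = Counter(site_key(*site) for site in sites)
    return [key for key, n in counts.items() if n != 1]


# The third entry repeats the first site with the alleles swapped
panel = [("1_752566", "G", "A"), ("1_842013", "G", "T"), ("1_752566", "A", "G")]
duplicates = find_duplicates(panel)
```

Because the alleles are sorted before the key is built, a site listed as G/A in one file and A/G in the other still yields the same key, so it is found in the overlap and also flagged if it appears twice in one file.&lt;br /&gt;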
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is provided in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes, as in the example above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the population comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
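&lt;br /&gt;
The way a -whichPops value maps onto the reference panel columns can be sketched in Python like this. The function select_pops is hypothetical, and raising an error for unknown names is an assumption of this sketch; the program's actual behaviour for unknown names is not documented here.&lt;br /&gt;

```python
def select_pops(which_pops, panel_pops):
    """Resolve a -whichPops style string against the panel's population names."""
    if which_pops == "all":
        return list(panel_pops)
    chosen = which_pops.split(",")
    # Assumption of this sketch: unknown names are treated as an error
    unknown = [p for p in chosen if p not in panel_pops]
    if unknown:
        raise ValueError("not in reference panel: " + ",".join(unknown))
    return chosen


# Population columns of the provided reference panel
panel_pops = ["French", "Han", "Chukchi", "Karitiana", "Papuan", "Sindhi", "Yoruba"]
chosen = select_pops("French,Han,Yoruba", panel_pops)
everything = select_pops("all", panel_pops)
```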
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option specifies which populations from the reference panel to include in the analysis. Names must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will be used. This does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of the reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001 and it should not be set lower than this. Generally this is only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example by taking the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
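&lt;br /&gt;
The effect of the -maf option described above can be illustrated with a small Python sketch. It follows one reading of the option text (a site is dropped when the minor allele frequency falls below the threshold in any analyzed population) and is not taken from the program's source; the function names are hypothetical.&lt;br /&gt;

```python
def minor_allele_freq(freq):
    """Minor allele frequency of a biallelic site with allele frequency freq."""
    return min(freq, 1.0 - freq)


def passes_maf(freqs, threshold):
    """True if no analyzed population has a minor allele frequency below threshold."""
    return all(minor_allele_freq(f) >= threshold for f in freqs)


# Site 1_842013 from the reference panel example:
# French, Han and Yoruba frequencies, then the same with Papuan added
french_han_yoruba = [0.22, 0.106, 0.143]
with_papuan = french_han_yoruba + [0.036]

kept = passes_maf(french_han_yoruba, 0.05)
kept_with_papuan = passes_maf(with_papuan, 0.05)
```

Under this reading, whether a site survives the filter depends on which populations are selected with -whichPops, and the default threshold of 0 keeps every site.&lt;br /&gt;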
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
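&lt;br /&gt;
Given this layout, the empirical 0.025 and 0.975 quantiles mentioned for the -boot option can be computed from the bootstrap rows. The following Python sketch is one way to do that; the function names are hypothetical, the numbers in the example are made up for illustration, and linear interpolation between order statistics is a choice of this sketch rather than something prescribed by fastNGSadmix.&lt;br /&gt;

```python
def read_qopt(text):
    """Parse .qopt content into (population names, estimate, bootstrap rows)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    names = rows[0]
    values = [[float(x) for x in row] for row in rows[1:]]
    return names, values[0], values[1:]


def quantile(sorted_values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bootstrap_intervals(names, boots, low=0.025, high=0.975):
    """Return {population: (low quantile, high quantile)} over the bootstrap rows."""
    if not boots:
        raise ValueError("no bootstrap rows in the .qopt content")
    intervals = {}
    for column, name in enumerate(names):
        values = sorted(row[column] for row in boots)
        intervals[name] = (quantile(values, low), quantile(values, high))
    return intervals


# Made-up .qopt content with three bootstrap rows, for illustration only
example = """French Han Yoruba
0.70 0.20 0.10
0.68 0.22 0.10
0.72 0.18 0.10
0.70 0.20 0.10
"""
names, estimate, boots = read_qopt(example)
intervals = bootstrap_intervals(names, boots)
```

In practice many more than three bootstrap rows are needed for the quantiles to be meaningful.&lt;br /&gt;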
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot (default PC1 and PC2) can be specified by, for example, giving -PCs 2,3 as an argument (for PC2 and PC3); the PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions with confidence intervals (in the case of bootstraps), as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set;&lt;br /&gt;
this stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 0' means that the covariance matrix for the reference panel genotypes is calculated using all markers in the reference genotypes (per default only overlapping markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
It takes a directory of *_covar.txt files and the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
Then it takes a file listing all the samples/files you want to include in the analyses; these names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all samples should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and the major and minor alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created; this argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
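&lt;br /&gt;
For readers who find the shell one-liner above hard to follow, the same column sums can be sketched in Python. The function names are hypothetical, and the sketch assumes a .qopt file from NGSadmix with one row per individual, K whitespace-separated columns and no header line. Summing a column gives the expected number of individuals in that cluster, which is why the numbers in the nInd file are allowed to be floats.&lt;br /&gt;

```python
def column_sums(qopt_text):
    """Sum each whitespace-separated column of NGSadmix .qopt content."""
    rows = [
        [float(x) for x in line.split()]
        for line in qopt_text.splitlines()
        if line.strip()
    ]
    return [sum(column) for column in zip(*rows)]


def nind_file(qopt_text):
    """Build the two-line nInd content: a K1..KK header, then the column sums."""
    sums = column_sums(qopt_text)
    header = " ".join("K" + str(i + 1) for i in range(len(sums)))
    return header + "\n" + " ".join(format(s, "g") for s in sums) + "\n"


# Made-up admixture proportions for three individuals and K=3
qopt_text = "1 0 0\n0 1 0\n0.5 0.5 0\n"
nind_text = nind_file(qopt_text)
```

The header is generated from the number of columns, so the same sketch works for other values of K.&lt;br /&gt;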
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
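&lt;br /&gt;
The column reordering done by the awk command above can be spelled out in Python. This is only a sketch of the same mapping, with hypothetical function names and made-up .P values: a plink .bim row has the columns chromosome, SNP id, genetic distance, position, allele 1 and allele 2, and the reference panel expects id (chr_pos), chr, pos, name, A0_freq and A1.&lt;br /&gt;

```python
def bim_to_ref_columns(bim_line):
    """Map one plink .bim row to the first six reference panel columns."""
    chrom, snp_id, _cm, pos, allele1, allele2 = bim_line.split()
    # Same order as the awk print: chr_pos, chr, pos, SNP id, column 6, column 5
    return [chrom + "_" + pos, chrom, pos, snp_id, allele2, allele1]


def ref_panel_row(bim_line, p_line):
    """Join the six id columns with the matching row of the ADMIXTURE .P file."""
    return " ".join(bim_to_ref_columns(bim_line) + p_line.split())


# Hypothetical .bim row and made-up K=2 frequencies, for illustration only
bim_line = "1 rs3094315 0 752566 A G"
row = ref_panel_row(bim_line, "0.25 0.75")
```

The sketch relies on the .bim and .P files listing the sites in the same order, which is also what the paste command above assumes.&lt;br /&gt;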
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing and a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this dataset the ids have been updated so that the FID is the group/population-id and the IID is the individual-id, and the snp-ids are given as '''chr_pos'''.&lt;br /&gt;
I recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=833</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=833"/>
		<updated>2018-04-15T10:34:31Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Quick start - 1000 genomes reference panel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods, but it can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05 -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture. &lt;br /&gt;
The sites were filtered so MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01 and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with sequencing data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes CEU.low_coverage.beagle.gz -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops CEU,CHB,YRI,PEL&lt;br /&gt;
&lt;br /&gt;
#do PCA based on the calculated admixture proportions from the previous command, using the same genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle.gz -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that the PCA part can consume a lot of RAM, which may be a problem if you are not on a server with a lot of memory.&lt;br /&gt;
This applies in particular if you include all reference sites in the PCA (by setting -overlapRef 0).&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
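&lt;br /&gt;
As a minimal sketch (not part of fastNGSadmix), a row of a single-individual beagle file such as the one above could be read in Python as follows. The function names parse_beagle_line and marker_from_chr_pos are hypothetical helpers chosen for illustration, and it is assumed that the columns are whitespace separated.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_beagle_line(line):
    # Split one whitespace-separated beagle row for a single individual.
    fields = line.split()
    if len(fields) != 6:
        raise ValueError('expected 6 columns, got %d' % len(fields))
    marker, allele1, allele2 = fields[0], fields[1], fields[2]
    # The last three columns are the genotype likelihoods of Ind0.
    likes = tuple(float(x) for x in fields[3:6])
    return marker, allele1, allele2, likes


def marker_from_chr_pos(chrom, pos):
    # Build the recommended chr_pos marker id (hypothetical helper).
    return '%s_%s' % (chrom, pos)


row = parse_beagle_line('1_14000023      1       0       0.941    0.058    0.000')
print(row)
```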
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the two first genotype likelihood columns, for respectively specifying the genotype likelihood of allele1 and allele2.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
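&lt;br /&gt;
To illustrate the matching on id_A0_A1 and the duplicate check, here is a small Python sketch. It is not the code used inside fastNGSadmix: the underscore separator, the alphabetical sorting of the alleles and the function names site_key and find_duplicates are assumptions made for this example. It can be used to look for duplicate sites before running the program.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def site_key(site_id, a0, a1):
    # Sort the two alleles so that G/A and A/G give the same key
    # (assumed ordering, for illustration only).
    first, second = sorted([a0, a1])
    return '%s_%s_%s' % (site_id, first, second)


def find_duplicates(sites):
    # sites is an iterable of (id, A0, A1) tuples.
    counts = Counter(site_key(*s) for s in sites)
    # Every key occurs at least once, so a count other than 1 is a duplicate.
    return sorted(k for k, n in counts.items() if n != 1)


sites = [('1_752566', 'G', 'A'), ('1_752566', 'A', 'G'), ('1_842013', 'G', 'T')]
print(find_duplicates(sites))
```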
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
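&lt;br /&gt;
A quick way to check that the population names agree is sketched below in Python. This is only an illustration: the helper names are hypothetical, and it is assumed that the population columns of the reference panel start after the six annotation columns (id, chr, pos, name, A0_freq, A1) shown in the example header.&lt;br /&gt;
&lt;br /&gt;
```python
def check_population_names(ref_header, nind_header):
    # Population columns are assumed to start after the six annotation columns.
    ref_pops = ref_header.split()[6:]
    nind_pops = nind_header.split()
    missing = [p for p in ref_pops if p not in nind_pops]
    extra = [p for p in nind_pops if p not in ref_pops]
    return missing, extra


def parse_nind(header, values):
    # The sizes are read as floats, since non-integer values are allowed.
    return dict(zip(header.split(), (float(v) for v in values.split())))


ref_header = 'id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba'
nind_header = 'French Han Chukchi Karitiana Papuan Sindhi Yoruba'
print(check_population_names(ref_header, nind_header))
print(parse_nind(nind_header, '25 33 23 12 14 18 70'))
```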
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
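&lt;br /&gt;
The example rows suggest that a .sites line holds the chr, pos, A0 and A1 values of the corresponding reference panel row. The Python sketch below shows that mapping under this assumption; the function name sites_line is hypothetical, and plinkToRef.R remains the supported way of generating the .sites file.&lt;br /&gt;
&lt;br /&gt;
```python
def sites_line(ref_row):
    # Reference panel columns: id chr pos name A0 A1 followed by frequencies.
    fields = ref_row.split()
    chrom, pos, a0, a1 = fields[1], fields[2], fields[4], fields[5]
    return ' '.join([chrom, pos, a0, a1])


row = '1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671'
print(sites_line(row))
```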
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes, as in the command above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the population comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option specifies which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will then be used. This does not work for plink files! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting it to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
This option controls acceleration of the EM algorithm. Acceleration is enabled by default; set the option to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set &amp;quot;-Qconv 1&amp;quot; for faster inference of admixture proportions; this bases the convergence criterion on the change in the admixture proportion values. The threshold can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this is only intended for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, 0.00001 by default. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
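&lt;br /&gt;
An empirical confidence interval, for example from the 0.025 and 0.975 quantiles mentioned for the -boot option, can be computed from such a .qopt file. The Python sketch below is one possible way to do this and is not part of fastNGSadmix; the function names are hypothetical, and the quantile uses linear interpolation between order statistics, which is a choice made for this example.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def read_qopt(text):
    # Row 1: population names, row 2: converged estimate, rest: bootstraps.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    pops = rows[0]
    estimate = [float(x) for x in rows[1]]
    boots = [[float(x) for x in r] for r in rows[2:]]
    return pops, estimate, boots


def quantile(values, q):
    # Empirical quantile with linear interpolation between order statistics.
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def bootstrap_interval(text, lower=0.025, upper=0.975):
    pops, estimate, boots = read_qopt(text)
    if not boots:
        raise ValueError('the .qopt file contains no bootstrap rows')
    out = {}
    for i, pop in enumerate(pops):
        column = [b[i] for b in boots]
        out[pop] = (estimate[i], quantile(column, lower), quantile(column, upper))
    return out


example = 'French Han Yoruba \n' + '1.0000 0.0000 0.0000 \n' * 11
print(bootstrap_interval(example))
```

For a real run the text would be read from the file, for instance with open('NA20502_TSI_boot.qopt').read().&lt;br /&gt;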
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 0' means that the covariance matrix for the reference panel genotypes is calculated using all markers in the reference genotypes (per default only overlapping markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if a different number of sites is used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments: a directory of *_covar.txt files; the .fam file of the plink files used for the reference genotypes;&lt;br /&gt;
a file listing all the samples/files you want to include in the analyses, where the names should match the names of the *_covar.txt files (excluding _covar.txt);&lt;br /&gt;
and lastly the name of the group being procrusted. All samples should have been run with the same reference panel and with the same populations.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel by giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
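&lt;br /&gt;
The same batch run can also be prepared from Python, which makes it easy to inspect the commands before anything is executed. The sketch below only builds the argument lists; the function name build_commands is hypothetical, and the flags are the ones used in the xargs command above.&lt;br /&gt;
&lt;br /&gt;
```python
import os


def build_commands(paths, ref, nind):
    # Build one fastNGSadmix argument list per beagle file.
    cmds = []
    for p in sorted(paths):
        if p.endswith('.beagle') or p.endswith('.beagle.gz'):
            cmds.append(['./fastNGSadmix', '-likes', p,
                         '-fname', ref, '-Nname', nind,
                         '-out', os.path.basename(p),
                         '-whichPops', 'all'])
    return cmds


cmds = build_commands(
    ['test/yriFrenchHan_depth05.beagle.gz', 'test/notes.txt'],
    'data/refPanel_humanOrigins_7worldPops.txt',
    'data/nInd_humanOrigins_7worldPops.txt')
for cmd in cmds:
    print(' '.join(cmd))
```

Each list can then be passed to subprocess.run(cmd, check=True) to actually run the program.&lt;br /&gt;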
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder, that can convert a plink file to a reference panel fine and size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given to specify whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of the refPanel_ file&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd files.&lt;br /&gt;
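&lt;br /&gt;
Changing K by hand means editing both header lines and the column-summing command. As a minimal sketch, assuming a space-separated .qopt file with one row per individual, the headers and the column sums can be generated for any K. The file names toy.qopt, toy_refHeader.txt and toy_nInd.txt and the values in them are made up for the illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy .qopt file with K=4 columns and 3 individuals (made-up values)
printf '0.5 0.5 0.0 0.0\n1.0 0.0 0.0 0.0\n0.25 0.25 0.25 0.25\n' > toy.qopt

# K is read from the number of columns in the first row
K=$(head -n 1 toy.qopt | awk '{ print NF }')

# Header for the reference panel: the 6 fixed columns followed by K1 ... KK
REFHEADER="id chr pos name A0_freq A1"
for i in $(seq 1 "$K"); do REFHEADER="$REFHEADER K$i"; done
echo "$REFHEADER" > toy_refHeader.txt
cat toy_refHeader.txt

# nInd file: a header with K1 ... KK, then the sum of each column
awk 'NF > 0 { for (i = 1; NF >= i; i++) s[i] += $i; n = NF }
     END { h = ""; r = ""
           for (i = 1; n >= i; i++) {
             h = h (i > 1 ? " " : "") "K" i
             r = r (i > 1 ? " " : "") s[i]
           }
           print h; print r }' toy.qopt > toy_nInd.txt
cat toy_nInd.txt
```
&lt;br /&gt;
The frequency columns themselves still have to be pasted onto the site information as shown in the commands above.&lt;br /&gt;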
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To construct another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missingness or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
It was also made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In these files I have set the FID to the group/population ID, the IID to the individual ID, and the SNP IDs to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and a geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=832</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=832"/>
		<updated>2018-04-15T09:50:20Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05 -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture. &lt;br /&gt;
The sites were filtered so that MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01, and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with genotype data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes  CEU.low_coverage.beagle -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops all&lt;br /&gt;
&lt;br /&gt;
#do PCA based on the calculated admixture proportions from the previous command, using the same genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that the PCA part might consume a lot of RAM if you include all ref sites for the PCA (changing the setting to -overlapRef 0),&lt;br /&gt;
so in that case it is best run on a server with a lot of RAM.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the first two genotype likelihood columns, which specify the genotype likelihoods of allele1 and allele2 respectively.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''. It is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
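&lt;br /&gt;
Since the program quits on duplicate sites, it can be useful to look for them before running it. The sketch below is one way to do that, assuming the reference panel layout shown above (id in column 1, alleles in columns 5 and 6). It builds an id_A0_A1 key with the alleles sorted, following the description above; the exact matching done inside fastNGSadmix may differ. The file names toy_refPanel.txt and toy_dups.txt and the rows are made up for the illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy reference panel (made-up rows): the last row repeats the first site
# with its two alleles swapped
printf 'id chr pos name A0_freq A1 French Han\n' > toy_refPanel.txt
printf '1_752566 1 752566 rs3094315 G A 0.166 0.061\n' >> toy_refPanel.txt
printf '1_842013 1 842013 rs7419119 G T 0.22 0.106\n' >> toy_refPanel.txt
printf '1_752566 1 752566 rs3094315 A G 0.834 0.939\n' >> toy_refPanel.txt

# Build an id_A0_A1 key with the two alleles in sorted order and
# print every key the second time it is seen (header line skipped)
awk 'NR > 1 { a = $5; b = $6
              if (a > b) { t = a; a = b; b = t }
              key = $1 "_" a "_" b
              if (++seen[key] == 2) print key }' toy_refPanel.txt > toy_dups.txt
cat toy_dups.txt
```
&lt;br /&gt;
An empty output file means that no duplicate keys were found. For a beagle file the alleles are in columns 2 and 3, so the column numbers would have to be changed accordingly.&lt;br /&gt;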
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
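&lt;br /&gt;
A quick check of the names can catch a mismatch before the program is run. This is a minimal sketch, assuming space-separated files with the layouts shown above. It requires the same names in the same order, which may be stricter than what the program itself needs. The file names toy_ref.txt, toy_nInd2.txt and toy_check.txt are made up for the illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy files with made-up population names
printf 'id chr pos name A0_freq A1 French Han Yoruba\n' > toy_ref.txt
printf 'French Han Yoruba\n25 33 70\n' > toy_nInd2.txt

# Population names are columns 7 and onwards of the reference panel header
# and the whole first line of the nInd file
REFPOPS=$(head -n 1 toy_ref.txt | cut -d" " -f7-)
NINDPOPS=$(head -n 1 toy_nInd2.txt)

if [ "$REFPOPS" = "$NINDPOPS" ]; then
  echo "population names match" > toy_check.txt
else
  echo "population names differ" > toy_check.txt
fi
cat toy_check.txt
```
&lt;br /&gt;
Trailing spaces or tab-separated headers would make this simple string comparison report a difference, so the result should be read as a hint rather than a verdict.&lt;br /&gt;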
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on the chr_pos_A0_A1 ID, with alleles sorted) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will be used. Does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting it to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
By default the method adjusting the frequencies is used (see more in the paper). To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;.&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel that have a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set &amp;quot;-Qconv&amp;quot; to 1 for faster inference of admixture proportions. This bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001. It should not be set lower than this, and it is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set the maximum number of EM iterations; by default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap, instead of using the converged upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
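&lt;br /&gt;
The bootstrap rows can be turned into an empirical confidence interval with standard command line tools. The sketch below is an illustration only: it uses a nearest-rank quantile, which is a simple choice made here and not necessarily the definition used by the authors, and the file names toy_boot.qopt and toy_ci.txt and all values in them are made up. With only 10 bootstraps the 0.025 and 0.975 quantiles are just the smallest and largest values, so many more bootstraps are needed in practice.&lt;br /&gt;
&lt;br /&gt;
```shell
export LC_ALL=C  # make sort and awk use "." as the decimal separator

# Toy .qopt file (made-up values): header, converged estimate, 10 bootstrap rows
printf 'French Han Yoruba\n0.95 0.05 0.00\n' > toy_boot.qopt
for q in 0.96 0.91 0.94 0.99 0.95 0.93 0.97 0.92 0.98 0.95; do
  awk -v q="$q" 'BEGIN { printf "%.2f %.2f 0.00\n", q, 1 - q }' >> toy_boot.qopt
done

# Skip the header and the converged estimate, sort the bootstrap values of the
# first population and pick the nearest-rank 0.025 and 0.975 quantiles
tail -n +3 toy_boot.qopt | cut -d" " -f1 | sort -n \
  | awk 'function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
         { v[NR] = $1 }
         END { print v[ceil(0.025 * NR)], v[ceil(0.975 * NR)] }' > toy_ci.txt
cat toy_ci.txt
```
&lt;br /&gt;
Changing the field number given to cut selects another population.&lt;br /&gt;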
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After the admixture estimation has been run, a PCA can be run that uses the estimated admixture proportions to account for population structure.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population as the FID (first column) and the individual ID as the IID (second column):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given as '''-OPTION ARGUMENT''', with a space between the option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot can be specified (default PC1 and PC2), for example by giving -PCs 2,3 for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions, with confidence intervals if bootstraps were run, and writes files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals; '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, that is, calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 0' means that the covariance matrix for the reference panel genotypes is calculated using all markers in the reference genotypes (per default only overlapping markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different numbers of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all samples should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel by giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel with population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''' (for example in plink using --geno 0.05), as the model implicitly assumes that there is full data for most sites.&lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number-of-individuals file named data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population as the FID (first column) and the individual ID as the IID (second column):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
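&lt;br /&gt;
To check that a .fam file follows this layout before running plinkToRef.R, the group sizes can be tallied from the first column. The Python sketch below does this for the toy rows shown above; the function name individuals_per_group is a placeholder, and whether plinkToRef.R derives the number-of-individuals file in exactly this way is an assumption that is not confirmed here.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def individuals_per_group(fam_text):
    """Count individuals per FID (first column) in .fam formatted text."""
    counts = Counter()
    for line in fam_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        counts[fields[0]] += 1
    return dict(counts)


# The five example rows from the .fam listing above.
fam_text = """French F1 0 0 1 1
French F2 0 0 1 1
Yoruba Y1 0 0 2 1
Yoruba Y2 0 0 1 1
Yoruba Y3 0 0 2 1
"""
group_sizes = individuals_per_group(fam_text)
print(group_sizes)
```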
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes); it is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of four populations (CEU, CHB, JHB and YRI), we first analyse the data with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
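&lt;br /&gt;
The same column sums can be computed without paste and bc. The following Python sketch assumes a whitespace-separated .qopt file from NGSadmix with one row per individual and no header line; the function name column_sums, the toy matrix and the K1 to K3 labels are placeholders chosen for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
def column_sums(qopt_text):
    """Sum each column of a header-less, whitespace-separated Q matrix."""
    rows = [[float(x) for x in line.split()]
            for line in qopt_text.splitlines() if line.strip()]
    return [sum(col) for col in zip(*rows)]


# Toy Q matrix for three individuals and K=3 (made-up numbers).
qopt_text = """1.0 0.0 0.0
0.5 0.5 0.0
0.0 0.25 0.75
"""
sums = column_sums(qopt_text)
header = " ".join("K%d" % (k + 1) for k in range(len(sums)))
print(header)
print(" ".join(str(s) for s in sums))
```
&lt;br /&gt;
Because the sums are taken over admixture proportions, the resulting population sizes are generally not whole numbers.&lt;br /&gt;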
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only thing that should be changed is the value of 'K'; the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
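&lt;br /&gt;
For readers who prefer to see the column shuffling spelled out, the Python sketch below mirrors the awk and paste command above: it takes chromosome, marker name, position and the two alleles from each .bim row, and appends the matching row of the .P file. The function name ref_panel_lines and the frequency values are made up for illustration, and the sketch assumes the .bim and .P files list the sites in the same order.&lt;br /&gt;
&lt;br /&gt;
```python
def ref_panel_lines(bim_text, p_text):
    """Combine .bim rows with the matching .P rows into reference panel lines."""
    bim_rows = [line.split() for line in bim_text.splitlines() if line.strip()]
    p_rows = [line.split() for line in p_text.splitlines() if line.strip()]
    if len(bim_rows) != len(p_rows):
        raise ValueError(".bim and .P must have the same number of sites")
    k = len(p_rows[0])
    header = ["id", "chr", "pos", "name", "A0_freq", "A1"]
    header += ["K%d" % (i + 1) for i in range(k)]
    lines = [" ".join(header)]
    for (chrom, name, _cm, pos, a1, a2), freqs in zip(bim_rows, p_rows):
        # Same column order as the awk command: id chr pos name bim-col6 bim-col5.
        lines.append(" ".join([chrom + "_" + pos, chrom, pos, name, a2, a1] + freqs))
    return lines


# Two toy sites; the frequencies are invented for the example.
bim_text = "1 rs3094315 0 752566 A G\n1 rs7419119 0 842013 T G\n"
p_text = "0.2 0.7\n0.1 0.3\n"
panel = ref_panel_lines(bim_text, p_text)
print("\n".join(panel))
```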
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only thing that should be changed is the value of 'K'; the K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected to represent most of the world: French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba.&lt;br /&gt;
It was also made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids so that the group/population id is the FID and the individual id is the IID, and the SNP ids are '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=831</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=831"/>
		<updated>2018-04-15T09:45:09Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Quick start - 1000 genomes reference panel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for estimating admixture proportions from NGS data of a single individual, and a method for doing PCA of NGS data using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder containing a premade reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA, and a .sites file for generating genotype likelihoods with ANGSD,&lt;br /&gt;
as well as an example folder with example data for trying out the method, can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture. &lt;br /&gt;
The sites were filtered so MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01 and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with genotype data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes  CEU.low_coverage.beagle -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops all&lt;br /&gt;
&lt;br /&gt;
#do PCA based on the calculated admixture proportions from the previous command, using the same genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware, if you are not on a server with a lot of RAM, that the PCA part might consume a lot of RAM,&lt;br /&gt;
that is, if you are including all ref sites for the PCA (changing the setting to -overlapRef 0).&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the two first genotype likelihood columns, for respectively specifying the genotype likelihood of allele1 and allele2.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be those of the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
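&lt;br /&gt;
To look for duplicate sites before running the program, something like the following sketch could be used. It is not taken from the fastNGSadmix source: the function name list_duplicate_sites is hypothetical, and the key is built as id_A0_A1 with sorted alleles only because that is how the matching is described above. It assumes a whitespace-separated reference panel with a header line.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: list site keys that occur more than once in a reference panel.
# Assumes the layout shown above: id in column 1, the two alleles in columns 5 and 6.
list_duplicate_sites() {
  awk '
    NR > 1 {
      a = $5; b = $6
      if (a > b) { t = a; a = b; b = t }   # sort the two alleles
      print $1 "_" a "_" b
    }
  ' "$1" | sort | uniq -d
}
```
&lt;br /&gt;
An empty result means that no duplicated keys were found under these assumptions.&lt;br /&gt;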
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
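&lt;br /&gt;
Whether the names agree can be checked with a small shell sketch. The function name pop_names_match is hypothetical, and the sketch assumes that the population names start in column 7 of the reference panel header (as in the example above) and that both files list the same set of populations.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: compare the population names of a reference panel and an nInd file.
# $1 = reference panel (names from column 7 of the header), $2 = nInd file (names in the header).
pop_names_match() {
  ref_names=$(awk 'NR == 1 { for (i = 7; NF >= i; i++) print $i; exit }' "$1" | sort)
  nind_names=$(awk 'NR == 1 { for (i = 1; NF >= i; i++) print $i; exit }' "$2" | sort)
  if [ "$ref_names" = "$nind_names" ]; then
    echo "match"
  else
    echo "mismatch"
  fi
}
```
&lt;br /&gt;
The names are sorted before comparison, so this sketch only checks that the same names occur in both files, not their order.&lt;br /&gt;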
&lt;br /&gt;
A .sites file is provided in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
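&lt;br /&gt;
The four columns correspond to the chr, pos and allele columns of the reference panel shown above. As an illustration only, they could be pulled out of a reference panel as sketched below. The function name ref_to_sites is hypothetical and the sketch assumes the whitespace-separated column order of the example panel; consult the ANGSD documentation for how a file given to -sites has to be prepared.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print chr, pos and the two alleles for every site of a reference panel.
# Assumes the column order shown above: id chr pos name A0 A1, followed by the frequencies.
ref_to_sites() {
  awk 'NR > 1 { print $2, $3, $5, $6 }' "$1"
}
```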
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes, as in the command above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on the chr_pos_A0_A1 ID, with alleles sorted) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will be used. This does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option to 1 for faster inference of admixture proportions; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this is only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
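&lt;br /&gt;
The empirical 0.025 and 0.975 quantiles mentioned for the -boot option can be read off the bootstrap rows. The following is a minimal sketch and not part of fastNGSadmix: the function name boot_interval is hypothetical, it uses the nearest-rank definition of a quantile (other definitions give slightly different values), and it assumes the .qopt layout shown above with the bootstrap rows starting at the third non-empty row.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the empirical 0.025 and 0.975 quantiles of the bootstrap
# estimates for one population. $1 = .qopt file, $2 = column number (1-based).
boot_interval() {
  # skip the two leading rows (names and converged estimate), keep the chosen column
  awk -v c="$2" 'NF > 0 { n++; if (n > 2) print $c }' "$1" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      lo = int((25 * NR + 999) / 1000)    # nearest rank: ceiling of 0.025 x n
      hi = int((975 * NR + 999) / 1000)   # nearest rank: ceiling of 0.975 x n
      print v[lo], v[hi]
    }
  '
}
```
&lt;br /&gt;
With only 10 bootstraps these two ranks are simply the smallest and largest bootstrap estimate, so more bootstraps are needed for a meaningful interval.&lt;br /&gt;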
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set with -saveCovar 1.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which makes subsequent runs a lot quicker for big reference panels.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes the following arguments, in order: first a directory of *_covar.txt files, then the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
Then a file listing all the samples/files you want to include in the analyses; the names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
The samples should all have been run with the same reference panel and with the same populations.&lt;br /&gt;
Lastly, an argument specifying the name of the group being procrusted.&lt;br /&gt;
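&lt;br /&gt;
Before running the script it can be useful to verify that every name in the sample list has a matching covariance file. A minimal sketch, assuming one sample name per line; the function name missing_covar_files is hypothetical and not part of the fastNGSadmix scripts:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print every sample name from the list that has no matching covariance file.
# $1 = directory with the *_covar.txt files, $2 = file with one sample name per line.
missing_covar_files() {
  cat "$2" | while read -r name; do
    # skip empty lines; report names without a NAME_covar.txt file in the directory
    [ -z "$name" ] || [ -f "$1/${name}_covar.txt" ] || echo "$name"
  done
}
```
&lt;br /&gt;
An empty result means that all listed samples have a covariance file in the given directory.&lt;br /&gt;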
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the major and minor alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling if duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
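&lt;br /&gt;
As an alternative to writing one cut and bc call per column, the column sums can be computed for any K in a single pass. This is a hedged sketch: the function name sum_q_columns is hypothetical, and it assumes a whitespace-separated .qopt file without a header line, as used in the commands above. The header line with the K names still has to be written with a matching number of names.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print the column sums of a whitespace-separated .qopt file on one line.
# Works for any K, since the number of columns is taken from the file itself.
sum_q_columns() {
  awk '
    { for (i = 1; NF >= i; i++) s[i] += $i; if (NF > k) k = NF }
    END {
      line = ""
      for (i = 1; k >= i; i++) line = line (i > 1 ? " " : "") sprintf("%.6f", s[i])
      print line
    }
  ' "$1"
}
```
&lt;br /&gt;
The output of sum_q_columns can be appended to the nInd file after its header line, in the same way as the paste command above.&lt;br /&gt;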
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing and a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option. The result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance from 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=830</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=830"/>
		<updated>2018-04-15T09:44:33Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Quick start - 1000 genomes reference panel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods but can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from each of the following populations:&lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity.&lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture.&lt;br /&gt;
The sites were filtered so MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01 and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with sequencing reads&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes  CEU.low_coverage.beagle -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops all&lt;br /&gt;
&lt;br /&gt;
#do PCA based on the calculated admixture proportions from the previous command, using the same genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that the PCA part might consume a lot of RAM if all reference sites are included in the PCA (-overlapRef 0),&lt;br /&gt;
so a server with a lot of RAM is preferable in that case.&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the two first genotype likelihood columns, for respectively specifying the genotype likelihood of allele1 and allele2.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
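&lt;br /&gt;
The matching on id_A0_A1 with sorted alleles can be illustrated with a small Python sketch. This is only an illustration and not the code of fastNGSadmix: the function names and the toy data are hypothetical, and it assumes that the alleles in the input and in the reference panel use the same coding.&lt;br /&gt;

```python
from collections import Counter


def site_key(site_id, allele_a, allele_b):
    """Build an id_A0_A1 style key with the two alleles sorted."""
    first, second = sorted([allele_a, allele_b])
    return f"{site_id}_{first}_{second}"


def find_duplicates(keys):
    """Return the keys that occur more than once."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n != 1)


def overlap(input_sites, ref_sites):
    """Return the keys shared by the input and the reference panel.

    Both arguments are lists of (id, allele, allele) tuples. A ValueError is
    raised on duplicate sites, mirroring the documented behaviour that the
    program quits in that case.
    """
    input_keys = [site_key(*site) for site in input_sites]
    ref_keys = [site_key(*site) for site in ref_sites]
    for name, keys in (("input", input_keys), ("reference", ref_keys)):
        duplicated = find_duplicates(keys)
        if duplicated:
            raise ValueError(f"duplicate sites in {name}: {duplicated}")
    return sorted(set(input_keys).intersection(ref_keys))


# Hypothetical toy data; ids and alleles follow the examples on this page.
input_sites = [("1_752566", "A", "G"), ("1_842013", "T", "G")]
ref_sites = [("1_752566", "G", "A"), ("1_842013", "G", "T"), ("1_891021", "G", "A")]
shared = overlap(input_sites, ref_sites)
print(shared)
```

Because the alleles are sorted before the key is built, a site listed as A/G in the input and as G/A in the reference panel still ends up with the same key.&lt;br /&gt;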
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
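&lt;br /&gt;
A quick way to check that the names match before running the program is sketched below in Python. This is a hypothetical helper, not part of fastNGSadmix; it assumes that the first six columns of the reference panel (id chr pos name A0_freq A1, as in the example above) hold site information and that the remaining columns are populations.&lt;br /&gt;

```python
def parse_nind(text):
    """Parse the two-line nInd file into a {population: size} dict."""
    lines = [line.split() for line in text.strip().splitlines()]
    names, sizes = lines[0], lines[1]
    if len(names) != len(sizes):
        raise ValueError("number of names and number of sizes differ")
    # Sizes are allowed to be floats, so parse them as floats.
    return {name: float(size) for name, size in zip(names, sizes)}


def ref_populations(header_line, n_fixed=6):
    """Return the population columns of a reference panel header line."""
    return header_line.split()[n_fixed:]


def missing_populations(nind, header_line):
    """List reference panel populations that have no size in nInd."""
    return [pop for pop in ref_populations(header_line) if pop not in nind]


nind_text = """French Han Chukchi Karitiana Papuan Sindhi Yoruba
25 33 23 12 14 18 70
"""
header = "id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba"
sizes = parse_nind(nind_text)
missing = missing_populations(sizes, header)
print(sizes["Yoruba"], missing)
```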
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
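&lt;br /&gt;
Judging from the two examples above, each line of the .sites file holds the chr, pos, A0_freq and A1 columns of the corresponding reference panel row. A minimal Python sketch of that relation follows; it is an illustration under this assumption, the function name is hypothetical, and plinkToRef.R remains the supported way of generating the file.&lt;br /&gt;

```python
def ref_row_to_site(row):
    """Turn one reference panel row into a 'chr pos A0 A1' sites line."""
    fields = row.split()
    # Columns: id chr pos name A0_freq A1 followed by one column per population.
    chrom, pos, allele0, allele1 = fields[1], fields[2], fields[4], fields[5]
    return " ".join([chrom, pos, allele0, allele1])


# Rows copied from the reference panel example on this page.
ref_rows = [
    "1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671",
    "1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143",
]
sites = [ref_row_to_site(row) for row in ref_rows]
print("\n".join(sites))
```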
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A chromosome has to be specified; specifying multiple chromosomes can also be tried.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if set to &amp;quot;all&amp;quot;, all populations in the reference will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will be used. This does not work for plink input! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set &amp;quot;-Qconv 1&amp;quot; for faster inference of admixture proportions; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, it should not be put lower than this and generally this is only for when wanting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. This can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers holds the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
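&lt;br /&gt;
A confidence interval can be read off the bootstrap rows with a few lines of Python, sketched below. This is an illustration and not part of fastNGSadmix: the function names are hypothetical, the values in the toy .qopt content are made up, and the quantiles use simple linear interpolation between order statistics, which is one of several common definitions.&lt;br /&gt;

```python
def read_qopt(text):
    """Parse .qopt content into (names, estimate, bootstrap rows)."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    names = rows[0]
    values = [[float(x) for x in row] for row in rows[1:]]
    return names, values[0], values[1:]


def quantile(sorted_values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def bootstrap_intervals(text, low=0.025, high=0.975):
    """Return {population: (estimate, lower bound, upper bound)}."""
    names, estimate, boots = read_qopt(text)
    if not boots:
        raise ValueError("no bootstrap rows found in the .qopt content")
    intervals = {}
    for i, name in enumerate(names):
        column = sorted(boot[i] for boot in boots)
        intervals[name] = (estimate[i], quantile(column, low), quantile(column, high))
    return intervals


# Made-up .qopt content: names, converged estimate, then 5 bootstrap rows.
qopt_text = """French Han Yoruba
0.70 0.20 0.10
0.68 0.22 0.10
0.72 0.18 0.10
0.70 0.20 0.10
0.66 0.24 0.10
0.74 0.16 0.10
"""
intervals = bootstrap_intervals(qopt_text)
for name, (estimate, lower, upper) in intervals.items():
    print(name, estimate, round(lower, 3), round(upper, 3))
```

With only a handful of bootstraps the interval is very rough; many more bootstrap runs are needed for stable 0.025 and 0.975 quantiles.&lt;br /&gt;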
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs should be plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
Also one can specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package, is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all of them should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the sizes of the reference populations.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number-of-individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
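&lt;br /&gt;
As a quick sanity check before running plinkToRef.R, the number of individuals per population can be counted directly from the FID column of the .fam file. The following is only a minimal sketch; the function name count_fam_pops is illustrative and is not part of fastNGSadmix.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fastNGSadmix):
# count the individuals per population, using the FID (column 1) of a plink .fam file.
count_fam_pops() {
    awk '{ n[$1]++ } END { for (p in n) print p, n[p] }' "$1" | sort
}

# Usage with the example data from this page:
# count_fam_pops data/humanOrigins_7worldPops.fam
```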
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes); the default is 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument, a MAF threshold, can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of CEU, CHB, JHB and YRI, we first do an analysis of the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
&lt;br /&gt;
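The column sums for the nInd file are written out once per column in the command shown earlier, so they have to be edited whenever K changes. As a sketch of an alternative, assuming the .qopt file is whitespace-separated with one row per individual and no header row, the following awk-based helper writes the K1, K2, ... header and the column sums for any K. The name make_nind is hypothetical and is not part of NGSadmix or fastNGSadmix, and the sums are printed with 4 decimals, which is an arbitrary choice.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NGSadmix or fastNGSadmix):
# print a "K1 K2 ..." header followed by the sum of every column of a Q matrix.
make_nind() {
    awk '{ for (i = 1; NF >= i; i++) s[i] += $i; if (NF > k) k = NF }
         END {
             for (i = 1; k >= i; i++) printf "K%d%s", i, (i == k ? "\n" : " ")
             for (i = 1; k >= i; i++) printf "%.4f%s", s[i], (i == k ? "\n" : " ")
         }' "$1"
}

# Usage with the files from this section:
# make_nind ceuChbJhbYriChr1.beagle.qopt > nInd_ceuChbJhbYriChr1.beagle.txt
```
&lt;br /&gt;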
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected to have representation for most of the world: French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba.&lt;br /&gt;
It was also made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this dataset I have updated the ids, with the group/population-id as the FID and the individual-id as the IID, and the snp-ids as '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and a geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=829</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=829"/>
		<updated>2018-04-15T09:44:14Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture. &lt;br /&gt;
The sites were filtered so MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01 and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with sequencing data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes  CEU.low_coverage.beagle -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops all&lt;br /&gt;
&lt;br /&gt;
#do PCA based on the calculated admixture proportions from the previous command, using the same genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware, if you are not on a server with a lot of RAM, that the PCA part might consume a lot of RAM,&lt;br /&gt;
that is, if you are including all ref sites for the PCA (change the setting to -overlapRef 0).&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the first two genotype likelihood columns, specifying the genotype likelihoods of allele1 and allele2, respectively.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
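&lt;br /&gt;
Since the program quits on duplicate sites, it can be convenient to look for them beforehand. The following is a minimal sketch, assuming a space-separated reference panel with a header row and the column order shown in the example on this page (id in column 1, the two alleles in columns 5 and 6). The helper name find_dup_sites is hypothetical, and the key it builds only mirrors the id_A0_A1 description given here, not the program's actual code.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fastNGSadmix):
# list every id_A0_A1 key (alleles sorted) that occurs more than once in a reference panel.
find_dup_sites() {
    awk 'NR > 1 {
             a = $5; b = $6
             if (a > b) { t = a; a = b; b = t }   # sort the two alleles
             key = $1 "_" a "_" b
             if (seen[key]++) print key           # true from the second occurrence on
         }' "$1" | sort -u
}

# Usage with the example data from this page (prints nothing if there are no duplicates):
# find_dup_sites data/refPanel_humanOrigins_7worldPops.txt
```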
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
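Whether the population names agree can be checked before running the program. The sketch below assumes space-separated files laid out as in the examples on this page, with the population names in columns 7 and onwards of the reference panel header. The helper check_pop_names is hypothetical, and it compares the names as sets, ignoring their order.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fastNGSadmix).
# $1: reference panel (population names assumed in header columns 7 and up)
# $2: nInd file (population names assumed in the header row)
check_pop_names() {
    local ref_names nind_names
    ref_names=$(head -n 1 "$1" | cut -d" " -f7- | tr -s " " "\n" | grep -v '^$' | sort)
    nind_names=$(head -n 1 "$2" | tr -s " " "\n" | grep -v '^$' | sort)
    if [ "$ref_names" = "$nind_names" ]; then
        echo "population names match"
    else
        echo "population names differ"
        return 1
    fi
}

# Usage with the example data from this page:
# check_pop_names data/refPanel_humanOrigins_7worldPops.txt data/nInd_humanOrigins_7worldPops.txt
```
&lt;br /&gt;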
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel, based on the chr_pos_A0_A1 ID (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must have the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. They must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood cols of the beagle file will be used - does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option to 1 for faster inference of admixture proportions; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this is only for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using the converged upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows with the names of the populations analyzed and the converged upon estimates, and then n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
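&lt;br /&gt;
With bootstraps in the .qopt file, an empirical interval can be read off the bootstrap rows. The following is a minimal sketch, assuming the layout described in this section (a header row, the converged upon estimate, then one row per bootstrap). The helper boot_interval is hypothetical, and it uses a simple nearest-rank rule for the 0.025 and 0.975 quantiles, which is only one of several possible quantile definitions.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fastNGSadmix):
# for each population, print its name and the nearest-rank 0.025 and 0.975
# quantiles of the bootstrap rows (row 3 onwards) of a .qopt file.
boot_interval() {
    local qopt=$1 npop i name
    npop=$(awk 'NR == 1 { print NF }' "$qopt")
    for i in $(seq 1 "$npop"); do
        name=$(awk -v c="$i" 'NR == 1 { print $c }' "$qopt")
        awk -v c="$i" 'NR > 2 { if (NF > 0) print $c }' "$qopt" | sort -g |
            awk -v name="$name" '
                # nearest-rank index: ceiling of p * n, at least 1
                function rank(p, n,   r) { r = int(p * n); if (p * n > r) r++; return (r > 0 ? r : 1) }
                { v[NR] = $1 }
                END { if (NR > 0) print name, v[rank(0.025, NR)], v[rank(0.975, NR)] }'
    done
}

# Usage with the bootstrap run from this section:
# boot_interval NA20502_TSI_boot.qopt
```
&lt;br /&gt;
With only 10 bootstraps, as in the example in this section, these ranks are simply the smallest and the largest bootstrap value, so more bootstraps are needed for a meaningful interval.&lt;br /&gt;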
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSAdmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats'''. The R package '''parallel''' is additionally required if faster, multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
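&lt;br /&gt;
A quick way to check that the first column of a .fam file really holds the population labels is to count the individuals per FID. The function name count_fid below is hypothetical and only meant as a sketch. It reads .fam formatted lines from stdin and assumes whitespace-separated columns with the FID first.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: count individuals per FID (column 1) in .fam lines read from stdin.
count_fid() {
  awk 'NF { n[$1]++ } END { for (p in n) print p, n[p] }' | sort
}

# Toy .fam lines in the layout shown above
printf 'French F1 0 0 1 1\nFrench F2 0 0 1 1\nYoruba Y1 0 0 2 1\n' | count_fid
```
&lt;br /&gt;
For the reference genotypes used here this could be run as: cat data/humanOrigins_7worldPops.fam | count_fid&lt;br /&gt;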
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the R package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, which means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if a different number of sites is used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
These samples should all have been run with the same reference panel and with the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel by giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as it is implicitly assumed in the model that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the sizes of the reference populations.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and minor/major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find overlap of .beagle and .filter files and creates tmp.ref with 6 first column of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
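&lt;br /&gt;
The column sums can also be computed for any number of columns at once with awk, which avoids writing one cut/bc pipeline per column. This is only a sketch of an alternative to the command above: the function name make_nind is hypothetical, it reads a whitespace-separated matrix of proportions from stdin and it assumes the populations are simply labelled K1, K2 and so on.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: print a K1..Kk header followed by the column sums of the
# whitespace-separated matrix read from stdin (for example a .qopt or .Q file).
make_nind() {
  awk '
    { for (i = 1; i != NF + 1; i++) s[i] += $i; if (NF) k = NF }
    END {
      for (i = 1; i != k + 1; i++) printf "K%d%s", i, (i == k ? "\n" : " ")
      for (i = 1; i != k + 1; i++) printf "%s%s", s[i], (i == k ? "\n" : " ")
    }'
}

# Toy matrix with made-up proportions for three individuals and K=3
printf '1 0 0\n0.5 0.5 0\n0 0.25 0.75\n' | make_nind
```
&lt;br /&gt;
For real data the .qopt file would be piped into make_nind and the output redirected to the nInd file.&lt;br /&gt;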
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing and a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then the desired individuals/populations can be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink, using the --keep option. And it can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=828</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=828"/>
		<updated>2018-04-15T09:42:53Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Quick start - 1000 genomes reference panel */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 genomes data, taking 20 individuals from the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to only select individuals with no European admixture. &lt;br /&gt;
The sites were filtered so MAF &amp;gt; 0.05 and missingness &amp;lt; 0.01 and only SNPs were included, leaving 6 676 750 sites. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then you can easily run your analysis on a .bam file - requires [http://popgen.dk/angsd ANGSD]:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with genotype data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#sites file to be used with ANGSD&lt;br /&gt;
SITES=data1000genomes/1000genomesRefPanel.sites&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#genotypes of the 1000 genomes data - for the PCA part&lt;br /&gt;
GENO=data1000genomes/1000genomesRefPanel&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out CEU.low_coverage&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the genotype likelihoods file generated by ANGSD&lt;br /&gt;
./fastNGSadmix -likes  CEU.low_coverage.beagle -fname $REF -Nname $NIND -out CEU.low_coverage -whichPops all&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the genotype likelihoods file generated by ANGSD&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes CEU.low_coverage.beagle -qopt CEU.low_coverage.qopt -out CEU.low_coverage -ref $GENO&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware, if you are not on a server with a lot of RAM, that the PCA part might consume a lot of RAM,&lt;br /&gt;
that is if you are including all ref sites for the PCA (change the setting to -overlapRef 0).&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the two first genotype likelihood columns, for respectively specifying the genotype likelihood of allele1 and allele2.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be given for the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
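&lt;br /&gt;
Since the program quits on duplicate sites, it can be handy to check a file beforehand. The following Python sketch builds an id_A0_A1 key with sorted alleles, as described above, and lists keys that occur more than once. The function names are hypothetical and the exact key format used inside fastNGSadmix is an assumption based on the description on this page.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def site_key(site_id, a0, a1):
    """Build the id_A0_A1 key with the two alleles sorted."""
    first, second = sorted([a0, a1])
    return f"{site_id}_{first}_{second}"


def duplicated_sites(rows):
    """Return keys that occur more than once among (id, a0, a1) rows."""
    counts = Counter(site_key(*row) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    ("1_752566", "G", "A"),
    ("1_842013", "G", "T"),
    ("1_752566", "A", "G"),  # same site with the alleles swapped
]
print(duplicated_sites(rows))
```
&lt;br /&gt;
Because the alleles are sorted before building the key, a site listed twice with swapped alleles is still reported as a duplicate.&lt;br /&gt;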
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
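&lt;br /&gt;
When extracting values from a .vcf file by hand, the likelihoods are often stored phred-scaled in the PL field, whereas a beagle file holds plain likelihoods. A minimal Python sketch of that conversion is given here, assuming the three PL values of a site have already been extracted. Whether your .vcf carries PL values, and whether their genotype order matches the allele1/allele2 order of the beagle file, are assumptions that should be checked; the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def pl_to_likelihoods(pl_values):
    """Convert phred-scaled PL values to likelihoods that sum to 1."""
    # A phred-scaled value PL corresponds to a likelihood proportional to 10^(-PL/10).
    raw = [10 ** (-pl / 10) for pl in pl_values]
    total = sum(raw)
    return [value / total for value in raw]


likes = pl_to_likelihoods([0, 30, 60])
print(" ".join(f"{value:.6f}" for value in likes))
```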
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will be used. This does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
This option controls acceleration of the EM algorithm. It is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, and it should not be set lower than this. Generally this is only for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval for your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap instead of using the converged-upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions, together with the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first row holds the names of the populations analyzed, the second row the converged-upon estimates, and the remaining n rows the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers holds the converged-upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
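&lt;br /&gt;
Given this layout, an empirical confidence interval can be read off the bootstrap rows. The following Python sketch is one way to do it, assuming the .qopt content has a header row, one row with the estimate and then the bootstrap rows. The function names and the example values are made up for illustration, and the linear interpolation between order statistics is a choice of this sketch, not something fastNGSadmix prescribes.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_qopt(text):
    """Split .qopt content into population names, estimate and bootstrap rows."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    names = lines[0]
    values = [[float(x) for x in row] for row in lines[1:]]
    return names, values[0], values[1:]


def quantile(sorted_values, prob):
    """Empirical quantile by linear interpolation between order statistics."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bootstrap_intervals(text, low=0.025, high=0.975):
    """Return {population: (estimate, lower, upper)} from .qopt content."""
    names, estimate, boots = parse_qopt(text)
    result = {}
    for i, name in enumerate(names):
        column = sorted(row[i] for row in boots)
        result[name] = (estimate[i], quantile(column, low), quantile(column, high))
    return result


# Made-up .qopt content with three bootstrap rows, for illustration only.
example = """French Han Yoruba
0.80 0.15 0.05
0.78 0.16 0.06
0.82 0.14 0.04
0.80 0.15 0.05
"""
for pop, (est, lo, hi) in bootstrap_intervals(example).items():
    print(pop, est, round(lo, 4), round(hi, 4))
```
&lt;br /&gt;
With only a handful of bootstraps the interval is very rough; in practice many more bootstrap runs would be used.&lt;br /&gt;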
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID (the first column), followed by the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs should be plotted (default PC1 and PC2) can be specified by, for example, giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probability is calculated given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if a different number of sites is used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes a directory of *_covar.txt files, followed by the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
Then comes a file listing all the samples/files you want to include in the analysis; these names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
The samples should also all have been run with the same reference panel and with the same populations.&lt;br /&gt;
Lastly, an argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as it is implicitly assumed in the model that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID (the first column), followed by the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes); it is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find overlap of .beagle and .filter files and creates tmp.ref with 6 first column of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create the file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
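&lt;br /&gt;
The column sums computed with cut, paste and bc above can also be sketched in Python, which may be easier to adapt when K changes. This is only an illustration: the function names and the example Q matrix are made up, and writing the sums with four decimals is a choice of this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
def column_sums(q_text):
    """Sum each column of a whitespace-separated Q matrix (one row per individual)."""
    rows = [[float(x) for x in line.split()] for line in q_text.splitlines() if line.strip()]
    return [sum(column) for column in zip(*rows)]


def nind_file_content(q_text):
    """Build the two-line nInd content: a K1..Kn header and the column sums."""
    sums = column_sums(q_text)
    header = " ".join(f"K{i + 1}" for i in range(len(sums)))
    values = " ".join(f"{value:.4f}" for value in sums)
    return header + "\n" + values + "\n"


# Made-up Q matrix for three individuals and K=3, for illustration only.
q_example = "0.9 0.1 0.0\n0.2 0.7 0.1\n0.0 0.1 0.9\n"
print(nind_file_content(q_example), end="")
```
&lt;br /&gt;
The sums are in general not whole numbers, which is why the number of individuals file is allowed to contain floats.&lt;br /&gt;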
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only things that should be changed when building a reference panel from another beagle file are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
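&lt;br /&gt;
The awk and paste step above can be sketched in Python to make the column order explicit. This is a sketch under assumptions: the function name is hypothetical, the .bim rows are assumed to have the usual six whitespace-separated columns, the .P file is assumed to have one row per .bim row, and the frequencies in the example are made up.&lt;br /&gt;
&lt;br /&gt;
```python
def ref_panel_lines(bim_lines, p_lines):
    """Combine .bim rows and ADMIXTURE .P rows into reference panel lines."""
    if len(bim_lines) != len(p_lines):
        raise ValueError("the .bim and .P files must have the same number of rows")
    k = len(p_lines[0].split())
    header = "id chr pos name A0_freq A1 " + " ".join(f"K{i + 1}" for i in range(k))
    out = [header]
    for bim, freqs in zip(bim_lines, p_lines):
        chrom, name, _cm, pos, a1, a2 = bim.split()
        # Same column order as the awk command: id, chr, pos, name, bim column 6, bim column 5.
        out.append(" ".join([f"{chrom}_{pos}", chrom, pos, name, a2, a1] + freqs.split()))
    return out


bim = ["1 rs3094315 0 752566 A G", "1 rs7419119 0 842013 T G"]
p = ["0.25 0.70", "0.20 0.15"]  # made-up frequencies for illustration
print("\n".join(ref_panel_lines(bim, p)))
```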
&lt;br /&gt;
And then we can create the file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only things that should be changed for constructing another reference panel are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore, it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then the desired individuals/populations can be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink, using the --keep option. And it can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=827</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=827"/>
		<updated>2018-04-15T09:33:58Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for estimating admixture proportions from NGS data of a single individual, and a method for doing PCA of NGS data using the estimated admixture proportions. It is based on genotype likelihoods, but can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead of wget if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start - 1000 genomes reference panel==&lt;br /&gt;
&lt;br /&gt;
A reference panel was created using the 1000 Genomes data, taking 20 individuals from the following populations: &lt;br /&gt;
Utah Residents (CEPH) with Northern and Western European Ancestry (CEU), Han Chinese (CHB), Finnish (FIN), Peruvian (PEL), Punjabi (PJL) and Yoruba (YRI), in order to cover most of the worldwide genetic diversity. &lt;br /&gt;
For the Peruvians I made sure to select only individuals with no European admixture. &lt;br /&gt;
&lt;br /&gt;
First download these files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data1000genomes.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead of wget if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data1000genomes.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can then run the analysis on a .bam file (this requires [http://popgen.dk/angsd ANGSD]):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#example bam file with sequencing data&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data1000genomes/refPanel_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data1000genomes/nInd_1000genomesRefPanel.txt&lt;br /&gt;
&lt;br /&gt;
#use ANGSD for generating a beagle file with the genotype likelihoods&lt;br /&gt;
#(the full ANGSD command is shown in the Input Files section; -sites must point to the .sites file of this reference panel)&lt;br /&gt;
./angsd &lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file ($GL must be set to the beagle file produced by ANGSD)&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
#(-ref should point to the plink files with the genotypes of the reference panel used above)&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The marker column is '''recommended to be chr_pos''' rather than an rs ID, but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but fill out only the first two genotype likelihood columns, which specify the likelihoods of allele1 and allele2 respectively.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be those of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
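&lt;br /&gt;
Since the program quits on duplicate sites, it can be useful to look for them beforehand. The following is a minimal sketch that lists duplicated id_A0_A1 keys in a reference panel, assuming the whitespace-separated layout shown above (id in column 1, alleles in columns 5 and 6) and assuming that sorting the alleles means putting them in alphabetical order. The file names and the toy values are made up:&lt;br /&gt;

```shell
# Toy reference panel (made-up values); the second and third data rows are the
# same site with the alleles swapped, so they share one sorted key
printf 'id chr pos name A0_freq A1 French Han\n1_752566 1 752566 rs1 G A 0.1 0.2\n1_842013 1 842013 rs2 G T 0.3 0.4\n1_842013 1 842013 rs3 T G 0.7 0.6\n' > demo_ref.txt

# Build the key id_A0_A1 with the two alleles in alphabetical order and
# print every key the second time it is seen
awk 'NR > 1 {
       a = $5; b = $6
       if (a > b) { t = a; a = b; b = t }
       key = $1 "_" a "_" b
       if (++seen[key] == 2) print key
     }' demo_ref.txt > duplicates.txt

cat duplicates.txt
```

An empty duplicates.txt means that no key occurs more than once under these assumptions.&lt;br /&gt;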
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
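&lt;br /&gt;
The name matching can be checked before running the program. The following is a minimal sketch, assuming the two layouts shown above (six leading columns in the panel header, names on the first line of the number of individuals file). It is a strict check that also requires the same order, which may be stricter than what the program needs. The file names and toy values are made up:&lt;br /&gt;

```shell
# Toy files (made-up values) in the two formats shown above
printf 'id chr pos name A0_freq A1 French Han Yoruba\n1_752566 1 752566 rs1 G A 0.1 0.2 0.3\n' > demo_ref.txt
printf 'French Han Yoruba\n25 33 70\n' > demo_nInd.txt

# Population names in the panel header: everything after the first 6 columns
REFPOPS=$(awk 'NR == 1 { $1 = $2 = $3 = $4 = $5 = $6 = ""; sub(/^ +/, ""); print; exit }' demo_ref.txt)

# Population names in the nInd file: its first line, with whitespace normalised
NINDPOPS=$(awk 'NR == 1 { $1 = $1; print; exit }' demo_nInd.txt)

if [ "$REFPOPS" = "$NINDPOPS" ]; then
    NAMECHECK=match
else
    NAMECHECK=mismatch
fi
echo "population names: $NAMECHECK"
```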
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
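&lt;br /&gt;
The .sites file carries the same chr, pos and allele columns as the reference panel. As a rough sketch of how the two formats relate, assuming the panel layout shown earlier (chr, pos and the two alleles in columns 2, 3, 5 and 6), the lines of a .sites file can be derived like this. The file names are made up, and the script plinkToRef.R already writes the real .sites file:&lt;br /&gt;

```shell
# Two rows in the reference panel format shown earlier (population columns shortened)
printf 'id chr pos name A0_freq A1 French Han\n1_752566 1 752566 rs3094315 G A 0.166 0.061\n1_842013 1 842013 rs7419119 G T 0.22 0.106\n' > demo_ref.txt

# Skip the header and print chr, pos and the two alleles
awk 'NR > 1 { print $2, $3, $5, $6 }' demo_ref.txt > demo.sites

cat demo.sites
```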
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Alternatively, use the 'query' command of bcftools [https://samtools.github.io/bcftools/bcftools.html] to generate a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can choose which populations to analyze via the &amp;quot;-whichPops&amp;quot; option, giving the population names as a comma-separated list, or &amp;quot;all&amp;quot; to include all populations.&lt;br /&gt;
Note that the program quits if there are duplicate sites (based on the chr_pos_A0_A1 ID, with alleles sorted) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
The run produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be shown by running fastNGSadmix without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
Specifies which populations from the reference panel to include in the analysis, as a comma-separated list (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference panel are included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two likelihood columns of the beagle file will then be used. It does not work for plink input! Only write &amp;quot;-haploid&amp;quot;, without an argument.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
Prints the admixture-adjusted allele frequencies of the reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Controls acceleration of the EM algorithm, which is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Filters away sites from the reference panel whose minor allele frequency is below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions the &amp;quot;-Qconv&amp;quot; option can be set to 1, which bases the convergence criterion on the change in the admixture proportions. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood-based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Sets the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001 and it should not be set lower than this. It is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood-based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Sets the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Sets the maximum number of EM iterations; by default it is 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, by checking that the same Q values are estimated in each run. At most 10 runs are executed.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval for your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program uses random Q starting points for each bootstrap instead of the converged-upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions, headed by the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, the .qopt file has n+2 rows: the first two rows hold the names of the analyzed populations and the converged-upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers holds the converged-upon estimates and the remaining rows are the bootstrap runs.&lt;br /&gt;
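&lt;br /&gt;
The bootstrap rows can be turned into a rough confidence interval using the empirical 0.025 and 0.975 quantiles mentioned for the -boot option. The following is a minimal sketch using a nearest-rank quantile, which is one simple choice among several and not necessarily what the R scripts use. The file names and the toy values are made up:&lt;br /&gt;

```shell
# Toy .qopt file (made-up values): names, converged estimate, then 5 bootstrap rows
printf 'French Han Yoruba\n0.70 0.20 0.10\n0.68 0.22 0.10\n0.72 0.18 0.10\n0.69 0.21 0.10\n0.71 0.20 0.09\n0.70 0.19 0.11\n' > demo_boot.qopt

# Number of populations = number of names in the first row
NPOP=$(awk 'NR == 1 { print NF; exit }' demo_boot.qopt)

for i in $(seq 1 "$NPOP"); do
    NAME=$(awk -v c="$i" 'NR == 1 { print $c; exit }' demo_boot.qopt)
    # Bootstrap values of population i (rows 3 and onwards), sorted numerically
    awk -v c="$i" 'NR > 2 { print $c }' demo_boot.qopt | LC_ALL=C sort -n |
    awk -v name="$NAME" '
        { v[NR] = $1 }
        END {
            lo = int(0.025 * NR); if (0.025 * NR > lo) lo++   # ceiling = nearest rank
            hi = int(0.975 * NR); if (0.975 * NR > hi) hi++
            print name, v[lo], v[hi]
        }'
done > demo_boot.ci

cat demo_boot.ci
```

With only 5 bootstrap rows the two quantiles are simply the smallest and largest value per population; a real interval needs many more bootstraps.&lt;br /&gt;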
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population as FID (first column) and the individual ID as IID (second column):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot can be specified (default PC1 and PC2), for example by giving -PCs 2,3 as an argument for PC2 and PC3; the PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions with confidence intervals (in the case of bootstraps), as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set;&lt;br /&gt;
this stores the covariance matrix of the reference panel individuals, which is a lot quicker for big reference panels.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, that is, calculating the genotype probabilities given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more details see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different numbers of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments: a directory of *_covar.txt files; the .fam file of the plink files used for the reference genotypes;&lt;br /&gt;
a file listing all the samples/files to include in the analysis, where the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and where all samples should have been run with the same reference panel and the same populations;&lt;br /&gt;
and lastly the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel by giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
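&lt;br /&gt;
Going one step further than the file listing, the full commands can be written to a file and inspected before anything is run. The following is a minimal dry-run sketch; the directory demo_test, its toy files and the file commands.txt are made up, and find is used with -name patterns instead of egrep, which is an illustrative substitution:&lt;br /&gt;

```shell
# Toy directory (made-up file names); notes.txt must not be picked up
mkdir -p demo_test
touch demo_test/a.beagle demo_test/b.beagle.gz demo_test/notes.txt

# Write one fastNGSadmix command per beagle file to commands.txt without running anything
find demo_test -name '*.beagle' -o -name '*.beagle.gz' | LC_ALL=C sort | while read -r f; do
    echo "./fastNGSadmix -likes $f -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out $(basename "$f") -whichPops all"
done > commands.txt

cat commands.txt
```

Once commands.txt looks right, it can be executed with bash commands.txt.&lt;br /&gt;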
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file with the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the sizes of the reference populations.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the two alleles (major and minor), for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population as FID (first column) and the individual ID as IID (second column):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given that specifies whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel; this argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
Note that reading in plink files, especially big ones, '''might require a lot of RAM''', so running it on a server might be preferable.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 Genomes] beagle data of CEU, CHB, JHB and YRI, we first analyse the reference data with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of the reference panel&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only thing that has to be changed is the value of 'K'; the K columns in the reference panel and the nInd file should be set accordingly.&lt;br /&gt;
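&lt;br /&gt;
For other values of K, the column sums above can be computed without writing one cut per column. The following is a minimal sketch, assuming a whitespace-separated .qopt file with one row per individual and no header line; the function name sum_q_columns and the demo values are hypothetical and not part of fastNGSadmix or NGSadmix:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): sums every column of a
# whitespace-separated Q matrix and prints a "K1 ... Kn" header plus the sums.
sum_q_columns() {
    awk '
        { for (i = 1; NF >= i; i++) sum[i] += $i; if (NF > n) n = NF }
        END {
            for (i = 1; n >= i; i++) printf "K%d%s", i, (i == n ? "\n" : " ")
            for (i = 1; n >= i; i++) printf "%g%s", sum[i], (i == n ? "\n" : " ")
        }
    ' "$1"
}

# Demo on a made-up matrix with 3 individuals and K=3 (illustrative values only).
demo_qopt=$(mktemp)
printf '%s\n' "0.5 0.25 0.25" "1 0 0" "0 0.5 0.5" > "$demo_qopt"
nind_out=$(sum_q_columns "$demo_qopt")
echo "$nind_out"
rm -f "$demo_qopt"
```
&lt;br /&gt;
The output of the function could be redirected to the nInd file in place of the echo and paste commands shown earlier.&lt;br /&gt;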
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only thing that has to be changed is the value of 'K'; the K columns of the reference panel should be set accordingly.&lt;br /&gt;
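&lt;br /&gt;
The header line of the reference panel can be generated for any K instead of being typed by hand. This is a minimal sketch that only reproduces the header layout used in the echo commands above; the function name ref_header is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: prints the reference panel header for K clusters,
# following the "id chr pos name A0_freq A1 K1 ... KK" layout used above.
ref_header() {
    printf 'id chr pos name A0_freq A1'
    i=1
    while [ "$i" -le "$1" ]; do
        printf ' K%d' "$i"
        i=$((i + 1))
    done
    printf '\n'
}

header_k2=$(ref_header 2)
header_k4=$(ref_header 4)
echo "$header_k2"
echo "$header_k4"
```
&lt;br /&gt;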
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to represent most of the world.&lt;br /&gt;
It was also ensured that each population only contains unadmixed individuals.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have set the FID to the group/population ID and the IID to the individual ID, and the SNP IDs to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and a geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
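&lt;br /&gt;
The file for the --keep step can be prepared from the .fam file itself, since plink's --keep expects a text file with FID and IID in the first two columns. A minimal sketch, assuming the populations are coded in the FID column as described above; the function name make_keep_file and the demo .fam content are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: prints "FID IID" lines for the chosen populations.
# $1 = .fam file, $2 = comma-separated population names (FID values).
make_keep_file() {
    awk -v pops="$2" '
        BEGIN { n = split(pops, p, ","); for (i = 1; n >= i; i++) want[p[i]] = 1 }
        ($1 in want) { print $1, $2 }
    ' "$1"
}

# Demo on a made-up .fam file (illustrative only).
demo_fam=$(mktemp)
printf '%s\n' "French F1 0 0 1 1" "Han H1 0 0 2 1" "Yoruba Y1 0 0 2 1" > "$demo_fam"
keep_lines=$(make_keep_file "$demo_fam" "French,Yoruba")
echo "$keep_lines"
rm -f "$demo_fam"
```
&lt;br /&gt;
The output can be redirected to a file and passed to plink with --keep, together with --bfile, --make-bed and --out.&lt;br /&gt;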
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=821</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=821"/>
		<updated>2018-02-03T14:54:09Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Input Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns), &lt;br /&gt;
but only fill out the two first genotype likelihood columns, for respectively specifying the genotype likelihood of allele1 and allele2.&lt;br /&gt;
&lt;br /&gt;
Example of haploid data in a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker     allele1 allele2  Ind0      Ind0      Ind0&lt;br /&gt;
1_752566      A      G      0.50      0.50      0&lt;br /&gt;
1_842013      T      G      0.99      0.01      0&lt;br /&gt;
1_891021      A      G      0.99      0.01      0&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be the frequencies of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
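&lt;br /&gt;
Duplicate sites can be looked for before running the program. The following is a minimal sketch that only mirrors the id_A0_A1 key (with sorted alleles) described above, assuming a space-separated reference panel with a header line; how fastNGSadmix itself detects duplicates is not shown here, and the function name find_duplicate_sites and the demo rows are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: prints id_A0_A1 keys (alleles sorted) that occur more
# than once in a reference panel file with a header line.
find_duplicate_sites() {
    awk 'NR > 1 {
        a = $5; b = $6
        if (a > b) { t = a; a = b; b = t }
        print $1 "_" a "_" b
    }' "$1" | sort | uniq -d
}

# Demo on a made-up panel where the first site appears twice with swapped alleles.
demo_ref=$(mktemp)
printf '%s\n' \
    "id chr pos name A0_freq A1 French Han" \
    "1_752566 1 752566 rs3094315 G A 0.166 0.061" \
    "1_842013 1 842013 rs7419119 G T 0.22 0.106" \
    "1_752566 1 752566 rs3094315 A G 0.834 0.939" > "$demo_ref"
dup_sites=$(find_duplicate_sites "$demo_ref")
echo "$dup_sites"
rm -f "$demo_ref"
```
&lt;br /&gt;
An empty result means that no duplicated key was found by this check.&lt;br /&gt;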
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
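&lt;br /&gt;
Whether the names match can be checked before a run. A minimal sketch, assuming the population columns of the reference panel start at column 7 as in the example above; it also requires the names to be in the same order, which is stricter than what is stated here. The function name check_pop_names and the demo files are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: succeeds if the header of the nInd file ($2) equals
# columns 7 and onwards of the reference panel header ($1), in the same order.
check_pop_names() {
    ref_pops=$(head -n 1 "$1" | awk '{ for (i = 7; NF >= i; i++) printf "%s%s", $i, (i == NF ? "\n" : " ") }')
    nind_pops=$(head -n 1 "$2" | awk '{ $1 = $1; print }')
    [ -n "$ref_pops" ] || return 1
    [ "$ref_pops" = "$nind_pops" ]
}

# Demo on made-up files (illustrative only).
demo_ref=$(mktemp)
demo_nind=$(mktemp)
printf '%s\n' "id chr pos name A0_freq A1 French Han Yoruba" \
    "1_752566 1 752566 rs3094315 G A 0.166 0.061 0.671" > "$demo_ref"
printf '%s\n' "French Han Yoruba" "25 33 70" > "$demo_nind"
if check_pop_names "$demo_ref" "$demo_nind"; then names_ok=yes; else names_ok=no; fi

# A nInd file that lacks one population should fail the check.
printf '%s\n' "French Han" "25 33" > "$demo_nind"
if check_pop_names "$demo_ref" "$demo_nind"; then names_bad=yes; else names_bad=no; fi
echo "$names_ok $names_bad"
rm -f "$demo_ref" "$demo_nind"
```
&lt;br /&gt;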
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromosome has to be specified, try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &quot;-whichPops&quot; option, by giving the population names as a comma-separated list, or &quot;all&quot; to include all populations.&lt;br /&gt;
Note that the program quits if there are duplicate sites (based on the chr_pos_A0_A1 ID, with alleles sorted) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be shown by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if &quot;all&quot; is given, all populations in the reference will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Set this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will then be used. This does not work for plink input. Only write &quot;-haploid&quot;, without a value.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &quot;-doAdjust 0&quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to control acceleration of the EM algorithm. It is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set &quot;-Qconv&quot; to 1 for faster inference of admixture proportions. This bases the convergence criterion on the change in the admixture proportion values. The threshold can be set with &quot;-Qtol&quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &quot;-Qtol&quot; threshold is 0.0000001. It should not be set lower than this, and it is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program uses random Q starting points for each bootstrap instead of the converged upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the analyzed populations and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
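&lt;br /&gt;
An empirical interval can be read off the bootstrap rows directly. The following is a minimal sketch using the nearest-rank definition of the 0.025 and 0.975 quantiles, assuming a .qopt file laid out as above (header row, estimate row, then the bootstrap rows); the function name boot_interval and the demo values are hypothetical, and the R scripts of fastNGSadmix may compute quantiles differently:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper: nearest-rank 2.5% and 97.5% quantiles of one column of
# the bootstrap rows (row 3 onwards) of a .qopt file.
# $1 = .qopt file, $2 = column number (1-based).
boot_interval() {
    tail -n +3 "$1" | awk -v c="$2" '{ print $c }' | LC_ALL=C sort -n | awk '
        function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            lo = ceil(0.025 * NR); hi = ceil(0.975 * NR)
            print v[lo], v[hi]
        }'
}

# Demo on a made-up .qopt file with 10 bootstrap rows (illustrative only).
demo_qopt=$(mktemp)
printf '%s\n' "French Yoruba" "0.95 0.05" \
    "0.95 0.05" "0.91 0.09" "0.99 0.01" "0.93 0.07" "0.97 0.03" \
    "0.92 0.08" "0.98 0.02" "0.94 0.06" "0.96 0.04" "0.89 0.11" > "$demo_qopt"
french_ci=$(boot_interval "$demo_qopt" 1)
echo "$french_ci"
rm -f "$demo_qopt"
```
&lt;br /&gt;
With only 10 bootstraps the nearest-rank interval is simply the smallest and largest bootstrap value, so more bootstraps are needed for a meaningful interval.&lt;br /&gt;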
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot (default PC1 and PC2) can be specified by, for example, giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions with confidence intervals (if bootstraps were run), as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, that is, calculating the genotype probabilities given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing Procrustes analysis if the PCA plots differ much, for instance if different numbers of sites are used for the reference genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files. The second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and the samples should all have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being Procrustes-transformed.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel by giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites.&lt;br /&gt;
There is an R script, plinkToRef.R, in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
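&lt;br /&gt;
Before converting, it can be worth checking how many individuals each group in the .fam file has. The following is only a minimal sketch, assuming a whitespace separated .fam file with the group as FID; the function name fam_counts is made up for this illustration and is not part of fastNGSadmix:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper (not part of fastNGSadmix): count individuals per FID in a .fam file&lt;br /&gt;
fam_counts() {&lt;br /&gt;
    # column 1 is the FID (group/population); print each group with its count, sorted by name&lt;br /&gt;
    awk '{ n[$1]++ } END { for (p in n) print p, n[p] }' &amp;quot;$1&amp;quot; | sort&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: fam_counts data/humanOrigins_7worldPops.fam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;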
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes); the default is 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied; all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 Genomes] beagle data of CEU, CHB, JHB and YRI, we first do an analysis of the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of the reference panel&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd files.&lt;br /&gt;
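&lt;br /&gt;
The column sums for the nInd file can also be computed without writing out one cut command per column, which avoids editing the command when K changes. This is only a sketch, assuming a whitespace separated .qopt file without a header line; sum_cols is a made-up name, not something shipped with fastNGSadmix. Note that awk prints non-integer sums with its default precision of 6 significant digits:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper (not part of fastNGSadmix): print the sum of every column of a file on one line&lt;br /&gt;
sum_cols() {&lt;br /&gt;
    awk '{ for (i = 1; i &amp;lt;= NF; i++) s[i] += $i; if (NF &amp;gt; n) n = NF }&lt;br /&gt;
         END { for (i = 1; i &amp;lt;= n; i++) printf &amp;quot;%s%s&amp;quot;, s[i], (i &amp;lt; n ? &amp;quot; &amp;quot; : &amp;quot;\n&amp;quot;) }' &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage, after writing the header line with the K names as above:&lt;br /&gt;
# sum_cols ceuChbJhbYriChr1.beagle.qopt &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;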
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missingness or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore, it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the ids have been updated so that the FID is the group/population id and the IID is the individual id, and the SNP ids are '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and a geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using the --keep option of plink. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
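&lt;br /&gt;
The --keep option of plink takes a file with the FID and IID of each individual to keep. Such a file can be built from the .fam file. This is only a sketch, assuming the FID holds the population name as described above; make_keep, keep.txt and the output prefix myPops are made-up names for this illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper (not part of fastNGSadmix): print FID and IID of all individuals&lt;br /&gt;
# in the .fam file ($1) whose FID is in the comma separated population list ($2)&lt;br /&gt;
make_keep() {&lt;br /&gt;
    awk -v pops=&amp;quot;$2&amp;quot; 'BEGIN { n = split(pops, p, &amp;quot;,&amp;quot;); for (i = 1; i &amp;lt;= n; i++) want[p[i]] = 1 }&lt;br /&gt;
        ($1 in want) { print $1, $2 }' &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage:&lt;br /&gt;
# make_keep humanOrigins_ALL/humanOrigins_ALLV2.fam French,Han,Yoruba &amp;gt; keep.txt&lt;br /&gt;
# plink --bfile humanOrigins_ALL/humanOrigins_ALLV2 --keep keep.txt --make-bed --out humanOrigins_ALL/myPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;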
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=820</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=820"/>
		<updated>2018-02-03T14:50:17Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The marker column is '''recommended to be chr_pos''' instead of rs ID, but custom ids can be used.&lt;br /&gt;
If one wants to analyse a haploid organism, one should create a regular beagle file (with 6 columns),&lt;br /&gt;
but only fill out the first two genotype likelihood columns, which specify the genotype likelihoods of allele1 and allele2, respectively.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be those of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
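&lt;br /&gt;
To look for duplicate sites in a reference panel before running the program, something like the following could be used. It is only a sketch that imitates the id_A0_A1 matching described above (alleles sorted), assuming the reference panel layout shown earlier with a header line, the id in column 1 and the two alleles in columns 5 and 6; it is not the program's own check, and ref_dups is a made-up name:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# hypothetical helper (not part of fastNGSadmix): print each id_A0_A1 key that occurs more than once&lt;br /&gt;
ref_dups() {&lt;br /&gt;
    awk 'NR &amp;gt; 1 { a = $5; b = $6&lt;br /&gt;
                  if (a &amp;gt; b) { t = a; a = b; b = t }   # sort the two alleles&lt;br /&gt;
                  k = $1 &amp;quot;_&amp;quot; a &amp;quot;_&amp;quot; b&lt;br /&gt;
                  if (seen[k]++ == 1) print k }' &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# usage: ref_dups data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;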
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must have the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..). If &amp;quot;all&amp;quot;, all populations in the reference panel will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG] (Only write '-haploid' in command line)&lt;br /&gt;
Set this flag if a haploid organism is being analyzed; the first two genotype likelihood columns of the beagle file will be used. This does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of the reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
This option controls acceleration of the EM algorithm. Acceleration is enabled by default; set this to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1. This bases the convergence criterion on the change in the admixture proportion values. The threshold can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this is only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, 0.00001 by default. It can only be adjusted for non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful for testing convergence, by checking that the same Q values are estimated in each run. The maximum number of runs is 10.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program takes random Q starting points for each bootstrap instead of using the converged upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows have the names of the populations analyzed and the converged upon estimates, and the following n rows have the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers holds the converged estimates, and the remaining rows are the bootstrap runs.&lt;br /&gt;
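&lt;br /&gt;
As an illustration, the bootstrap rows of such a .qopt file can be turned into an empirical 95 % interval per population. The Python sketch below is not part of fastNGSadmix; the helper names (quantile, bootstrap_intervals) are hypothetical, and the linear-interpolation quantile is just one common choice. It assumes the layout described above: one row of names, one row of converged estimates, then at least one bootstrap row.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def quantile(values, p):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    low = math.floor(position)
    high = math.ceil(position)
    weight = position - low
    return ordered[low] * (1 - weight) + ordered[high] * weight


def bootstrap_intervals(qopt_text, lower=0.025, upper=0.975):
    """Return {population: (estimate, lower bound, upper bound)} from .qopt text."""
    rows = [line.split() for line in qopt_text.splitlines() if line.strip()]
    names = rows[0]                                   # row 1: population names
    estimate = [float(x) for x in rows[1]]            # row 2: converged estimate
    boots = [[float(x) for x in row] for row in rows[2:]]  # remaining rows: bootstraps
    if not boots:
        raise ValueError("no bootstrap rows found in .qopt text")
    intervals = {}
    for j, name in enumerate(names):
        column = [row[j] for row in boots]
        intervals[name] = (estimate[j], quantile(column, lower), quantile(column, upper))
    return intervals
```
&lt;br /&gt;
For a file on disk, the text can be read first, for example with open("NA20502_TSI_boot.qopt").read(), and passed to bootstrap_intervals. With only a few bootstraps the interval is of course very rough.&lt;br /&gt;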
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot can be specified (default PC1 and PC2), for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which makes subsequent runs a lot quicker for big reference panels.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, which means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more details see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different numbers of sites are used for the reference genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analyses; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and the samples should all have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
For the examples above it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos, and minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
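&lt;br /&gt;
Since the population label is read from the FID column, a quick check of a .fam file before building a panel can be useful. The Python sketch below simply counts individuals per FID. It is only an illustration and not what plinkToRef.R does internally; the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def individuals_per_population(fam_text):
    """Count the individuals per population, taking the FID (first column) as label."""
    counts = Counter()
    for line in fam_text.splitlines():
        fields = line.split()
        if fields:                      # skip empty lines
            counts[fields[0]] += 1
    return dict(counts)
```
&lt;br /&gt;
Applied to the five example rows above, this gives 2 individuals for French and 3 for Yoruba. An unexpected label or count usually points to a .fam file where the FID column still holds family ids.&lt;br /&gt;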
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so running it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
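&lt;br /&gt;
The column sums computed by the shell pipeline above can also be sketched in Python, which may be easier to adapt to other values of K. This is only an illustration: the function names and the prefix argument for the K1, K2, ... header are assumptions that mirror the example, and are not part of NGSadmix or fastNGSadmix.&lt;br /&gt;
&lt;br /&gt;
```python
def column_sums(q_text):
    """Sum each column of a whitespace-separated matrix of admixture proportions."""
    rows = [[float(x) for x in line.split()]
            for line in q_text.splitlines() if line.strip()]
    return [sum(column) for column in zip(*rows)]


def n_ind_text(q_text, prefix="K"):
    """Build the two-line nInd content: a header row and a row of column sums."""
    sums = column_sums(q_text)
    header = " ".join(f"{prefix}{i + 1}" for i in range(len(sums)))
    values = " ".join(str(s) for s in sums)
    return f"{header}\n{values}\n"
```
&lt;br /&gt;
The header names produced here have to match the K columns of the reference panel file, just as in the shell version.&lt;br /&gt;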
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only things that should be changed when building a reference panel from another beagle file are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
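&lt;br /&gt;
The awk and paste step above relies on the .bim and .P files having the same number of rows in the same order. A minimal Python sketch that builds the same rows but stops if the row counts differ could look like this. The function name is hypothetical, and the column order simply mirrors the awk command.&lt;br /&gt;
&lt;br /&gt;
```python
def build_ref_panel(bim_text, p_text):
    """Combine .bim site information with per-cluster frequencies from a .P file."""
    bim_rows = [line.split() for line in bim_text.splitlines() if line.strip()]
    p_rows = [line.split() for line in p_text.splitlines() if line.strip()]
    if len(bim_rows) != len(p_rows):
        raise ValueError("the .bim and .P files have different numbers of sites")
    n_clusters = len(p_rows[0]) if p_rows else 0
    header = ["id", "chr", "pos", "name", "A0_freq", "A1"]
    header += [f"K{k + 1}" for k in range(n_clusters)]
    lines = [" ".join(header)]
    for (chrom, name, _cm, pos, allele1, allele2), freqs in zip(bim_rows, p_rows):
        # Same order as the awk command: chr_pos, chr, pos, name, column 6, column 5
        site = [f"{chrom}_{pos}", chrom, pos, name, allele2, allele1]
        lines.append(" ".join(site + freqs))
    return "\n".join(lines) + "\n"
```
&lt;br /&gt;
The explicit row count check is the only addition compared to the shell version; paste would silently pad or misalign rows if the two files did not match.&lt;br /&gt;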
&lt;br /&gt;
And then we can create a file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only things that should be changed for constructing another reference panel are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missingness or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
It was also ensured that each population contained only unadmixed individuals.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids, with the group/population id as the FID and the individual id as the IID, and with the SNP ids set to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using the --keep option of plink. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.2: Can also handle and analyse haploid data, only with beagle input files - 03-02-2018&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=819</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=819"/>
		<updated>2018-02-03T14:44:48Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Running fastNGSadmix */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for estimating admixture proportions from NGS data of a single individual, and a method for doing PCA of NGS data using the estimated admixture proportions. It is based on genotype likelihoods, but it can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
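&lt;br /&gt;
Because duplicate sites make the program quit, it can be worth checking a reference panel beforehand. The Python sketch below builds an id_A0_A1 key with the two alleles sorted, as described above, and reports keys that occur more than once. How fastNGSadmix builds its key internally is not shown on this page, so the exact key format and the function names are assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def site_key(marker_id, allele_a, allele_b):
    """Build an id_A0_A1 style key with the two alleles in sorted order."""
    first, second = sorted([allele_a, allele_b])
    return f"{marker_id}_{first}_{second}"


def duplicate_sites(ref_panel_text):
    """Return the sorted keys that occur more than once in reference panel text."""
    rows = [line.split() for line in ref_panel_text.splitlines() if line.strip()]
    header, sites = rows[0], rows[1:]
    id_col = header.index("id")
    a0_col = header.index("A0_freq")
    a1_col = header.index("A1")
    counts = Counter(site_key(row[id_col], row[a0_col], row[a1_col]) for row in sites)
    return sorted(key for key, n in counts.items() if n > 1)
```
&lt;br /&gt;
An empty list means no duplicates were found under this key definition. Sorting the alleles means that a site listed once as G/A and once as A/G is reported as a duplicate.&lt;br /&gt;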
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes, as in the command above.&lt;br /&gt;
&lt;br /&gt;
Alternatively, use the 'query' command of bcftools [https://samtools.github.io/bcftools/bcftools.html] to generate a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -haploid [FLAG]&lt;br /&gt;
Raise this flag if a haploid organism is being analyzed; the first two cols of the beagle file will then be used. It does not work for plink! Only write &amp;quot;-haploid&amp;quot;.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
By default the method adjusting the frequencies is used (see more in the paper). To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;.&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option to 1 for faster inference of admixture proportions; the convergence criterion is then based on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, and it should not be set lower than this. Generally this is only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, by checking that the same Q values are estimated in each run. The program will execute a maximum of 10 runs.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap, instead of using the converged upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
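As a sketch of how the -conv and -seed options above could be used together (the output prefix NA20502_TSI_conv and the value 5 are only illustrative assumptions; $PLINKFILE, $REF and $NIND are set as in the examples above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#5 runs from different random starting points, to check that the same Q values are estimated&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_conv -whichPops French,Han,Yoruba -conv 5 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;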
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
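&lt;br /&gt;
The empirical 0.025 and 0.975 quantiles mentioned for the -boot option can be sketched directly from the .qopt file. This is not part of fastNGSadmix; it assumes the layout described above (row 1 with the names, row 2 with the converged upon estimate, the remaining rows with the bootstraps) and uses a simple nearest-rank quantile. With only 10 bootstraps this is just the smallest and largest bootstrap value, so more bootstraps are needed for a meaningful interval:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
QOPT=NA20502_TSI_boot.qopt&lt;br /&gt;
#number of populations, counted from the header row&lt;br /&gt;
NPOP=$(head -n 1 $QOPT | wc -w)&lt;br /&gt;
for i in $(seq 1 $NPOP); do&lt;br /&gt;
  #name of population i&lt;br /&gt;
  NAME=$(head -n 1 $QOPT | awk -v i=$i '{print $i}')&lt;br /&gt;
  #bootstrap rows start at row 3; sort column i and print the nearest-rank 0.025 and 0.975 quantiles&lt;br /&gt;
  tail -n +3 $QOPT | awk -v i=$i 'NF {print $i}' | sort -g | awk -v name=$NAME '{v[NR]=$1} END {lo=int((25*NR+999)/1000); hi=int((975*NR+999)/1000); print name, v[lo], v[hi]}'&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;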
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning the first column, and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
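&lt;br /&gt;
As a sketch, plotting PC2 against PC3 for the beagle example above could look like this (the output prefix yriFrenchHan_depth05_PC23 is only an assumed name, chosen to keep the default plots from being overwritten):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05_PC23 -ref data/humanOrigins_7worldPops -PCs 2,3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;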
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set with -saveCovar 1.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, where parallel::mclapply will then be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes a directory of *_covar.txt files and the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
It then takes a file with all the samples/files you want to include in the analyses; these should fit with the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
They should also all have been run with the same reference panel and with the same populations.&lt;br /&gt;
Lastly it takes an argument specifying the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
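&lt;br /&gt;
The parallel variant mentioned above could be sketched like this; it is the same one-line command with -P added to xargs, and the value of 4 cores is only an assumption:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -P 4 -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;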
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script, plinkToRef.R in the R folder, that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and minor major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument, a MAF threshold, can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs, using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing and a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then the desired individuals/populations can be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink, using the --keep option. And it can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
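&lt;br /&gt;
The extraction with --keep mentioned above could be sketched like this. The file name keep.txt, the output prefix humanOrigins_FrenchHan and the choice of French and Han are only assumptions for illustration; --keep expects a file with the FID and IID of each individual to keep:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#write FID and IID of the wanted populations (the FID holds the group/population-id)&lt;br /&gt;
awk '$1 == &amp;quot;French&amp;quot; || $1 == &amp;quot;Han&amp;quot; {print $1, $2}' humanOrigins_ALL/humanOrigins_ALLV2.fam &amp;gt; keep.txt&lt;br /&gt;
&lt;br /&gt;
#extract these individuals and turn them into a reference panel&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALLV2 --keep keep.txt --make-bed --out humanOrigins_ALL/humanOrigins_FrenchHan&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_FrenchHan&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;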
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=811</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=811"/>
		<updated>2017-11-17T13:31:11Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Outputs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods, but it can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead of wget if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
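&lt;br /&gt;
Duplicate sites can be looked for before running the program. The following is a minimal shell sketch and not part of fastNGSadmix: it assumes the key is the id column combined with the two sorted alleles, as described above, and the file toyRefPanel.txt with its contents is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy reference panel (hypothetical data); the last row repeats the first site with swapped alleles
printf '%s\n' \
  'id chr pos name A0_freq A1 French Han' \
  '1_752566 1 752566 rs3094315 G A 0.166 0.061' \
  '1_842013 1 842013 rs7419119 G T 0.22 0.106' \
  '1_752566 1 752566 rs3094315 A G 0.834 0.939' > toyRefPanel.txt

# Build the key id_A0_A1 with the two alleles sorted, and print each key the second time it is seen
dups=$(awk 'NR > 1 {
  a = $5; b = $6
  if (a > b) { t = a; a = b; b = t }
  key = $1 "_" a "_" b
  if (++seen[key] == 2) print key
}' toyRefPanel.txt)

echo "duplicate sites: $dups"
```
&lt;br /&gt;
For a real reference panel, replace toyRefPanel.txt with the path of the panel; an empty result means that no duplicated key was found.&lt;br /&gt;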
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes if needed.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if set to &amp;quot;all&amp;quot;, all populations in the reference will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option to 1 for faster inference of admixture proportions; this bases the convergence criterion on the change in the admixture proportion values. The threshold can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, it should not be put lower than this and generally this is only for when wanting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, per default 0.00001, can only be adjusted for non accelerated EM, the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
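&lt;br /&gt;
The bootstrap rows can be summarised into an interval per population, for example with the empirical 0.025 and 0.975 quantiles mentioned for the -boot option. The sketch below is one possible way to do this in the shell and is not part of fastNGSadmix: the file toy_boot.qopt and its values are invented, and the nearest-rank quantile definition is an assumption (R's quantile function interpolates by default and can give slightly different numbers).&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy .qopt file (invented values): header, converged estimate, then 4 bootstrap rows
printf '%s\n' \
  'French Han Yoruba' \
  '0.70 0.20 0.10' \
  '0.68 0.22 0.10' \
  '0.72 0.19 0.09' \
  '0.71 0.18 0.11' \
  '0.69 0.21 0.10' > toy_boot.qopt

# Number of populations, taken from the header row
npop=$(awk 'NR == 1 { print NF; exit }' toy_boot.qopt)

# For each population: sort the bootstrap values (rows 3 and onwards) and
# pick the nearest-rank 0.025 and 0.975 quantiles
ci=$(for col in $(seq 1 "$npop"); do
  name=$(awk -v c="$col" 'NR == 1 { print $c; exit }' toy_boot.qopt)
  awk -v c="$col" 'NR > 2 { print $c }' toy_boot.qopt | LC_ALL=C sort -n |
    awk -v name="$name" '
      { v[NR] = $1 }
      END {
        lo = int(0.025 * NR); if (0.025 * NR > lo) lo++
        hi = int(0.975 * NR); if (0.975 * NR > hi) hi++
        print name, v[lo], v[hi]
      }'
done)

echo "$ci"
```
&lt;br /&gt;
With only a handful of bootstraps the two quantiles are simply the smallest and largest bootstrap value, so a larger number of bootstraps is needed for a meaningful interval.&lt;br /&gt;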
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/makeBarplot.R NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. It can be specified which PCs should be plotted (default PC1 and PC2), for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set;&lt;br /&gt;
this stores the covariance matrix for the reference panel individuals, which makes subsequent runs a lot quicker for big reference panels.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the R package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments: a directory of *_covar.txt files, the .fam file of the plink files used for the reference genotypes,&lt;br /&gt;
a file listing all the samples/files you want to include in the analysis (these names should match the names of the *_covar.txt files, excluding _covar.txt;&lt;br /&gt;
all samples should have been run with the same reference panel and the same populations),&lt;br /&gt;
and lastly an argument specifying the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script, plinkToRef.R, in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
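&lt;br /&gt;
For other values of K the same column sums can be computed in one pass with awk. This is only a minimal sketch, assuming a whitespace-separated .Q file without a header line; the function name sum_q_columns is hypothetical and not part of fastNGSadmix:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix):
# prints a "K1 ... KK" header line followed by the column sums of a .Q file.
sum_q_columns() {
    awk '
        # accumulate the sum of every column and remember the widest row
        { for (i = 1; NF >= i; i++) sum[i] += $i; if (NF > k) k = NF }
        END {
            header = ""; totals = ""
            for (i = 1; k >= i; i++) {
                sep = (i == 1) ? "" : " "
                header = header sep "K" i
                totals = totals sep sprintf("%.6f", sum[i])
            }
            print header
            print totals
        }
    ' "$1"
}
```
&lt;br /&gt;
It could be used as: sum_q_columns humanOrigins_7worldPops.2.Q > nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;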
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that need to change are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
Seven populations (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) were selected to represent most of the world.&lt;br /&gt;
Furthermore, it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids so that the group/population id is the FID and the individual id is the IID, and so that the SNP ids are '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files with plink's --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=810</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=810"/>
		<updated>2017-11-17T13:30:09Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Outputs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder can be downloaded that contains a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for generating genotype likelihoods with ANGSD. An example folder with example data for trying out the method is also available:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where again we have the '''id column (like marker column in beagle file) that is recommended to be chr_pos''' which is being used for detecting the overlap with the input - custom ids can be used. The frequencies have to be in direction of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
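&lt;br /&gt;
Duplicates can be looked for before running the program. The following is only a sketch that mimics the id_A0_A1 key described above, assuming a reference panel with one header line and the id and the two alleles in columns 1, 5 and 6; the function name ref_duplicates is hypothetical and the exact key used internally by fastNGSadmix may differ:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical pre-check (not part of fastNGSadmix):
# prints every id_A0_A1 key (alleles sorted) that occurs more than once.
ref_duplicates() {
    awk 'NR > 1 {
        a = $5; b = $6
        if (a > b) { t = a; a = b; b = t }   # sort the two alleles
        key = $1 "_" a "_" b
        if (++seen[key] == 2) print key      # report each duplicated key once
    }' "$1"
}
```
&lt;br /&gt;
Running ref_duplicates data/refPanel_humanOrigins_7worldPops.txt should print nothing if the panel has no duplicate sites.&lt;br /&gt;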
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A chromosome has to be specified; try specifying multiple chromosomes, as in the example above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set &amp;quot;-Qconv 1&amp;quot; for faster inference of admixture proportions; the convergence criterion is then based on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. This is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, 0.00001 by default. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap instead of using the converged upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows contain the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
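&lt;br /&gt;
A confidence interval can be read off the bootstrap rows as empirical quantiles. The following is only a minimal sketch, assuming the .qopt layout shown above (names in row 1, estimate in row 2, bootstraps from row 3) and using nearest-rank 0.025 and 0.975 quantiles, which is one of several possible quantile definitions; the function name qopt_ci is hypothetical and not part of fastNGSadmix:&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix):
# usage: qopt_ci FILE COLUMN  -> prints "lower upper" for that population column.
qopt_ci() {
    # skip the names row and the converged estimate, keep one column, sort numerically
    tail -n +3 "$1" | awk -v col="$2" 'NF >= col { print $col }' | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1                               # no bootstrap rows
            lo = int(0.025 * NR); if (0.025 * NR > lo) lo++   # ceiling of 0.025 * n
            hi = int(0.975 * NR); if (0.975 * NR > hi) hi++   # ceiling of 0.975 * n
            if (lo == 0) lo = 1
            print v[lo], v[hi]
        }'
}
```
&lt;br /&gt;
For example, qopt_ci NA20502_TSI_boot.qopt 1 would give the interval for the first population (French). With only 10 bootstraps such an interval is very coarse; more bootstraps give a more meaningful one.&lt;br /&gt;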
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A barplot can easily be generated using R/makeBarplot.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot can be specified (default PC1 and PC2), for example by giving -PCs 2,3 as an argument (for PC2 and PC3); the PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set with -saveCovar 1;&lt;br /&gt;
this stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see the available options, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a Procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files. The second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all of them should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last is an argument specifying the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given to specify whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes); the default is 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument, a MAF threshold, can be supplied; all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so running it on a server might be preferable.&lt;br /&gt;
&lt;br /&gt;
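The missingness filter recommended at the start of this section could, as a sketch, be applied with plink before running plinkToRef.R. The output prefix humanOrigins_7worldPops_geno05 is a hypothetical name chosen for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# remove sites with more than 5 % missing genotypes&lt;br /&gt;
plink --bfile data/humanOrigins_7worldPops --geno 0.05 --make-bed --out data/humanOrigins_7worldPops_geno05&lt;br /&gt;
# build the reference panel from the filtered files&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops_geno05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;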
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data (CEU, CHB, JHB and YRI), we first analyse the reference data with NGSadmix using K=3 (the beagle file must have '''chr_pos''' as marker ID for this to work):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of the reference panel&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first analyse the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
It was also made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this dataset I have set the FID to the group/population ID and the IID to the individual ID, and the SNP IDs to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
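&lt;br /&gt;
The extraction with --keep is not shown in the command above. As a sketch, assuming one wants only the French and Yoruba individuals and that these are labelled exactly so in the FID column, it could look like this (keepList.txt and the output prefix humanOrigins_frenchYoruba are hypothetical names):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# write FID and IID of the wanted populations (the FID holds the population name)&lt;br /&gt;
awk '$1 == &amp;quot;French&amp;quot; || $1 == &amp;quot;Yoruba&amp;quot; { print $1, $2 }' humanOrigins_ALL/humanOrigins_ALLV2.fam &amp;gt; keepList.txt&lt;br /&gt;
# extract these individuals into a new set of plink files&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALLV2 --keep keepList.txt --make-bed --out humanOrigins_ALL/humanOrigins_frenchYoruba&lt;br /&gt;
# build the reference panel from the subset&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_frenchYoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;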
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=809</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=809"/>
		<updated>2017-10-06T14:40:33Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Easy Example for analysing multiple beagle files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for estimating admixture proportions from NGS data of a single individual, and a method for doing PCA of NGS data using the estimated admixture proportions. It is based on genotype likelihoods, but can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead of wget if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be those of the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes as in the command above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can choose which populations to analyse via the &amp;quot;-whichPops&amp;quot; option, giving the population names as a comma-separated list, or &amp;quot;all&amp;quot; to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see the paper for more).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Controls acceleration of the EM algorithm. Acceleration is enabled by default; set this option to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option to 1 for faster inference of admixture proportions; the convergence criterion is then based on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood-based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. This is generally only intended for getting a fast overview of the data, as it is less precise than the likelihood-based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, by checking that the same Q values are estimated in each run. At most 10 such runs are executed.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example using the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program should use random Q starting points for each bootstrap instead of the converged-upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
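As a sketch of how the optional flags listed above can be combined with the required ones, the following command runs 5 convergence runs with a fixed seed and a MAF filter on the reference panel. It reuses the $GL, $REF and $NIND variables from the earlier example; the output prefix yriFrenchHan_depth05_conv and the chosen values are only illustrative:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# 5 runs from random starting points, seed fixed for reproducibility,&lt;br /&gt;
# dropping reference sites with MAF below 0.05 in any analyzed population&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05_conv -whichPops French,Han,Yoruba -conv 5 -seed 1 -maf 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;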
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows contain the names of the analyzed populations and the converged-upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
&lt;br /&gt;
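The bootstrap rows can be summarised into an empirical interval. The following is only a sketch, assuming a whitespace-separated .qopt file whose first two lines are the population names and the converged-upon estimate, as shown above. It uses a simple nearest-rank rule for the 0.025 and 0.975 quantiles of the first population column, which may differ slightly from interpolating quantile functions such as the one in R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# skip the header and the converged-upon estimate, keep column 1 of the bootstrap rows,&lt;br /&gt;
# sort numerically and print the nearest-rank 0.025 and 0.975 quantiles&lt;br /&gt;
tail -n +3 NA20502_TSI_boot.qopt | awk 'NF { print $1 }' | sort -g | awk '{ v[NR] = $1 } END { lo = int(0.025 * NR); if (lo &amp;lt; 0.025 * NR) lo++; hi = int(0.975 * NR); if (hi &amp;lt; 0.975 * NR) hi++; print v[lo], v[hi] }'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;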
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument (for PC2 and PC3). The PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probability is calculated given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, Procrustes analysis can be used; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
One should be careful doing Procrustes analysis if the PCA plots differ much, for instance if a different number of sites is used for the reference genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files. The second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; these names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
All samples should have been run with the same reference panel and with the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix can also be run in parallel by giving the argument -P X to xargs, where X denotes the number of cores.&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script, plinkToRef.R, in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
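&lt;br /&gt;
As a quick check before building the panel (a sketch using plain awk, not part of plinkToRef.R), the number of individuals per FID in the .fam file can be counted like this. The file name is the example one used above, and the populations are printed in no particular order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# count how many individuals carry each FID (first column of the .fam file)&lt;br /&gt;
awk '{ n[$1]++ } END { for (p in n) print p, n[p] }' data/humanOrigins_7worldPops.fam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;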
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel being created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 Genomes] beagle data of CEU, CHB, JHB and YRI, we first do an analysis of the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of the refPanel_ file&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only thing that should be changed is the value of 'K'; the K columns in the reference panel and the nInd file should be set accordingly.&lt;br /&gt;
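&lt;br /&gt;
As a sketch of how the nInd step could be written for any K (this is not part of the fastNGSadmix scripts), awk can sum every column and write the K1..KK header in one go. It assumes the NGSadmix .qopt file has one row per individual, K whitespace-separated columns and no header line; the file names are the ones from the example above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
QOPT=ceuChbJhbYriChr1.beagle.qopt&lt;br /&gt;
# sum each column, then print a header line (K1 ... KK) and a line with the column sums&lt;br /&gt;
awk '{ n = NF; for (i = 1; i &amp;lt;= NF; i++) sum[i] += $i }&lt;br /&gt;
END {&lt;br /&gt;
  for (i = 1; i &amp;lt;= n; i++) printf &amp;quot;K%d%s&amp;quot;, i, (i &amp;lt; n ? &amp;quot; &amp;quot; : &amp;quot;\n&amp;quot;)&lt;br /&gt;
  for (i = 1; i &amp;lt;= n; i++) printf &amp;quot;%.6f%s&amp;quot;, sum[i], (i &amp;lt; n ? &amp;quot; &amp;quot; : &amp;quot;\n&amp;quot;)&lt;br /&gt;
}' $QOPT &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The header of the reference panel file still has to list the same K1..KK names by hand.&lt;br /&gt;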
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only thing that should be changed is the value of 'K'; the K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is based on the curated Human Origins dataset from Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
Seven populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
It was also made sure that each population only contained unadmixed individuals.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this dataset I have set the FID to the group/population id and the IID to the individual id, and set the SNP ids to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files with plink's --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that the plinkToRef.R command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=808</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=808"/>
		<updated>2017-09-26T08:33:49Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder are also available. The data folder has a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. Both can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE  -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be those of the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting it to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option for faster inference of admixture proportions. Setting &amp;quot;-Qconv&amp;quot; to 1 bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, it should not be put lower than this and generally this is only for when wanting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, per default 0.00001, can only be adjusted for non accelerated EM, the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The maximum number of convergence runs is 10.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions, together with the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers holds the converged-upon estimates, and the remaining rows are the bootstrap runs.&lt;br /&gt;
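&lt;br /&gt;
As a rough illustration of the quantile-based confidence interval mentioned for the -boot option, the following shell sketch takes the nearest-rank empirical 0.025 and 0.975 quantiles of one column of the bootstrap rows. The function name qopt_boot_quantile and the toy values are hypothetical; the sketch assumes the layout shown above (population names in row 1, converged-upon estimate in row 2, bootstrap rows from row 3 onwards) and is not part of fastNGSadmix.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the nearest-rank empirical quantile P of column COL, using only the
# bootstrap rows (row 3 onwards) of a .qopt file read from standard input.
qopt_boot_quantile() {
    col=$1
    p=$2
    tail -n +3 | awk -v c="$col" 'NF { print $c }' | sort -n |
        awk -v p="$p" '{ v[NR] = $1 }
            END {
                if (NR == 0) exit 1
                x = p * NR; k = int(x)
                if (k != x) k++      # ceiling of p * n
                if (k == 0) k = 1
                print v[k]
            }'
}

# Toy .qopt content (made-up values): header, estimate, four bootstrap rows.
toy_qopt='French Han Yoruba
0.9000 0.1000 0.0000
0.8800 0.1200 0.0000
0.9100 0.0900 0.0000
0.9300 0.0700 0.0000
0.8900 0.1100 0.0000'

ci_low=$(printf '%s\n' "$toy_qopt" | qopt_boot_quantile 1 0.025)
ci_high=$(printf '%s\n' "$toy_qopt" | qopt_boot_quantile 1 0.975)
echo "French: $ci_low - $ci_high"
```
&lt;br /&gt;
With a real run one would pipe the file in instead, for example cat NA20502_TSI_boot.qopt | qopt_boot_quantile 1 0.025 for the first population. With only a handful of bootstraps the interval is simply the range of the bootstrap estimates, so more bootstraps are needed for a meaningful interval.&lt;br /&gt;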
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSAdmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''-OPTION ARGUMENT'''', where there is a space between option and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -likes $GL  -qopt yriFrenchHan_depth05.qopt -out yriFrenchHan_depth05  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R -plinkFile $PLINKFILE -qopt NA20502_TSI.qopt -out NA20502_TSI  -ref data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving -PCs 2,3 as an argument for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
Plotting can be disabled by setting -doPlots 0.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option -saveCovar 1 can be set;&lt;br /&gt;
this stores the covariance matrix for the reference panel individuals, which makes subsequent runs a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify -multiCores X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting '-ngsTools 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting '-onlyPrior 1'.&lt;br /&gt;
Setting '-overlapRef 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
One should be careful doing procrustes if the PCA plots differ much, for instance if different number of sites are used for the ref genotypes.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all samples should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R -likes % -ref data/humanOrigins_7worldPops -qopt `basename %`.qopt -out test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population-specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that converts a plink file to a reference panel file and a file with the sizes of the reference populations.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files, a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called  data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr pos and minor major alleles, for being used with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
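&lt;br /&gt;
Since the population sizes follow from the FID column, a quick way to sanity-check a .fam file before building a panel is to count individuals per FID. The following is a minimal sketch; the helper name fam_pop_counts is hypothetical, and the toy rows are the example lines shown above rather than real data.&lt;br /&gt;
&lt;br /&gt;
```shell
# Count the individuals per population (FID, column 1) in .fam content
# read from standard input; one "population count" pair per line.
fam_pop_counts() {
    awk 'NF { n[$1]++ } END { for (p in n) print p, n[p] }' | sort
}

# Toy .fam content with the layout described above.
toy_fam='French F1 0 0 1 1
French F2 0 0 1 1
Yoruba Y1 0 0 2 1
Yoruba Y2 0 0 1 1
Yoruba Y3 0 0 2 1'

counts=$(printf '%s\n' "$toy_fam" | fam_pop_counts)
echo "$counts"
```
&lt;br /&gt;
For a real file the same helper can be fed with cat data/humanOrigins_7worldPops.fam | fam_pop_counts; the counts can then be compared by eye with the number of individuals file.&lt;br /&gt;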
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes); it is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find overlap of .beagle and .filter files and creates tmp.ref with 6 first column of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
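&lt;br /&gt;
The column sums above are written out once per column, so the command has to be edited whenever K changes. As a sketch of an alternative that works for any K, the column sums and the K1..KK header can be produced in one pass with awk. The function name nind_from_q and the toy proportions are hypothetical; the sketch assumes a whitespace-separated file with one row per individual and one column per cluster.&lt;br /&gt;
&lt;br /&gt;
```shell
# Read admixture proportions (one row per individual, K columns) from
# standard input and print a header "K1 .. KK" followed by the column sums.
nind_from_q() {
    awk 'NF {
            if (k == 0) k = NF                       # K is taken from the first row
            for (i = 1; i != k + 1; i++) s[i] += $i
        }
        END {
            for (i = 1; i != k + 1; i++) printf "K%d%s", i, (i == k ? "\n" : " ")
            for (i = 1; i != k + 1; i++) printf "%.4f%s", s[i], (i == k ? "\n" : " ")
        }'
}

# Toy proportions for three individuals and K=2 (made-up values).
toy_q='0.9 0.1
0.2 0.8
0.5 0.5'

nind=$(printf '%s\n' "$toy_q" | nind_from_q)
echo "$nind"
```
&lt;br /&gt;
The output has the same two-line layout as the number of individuals file built above; for the run above it could be fed with cat ceuChbJhbYriChr1.beagle.qopt | nind_from_q.&lt;br /&gt;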
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
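&lt;br /&gt;
To make the column reordering in the awk call above easier to follow, here is the same mapping applied to a single .bim line. The helper name bim_to_ref_columns is hypothetical, and the input line is a toy example in the usual .bim column order (chromosome, variant id, genetic distance, position, allele 1, allele 2).&lt;br /&gt;
&lt;br /&gt;
```shell
# Reorder .bim columns into the first six reference panel columns:
# id (chr_pos), chr, pos, name, A0_freq allele (bim column 6), A1 (bim column 5).
bim_to_ref_columns() {
    awk '{ print $1 "_" $4, $1, $4, $2, $6, $5 }'
}

ref_row=$(printf '1 rs3094315 0 752566 A G\n' | bim_to_ref_columns)
echo "$ref_row"
```
&lt;br /&gt;
The resulting row has the same layout as the first six columns of the reference panel format, and the K frequency columns from the .P file are then pasted next to it.&lt;br /&gt;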
&lt;br /&gt;
And then we can create a file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore, it was made sure that each population contained only unadmixed individuals.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then the desired individuals/populations can be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink, using the --keep option. And it can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created for instance based on 1000 Genomes or own data, first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=807</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=807"/>
		<updated>2017-09-23T13:30:31Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Log */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead of wget if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes= $GL  qopt= yriFrenchHan_depth05.qopt out= yriFrenchHan_depth05  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile= $PLINKFILE  qopt= NA20502_TSI.qopt out= NA20502_TSI  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be those of the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
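&lt;br /&gt;
Because duplicates make the program quit, it can be convenient to look for them before starting a run. The following is a minimal sketch of such a pre-check for a reference panel. It is not the program's own check: the helper name ref_duplicate_ids is hypothetical, and it assumes the whitespace-separated layout shown above, with the id in column 1 and the two alleles in columns 5 and 6. A site counts as a duplicate when the same id has already been seen with the same two alleles in either order.&lt;br /&gt;
&lt;br /&gt;
```shell
# Print the id of every reference panel row (header skipped) whose id and
# allele pair were already seen, ignoring the order of the two alleles.
ref_duplicate_ids() {
    awk 'NR != 1 {
            fwd = $1 "_" $5 "_" $6
            rev = $1 "_" $6 "_" $5
            if (fwd in seen) print $1
            seen[fwd] = 1; seen[rev] = 1
        }'
}

# Toy panel with two populations; the last row repeats the first site with
# the alleles swapped (made-up row, for illustration only).
toy_ref='id chr pos name A0_freq A1 French Han
1_752566 1 752566 rs3094315 G A 0.166 0.061
1_842013 1 842013 rs7419119 G T 0.22 0.106
1_752566 1 752566 rs3094315 A G 0.834 0.939'

dups=$(printf '%s\n' "$toy_ref" | ref_duplicate_ids)
echo "duplicate ids: $dups"
```
&lt;br /&gt;
An empty result means no duplicates were found under these assumptions. For a beagle file the alleles sit in columns 2 and 3, so the column numbers would need to be adjusted.&lt;br /&gt;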
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
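&lt;br /&gt;
A small sketch of how the name requirement could be checked by hand: compare the population names in the reference panel header (assumed here to start in column 7, as in the example above) with the names in the first line of the number of individuals file. The function name pop_names_match is hypothetical, both headers are assumed to be space-separated, and the comparison ignores the order of the names.&lt;br /&gt;
&lt;br /&gt;
```shell
# Succeed if the population names in a reference panel header line ($1,
# names from column 7 onwards) equal those in an nInd header line ($2).
pop_names_match() {
    ref_pops=$(printf '%s\n' "$1" | cut -d' ' -f7- | tr -s ' ' '\n' | sort)
    nind_pops=$(printf '%s\n' "$2" | tr -s ' ' '\n' | sort)
    [ "$ref_pops" = "$nind_pops" ]
}

# Header lines copied from the examples above.
ref_header='id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba'
nind_header='French Han Chukchi Karitiana Papuan Sindhi Yoruba'

if pop_names_match "$ref_header" "$nind_header"; then
    names_ok=yes
else
    names_ok=no
fi
echo "population names match: $names_ok"
```
&lt;br /&gt;
With real files the two header lines can be taken with head -n 1 from the reference panel and the number of individuals file.&lt;br /&gt;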
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Chromosome has to be specified, try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
Note that the program quits if there are duplicate sites in the input or the reference panel; duplicates are identified by the chr_pos_A0_A1 ID, with the alleles sorted.&lt;br /&gt;
&lt;br /&gt;
The run produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which populations to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if set to &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Controls acceleration of the EM algorithm. Acceleration is enabled by default; set this to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, &amp;quot;-Qconv&amp;quot; can be set to 1; this bases the convergence criterion on the change in the admixture proportion values. The threshold can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, and it should not be set lower than this. This is generally only useful for a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set the maximum number of EM iterations; by default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful for checking convergence, that is, that the same Q values are estimated in each run. The program will execute a maximum of 10 runs.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program uses random Q starting points for each bootstrap, instead of the converged-upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged-upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers is the converged-upon estimate, and the rest are the bootstrap runs.&lt;br /&gt;
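&lt;br /&gt;
The bootstrap rows can be turned into an empirical confidence interval per population. The following is a minimal sketch and not part of fastNGSadmix: the helper name boot_ci is hypothetical, it assumes the .qopt layout described above (names, then the estimate, then the bootstrap rows), and it uses the nearest-rank rule for the 0.025 and 0.975 quantiles, which is only one of several common quantile definitions.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): print, for each
# population, the empirical 0.025 and 0.975 quantiles of the bootstrap
# estimates in a .qopt file.
# Assumptions: row 1 holds the population names, row 2 the converged-upon
# estimate, rows 3 onwards the bootstrap estimates; quantiles use the
# nearest-rank rule, i.e. the ceil(p * n)-th smallest of n values.
boot_ci() {
    qopt=$1
    ncol=$(head -n 1 "$qopt" | awk '{ print NF }')
    col=1
    while [ "$col" -le "$ncol" ]; do
        name=$(head -n 1 "$qopt" | awk -v c="$col" '{ print $c }')
        tail -n +3 "$qopt" |
            awk -v c="$col" 'NF > 0 { print $c }' |
            sort -n |
            awk -v name="$name" '
                { v[NR] = $1 }
                END {
                    if (NR == 0) exit
                    # Integer arithmetic for ceil(0.025 * NR) and ceil(0.975 * NR).
                    lo = int((25 * NR + 999) / 1000)
                    hi = int((975 * NR + 999) / 1000)
                    print name, v[lo], v[hi]
                }'
        col=$((col + 1))
    done
}
```
&lt;br /&gt;
It could be called as boot_ci NA20502_TSI_boot.qopt. With only a handful of bootstraps the two quantiles are simply the smallest and largest bootstrap value, so more bootstrap runs are needed for a meaningful interval.&lt;br /&gt;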
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''OPTION= ARGUMENT'''', where there is a space between '=' and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes= $GL  qopt= yriFrenchHan_depth05.qopt out= yriFrenchHan_depth05  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile= $PLINKFILE qopt= NA20502_TSI.qopt out= NA20502_TSI  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving PCs= 2,3 as an argument for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions, with confidence intervals in the case of bootstraps, as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to YES (saveCovar=YES).&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting 'fumagalli= 1'.&lt;br /&gt;
The method can also be run using only the estimated prior, that is, calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior= 1'.&lt;br /&gt;
Setting 'overlapRef= 1' means that the covariance matrix for the reference panel genotypes is calculated only on markers overlapping with the input (by default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all samples should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
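&lt;br /&gt;
As an alternative to the one-line xargs command, the listing and the command construction can be separated so that the commands can be inspected before anything is run. This is a minimal sketch with hypothetical helper names; it only uses the fastNGSadmix options shown above and assumes that file names contain no whitespace or shell metacharacters.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helpers (not part of fastNGSadmix).

# List the *.beagle and *.beagle.gz files below a directory, sorted.
list_beagle_files() {
    find "$1" -type f \( -name '*.beagle' -o -name '*.beagle.gz' \) | sort
}

# Print (without running) one fastNGSadmix command per file, so the
# commands can be inspected first; pipe the output to sh to execute them.
# Assumption: file names contain no whitespace or shell metacharacters.
print_admix_commands() {
    list_beagle_files "$1" | while IFS= read -r f; do
        echo "./fastNGSadmix -likes $f" \
            "-fname data/refPanel_humanOrigins_7worldPops.txt" \
            "-Nname data/nInd_humanOrigins_7worldPops.txt" \
            "-out $(basename "$f") -whichPops all"
    done
}
```
&lt;br /&gt;
Running print_admix_commands test/ prints one command per file; print_admix_commands test/ | sh would then execute them one after another.&lt;br /&gt;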
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes= % geno= data/humanOrigins_7worldPops qopt= `basename %`.qopt out= test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes); the default is 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
Note that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable.&lt;br /&gt;
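&lt;br /&gt;
Since fastNGSadmix quits on duplicate sites, a reference panel that was built or edited by hand can be checked beforehand. The following is a minimal sketch and not part of fastNGSadmix: the helper name find_duplicate_sites is hypothetical, and it assumes a reference panel with one header line and the columns id, chr, pos, name and the two alleles in columns 5 and 6.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): print the
# chr_pos_A0_A1 key (alleles sorted) of every site that occurs more than
# once in a reference panel, once for each repeated occurrence.
# Assumption: one header line, then columns id chr pos name A0 A1 ...
find_duplicate_sites() {
    awk 'NR > 1 {
        # Append "" so the alleles are compared as strings.
        a = $5 ""; b = $6 ""
        if (a > b) { t = a; a = b; b = t }
        key = $2 "_" $3 "_" a "_" b
        if (seen[key]++) print key
    }' "$1"
}
```
&lt;br /&gt;
An empty output from find_duplicate_sites data/refPanel_humanOrigins_7worldPops.txt would indicate that no duplicates were found under these assumptions.&lt;br /&gt;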
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of four populations (CEU, CHB, JHB and YRI), we first analyse the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
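&lt;br /&gt;
The column sums for the nInd file can also be computed in a way that does not depend on K. The following is a minimal sketch with a hypothetical helper name, make_nind; it assumes a whitespace-separated .qopt file from NGSadmix with one row per individual and no header, and writes the sums with four decimals.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix or NGSadmix): sum every
# column of a whitespace-separated Q matrix and print a header line
# (K1 K2 ...) followed by one line with the column sums.
# Assumption: one row per individual, no header line in the input.
make_nind() {
    awk '
        {
            if (NF > k) k = NF
            for (i = 1; NF >= i; i++) sum[i] += $i
        }
        END {
            header = ""; sums = ""
            for (i = 1; k >= i; i++) {
                sep = (i == 1) ? "" : " "
                header = header sep "K" i
                sums = sums sep sprintf("%.4f", sum[i])
            }
            print header
            print sums
        }
    ' "$1"
}
```
&lt;br /&gt;
For the example above this could be used as make_nind ceuChbJhbYriChr1.beagle.qopt &gt; nInd_ceuChbJhbYriChr1.beagle.txt, for any value of K.&lt;br /&gt;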
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
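&lt;br /&gt;
Running find twice relies on both calls returning the files in the same order. A possible alternative, sketched here with a hypothetical helper name, merge_qopt, is to merge the files and record their order in one step; it assumes that row 2 of each .qopt file holds the converged-upon estimate.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix or ADMIXTURE).
# Usage: merge_qopt ORDER_FILE ADMIXTURE_Q QOPT_FILE...
# Prints the ADMIXTURE .Q rows followed by row 2 (the converged-upon
# estimate) of every .qopt file, and records the .qopt file names, in the
# order they were appended, in ORDER_FILE.
merge_qopt() {
    order_file=$1
    admixture_q=$2
    shift 2
    : > "$order_file"
    cat "$admixture_q"
    for f in "$@"; do
        awk 'NR == 2' "$f"
        echo "$f" >> "$order_file"
    done
}
```
&lt;br /&gt;
For example: merge_qopt order.txt humanOrigins_7worldPops.2.Q NA*_CEU_K2.qopt &gt; humanOrigins_7worldPops_fastNGSadmix.2.Q, after which order.txt lists the appended files row by row.&lt;br /&gt;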
&lt;br /&gt;
When constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs, using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
Seven populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
It was also made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids, with the group/population-id as the FID and the individual-id as the IID, and set the snp-ids to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and a geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&amp;lt;br&amp;gt;&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=806</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=806"/>
		<updated>2017-09-23T12:51:38Z</updated>

		<summary type="html">&lt;p&gt;Emil: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes= $GL  qopt= yriFrenchHan_depth05.qopt out= yriFrenchHan_depth05  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile= $PLINKFILE  qopt= NA20502_TSI.qopt out= NA20502_TSI  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The marker column is '''recommended to be chr_pos''' rather than an rs ID, but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must refer to the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
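&lt;br /&gt;
As a rough illustration of the matching described above, the following Python sketch builds an id_A0_A1 key with the two alleles sorted and reports keys that occur more than once. The function names site_key and find_duplicates are hypothetical and are not part of fastNGSadmix; the program's own implementation may differ.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def site_key(site_id, allele_a, allele_b):
    """Build an id_A0_A1 style key; the alleles are sorted so their order does not matter."""
    first, second = sorted([allele_a, allele_b])
    return f"{site_id}_{first}_{second}"


def find_duplicates(sites):
    """Return the sorted keys that occur more than once in a list of (id, A0, A1) tuples."""
    counts = Counter(site_key(*site) for site in sites)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical reference panel rows: (id, A0, A1)
ref_sites = [
    ("1_752566", "G", "A"),
    ("1_842013", "G", "T"),
    ("1_752566", "A", "G"),  # same site with the alleles swapped
]

print(find_duplicates(ref_sites))
```
&lt;br /&gt;
Running a check like this on the input and the reference panel before starting an analysis can show which sites would need to be removed.&lt;br /&gt;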
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
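&lt;br /&gt;
The name requirement can be checked before running the program. The sketch below is only an illustration: the function check_population_names is hypothetical, and it assumes that the first six columns of the reference panel header are id, chr, pos, name, A0_freq and A1, as in the example panel shown earlier.&lt;br /&gt;
&lt;br /&gt;
```python
def check_population_names(ref_header, nind_header):
    """Return the nInd population names that are missing from the reference panel header."""
    # Assumption: the first six columns are id, chr, pos, name, A0_freq and A1,
    # followed by one column per population.
    ref_pops = ref_header.split()[6:]
    return [pop for pop in nind_header.split() if pop not in ref_pops]


ref_header = "id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba"
nind_header = "French Han Chukchi Karitiana Papuan Sindhi Yoruba"

missing = check_population_names(ref_header, nind_header)
print(missing)  # an empty list means every name was found
```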
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Alternatively, use the 'query' command of bcftools [https://samtools.github.io/bcftools/bcftools.html] to generate a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of the reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which populations to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option specifies which populations from the reference panel to include in the analysis. They must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
This option controls acceleration of the EM algorithm. It is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1. This bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this option is only for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, checking that the same Q values are estimated in each run. The maximum number of runs is 10.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval for your estimate, for example using the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program uses random Q starting points for each bootstrap, instead of using the converged-upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged-upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers is the converged-upon estimate, and the remaining rows are the bootstrap runs.&lt;br /&gt;
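&lt;br /&gt;
The bootstrap rows can be turned into a confidence interval by taking the empirical 0.025 and 0.975 quantiles per population. The Python sketch below shows one way of doing this; it is not part of fastNGSadmix, the helper names are hypothetical, the .qopt content is made up for illustration, and the linear interpolation between order statistics is just one of several common quantile definitions.&lt;br /&gt;
&lt;br /&gt;
```python
def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def bootstrap_intervals(qopt_text, low=0.025, high=0.975):
    """Parse .qopt content and return {population: (estimate, lower, upper)}."""
    rows = [line.split() for line in qopt_text.splitlines() if line.strip()]
    names = rows[0]                                      # row 1: population names
    estimate = [float(x) for x in rows[1]]               # row 2: converged-upon estimate
    boots = [[float(x) for x in row] for row in rows[2:]]  # remaining rows: bootstraps
    result = {}
    for i, name in enumerate(names):
        column = [row[i] for row in boots]
        result[name] = (estimate[i], quantile(column, low), quantile(column, high))
    return result


# Made-up .qopt content: header, estimate, then 4 bootstrap rows
example = """French Han Yoruba
0.70 0.10 0.20
0.68 0.12 0.20
0.72 0.08 0.20
0.70 0.10 0.20
0.74 0.06 0.20
"""

for pop, (est, lo, hi) in bootstrap_intervals(example).items():
    print(pop, est, round(lo, 4), round(hi, 4))
```
&lt;br /&gt;
With only a handful of bootstraps the quantiles are very rough; more bootstrap runs give a more stable interval.&lt;br /&gt;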
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Arguments should be given like this ''''OPTION= ARGUMENT'''', where there is a space between '=' and the argument.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes= $GL  qopt= yriFrenchHan_depth05.qopt out= yriFrenchHan_depth05  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using an analyzed plink file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile= $PLINKFILE qopt= NA20502_TSI.qopt out= NA20502_TSI  geno= data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified by giving, for example, PCs= 2,3 as an argument (for PC2 and PC3). The PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set (saveCovar=YES);&lt;br /&gt;
this stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli= 1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior= 1'.&lt;br /&gt;
Setting 'overlapRef= 1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; these should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and the samples should all have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes= % geno= data/humanOrigins_7worldPops qopt= `basename %`.qopt out= test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''' (for example in plink using --geno 0.05), as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script, plinkToRef.R, in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the major and minor alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument, a MAF threshold, can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 Genomes] beagle data (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of the refPanel_ file&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
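&lt;br /&gt;
For readers who prefer not to chain cut, paste and bc, the column sums can also be computed with a few lines of Python. This is only a sketch of the same idea: the function column_sums is hypothetical, the proportions below are made up, and it assumes a .qopt file with one row per individual and no header line.&lt;br /&gt;
&lt;br /&gt;
```python
def column_sums(q_text):
    """Sum each column of whitespace-separated admixture proportions."""
    rows = [[float(x) for x in line.split()]
            for line in q_text.splitlines() if line.strip()]
    return [sum(column) for column in zip(*rows)]


# Made-up proportions for three individuals and K=2 (no header line)
q_example = """0.75 0.25
0.50 0.50
0.25 0.75
"""

sums = column_sums(q_example)
# Header with one name per cluster, then the summed proportions
print(" ".join(f"K{i + 1}" for i in range(len(sums))))
print(" ".join(str(s) for s in sums))
```
&lt;br /&gt;
The sums are generally not whole numbers, which is fine since the numbers in the nInd file are allowed to be floats.&lt;br /&gt;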
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
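&lt;br /&gt;
To make the column order used by the awk command explicit, here is the same reshaping as a Python sketch. The function ref_panel_rows is hypothetical and the inputs are made up; it only rearranges columns, and does not check which allele the frequencies in the .P file refer to.&lt;br /&gt;
&lt;br /&gt;
```python
def ref_panel_rows(bim_lines, p_lines):
    """Combine plink .bim lines with ADMIXTURE .P lines into reference panel rows."""
    rows = []
    for bim_line, p_line in zip(bim_lines, p_lines):
        # .bim columns: chromosome, variant id, genetic distance, position, allele 1, allele 2
        chrom, name, _cm, pos, a1, a2 = bim_line.split()
        # Same column order as the awk command: chr_pos, chr, pos, name, allele 2, allele 1
        fields = [f"{chrom}_{pos}", chrom, pos, name, a2, a1] + p_line.split()
        rows.append(" ".join(fields))
    return rows


# Made-up inputs for illustration
bim = ["1 rs3094315 0.02 752566 A G", "1 rs7419119 0.02 842013 T G"]
p = ["0.17 0.67", "0.22 0.14"]

print("id chr pos name A0_freq A1 K1 K2")
for row in ref_panel_rows(bim, p):
    print(row)
```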
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids so that the group/population id is the FID and the individual id is the IID, and the SNP ids are '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and a geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files with plink's --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;br /&gt;
version 1.1: Beagle input and reference panel do not have to be sorted genomically, only find overlap based on ID - 23-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=774</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=774"/>
		<updated>2017-09-15T12:33:02Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* vcf files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
Two folders can be downloaded. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis, and a .sites file for generating genotype likelihoods with ANGSD.&lt;br /&gt;
The example folder has example data for trying out the method. Both can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''' instead of an rs ID, although custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be in the direction of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
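&lt;br /&gt;
Because duplicates make the program quit, it can be convenient to look for them before running. The following is a minimal sketch and not part of fastNGSadmix: the function name list_duplicate_sites is hypothetical, and it assumes a whitespace separated reference panel with a header line and the column layout shown above (id in column 1, the two alleles in columns 5 and 6). Two rows are treated as duplicates when they share the id and the same pair of alleles in either order, which mimics the sorted id_A0_A1 key described above; whether the program applies exactly this rule is an assumption.&lt;br /&gt;
```shell
# Hypothetical helper: read a reference panel (header included) on stdin and
# print the id of every row whose id and allele pair occurred on an earlier row.
list_duplicate_sites() {
    awk 'NR == 1 { next }
    {
        fwd = $1 "_" $5 "_" $6   # key with the alleles as written
        rev = $1 "_" $6 "_" $5   # key with the alleles swapped
        if ((fwd in seen) || (rev in seen)) print $1
        seen[fwd] = 1
    }'
}

# Demo on a tiny made-up panel: the third row repeats the first with swapped alleles.
printf '%s\n' \
    'id chr pos name A0_freq A1 French Han' \
    '1_752566 1 752566 rs3094315 G A 0.166 0.061' \
    '1_842013 1 842013 rs7419119 G T 0.22 0.106' \
    '1_752566 1 752566 rs3094315 A G 0.834 0.939' |
    list_duplicate_sites
```
For a real panel one could run: cat data/refPanel_humanOrigins_7worldPops.txt | list_duplicate_sites. Empty output means that no duplicates were found under the assumptions above.&lt;br /&gt;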
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
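&lt;br /&gt;
The name matching can be checked before a run. The sketch below is only an illustration: check_pop_names is a hypothetical helper, not something shipped with fastNGSadmix, and it assumes that the population columns of the reference panel start at column 7, as in the example above. It takes the two header lines as arguments and reports every population in the number of individuals file that is missing from the reference panel header.&lt;br /&gt;
```shell
# Hypothetical helper: check_pop_names REF_HEADER NIND_HEADER
# Prints each population in the nInd header that is not among the population
# columns (column 7 onwards) of the reference panel header.
# Returns 0 if every name was found and 1 otherwise.
check_pop_names() {
    ref_pops=$(printf '%s\n' "$1" | tr -s ' \t' '\n' | tail -n +7)
    missing=0
    for pop in $2; do
        if ! printf '%s\n' "$ref_pops" | grep -qxF -- "$pop"; then
            echo "not in reference panel: $pop"
            missing=1
        fi
    done
    return "$missing"
}

# Demo with the headers shown above; Papuan is deliberately misspelled.
ref_header='id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba'
nind_header='French Han Chukchi Karitiana Papuann Sindhi Yoruba'
check_pop_names "$ref_header" "$nind_header" || echo "names differ"
```
With real files the headers could be passed as: check_pop_names &amp;quot;$(head -n 1 $REF)&amp;quot; &amp;quot;$(head -n 1 $NIND)&amp;quot;&lt;br /&gt;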
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes, as in the command above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools' [https://samtools.github.io/bcftools/bcftools.html] 'query' option for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of the reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which populations to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
This option controls the acceleration of the EM algorithm. It is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency lower than this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, &amp;quot;-Qconv&amp;quot; can be set to 1; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001 and it should not be set lower than this. Generally this is only for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set the maximum number of EM iterations; by default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval for your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap instead of using the converged upon estimate; the default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
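&lt;br /&gt;
The bootstrap rows can be turned into an interval per population directly on the command line. The following is a rough sketch under stated assumptions, not the method used by the R script: qopt_ci is a hypothetical helper, it assumes the .qopt layout described above (names in row 1, converged upon estimate in row 2, bootstraps from row 3), and it uses nearest-rank empirical 0.025 and 0.975 quantiles, which is one of several possible quantile definitions.&lt;br /&gt;
```shell
# Hypothetical helper: read a .qopt file with bootstrap rows on stdin and print
# "population lower upper" per column, using nearest-rank 0.025 and 0.975 quantiles.
qopt_ci() {
    data=$(cat)
    header=$(printf '%s\n' "$data" | head -n 1)
    # rows 1 and 2 are the population names and the converged upon estimate
    boots=$(printf '%s\n' "$data" | awk 'NR == 1 || NR == 2 { next } NF { print }')
    nboot=$(printf '%s\n' "$boots" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$nboot" -eq 0 ]; then
        echo "no bootstrap rows found"
        return 1
    fi
    # nearest-rank positions ceil(0.025 * n) and ceil(0.975 * n), in integer arithmetic
    lo=$(( (25 * nboot + 999) / 1000 ))
    hi=$(( (975 * nboot + 999) / 1000 ))
    col=1
    for pop in $header; do
        sorted=$(printf '%s\n' "$boots" | awk -v c="$col" '{ print $c }' | LC_ALL=C sort -n)
        lower=$(printf '%s\n' "$sorted" | sed -n "${lo}p")
        upper=$(printf '%s\n' "$sorted" | sed -n "${hi}p")
        echo "$pop $lower $upper"
        col=$((col + 1))
    done
}

# Demo with made-up values: names, converged upon estimate, then four bootstrap rows.
printf '%s\n' \
    'French Han Yoruba ' \
    '0.90 0.05 0.05' \
    '0.88 0.07 0.05' \
    '0.92 0.03 0.05' \
    '0.90 0.06 0.04' \
    '0.91 0.04 0.05' |
    qopt_ci
```
On a real run it could be used as: cat NA20502_TSI_boot.qopt | qopt_ci. With only a handful of bootstraps the interval is simply the minimum and maximum, so many more bootstraps are needed for a meaningful interval.&lt;br /&gt;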
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is saved as *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions, with confidence intervals if bootstraps were run, as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probability is calculated given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analyses; the names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and the samples should all have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file with the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script, plinkToRef.R, in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
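&lt;br /&gt;
A quick way to confirm that the FID column carries the population labels as intended is to count the individuals per FID. The snippet below is a small sketch with a hypothetical helper name, count_per_pop. The counts can be compared by eye with the generated number of individuals file, although the source does not state that plinkToRef.R computes them in exactly this way, so that is an assumption.&lt;br /&gt;
```shell
# Hypothetical helper: read a plink .fam file on stdin and print "FID count",
# one line per group/population.
count_per_pop() {
    awk 'NF { print $1 }' | LC_ALL=C sort | uniq -c | awk '{ print $2, $1 }'
}

# Demo with the rows shown above.
printf '%s\n' \
    'French F1 0 0 1 1' \
    'French F2 0 0 1 1' \
    'Yoruba Y1 0 0 2 1' \
    'Yoruba Y2 0 0 1 1' \
    'Yoruba Y3 0 0 2 1' |
    count_per_pop
```
For the provided data this could be run as: cat data/humanOrigins_7worldPops.fam | count_per_pop&lt;br /&gt;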
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 Genomes] beagle data of CEU, CHB, JHB and YRI, we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
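&lt;br /&gt;
The column sums above are written out once per column, so the command has to be edited for every new K. As a minimal sketch, assuming a whitespace-separated file with one row per individual and one column per population, a single awk call can sum all columns whatever K is. The helper name sum_columns and the file toy.qopt (with made-up numbers) are hypothetical and only serve as illustration:&lt;br /&gt;
&lt;br /&gt;
```shell
# Sum every column of a whitespace-separated Q file; works for any K.
# sum_columns is a hypothetical helper name, not part of fastNGSadmix.
sum_columns() {
  awk '{ for (i = NF; i > 0; i--) s[i] += $i; if (NF > n) n = NF }
       END {
         line = ""
         for (i = 1; n >= i; i++) line = line (i > 1 ? " " : "") s[i]
         print line
       }' "$1"
}

# Toy file with 3 individuals and K=3 (made-up numbers, not real NGSadmix output).
printf '1 0 0\n0.5 0.5 0\n0 0.25 0.75\n' > toy.qopt
sum_columns toy.qopt
```
&lt;br /&gt;
The printed line can be appended to the nInd file below the header line with the population names.&lt;br /&gt;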
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that need to change are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis where, instead of inferring admixture proportions for populations, we estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
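&lt;br /&gt;
Because the panel is pasted together from two files, it can be worth checking that every row ended up with the same number of fields as the header (6 fixed columns plus K frequency columns). The following is only a sketch of such a check; the helper name check_ref_panel and the toy file with made-up frequencies are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Check that every data row has as many fields as the header (6 fixed columns + K).
# check_ref_panel is a hypothetical helper name, not part of fastNGSadmix.
check_ref_panel() {
  awk 'NR == 1 { n = NF; next }
       NF != n { bad++ }
       END {
         if (bad > 0) { print bad " malformed rows"; exit 1 }
         print "ok: " (NR - 1) " sites, K=" (n - 6)
       }' "$1"
}

# Toy K=2 panel with made-up frequencies, for illustration only.
printf 'id chr pos name A0_freq A1 K1 K2\n' > toy_refPanel.txt
printf '1_100 1 100 rs1 G A 0.10 0.90\n1_200 1 200 rs2 C T 0.35 0.60\n' >> toy_refPanel.txt
check_ref_panel toy_refPanel.txt
```
&lt;br /&gt;
A non-zero exit status would suggest that the .bim and .P files did not have the same number of lines.&lt;br /&gt;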
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that need to change are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5% missingness or a MAF below 5% were removed, and only autosomal sites were kept.&lt;br /&gt;
Seven populations were selected to represent most of the world: French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba.&lt;br /&gt;
It was also ensured that each population contained only unadmixed individuals.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this dataset I have set the FID to the group/population id and the IID to the individual id, and the SNP ids to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command uses a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=773</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=773"/>
		<updated>2017-09-15T12:32:40Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* vcf files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
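&lt;br /&gt;
Since the program quits on duplicate sites, it can save time to look for them before starting a run. The sketch below approximates the matching key described above (id plus the two alleles in sorted order) for an uncompressed beagle file; a .gz file would have to be decompressed first. The helper name duplicate_sites and the toy file are hypothetical, and the program's own check may differ in detail:&lt;br /&gt;
&lt;br /&gt;
```shell
# List site keys (id plus sorted alleles) that occur more than once; header line is skipped.
# duplicate_sites is a hypothetical helper name, not part of fastNGSadmix.
duplicate_sites() {
  awk 'NR > 1 {
         a = $2; b = $3
         if (a > b) { t = a; a = b; b = t }
         count[$1 "_" a "_" b]++
       }
       END { for (k in count) if (count[k] > 1) print k }' "$1" | sort
}

# Toy beagle file (made-up likelihoods); site 1_100 appears twice with swapped alleles.
printf 'marker allele1 allele2 Ind0 Ind0 Ind0\n' > toy.beagle
printf '1_100 1 0 0.9 0.1 0.0\n1_200 2 3 0.7 0.2 0.1\n1_100 0 1 0.8 0.1 0.1\n' >> toy.beagle
duplicate_sites toy.beagle
```
&lt;br /&gt;
An empty output means that no duplicated keys were found under this approximation.&lt;br /&gt;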
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
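&lt;br /&gt;
A quick way to catch a name mismatch is to compare the two header lines. The sketch below assumes the layout shown above, that is, population names from column 7 onwards in the reference panel and a header line of names in the number of individuals file. The helper name check_pop_names and the toy files are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Report population names in the nInd header that are absent from the panel header.
# check_pop_names is a hypothetical helper name, not part of fastNGSadmix.
check_pop_names() {  # $1 = reference panel, $2 = nInd file
  awk 'NR == 1 { for (i = 7; NF >= i; i++) ref[$i] = 1 }
       NR != FNR { if (FNR == 1) for (i = 1; NF >= i; i++) if (!($i in ref)) { print "missing: " $i; bad = 1 } }
       END { exit bad }' "$1" "$2"
}

# Toy files with made-up values, for illustration only.
printf 'id chr pos name A0_freq A1 French Han Yoruba\n1_100 1 100 rs1 G A 0.1 0.2 0.7\n' > toy_ref.txt
printf 'French Han Yoruba\n25 33 70\n' > toy_nInd.txt
check_pop_names toy_ref.txt toy_nInd.txt
```
&lt;br /&gt;
The function prints nothing and returns 0 when all names are found, and otherwise lists the missing names.&lt;br /&gt;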
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes, as in the command above.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools [https://samtools.github.io/bcftools/bcftools.html] query, for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
By default the method adjusting the frequencies is used (see more in the paper). To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;.&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Controls acceleration of the EM algorithm. Acceleration is enabled by default; set this to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, &amp;quot;-Qconv&amp;quot; can be set to 1, which bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, and it should not be set lower than this. It is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, 0.00001 by default. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
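&lt;br /&gt;
The bootstrap rows can be turned into an empirical interval, for example with the 0.025 and 0.975 quantiles. The following is a minimal sketch, assuming the .qopt layout described above (names in row 1, estimate in row 2, bootstraps from row 3) and using a simple nearest-rank quantile, which is only one of several possible quantile definitions. The helper name boot_quantile and the toy file with made-up values are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Nearest-rank empirical quantile of one column of the bootstrap rows (row 3 onwards).
# boot_quantile is a hypothetical helper name, not part of fastNGSadmix.
boot_quantile() {  # $1 = .qopt file, $2 = column number, $3 = probability
  tail -n +3 "$1" | awk -v c="$2" '{ print $c }' | LC_ALL=C sort -g |
    awk -v p="$3" '{ v[NR] = $1 }
      END {
        k = int(p * NR)            # rank = ceiling of p * n
        if (p * NR > k) k++
        if (k == 0) k = 1
        print v[k]
      }'
}

# Toy .qopt with 4 bootstrap rows (made-up values, not real output).
printf 'French Han\n0.85 0.15\n0.9 0.1\n0.8 0.2\n0.95 0.05\n0.7 0.3\n' > toy_boot.qopt
boot_quantile toy_boot.qopt 1 0.025
boot_quantile toy_boot.qopt 1 0.975
```
&lt;br /&gt;
With only a handful of bootstraps, as in this toy file, the two quantiles are simply the smallest and largest bootstrap values, so a larger number of bootstraps is needed for a meaningful interval.&lt;br /&gt;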
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population as FID (the first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
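&lt;br /&gt;
With the population in the first column, the number of individuals per population can be read straight off the .fam file, which is a convenient check against the number of individuals file. A minimal sketch follows; the helper name count_per_pop and the file toy.fam, which reuses the example rows above, are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# Count individuals per population (FID, first column) in a .fam file.
# count_per_pop is a hypothetical helper name, not part of fastNGSadmix.
count_per_pop() {
  awk '{ n[$1]++ } END { for (p in n) print p, n[p] }' "$1" | sort
}

# Toy .fam built from the example rows shown above.
printf 'French F1 0 0 1 1\nFrench F2 0 0 1 1\n' > toy.fam
printf 'Yoruba Y1 0 0 2 1\nYoruba Y2 0 0 1 1\nYoruba Y3 0 0 2 1\n' >> toy.fam
count_per_pop toy.fam
```
&lt;br /&gt;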
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using an analyzed plink file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified by, for example, giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions, with confidence intervals if bootstraps were run, as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, that is, calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a Procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
&lt;br /&gt;
It takes four arguments: a directory of *_covar.txt files; the .fam file of the plink files used for the reference genotypes;&lt;br /&gt;
a file listing all the samples/files you want to include in the analysis, whose names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and which should all have been run with the same reference panel and the same populations;&lt;br /&gt;
and lastly an argument specifying the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using Procrustes analysis with R/doProcastesPCA.R:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites.&lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel that is created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 Genomes] beagle data of CEU, CHB, JHB and YRI, we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create the file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
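&lt;br /&gt;
The header and the column sums can also be produced without hard-coding the number of columns. The lines below are only a minimal sketch using awk, assuming a space separated .qopt file with one row per individual and one column per cluster; the file names toy.qopt and nInd_toy.txt and the values in them are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# toy .qopt file: 3 individuals (rows), K=3 clusters (columns); made-up values&lt;br /&gt;
printf '0.9 0.1 0.0\n0.2 0.7 0.1\n0.4 0.2 0.4\n' &amp;gt; toy.qopt&lt;br /&gt;
&lt;br /&gt;
# sum each column, then print the header K1..KK followed by the sums&lt;br /&gt;
awk '{ for (i = 1; i &amp;lt;= NF; i++) s[i] += $i; k = NF }&lt;br /&gt;
     END { for (i = 1; i &amp;lt;= k; i++) printf &amp;quot;K%d%s&amp;quot;, i, (i &amp;lt; k ? &amp;quot; &amp;quot; : &amp;quot;\n&amp;quot;)&lt;br /&gt;
           for (i = 1; i &amp;lt;= k; i++) printf &amp;quot;%.6f%s&amp;quot;, s[i], (i &amp;lt; k ? &amp;quot; &amp;quot; : &amp;quot;\n&amp;quot;) }' toy.qopt &amp;gt; nInd_toy.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The same awk command should work for any K, since the number of columns is read from the file itself.&lt;br /&gt;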
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create the file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs then got rs IDs, using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected, French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba, to have representation for most of the world.&lt;br /&gt;
Furthermore, it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
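&lt;br /&gt;
The --keep step mentioned above could look like the following sketch. It assumes that the FID column of the .fam file holds the population name, as described for these files; the chosen populations, the file names toy.fam and keep.txt and the output prefix myPops are only examples. The --keep option of plink expects a file with FID and IID in the first two columns:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# toy .fam file standing in for humanOrigins_ALL/humanOrigins_ALLV2.fam; made-up individuals&lt;br /&gt;
printf 'French F1 0 0 1 1\nYoruba Y1 0 0 2 1\nHan H1 0 0 1 1\n' &amp;gt; toy.fam&lt;br /&gt;
&lt;br /&gt;
# write FID and IID of the individuals from the wanted populations&lt;br /&gt;
awk '$1 == &amp;quot;French&amp;quot; || $1 == &amp;quot;Yoruba&amp;quot; { print $1, $2 }' toy.fam &amp;gt; keep.txt&lt;br /&gt;
&lt;br /&gt;
# then extract them with plink (not run here, myPops is a hypothetical output prefix):&lt;br /&gt;
# plink --bfile humanOrigins_ALL/humanOrigins_ALLV2 --keep keep.txt --make-bed --out humanOrigins_ALL/myPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;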
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=772</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=772"/>
		<updated>2017-09-15T12:31:31Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* vcf files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder are also available. The data folder has a premade reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be in the direction of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
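&lt;br /&gt;
Duplicates can be looked for with standard tools before running the program. The sketch below builds a key from the id and the two sorted alleles, which is only an interpretation of the id_A0_A1 matching described above and not the program's own code; toy_ref.txt and dups.txt are made-up file names and the values are not real data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# toy reference panel where the first site occurs twice with swapped alleles; made-up values&lt;br /&gt;
printf 'id chr pos name A0_freq A1 French\n1_100 1 100 rsA G A 0.1\n1_200 1 200 rsB C T 0.2\n1_100 1 100 rsC A G 0.9\n' &amp;gt; toy_ref.txt&lt;br /&gt;
&lt;br /&gt;
# skip the header, build the key id_allele_allele with the alleles sorted,&lt;br /&gt;
# and list every key that is seen more than once&lt;br /&gt;
awk 'NR &amp;gt; 1 { a = $5; b = $6&lt;br /&gt;
              if (a &amp;gt; b) { t = a; a = b; b = t }&lt;br /&gt;
              print $1 &amp;quot;_&amp;quot; a &amp;quot;_&amp;quot; b }' toy_ref.txt | sort | uniq -d &amp;gt; dups.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
An empty dups.txt means that no duplicated key was found.&lt;br /&gt;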
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
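&lt;br /&gt;
A quick consistency check of the two files could be sketched as below, assuming the layouts shown above (population names from column 7 onwards in the reference panel header, and as the header of the number of individuals file). The toy file names are made up, and the check only compares the sets of names, not their order:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# toy files in the layout shown above; made-up values&lt;br /&gt;
printf 'id chr pos name A0_freq A1 French Han Yoruba\n' &amp;gt; toy_refPanel.txt&lt;br /&gt;
printf 'French Han Yoruba\n25 33 70\n' &amp;gt; toy_nInd.txt&lt;br /&gt;
&lt;br /&gt;
# population names of each file, one per line, sorted&lt;br /&gt;
head -n 1 toy_refPanel.txt | cut -d' ' -f7- | tr ' ' '\n' | sort &amp;gt; ref_pops.txt&lt;br /&gt;
head -n 1 toy_nInd.txt | tr ' ' '\n' | sort &amp;gt; nInd_pops.txt&lt;br /&gt;
&lt;br /&gt;
# diff prints nothing and returns 0 when the names agree&lt;br /&gt;
diff ref_pops.txt nInd_pops.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;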
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; try specifying multiple chromosomes.&lt;br /&gt;
&lt;br /&gt;
Or use bcftools [https://samtools.github.io/bcftools/bcftools.html] query for generating a .beagle file from a .vcf file.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option for faster inference of admixture proportions. When &amp;quot;-Qconv&amp;quot; is set to 1, the convergence criterion is based on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this is only for when a fast overview of the data is wanted, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap instead of using the converged upon estimate; default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows with the names of the populations analyzed and the converged upon estimates, and then n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
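&lt;br /&gt;
The bootstrap rows can be summarised into an empirical interval with standard tools. The following is a minimal sketch, assuming the .qopt layout described above (header, converged upon estimate, then one row per bootstrap) and a simple nearest-rank rule for the 0.025 and 0.975 quantiles; toy_boot.qopt and toy_boot_ci.txt are made-up file names and the values are not real output:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# toy .qopt: header, converged upon estimate and 5 bootstrap rows; made-up values&lt;br /&gt;
printf 'French Han\n0.80 0.20\n0.78 0.22\n0.82 0.18\n0.75 0.25\n0.85 0.15\n0.80 0.20\n' &amp;gt; toy_boot.qopt&lt;br /&gt;
&lt;br /&gt;
# for each column: sort the bootstrap values (row 3 and onwards) and pick the&lt;br /&gt;
# nearest-rank 0.025 and 0.975 quantiles, rank = ceiling(p * number of bootstraps)&lt;br /&gt;
ncol=$(head -n 1 toy_boot.qopt | wc -w)&lt;br /&gt;
for i in $(seq 1 $ncol); do&lt;br /&gt;
  name=$(awk -v c=$i 'NR == 1 { print $c }' toy_boot.qopt)&lt;br /&gt;
  tail -n +3 toy_boot.qopt | awk -v c=$i '{ print $c }' | LC_ALL=C sort -n |&lt;br /&gt;
    awk -v name=$name '{ v[NR] = $1 }&lt;br /&gt;
      END { lo = int(0.025 * NR); if (lo &amp;lt; 0.025 * NR) lo++&lt;br /&gt;
            hi = int(0.975 * NR); if (hi &amp;lt; 0.975 * NR) hi++&lt;br /&gt;
            print name, v[lo], v[hi] }'&lt;br /&gt;
done &amp;gt; toy_boot_ci.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With only a handful of bootstraps this interval is simply the smallest and largest bootstrap value, so a larger number of bootstrap runs gives a more meaningful interval.&lt;br /&gt;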
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. The PCs to plot (by default PC1 and PC2) can be chosen with the PCs argument, for example PCs=2,3 for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions, with confidence intervals if bootstraps were run, as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals; '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probabilities are calculated from only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on markers overlapping with the input (by default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments: a directory of *_covar.txt files, the .fam file of the plink files used for the reference genotypes,&lt;br /&gt;
a file listing all the samples/files to include in the analysis (the names should match the names of the *_covar.txt files, excluding _covar.txt) and lastly the name of the group being procrusted.&lt;br /&gt;
All included samples should have been run with the same reference panel and with the same populations.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites.&lt;br /&gt;
There is an R script, plinkToRef.R, in the R folder that converts a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and the major and minor alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
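&lt;br /&gt;
To check the grouping before building the panel, the individuals per FID can be counted. This is only a sketch; the function name fam_group_counts is a hypothetical helper and not part of fastNGSadmix:&lt;br /&gt;
&lt;br /&gt;
```shell
# fam_group_counts FILE: count individuals per group (FID, column 1) of a
# plink .fam file. The function name is a hypothetical helper for this page.
fam_group_counts() {
    awk '{ n[$1]++ } END { for (g in n) print g, n[g] }' "$1" | sort
}
```
&lt;br /&gt;
For example, fam_group_counts data/humanOrigins_7worldPops.fam prints one line per population with its number of individuals, which can be compared with the generated nInd file.&lt;br /&gt;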
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument, a MAF threshold, can be supplied, meaning that all sites with a MAF below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of CEU, CHB, JHB and YRI, we first do an analysis of the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of the reference panel&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
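&lt;br /&gt;
The same column sums can be computed for any K without writing out one cut per column. A minimal sketch using awk, assuming the .qopt (or .Q) file is headerless and whitespace separated; the function name colsums is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# colsums FILE: print the sum of each column of a headerless, whitespace
# separated file (such as an NGSadmix .qopt or ADMIXTURE .Q file) on one line.
# The function name is a hypothetical helper.
colsums() {
    awk '{ if (NF > n) n = NF; for (i = 1; NF >= i; i++) s[i] += $i }
         END { for (i = 1; n >= i; i++) printf "%.4f%s", s[i], (i == n ? "\n" : " ") }' "$1"
}
```
&lt;br /&gt;
For example, colsums ceuChbJhbYriChr1.beagle.qopt prints the K column sums with 4 decimals, which can be appended to the nInd file instead of using the paste command above.&lt;br /&gt;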
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd files.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing and a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids, with the group/population id as the FID and the individual id as the IID, and set the SNP ids to '''chr_pos'''.&lt;br /&gt;
Then I would recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using the --keep option of plink. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=771</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=771"/>
		<updated>2017-09-15T11:52:18Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Input Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
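&lt;br /&gt;
Duplicate sites can be looked for before running the program. The following is a rough sketch that assumes an uncompressed, whitespace separated file with a header line, and builds an id_A0_A1 key with the two alleles sorted; the function name dup_sites is hypothetical and the key only approximates the matching done by the program:&lt;br /&gt;
&lt;br /&gt;
```shell
# dup_sites FILE IDCOL A0COL A1COL: print every id_A0_A1 key (alleles sorted)
# that occurs more than once in FILE, skipping the header line.
# Hypothetical helper; the key only approximates the matching described above.
dup_sites() {
    awk -v i="$2" -v a="$3" -v b="$4" '
        NR > 1 {
            x = $a; y = $b
            if (x > y) { t = x; x = y; y = t }   # sort the allele pair
            print $i "_" x "_" y
        }' "$1" | sort | uniq -d
}
```
&lt;br /&gt;
For a reference panel the call would be dup_sites refPanel.txt 1 5 6, and for an unzipped beagle file dup_sites input.beagle 1 2 3 (both file names are placeholders); no output means that no duplicates were found.&lt;br /&gt;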
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
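&lt;br /&gt;
A quick way to check that the names match is to compare the header lines of the two files. A minimal sketch, assuming the population names start at column 7 of the reference panel header as in the example above; the function name check_panel_names is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```shell
# check_panel_names REFPANEL NIND: print every population name from the NIND
# header that is absent from the reference panel header (columns 7 onwards)
# and return a non-zero status if any is missing. Hypothetical helper name.
check_panel_names() {
    head -n 1 "$2" | awk -v hdr="$(head -n 1 "$1")" '
        {
            n = split(hdr, h, " ")
            for (i = 7; n >= i; i++) ref[h[i]] = 1
            for (i = 1; NF >= i; i++)
                if (!($i in ref)) { print $i; bad = 1 }
        }
        END { exit bad + 0 }'
}
```
&lt;br /&gt;
For example, check_panel_names data/refPanel_humanOrigins_7worldPops.txt data/nInd_humanOrigins_7worldPops.txt prints nothing and returns 0 if all names in the nInd file are found in the reference panel.&lt;br /&gt;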
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==vcf files==&lt;br /&gt;
&lt;br /&gt;
Via vcftools [https://vcftools.github.io/man_latest.html] it should be possible to convert .vcf files with genotype likelihoods to .beagle files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
vcftools --vcf test.vcf --out test --BEAGLE-GL --chr 1,2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The chromosome has to be specified; multiple chromosomes can be specified, as above.&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the population comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations; the default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set &amp;quot;-Qconv 1&amp;quot; for faster inference of admixture proportions; this bases the convergence criterion on the change in the admixture proportion values. The threshold can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001 and it should not be set lower than this. This is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, 0.00001 by default. It can only be adjusted for the unaccelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program uses random Q starting points for each bootstrap instead of the converged upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, the .qopt file has n+2 rows: the first two rows hold the names of the analyzed populations and the converged upon estimates, and the following n rows hold the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the first row of numbers is the converged upon estimate, and the remaining rows are the bootstrap runs.&lt;br /&gt;
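&lt;br /&gt;
As a minimal sketch of the confidence interval mentioned for the -boot option, the Python below reads a .qopt file laid out as above (a row of population names, a row with the converged upon estimate, then one row per bootstrap) and returns the empirical 0.025 and 0.975 quantiles for each population. The function names and the linear interpolation between order statistics are assumptions made for illustration; they are not part of fastNGSadmix.&lt;br /&gt;
&lt;br /&gt;
```python
def read_qopt(path):
    """Return (population names, converged estimate, list of bootstrap rows)."""
    with open(path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    names = rows[0]
    values = [[float(x) for x in row] for row in rows[1:]]
    return names, values[0], values[1:]


def quantile(sorted_values, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bootstrap_interval(path, low=0.025, high=0.975):
    """Map each population to (estimate, lower quantile, upper quantile)."""
    names, estimate, boots = read_qopt(path)
    if not boots:
        raise ValueError("no bootstrap rows found in " + path)
    result = {}
    for index, name in enumerate(names):
        # Sort the bootstrap estimates of this population before taking quantiles.
        column = sorted(row[index] for row in boots)
        result[name] = (estimate[index], quantile(column, low), quantile(column, high))
    return result
```
&lt;br /&gt;
Calling bootstrap_interval on a file such as NA20502_TSI_boot.qopt would give one (estimate, lower, upper) tuple per population. With only 10 bootstraps the interval is coarse, so more bootstrap runs are likely needed for a usable interval.&lt;br /&gt;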
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified, for example by giving PCs=2,3 as an argument for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions, with confidence intervals if bootstraps were run, as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which is a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probability is calculated from only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on the markers overlapping with the input (by default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
All samples should have been run with the same reference panel and with the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
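&lt;br /&gt;
As an alternative way of previewing the run, the Python sketch below builds one fastNGSadmix argument list per *.beagle or *.beagle.gz file without executing anything. It only uses the options shown in the one-line command above; the function names are hypothetical and the sketch is not shipped with fastNGSadmix.&lt;br /&gt;
&lt;br /&gt;
```python
from pathlib import Path


def beagle_files(directory):
    """List files ending in .beagle or .beagle.gz anywhere below a directory."""
    found = []
    for path in Path(directory).rglob("*"):
        if path.is_file() and path.name.endswith((".beagle", ".beagle.gz")):
            found.append(path)
    return sorted(found)


def admixture_commands(directory, ref_panel, n_ind):
    """Build one fastNGSadmix argument list per beagle file, without running it."""
    commands = []
    for path in beagle_files(directory):
        commands.append([
            "./fastNGSadmix", "-likes", str(path),
            "-fname", ref_panel, "-Nname", n_ind,
            # Like `basename %` in the shell command: output prefix is the file name.
            "-out", path.name, "-whichPops", "all",
        ])
    return commands
```
&lt;br /&gt;
Once the printed lists look right, each one could be passed to subprocess.run to start the actual analysis.&lt;br /&gt;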
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
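&lt;br /&gt;
The list file can also be produced in Python. The sketch below collects the sample prefixes of all *_covar.txt files in a directory and writes them one per line; it strips only the trailing _covar.txt, whereas the sed expression above removes every match in a name. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from pathlib import Path

COVAR_SUFFIX = "_covar.txt"


def sample_names(directory):
    """Return the sample prefixes of all *_covar.txt files in a directory."""
    names = []
    for path in Path(directory).iterdir():
        if path.name.endswith(COVAR_SUFFIX):
            names.append(path.name[:-len(COVAR_SUFFIX)])  # strip only the suffix
    return sorted(names)


def write_procrustes_list(directory, out_path):
    """Write one sample prefix per line, the layout used for procrustes.list."""
    names = sample_names(directory)
    Path(out_path).write_text("".join(name + "\n" for name in names))
    return names
```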
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP, or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and minor/major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
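&lt;br /&gt;
To illustrate why the FID column matters, the Python sketch below counts the individuals per population in a .fam file laid out as above. This is the kind of information the number of individuals file holds, although plinkToRef.R may compute it differently. The function name count_populations is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter


def count_populations(fam_path):
    """Count individuals per population, taking the population from the FID column."""
    counts = Counter()
    with open(fam_path) as handle:
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            counts[fields[0]] += 1  # column 1 (FID) holds the population name
    return dict(counts)
```
&lt;br /&gt;
For the five example rows above this would return 2 for French and 3 for Yoruba.&lt;br /&gt;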
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes) and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A MAF threshold can be supplied as a third argument, in which case all sites with a MAF below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data (CEU, CHB, JHB and YRI), we first analyse the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of the reference panel&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create the file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that should be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
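&lt;br /&gt;
The column summing step can be written so that it does not depend on K. The Python sketch below sums every column of a whitespace separated file of admixture proportions (for example a .qopt file without a header row) and writes a header K1 ... KK followed by the sums, mirroring the shell example above. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def column_sums(matrix_path):
    """Sum each column of a whitespace-separated numeric file (.qopt or .Q)."""
    totals = []
    with open(matrix_path) as handle:
        for line in handle:
            values = [float(x) for x in line.split()]
            if not values:
                continue  # skip blank lines
            if not totals:
                totals = [0.0] * len(values)
            if len(values) != len(totals):
                raise ValueError("rows have differing numbers of columns")
            totals = [t + v for t, v in zip(totals, values)]
    return totals


def write_nind(matrix_path, out_path):
    """Write a header K1..KK and one row of column sums."""
    totals = column_sums(matrix_path)
    names = ["K" + str(i + 1) for i in range(len(totals))]
    with open(out_path, "w") as handle:
        handle.write(" ".join(names) + "\n")
        handle.write(" ".join(str(t) for t in totals) + "\n")
    return names, totals
```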
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create the file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that should be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
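&lt;br /&gt;
As a sketch of how the reference panel step generalises to any K, the Python below combines a .bim file with an ADMIXTURE .P file in the same column order as the awk command above (chr_pos, chr, pos, name, bim column 6, bim column 5) and derives the K1 ... KK header from the number of columns in the .P file. The function name build_ref_panel is hypothetical and the sketch assumes both files list the sites in the same order.&lt;br /&gt;
&lt;br /&gt;
```python
def build_ref_panel(bim_path, p_path, out_path):
    """Combine .bim site information with ADMIXTURE .P frequencies for any K."""
    with open(bim_path) as bim, open(p_path) as freqs:
        bim_rows = [line.split() for line in bim if line.strip()]
        p_rows = [line.split() for line in freqs if line.strip()]
    if not p_rows:
        raise ValueError("the .P file is empty")
    if len(bim_rows) != len(p_rows):
        raise ValueError(".bim and .P files must have the same number of sites")
    k = len(p_rows[0])  # one frequency column per ancestral population
    header = ["id", "chr", "pos", "name", "A0_freq", "A1"]
    header += ["K" + str(i + 1) for i in range(k)]
    with open(out_path, "w") as out:
        out.write(" ".join(header) + "\n")
        for (chrom, name, _cm, pos, a1, a2), freq in zip(bim_rows, p_rows):
            # Same column order as the awk command: chr_pos, chr, pos, name, $6, $5
            site = [chrom + "_" + pos, chrom, pos, name, a2, a1]
            out.write(" ".join(site + freq) + "\n")
    return k
```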
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missingness and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore, it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using the --keep option of plink. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=770</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=770"/>
		<updated>2017-09-12T19:23:35Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Easy Example for analysing multiple beagle files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the genotype likelihood beagle input file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink files (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
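&lt;br /&gt;
As a small illustration of this layout, the Python sketch below splits one data row of a single-individual beagle file into the marker, the two allele codes and the three likelihood values, and splits a chr_pos marker into chromosome and position. The function names are hypothetical and the sketch assumes whitespace separated columns as in the example above.&lt;br /&gt;
&lt;br /&gt;
```python
def parse_beagle_line(line):
    """Split one data row into marker, the two allele codes, and three likelihoods."""
    fields = line.split()
    marker, allele1, allele2 = fields[0], fields[1], fields[2]
    likelihoods = [float(x) for x in fields[3:6]]  # the three columns of Ind0
    return marker, allele1, allele2, likelihoods


def split_marker(marker):
    """Split a chr_pos marker into chromosome and integer position."""
    chrom, pos = marker.rsplit("_", 1)
    return chrom, int(pos)
```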
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where again the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''. It is used for detecting the overlap with the input. The frequencies have to be those of the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
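&lt;br /&gt;
The matching rule can be sketched in Python as follows, under the assumption that the key is the id followed by the two alleles in sorted order and that both files use the same allele coding. This illustrates the idea only and is not the program's C++ code; the function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def site_key(site_id, allele_a, allele_b):
    """Build an id_A0_A1 style key with the two alleles in sorted order."""
    first, second = sorted([allele_a.upper(), allele_b.upper()])
    return "_".join([site_id, first, second])


def find_duplicates(sites):
    """Return keys that occur more than once in a list of (id, allele, allele)."""
    seen, duplicates = set(), set()
    for site in sites:
        key = site_key(*site)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)


def find_overlap(input_sites, ref_sites):
    """Return the sorted keys shared by the input and the reference panel."""
    input_keys = {site_key(*site) for site in input_sites}
    ref_keys = {site_key(*site) for site in ref_sites}
    return sorted(input_keys.intersection(ref_keys))
```
&lt;br /&gt;
Because the alleles are sorted before the key is built, a site listed as G/A in one file and A/G in the other gets the same key. Running find_duplicates on each file before the analysis shows whether any sites would cause the program to quit.&lt;br /&gt;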
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
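&lt;br /&gt;
A quick way to check the name requirement before a run is sketched below in Python. It compares the population columns of the reference panel header (everything after the six columns id, chr, pos, name, A0_freq and A1) with the header of the number of individuals file. The function name check_population_names is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def check_population_names(ref_panel_path, nind_path):
    """Return population names present in only one of the two header rows."""
    with open(ref_panel_path) as handle:
        ref_header = handle.readline().split()
    with open(nind_path) as handle:
        nind_names = handle.readline().split()
    ref_pops = ref_header[6:]  # the first six columns are id chr pos name A0_freq A1
    missing_in_nind = [name for name in ref_pops if name not in nind_names]
    missing_in_ref = [name for name in nind_names if name not in ref_pops]
    return missing_in_nind, missing_in_ref
```
&lt;br /&gt;
Two empty lists mean that the names in both files agree.&lt;br /&gt;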
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again with the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
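&lt;br /&gt;
How a -whichPops value maps onto the populations of a reference panel can be sketched in Python as below. This only illustrates the behaviour described on this page; the function select_pops is hypothetical, and raising an error for unknown names and keeping the requested order are assumptions, not documented behaviour of fastNGSadmix.&lt;br /&gt;
&lt;br /&gt;
```python
def select_pops(which_pops, panel_pops):
    """Resolve a -whichPops style string against the populations in a panel."""
    if which_pops == "all":
        return list(panel_pops)
    chosen = which_pops.split(",")
    unknown = [name for name in chosen if name not in panel_pops]
    if unknown:
        # Assumed behaviour: refuse names that the panel does not contain.
        raise ValueError("not in reference panel: " + ",".join(unknown))
    return chosen


panel_pops = ["French", "Han", "Chukchi", "Karitiana", "Papuan", "Sindhi", "Yoruba"]
print(select_pops("French,Han,Yoruba", panel_pops))
print(len(select_pops("all", panel_pops)))
```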
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must have the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm. It is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1. This bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. Generally this is only for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval for your estimate, for example by taking the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program takes random Q starting points for each bootstrap, instead of using the converged upon estimate. The default value is 0.&lt;br /&gt;
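&lt;br /&gt;
The bootstrap described under -boot resamples sites with replacement. A minimal Python sketch of that resampling step is given below; it uses Python's own random number generator, so the function name, the generator and the exact draws are assumptions and will not reproduce what fastNGSadmix draws for the same -seed.&lt;br /&gt;
&lt;br /&gt;
```python
import random


def bootstrap_site_indices(n_sites, n_boot, seed):
    """Draw n_boot resamples of site indices, each of size n_sites, with replacement."""
    rng = random.Random(seed)  # a fixed seed makes the resamples reproducible
    return [rng.choices(range(n_sites), k=n_sites) for _ in range(n_boot)]


resamples = bootstrap_site_indices(n_sites=5, n_boot=3, seed=1)
for indices in resamples:
    print(sorted(indices))
```
&lt;br /&gt;
Each resample has as many sites as the original data, so some sites appear several times and others not at all.&lt;br /&gt;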
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
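&lt;br /&gt;
Given this layout, empirical 0.025 and 0.975 quantiles can be taken per population from the bootstrap rows. The Python sketch below is one way to do that and is not part of fastNGSadmix: the function names are hypothetical, the example values are made up, and the quantile uses linear interpolation between order statistics, which is one of several common definitions. It needs at least one bootstrap row.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def parse_qopt(text):
    """Split .qopt text into population names, the estimate and bootstrap rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    names = rows[0]
    values = [[float(value) for value in row] for row in rows[1:]]
    return names, values[0], values[1:]


def quantile(values, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * prob
    low = math.floor(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (position - low) * (ordered[high] - ordered[low])


def bootstrap_intervals(text, lower=0.025, upper=0.975):
    """Map each population to (estimate, lower quantile, upper quantile)."""
    names, estimate, boots = parse_qopt(text)
    result = {}
    for index, name in enumerate(names):
        column = [row[index] for row in boots]
        result[name] = (estimate[index], quantile(column, lower), quantile(column, upper))
    return result


# Made-up .qopt content with three bootstrap rows, for illustration only.
example = "French Han Yoruba\n0.70 0.10 0.20\n0.68 0.12 0.20\n0.72 0.08 0.20\n0.70 0.10 0.20\n"
for name, interval in bootstrap_intervals(example).items():
    print(name, interval)
```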
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSAdmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs should be plotted (default PC1 and PC2) can be specified, for example by giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions, supplied with confidence intervals (in the case of bootstraps), as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which makes the analysis a lot quicker for a big reference panel.&lt;br /&gt;
One can also specify multiCores=X, where parallel::mclapply will then be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, which means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files. The second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analyses; the entries should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
These should all have been run with the same reference panel and with the same populations.&lt;br /&gt;
The last is an argument specifying the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', as it is implicitly assumed in the model that there is full data for most sites, for example in plink using --geno 0.05. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find overlap of .beagle and .filter files and creates tmp.ref with 6 first column of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
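&lt;br /&gt;
The column summing done by the shell pipeline above can also be written in Python, which scales to any K without editing the command. This is a minimal sketch: the function names are hypothetical, the example proportions are made up, and it assumes the .qopt file from NGSadmix holds only numeric rows with one column per K.&lt;br /&gt;
&lt;br /&gt;
```python
def column_sums(qopt_text):
    """Sum each whitespace separated column over all rows of numeric text."""
    rows = [[float(value) for value in line.split()]
            for line in qopt_text.splitlines() if line.strip()]
    return [sum(column) for column in zip(*rows)]


def nind_text(qopt_text):
    """Format a two-line nInd file with K1..Kn names and the column sums."""
    sums = column_sums(qopt_text)
    names = " ".join(f"K{index}" for index in range(1, len(sums) + 1))
    return names + "\n" + " ".join(str(total) for total in sums) + "\n"


# Made-up admixture proportions for four individuals and K=3.
example = "1 0 0\n0 1 0\n0.5 0.5 0\n0 0 1\n"
print(nind_text(example), end="")
```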
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only things that should be changed when building a reference panel from another beagle file are the value of 'K' and, accordingly, the K columns in the reference panel and nInd file.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
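&lt;br /&gt;
The awk and paste step above reorders the .bim columns and appends the .P frequencies row by row. A Python sketch of the same reordering is given below, assuming the standard six-column .bim layout (chr, id, cM, pos, allele 1, allele 2) and non-empty input; the function names and the example rows are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def bim_to_site_columns(bim_line):
    """Reorder one .bim row into the six leading reference panel columns."""
    chrom, name, _cm, pos, allele1, allele2 = bim_line.split()
    # Same order as the awk command: chr_pos, chr, pos, name, column 6, column 5.
    return " ".join([f"{chrom}_{pos}", chrom, pos, name, allele2, allele1])


def build_panel(bim_lines, p_lines):
    """Pair each .bim row with the matching row of the ADMIXTURE .P file."""
    if len(bim_lines) != len(p_lines):
        raise ValueError(".bim and .P files must have the same number of rows")
    n_pops = len(p_lines[0].split())
    header = "id chr pos name A0_freq A1 " + " ".join(f"K{k}" for k in range(1, n_pops + 1))
    body = [bim_to_site_columns(b) + " " + " ".join(p.split()) for b, p in zip(bim_lines, p_lines)]
    return [header] + body


# Made-up rows for illustration only.
panel = build_panel(["1 snp1 0 752566 A G"], ["0.1 0.9"])
print("\n".join(panel))
```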
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only things that should be changed for constructing another reference panel are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs then got rs IDs, using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the ids, with the group/population id as the FID and the individual id as the IID, and the snp ids as '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=769</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=769"/>
		<updated>2017-09-12T19:17:31Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Easy Example for analysing multiple beagle files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder are also available. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation and the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for use when generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. Both can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the marker column is '''recommended to be chr_pos''' instead of an rs ID, but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here the '''id column (like the marker column in the beagle file) is again recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
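&lt;br /&gt;
Because the program quits on duplicate sites, it can be useful to check a reference panel beforehand. The following is a minimal sketch of such a check with awk, not the program's own code:&lt;br /&gt;
it builds an id_A0_A1 key with the two alleles sorted, as described above, and reports every key that is seen twice. The file name toy_refPanel.txt and its rows are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy reference panel (hypothetical data); the first and last site are duplicates
# once the alleles are sorted (G/A and A/G give the same key).
printf '%s\n' \
  'id chr pos name A0_freq A1 French Han' \
  '1_752566 1 752566 rs3094315 G A 0.166 0.061' \
  '1_842013 1 842013 rs7419119 G T 0.220 0.106' \
  '1_752566 1 752566 rs3094315 A G 0.834 0.939' > toy_refPanel.txt

# Build the key id_A0_A1 with sorted alleles and report keys seen twice.
dups=$(awk 'NR > 1 {
  a = $5; b = $6
  if (a > b) { t = a; a = b; b = t }   # sort the two alleles
  key = $1 "_" a "_" b
  if (++seen[key] == 2) print key
}' toy_refPanel.txt)

echo "duplicate sites: ${dups:-none}"
```
&lt;br /&gt;
The same idea should carry over to a beagle file by using its marker, allele1 and allele2 columns (columns 1 to 3) for the key.&lt;br /&gt;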
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
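&lt;br /&gt;
A quick way to check that the names agree is to compare the two header lines. The sketch below assumes that the population columns start at column 7 of the reference panel, as in the example above; the file names are hypothetical.&lt;br /&gt;
Note that this strict comparison also requires the same order of populations, which may be stricter than what the program itself needs.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy files (hypothetical data) in the layouts shown above.
printf '%s\n' 'id chr pos name A0_freq A1 French Han Yoruba' > toy_ref.txt
printf '%s\n' 'French Han Yoruba' '25 33 70' > toy_nInd.txt

# Population names are assumed to start at column 7 of the reference panel header.
refPops=$(head -n 1 toy_ref.txt | cut -d' ' -f7-)
nIndPops=$(head -n 1 toy_nInd.txt)

if [ "$refPops" = "$nIndPops" ]; then
  status="names match"
else
  status="names differ"
fi
echo "$status"
```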
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
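&lt;br /&gt;
Judging from the example rows, the four columns of the .sites file correspond to the chr, pos, A0_freq and A1 columns of the reference panel. Under that assumption, the relation between the two layouts can be sketched as follows (toy file names, rows copied from the examples above):&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy reference panel (hypothetical file name; rows copied from the example above).
printf '%s\n' \
  'id chr pos name A0_freq A1 French Han' \
  '1_752566 1 752566 rs3094315 G A 0.166 0.061' \
  '1_842013 1 842013 rs7419119 G T 0.22 0.106' > toy_refPanel2.txt

# Skip the header and keep chr, pos and the two alleles.
awk 'NR > 1 { print $2, $3, $5, $6 }' toy_refPanel2.txt > toy.sites
cat toy.sites
```
&lt;br /&gt;
This is only meant to illustrate the layout; the provided .sites file should be preferred when using the provided reference panel.&lt;br /&gt;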
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions and a log file, yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again with the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which populations to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. They must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. It is disabled by default and can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int]&lt;br /&gt;
This option controls acceleration of the EM algorithm. It is enabled by default; set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1. This bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, and it should not be set lower than this. Generally this is only intended for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, 0.00001 by default. It can only be adjusted for the non-accelerated EM; the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set the maximum number of EM iterations, 2000 by default.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, by checking that the same Q values are estimated in each run. The program will execute a maximum of 10 runs.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example using the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specifies whether the program takes random Q starting points for each bootstrap instead of using the converged upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
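&lt;br /&gt;
The bootstrap rows can be turned into an empirical interval, for example from the 0.025 and 0.975 quantiles. The sketch below shows one possible way to do this with sort and awk on a made-up .qopt file.&lt;br /&gt;
It uses a simple nearest-rank definition of the quantile, which is an assumption made here; other tools (for example the quantile function in R) interpolate and can give slightly different values.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy .qopt file (made-up numbers): header, converged estimate, 10 bootstrap rows.
printf '%s\n' \
  'French Han Yoruba' \
  '0.80 0.15 0.05' \
  '0.78 0.16 0.06' '0.82 0.14 0.04' '0.79 0.15 0.06' '0.81 0.15 0.04' \
  '0.80 0.16 0.04' '0.77 0.17 0.06' '0.83 0.13 0.04' '0.80 0.14 0.06' \
  '0.79 0.16 0.05' '0.81 0.14 0.05' > toy_boot.qopt

col=1   # column (population) to summarise
# Bootstrap rows start at line 3; sort them numerically and pick nearest ranks.
ci=$(tail -n +3 toy_boot.qopt | awk -v c="$col" '{ print $c }' | sort -n |
  awk '{ v[NR] = $1 }
       END {
         lo = int(0.025 * NR); if (0.025 * NR > lo) lo++   # ceiling
         hi = int(0.975 * NR); if (0.975 * NR > hi) hi++   # ceiling
         if (lo == 0) lo = 1
         print v[lo], v[hi]
       }')
echo "95% interval for column $col: $ci"
```
&lt;br /&gt;
With only 10 bootstraps the interval is simply the smallest and largest bootstrap value, so in practice a larger number of bootstraps is preferable.&lt;br /&gt;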
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID (the first column), followed by the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot (default PC1 and PC2) can be specified by, for example, giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data, for more see supplementaries of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes a directory of *_covar.txt files and the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
It then takes a file listing all the samples/files you want to include in the analyses. These names should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all samples should have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group that the procrustes analysis is applied to.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory, in order to see which files will be analysed run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=test/`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites.&lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID (the first column), followed by the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of the reference panel&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
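&lt;br /&gt;
The command above is written out for K=3. As an alternative, the column sums can be computed with awk for any number of columns. This is a hedged sketch on made-up data, assuming a whitespace-separated .qopt file without a header line and with one row per individual; the file names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy NGSadmix-style .qopt (made-up numbers): one row per individual, K=3 columns.
printf '%s\n' \
  '0.9 0.1 0.0' \
  '0.2 0.7 0.1' \
  '0.0 0.2 0.8' \
  '0.5 0.5 0.0' > toy_ngsadmix.qopt

# Sum each column; the header K1..KK is derived from the number of columns.
awk '{ k = NF; for (i = 1; k >= i; i++) sum[i] += $i }
     END {
       header = ""; totals = ""
       for (i = 1; k >= i; i++) {
         sep = (i > 1) ? " " : ""
         header = header sep "K" i
         totals = totals sep sum[i]
       }
       print header
       print totals
     }' toy_ngsadmix.qopt > toy_nInd_K.txt
cat toy_nInd_K.txt
```
&lt;br /&gt;
The sums are generally not whole numbers, which is fine since the numbers of individuals are allowed to be floats.&lt;br /&gt;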
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel, from another beagle file, is the value of 'K' and the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
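&lt;br /&gt;
To avoid typing the K columns by hand, the header line of the reference panel can be generated for a given K. A minimal sketch, where the value K=4 is only an example:&lt;br /&gt;
&lt;br /&gt;
```shell
K=4   # number of ancestral populations used in the NGSadmix run (example value)

# Start from the six fixed columns and append K1 ... KK.
header="id chr pos name A0_freq A1"
i=1
while [ "$i" -le "$K" ]; do
  header="$header K$i"
  i=$((i + 1))
done
echo "$header"
```
&lt;br /&gt;
The resulting line can be written to the reference panel file in place of the hard-coded echo command used above.&lt;br /&gt;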
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
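&lt;br /&gt;
To make the column mapping in the command above easier to follow, here is the same step on toy files, written with a temporary file instead of process substitution. The .bim columns are chromosome, variant id, genetic distance, position, allele 1 and allele 2;&lt;br /&gt;
the file names and values below are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# Toy .bim and .P files (made-up values); .bim columns: chr, id, cM, pos, allele 1, allele 2.
printf '%s\n' \
  '1 rs3094315 0 752566 A G' \
  '1 rs7419119 0 842013 T G' > toy.bim
printf '%s\n' '0.20 0.65' '0.15 0.30' > toy.2.P

# First six reference panel columns: id (chr_pos), chr, pos, name, A0, A1.
awk '{ print $1 "_" $4, $1, $4, $2, $6, $5 }' toy.bim > toy_cols.txt

# Header line followed by the six columns pasted next to the K frequency columns.
echo "id chr pos name A0_freq A1 K1 K2" > toy_refPanel_K2.txt
paste -d' ' toy_cols.txt toy.2.P >> toy_refPanel_K2.txt
cat toy_refPanel_K2.txt
```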
&lt;br /&gt;
And then we can create a number of individuals in each reference population file, summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then given rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using the --keep option of plink. The result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance from 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=768</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=768"/>
		<updated>2017-09-12T19:10:52Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Easy Example for analysing multiple beagle files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It can also read plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. They can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be given for the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
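&lt;br /&gt;
Duplicates can be looked for before running the program. The following is only a rough sketch and not part of fastNGSadmix: the function name find_dup_sites is made up for illustration, and it assumes a whitespace separated reference panel with a header line and the columns id, chr, pos, name, A0_freq and A1 as shown above. It builds an id_A0_A1 key with the alleles sorted and prints every key that occurs more than once.&lt;br /&gt;
&lt;br /&gt;
```shell
# find_dup_sites FILE (hypothetical helper, not shipped with fastNGSadmix):
# print every id_A0_A1 key (alleles sorted) that occurs more than once.
# Assumes a header line and the columns: id chr pos name A0_freq A1 ...
find_dup_sites() {
    awk 'NR > 1 {
        a = $5; b = $6
        # sort the two alleles so that G/A and A/G give the same key
        if (a > b) { t = a; a = b; b = t }
        key = $1 "_" a "_" b
        # report a key once, the second time it is seen
        if (++seen[key] == 2) print key
    }' "$1"
}
```
&lt;br /&gt;
Calling find_dup_sites data/refPanel_humanOrigins_7worldPops.txt should print nothing if the panel has no duplicate keys under these assumptions.&lt;br /&gt;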
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
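&lt;br /&gt;
Whether the names agree can be checked by comparing the two header lines. This is a minimal sketch, not a tool provided with fastNGSadmix: the function name check_pop_names is hypothetical, and it assumes that the first six columns of the reference panel are id, chr, pos, name, A0_freq and A1, so that the population names start in column 7. It compares the names as sets and ignores their order.&lt;br /&gt;
&lt;br /&gt;
```shell
# check_pop_names REF NIND (hypothetical helper):
# compare the population names in the header of the reference panel
# (columns 7 and onwards) with the header of the nInd file.
check_pop_names() {
    ref_pops=$(awk 'NR == 1 { for (i = 7; NF >= i; i++) print $i }' "$1" | sort)
    nind_pops=$(awk 'NR == 1 { for (i = 1; NF >= i; i++) print $i }' "$2" | sort)
    if [ "$ref_pops" = "$nind_pops" ]; then
        echo "population names match"
    else
        echo "population names differ"
        return 1
    fi
}
```
&lt;br /&gt;
For the provided files this would be run as check_pop_names data/refPanel_humanOrigins_7worldPops.txt data/nInd_humanOrigins_7worldPops.txt.&lt;br /&gt;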
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
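&lt;br /&gt;
In the rows shown above, the four columns of the .sites file correspond to the chr, pos, A0_freq and A1 columns of the reference panel. The plinkToRef.R script already writes a .sites file, so the following is only a sketch for a hand-made panel. It assumes whitespace separated columns in the order shown in the reference panel example, and the function name ref_to_sites is made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
# ref_to_sites REF (hypothetical helper):
# print chr, pos and the two allele columns of a reference panel,
# skipping the header line.
# Assumes the columns: id chr pos name A0_freq A1 ...
ref_to_sites() {
    awk 'NR > 1 { print $2, $3, $5, $6 }' "$1"
}
```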
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. They must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Controls acceleration of the EM algorithm. Acceleration is enabled by default; set this to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with a minor allele frequency below this value in any of the analyzed populations. The default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, &amp;quot;-Qconv&amp;quot; can be set to 1, which bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. This is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify whether the program takes random Q starting points for each bootstrap, instead of using the converged upon estimate. The default value is 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
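&lt;br /&gt;
The bootstrap rows can be summarised into an interval per population, for example with the empirical 0.025 and 0.975 quantiles mentioned for the -boot option. The sketch below is one possible way to do this and is not part of fastNGSadmix (the R script produces its own barplots with confidence intervals). The function name qopt_ci is hypothetical, and the quantiles use a simple nearest-rank rule, which is an assumption and may differ slightly from what other tools compute. It expects the layout described above: names in row 1, the estimate in row 2 and the bootstraps from row 3 onwards.&lt;br /&gt;
&lt;br /&gt;
```shell
# qopt_ci FILE (hypothetical helper):
# print "population lower upper" for every column of a .qopt file, using the
# empirical 0.025 and 0.975 quantiles (nearest-rank rule) of the bootstrap rows.
# Assumes row 1 = population names, row 2 = estimate, rows 3.. = bootstraps.
qopt_ci() {
    ncol=$(awk 'NR == 1 { print NF }' "$1")
    col=1
    while [ "$col" -le "$ncol" ]; do
        name=$(awk -v c="$col" 'NR == 1 { print $c }' "$1")
        # extract the bootstrap values of this column and sort them numerically
        awk -v c="$col" 'NR > 2 { if (NF >= c) print $c }' "$1" | LC_ALL=C sort -n |
            awk -v name="$name" '{ v[NR] = $1 }
                END {
                    if (NR == 0) exit
                    lo = int((25 * NR + 999) / 1000)    # ceil(0.025 * n)
                    hi = int((975 * NR + 999) / 1000)   # ceil(0.975 * n)
                    print name, v[lo], v[hi]
                }'
        col=$((col + 1))
    done
}
```
&lt;br /&gt;
For the bootstrap run above this would be called as qopt_ci NA20502_TSI_boot.qopt. With only a few bootstraps the interval is simply the smallest and largest bootstrap value, so more bootstraps are needed for a meaningful interval.&lt;br /&gt;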
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as the FID (first column) and the individual ID as the IID (second column):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot can be specified (default PC1 and PC2), for example by giving PCs=2,3 as an argument for PC2 and PC3. The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions supplied with confidence intervals (in the case of bootstraps), as well as generating files of the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run only using the estimated prior, that means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses in a joint PCA plot, a Procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes the following arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files to include in the analysis; these should match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and they should all have been run with the same reference panel and the same populations.&lt;br /&gt;
The last argument specifies the name of the group that the Procrustes analysis is applied to.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and minor/major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as the FID (first column) and the individual ID as the IID (second column):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given to tell whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes), and is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument, a MAF threshold, can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data (CEU, CHB, JHB and YRI), we first do an analysis of the reference data with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, only the value of 'K' needs to change; the K columns in the reference panel and nInd file must then be set accordingly.&lt;br /&gt;
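&lt;br /&gt;
To avoid editing the header and the column sums by hand for every K, both can be generated. This is a sketch under assumptions rather than a script shipped with fastNGSadmix: the function names k_header and sum_columns are made up for illustration, the input is assumed to be a whitespace separated .qopt or .Q file without a header, and the sums are printed with six decimals instead of going through bc.&lt;br /&gt;
&lt;br /&gt;
```shell
# k_header K (hypothetical helper):
# print the population column names K1 K2 ... for a given K.
k_header() {
    awk -v k="$1" 'BEGIN { for (i = 1; k >= i; i++) printf "%s%s", "K" i, (i == k ? "\n" : " ") }'
}

# sum_columns FILE (hypothetical helper):
# print the sum of every column of a .qopt or .Q file on one line.
sum_columns() {
    awk '{ for (i = 1; NF >= i; i++) s[i] += $i; if (NF > n) n = NF }
         END { for (i = 1; n >= i; i++) printf "%.6f%s", s[i], (i == n ? "\n" : " ") }' "$1"
}
```
&lt;br /&gt;
For K=3 the output of k_header 3 followed by the output of sum_columns ceuChbJhbYriChr1.beagle.qopt would give the two lines of the nInd file.&lt;br /&gt;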
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first analyse the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that need to change are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is based on the curated Human Origins dataset from Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver, and the SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Sites with more than 5 % missingness and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
Seven populations (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) were selected to represent most of the world.&lt;br /&gt;
It was also checked that each population contains only unadmixed individuals.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In these files I have set the FID to the group/population id, the IID to the individual id and the SNP ids to '''chr_pos'''.&lt;br /&gt;
I recommend keeping only autosomal sites and applying a MAF and a geno (missingness) filter; the thresholds can be adjusted:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command uses a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=767</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=767"/>
		<updated>2017-09-12T19:03:38Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Easy Example for analysing multiple beagle files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for estimating admixture proportions from NGS data of a single individual, and a method for doing PCA of NGS data using the estimated admixture proportions. It is based on genotype likelihoods, but it also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains a ready-made reference panel for the admixture estimation (more on this in the Making a reference panel section), the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
and a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. Both can be downloaded with:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl instead of wget if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html] or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies must be those of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
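&lt;br /&gt;
Duplicates in a reference panel can be looked for before starting a run. The sketch below is an illustration only: the helper name ref_duplicate_sites is hypothetical, it assumes the reference panel layout shown above (one header line, id in column 1, alleles in columns 5 and 6), and it builds the key from the description on this page rather than from the program source.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): print every site key that
# occurs more than once in a reference panel file.
# Key = id, followed by the two alleles in sorted order.
ref_duplicate_sites() {
    awk 'NR > 1 {
        a = $5; b = $6
        if (a > b) { t = a; a = b; b = t }   # sort the two alleles
        key = $1 "_" a "_" b
        if (seen[key]++ == 1) print key       # report each duplicate once
    }' "$1"
}

# Toy panel: the third site repeats the first one with the alleles swapped
printf 'id chr pos name A0_freq A1 French\n' > toy_ref.txt
printf '1_10 1 10 rs1 G A 0.1\n1_20 1 20 rs2 C T 0.2\n1_10 1 10 rs1b A G 0.9\n' >> toy_ref.txt
ref_duplicate_sites toy_ref.txt
```
&lt;br /&gt;
An empty output means that no duplicate keys were found under these assumptions.&lt;br /&gt;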
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
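&lt;br /&gt;
A quick consistency check before a run is to compare the two header lines. This is a minimal sketch under assumptions: the helper name check_panel_names is hypothetical, the population columns are taken to start at column 7 of the reference panel as in the example above, and the check also demands identical order, which is stricter than the requirement stated here.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): succeed only if the names in
# the header of the nInd file equal columns 7 onwards of the reference panel
# header, in the same order.
# usage: check_panel_names REFPANEL NIND
check_panel_names() {
    ref_names=$(head -n 1 "$1" | awk '{ for (i = 7; NF >= i; i++) printf "%s ", $i }')
    nind_names=$(head -n 1 "$2" | awk '{ for (i = 1; NF >= i; i++) printf "%s ", $i }')
    [ "$ref_names" = "$nind_names" ]
}

# Toy files with two populations
printf 'id chr pos name A0_freq A1 French Han\n' > toy_panel.txt
printf 'French Han\n25 33\n' > toy_nind.txt
if check_panel_names toy_panel.txt toy_nind.txt; then
    echo "population names match"
else
    echo "population names differ"
fi
```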
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
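&lt;br /&gt;
For a custom reference panel, a file of this shape can be derived from the panel itself. The following is a sketch under assumptions: the helper name panel_to_sites is hypothetical, and the column mapping (chr, pos, A0_freq and A1 taken from columns 2, 3, 5 and 6) is read off the example panel and .sites file shown on this page.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): print "chr pos A0 A1" for
# every site of a reference panel, skipping the header line.
panel_to_sites() {
    awk 'NR > 1 { print $2, $3, $5, $6 }' "$1"
}

# Toy panel with a single site, copied from the example above
printf 'id chr pos name A0_freq A1 French Han\n' > toy_panel.txt
printf '1_752566 1 752566 rs3094315 G A 0.166 0.061\n' >> toy_panel.txt
panel_to_sites toy_panel.txt
```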
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you give the names of the populations separated by commas, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; it must contain the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. The names must be comma separated (pop1,pop2,..); if set to &amp;quot;all&amp;quot;, all populations in the reference panel will be included.&lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of the reference panel + input individual. Disabled by default; it can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option to 1 for faster inference of admixture proportions. This bases the convergence criterion on the change in the admixture proportion values; the threshold can be set with &amp;quot;-Qtol&amp;quot;.&lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, it should not be put lower than this and generally this is only for when wanting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, by default 0.00001. It can only be adjusted for the non-accelerated EM; for the accelerated EM algorithm the tolerance threshold is 1e-7.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, by checking that the same Q values are estimated in each run. At most 10 runs are executed.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval for your estimate, for example from the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrapping estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
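&lt;br /&gt;
A confidence interval can be read off the bootstrap rows of such a file. The sketch below is illustrative only: the helper name qopt_boot_quantile is hypothetical, it uses the nearest-rank definition of an empirical quantile (one of several common definitions), and it assumes the .qopt layout described above, with the bootstrap estimates starting on the third row.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): nearest-rank quantile of one
# population column over the bootstrap rows (row 3 onwards) of a .qopt file.
# usage: qopt_boot_quantile FILE COLUMN PROBABILITY
qopt_boot_quantile() {
    tail -n +3 "$1" | awk -v c="$2" 'NF >= c { print $c }' | sort -n |
        awk -v p="$3" '
            { v[NR] = $1 }
            END {
                if (NR == 0) exit 1
                k = int(p * NR)
                if (p * NR > k) k++      # round up to the next rank
                if (k == 0) k = 1
                print v[k]
            }'
}

# Toy file: header, converged estimate, then four bootstrap rows
printf 'French Han\n0.85 0.15\n0.9 0.1\n0.7 0.3\n0.8 0.2\n1.0 0.0\n' > toy_boot.qopt
qopt_boot_quantile toy_boot.qopt 1 0.025
qopt_boot_quantile toy_boot.qopt 1 0.975
```
&lt;br /&gt;
With only a handful of bootstraps the two quantiles are simply the smallest and largest bootstrap estimate, so a much larger value of -boot is needed for a meaningful interval.&lt;br /&gt;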
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and also the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes the .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs should be plotted (default PC1 and PC2) can be specified by, for example, giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the admixture proportions, with confidence intervals if bootstraps were run, as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar can be set to saveCovar=YES.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which makes subsequent runs a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, that is done by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probability is calculated from only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see which options are available, run the script without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed as the calculated covariance matrix differs, when the inputted sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files, and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; these names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
All samples should have been run with the same reference panel and with the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
First we create a new folder with the .beagle files we want to analyse:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
mkdir test&lt;br /&gt;
cp example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz test/&lt;br /&gt;
cp example/yriFrenchHan_depth05.beagle.gz test/&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;test/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find test/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls test/ | grep &amp;quot;_covar.txt$&amp;quot; | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R test/ data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file with the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites.&lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the sizes of the reference populations.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files, a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called  data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr pos and minor major alleles, for being used with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
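&lt;br /&gt;
As a quick sanity check, the number of individuals per population can be counted from the FID column of the .fam file and compared with the number of individuals file written by plinkToRef.R. The following is only a minimal sketch; the function name count_per_pop is made up for this illustration and is not part of fastNGSadmix:&lt;br /&gt;
&lt;br /&gt;
```shell
# count_per_pop FAM: print "population count" for each FID (first column)
# of a plink .fam file, sorted by population name.
# Hypothetical helper for illustration; not shipped with fastNGSadmix.
count_per_pop() {
    awk '{ n[$1]++ } END { for (p in n) print p, n[p] }' "$1" | LC_ALL=C sort
}

# Example (assumes the data folder has been downloaded):
# count_per_pop data/humanOrigins_7worldPops.fam
```
&lt;br /&gt;
A population whose FID is misspelled would show up here as an extra, unexpected group.&lt;br /&gt;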
&lt;br /&gt;
A second argument can be given, specifying whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes); the default is 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A MAF threshold can be supplied as a third argument, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so running the script on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## find the overlap of the .beagle and .filter files and create tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only things that need to be changed are the value of 'K' and, accordingly, the K columns in the reference panel and nInd files.&lt;br /&gt;
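&lt;br /&gt;
The paste/bc command above is written out for K=3. A sketch that sums the columns for any K could look like the following. The helper name make_nind is hypothetical, and it is assumed that the NGSadmix .qopt file is whitespace separated and has no header line. Unlike the bc version, the sums are printed with a fixed 6 decimals:&lt;br /&gt;
&lt;br /&gt;
```shell
# make_nind QOPT: print a header "K1 K2 ... KK" followed by the column sums
# of a whitespace-separated file of admixture proportions, for any K.
# Hypothetical helper for illustration; not shipped with fastNGSadmix.
make_nind() {
    awk '
        NF > 0 {
            for (i = NF; i >= 1; i--) s[i] += $i   # add this row to the column sums
            if (NF > k) k = NF                     # remember the number of columns
        }
        END {
            for (i = k; i >= 1; i--) {
                sep = (i == k) ? "" : " "
                header = "K" i sep header
                sums = sprintf("%.6f", s[i]) sep sums
            }
            if (k > 0) { print header; print sums }
        }' "$1"
}

# Example: make_nind ceuChbJhbYriChr1.beagle.qopt > nInd_ceuChbJhbYriChr1.beagle.txt
```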
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis where, instead of inferring admixture proportions for populations, we estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only things that need to be changed are the value of 'K' and, accordingly, the K columns of the reference panel.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missing data and sites with a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected (French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba) to have representation for most of the world.&lt;br /&gt;
It was also made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this dataset I have updated the ids, with the group/population id as the FID and the individual id as the IID, and set the SNP ids to '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using the plink --keep option, and the result can easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=766</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=766"/>
		<updated>2017-09-12T15:14:35Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* Easy Example for analysing multiple beagle files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder contains an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for generating genotype likelihoods with ANGSD. The example folder contains example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy conversion of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation instructions and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
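&lt;br /&gt;
If the chr_pos recommendation is followed, marker IDs that deviate from it can be listed before running the program. This is only a sketch: the function name bad_markers is hypothetical, and the pattern used (anything, an underscore, then digits) is an assumption about what counts as chr_pos:&lt;br /&gt;
&lt;br /&gt;
```shell
# bad_markers: read a beagle file on standard input and print every marker ID
# (first column, header line skipped) that is not of the form chr_pos.
# Hypothetical helper for illustration; not shipped with fastNGSadmix.
bad_markers() {
    awk 'NR > 1 { if ($1 !~ /^.+_[0-9]+$/) print $1 }'
}

# Example (assumes the example folder has been downloaded):
# gunzip -c example/yriFrenchHan_depth05.beagle.gz | bad_markers | head
```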
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Again, the '''id column (like the marker column in the beagle file) is recommended to be chr_pos'''; it is used for detecting the overlap with the input. The frequencies have to be those of the allele in the A0_freq column.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
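&lt;br /&gt;
Since the program quits on duplicate sites, it can be useful to look for them beforehand. The following is a minimal sketch for a reference panel file, assuming the column layout shown above (id in column 1, the two alleles in columns 5 and 6) and building the key as id_A0_A1 with the alleles sorted. The helper name find_dup_sites is hypothetical, and the matching done inside the program may differ in detail:&lt;br /&gt;
&lt;br /&gt;
```shell
# find_dup_sites REF: print every id_A0_A1 key (alleles sorted) that occurs
# more than once in a reference panel file with a header line and the
# columns id chr pos name A0_freq A1 ...
# Hypothetical helper for illustration; not shipped with fastNGSadmix.
find_dup_sites() {
    awk 'NR > 1 {
            a = $5; b = $6
            if (a > b) { t = a; a = b; b = t }   # sort the two alleles
            key = $1 "_" a "_" b
            if (++seen[key] == 2) print key      # report each duplicate once
        }' "$1"
}

# Example: find_dup_sites data/refPanel_humanOrigins_7worldPops.txt
```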
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the population comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and the log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, again using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The full list of options can be seen by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of the file with the number of individuals in each reference population; the file must have the names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis. They must be comma separated (pop1,pop2,..); if &amp;quot;all&amp;quot; is given, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of the reference panel + input individual. It is disabled by default and can be enabled by setting it to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
For faster inference of admixture proportions, the &amp;quot;-Qconv&amp;quot; option can be set to 1; this bases the convergence criterion on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001; it should not be set lower than this. This option is generally only meant for getting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, per default 0.00001, can only be adjusted for non accelerated EM, the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows contain the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
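&lt;br /&gt;
The empirical 0.025 and 0.975 quantiles mentioned for the -boot option can be read off such a file. The following is only a sketch under stated assumptions: the helper name boot_quantiles is made up, the .qopt file is assumed to have the layout described above (names, estimate, then one row per bootstrap), and the quantiles use the nearest-rank rule, which is a choice of this sketch and not necessarily what the R scripts do:&lt;br /&gt;
&lt;br /&gt;
```shell
# boot_quantiles QOPT: for each population column of a .qopt file made with
# -boot, print "name estimate lower upper", where lower and upper are the
# empirical 0.025 and 0.975 quantiles (nearest rank) of the bootstrap rows.
# Hypothetical helper for illustration; not shipped with fastNGSadmix.
boot_quantiles() {
    qopt=$1
    ncol=$(awk 'NR == 1 { print NF }' "$qopt")
    col=1
    while [ "$col" -le "$ncol" ]; do
        name=$(awk -v c="$col" 'NR == 1 { print $c }' "$qopt")
        est=$(awk -v c="$col" 'NR == 2 { print $c }' "$qopt")
        # bootstrap rows start at row 3; sort this column numerically
        awk -v c="$col" 'NR > 2 { if (NF >= c) print $c }' "$qopt" | LC_ALL=C sort -n |
            awk -v name="$name" -v est="$est" '
                function ceil(x,   r) { r = int(x); if (x > r) r++; return r }
                { v[NR] = $1 }
                END {
                    if (NR == 0) exit
                    print name, est, v[ceil(0.025 * NR)], v[ceil(0.975 * NR)]
                }'
        col=$((col + 1))
    done
}

# Example: boot_quantiles NA20502_TSI_boot.qopt
```
&lt;br /&gt;
With only 10 bootstraps, as in the example above, these quantiles are simply the smallest and largest bootstrap values, so considerably more bootstraps are needed for a meaningful interval.&lt;br /&gt;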
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After having run the admixture estimation, a PCA analysis can be run using the estimated admixture proportions to account for population structure in the PCA.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population specified as FID (the first column), followed by the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or using analyzed plink files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs to plot (default PC1 and PC2) can be specified, for example by giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions with confidence intervals (in the case of bootstraps), as well as files with the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors have the ids of the individuals, '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar=YES can be set;&lt;br /&gt;
this stores the covariance matrix for the reference panel individuals, which will be a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply will be used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, meaning that the genotype probability is calculated given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To find out which options there are, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a procrustes analysis can be done; there is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments. The first is a directory of *_covar.txt files and the second is the .fam file of the plink files used for the reference genotypes.&lt;br /&gt;
The third is a file listing all the samples/files you want to include in the analysis; the names should match the names of the *_covar.txt files (excluding _covar.txt).&lt;br /&gt;
All samples should have been run with the same reference panel and with the same populations.&lt;br /&gt;
The last argument specifies the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
Here it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find example/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;example/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find example/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find example/ | egrep &amp;quot;beagle$|beagle.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls *_covar.txt | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes or HGDP, or other data. The program also needs a file telling the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''' (for example in plink using --geno 0.05), as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a file with the size of each reference population.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number-of-individuals file called data/nInd_humanOrigins_7worldPops.txt and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population specified as FID, meaning first column and then the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A second argument can be given, telling whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the reference panel created. This argument has to be 0 (no) or 1 (yes); it is 1 by default.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument with a MAF threshold can be supplied, meaning all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data of (CEU, CHB, JHB and YRI), we first do an analysis of the reference panel with NGSadmix using K=3:&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population, by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
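&lt;br /&gt;
The command above is written out for K=3. As a hedged sketch only, the same column sums can be produced for any K with awk. The function name make_nind is hypothetical and not part of fastNGSadmix, and the sums are printed with 4 decimals, which is an arbitrary choice made here.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): print an nInd file for any K.
# Input: a whitespace-separated .qopt file with one row per individual,
# one column per ancestral population and no header row.
make_nind() {
    awk '
        # add every value to the running sum of its column, remember the widest row
        { for (i = 1; NF >= i; i++) sum[i] += $i; if (NF > k) k = NF }
        END {
            # first row: the column names K1 K2 ... Kk
            for (i = 1; k >= i; i++) printf "K%d%s", i, (i == k ? "\n" : " ")
            # second row: the column sums, printed with 4 decimals
            for (i = 1; k >= i; i++) printf "%.4f%s", sum[i], (i == k ? "\n" : " ")
        }
    ' "$1"
}
```
&lt;br /&gt;
It could then be used as: make_nind ceuChbJhbYriChr1.beagle.qopt &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;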
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed when building a reference panel from another beagle file is the value of 'K'; the K columns in the reference panel and the nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population, by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times, and then would like to cat the .Q file from ADMIXTURE with your results you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file, and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The only thing that should be changed for constructing another reference panel, is the value of 'K' and K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from Lazaridis et al. (2014) where the curated Human Origins dataset was selected.&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore sites with more than 5 % missing and a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba to have representation for most of the world.&lt;br /&gt;
Furthermore it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where I have updated the ids with the group/population-id for the FID and individual-id as the IID as well as the snp-ids as '''chr_pos'''.&lt;br /&gt;
Then I would recommend to keep only autosomal sites and do a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then the desired individuals/populations can be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink, using the --keep option. And it can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please notice that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=765</id>
		<title>FastNGSadmix</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=FastNGSadmix&amp;diff=765"/>
		<updated>2017-09-12T15:08:17Z</updated>

		<summary type="html">&lt;p&gt;Emil: /* PCA */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page contains information about the program fastNGSadmix, a very fast tool for finding admixture proportions from NGS data of a single individual and a method for doing PCA of NGS data, using the estimated admixture proportions. It is based on genotype likelihoods. It also reads plink files. The admixture estimation part is written in C++ and the PCA part is written in R.&lt;br /&gt;
&lt;br /&gt;
=Download=&lt;br /&gt;
&lt;br /&gt;
The program can be downloaded from github:&lt;br /&gt;
&lt;br /&gt;
https://github.com/e-jorsboe/fastNGSadmix&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
git clone https://github.com/e-jorsboe/fastNGSadmix.git;&lt;br /&gt;
cd fastNGSadmix &lt;br /&gt;
make&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
So far it has only been tested on Linux systems.&lt;br /&gt;
&lt;br /&gt;
A data folder and an example folder can also be downloaded. The data folder has an already made reference panel (more on this in the Making a reference panel section) for the admixture estimation, the genotypes of the reference individuals in plink format for the PCA analysis,&lt;br /&gt;
as well as a .sites file for generating genotype likelihoods with ANGSD. The example folder has example data for trying out the method. They can be downloaded from:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/data.tar.gz&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use curl if you are on a Mac.&lt;br /&gt;
&lt;br /&gt;
They can be unpacked thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tar -xzf data.tar.gz&lt;br /&gt;
tar -xzf example.tar.gz&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Quick start==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#beagle file with genotype likelihoods&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#plink file with genotypes for a single individual&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $GL genotype likelihoods file&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $GL genotype likelihoods file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
#Estimate admixture proportions from the $PLINKFILE plink file&lt;br /&gt;
./fastNGSadmix -plink  $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&lt;br /&gt;
#perform PCA based on the calculated admixture proportions from the previous command and using the $PLINKFILE plink file&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE  qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Input Files=&lt;br /&gt;
Input files are either genotype likelihoods in the beagle genotype likelihood file format [http://faculty.washington.edu/browning/beagle/beagle.html], or called genotypes in the binary plink files (*.bed) format [https://www.cog-genomics.org/plink2].&lt;br /&gt;
We recommend [http://popgen.dk/angsd ANGSD] for easy transformation of next-generation sequencing data to the beagle format, and plink2 for handling plink files. See the [http://popgen.dk/angsd ANGSD] wiki for installation and other documentation.&lt;br /&gt;
&lt;br /&gt;
The example below shows how to make a beagle file of genotype likelihoods using ANGSD.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#BAM/CRAM file&lt;br /&gt;
BAM=example/smallNA12874.mapped.ILLUMINA.bwa.CEU.low_coverage.20130415.bam &lt;br /&gt;
&lt;br /&gt;
#sites in the reference panel including major and minor allele&lt;br /&gt;
SITES=data/humanOrigins_7worldPops.sites&lt;br /&gt;
&lt;br /&gt;
#run ANGSD&lt;br /&gt;
./angsd -i $BAM -GL 2 -sites $SITES -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out outName&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of a beagle genotype likelihood input file for 1 individual. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
marker       allele1  allele2   Ind0      Ind0    Ind0&lt;br /&gt;
1_14000023      1       0       0.941    0.058    0.000&lt;br /&gt;
1_14000072      2       3       0.709    0.177    0.112&lt;br /&gt;
1_14000113      0       2       0.855    0.106    0.037&lt;br /&gt;
1_14000202      2       0       0.835    0.104    0.060&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the marker column is '''recommended to be chr_pos''', instead of rs ID - but custom ids can be used.&lt;br /&gt;
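&lt;br /&gt;
To see whether a beagle file follows the chr_pos recommendation, a small check along the following lines may help. This is only a sketch: the function name is hypothetical and not part of fastNGSadmix, and it assumes that chromosome names contain no underscore and that the position is an integer.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical pre-check (not part of fastNGSadmix): read a beagle file on
# standard input and print every marker ID that is not of the form chr_pos.
# Assumption: chromosome names contain no underscore and pos is an integer.
list_non_chr_pos_markers() {
    # NR > 1 skips the header row; column 1 holds the marker ID
    awk 'NR > 1 { if ($1 !~ /^[^_]+_[0-9]+$/) print $1 }'
}
```
&lt;br /&gt;
For a gzipped file it could be run as: gunzip -c example/yriFrenchHan_depth05.beagle.gz | list_non_chr_pos_markers. No output means that all marker IDs have the chr_pos form.&lt;br /&gt;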
&lt;br /&gt;
The reference panel looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
id chr pos name A0_freq A1 French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
1_752566 1 752566 rs3094315 G A 0.166 0.061 0.37 0.0833 0.071 0.306 0.671&lt;br /&gt;
1_842013 1 842013 rs7419119 G T 0.22 0.106 0.565 0.083 0.036 0.361 0.143&lt;br /&gt;
1_891021 1 891021 rs13302957 G A 0.060 0.197 0.326 0.75 0.893 0.056 0.264&lt;br /&gt;
1_903426 1 903426 rs6696609 C T 0.62 0.636 0.705 0.25 0.179 0.583 0.257&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where again we have the '''id column (like marker column in beagle file) that is recommended to be chr_pos''' which is being used for detecting the overlap with the input - custom ids can be used. The frequencies have to be in direction of the A0_freq allele.&lt;br /&gt;
Custom ids can be used for the id column in the .beagle and refPanel files, as the program matches sites on id_A0_A1 (alleles are sorted).&lt;br /&gt;
&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites in the input or the reference panel.&lt;br /&gt;
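&lt;br /&gt;
Since the program quits on duplicate sites, it can be useful to look for them beforehand. The following is a minimal sketch for a reference panel file; the function name is hypothetical and not part of fastNGSadmix, and the way the key is built (id, then the two alleles from columns 5 and 6 in sorted order) is an assumption based on the description above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical pre-check (not part of fastNGSadmix): print each site key
# that occurs more than once in a reference panel file.
# Assumption: the key is id_A0_A1 with the two alleles sorted, built from
# column 1 (id), column 5 (A0 allele) and column 6 (A1 allele).
list_duplicate_sites() {
    awk 'NR > 1 {
        a = $5; b = $6
        # sort the two alleles so that G/A and A/G give the same key
        if (a > b) { tmp = a; a = b; b = tmp }
        key = $1 "_" a "_" b
        # print the key the second time it is seen, and only then
        if (++seen[key] == 2) print key
    }' "$1"
}
```
&lt;br /&gt;
It could be run as: list_duplicate_sites data/refPanel_humanOrigins_7worldPops.txt. No output means that no duplicate keys were found.&lt;br /&gt;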
&lt;br /&gt;
The number of individuals in each reference population looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French Han Chukchi Karitiana Papuan Sindhi Yoruba&lt;br /&gt;
25 33 23 12 14 18 70&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These numbers are allowed to be floats, and the names have to match the names in the reference panel!&lt;br /&gt;
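&lt;br /&gt;
The name matching can be checked before running the program. The sketch below is an illustration only: the function name is hypothetical and not part of fastNGSadmix, and it only checks that each name in the number of individuals file is present in the reference panel header (from column 7 onwards), not the order of the names.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical pre-check (not part of fastNGSadmix).
# Usage: list_unknown_pops REFPANEL NINDFILE
# Prints every population name from the header of NINDFILE that is not
# found among the population columns of the REFPANEL header.
list_unknown_pops() {
    # population names are taken from column 7 onwards of the panel header
    ref_pops=$(awk 'NR == 1 { for (i = 7; NF >= i; i++) printf " %s", $i; exit }' "$1")
    for pop in $(head -n 1 "$2"); do
        case "$ref_pops " in
            *" $pop "*) ;;       # name found in the reference panel
            *) echo "$pop" ;;    # name missing from the reference panel
        esac
    done
}
```
&lt;br /&gt;
It could be run as: list_unknown_pops data/refPanel_humanOrigins_7worldPops.txt data/nInd_humanOrigins_7worldPops.txt. No output means that all names were found.&lt;br /&gt;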
&lt;br /&gt;
A .sites file is included in the data folder.&lt;br /&gt;
&lt;br /&gt;
It looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1 752566 G A&lt;br /&gt;
1 842013 G T&lt;br /&gt;
1 891021 G A&lt;br /&gt;
1 903426 C T&lt;br /&gt;
1 949654 A G&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Running fastNGSadmix=&lt;br /&gt;
&lt;br /&gt;
An example of running a command with fastNGSadmix with a beagle file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
&lt;br /&gt;
#frequencies from reference panel&lt;br /&gt;
REF=data/refPanel_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
#number of individuals in each population of reference panel&lt;br /&gt;
NIND=data/nInd_humanOrigins_7worldPops.txt&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can pick which populations should be analyzed via the &amp;quot;-whichPops&amp;quot; option, where you write the names of the populations, comma separated, or &amp;quot;all&amp;quot; if you want to include all populations.&lt;br /&gt;
It should also be noted that the program quits if there are duplicate sites (based on chr_pos_A0_A1 (alleles are sorted) ID) in the input or the reference panel.&lt;br /&gt;
&lt;br /&gt;
It then produces two files: yriFrenchHan_depth05.qopt with the admixture proportions, and a log file yriFrenchHan_depth05.log.&lt;br /&gt;
&lt;br /&gt;
Or with a plink file, also using the -whichPops option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PLINKFILE=example/NA20502_TSI&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Han,Yoruba&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A whole list of options can be explored by running fastNGSadmix without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Must be specified:'''&lt;br /&gt;
; -likes [char*] (either -likes or -plink)&lt;br /&gt;
Path of Beagle likelihood file of one input individual.&lt;br /&gt;
; -plink [char*] (either -likes or -plink)&lt;br /&gt;
Path of Plink file of one input individual - in the binary bed format.&lt;br /&gt;
; -Nname [char*]&lt;br /&gt;
Path of Number of individuals in each reference population file, must have names of the populations.&lt;br /&gt;
; -fname [char*]&lt;br /&gt;
Path of reference panel with population frequencies. Use &amp;quot;-whichPops&amp;quot; to specify which population to include.&lt;br /&gt;
; -out [char*]&lt;br /&gt;
Prefix for the output files .qopt and .log.&lt;br /&gt;
; -whichPops [char*]&lt;br /&gt;
This option tells which populations from the reference panel to include in the analysis, must be comma separated (pop1,pop2,..) if &amp;quot;all&amp;quot;, all populations in the reference will be included. &lt;br /&gt;
&lt;br /&gt;
'''Optional:'''&lt;br /&gt;
; -printFreq [int]&lt;br /&gt;
This option prints the admixture adjusted allele frequencies of reference panel + input individual. Disabled per default can be enabled by setting this to 1.&lt;br /&gt;
; -doAdjust [int]&lt;br /&gt;
To use the unadjusted approach set &amp;quot;-doAdjust 0&amp;quot;. By default the method adjusting the frequencies is used (see more in the paper).&lt;br /&gt;
; -seed [int]&lt;br /&gt;
Set seed for random generation of initial guesses in EM and for bootstrap.&lt;br /&gt;
; -method [int] &lt;br /&gt;
Set this option to enable acceleration of the EM algorithm, enabled per default, set it to 0 for unaccelerated EM.&lt;br /&gt;
; -maf [float]&lt;br /&gt;
Set this option to filter away sites from the reference panel with lower minor allele frequency in any of the analyzed populations, default value is 0.&lt;br /&gt;
; -Qconv [int]&lt;br /&gt;
Set this option for faster inference of admixture proportions. When &amp;quot;-Qconv&amp;quot; is set to 1, the convergence criterion is based on the change in the admixture proportion values. The threshold for this can be set with &amp;quot;-Qtol&amp;quot;. &lt;br /&gt;
This is less precise than the likelihood based convergence.&lt;br /&gt;
; -Qtol [float]&lt;br /&gt;
Set the threshold for changes in Q values. By default the &amp;quot;-Qtol&amp;quot; threshold is 0.0000001, it should not be put lower than this and generally this is only for when wanting a fast overview of the data, as it is less precise than the likelihood based convergence.&lt;br /&gt;
; -tol [float]&lt;br /&gt;
Set the tolerance for convergence based on likelihoods, per default 0.00001, can only be adjusted for non accelerated EM, the tolerance threshold is 1e-7 for the accelerated EM algorithm.&lt;br /&gt;
; -maxiter [int]&lt;br /&gt;
Set maximum number of EM iterations, per default it is set to 2000.&lt;br /&gt;
; -conv [int]&lt;br /&gt;
Specifies the number of convergence runs, with a new random starting point for each run. This is useful to test for convergence, testing that the same Q values are estimated for each run. The program will execute a maximum of 10 times.&lt;br /&gt;
; -boot [int]&lt;br /&gt;
Specifies the number of bootstrap runs, where random sites are sampled (with replacement) for each run. This is useful for generating a confidence interval of your estimate, for example doing the empirical 0.025 and 0.975 quantiles. The maximum number of bootstraps is 10000.&lt;br /&gt;
; -randomBoot [int]&lt;br /&gt;
Specify if the program takes random Q starting points for each bootstrap, instead of using converged upon estimate, default value 0.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Outputs==&lt;br /&gt;
&lt;br /&gt;
fastNGSadmix produces two files. The first is a .qopt file with the estimated admixture proportions and the names of the populations they are inferred for.&lt;br /&gt;
If n bootstraps are run, there will be n+2 rows in the .qopt file: the first two rows hold the names of the populations analyzed and the converged upon estimates, followed by n rows with the bootstrap estimates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat NA20502_TSI.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And if doing 10 bootstraps:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI_boot -whichPops French,Han,Yoruba -boot 10 -seed 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat  NA20502_TSI_boot.qopt&lt;br /&gt;
&lt;br /&gt;
French Han Yoruba &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
1.0000 0.0000 0.0000 &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Where the first row of numbers is the converged upon estimates, and the rest are the bootstrapping runs.&lt;br /&gt;
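&lt;br /&gt;
Given this layout, an empirical interval can be read off the bootstrap rows. The following is a minimal sketch and not part of fastNGSadmix: the function name is hypothetical, and it uses nearest-rank quantiles (the value at rank ceil(p * n) of the sorted bootstrap estimates), which is only one of several common quantile definitions.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper (not part of fastNGSadmix): print the empirical 0.025
# and 0.975 quantiles of the bootstrap estimates in one column of a .qopt file.
# Usage: boot_interval FILE COLUMN
# Rows 1 and 2 (population names and the converged estimate) are skipped.
# Assumption: nearest-rank quantiles, i.e. the value at rank ceil(p * n).
boot_interval() {
    tail -n +3 "$1" | awk -v col="$2" 'NF > 0 { print $col }' | sort -n | awk '
        # smallest integer that is not below x, and never less than 1
        function rank(x,   r) { r = int(x); if (x > r) r++; if (1 > r) r = 1; return r }
        # store the sorted bootstrap values by rank
        { v[NR] = $1 }
        END { if (NR > 0) print v[rank(0.025 * NR)], v[rank(0.975 * NR)] }
    '
}
```
&lt;br /&gt;
For the first population of the example above it could be run as: boot_interval NA20502_TSI_boot.qopt 1. With only 10 bootstraps the two values are simply the smallest and largest bootstrap estimate, so more bootstraps are needed for a useful interval.&lt;br /&gt;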
&lt;br /&gt;
It also produces a .log file with all the information about the run, as well as the converged upon estimate:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
cat NA20502_TSI_boot.log &lt;br /&gt;
&lt;br /&gt;
Input: likes=(null) plink=example/NA20502_TSI Nname=data/nInd_humanOrigins_7worldPops.txt fname=data/refPanel_humanOrigins_7worldPops.txt outfiles=NA20502_TSI_boot&lt;br /&gt;
Setup: seed=1 method=1&lt;br /&gt;
The accelerated EM has been chosen&lt;br /&gt;
The adjusted method has been chosen&lt;br /&gt;
Convergence: maxIter=2000 tol=0.00000010&lt;br /&gt;
The following number of bootstraps have been chosen: 10&lt;br /&gt;
Input has this many sites without missing data 441695&lt;br /&gt;
Ref has this many sites 442769&lt;br /&gt;
Overlap: of 441695 sites between input and ref&lt;br /&gt;
&lt;br /&gt;
nPop=3&lt;br /&gt;
&lt;br /&gt;
Opening nInd file: data/refPanel_humanOrigins_7worldPops.txt with nPop=3&lt;br /&gt;
Chosen pop French&lt;br /&gt;
N = 25.000000&lt;br /&gt;
Chosen pop Han&lt;br /&gt;
N = 33.000000&lt;br /&gt;
Chosen pop Yoruba&lt;br /&gt;
N = 70.000000&lt;br /&gt;
&lt;br /&gt;
This many iterations 30 for run 0&lt;br /&gt;
&lt;br /&gt;
At this bootstrapping: 1 out of: 10&lt;br /&gt;
At this bootstrapping: 2 out of: 10&lt;br /&gt;
At this bootstrapping: 3 out of: 10&lt;br /&gt;
At this bootstrapping: 4 out of: 10&lt;br /&gt;
At this bootstrapping: 5 out of: 10&lt;br /&gt;
At this bootstrapping: 6 out of: 10&lt;br /&gt;
At this bootstrapping: 7 out of: 10&lt;br /&gt;
At this bootstrapping: 8 out of: 10&lt;br /&gt;
At this bootstrapping: 9 out of: 10&lt;br /&gt;
At this bootstrapping: 10 out of: 10&lt;br /&gt;
best like -304264.169822 after 0!&lt;br /&gt;
Q 0.999980 Q 0.000010 Q 0.000010  after 0!&lt;br /&gt;
&lt;br /&gt;
Estimated  Q = 0.999980 0.000010 0.000010&lt;br /&gt;
best like -304264.169822 after 0 runs!&lt;br /&gt;
FIRST row of .qopt file is BEST estimated Q, rest are nBoot bootstrapping Qs&lt;br /&gt;
        [ALL done] cpu-time used =  14.96 sec&lt;br /&gt;
        [ALL done] walltime used =  15.00 sec&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
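&lt;br /&gt;
As a rough sketch, the bootstrap rows of a .qopt file can be summarised next to the best estimate. The helper name bootstrap_mean is hypothetical and not part of fastNGSadmix, and the layout is an assumption: line 1 is a header with the population names, line 2 is the best estimate and each following line is one bootstrap estimate.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, not shipped with fastNGSadmix.
# Assumed .qopt layout: line 1 = population names, line 2 = best estimate,
# lines 3 and onwards = one bootstrap estimate per line.
# Prints one line per population: name, best estimate, mean of the bootstraps.
bootstrap_mean() {
    awk '
        NR == 1 { for (i = 1; i != NF + 1; i++) name[i] = $i; next }
        NR == 2 { n = NF; for (i = 1; i != n + 1; i++) best[i] = $i; next }
        NF == n {
            if (n == 0) next
            boots++
            for (i = 1; i != n + 1; i++) sum[i] += $i
        }
        END {
            # Without bootstrap rows there is nothing to average.
            if (boots == 0) exit
            for (i = 1; i != n + 1; i++)
                printf "%s %s %.6f\n", name[i], best[i], sum[i] / boots
        }
    ' "$1"
}
```
&lt;br /&gt;
It could be called as bootstrap_mean NA20502_TSI_boot.qopt; if the file contains no bootstrap rows, nothing is printed.&lt;br /&gt;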
&lt;br /&gt;
=PCA=&lt;br /&gt;
&lt;br /&gt;
After the admixture estimation has been run, a PCA can be performed that uses the estimated admixture proportions to account for population structure.&lt;br /&gt;
The PCA method is implemented in R and is run using the script fastNGSadmixPCA.R in the R folder.&lt;br /&gt;
It requires the R package '''snpStats''', and additionally the R package '''parallel''' if a faster multi-threaded analysis is wanted.&lt;br /&gt;
&lt;br /&gt;
The method needs the estimated admixture proportions, the analyzed beagle or plink file as well as the genotypes of the reference panel used (as binary plink files).&lt;br /&gt;
&lt;br /&gt;
For the provided genotypes, the .fam file must have the group or population as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
GL=example/yriFrenchHan_depth05.beagle.gz&lt;br /&gt;
./fastNGSadmix -likes  $GL -fname $REF -Nname $NIND -out yriFrenchHan_depth05 -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R likes=$GL  qopt=yriFrenchHan_depth05.qopt out=yriFrenchHan_depth05  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or, using analyzed plink files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./fastNGSadmix -plink $PLINKFILE -fname $REF -Nname $NIND -out NA20502_TSI -whichPops French,Yoruba,Han&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R plinkFile=$PLINKFILE qopt=NA20502_TSI.qopt out=NA20502_TSI  geno=data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
By default the populations of the analyzed *.qopt file are used. Which PCs are plotted (default PC1 and PC2) can be specified by, for example, giving PCs=2,3 as an argument (for PC2 and PC3). The PCA plot is written to *_PCAplot.pdf.&lt;br /&gt;
The script also generates barplots (*_quantile_admixBarplot.png) of the supplied admixture proportions, with confidence intervals when bootstraps are available, and writes the covariance matrix (*_covar.txt), eigenvectors (*_eigenvecs.txt) and eigenvalues (*_eigenvals.txt) used for the PCA plot.&lt;br /&gt;
&lt;br /&gt;
The covariance matrix and eigenvectors are labelled with the ids of the individuals; '''the input individual has the id &amp;quot;SAMPLE&amp;quot;'''.&lt;br /&gt;
An *_indi.txt file is also created with the individual id and population/group id of each individual.&lt;br /&gt;
&lt;br /&gt;
For faster analyses when analysing multiple individuals with the same reference panel of genotypes, the option saveCovar=YES can be set.&lt;br /&gt;
This stores the covariance matrix for the reference panel individuals, which makes later runs a lot quicker for big reference panels.&lt;br /&gt;
One can also specify multiCores=X, in which case parallel::mclapply is used with X cores; this requires the package '''parallel'''.&lt;br /&gt;
&lt;br /&gt;
The PCA can also be done using the method from Fumagalli et al., 2013, by setting 'fumagalli=1'.&lt;br /&gt;
The method can also be run using only the estimated prior, which means calculating the genotype probability given only the admixture proportions and population specific frequencies&lt;br /&gt;
and NOT the sequencing data; for more, see the supplementary material of Jørsboe et al., 2017. This is done by setting 'onlyPrior=1'.&lt;br /&gt;
Setting 'overlapRef=1' means that the covariance matrix for the reference panel genotypes is calculated only on overlapping markers with input (per default all markers are used for calculating the reference panel covariance matrix).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To see the available options, run the script without any input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/fastNGSadmixPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats''' R package is needed for running this script.&lt;br /&gt;
&lt;br /&gt;
To combine these analyses into a joint PCA plot, a Procrustes analysis can be done. There is a script for this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/doProcastesPCA.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Procrustes analysis is needed because the calculated covariance matrix differs when the input sample differs.&lt;br /&gt;
&lt;br /&gt;
The script takes four arguments: a directory of *_covar.txt files; the .fam file of the plink files used for the reference genotypes;&lt;br /&gt;
a file listing all the samples/files to include in the analysis, where the names must match the names of the *_covar.txt files (excluding _covar.txt),&lt;br /&gt;
and all samples must have been run with the same reference panel and the same populations;&lt;br /&gt;
and lastly an argument specifying the name of the group being procrusted.&lt;br /&gt;
&lt;br /&gt;
The '''vegan and parallel''' R packages are needed.&lt;br /&gt;
&lt;br /&gt;
For the two examples above it can be done like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo -e &amp;quot;yriFrenchHan_depth05\nNA20502_TSI&amp;quot; &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Easy Example for analysing multiple beagle files =&lt;br /&gt;
&lt;br /&gt;
For analysing all the *.beagle files in a directory using a one-line command one can use xargs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find example/ | egrep &amp;quot;beagle$|beagle\.gz$&amp;quot; | xargs -I % bash -c './fastNGSadmix -likes % -fname data/refPanel_humanOrigins_7worldPops.txt -Nname data/nInd_humanOrigins_7worldPops.txt -out `basename %` -whichPops all;'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Be aware that this command will analyse '''ALL *.beagle or *.beagle.gz files''' in the &amp;quot;example/&amp;quot; directory. To see which files will be analysed, run this command first:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find example/ | egrep &amp;quot;beagle$|beagle\.gz$&amp;quot; &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And then for doing the PCA (also analyses all *.beagle or *.beagle.gz files in the specified directory):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
find example/ | egrep &amp;quot;beagle$|beagle\.gz$&amp;quot; | xargs -I % bash -c 'Rscript R/fastNGSadmixPCA.R likes=% geno=data/humanOrigins_7worldPops qopt=`basename %`.qopt out=`basename %`'&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The PCA plots can be compared using procrustes analysis, using R/doProcastesPCA.R.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ls *_covar.txt | sed -e 's/_covar.txt//g' &amp;gt; procrustes.list&lt;br /&gt;
Rscript R/doProcastesPCA.R . data/humanOrigins_7worldPops.fam procrustes.list Samples&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Making a reference panel =&lt;br /&gt;
&lt;br /&gt;
The program needs a reference panel of population specific allele frequencies for the populations for which admixture proportions should be estimated.&lt;br /&gt;
This reference panel can be derived from 1000 Genomes, HGDP or other data. The program also needs a file giving the size of each reference panel population.&lt;br /&gt;
It is recommended to run '''a missingness filter on the ref data first''', for example in plink using --geno 0.05, as the model implicitly assumes that there is full data for most sites. &lt;br /&gt;
There is an R script plinkToRef.R in the R folder that can convert a plink file to a reference panel file and a size of reference populations file.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This generates 3 files: a reference panel named data/refPanel_humanOrigins_7worldPops.txt, a number of individuals file called data/nInd_humanOrigins_7worldPops.txt, and a data/humanOrigins_7worldPops.sites file with chr, pos and the minor and major alleles, for use with ANGSD.&lt;br /&gt;
The provided .fam file must have the group or population as the FID (first column) and the individual ID as the second column:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
French F1 0 0 1 1&lt;br /&gt;
French F2 0 0 1 1&lt;br /&gt;
Yoruba Y1 0 0 2 1&lt;br /&gt;
Yoruba Y2 0 0 1 1&lt;br /&gt;
Yoruba Y3 0 0 2 1&lt;br /&gt;
...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
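&lt;br /&gt;
Before building the panel it can help to check that the FID column really holds the population labels. The following is a minimal sketch, not part of fastNGSadmix; the helper name count_per_population is hypothetical. It counts how many individuals carry each FID in a .fam file.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, not shipped with fastNGSadmix.
# Counts individuals per FID (first column) of a plink .fam file
# and prints "population count" lines sorted by population name.
count_per_population() {
    awk '{ count[$1]++ } END { for (pop in count) print pop, count[pop] }' "$1" | sort
}
```
&lt;br /&gt;
For example, count_per_population data/humanOrigins_7worldPops.fam should list one line per population; a long list of one-individual groups would suggest that the FID column holds individual or family IDs instead.&lt;br /&gt;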
&lt;br /&gt;
A second argument can be given that tells whether duplicate sites '''(based on chr_pos_A0_A1 ID)''' should be removed from the created reference panel. This argument has to be 0 (no) or 1 (yes); the default is 1.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A third argument giving a MAF threshold can be supplied, meaning that all sites where the MAF is below this value are removed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Rscript R/plinkToRef.R data/humanOrigins_7worldPops 1 0.05&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The '''snpStats R package''' is also needed for running the plinkToRef.R script.&lt;br /&gt;
&lt;br /&gt;
It should be noted that reading in plink files, especially big ones, '''might require a lot of RAM''', so doing it on a server might be preferable!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with NGSadmix ==&lt;br /&gt;
&lt;br /&gt;
We can use NGSadmix for generating a reference panel, which we can then use for fast analysis of individual samples. &lt;br /&gt;
&lt;br /&gt;
For instance, if we want to construct a reference panel from [http://www.internationalgenome.org/ 1000 genomes] beagle data (CEU, CHB, JHB and YRI), we first analyse the reference panel with NGSadmix using K=3.&lt;br /&gt;
&lt;br /&gt;
The beagle file should have '''chr_pos''' as marker ID for this to work!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
gunzip example/ceuChbJhbYriChr1.beagle.gz&lt;br /&gt;
./ngsadmix32 -likes example/ceuChbJhbYriChr1.beagle -K 3 -printInfo 1 -outfiles ceuChbJhbYriChr1.beagle&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .fopt.gz file, with three populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2 K3&amp;quot; &amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
cut -f1,2,3 example/ceuChbJhbYriChr1.beagle &amp;gt; tmp.beagle&lt;br /&gt;
gunzip ceuChbJhbYriChr1.beagle.fopt.gz&lt;br /&gt;
## finds the overlap of the .beagle and .filter files and creates tmp.ref with the first 6 columns of refPanel_&lt;br /&gt;
Rscript example/makeRefNGSadmix.R tmp.beagle ceuChbJhbYriChr1.beagle.filter&lt;br /&gt;
paste tmp.ref ceuChbJhbYriChr1.beagle.fopt &amp;gt;&amp;gt; refPanel_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
rm tmp.ref tmp.beagle&lt;br /&gt;
gzip ceuChbJhbYriChr1.beagle.fopt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .qopt file:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;K1 K2 K3&amp;quot; &amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;lt;(cut -f3 -d&amp;quot; &amp;quot; ceuChbJhbYriChr1.beagle.qopt | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_ceuChbJhbYriChr1.beagle.txt&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
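&lt;br /&gt;
For other values of K, one cut/paste/bc segment is needed per column. As an alternative sketch, the column sums can be computed for any number of columns with awk. The helper name make_nind is hypothetical, and the K1 ... KK header naming simply follows the example above.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, not shipped with fastNGSadmix.
# Reads a whitespace-separated Q matrix (one row per individual, K columns)
# and prints a header line "K1 ... KK" followed by the sum of each column.
make_nind() {
    awk '
        { n = NF; for (i = 1; i != n + 1; i++) s[i] += $i }
        END {
            for (i = 1; i != n + 1; i++) printf "K%d%s", i, (i == n ? "\n" : " ")
            for (i = 1; i != n + 1; i++) printf "%s%s", s[i], (i == n ? "\n" : " ")
        }
    ' "$1"
}
```
&lt;br /&gt;
It could be used as make_nind ceuChbJhbYriChr1.beagle.qopt | tee nInd_ceuChbJhbYriChr1.beagle.txt, which writes the file and also shows the result on the terminal.&lt;br /&gt;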
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_ceuChbJhbYriChr1.beagle.txt -Nname nInd_ceuChbJhbYriChr1.beagle.txt -out NA12763_CEU_K3  -whichPops all&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When building a reference panel from another beagle file, the only thing that should be changed is the value of 'K'; the K columns in the reference panel and nInd file should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
== Making a reference panel with ADMIXTURE ==&lt;br /&gt;
&lt;br /&gt;
A reference panel can be generated making use of [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].&lt;br /&gt;
&lt;br /&gt;
This could for instance be a clustered analysis, where instead of inferring admixture proportions for populations, we instead estimate admixture proportions for clusters of populations.&lt;br /&gt;
&lt;br /&gt;
For instance, if we want to detect the African and non-African components of an individual, we might first do an analysis of the reference panel with [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] using K=2:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
admixture data/humanOrigins_7worldPops.bed 2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
It might be necessary to run [https://www.genetics.ucla.edu/software/admixture/ ADMIXTURE] more than once in order to make sure that it has converged.&lt;br /&gt;
&lt;br /&gt;
We can then create a reference panel from the .P file, with two populations:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;id chr pos name A0_freq A1 K1 K2&amp;quot; &amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;( awk -F &amp;quot; &amp;quot; ' { print $1&amp;quot;_&amp;quot;$4, $1, $4, $2, $6, $5 } ' data/humanOrigins_7worldPops.bim ) humanOrigins_7worldPops.2.P &amp;gt;&amp;gt; refPanel_humanOrigins_7worldPops.2.P.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
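&lt;br /&gt;
Because paste joins the two files line by line, a .bim and a .P file with different numbers of rows would give a silently misaligned panel. A minimal sketch of a check that can be run before the paste step is shown here; the helper name check_same_rows is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
# Hypothetical helper, not shipped with fastNGSadmix.
# Compares the number of rows in two files and returns non-zero on a mismatch.
check_same_rows() {
    rows_a=$(awk 'END { print NR }' "$1")
    rows_b=$(awk 'END { print NR }' "$2")
    if [ "$rows_a" -eq "$rows_b" ]; then
        echo "OK: $rows_a rows in both files"
    else
        echo "MISMATCH: $1 has $rows_a rows, $2 has $rows_b rows"
        return 1
    fi
}
```
&lt;br /&gt;
For the example above this would be check_same_rows data/humanOrigins_7worldPops.bim humanOrigins_7worldPops.2.P.&lt;br /&gt;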
&lt;br /&gt;
We can then create the file with the number of individuals in each reference population by summing each column of the .Q file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;K1 K2&amp;quot; &amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
paste -d&amp;quot; &amp;quot; &amp;lt;(cut -f1 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;lt;(cut -f2 -d&amp;quot; &amp;quot; humanOrigins_7worldPops.2.Q | paste -sd+ | bc) &amp;gt;&amp;gt; nInd_humanOrigins_7worldPops.2.Q.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then we can run fastNGSadmix using this reference panel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#genotype likelihoods from sequencing file&lt;br /&gt;
GL=example/NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz&lt;br /&gt;
&lt;br /&gt;
./fastNGSadmix -likes $GL -fname refPanel_humanOrigins_7worldPops.2.P.txt -Nname nInd_humanOrigins_7worldPops.2.Q.txt -out NA12763_CEU_K2  -whichPops all&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you run fastNGSadmix multiple times and would then like to concatenate the .Q file from ADMIXTURE with your results, you can do the following, assuming your files are named NA'''XXXXX'''_CEU_K2.qopt:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;(cat humanOrigins_7worldPops.2.Q ) &amp;lt;(find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot; | xargs -I % bash -c 'cat % | head -n 2 | tail -n 1;') &amp;gt; humanOrigins_7worldPops_fastNGSadmix.2.Q&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out which files from fastNGSadmix have been merged with the ADMIXTURE .Q file and in which order, run:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
find . | grep &amp;quot;NA....._CEU_K2.qopt$&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When constructing another reference panel, the only thing that should be changed is the value of 'K'; the K columns of the reference panel should be set accordingly.&lt;br /&gt;
&lt;br /&gt;
=Human reference panels=&lt;br /&gt;
fastNGSadmix already comes with a premade reference panel.&lt;br /&gt;
&lt;br /&gt;
==refPanel_humanOrigins_7worldPops==&lt;br /&gt;
This reference panel is made from the curated Human Origins dataset of Lazaridis et al. (2014).&lt;br /&gt;
&lt;br /&gt;
The dataset was lifted to hg19 using the program liftOver. The SNPs were then assigned rs IDs using 1000 Genomes data.&lt;br /&gt;
&lt;br /&gt;
Furthermore, sites with more than 5 % missingness or a MAF below 5 % were removed, and only autosomal sites were kept.&lt;br /&gt;
7 populations were selected to represent most of the world: French, Han, Chukchi, Karitiana, Papuan, Sindhi and Yoruba.&lt;br /&gt;
Furthermore, it was made sure that there were only unadmixed individuals within each population.&lt;br /&gt;
&lt;br /&gt;
==Creating other reference panels==&lt;br /&gt;
&lt;br /&gt;
Other reference panels can easily be made from the Human Origins data:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
wget popgen.dk/software/download/fastNGSadmix/humanOrigins_ALL.tar.gz&lt;br /&gt;
tar -xzf humanOrigins_ALL.tar.gz&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here I have updated the IDs so that the FID is the group/population ID and the IID is the individual ID, and so that the SNP IDs are '''chr_pos'''.&lt;br /&gt;
I would then recommend keeping only autosomal sites and applying a MAF and geno filter (one can play around with these filters!):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
plink --bfile humanOrigins_ALL/humanOrigins_ALL --make-bed --out humanOrigins_ALL/humanOrigins_ALLV2 --maf 0.05 --geno 0.05 --chr 1-22&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The desired individuals/populations can then be extracted from the humanOrigins_ALL/humanOrigins_ALLV2.* plink files using plink with the --keep option. The result can then easily be turned into a reference panel using plinkToRef.R.&lt;br /&gt;
(Please note that this command takes up a lot of RAM!)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Rscript R/plinkToRef.R humanOrigins_ALL/humanOrigins_ALLV2&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Other reference panels can be created, for instance based on 1000 Genomes or your own data: first convert the data into plink files and then use the script plinkToRef.R.&lt;br /&gt;
&lt;br /&gt;
=Citation=&lt;br /&gt;
&lt;br /&gt;
The program has been published in Bioinformatics:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Emil Jørsboe, Kristian Hanghøj, Anders Albrechtsen; &lt;br /&gt;
fastNGSadmix: Admixture proportions and principal component analysis of a single NGS sample, &lt;br /&gt;
Bioinformatics, btx474, https://doi.org/10.1093/bioinformatics/btx474&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Log=&lt;br /&gt;
version 1.0: Stable initially released version of fastNGSadmix - 07-09-2017&lt;/div&gt;</summary>
		<author><name>Emil</name></author>
	</entry>
</feed>