<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.popgen.dk/software/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ida</id>
	<title>software - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.popgen.dk/software/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Ida"/>
	<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php/Special:Contributions/Ida"/>
	<updated>2026-04-30T14:16:06Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.40.1</generator>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=694</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=694"/>
		<updated>2017-06-29T13:51:20Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Changelog */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
The method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore meant only as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt; newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called newres that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/newres.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the numbers of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will contain these IDs in addition to the sample numbers (one can actually just use that exact file, although the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
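&lt;br /&gt;
A minimal sketch of one way to make a shorter ID file for -z from a file list like 'filelist'. The helper name make_ids and the demo file names and paths are made up for illustration and are not part of NgsRelate; the sketch assumes the IDs you want are the file names without directory and without the .bam/.cram extension.&lt;br /&gt;

```shell
# Sketch: derive short sample IDs from the BAM/CRAM paths in a file list.
# make_ids is a hypothetical helper name; it is not part of NgsRelate.
make_ids() {
    # $1: file with one BAM/CRAM path per line; prints one ID per line,
    # in the same order as the input, which is the order -z expects
    sed -e 's|.*/||' -e 's|\.bam$||' -e 's|\.cram$||' "$1"
}

# Toy file list so the sketch runs on its own (the paths are made up)
printf '%s\n' /data/bams/S1.bam /data/bams/S42.cram > demo_filelist

make_ids demo_filelist > demo_ids.txt
cat demo_ids.txt
```

The resulting file could then be passed to NgsRelate with -z demo_ids.txt. Because the order of the lines is preserved, the IDs stay in sync with the sample numbers.&lt;br /&gt;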
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt;newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;newres&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNP sites with EUR_AF above 0, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
./ngsrelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	ida	idb	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	S1	S42	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat newres&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
0	2	1121594	0.448790	0.548298	0.002912	-1666189.356801	25	0.808822&lt;br /&gt;
0	3	1131917	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
0	4	1135509	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
0	5	1043719	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
1	2	1118945	0.006249	0.993750	0.000001	-1580989.961356	13	0.806912&lt;br /&gt;
1	3	1129152	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
1	4	1132778	1.000000	0.000000	0.000000	-1744055.210286	-1	0.816887&lt;br /&gt;
1	5	1041298	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
2	3	1122253	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
2	4	1125729	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
2	5	1035731	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
3	4	1136091	0.566552	0.433054	0.000393	-1743752.158759	36	0.819276&lt;br /&gt;
3	5	1046456	0.265831	0.482954	0.251214	-1467343.087558	11	0.754637&lt;br /&gt;
4	5	1047977	0.004653	0.995347	0.000000	-1473415.049864	94	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns contain the indices of the two individuals used for the analysis. The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals, and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). When -z is used, the two ID columns (ida and idb) are inserted after the first two columns and the remaining columns shift accordingly. Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
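&lt;br /&gt;
As an illustration of working with the new output format, here is a small sketch that lists the pairs whose k0 estimate is below a chosen cutoff. The function name related_pairs, the cutoff 0.9 and the file name demo_newres are assumptions made for this example and are not part of NgsRelate. Columns are looked up by header name, so the sketch should also work when -z adds the ida and idb columns, provided the IDs contain no whitespace.&lt;br /&gt;

```shell
# Sketch: print a, b, k0, k1, k2 for pairs whose k0 estimate is below a cutoff.
# related_pairs is a hypothetical helper name; it is not part of NgsRelate.
related_pairs() {
    # $1: NgsRelate output file (new format), $2: cutoff for k0
    awk -v max_k0="$2" '
        # map header names to column numbers, so -z output works too
        NR == 1 { for (i = NF; i > 0; i--) col[$i] = i; next }
        # keep rows where k0 is strictly below the cutoff
        max_k0 > $(col["k0"]) + 0 {
            print $(col["a"]), $(col["b"]), $(col["k0"]), $(col["k1"]), $(col["k2"])
        }
    ' "$1"
}

# Toy input: header plus three rows copied from the 6-sample example output
printf 'a\tb\tnSites\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n' > demo_newres
printf '0\t1\t1128677\t0.673213\t0.326774\t0.000013\t-1710940.769938\t19\t0.813930\n' >> demo_newres
printf '0\t3\t1131917\t1.000000\t0.000000\t0.000000\t-1743992.363193\t-1\t0.816266\n' >> demo_newres
printf '1\t2\t1118945\t0.006249\t0.993750\t0.000001\t-1580989.961356\t13\t0.806912\n' >> demo_newres

related_pairs demo_newres 0.9
```

With the toy input this prints the pairs (0,1) and (1,2) and skips the pair (0,3), whose k0 estimate is 1. The cutoff is arbitrary and should be chosen to suit the data.&lt;br /&gt;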
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log of the likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals, and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites for which there are genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
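&lt;br /&gt;
Given this layout, a rough consistency check is to compare the number of sites implied by the size of the genotype likelihood file with the number of lines in the frequency file. The sketch below assumes that a 'double' takes 8 bytes and that the uncompressed file contains nothing but the likelihoods. The function name glf_sites and the demo file names are made up for illustration, and the demo data is just zero bytes, not real likelihoods.&lt;br /&gt;

```shell
# Sketch: number of sites implied by a gzipped binary genotype likelihood file.
# glf_sites is a hypothetical helper name; it is not part of NgsRelate.
glf_sites() {
    # $1: gzipped binary genotype likelihood file, $2: number of individuals
    bytes=$(gunzip -c "$1" | wc -c)
    # 3 genotype likelihoods per individual per site, assumed 8 bytes each
    echo $(( bytes / (3 * 8 * $2) ))
}

# Toy data so the sketch runs on its own: 2 individuals and 3 sites,
# i.e. 3 sites * 2 individuals * 3 values * 8 bytes = 144 bytes
head -c 144 /dev/zero | gzip -c > demo.glf.gz
printf '0.1\n0.2\n0.3\n' > demo_freq

echo "sites in glf file: $(glf_sites demo.glf.gz 2)"
echo "lines in freq file: $(grep -c '' demo_freq)"
```

If the two numbers differ for real input files, the genotype likelihood file and the frequency file are probably not in sync, or the wrong number of individuals was given.&lt;br /&gt;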
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
Important recent changes:&lt;br /&gt;
&lt;br /&gt;
#We have made -s 1 default (flips all allele frequencies from freq to 1-freq), since this is needed in almost all analyses. If you do not want the frequencies flipped then simply run the program with -s 0&lt;br /&gt;
#The output format has been changed to a more R friendly format (no &amp;quot;:&amp;quot; and parentheses) &lt;br /&gt;
#The option -z has been added so one can get the sample IDs printed in the output (if one runs the program with -z idfilename)&lt;br /&gt;
#We have fixed -m 1 so the estimates can no longer be negative&lt;br /&gt;
&lt;br /&gt;
See github for the full change log.&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if files do not exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=693</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=693"/>
		<updated>2017-06-29T13:48:20Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Changelog */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
The method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore meant only as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt; newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called newres that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/newres.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the numbers of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will contain these IDs in addition to the sample numbers (one can actually just use that exact file, although the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt;newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;newres&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNP sites with EUR_AF above 0, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
./ngsrelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	ida	idb	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	S1	S42	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat newres&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
0	2	1121594	0.448790	0.548298	0.002912	-1666189.356801	25	0.808822&lt;br /&gt;
0	3	1131917	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
0	4	1135509	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
0	5	1043719	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
1	2	1118945	0.006249	0.993750	0.000001	-1580989.961356	13	0.806912&lt;br /&gt;
1	3	1129152	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
1	4	1132778	1.000000	0.000000	0.000000	-1744055.210286	-1	0.816887&lt;br /&gt;
1	5	1041298	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
2	3	1122253	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
2	4	1125729	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
2	5	1035731	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
3	4	1136091	0.566552	0.433054	0.000393	-1743752.158759	36	0.819276&lt;br /&gt;
3	5	1046456	0.265831	0.482954	0.251214	-1467343.087558	11	0.754637&lt;br /&gt;
4	5	1047977	0.004653	0.995347	0.000000	-1473415.049864	94	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns contain the indices of the two individuals used for the analysis. The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals, and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). When -z is used, the two ID columns (ida and idb) are inserted after the first two columns and the remaining columns shift accordingly. Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log of the likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites covered by the genotype likelihood file. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
Important recent changes:&lt;br /&gt;
&lt;br /&gt;
 - We have made -s 1 default (flips all allele frequencies from freq to 1-freq), since this is needed in almost all analyses. If you do not want the frequencies flipped then simply run the program with -s 0&lt;br /&gt;
 - The output format has been changed to a more R-friendly format (no &amp;quot;:&amp;quot; and no parentheses)&lt;br /&gt;
 - The option -z has been added so one can get the sample IDs printed in the output (if one runs the program with -z idfilename)&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=692</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=692"/>
		<updated>2017-06-29T13:44:02Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt; newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called newres that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/newres.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''':  Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in bam files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run NgsRelate &lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt;newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;newres&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run ngsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic sites, we extract CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then pulled out the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	ida	idb	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	S1	S42	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat newres&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
0	2	1121594	0.448790	0.548298	0.002912	-1666189.356801	25	0.808822&lt;br /&gt;
0	3	1131917	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
0	4	1135509	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
0	5	1043719	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
1	2	1118945	0.006249	0.993750	0.000001	-1580989.961356	13	0.806912&lt;br /&gt;
1	3	1129152	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
1	4	1132778	1.000000	0.000000	0.000000	-1744055.210286	-1	0.816887&lt;br /&gt;
1	5	1041298	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
2	3	1122253	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
2	4	1125729	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
2	5	1035731	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
3	4	1136091	0.566552	0.433054	0.000393	-1743752.158759	36	0.819276&lt;br /&gt;
3	5	1046456	0.265831	0.482954	0.251214	-1467343.087558	11	0.754637&lt;br /&gt;
4	5	1047977	0.004653	0.995347	0.000000	-1473415.049864	94	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In the output produced without -z, the first two columns contain the indices of the two individuals used for the analysis. The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log of the likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
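&lt;br /&gt;
Output from old versions can be brought closer to the new layout by splitting the Pair column into two columns. The sketch below assumes tab-separated old-format output like the example above. The function name old_to_new_pairs is hypothetical, and since the old format has no nSites column the result is not identical to the new format.&lt;br /&gt;

```shell
# Hypothetical helper, not part of NgsRelate: rewrite the old "(a,b)" pair
# column as two tab-separated columns named a and b.
old_to_new_pairs() {
  awk '
    BEGIN { FS = OFS = "\t" }
    # header line: replace the single Pair column name by two names
    NR == 1 { $1 = "a" OFS "b"; print; next }
    # data lines: drop the parentheses and turn the comma into a tab
    { gsub(/[()]/, "", $1); sub(/,/, OFS, $1); print }
  ' "$1"
}
```

It could be used as: old_to_new_pairs res&lt;br /&gt;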
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites covered by the genotype likelihood file. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
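&lt;br /&gt;
A quick consistency check on the two input files can be sketched as follows. This assumes that each value in the decompressed genotype likelihood file is an 8-byte double and that the file has no header, so a file with nSites sites and nInd individuals should hold nSites * nInd * 3 * 8 bytes. The function name expected_glf_bytes is hypothetical.&lt;br /&gt;

```shell
# Hypothetical helper, not part of NgsRelate: expected size in bytes of the
# decompressed genotype likelihood file.
# Assumption: 3 values per individual per site, each an 8-byte double, no header.
expected_glf_bytes() {
  nsites=$1   # number of sites, i.e. number of lines in the frequency file
  nind=$2     # number of individuals, i.e. the value passed to -n
  echo $(( nsites * nind * 3 * 8 ))
}
```

The result can be compared with the output of gunzip -c angsdput.glf.gz | wc -c, using the number of lines in the freq file as the number of sites.&lt;br /&gt;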
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=691</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=691"/>
		<updated>2017-06-29T13:43:34Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt; newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called newres that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/newres.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''':  Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in bam files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run NgsRelate &lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt;newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;newres&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run ngsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic sites, we extract CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then pulled out the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	ida	idb	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	S1	S42	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat newres&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
0	2	1121594	0.448790	0.548298	0.002912	-1666189.356801	25	0.808822&lt;br /&gt;
0	3	1131917	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
0	4	1135509	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
0	5	1043719	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
1	2	1118945	0.006249	0.993750	0.000001	-1580989.961356	13	0.806912&lt;br /&gt;
1	3	1129152	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
1	4	1132778	1.000000	0.000000	0.000000	-1744055.210286	-1	0.816887&lt;br /&gt;
1	5	1041298	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
2	3	1122253	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
2	4	1125729	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
2	5	1035731	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
3	4	1136091	0.566552	0.433054	0.000393	-1743752.158759	36	0.819276&lt;br /&gt;
3	5	1046456	0.265831	0.482954	0.251214	-1467343.087558	11	0.754637&lt;br /&gt;
4	5	1047977	0.004653	0.995347	0.000000	-1473415.049864	94	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns give the indices of the two individuals that were analysed. The third column is the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The seventh column is the log-likelihood of the ML estimate. The eighth column is the number of iterations the maximization algorithm used to find the MLE, and the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). These column numbers refer to output produced without -z; with -z the two ID columns ida and idb are inserted after the first two columns. Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column gives the indices of the two individuals that were analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites for which there are genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if the input files do not exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=690</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=690"/>
		<updated>2017-06-29T13:42:10Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. These can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl and g++ installed, NgsRelate can be downloaded and installed as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt; newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called newres that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/newres.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
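To make the all-pairs behaviour concrete, the following Python sketch lists the index pairs that such a loop would visit for n individuals. It is only an illustration and not NgsRelate's own code: the function name all_pairs is hypothetical, and the ordering is assumed to match the 6-sample example output shown under Output format (0 1, 0 2, ..., 4 5).

```python
from itertools import combinations


def all_pairs(n):
    """Return every unordered pair (a, b) of 0-based indices, with a listed before b."""
    return list(combinations(range(n), 2))


if __name__ == "__main__":
    pairs = all_pairs(6)
    # Number of pairs for 6 individuals, then the first and last pair.
    print(len(pairs))
    print(pairs[0], pairs[-1])
```

For n individuals this gives n(n-1)/2 pairs, which is why a run on many samples without -a and -b can take considerably longer than a single-pair run.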
&lt;br /&gt;
'''NEW''':  Note that if you want, you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the indices of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run NgsRelate&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
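The effect of -s 1 as described above can be mimicked outside the program. The Python sketch below, with the hypothetical helper flip_frequencies, replaces each frequency f by 1 - f. It assumes a plain-text frequency file with one value per line, as described under Input file format, and is not taken from the NgsRelate source.

```python
def flip_frequencies(lines):
    """Map each frequency f (one per line) to 1 - f, skipping blank lines."""
    flipped = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        flipped.append(1.0 - float(text))
    return flipped


if __name__ == "__main__":
    example = ["0.10\n", "0.25\n", "0.50\n"]
    for value in flip_frequencies(example):
        print(f"{value:.6f}")
```

Flipping is needed when the frequencies in the file refer to the other allele than the one the genotype likelihoods are ordered by, so it is worth checking which allele the frequency source reports before deciding whether to use -s 1.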
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files named: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNP sites, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	ida	idb	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	S1	S42	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat newres&lt;br /&gt;
a	b	nSites	k0	        k1   	        k2	        loglh	        nIter	coverage&lt;br /&gt;
0	1	1128677	0.673213	0.326774	0.000013	-1710940.769938	19	0.813930&lt;br /&gt;
0	2	1121594	0.448790	0.548298	0.002912	-1666189.356801	25	0.808822&lt;br /&gt;
0	3	1131917	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
0	4	1135509	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
0	5	1043719	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
1	2	1118945	0.006249	0.993750	0.000001	-1580989.961356	13	0.806912&lt;br /&gt;
1	3	1129152	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
1	4	1132778	1.000000	0.000000	0.000000	-1744055.210286	-1	0.816887&lt;br /&gt;
1	5	1041298	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
2	3	1122253	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
2	4	1125729	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
2	5	1035731	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
3	4	1136091	0.566552	0.433054	0.000393	-1743752.158759	36	0.819276&lt;br /&gt;
3	5	1046456	0.265831	0.482954	0.251214	-1467343.087558	11	0.754637&lt;br /&gt;
4	5	1047977	0.004653	0.995347	0.000000	-1473415.049864	94	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns give the indices of the two individuals that were analysed. The third column is the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The seventh column is the log-likelihood of the ML estimate. The eighth column is the number of iterations the maximization algorithm used to find the MLE, and the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). These column numbers refer to output produced without -z; with -z the two ID columns ida and idb are inserted after the first two columns. Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
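The column layout described above can be read programmatically. The Python sketch below is one possible way to do it, assuming the output is tab-separated with a single header line as in the examples. The helper names read_results and boundary_pairs are hypothetical. Header names are stripped of surrounding spaces because the example header pads some of them.

```python
import csv
import io


def read_results(handle):
    """Parse NgsRelate-style tab-separated output into a list of dicts keyed by column name."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        rows.append(dict(zip(header, (f.strip() for f in fields))))
    return rows


def boundary_pairs(rows):
    """Return (a, b) for pairs where nIter is -1, i.e. a boundary estimate was reported."""
    return [(int(r["a"]), int(r["b"])) for r in rows if int(r["nIter"]) == -1]


if __name__ == "__main__":
    text = (
        "a\tb\tnSites\tk0\t        k1   \t        k2\t        loglh\t        nIter\tcoverage\n"
        "0\t1\t1128677\t0.673213\t0.326774\t0.000013\t-1710940.769938\t19\t0.813930\n"
        "0\t3\t1131917\t1.000000\t0.000000\t0.000000\t-1743992.363193\t-1\t0.816266\n"
    )
    rows = read_results(io.StringIO(text))
    for r in rows:
        # k0, k1 and k2 should sum to 1 up to rounding in the printed values.
        total = float(r["k0"]) + float(r["k1"]) + float(r["k2"])
        print(r["a"], r["b"], round(total, 6))
    print(boundary_pairs(rows))
```

Because the rows are keyed by column name rather than position, the same reader also works for output produced with -z, where the ida and idb columns are present.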
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column gives the indices of the two individuals that were analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites for which there are genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
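As an illustration of the genotype likelihood layout described above, the Python sketch below writes a tiny synthetic file and reads it back. It assumes the file is a gzip-compressed stream of native-endian 8-byte doubles with 3 values per individual per site and no header. The helper names write_glf and read_glf are hypothetical, and real files should be checked against the ANGSD documentation before relying on this layout.

```python
import gzip
import os
import tempfile
from array import array


def write_glf(path, sites):
    """Write sites (each a flat list of 3 * n_ind doubles) as gzip-compressed binary."""
    with gzip.open(path, "wb") as out:
        for values in sites:
            out.write(array("d", values).tobytes())


def read_glf(path, n_ind):
    """Return a list of sites; each site is a list of n_ind (gl0, gl1, gl2) tuples."""
    per_site = 3 * n_ind
    with gzip.open(path, "rb") as handle:
        data = array("d")
        data.frombytes(handle.read())
    if len(data) % per_site != 0:
        raise ValueError("file size does not match the number of individuals")
    sites = []
    for start in range(0, len(data), per_site):
        chunk = data[start:start + per_site]
        sites.append([tuple(chunk[i:i + 3]) for i in range(0, per_site, 3)])
    return sites


if __name__ == "__main__":
    tmp = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
    # One site, two individuals: three log likelihoods per individual.
    write_glf(tmp, [[0.0, -1.5, -3.0, -2.0, 0.0, -0.5]])
    print(read_glf(tmp, 2))
```

The sketch reads the whole file into memory, which is fine for a toy file but not for genome-scale data. The size check also shows why the number of individuals given with -n has to match the file: a wrong value shifts every record.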
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if the input files do not exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=689</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=689"/>
		<updated>2017-06-29T13:31:52Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. These can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl and g++ installed, NgsRelate can be downloaded and installed as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt; newres&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called newres that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/newres.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''':  Note that if you want, you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the indices of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run NgsRelate&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
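&lt;br /&gt;
As a rough illustration of what flipping means, the following shell sketch applies the same transformation by hand. It assumes the freq file holds one frequency per line; the function name flip_freqs is made up for this example and is not part of NgsRelate. It is only meant for inspecting the flipped values and should not be combined with -s 1, since that would flip the frequencies twice.&lt;br /&gt;

```shell
# flip_freqs: read one allele frequency per line on stdin and
# print 1 minus that frequency (hypothetical helper, not part of NgsRelate).
# Values are printed with awk's default output precision.
flip_freqs() {
  awk '{ print 1 - $1 }'
}
```

For example, cat freq | flip_freqs | head prints the first ten flipped frequencies.&lt;br /&gt;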
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNPs, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the vcf&lt;br /&gt;
#We then keep only the unique sites with EUR_AF above 0.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns contain the indices of the two individuals used for the analysis (when -z is used, the two extra columns ida and idb follow them and the remaining columns shift accordingly). The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) is above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
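&lt;br /&gt;
A common summary of these estimates is the coefficient of relatedness r = k1/2 + k2, i.e. the expected fraction of alleles shared identical by descent. The shell sketch below computes it for each pair from the NgsRelate output. It looks the columns up by the header names shown above, so it should work with or without -z; the function name pairwise_r is made up for this example and is not part of NgsRelate.&lt;br /&gt;

```shell
# pairwise_r: read NgsRelate output (including its header line) on stdin and
# print a, b and r = k1/2 + k2 for every pair (hypothetical helper)
pairwise_r() {
  awk '
    # header line: remember the position of every column name
    NR == 1 { for (i = NF; i; i--) col[$i] = i; next }
    # data lines: look up a, b, k1 and k2 by name
    { printf "%s\t%s\t%.6f\n", $(col["a"]), $(col["b"]), $(col["k1"]) / 2 + $(col["k2"]) }
  '
}
```

It can be used as cat res | pairwise_r.&lt;br /&gt;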
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log of the likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) is above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites that there are genotype likelihoods for. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
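&lt;br /&gt;
Given this layout, the number of sites in a genotype likelihood file can be checked against the number of lines in the frequency file. The shell sketch below assumes that each value is stored as an 8-byte double, so that one site takes 3 * 8 bytes per individual; the function name glf_site_count is made up for this example and is not part of NgsRelate.&lt;br /&gt;

```shell
# glf_site_count N_IND: read an uncompressed genotype likelihood file on stdin
# and print the number of sites, assuming 3 doubles of 8 bytes each
# per individual per site (hypothetical helper)
glf_site_count() {
  bytes=$(wc -c)
  echo $(( bytes / ($1 * 3 * 8) ))
}
```

With the example data, gzip -dc angsdput.glf.gz | glf_site_count 6 should print the same number as cat freq | wc -l.&lt;br /&gt;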
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=688</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=688"/>
		<updated>2017-06-29T13:27:41Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq  &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
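&lt;br /&gt;
When all pairs are analysed, the output contains one line per unordered pair, i.e. n(n-1)/2 lines for n individuals. A small shell sketch of this arithmetic (the function name pair_count is made up for this example and is not part of NgsRelate):&lt;br /&gt;

```shell
# pair_count N: number of unordered pairs among N individuals
pair_count() {
  echo $(( $1 * ($1 - 1) / 2 ))
}
```

pair_count 6 gives 15, which matches the 15 data lines of the 6 sample output shown under Output format, and pair_count 100 gives 4950.&lt;br /&gt;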
&lt;br /&gt;
'''NEW''':  Note that if you want you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNPs, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the vcf&lt;br /&gt;
#We then keep only the unique sites with EUR_AF above 0.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns contain the indices of the two individuals used for the analysis (when -z is used, the two extra columns ida and idb follow them and the remaining columns shift accordingly). The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) is above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log of the likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) is above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites that there are genotype likelihoods for. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=687</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=687"/>
		<updated>2017-06-29T13:27:04Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line), in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, although the IDs then tend to be a bit long). This is done with the optional flag -z followed by the filename.&lt;br /&gt;
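&lt;br /&gt;
Since the paths in 'filelist' make rather long IDs, a shorter ID file can be derived from it. The following is only a sketch: the helper name make_ids is hypothetical, and it assumes that every path ends in .bam or .cram and that the file name without its extension is a suitable sample ID.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate): turn a list of BAM/CRAM paths
# into short sample IDs by dropping the directory and the file extension.
# Assumes one path per line, each ending in .bam or .cram.
make_ids() {
    sed -e 's|.*/||' -e 's|\.bam$||' -e 's|\.cram$||' "$1"
}
```

Saving the output of "make_ids filelist" to a file and passing that file with -z would then give one short ID per sample, in the same order as in 'filelist'.&lt;br /&gt;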
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in bam files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them with the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run NgsRelate&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
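&lt;br /&gt;
To make the effect of -s 1 concrete, the flip can be written out by hand. This is an illustration only, under the assumption that the freq file holds one frequency per line; the helper name flip_freq and the six-decimal output format are hypothetical choices, and when -s 1 is used there is no need to run anything like this yourself.&lt;br /&gt;

```shell
# Illustration only: what flipping an allele frequency file means.
# Each input line holds one frequency f; each output line holds 1 - f.
flip_freq() {
    awk '{ printf "%.6f\n", 1 - $1 }' "$1"
}
```

Calling "flip_freq freq" prints the flipped frequencies to standard output and leaves the freq file itself unchanged.&lt;br /&gt;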
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq (the output directory must exist before the loop runs)&lt;br /&gt;
#We only use diallelic sites, and we extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
mkdir -p EUR_AF&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output of analysis of two samples run without the optional -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And the same analysis run with the optional flag -z followed by name of file with IDs (where the first two IDs are S1 and S42):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns contain the indices of the two individuals used for the analysis. The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log-likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
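&lt;br /&gt;
The nine-column layout can be post-processed with standard tools. The sketch below assumes the output was produced without -z (with -z, the two extra ID columns shift every later column by two). The helper name summarise_res is hypothetical, and the derived value r = k1/2 + k2 is the commonly used coefficient of relatedness, which is not a column that NgsRelate prints in this format.&lt;br /&gt;

```shell
# Hypothetical helper: summarise a nine-column NgsRelate result file.
# Columns assumed: a b nSites k0 k1 k2 loglh nIter coverage (no -z).
# Prints: a, b, r = k1/2 + k2, and whether the estimate came from the
# boundary test (nIter == -1) or from the EM algorithm.
summarise_res() {
    awk 'NR == 1 { next }   # skip the header line
         { printf "%s\t%s\t%.6f\t%s\n", $1, $2, $5 / 2 + $6, ($8 == -1 ? "boundary" : "EM") }' "$1"
}
```

Running "summarise_res res" on the six-sample example would print one line per pair, which makes it easy to spot pairs whose estimate was taken from the boundary of the parameter space.&lt;br /&gt;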
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
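&lt;br /&gt;
Because the two input files must describe the same sites, a quick consistency check can save a failed run. The following is a minimal sketch based only on the format described above: it assumes 3 doubles of 8 bytes each per individual per site (24 bytes per individual) and one frequency per line. The helper names count_glf_sites and count_freq_sites are hypothetical.&lt;br /&gt;

```shell
# Hypothetical helper: number of sites implied by a gz compressed binary
# genotype likelihood file with 3 doubles (8 bytes each) per individual per site.
# Usage: count_glf_sites FILE NUMBER_OF_INDIVIDUALS
count_glf_sites() {
    glf="$1"
    nind="$2"
    bytes=$(gzip -dc "$glf" | wc -c)
    echo $(( bytes / (24 * nind) ))
}

# Hypothetical helper: number of sites (lines) in a plain-text frequency file.
count_freq_sites() {
    awk 'END { print NR }' "$1"
}
```

For the example data, "count_glf_sites angsdput.glf.gz 6" and "count_freq_sites freq" would be expected to print the same number if the two files are in sync.&lt;br /&gt;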
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, simply type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=686</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=686"/>
		<updated>2017-06-29T13:25:41Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore meant only as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check that your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line), in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, although the IDs then tend to be a bit long). This is done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in bam files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them with the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run NgsRelate&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq (the output directory must exist before the loop runs)&lt;br /&gt;
#We only use diallelic sites, and we extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
mkdir -p EUR_AF&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output with two samples run without -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And with -z followed by name of file with IDs:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns contain the indices of the two individuals used for the analysis. The third column contains the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log-likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=685</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=685"/>
		<updated>2017-06-29T13:17:58Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
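&lt;br /&gt;
To get a feeling for how many pairs are analysed when -a and -b are left out, the following is a minimal Python sketch (not part of NgsRelate). The function name all_pairs is hypothetical, and the ordering of the pairs is an assumption chosen to match the example output with 6 samples shown under Output format.&lt;br /&gt;

```python
from itertools import combinations


def all_pairs(n_individuals):
    """Return every unordered pair (a, b) of 0-based indices, a listed before b."""
    return list(combinations(range(n_individuals), 2))


pairs = all_pairs(6)
print(len(pairs))   # number of pairs among 6 individuals
print(pairs[:3])    # the first few pairs
```

With n individuals there are n(n-1)/2 pairs, so the running time of a full run grows roughly quadratically with the value given to -n.&lt;br /&gt;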
&lt;br /&gt;
'''NEW''':  Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
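&lt;br /&gt;
As described above, -s 1 replaces each frequency f in the freq file with 1 - f. A minimal Python sketch of that transformation is shown here, purely as an illustration; flip_frequencies is a hypothetical helper and not a function in NgsRelate, and the example values are made up.&lt;br /&gt;

```python
def flip_frequencies(freqs):
    """Return 1 - f for every allele frequency f, as described for -s 1."""
    return [1.0 - f for f in freqs]


example = [0.25, 0.5, 1.0]          # made-up frequencies, one per site
print(flip_frequencies(example))    # frequencies of the other allele
```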
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic sites, and we extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then pull out the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output with two samples run without -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And with -z followed by the name of a file with IDs:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns show which two individuals were used for the analysis. The third column is the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
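&lt;br /&gt;
The columns described above can be read with a few lines of Python. The following is only a sketch and not part of NgsRelate: read_results is a hypothetical helper, the two data rows are copied from the example output above, and the summary value r = k1/2 + k2 is the usual coefficient of relatedness for non-inbred individuals, which is an assumption added here and not something NgsRelate prints. Columns are looked up by header name, so the extra ida and idb columns produced with -z should simply be kept as text.&lt;br /&gt;

```python
import io

# Two rows copied from the example output with 6 samples.
EXAMPLE = """\
a       b       nSites k0              k1              k2              loglh           nIter   coverage
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362
"""


def read_results(handle):
    """Parse whitespace-separated NgsRelate output into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        row = dict(zip(header, fields))
        for key in ("a", "b", "nSites", "nIter"):
            row[key] = int(row[key])
        for key in ("k0", "k1", "k2", "loglh", "coverage"):
            row[key] = float(row[key])
        # nIter of -1 means a boundary value had the highest likelihood.
        row["on_boundary"] = row["nIter"] == -1
        # Assumption: coefficient of relatedness for non-inbred pairs.
        row["r"] = 0.5 * row["k1"] + row["k2"]
        rows.append(row)
    return rows


rows = read_results(io.StringIO(EXAMPLE))
for row in rows:
    print(row["a"], row["b"], round(row["r"], 6), row["on_boundary"])
```

To use it on a real result file, one could pass open("res") instead of the io.StringIO object.&lt;br /&gt;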
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log of the likelihood of the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values per individual (one log transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
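&lt;br /&gt;
To make the layout of the genotype likelihood file concrete, here is a minimal Python sketch that writes a toy file and reads it back. It is not part of NgsRelate or ANGSD. The helpers read_glf and write_toy_glf are hypothetical, the toy values are made up, and the sketch assumes that the file has no header, that the doubles are stored in the native byte order of the machine, and that the 3 values of each individual are stored next to each other within a site.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def write_toy_glf(path, sites):
    """Write sites (lists of 3-tuples of log likelihoods) as gzipped doubles."""
    with gzip.open(path, "wb") as handle:
        for site in sites:
            flat = [value for triple in site for value in triple]
            handle.write(struct.pack("=%dd" % len(flat), *flat))


def read_glf(path, n_individuals):
    """Return a list of sites; each site is a list of (gl0, gl1, gl2) tuples."""
    record = struct.Struct("=%dd" % (3 * n_individuals))  # 8-byte doubles
    sites = []
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(record.size)
            if not chunk:
                break  # clean end of file
            if len(chunk) != record.size:
                raise ValueError("file ends in the middle of a site")
            values = record.unpack(chunk)
            sites.append([tuple(values[3 * i:3 * i + 3])
                          for i in range(n_individuals)])
    return sites


# Toy data: 2 sites, 2 individuals, 3 log likelihoods per individual.
toy = [
    [(0.0, -1.5, -4.0), (-2.0, 0.0, -2.0)],
    [(-3.0, -0.5, 0.0), (0.0, -0.25, -6.0)],
]
toy_path = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
write_toy_glf(toy_path, toy)
round_trip = read_glf(toy_path, n_individuals=2)
print(len(round_trip), round_trip == toy)
```

Under these assumptions the number of sites read this way should equal the number of lines in the frequency file, which can serve as a quick sanity check before running NgsRelate.&lt;br /&gt;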
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=684</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=684"/>
		<updated>2017-06-29T13:17:50Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''':  Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic sites, and we extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then pull out the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output with two samples run without -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And with -z followed by the name of a file with IDs:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns show which two individuals were used for the analysis. The third column is the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log of the likelihood of the ML estimate. The eighth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM-algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For OLD versions of the program (from before June 28 2017): &lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column identifies the pair of individuals used in the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the same sites.&lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
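&lt;br /&gt;
To make the binary layout more concrete, here is a minimal Python sketch that reads such a genotype likelihood file site by site. It assumes that the doubles are stored back to back in the machine's native byte order with no header; the function name read_glf and the toy file are hypothetical and only serve as an illustration.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def read_glf(path, n_ind):
    """Yield, for each site, a list with one (gl0, gl1, gl2) tuple per individual."""
    site_fmt = "=" + "d" * (3 * n_ind)   # assumption: native byte order, no padding
    site_size = struct.calcsize(site_fmt)
    with gzip.open(path, "rb") as fh:
        while True:
            buf = fh.read(site_size)
            if len(buf) != site_size:    # end of file (a partial trailing record is ignored)
                break
            vals = struct.unpack(site_fmt, buf)
            yield [vals[3 * i: 3 * i + 3] for i in range(n_ind)]


# Toy file with 2 sites and 2 individuals (made-up values, not real data).
values = [-0.1, -1.2, -3.4, -0.5, -0.9, -2.0,
          -2.2, -0.3, -1.1, 0.0, -4.0, -8.0]
path = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
with gzip.open(path, "wb") as fh:
    fh.write(struct.pack("=12d", *values))

sites = list(read_glf(path, n_ind=2))
print(len(sites), sites[0][1])
```

Reading one site at a time keeps memory use small even for files with millions of sites.&lt;br /&gt;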
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Print a clearer error message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=683</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=683"/>
		<updated>2017-06-29T13:16:23Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore meant only as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check that your installation works, you can download the final input data for NgsRelate used in the very last command of run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot;, and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
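&lt;br /&gt;
To get a feeling for how many pairs such a loop covers, the following Python sketch enumerates all unordered pairs of 0-based indices. It is only an illustration: the helper all_pairs is hypothetical, and the assumption that NgsRelate visits pairs in this particular order is based solely on the six-sample example output shown under Output format.&lt;br /&gt;

```python
from itertools import combinations


def all_pairs(n_ind):
    """All unordered pairs (a, b) of 0-based indices, with a listed before b."""
    return list(combinations(range(n_ind), 2))


pairs = all_pairs(6)
print(len(pairs), pairs[0], pairs[-1])

# The number of pairs grows quadratically: n * (n - 1) / 2.
print(len(all_pairs(100)))
```

For 6 individuals this gives 15 pairs, the same number of rows as in the six-sample example output, and for the 100 individuals of this example it gives 4950 pairs.&lt;br /&gt;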
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line), in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will contain these IDs and not just the numbers of the samples (one can actually just use that exact file; however, the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
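&lt;br /&gt;
The effect of flipping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (each frequency f is replaced by 1 - f); the function name is hypothetical, the values are made up, and the code is not taken from NgsRelate.&lt;br /&gt;

```python
def flip_frequencies(freqs):
    """Replace every allele frequency f by 1 - f."""
    return [1.0 - f for f in freqs]


# Made-up frequencies, one per site, as they would appear in the freq file.
freq_lines = ["0.25", "0.5", "0.875"]
flipped = flip_frequencies([float(x) for x in freq_lines])
print(flipped)
```

Flipping is needed when the frequencies in the freq file refer to the other allele than the one the genotype likelihoods are counted for.&lt;br /&gt;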
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNPs and extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then keep only the unique sites with EUR_AF above 0.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output with two samples, run without -z:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b    nSites  k0        k1        k2        loglh           nIter   coverage&lt;br /&gt;
0       1    1556741 0.973967  0.025335  0.000698  -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
And with -z followed by the name of a file with IDs:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a       b       ida    idb    nSites  k0       k1        k2       loglh           nIter   coverage&lt;br /&gt;
0       1       S1     S42    1556741 0.973967 0.025335  0.000698 -2538571.361427 11      0.888187&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
a       b       nSites k0              k1              k2              loglh           nIter   coverage&lt;br /&gt;
0       1      1556741 0.973967        0.025335        0.000698        -2538571.361427 11      0.888187&lt;br /&gt;
0       2      1546532 1.000000        0.000000        0.000000        -2620047.442923 -1      0.882362&lt;br /&gt;
0       3      1537078 0.992948        0.006718        0.000334        -2565306.739074  9      0.876968&lt;br /&gt;
0       4      1541610 1.000000        0.000000        0.000000        -2579100.207020 -1      0.879554&lt;br /&gt;
0       5      1473792 1.000000        0.000000        0.000000        -2422828.498453 -1      0.840861&lt;br /&gt;
1       2      1718401 0.999671        0.000008        0.000321        -2998704.418367 10      0.980421&lt;br /&gt;
1       3      1692357 0.997923        0.000351        0.001727        -2900358.133067 13      0.965561&lt;br /&gt;
1       4      1711121 1.000000        0.000000        0.000000        -2950744.804601 -1      0.976267&lt;br /&gt;
1       5      1577363 0.999772        0.000006        0.000222        -2656326.110901  8      0.899953&lt;br /&gt;
2       3      1682350 1.000000        0.000000        0.000000        -2985791.102983 -1      0.959852&lt;br /&gt;
2       4      1705088 1.000000        0.000000        0.000000        -3047210.153877 -1      0.972825&lt;br /&gt;
2       5      1566829 1.000000        0.000000        0.000000        -2735339.204938 -1      0.893942&lt;br /&gt;
3       4      1676233 1.000000        0.000000        0.000000        -2941428.137608 -1      0.956362&lt;br /&gt;
3       5      1553114 1.000000        0.000000        0.000000        -2672151.226617 -1      0.886117&lt;br /&gt;
4       5      1561427 1.000000        0.000000        0.000000        -2693953.940803 -1      0.890860&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first two columns identify the two individuals used in the analysis. The third column is the number of sites used in the analysis. The following three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The seventh column is the log-likelihood of the ML estimate. The eighth column is the number of iterations the maximization algorithm used to find the MLE, and finally the ninth column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
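&lt;br /&gt;
The three relatedness coefficients are often condensed into a single number. The Python sketch below does this under the standard textbook definitions, which are an assumption here and not something the output above reports: the coefficient of relatedness is r = k1/2 + k2 and the kinship coefficient is r/2. The function names are hypothetical, and the inputs are idealised values for well-known relationships, not NgsRelate results.&lt;br /&gt;

```python
def relatedness(k1, k2):
    """Coefficient of relatedness r = k1/2 + k2 (standard definition, assumed here)."""
    return k1 / 2.0 + k2


def kinship(k1, k2):
    """Kinship coefficient, i.e. half the coefficient of relatedness."""
    return relatedness(k1, k2) / 2.0


# Idealised (k0, k1, k2) values for three relationships.
expected = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
}
for name, (k0, k1, k2) in expected.items():
    print(name, relatedness(k1, k2), kinship(k1, k2))
```

Estimates from real data will deviate from these idealised values, so a pair should be judged on all three coefficients and not on r alone (parent-offspring and full siblings share the same r).&lt;br /&gt;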
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
OLD: Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column identifies the pair of individuals used in the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the same sites.&lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
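&lt;br /&gt;
Since the two input files must describe the same sites, a quick consistency check before running the program can save time. The Python sketch below compares the number of sites implied by the size of the genotype likelihood file with the number of lines in the frequency file. It assumes 8-byte doubles stored back to back with no header; the function names and the toy files are hypothetical.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def count_glf_sites(glf_path, n_ind):
    """Number of sites implied by the decompressed size of the file."""
    record_size = 8 * 3 * n_ind            # 3 doubles of 8 bytes per individual
    total = 0
    with gzip.open(glf_path, "rb") as fh:
        while True:
            chunk = fh.read(1024 * 1024)   # read in 1 MiB chunks
            if not chunk:
                break
            total += len(chunk)
    if total % record_size != 0:
        raise ValueError("file size is not a whole number of sites; check n_ind")
    return total // record_size


def count_freq_lines(freq_path):
    """Number of non-empty lines in the frequency file."""
    with open(freq_path) as fh:
        return sum(1 for line in fh if line.strip())


# Toy input: 3 sites, 2 individuals (made-up values, not real data).
tmp = tempfile.mkdtemp()
glf_path = os.path.join(tmp, "toy.glf.gz")
freq_path = os.path.join(tmp, "freq")
with gzip.open(glf_path, "wb") as fh:
    fh.write(struct.pack("=18d", *([-1.0] * 18)))
with open(freq_path, "w") as fh:
    fh.write("0.12\n0.30\n0.45\n")

n_sites = count_glf_sites(glf_path, n_ind=2)
n_freqs = count_freq_lines(freq_path)
print(n_sites, n_freqs, n_sites == n_freqs)
```

A mismatch between the two counts usually means that the frequency file was made from a different set of sites, or that the wrong number of individuals was given.&lt;br /&gt;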
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Print a clearer error message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=682</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=682"/>
		<updated>2017-06-29T12:52:24Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore meant only as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check that your installation works, you can download the final input data for NgsRelate used in the very last command of run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot;, and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line), in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will contain these IDs and not just the numbers of the samples (one can actually just use that exact file; however, the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNPs and extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then keep only the unique sites with EUR_AF above 0.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites		k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
0	1	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column identifies the pair of individuals that was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm. ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so the boundary values are tested explicitly and reported if they have the highest likelihood.&lt;br /&gt;
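&lt;br /&gt;
As a rough illustration of how the seven-column 'Pair' output could be post-processed, the following minimal Python sketch parses three rows copied from the 6-sample example and marks the pairs where nIter is -1. The function name parse_results and the dictionary keys are hypothetical and not part of NgsRelate.&lt;br /&gt;

```python
import csv
import io

# Three rows copied from the 6-sample example output (tab-separated).
EXAMPLE = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,1)\t0.675337\t0.322079\t0.002584\t-1710946.832375\t10\t0.813930\n"
    "(0,3)\t1.000000\t0.000000\t0.000000\t-1743992.363193\t-1\t0.816266\n"
    "(1,2)\t0.007111\t0.991020\t0.001868\t-1580995.130867\t10\t0.806912\n"
)


def parse_results(text):
    """Parse 'Pair'-style output into a list of dicts with typed fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # The Pair column looks like "(0,1)": strip brackets, split on comma.
        a, b = rec["Pair"].strip("()").split(",")
        rows.append({
            "a": int(a),
            "b": int(b),
            "k": (float(rec["k0"]), float(rec["k1"]), float(rec["k2"])),
            "loglh": float(rec["loglh"]),
            # nIter == -1 means a boundary value beat the EM estimate.
            "boundary": int(rec["nIter"]) == -1,
            "coverage": float(rec["coverage"]),
        })
    return rows


results = parse_results(EXAMPLE)
for r in results:
    note = "boundary solution" if r["boundary"] else "EM estimate"
    print(r["a"], r["b"], r["k"], note)
```

To use it on real output, the text of the file 'res' would be passed to parse_results instead of EXAMPLE.&lt;br /&gt;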
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
OLD: Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column identifies the pair of individuals that was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm. ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so the boundary values are tested explicitly and reported if they have the highest likelihood.&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites covered by the genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a record for each site with 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
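&lt;br /&gt;
The layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the values are taken to be 8-byte doubles in native byte order, stored site after site with the individuals in order, and the function name read_glf and the synthetic demo data are made up for the example. It writes a tiny gz-compressed file in that layout and reads it back.&lt;br /&gt;

```python
import gzip
import math
import os
import struct
import tempfile


def read_glf(path, n_ind):
    """Return a list of sites; each site is a list of n_ind (l0, l1, l2) tuples."""
    per_site = 3 * n_ind
    # Assumption: native byte order, 8-byte doubles, no padding.
    rec = struct.Struct("=%dd" % per_site)
    sites = []
    with gzip.open(path, "rb") as fh:
        while True:
            buf = fh.read(rec.size)
            if not buf:
                break
            if len(buf) != rec.size:
                raise ValueError("truncated record: wrong n_ind or damaged file")
            vals = rec.unpack(buf)
            sites.append([tuple(vals[3 * i:3 * i + 3]) for i in range(n_ind)])
    return sites


# Synthetic demo: 2 individuals, 2 sites, made-up likelihoods.
demo = [
    [(0.9, 0.09, 0.01), (0.2, 0.6, 0.2)],
    [(0.01, 0.09, 0.9), (1.0 / 3, 1.0 / 3, 1.0 / 3)],
]
path = os.path.join(tempfile.mkdtemp(), "demo.glf.gz")
with gzip.open(path, "wb") as fh:
    for site in demo:
        for gl in site:
            # Store log-transformed likelihoods, three doubles per individual.
            fh.write(struct.pack("=3d", *[math.log(x) for x in gl]))

sites = read_glf(path, n_ind=2)
print(len(sites), "sites read")
```

A simple consistency check along these lines is that the number of sites read equals the number of lines in the frequency file.&lt;br /&gt;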
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, simply type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See GitHub for the change log.&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Print a better error message if the input files do not exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=681</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=681"/>
		<updated>2017-06-29T12:52:05Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. These can be obtained e.g. using the program ANGSD, as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
The method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
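&lt;br /&gt;
To give a feel for what looping through all pairs means in practice, the following small Python sketch enumerates the pairs for a given number of individuals. It is not NgsRelate's own code, and the helper name all_pairs is hypothetical. With 6 individuals it yields 15 pairs, in the same order as the rows of the 6-sample example output.&lt;br /&gt;

```python
from itertools import combinations


def all_pairs(n):
    """Return every unordered pair (a, b) of 0-based indices, with b above a."""
    return list(combinations(range(n), 2))


pairs = all_pairs(6)
print(len(pairs), pairs[0], pairs[-1])
```

For the 100 individuals in this example the same enumeration gives 4950 pairs, which is worth keeping in mind when estimating run time.&lt;br /&gt;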
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line), in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsRelate &lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
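&lt;br /&gt;
To make the effect of -s 1 concrete, the following minimal Python sketch applies the same transformation (1 minus the frequency) to a few made-up frequencies. It only illustrates the arithmetic described above; the function name flip_frequencies and the values are assumptions and not part of NgsRelate.&lt;br /&gt;

```python
def flip_frequencies(freqs):
    """Return 1 - f for every frequency, as described for the -s 1 option."""
    return [1.0 - f for f in freqs]


# Made-up frequencies of one allele at three sites.
freqs = [0.05, 0.25, 0.9]
flipped = flip_frequencies(freqs)
for before, after in zip(freqs, flipped):
    print(before, after)
```

Flipping is its own inverse, so applying it twice returns the original frequencies (up to floating point rounding).&lt;br /&gt;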
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population allele frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNP sites with EUR_AF above 0, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the VCF&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
NEW: Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
a	b	nSites	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
0	1	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column identifies the pair of individuals that was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm. ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so the boundary values are tested explicitly and reported if they have the highest likelihood.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
OLD: Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column identifies the pair of individuals that was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm. ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so the boundary values are tested explicitly and reported if they have the highest likelihood.&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites covered by the genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a record for each site with 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, simply type:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See GitHub for the change log.&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Print a better error message if the input files do not exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=680</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=680"/>
		<updated>2017-06-29T12:50:13Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. These can be obtained e.g. using the program ANGSD, as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
The method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs among the 6 individuals. A copy of this file can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsRelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line), in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsRelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsRelate &lt;br /&gt;
./ngsRelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population allele frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNP sites with EUR_AF above 0, and we extract the CHROM,POS,REF,ALT,EUR_AF fields from the VCF&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains information about which pair of individuals was used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood at the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
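The column layout described above can be read with a few lines of Python. This is only a minimal sketch and not part of NgsRelate: the function name parse_ngsrelate is hypothetical, the input is assumed to be tab-separated with the header shown above, and the summary value r = k1/2 + k2 is the standard coefficient of relatedness computed from the k-coefficients, which NgsRelate itself does not print in this output. Pairs with nIter equal to -1 are flagged as boundary estimates.&lt;br /&gt;

```python
import io


def parse_ngsrelate(handle):
    """Parse tab-separated NgsRelate pair output into a list of dicts."""
    rows = []
    header = handle.readline().rstrip("\n").split("\t")
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        row = dict(zip(header, fields))
        k1 = float(row["k1"])
        k2 = float(row["k2"])
        rows.append({
            "pair": row["Pair"],
            "k0": float(row["k0"]),
            "k1": k1,
            "k2": k2,
            # coefficient of relatedness derived from k1 and k2 (assumption: not an NgsRelate column)
            "r": k1 / 2 + k2,
            # nIter == -1 means a boundary value had the highest likelihood
            "boundary": int(row["nIter"]) == -1,
        })
    return rows


# Two rows copied from the 6-sample example above
example = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,3)\t1.000000\t0.000000\t0.000000\t-1743992.363193\t-1\t0.816266\n"
    "(4,5)\t0.004655\t0.995345\t-0.000000\t-1473415.049411\t8\t0.755734\n"
)
for row in parse_ngsrelate(io.StringIO(example)):
    print(row["pair"], round(row["r"], 4), row["boundary"])
```

To use it on a real run, open the file res and pass the file handle instead of the in-memory example.&lt;br /&gt;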
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=679</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=679"/>
		<updated>2017-06-29T12:49:20Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the number of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
'''NEW''': Note that you can also input a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods. If you do, the output will also contain these IDs and not just the numbers of the samples (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag '''-z followed by the filename'''.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic sites, we extract CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then pulled out the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains information about which pair of individuals was used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood at the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
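To inspect a genotype likelihood file by hand, the layout described above can be read with the Python standard library. This is a hedged sketch, not NgsRelate code: the function name read_glf is hypothetical, and it assumes the doubles are stored in the machine's native byte order, site by site, with the three values of individual 0 first, then individual 1, and so on. The toy file written at the end only serves to make the example self-contained.&lt;br /&gt;

```python
import array
import gzip
import os
import tempfile


def read_glf(path, n_ind):
    """Return a list of sites; each site holds one (gl0, gl1, gl2) tuple per individual."""
    with gzip.open(path, "rb") as fh:
        values = array.array("d")  # native-endian doubles (assumption)
        values.frombytes(fh.read())
    per_site = 3 * n_ind
    if len(values) % per_site != 0:
        raise ValueError("file size does not match 3 * n_ind doubles per site")
    sites = []
    for start in range(0, len(values), per_site):
        site = [tuple(values[start + 3 * i: start + 3 * i + 3]) for i in range(n_ind)]
        sites.append(site)
    return sites


# Write a toy file with 2 sites and 2 individuals, then read it back
toy = array.array("d", [0.0, -1.5, -3.0, -2.0, 0.0, -0.5,
                        -0.1, 0.0, -4.0, 0.0, -2.5, -6.0])
path = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
with gzip.open(path, "wb") as fh:
    fh.write(toy.tobytes())

sites = read_glf(path, n_ind=2)
print(len(sites), sites[0][1])
```

The size check is a quick sanity test: if the number of values is not a multiple of 3 times the number of individuals, the value given for n_ind (the -n option) most likely does not match the file.&lt;br /&gt;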
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=678</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=678"/>
		<updated>2017-06-29T12:47:34Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsRelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the number of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
Note that if you also specify a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods, then the output will also contain these IDs (one can actually just use that exact file, however the IDs then tend to be a bit long). This can be done with the optional flag -z followed by the filename.&lt;br /&gt;
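When -a and -b are left out, the number of pairs grows quadratically with the number of individuals: n individuals give n(n-1)/2 pairs. The short Python sketch below (the helper name all_pairs is hypothetical and not part of NgsRelate) enumerates the 0-based pairs in the order that matches the 6-sample example output on this page, which can help when planning run time or when looking up a specific pair in the result file.&lt;br /&gt;

```python
from itertools import combinations


def all_pairs(n_ind):
    """List every unordered pair of 0-based individual indices."""
    return list(combinations(range(n_ind), 2))


pairs = all_pairs(6)
print(len(pairs))   # number of pairs for 6 individuals
print(pairs[:3])    # first few pairs
print(len(all_pairs(100)))  # number of pairs for the 100 individuals in this example
```

For the 100 individuals in this example a full run therefore covers several thousand pairs, so using -a and -b for a single pair is much faster when only one estimate is needed.&lt;br /&gt;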
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We dump output in EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic sites, we extract CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then pulled out the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
ngsRelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains information about which pair of individuals was used for the analysis. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood at the ML estimate. The sixth column is the number of iterations of the maximization algorithm that was used to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values achieved using the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly and output these if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if the files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=677</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=677"/>
		<updated>2017-06-29T12:46:17Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (numbered from 0, so 0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
Note that if you also specify a file with the IDs of the individuals (one ID per line) in the same order as in the file 'filelist' used to make the genotype likelihoods, then the output will also contain these IDs. One can actually just use that exact file, although the IDs then tend to be a bit long.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
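To make the effect of -s 1 concrete, the following shell sketch applies the same transformation by hand. It assumes a frequency file with one allele frequency per line; the function name flip_freq is made up for this illustration and is not part of NgsRelate.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate): read one allele frequency
# per line on stdin and print 1 minus each value, as -s 1 is described to do.
flip_freq() {
    awk '{ printf "%.6f\n", 1 - $1 }'
}

# Example with two made-up frequencies
printf '0.25\n0.9\n' | flip_freq
```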
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNPs and extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
./ngsrelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood at the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values found by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
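The columns described above are easy to post-process with standard tools. As a minimal sketch, the shell function below (the name relatedness_summary is made up for this illustration) reads a result file on stdin, skips the header, and prints for each pair the quantity k1/2 + k2, which is the usual coefficient of relatedness for non-inbred individuals, together with a flag saying whether the estimate came from the boundary test (nIter equal to -1) or from the EM algorithm. It assumes the tab-separated seven-column layout shown in the examples.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate): summarise a result file.
# Assumes tab-separated columns: Pair k0 k1 k2 loglh nIter coverage.
relatedness_summary() {
    awk -F'\t' '
        NR == 1 { next }                 # skip the header line
        {
            r = $3 / 2 + $4              # k1/2 + k2
            how = ($6 == -1) ? "boundary" : "EM"
            printf "%s\t%.6f\t%s\n", $1, r, how
        }'
}

# Usage, assuming the output was written to a file called res:
#   cat res | relatedness_summary
```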
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the same sites. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
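Because the genotype likelihood file is binary, a mismatch between it and the frequency file is easy to miss. The sketch below is one way to sanity-check the two inputs before a run. It assumes that each value is stored as an 8-byte double, so that every site takes 3 x 8 x n bytes for n individuals; the function name glf_site_count is made up for this illustration.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate or ANGSD).
# Usage: glf_site_count FILE.glf.gz NUMBER_OF_INDIVIDUALS
# Assumes 3 genotype likelihoods per individual per site, 8 bytes each.
glf_site_count() {
    bytes=$(gunzip -c "$1" | wc -c)
    echo $(( $bytes / (3 * 8 * $2) ))
}

# Example check against the frequency file (one line per site):
#   glf_site_count angsdput.glf.gz 6
#   wc -l freq
```

If the two files describe the same sites, the two numbers should agree.&lt;br /&gt;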
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if the files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=564</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=564"/>
		<updated>2017-01-31T11:38:32Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* How to download and install */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/ngsRelate.cpp &amp;gt;ngsRelate.cpp&lt;br /&gt;
g++ ngsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are three examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to run all steps of the examples you need to have the programs ANGSD and PLINK installed, and you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run in total. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq_bim angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
== Run example 3: using frequencies from 1000genomes vcf files==&lt;br /&gt;
We want to run NgsRelate using population frequencies from Europe. We will extract the frequencies from the 1000 Genomes Project VCF files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#Assuming that we have per-chromosome VCF files called: ALL.chr*.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
#We write the output to EUR_AF/*.frq&lt;br /&gt;
#We only use diallelic SNPs and extract the CHROM,POS,REF,ALT,EUR_AF tags from the vcf&lt;br /&gt;
#We then keep only the unique sites.&lt;br /&gt;
for f in `seq 1 22`&lt;br /&gt;
do&lt;br /&gt;
IF=/storage/data_shared/callsets/1000genomes/phase3/vcf/ALL.chr${f}.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz&lt;br /&gt;
echo &amp;quot;bcftools view  -m2 -M2 -v snps ${IF} | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%EUR_AF\n' |awk '{if(\$5&amp;gt;0) print \$0 }'|sort -S 50% -u -k1,2 &amp;gt;EUR_AF/${f}.frq&amp;quot;&lt;br /&gt;
done|parallel&lt;br /&gt;
   &lt;br /&gt;
##We merge into one file&lt;br /&gt;
cat EUR_AF/1.frq &amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
for i in `seq 2 22`&lt;br /&gt;
do&lt;br /&gt;
    cat EUR_AF/${i}.frq &amp;gt;&amp;gt;EUR_AF/ALL.frq&lt;br /&gt;
done&lt;br /&gt;
gzip EUR_AF/ALL.frq&lt;br /&gt;
&lt;br /&gt;
#we extract the first 4 columns, which is the sites input for angsd&lt;br /&gt;
gunzip -c EUR_AF/ALL.frq.gz |cut -f1-4 |gzip -c &amp;gt;EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
./angsd/angsd sites index EUR_AF/sites.txt.gz&lt;br /&gt;
./angsd/angsd -b list -gl 1 -domajorminor 3 -C 50 -ref /storage/data_shared/reference_genomes/hs37d5/hs37d5.fa -doglf 3 -minmapq 30 -minq 20 -sites EUR_AF/sites.txt.gz&lt;br /&gt;
&lt;br /&gt;
#Then we extract and match the freqs from the reference population with the sites where we had data. The parser expects a header, so make a dummy file&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;header&amp;quot; |gzip -c &amp;gt;new&lt;br /&gt;
cat EUR_AF/ALL.frq.gz &amp;gt;&amp;gt;new &lt;br /&gt;
./ngsrelate extract_freq new angsdput.glf.pos.gz &amp;gt;myfreq&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood at the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values found by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the same sites. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Give a better error message if the files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=541</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=541"/>
		<updated>2016-04-27T11:36:04Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
Method is published here: http://bioinformatics.oxfordjournals.org/content/early/2015/08/29/bioinformatics.btv509.abstract&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
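The loop over all pairs can be pictured with a small Python sketch. It is not NgsRelate's own code; it only assumes, based on the example output in the Output format section, that pairs are visited in the order (0,1), (0,2), ... using zero-based indices. The function name all_pairs is hypothetical.&lt;br /&gt;

```python
from itertools import combinations

def all_pairs(n_ind):
    """Return every unordered pair (a, b) of zero-based individual indices."""
    return list(combinations(range(n_ind), 2))

pairs = all_pairs(6)
print(len(pairs))   # 15 pairs for 6 individuals
print(pairs[:3])    # [(0, 1), (0, 2), (0, 3)]
```

With 100 individuals, as in this example, the same enumeration gives 4950 pairs, so restricting the run with -a and -b can save a lot of time when only one pair is of interest.&lt;br /&gt;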
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
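What -s 1 does to the frequencies can be illustrated in Python. This is only a sketch of the arithmetic described above (1 minus each value), not NgsRelate's implementation; flip_frequencies is a hypothetical helper and the example values are made up.&lt;br /&gt;

```python
def flip_frequencies(freqs):
    """Return 1 - f for each allele frequency f."""
    return [1.0 - f for f in freqs]

example = [0.25, 0.5, 0.875]          # made-up frequencies, one per site
flipped = flip_frequencies(example)
print(flipped)                        # [0.75, 0.5, 0.125]
```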
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the ML estimate, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
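The column layout described here can be read programmatically. The following Python sketch assumes the output is tab-separated with the header line shown in the examples; read_results is a hypothetical helper, and the three data rows are copied from the six-sample example. It lists the pairs for which a boundary value was reported (nIter equal to -1).&lt;br /&gt;

```python
import csv
import io

# Three rows copied from the six-sample example output.
example = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,1)\t0.675337\t0.322079\t0.002584\t-1710946.832375\t10\t0.813930\n"
    "(0,3)\t1.000000\t0.000000\t0.000000\t-1743992.363193\t-1\t0.816266\n"
    "(4,5)\t0.004655\t0.995345\t-0.000000\t-1473415.049411\t8\t0.755734\n"
)

def read_results(handle):
    """Parse tab-separated NgsRelate output into a list of dicts."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "pair": rec["Pair"],
            "k": (float(rec["k0"]), float(rec["k1"]), float(rec["k2"])),
            "loglh": float(rec["loglh"]),
            "n_iter": int(rec["nIter"]),
            "coverage": float(rec["coverage"]),
        })
    return rows

rows = read_results(io.StringIO(example))
# Pairs whose estimate was taken from the boundary of the parameter space.
boundary = [r["pair"] for r in rows if r["n_iter"] == -1]
print(boundary)  # ['(0,3)']
```

To read a real result file, pass an open file handle instead of the in-memory example, e.g. read_results(open(&amp;quot;res&amp;quot;)).&lt;br /&gt;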
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
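As an illustration of the genotype likelihood layout, here is a minimal Python sketch that reads such a file. It assumes each site is stored as one fixed-size record of 3 doubles per individual, in the machine's native byte order; the byte order and the helper name read_glf are assumptions, not taken from the NgsRelate source. The sketch is exercised on a tiny synthetic file with made-up values.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile

def read_glf(path, n_ind):
    """Yield, per site, a list of n_ind tuples of 3 log genotype likelihoods."""
    fmt = "=%dd" % (3 * n_ind)            # assumption: native byte order doubles
    size = struct.calcsize(fmt)
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(size)
            if len(chunk) != size:        # end of file (or a truncated record)
                break
            vals = struct.unpack(fmt, chunk)
            yield [vals[i:i + 3] for i in range(0, len(vals), 3)]

# Round trip on a synthetic file: 2 sites, 2 individuals, made-up values.
sites = [(0.0, -1.5, -3.0, -2.0, 0.0, -0.5),
         (-0.25, 0.0, -4.0, 0.0, -1.0, -2.0)]
path = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
with gzip.open(path, "wb") as fh:
    for site in sites:
        fh.write(struct.pack("=6d", *site))

parsed = list(read_glf(path, 2))
print(len(parsed), parsed[0][1])  # 2 (-2.0, 0.0, -0.5)
```

The number of sites read this way should equal the number of lines in the frequency file, which is a quick consistency check before running NgsRelate.&lt;br /&gt;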
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;br /&gt;
&lt;br /&gt;
=Bugs/Improvements=&lt;br /&gt;
-Make better output message if files don't exist when using the extract_freq option&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=522</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=522"/>
		<updated>2015-07-02T06:47:25Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output should be a file called res that contains relatedness estimates for all pairs between 6 individuals. A copy of this file can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed by NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the ML estimate, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=521</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=521"/>
		<updated>2015-07-02T06:44:02Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Help and additional options */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed by NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients k0, k1 and k2. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the ML estimate, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options simply type&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=520</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=520"/>
		<updated>2015-07-02T06:42:50Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing the paths to 100 BAM/CRAM files, one line per BAM/CRAM file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed by NgsRelate, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. to estimate relatedness between the first two individuals (0 and 1) we can use the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of the file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in that file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot;, and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
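As a rough guide to the size of such an all-pairs run (a back-of-the-envelope sketch, assuming each unordered pair is analysed exactly once, as in the 6-sample output shown in the Output format section), the number of pairs for n individuals is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pairs = n * (n - 1) / 2&lt;br /&gt;
n = 6:    6 * 5 / 2    = 15&lt;br /&gt;
n = 100:  100 * 99 / 2 = 4950&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;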
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals whose relatedness we want to estimate are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then we can use PLINK to produce allele frequency information in a format that NgsRelate can use, as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information, as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information for the sites we have frequency info for into a file &lt;br /&gt;
### (so we can extract data for these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals for which we have NGS data in BAM files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;; the result of this example can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the ML estimate, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
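&lt;br /&gt;
If a single relatedness value is needed, the three coefficients can be combined. The following is a worked sketch, assuming k0, k1 and k2 are the usual probabilities that a pair shares 0, 1 and 2 alleles identical by descent (an interpretation not spelled out on this page); r (coefficient of relatedness) and theta (kinship coefficient) are names introduced here for illustration only. The numbers are taken from the 6-sample output above:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
r     = k1/2 + k2&lt;br /&gt;
theta = r/2&lt;br /&gt;
pair (1,2): r = 0.991020/2 + 0.001868 = 0.497378    (close to parent-offspring: k0=0, k1=1, k2=0, r=0.5)&lt;br /&gt;
pair (3,5): r = 0.482953/2 + 0.251228 = 0.4927045   (close to full siblings: k0=0.25, k1=0.5, k2=0.25, r=0.5)&lt;br /&gt;
pair (0,3): r = 0.000000/2 + 0.000000 = 0           (unrelated: k0=1)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;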
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites in the genotype likelihood file. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site, holding the allele frequency of that site.&lt;br /&gt;
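&lt;br /&gt;
The layout described above implies a simple size check. This is only a sketch, assuming 8-byte doubles and no header or padding in the decompressed file (neither of which is stated on this page); nInd and nSites are names introduced here for the number of individuals and sites. The value of nSites obtained this way should equal the number of lines in the frequency file:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
bytes per site = nInd * 3 * 8&lt;br /&gt;
nInd = 6:    6 * 3 * 8   = 144 bytes per site&lt;br /&gt;
nInd = 100:  100 * 3 * 8 = 2400 bytes per site&lt;br /&gt;
nSites = decompressed size in bytes / (nInd * 3 * 8)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;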
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=519</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=519"/>
		<updated>2015-07-02T06:42:26Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program for inferring relatedness coefficients for pairs of individuals from low-coverage next-generation sequencing (NGS) data using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl, g++ and zlib installed, NgsRelate can be downloaded and compiled as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have ANGSD and PLINK installed, and you need to download large data files from both the HapMap3 and 1000 Genomes webpages. In total, the examples also take several hours to run, so they are meant only as illustrations of how NgsRelate can be used. '''If you want to quickly try out NgsRelate, e.g. to check that your installation works, you can download the final NgsRelate input data used in the very last command of run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing the paths to 100 BAM/CRAM files, one line per file. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed by NgsRelate, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. to estimate relatedness between the first two individuals (0 and 1) we can use the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of the file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in that file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot;, and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals whose relatedness we want to estimate are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then we can use PLINK to produce allele frequency information in a format that NgsRelate can use, as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information, as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information for the sites we have frequency info for into a file &lt;br /&gt;
### (so we can extract data for these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals for which we have NGS data in BAM files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;; the result of this example can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the ML estimate, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites in the genotype likelihood file. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site, holding the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=518</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=518"/>
		<updated>2015-07-02T06:34:54Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program for inferring relatedness coefficients for pairs of individuals from low-coverage next-generation sequencing (NGS) data using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl, g++ and zlib installed, NgsRelate can be downloaded and compiled as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have ANGSD and PLINK installed, and you need to download large data files from both the HapMap3 and 1000 Genomes webpages. In total, the examples also take several hours to run, so they are meant only as illustrations of how NgsRelate can be used. '''If you want to quickly try out NgsRelate, e.g. to check that your installation works, you can download the final NgsRelate input data used in the very last command of run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing the paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed by NgsRelate, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. to estimate relatedness between the first two individuals (0 and 1) we can use the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of the file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in that file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot;, and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals whose relatedness we want to estimate are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then we can use PLINK to produce allele frequency information in a format that NgsRelate can use, as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information, as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information for the sites we have frequency info for into a file &lt;br /&gt;
### (so we can extract data for these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals for which we have NGS data in BAM files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;; the result of this example can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column shows which pair of individuals was analysed. The next three columns are the maximum likelihood (ML) estimates of the relatedness coefficients. The fifth column is the log-likelihood of the ML estimate. The sixth column is the number of iterations the maximization algorithm used to find the ML estimate, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a value on the boundary of the parameter space had a higher likelihood than the value reached by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so we test the boundary values explicitly and output them if they have the highest likelihood).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites in the genotype likelihood file. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site, holding the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=517</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=517"/>
		<updated>2015-07-02T06:29:36Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program for inferring relatedness coefficients for pairs of individuals from low-coverage next-generation sequencing (NGS) data using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux or Mac system with curl, g++ and zlib installed, NgsRelate can be downloaded and compiled as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have ANGSD and PLINK installed, and you need to download large data files from both the HapMap3 and 1000 Genomes webpages. In total, the examples also take several hours to run, so they are meant only as illustrations of how NgsRelate can be used. '''If you want to quickly try out NgsRelate, e.g. to check that your installation works, you can download the final NgsRelate input data used in the very last command of run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gzip -dc angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (i.e. set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
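The effect of flipping can be illustrated outside NgsRelate. The following is a minimal sketch, assuming a freq file with one frequency per line; the function name flip_freq is hypothetical, and the sketch only shows the arithmetic (1 minus each frequency), not how NgsRelate implements -s 1 internally:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### hypothetical helper, not part of NgsRelate: print 1 minus the frequency on each line of the given file&lt;br /&gt;
flip_freq() {&lt;br /&gt;
    awk '{ print 1 - $1 }' &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, running flip_freq on a file containing the lines 0.2 and 0.75 prints 0.8 and 0.25.&lt;br /&gt;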
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a boundary value had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
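The estimated coefficients can be turned into other common summaries. The following is a minimal sketch, assuming the tab-separated layout shown above (k1 in column 3 and k2 in column 4) and non-inbred individuals; it uses the standard definitions of the coefficient of relatedness, r = k1/2 + k2, and the kinship coefficient, theta = k1/4 + k2/2. The function name relatedness_summary is hypothetical and not part of NgsRelate:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### hypothetical helper: for each pair print the pair, r = k1/2 + k2 and theta = k1/4 + k2/2 (the header line is skipped)&lt;br /&gt;
relatedness_summary() {&lt;br /&gt;
    awk -F'\t' -v OFS='\t' 'NR != 1 { print $1, $3/2 + $4, $3/4 + $4/2 }' &amp;quot;$1&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For example, relatedness_summary res prints one line per pair; a pair with k1 = 0.5 and k2 = 0.25 gives r = 0.5 and theta = 0.25.&lt;br /&gt;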
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
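Since both files must describe the same sites, a quick consistency check is to compare the number of sites implied by the size of the genotype likelihood file with the number of lines in the frequency file. The following is a minimal sketch, assuming 8-byte doubles and no header in the genotype likelihood file; the function name glf_sites is hypothetical and not part of NgsRelate:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### hypothetical helper: glf_sites FILE N prints the number of sites in a gz compressed binary file&lt;br /&gt;
### holding 3 doubles (8 bytes each) per individual per site for N individuals&lt;br /&gt;
glf_sites() {&lt;br /&gt;
    bytes=$(gzip -dc &amp;quot;$1&amp;quot; | wc -c)&lt;br /&gt;
    echo $(( bytes / (8 * 3 * $2) ))&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For the example data, the output of glf_sites angsdput.glf.gz 6 should equal the number of lines in the freq file (as reported by wc -l freq).&lt;br /&gt;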
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=516</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=516"/>
		<updated>2015-07-02T06:28:20Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run. '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gzip -dc angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a boundary value had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=515</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=515"/>
		<updated>2015-07-02T06:26:56Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. Furthermore, the examples take several hours to run all in all. They are therefore just meant as illustrations of how NgsRelate can be run.&lt;br /&gt;
 '''If you want to quickly try out NgsRelate, e.g. to check if your installation works, you can download the final input data for NgsRelate used in the very last command in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. Using that data you can try out NgsRelate by running that last command, i.e.''' &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The expected output can be found here http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program, as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gzip -dc angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency info as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that a boundary value had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites for which there are genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=514</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=514"/>
		<updated>2015-07-01T22:33:03Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. Note that to be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed and you also need to download large data files from both HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples should still illustrate how NgsRelate can be run. Also, '''if you want to try out NgsRelate without going through all the intermediate steps of downloading huge datasets and generating input data files from them (which takes some time!), you can download the final input data for NgsRelate used in the last command line in run example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/.''' &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
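When -a and -b are omitted, the number of pairs that are analysed grows quadratically with the number of individuals: n individuals give n(n-1)/2 pairs. The following is a minimal shell sketch for checking how many result rows to expect; the function name n_pairs is a hypothetical helper and not part of NgsRelate.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate):
# number of unordered pairs among n individuals, n * (n - 1) / 2.
n_pairs() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

n_pairs 6     # prints 15
n_pairs 100   # prints 4950
```

For the six individuals in run example 2 this gives 15 pairs, which matches the 15 rows of the example output shown in the Output format section.&lt;br /&gt;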
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals that was used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that the boundary values had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
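The three coefficients are often summarised in a single number. Assuming no inbreeding, the coefficient of relatedness is r = k1/2 + k2. The following is a minimal sketch that appends this value as an extra column; it assumes a tab-separated result file with the header shown above, and the function name add_relatedness is a hypothetical helper, not something NgsRelate provides.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate):
# append r = k1/2 + k2 as an eighth column to a result table.
# Columns in the input: Pair, k0, k1, k2, loglh, nIter, coverage.
add_relatedness() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
       NR == 1 { print $0, "r"; next }              # extend the header line
       { printf "%s\t%.6f\n", $0, $3 / 2 + $4 }' "$1"
}
```

It could be called as add_relatedness res, where res is the result file from run example 2. Pairs reported with k0 = 1 get r = 0.&lt;br /&gt;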
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
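Because the two input files must describe the same sites, a quick size check can catch a mismatch before running the program. The following is a minimal sketch, assuming that each 'double' occupies 8 bytes and that the genotype likelihood file contains nothing but these values; the function name check_glf_size is a hypothetical helper and not part of NgsRelate.&lt;br /&gt;

```shell
# Hypothetical helper (not part of NgsRelate).
# Arguments: genotype likelihood file (gz), frequency file, number of individuals.
# Assumption: 8 bytes per double, 3 doubles per individual per site.
check_glf_size() {
  glf=$1; freq=$2; nind=$3
  nsites=$(awk 'END { print NR }' "$freq")        # one frequency per line
  expected=$(( nsites * nind * 3 * 8 ))
  actual=$(gzip -dc "$glf" | wc -c | tr -d ' ')   # decompressed size in bytes
  echo "sites=$nsites expected_bytes=$expected actual_bytes=$actual"
  [ "$actual" -eq "$expected" ]                   # exit status 0 if sizes agree
}
```

For run example 2 it could be called as check_glf_size angsdput.glf.gz freq 6. A non-zero exit status suggests that the frequency file and the genotype likelihood file do not cover the same number of sites, or that the number of individuals is wrong.&lt;br /&gt;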
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=513</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=513"/>
		<updated>2015-07-01T22:26:48Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed. '''Note that to run the code in the examples you need to download large data files from both the HapMap3 and the 1000 Genomes webpages. If you do not want to do that, the examples still illustrate how NgsRelate can be run. If you want to try out NgsRelate without going through all the intermediate steps of downloading datasets and generating input data files (which takes some time!), you can download the final input data for NgsRelate used in the last command line in example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/.''' &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals that was used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that the boundary values had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=512</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=512"/>
		<updated>2015-07-01T22:23:23Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed. Note that you also need to download large data files from both the HapMap3 and the 1000 Genomes webpages. If you do not want to do that, the examples still illustrate how NgsRelate can be run, and '''if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/.''' &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals that was used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that the boundary values had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=511</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=511"/>
		<updated>2015-07-01T22:21:18Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed. &lt;br /&gt;
&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and the 1000 Genomes webpages. If you do not want to do that, the examples still illustrate how NgsRelate can be run. &lt;br /&gt;
;And if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the 0-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
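&lt;br /&gt;
As a rough illustration of what looping through all pairs implies, the Python sketch below enumerates the pairs for a given number of individuals. It assumes 0-based indices, as in the -a 0 -b 1 command, and the function name all_pairs is hypothetical: it is not part of NgsRelate and only mimics the pair ordering seen in the example output.&lt;br /&gt;

```python
from itertools import combinations


def all_pairs(n_individuals):
    """Return every unordered pair (a, b), a before b, using 0-based indices."""
    return list(combinations(range(n_individuals), 2))


pairs = all_pairs(6)
print(len(pairs))   # number of pairs for 6 individuals
print(pairs[:3])    # first few pairs, starting with (0, 1)
```

The number of pairs grows as n(n-1)/2, so 100 individuals already give 4950 pairs.&lt;br /&gt;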
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that boundary values had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
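&lt;br /&gt;
The columns described above can be read programmatically. The Python sketch below parses the tab-separated output, flags pairs where nIter is -1, and adds a kinship value computed as k1/4 + k2/2. That formula is the standard kinship coefficient for non-inbred pairs and is an addition for illustration; it is not a column of the output shown above. The function name read_ngsrelate_output is hypothetical, and the two data rows are copied from the 6-sample example.&lt;br /&gt;

```python
import csv
import io


def read_ngsrelate_output(text):
    """Parse tab-separated NgsRelate output into a list of dicts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        k1 = float(rec["k1"])
        k2 = float(rec["k2"])
        rows.append({
            "pair": rec["Pair"],
            "k0": float(rec["k0"]),
            "k1": k1,
            "k2": k2,
            # nIter == -1 means a boundary value beat the EM estimate
            "boundary": int(rec["nIter"]) == -1,
            # kinship coefficient under the no-inbreeding assumption
            "kinship": k1 / 4 + k2 / 2,
        })
    return rows


example = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,3)\t1.000000\t0.000000\t0.000000\t-1743992.363193\t-1\t0.816266\n"
    "(1,2)\t0.007111\t0.991020\t0.001868\t-1580995.130867\t10\t0.806912\n"
)

for row in read_ngsrelate_output(example):
    print(row["pair"], round(row["kinship"], 4), row["boundary"])
```

A k1 close to 1, as for pair (1,2), is the pattern expected for a parent-offspring pair, while k0 equal to 1 indicates an unrelated pair.&lt;br /&gt;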
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
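&lt;br /&gt;
To make the binary layout concrete, here is a minimal Python sketch that reads such a genotype likelihood file site by site. It rests on assumptions that go beyond the description above: the doubles are 8 bytes in native byte order, the records for successive sites follow each other with no header or separator, and the individuals are stored in order within each site. The function name read_glf and the demo file are hypothetical.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def read_glf(path, n_individuals):
    """Yield, per site, a list with one (gl0, gl1, gl2) tuple per individual."""
    # Assumed layout: 3 native-order 8-byte doubles per individual per site.
    record = struct.Struct(f"={3 * n_individuals}d")
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(record.size)
            if not chunk:
                break  # clean end of file
            if len(chunk) != record.size:
                raise ValueError("file ends in the middle of a site record")
            values = record.unpack(chunk)
            yield [values[3 * i:3 * i + 3] for i in range(n_individuals)]


# Round-trip demo on a tiny made-up file: 2 sites, 2 individuals.
demo_values = [-0.5, -1.25, -3.0, -2.0, -0.25, -4.5,
               -0.125, -2.5, -6.0, -1.5, -0.75, -3.25]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.glf.gz")
with gzip.open(demo_path, "wb") as handle:
    handle.write(struct.pack("=12d", *demo_values))

for site in read_glf(demo_path, n_individuals=2):
    print(site)
```

Counting the sites yielded by such a reader and comparing that with the number of lines in the frequency file is a quick way to check that the two inputs are in sync.&lt;br /&gt;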
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=510</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=510"/>
		<updated>2015-07-01T22:07:38Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that running every step requires downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!), you can download the final input data for NgsRelate used in the last command line in example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz |cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the 0-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq -s 1 &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/. Note that we here used the option -s 1 to flip the allele frequencies (set them to 1 minus the frequencies in the freq file).&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that boundary values had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=509</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=509"/>
		<updated>2015-07-01T20:37:24Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that running every step requires downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!), you can download the final input data for NgsRelate used in the last command line in example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz |cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the 0-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold). Note that in some cases nIter is -1. This indicates that boundary values had a better likelihood than the values output by the EM algorithm (ML methods sometimes have trouble finding the ML estimate when it is on the boundary of the parameter space, and we therefore test the boundary values explicitly).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain, for each site, 3 values for each individual (one log-transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site with the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=508</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=508"/>
		<updated>2015-07-01T20:21:25Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that running every step requires downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!), you can download the final input data for NgsRelate used in the last command line in example 2 here: http://www.popgen.dk/ida/NgsRelateExampleData/web/input/. &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz |cut -f5 |sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the 0-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot; which can be found here: http://www.popgen.dk/ida/NgsRelateExampleData/web/output/.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log of the likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold).&lt;br /&gt;
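The output can be post-processed with a few lines of code. The following is a minimal Python sketch, not part of NgsRelate, that parses the tab-separated output and adds the coefficient of relatedness r = k1/2 + k2, a standard summary of k0, k1 and k2. The function name parse_ngsrelate and the variable example are illustrative assumptions; the two data rows are copied from the six-sample output above.&lt;br /&gt;

```python
import csv
import io


def parse_ngsrelate(text):
    """Parse NgsRelate's tab-separated output into a list of dicts.

    Assumes the seven-column layout shown above
    (Pair, k0, k1, k2, loglh, nIter, coverage).
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        k1 = float(rec["k1"])
        k2 = float(rec["k2"])
        rows.append({
            "pair": rec["Pair"],
            "k0": float(rec["k0"]),
            "k1": k1,
            "k2": k2,
            # Coefficient of relatedness derived from k1 and k2.
            "r": k1 / 2 + k2,
        })
    return rows


# Two rows copied from the six-sample example above.
example = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,1)\t0.675337\t0.322079\t0.002584\t-1710946.832375\t10\t0.813930\n"
    "(1,2)\t0.007111\t0.991020\t0.001868\t-1580995.130867\t10\t0.806912\n"
)

for row in parse_ngsrelate(example):
    print(row["pair"], round(row["r"], 4))
```

In practice the text would come from the file written by NgsRelate (e.g. the file called res) rather than from an inline string.&lt;br /&gt;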
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
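To make the genotype likelihood layout concrete, here is a hedged Python sketch that reads such a file. It assumes the values are stored site by site as native-endian 8-byte doubles, 3 per individual, with no header. The function name read_glf and the toy file written in the usage part are hypothetical and only illustrate the layout described above.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def read_glf(path, n_ind):
    """Yield one list per site holding n_ind tuples of 3 log-likelihoods.

    Assumption: gzip-compressed, headerless stream of native 8-byte doubles,
    3 per individual per site, as described in the text above.
    """
    rec = struct.Struct("%dd" % (3 * n_ind))
    with gzip.open(path, "rb") as fh:
        while True:
            buf = fh.read(rec.size)
            if not buf:
                break
            if len(buf) != rec.size:
                raise ValueError("truncated record: wrong number of individuals?")
            vals = rec.unpack(buf)
            yield [vals[3 * i:3 * i + 3] for i in range(n_ind)]


# Toy round trip: write 2 sites for 2 individuals, then read them back.
toy = [0.0, -1.5, -3.0, -0.5, 0.0, -2.0,
       -4.0, -0.25, 0.0, 0.0, -1.0, -2.5]
tmp = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
with gzip.open(tmp, "wb") as fh:
    fh.write(struct.pack("12d", *toy))

sites = list(read_glf(tmp, 2))
print(len(sites), sites[0][1])
```

Under the same assumptions, the number of sites read this way should equal the number of lines in the frequency file, which makes for a quick consistency check before running NgsRelate.&lt;br /&gt;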
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=507</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=507"/>
		<updated>2015-07-01T20:20:20Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* How to download and install */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux or mac system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the numbers of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log of the likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=506</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=506"/>
		<updated>2015-07-01T20:19:12Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the numbers of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log of the likelihood of the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and finally the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05 but the user may specify a different threshold).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program by typing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=505</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=505"/>
		<updated>2015-07-01T20:00:50Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Input file format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the numbers of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold).&lt;br /&gt;
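&lt;br /&gt;
As a rough illustration of how this table can be read programmatically, the following Python sketch parses the tab-separated output and adds the coefficient of relatedness r = k1/2 + k2 for each pair. The function name read_ngsrelate and the added relatedness field are hypothetical and not part of NgsRelate; the sketch assumes the seven-column layout shown above.&lt;br /&gt;

```python
import csv
import io


def read_ngsrelate(handle):
    """Parse tab-separated NgsRelate output into a list of dicts.

    Assumes the header: Pair, k0, k1, k2, loglh, nIter, coverage.
    """
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        k0, k1, k2 = (float(rec[name]) for name in ("k0", "k1", "k2"))
        rows.append({
            "pair": rec["Pair"],
            "k0": k0,
            "k1": k1,
            "k2": k2,
            # coefficient of relatedness derived from the three estimates
            "relatedness": 0.5 * k1 + k2,
            "loglh": float(rec["loglh"]),
            "n_iter": int(rec["nIter"]),
            "coverage": float(rec["coverage"]),
        })
    return rows


# The two-sample example from above, inlined so the sketch runs on its own.
example = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,1)\t0.673213\t0.326774\t0.000013\t-1710940.769941\t19\t0.814658\n"
)
rows = read_ngsrelate(io.StringIO(example))
print(rows[0]["pair"], round(rows[0]["relatedness"], 6))
```

To read a real result file, an open file handle (for example for the file "res") can be passed instead of the in-memory example.&lt;br /&gt;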
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that the genotype likelihoods cover.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
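&lt;br /&gt;
A minimal Python sketch of reading such a genotype likelihood file is given here. It assumes that the decompressed stream has no header, that values are stored as native-endian 8-byte doubles, and that each site holds 3 values per individual, in individual order. The function name read_glf and the toy file are hypothetical; the sketch first writes a small synthetic file so that it runs without any ANGSD output.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def read_glf(path, n_ind):
    """Return one list per site holding n_ind tuples of 3 log likelihoods.

    Assumption: headerless stream of native-endian doubles, 3 per individual.
    """
    record = struct.Struct("=%dd" % (3 * n_ind))
    sites = []
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(record.size)
            if len(chunk) != record.size:
                break  # end of file (or a truncated trailing record)
            values = record.unpack(chunk)
            sites.append([values[i:i + 3] for i in range(0, len(values), 3)])
    return sites


# Synthetic data: 2 sites x 2 individuals x 3 genotypes (made-up values).
toy = [0.0, -1.5, -3.0, -0.2, 0.0, -2.5,
       -4.0, -0.7, 0.0, 0.0, -0.9, -5.0]
path = os.path.join(tempfile.mkdtemp(), "toy.glf.gz")
with gzip.open(path, "wb") as out:
    out.write(struct.pack("=12d", *toy))

sites = read_glf(path, n_ind=2)
print(len(sites), sites[0][1])
```

The number of individuals has to be supplied by the caller, in the same way as it is given to NgsRelate with the option -n.&lt;br /&gt;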
&lt;br /&gt;
= Help and additional options =&lt;br /&gt;
To get help and a list of all options, run the program without any arguments:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=504</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=504"/>
		<updated>2015-07-01T19:58:21Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that the genotype likelihoods cover.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=503</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=503"/>
		<updated>2015-07-01T19:49:00Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that the genotype likelihoods cover.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=502</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=502"/>
		<updated>2015-07-01T19:47:40Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz hapmap3Hg19LWK.bim LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the pair of individuals that was analysed. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but the user may specify a different threshold).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that the genotype likelihoods cover.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes encoded as 'double's) and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain a line per site with the allele frequency of the site in it.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=501</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=501"/>
		<updated>2015-07-01T19:45:53Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Input format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program that infers relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that the examples require downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can use the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
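&lt;br /&gt;
To get a feel for how many pairs a run without -a and -b covers, the pairs can be enumerated as in the following small Python sketch. It only illustrates the pair count and the zero-based indexing; it is not how NgsRelate itself is implemented, and the variable names are made up for illustration.&lt;br /&gt;

```python
from itertools import combinations

# Number of individuals, i.e. the value given to NgsRelate with -n (6 is just an example).
n_ind = 6

# All unordered pairs of zero-based individual indices: (0, 1), (0, 2), ...
pairs = list(combinations(range(n_ind), 2))

print(len(pairs))   # n_ind * (n_ind - 1) / 2 pairs in total
print(pairs[:3])
```

With 100 individuals the same calculation gives 4950 pairs, so a full run takes correspondingly longer than a single pair selected with -a and -b.&lt;br /&gt;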
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information for the sites we have frequency info for into a file&lt;br /&gt;
### (so we can extract data for these sites from the NGS data files)&lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the MLE. The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but it may also be user specified).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a record for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site holding the allele frequency of that site.&lt;br /&gt;
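&lt;br /&gt;
To inspect a genotype likelihood file by hand, the layout described above can be read with a short script. The following Python sketch rests on assumptions that go beyond this page: that the file has no header, that values are stored site by site with the three doubles of individual 0 first, and that the doubles use the byte order of the machine that wrote them. The function name read_glf and the toy file are hypothetical.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile

def read_glf(path, n_ind):
    '''Yield, for each site, a list of n_ind tuples holding 3 genotype likelihoods (hypothetical helper).'''
    n_vals = 3 * n_ind
    rec_size = 8 * n_vals      # assumption: 8-byte doubles, no header, no padding
    fmt = '=%dd' % n_vals      # assumption: native byte order
    with gzip.open(path, 'rb') as fh:
        while True:
            buf = fh.read(rec_size)
            if not buf:
                break
            if len(buf) != rec_size:
                raise ValueError('truncated record: is n_ind correct?')
            vals = struct.unpack(fmt, buf)
            # Group the flat record into one (gl0, gl1, gl2) tuple per individual.
            yield [vals[i:i + 3] for i in range(0, n_vals, 3)]

# Toy round trip: write 2 sites for 2 individuals, then read them back.
path = os.path.join(tempfile.mkdtemp(), 'toy.glf.gz')
with gzip.open(path, 'wb') as fh:
    fh.write(struct.pack('=12d', *[-0.1 * i for i in range(12)]))

sites = list(read_glf(path, n_ind=2))
print(len(sites), sites[0][1])
```

The number of sites read this way should equal the number of lines in the frequency file; a mismatch usually means the two files are out of sync or that the wrong number of individuals was given.&lt;br /&gt;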
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=500</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=500"/>
		<updated>2015-07-01T19:37:50Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Input file format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program that infers relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that the examples require downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can use the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information for the sites we have frequency info for into a file&lt;br /&gt;
### (so we can extract data for these sites from the NGS data files)&lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input format=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a record for each site and 3 values for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore it needs to be in binary format as produced by ANGSD. The frequency file needs to contain a line for each site with one number in it, namely the allele frequency of the site. &lt;br /&gt;
For examples see here:&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the MLE. The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but it may also be user specified).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods. &lt;br /&gt;
The genotype likelihood file needs to contain a record for each site with 3 values for each individual (one log transformed genotype likelihood for each of the 3 possible genotypes, encoded as 'double's), and it needs to be in binary format and gz compressed.&lt;br /&gt;
The frequency file needs to contain one line per site holding the allele frequency of that site.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=499</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=499"/>
		<updated>2015-07-01T19:29:22Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program that infers relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that the examples require downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to get the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can use the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how to estimate relatedness between a number of individuals for which you have NGS data (in BAM files), using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness for are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information for the sites we have frequency info for into a file&lt;br /&gt;
### (so we can extract data for these sites from the NGS data files)&lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input format=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that have genotype likelihoods.&lt;br /&gt;
The genotype likelihood file needs to contain a record for each site and 3 values for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore it needs to be in binary format as produced by ANGSD. The frequency file needs to contain a line for each site with one number in it, namely the allele frequency of the site. &lt;br /&gt;
For examples see here:&lt;br /&gt;
&lt;br /&gt;
=Output format=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood of the MLE. The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (default is 0.05, but it may also be user specified).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood input file is binary and gz compressed, with log likelihood ratios encoded as doubles, 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=498</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=498"/>
		<updated>2015-07-01T19:28:52Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Input */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page describes NgsRelate, a program that infers relatedness coefficients for pairs of individuals from low-coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained with, e.g., the program ANGSD, as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is hosted on GitHub: https://github.com/ANGSD/NgsRelate. On a Linux system with curl and g++ installed, NgsRelate can be downloaded and compiled as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that the examples require downloading large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for NgsRelate as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input format=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that there are genotype likelihoods for.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site and 3 columns for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore, it needs to be in the binary format produced by ANGSD. The frequency file needs to contain a line for each site with a single number, namely the allele frequency of the site. &lt;br /&gt;
For examples see here:&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output  with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals that were used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can also be user-specified).&lt;br /&gt;
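&lt;br /&gt;
The columns described above are easy to post-process. The following is a minimal Python sketch and not part of NgsRelate: the function name read_results is hypothetical, the two data rows are copied from the six-sample example output, and the coefficient of relatedness r = k1/2 + k2 is a standard identity added here for illustration (it is not a column that NgsRelate writes).&lt;br /&gt;

```python
import csv
import io

# Two rows copied from the six-sample example output (tab-separated).
EXAMPLE = (
    "Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n"
    "(0,1)\t0.675337\t0.322079\t0.002584\t-1710946.832375\t10\t0.813930\n"
    "(1,2)\t0.007111\t0.991020\t0.001868\t-1580995.130867\t10\t0.806912\n"
)


def read_results(handle):
    """Return one dict per pair, with the numeric columns converted."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        row = {"Pair": rec["Pair"]}
        for key in ("k0", "k1", "k2", "loglh", "coverage"):
            row[key] = float(rec[key])
        row["nIter"] = int(rec["nIter"])
        # Coefficient of relatedness (assumption: standard identity,
        # not an NgsRelate output column).
        row["r"] = row["k1"] / 2 + row["k2"]
        rows.append(row)
    return rows


results = read_results(io.StringIO(EXAMPLE))
for row in results:
    print(row["Pair"], round(row["r"], 4))
```

To use it on a real run, replace the io.StringIO(EXAMPLE) handle with open("res").&lt;br /&gt;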
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood input file is binary and gz compressed: log-likelihood ratios encoded as doubles, with 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
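&lt;br /&gt;
As an illustration of the layout described above, here is a minimal Python sketch that reads such a file. It rests on assumptions that are not stated on this page: values are stored in native byte order, site by site, with no header. The function name read_glf is hypothetical, and the demo file is synthetic data written by the script itself, not ANGSD output.&lt;br /&gt;

```python
import gzip
import os
import struct
import tempfile


def read_glf(path, n_ind):
    """Read a gz-compressed file of doubles into a list of sites.

    Each site is a list of n_ind tuples holding 3 values per individual.
    Assumes native byte order, site-major layout and no header.
    """
    per_site = 3 * n_ind
    site_bytes = 8 * per_site
    sites = []
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(site_bytes)
            if len(chunk) != site_bytes:
                break  # end of file (or a truncated trailing record)
            values = struct.unpack("=%dd" % per_site, chunk)
            sites.append([values[3 * i:3 * i + 3] for i in range(n_ind)])
    return sites


# Synthetic demo: 2 sites, 2 individuals, 3 values per individual.
demo_values = [0.0, -1.5, -3.0, -2.0, 0.0, -0.5,
               -0.1, 0.0, -4.0, 0.0, -0.7, -2.2]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.glf.gz")
with gzip.open(demo_path, "wb") as handle:
    handle.write(struct.pack("=12d", *demo_values))

sites = read_glf(demo_path, n_ind=2)
print(len(sites), sites[0][1])
```

The number of sites read this way should match the number of lines in the frequency file, which makes it a quick sanity check before running NgsRelate.&lt;br /&gt;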
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=497</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=497"/>
		<updated>2015-07-01T19:28:40Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
./ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that there are genotype likelihoods for.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site and 3 columns for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore, it needs to be in the binary format produced by ANGSD. The frequency file needs to contain a line for each site with a single number, namely the allele frequency of the site. &lt;br /&gt;
For examples see here: &lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output  with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals that were used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can also be user-specified).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood input file is binary and gz compressed: log-likelihood ratios encoded as doubles, with 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=496</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=496"/>
		<updated>2015-07-01T19:27:51Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
./angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
./angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync it to the angsd output&lt;br /&gt;
ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
./ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that there are genotype likelihoods for.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site and 3 columns for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore, it needs to be in the binary format produced by ANGSD. The frequency file needs to contain a line for each site with a single number, namely the allele frequency of the site. &lt;br /&gt;
For examples see here: &lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output  with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals that were used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can also be user-specified).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood input file is binary and gz compressed: log-likelihood ratios encoded as doubles, with 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=495</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=495"/>
		<updated>2015-07-01T19:27:28Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run example 1: using only NGS data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we extract the frequency column from the allele frequency file and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
gunzip -c angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pairs of individuals. E.g. if we want to estimate relatedness between the first two individuals (0 and 1) we can do it using the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the zero-based indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified, NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals which you have NGS data from (in bam files) using genetic data from PLINK files for frequency estimation. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals we want to estimate relatedness from are from the population called LWK and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that there are genotype likelihoods for.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site and 3 columns for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore it needs to be in binary format as produced by ANGSD. The frequency file needs to contain one line for each site with a single number, namely the allele frequency of the site. &lt;br /&gt;
For examples see here: &lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output  with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can also be specified by the user).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood file is gz compressed binary: log likelihood ratios encoded as doubles, 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=494</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=494"/>
		<updated>2015-07-01T19:15:16Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Output */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. For example, to estimate relatedness between the first two individuals (0 and 1) we can use the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in bam files), using genetic data from PLINK files to estimate the population allele frequencies. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals whose relatedness we want to estimate are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Input=&lt;br /&gt;
NgsRelate takes two files as input: a file with genotype likelihoods and a file with allele frequencies for the sites that there are genotype likelihoods for.&lt;br /&gt;
The genotype likelihood file needs to contain a line for each site and 3 columns for each individual (one genotype likelihood for each of the 3 possible genotypes). Furthermore it needs to be in binary format as produced by ANGSD. The frequency file needs to contain one line for each site with a single number, namely the allele frequency of the site. &lt;br /&gt;
For examples see here: &lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output  with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can also be specified by the user).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood file is gz compressed binary: log likelihood ratios encoded as doubles, 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
	<entry>
		<id>https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=493</id>
		<title>NgsRelate</title>
		<link rel="alternate" type="text/html" href="https://www.popgen.dk/software/index.php?title=NgsRelate&amp;diff=493"/>
		<updated>2015-07-01T19:04:05Z</updated>

		<summary type="html">&lt;p&gt;Ida: /* Run examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Brief description=&lt;br /&gt;
This page contains information about the program called NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low coverage Next Generation Sequencing (NGS) data by using genotype likelihoods instead of called genotypes. To be able to infer the relatedness you will need to know the population frequencies and have genotype likelihoods. This can be obtained e.g. using the program ANGSD as shown in the examples below. For more information about ANGSD see here: http://popgen.dk/angsd/index.php/Quick_Start.&lt;br /&gt;
&lt;br /&gt;
=How to download and install=&lt;br /&gt;
The source code for NgsRelate is deposited on github: https://github.com/ANGSD/NgsRelate. On a linux system with curl and g++ installed NgsRelate can be downloaded and installed as follows:  &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp &amp;gt;NgsRelate.cpp&lt;br /&gt;
g++ NgsRelate.cpp -O3 -lz -o ngsrelate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Run examples=&lt;br /&gt;
Below are two examples of how NgsRelate can be used to estimate relatedness from NGS data. To be able to run all steps of the examples you need to have the programs ANGSD and PLINK installed.&lt;br /&gt;
Note that you also need to download large data files from both the HapMap3 and 1000 Genomes webpages. If you do not want to do that, the examples still show you how NgsRelate can be run. Note also that if you just want to try out NgsRelate without going through all the intermediate steps of generating input data (which takes some time!) you can download the final input data for NgsRelate used in the last command line in example 2 here: . &lt;br /&gt;
&lt;br /&gt;
== Run example 1: using only NGS data==&lt;br /&gt;
Assume we have a file containing paths to 100 BAM/CRAM files. Then we can use ANGSD to estimate frequencies and calculate genotype likelihoods while doing SNP calling, and in the end produce the input files needed for the NgsRelate program as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### First we generate a file with allele frequencies (angsdput.mafs.gz) and a file with genotype likelihoods (angsdput.glf.gz).&lt;br /&gt;
angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3&lt;br /&gt;
&lt;br /&gt;
### Then we decompress the allele frequency file, extract the frequency column and remove the header (to make it in the format NgsRelate needs)&lt;br /&gt;
zcat angsdput.mafs.gz | cut -f5 | sed 1d &amp;gt;freq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Once we have these files we can use NgsRelate to estimate relatedness between any pair of individuals. For example, to estimate relatedness between the first two individuals (0 and 1) we can use the following command: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 &amp;gt;gl.res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here we specify the name of our file with genotype likelihoods after the option &amp;quot;-g&amp;quot;, the number of individuals in the file after the option &amp;quot;-n&amp;quot;, the name of the file with allele frequencies after the option &amp;quot;-f&amp;quot; and the (zero-based) indices of the two individuals after the options &amp;quot;-a&amp;quot; and &amp;quot;-b&amp;quot;. If -a and -b are not specified NgsRelate will loop through all pairs of individuals in the input file.&lt;br /&gt;
&lt;br /&gt;
== Run example 2: using NGS data with population frequencies estimated from genetic data from PLINK files ==&lt;br /&gt;
In this example we show how you can estimate relatedness between a number of individuals for which you have NGS data (in bam files), using genetic data from PLINK files to estimate the population allele frequencies. &lt;br /&gt;
&lt;br /&gt;
Assume the individuals whose relatedness we want to estimate are from the population called LWK, and assume we have files with genetic data from individuals from LWK as well as other populations in binary PLINK format (e.g. hapmap3_r2_b36_fwd.consensus.qc.polyHg19.*) and a file, LWK.fam, with the IDs of the LWK individuals in this dataset. Then using PLINK we can produce allele frequency information in a format that NgsRelate can use as follows: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract individuals from LWK from huge binary plink file&lt;br /&gt;
plink --bfile hapmap3_r2_b36_fwd.consensus.qc.polyHg19 --keep LWK.fam  --make-bed --out hapmap3Hg19LWK --noweb&lt;br /&gt;
&lt;br /&gt;
### calculate frequencies for this population&lt;br /&gt;
plink --bfile  hapmap3Hg19LWK --freq --noweb --out LWKsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Afterwards we can use ANGSD to calculate genotype likelihoods for the sites for which we have frequency information:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the chr,pos,major,minor information about the sites we have frequency info from into a file &lt;br /&gt;
### (so we can extract data from these sites from the NGS data files) &lt;br /&gt;
cut -f1,4-6  hapmap3Hg19LWK.bim &amp;gt;forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### index this file for angsd&lt;br /&gt;
angsd sites index forAngsd.txt&lt;br /&gt;
&lt;br /&gt;
### calculate genotype likelihoods for the six individuals for the sites we have frequency info on based on the bam files &lt;br /&gt;
### (assuming the paths to the bam files are listed in the file 'list'):&lt;br /&gt;
angsd -gl 1 -doglf 3 -sites forAngsd.txt -b list -domajorminor 3 -P 2 -minMapQ 30 -minQ 20&lt;br /&gt;
### this generates the output files angsdput.glf.gz and angsdput.glf.pos.gz.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Finally we can use NgsRelate to estimate relatedness for the six individuals from which we have NGS data in bam files:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
### extract the frequencies and sync them to the angsd output&lt;br /&gt;
ngsrelate extract_freq angsdput.glf.pos.gz files/hapmap3Hg19LWK.bim files/LWKsub.frq &amp;gt;freq&lt;br /&gt;
&lt;br /&gt;
### run ngsrelate &lt;br /&gt;
ngsrelate  -g angsdput.glf.gz -n 6 -f freq &amp;gt;res&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The final relatedness estimates will then be available in the file called &amp;quot;res&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=Output=&lt;br /&gt;
Example of output with two samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example of output  with 6 samples:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat res&lt;br /&gt;
Pair	k0	k1	k2	loglh	nIter	coverage&lt;br /&gt;
(0,1)	0.675337	0.322079	0.002584	-1710946.832375	10	0.813930&lt;br /&gt;
(0,2)	0.458841	0.526377	0.014782	-1666215.528333	10	0.808822&lt;br /&gt;
(0,3)	1.000000	0.000000	0.000000	-1743992.363193	-1	0.816266&lt;br /&gt;
(0,4)	1.000000	0.000000	0.000000	-1759202.971213	-1	0.818856&lt;br /&gt;
(0,5)	1.000000	0.000000	0.000000	-1550475.615322	-1	0.752663&lt;br /&gt;
(1,2)	0.007111	0.991020	0.001868	-1580995.130867	10	0.806912&lt;br /&gt;
(1,3)	1.000000	0.000000	0.000000	-1728859.988212	-1	0.814272&lt;br /&gt;
(1,4)	1.000001	-0.000001	0.000000	-1744055.203870	9	0.816887&lt;br /&gt;
(1,5)	1.000000	0.000000	0.000000	-1536858.187440	-1	0.750917&lt;br /&gt;
(2,3)	1.000000	0.000000	0.000000	-1705157.832621	-1	0.809297&lt;br /&gt;
(2,4)	1.000000	0.000000	0.000000	-1719681.338365	-1	0.811804&lt;br /&gt;
(2,5)	1.000000	0.000000	0.000000	-1517388.260612	-1	0.746903&lt;br /&gt;
(3,4)	0.547602	0.439423	0.012975	-1743899.789842	10	0.819276&lt;br /&gt;
(3,5)	0.265819	0.482953	0.251228	-1467343.087647	10	0.754637&lt;br /&gt;
(4,5)	0.004655	0.995345	-0.000000	-1473415.049411	8	0.755734&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first column contains the indices of the two individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1 and k2). The fifth column is the log-likelihood at the maximum likelihood estimate (MLE). The sixth column is the number of iterations required to find the MLE, and the seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can also be specified by the user).&lt;br /&gt;
&lt;br /&gt;
= Input file format =&lt;br /&gt;
The genotype likelihood file is gz compressed binary: log likelihood ratios encoded as doubles, 3 values per sample.&lt;br /&gt;
The freq file is allowed to be gz compressed.&lt;br /&gt;
&lt;br /&gt;
= Citing and references =&lt;br /&gt;
&lt;br /&gt;
= Changelog =&lt;br /&gt;
See github for log&lt;/div&gt;</summary>
		<author><name>Ida</name></author>
	</entry>
</feed>