NgsRelate: Difference between revisions

 
(118 intermediate revisions by 2 users not shown)
=Brief description=
This page contains information about the program NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low-coverage next-generation sequencing (NGS) data by using genotype likelihoods. To infer relatedness you need population allele frequencies and genotype likelihoods. Both can be obtained, for example, with the program ANGSD, as shown in the example below.


=Download and Installation=
The primary repository is on GitHub: https://github.com/ANGSD/NgsRelate
<pre>
curl https://raw.githubusercontent.com/ANGSD/NgsRelate/master/NgsRelate.cpp >NgsRelate.cpp
g++ NgsRelate.cpp -O3 -lz -o ngsrelate
</pre>


= Run example using only NGS data=
Assume we have a file containing paths to 100 BAM/CRAM files. We can then use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and to write the input files needed by the NgsRelate program:
<pre>
# Estimate allele frequencies and genotype likelihoods, and call SNPs
./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3
# This generates angsdput.mafs.gz and angsdput.glf.gz.
# Extract the frequency column from the gzipped mafs file and remove the header
zcat angsdput.mafs.gz | cut -f5 | sed 1d >freq
./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 >gl.res
</pre>
Here we specify that our binary genotype likelihood file contains 100 samples (-n 100) and that we want to run the analysis for the first two samples (-a 0 -b 1; sample indices start at 0).
If -a and -b are not specified, the program loops through all pairs.
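
To compare one sample against every other sample rather than all pairs, the -a and -b options can be set in a loop. The following is a minimal dry-run sketch: it only prints the commands, the function name `pair_commands` and the output file names are hypothetical, and it uses only the options shown in the example above.

```shell
# Hypothetical dry-run helper: print one ngsrelate command per pair (ref, b).
# Only the -g, -n, -f, -a and -b options from the example above are used;
# the per-pair output file names are an assumption made for illustration.
pair_commands() {
  ref=$1        # 0-based index of the reference sample
  nsamples=$2   # total number of samples in the glf file
  b=0
  while [ "$b" -lt "$nsamples" ]; do
    if [ "$b" -ne "$ref" ]; then
      echo "./ngsrelate -g angsdput.glf.gz -n $nsamples -f freq -a $ref -b $b >gl.${ref}_${b}.res"
    fi
    b=$((b + 1))
  done
}
```

Calling `pair_commands 0 100` prints 99 command lines, which can be inspected before they are run.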
 
=Output=
Example output:
<pre>
Pair k0 k1 k2 loglh nIter coverage
(0,1) 0.673213 0.326774 0.000013 -1710940.769941 19 0.814658
</pre>
 
 
The first column contains the pair of individuals used for the analysis. The next three columns are the estimated relatedness coefficients (k0, k1, k2). The fifth column is the log-likelihood at the MLE. The sixth column is the number of iterations required to find the MLE. The seventh column is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and the minor allele frequency (MAF) was above the threshold (the default is 0.05, but it can be set by the user).
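
The three coefficients are often summarised as a single coefficient of relatedness, r = k1/2 + k2 (the standard relation for non-inbred individuals). The sketch below appends r to each row of a result file; it assumes the whitespace-separated column layout shown above, and the function name `add_relatedness` is hypothetical rather than part of NgsRelate.

```shell
# Hypothetical helper: append r = k1/2 + k2 as an extra column.
# Assumes the layout shown above: Pair k0 k1 k2 loglh nIter coverage,
# so k1 is field 3 and k2 is field 4.
add_relatedness() {
  awk 'NR == 1 { print $0, "r"; next }               # header row: add a column name
       { printf "%s %.6f\n", $0, $3 / 2 + $4 }' "$1"  # data rows: add r
}
```

For the example row above, `add_relatedness gl.res` gives r of about 0.1634.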
 
= Input file format =
The genotype likelihood input file is binary and gzip-compressed. It contains log-likelihood ratios encoded as doubles, with three values per sample at each site.
The frequency file may also be gzip-compressed.
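
Because the layout is fixed, the number of sites in a genotype likelihood file can be checked from its uncompressed size. The sketch below assumes 8-byte doubles and three values per sample per site, as described above; the function name `glf_sites` is hypothetical.

```shell
# Hypothetical helper: number of sites implied by a gzip-compressed glf file.
# Assumes 3 doubles of 8 bytes each per sample per site.
glf_sites() {
  glf=$1        # path to the gzip-compressed glf file
  nsamples=$2   # number of samples stored in the file
  bytes=$(gzip -dc "$glf" | wc -c)
  echo $(( bytes / (8 * 3 * nsamples) ))
}
```

The result of `glf_sites angsdput.glf.gz 100` should match the number of lines in the frequency file (`wc -l < freq`); a mismatch suggests the two files do not describe the same set of sites.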
 
= Citing and references =
 
= Changelog =
See the GitHub repository for the changelog.

Latest revision as of 15:59, 5 October 2018

= NEW VERSION =

For the new version of NgsRelate, which co-estimates relatedness and inbreeding, go to this link: https://github.com/ANGSD/NgsRelate


= OLD VERSION =

For the old version, please use this link: http://www.popgen.dk/software/index.php?title=NgsRelate&oldid=694