IBSrelate: Difference between revisions

Revision as of 14:33, 20 May 2019

Overview

This page describes IBSrelate, a method to identify relatives without requiring population allele frequencies. Here we show how to estimate the R0, R1 and KING-robust kinship statistics for a pair (or more!) of individuals from aligned sequencing data. These statistics are informative about relatedness, but can also be useful for quality control (QC). For details, please see our paper in Molecular Ecology: https://doi.org/10.1111/mec.14954

Calculating statistics from the output of IBS and realSFS

IBS and realSFS are two methods implemented in ANGSD (http://www.popgen.dk/angsd/index.php/ANGSD) that can be used to estimate the allele-sharing "genotype distribution" for a pair of individuals. The paper describes and examines the differences between the two methods, but we expect both to perform comparably well in most applications. Below are links to two R scripts that load the output of IBS and realSFS and produce estimates of R0, R1 and KING-robust kinship.

https://github.com/rwaples/freqfree_suppl/blob/master/read_IBS.R

https://github.com/rwaples/freqfree_suppl/blob/master/read_realSFS.R
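
As a rough illustration of what these scripts compute, the sketch below derives the three statistics from the nine values of a two-individual 2D SFS. It assumes the values are ordered A to I, row-major over the genotypes (0, 1 or 2 copies of an allele) of the first and then the second individual, so that C and G count opposite homozygotes and E counts sites where both individuals are heterozygous. The formulas follow our reading of the paper, and the function name ibsrelate_stats is hypothetical. This is not a replacement for the linked scripts, and the assumed ordering should be checked against read_realSFS.R before relying on the numbers.

```shell
# Sketch: R0, R1 and KING-robust kinship from nine 2D-SFS values A..I.
# Assumption: input order is A B C D E F G H I (row-major 3x3 genotype table).
ibsrelate_stats() {
    awk '
    # Collect every whitespace-separated number on stdin.
    { for (i = 1; i <= NF; i++) v[++n] = $i }
    END {
        if (n != 9) { print "expected 9 values, got " n > "/dev/stderr"; exit 1 }
        ibs0 = v[3] + v[7]                  # C + G: opposite homozygotes
        het1 = v[2] + v[4] + v[6] + v[8]    # B + D + F + H: exactly one heterozygote
        E    = v[5]                         # both individuals heterozygous
        if (E == 0 || het1 + ibs0 == 0) {
            print "statistics undefined for these counts" > "/dev/stderr"; exit 1
        }
        printf "R0=%.4f R1=%.4f KING=%.4f\n", ibs0 / E, E / (het1 + ibs0), (E - 2 * ibs0) / (het1 + 2 * E)
    }'
}

# Made-up counts with no opposite homozygotes (C = G = 0).
echo "800 25 0 25 50 25 0 25 800" | ibsrelate_stats
```

With these made-up counts, R0 is 0 because C + G is 0, R1 is 50/100 and KING-robust kinship is 50/200.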

Example Usage

realSFS method

make a consensus sequence (fasta) from one of the individuals

Here the .list file contains a list of the bam files for the individual. Create a separate consensus for each chromosome. This step is optional; you could also use the reference sequence that the data are aligned to.

{ANGSD} -b ./data/1000G_aln/NA19042.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
-r {CHR} -minMapQ 30 -minQ 20 -setMinDepth 3 -doFasta 2 -doCounts 1 -out ./data/consensus.NA19042.chr{CHR}
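
The {ANGSD} and {CHR} placeholders in the command above appear to stand for the path to the ANGSD binary and a chromosome name. A minimal sketch of filling them in for every chromosome is given below. It only prints the commands (a dry run) so they can be inspected before being executed. The default for the ANGSD variable, the function name consensus_cmd, and the chromosome names 1 to 22 are assumptions, not part of the original example.

```shell
# Dry run: print the consensus command for each chromosome.
# Assumptions: angsd is on PATH unless ANGSD is set; chromosomes are named 1..22.
ANGSD=${ANGSD:-angsd}
BAMLIST=./data/1000G_aln/NA19042.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list

consensus_cmd() {
    # $1 is the chromosome name substituted for {CHR}.
    printf '%s -b %s -r %s -minMapQ 30 -minQ 20 -setMinDepth 3 -doFasta 2 -doCounts 1 -out %s\n' \
        "$ANGSD" "$BAMLIST" "$1" "./data/consensus.NA19042.chr$1"
}

for chr in $(seq 1 22); do
    consensus_cmd "$chr"
done
```

Piping the printed lines to sh would run them one after another; the same pattern could be adapted to the other per-chromosome commands on this page.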

make *.saf files

Run for each chromosome within each individual. Here the .list file contains a list of the bam files for an individual.

{ANGSD} -b ./data/1000G_aln/NA19027.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
-r {CHR} \
-ref ./data/1000G_aln/hs37d5.fa \
-anc ./data/consensus.NA19042.chr{CHR}.fa.gz  \
-sites ./data/1000G_aln/GEM_mappability1_75mer.angsd \
-minMapQ 30 -minQ 20 -GL 2 \
-doSaf 1 -doDepth 1 -doCounts 1 \
-out ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}

run realSFS for each pair of individuals

{realSFS} ./data/1000G_aln/saf/chromosomes/NA19042_chr{CHR}.saf.idx ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}.saf.idx -r {CHR} -P 2 -tole 1e-10 > ./data/1000G_aln/saf/chromosomes/NA19042_NA19027_chr{CHR}.2dsfs
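
Because realSFS is run once per chromosome, each pair ends up with one .2dsfs file per chromosome. One plausible way to get a genome-wide estimate is to sum the values element-wise across chromosomes before computing the statistics. This is an assumption made here for illustration and may differ from what read_realSFS.R does. The sketch below also assumes that each .2dsfs file holds a single line of whitespace-separated numbers; sum_2dsfs is a hypothetical helper name.

```shell
# Sketch: element-wise sum of the first line of several .2dsfs files.
# Assumption: every file has one line with the same number of values.
sum_2dsfs() {
    awk '
    FNR == 1 {
        for (i = 1; i <= NF; i++) total[i] += $i
        if (NF > n) n = NF
    }
    END {
        for (i = 1; i <= n; i++) printf "%.6f%s", total[i], (i < n ? " " : "\n")
    }' "$@"
}

# Self-contained demonstration with two made-up files.
tmpdir=$(mktemp -d)
echo "1 2 3 4 5 6 7 8 9" > "$tmpdir/pair_chr1.2dsfs"
echo "9 8 7 6 5 4 3 2 1" > "$tmpdir/pair_chr2.2dsfs"
sum_2dsfs "$tmpdir"/pair_chr*.2dsfs
rm -r "$tmpdir"
```

For the example pair above, the call would look something like: sum_2dsfs ./data/1000G_aln/saf/chromosomes/NA19042_NA19027_chr*.2dsfs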

IBS method

make genotype likelihood file

{ANGSD} -b ./data/1000G_aln/bamlist.all.txt \
-r {CHR} \
-sites ./data/1000G_aln/GEM_mappability1_75mer.angsd \
-minMapQ 30 -minQ 20 -GL 2 \
-doGlf 1 \
-out ./data/1000G_aln/GLF/chromosomes/chr{CHR}

IBS

{IBS} -glf ./data/1000G_aln/GLF/chromosomes/chr{CHR}.glf.gz \
-seed {CHR} -maxSites 300000000 -model 0 \
-nInd 5 -allpairs 1 \
-outFileName ./data/1000G_aln/GLF/chromosomes/chr{CHR}.model0

Use the R scripts linked above to interpret the output of IBS and realSFS for each pair of individuals.

Citation

Waples, R. K., Albrechtsen, A. and Moltke, I. (2018), Allele frequency‐free inference of close familial relationships from genotypes or low depth sequencing data. Mol Ecol. doi:10.1111/mec.14954

Bibtex

@article{doi:10.1111/mec.14954,
author = {Waples, Ryan K and Albrechtsen, Anders and Moltke, Ida},
title = {Allele frequency-free inference of close familial relationships from genotypes or low depth sequencing data},
journal = {Molecular Ecology},
volume = {0},
number = {ja},
pages = {},
doi = {10.1111/mec.14954},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.14954},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.14954},
}

@@ Line 1: / Line 1: @@
-==Overview==
+== Overview ==
 This page contains information about the method '''IBSrelate''', a method to identify relatives without requiring population allele frequencies.
 Here we show you how to estimate the R0, R1 and KING-robust kinship statistics for a pair (or more!) of individuals from aligned sequencing data.  These statistics are informative about relatedness, but can also be useful for quality-control (QC). For details please see our paper in Molecular Ecology at: https://doi.org/10.1111/mec.14954
-==Calculating statistics from the output of IBS and realSFS ==
+== Calculating statistics from the output of IBS and realSFS ==
 '''IBS''' and '''realSFS''' are two methods implemented in ANGSD [http://www.popgen.dk/angsd/index.php/ANGSD] that can be used to estimate the allele sharing "genotype distribution" for a pair of individuals.  The paper describes and examines the differences between the two methods, but we expect they both will perform comparably well in most applications.  Below are links to two R scripts that can be used to load the output of '''IBS''' and '''realSFS''' and produce estimates of '''R0''', '''R1''' and '''KING-robust kinship'''.
@@ Line 11: / Line 11: @@
 https://github.com/rwaples/freqfree_suppl/blob/master/read_realSFS.R
-==Example Usage==
+== Example Usage ==
-<pre># make consensus - needed to make saf files
-{ANGSD} -b ./data/1000G_aln/NA19042.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
--r {CHR} -minMapQ 30 -minQ 20 -setMinDepth 3 -doFasta 2 -doCounts 1 -out ./data/consensus.NA19042.chr{CHR}
-# make *.saf files (per individual)
+=== realSFS method ===
-{ANGSD} -b ./data/1000G_aln/NA19027.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
+==== make a consensus sequence (fasta) from one of the individuals ====
+Here the .list file contains a list of the bam files for the individual. Create a separate consensus for each chromosome. This step is optional, you could also use reference sequence the data is aligned to.
+<pre>{ANGSD} -b ./data/1000G_aln/NA19042.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
+-r {CHR} -minMapQ 30 -minQ 20 -setMinDepth 3 -doFasta 2 -doCounts 1 -out ./data/consensus.NA19042.chr{CHR}</pre>
+==== make *.saf files ====
+Run for each chromosome within each individual.
+Here the .list file contains a list of the bam files for an individual.
+<pre>{ANGSD} -b ./data/1000G_aln/NA19027.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
 -r {CHR} \
 -ref ./data/1000G_aln/hs37d5.fa \
@@ Line 24: / Line 29: @@
 -minMapQ 30 -minQ 20 -GL 2 \
 -doSaf 1 -doDepth 1 -doCounts 1 \
--out ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}
+-out ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}</pre>
-# realSFS for each pair of individuals
+==== run realSFS for each pair of individuals====
-{realSFS} ./data/1000G_aln/saf/chromosomes/NA19042_chr{CHR}.saf.idx ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}.saf.idx -r {CHR} -P 2 -tole 1e-10 > ./data/1000G_aln/saf/chromosomes/NA19042_NA19027_chr{CHR}.2dsfs
+<pre>{realSFS} ./data/1000G_aln/saf/chromosomes/NA19042_chr{CHR}.saf.idx ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}.saf.idx -r {CHR} -P 2 -tole 1e-10 > ./data/1000G_aln/saf/chromosomes/NA19042_NA19027_chr{CHR}.2dsfs</pre>
+=== IBS method===
 # make genotype likelihood file
 {ANGSD} -b ./data/1000G_aln/bamlist.all.txt \