IBSrelate


This page describes IBSrelate, a method for identifying relatives that does not require population allele frequencies. It shows how to estimate the R0, R1 and KING-robust kinship statistics for a pair (or more) of individuals from aligned sequencing data. These statistics are informative about relatedness and can also be useful for quality control (QC). For details, please see our paper in Molecular Ecology: https://doi.org/10.1111/mec.14954


Calculating statistics from the output of IBS and realSFS

IBS and realSFS are two methods implemented in ANGSD [1] that can be used to estimate the joint genotype distribution (the pattern of allele sharing) for a pair of individuals. The paper describes and compares the two methods; we expect both to perform comparably well in most applications. Below are links to two R scripts that load the output of IBS and realSFS and produce estimates of R0, R1 and KING-robust kinship.

https://github.com/rwaples/freqfree_suppl/blob/master/read_IBS.R

https://github.com/rwaples/freqfree_suppl/blob/master/read_realSFS.R
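All three statistics are ratios of counts taken from the 3x3 table of joint genotypes for a pair. The sketch below assumes the table is indexed by how many copies of one allele each individual carries, and uses the definitions as we read them from the paper: R0 = opposite-homozygote sites / double-heterozygote sites; R1 = double-heterozygote sites / sites where the two genotypes differ; KING-robust = (double heterozygotes - 2 x opposite homozygotes) / total heterozygous genotypes across both individuals. The function name relatedness_stats is hypothetical; check its results against the linked R scripts before relying on it.

```python
# Minimal sketch (not the linked R scripts): R0, R1 and KING-robust kinship
# from a 3x3 joint genotype table for one pair of individuals.
# Assumption: counts[i][j] = number (or expected number) of sites where
# individual 1 carries i copies and individual 2 carries j copies of the
# same allele, with i, j in {0, 1, 2}.

def relatedness_stats(counts):
    """Return (R0, R1, KING-robust kinship) from a 3x3 table of site counts."""
    if len(counts) != 3 or any(len(row) != 3 for row in counts):
        raise ValueError("expected a 3x3 table")
    # Opposite homozygotes (AA/aa and aa/AA): no allele shared (IBS0).
    opp_hom = counts[0][2] + counts[2][0]
    # Both individuals heterozygous (Aa/Aa).
    het_het = counts[1][1]
    # Exactly one of the two individuals heterozygous.
    het_hom = counts[0][1] + counts[1][0] + counts[1][2] + counts[2][1]

    # Raises ZeroDivisionError if there are no double-heterozygote sites.
    r0 = opp_hom / het_het
    # Denominator: every site where the two genotypes differ.
    r1 = het_het / (opp_hom + het_hom)
    # Denominator: heterozygous sites in individual 1 plus those in individual 2.
    king = (het_het - 2 * opp_hom) / (het_hom + 2 * het_het)
    return r0, r1, king
```

For example, the toy table [[100, 20, 0], [20, 40, 20], [0, 20, 100]] gives (0.0, 0.5, 0.25). Swapping the allele labels maps cell (i, j) to (2 - i, 2 - j), which leaves each of the three groups of cells unchanged, so the result does not depend on which allele is counted; no allele frequencies or ancestral states are needed.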

Demonstration of the IBS and realSFS methods on the angsd example data

Available as a Jupyter notebook: https://nbviewer.jupyter.org/github/rwaples/freqfree_suppl/blob/master/example_data.ipynb


Example based on the 1000 Genomes data used in the paper

{ANGSD} = path to ANGSD executable
{IBS} = path to IBS executable (found at misc/ibs relative to ANGSD installation)
{realSFS} = path to realSFS executable (found at misc/realSFS relative to ANGSD installation)
{CHR} = name of chromosome (for the realSFS analysis, make sure it matches the name in the consensus fasta)

realSFS method

make a consensus sequence (fasta) from one of the individuals

Here the *.list file contains paths to the bam files for NA19042. A separate consensus should be created for each chromosome. This step is optional; the reference sequence used for alignment can be used instead. The consensus is passed to ANGSD with -anc when making the .saf files.

{ANGSD} -b ./data/1000G_aln/NA19042.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
-r {CHR} -minMapQ 30 -minQ 20 -setMinDepth 3 -doFasta 2 -doCounts 1 -out ./data/consensus.NA19042.chr{CHR}

make *.saf files

  • A .saf file is needed for each chromosome of each individual.

The *.list file contains paths to the bam files for NA19027; repeat this step for NA19042 and any other individuals. The file GEM_mappability1_75mer.angsd gives the sites passing the GEM mappability filter in a bed-like format, as required by ANGSD (see here: [2]).

{ANGSD} -b ./data/1000G_aln/NA19027.mapped.ILLUMINA.bwa.LWK.low_coverage.20130415.list \
-r {CHR} \
-ref ./data/1000G_aln/hs37d5.fa \
-anc ./data/consensus.NA19042.chr{CHR}.fa.gz  \
-sites ./data/1000G_aln/GEM_mappability1_75mer.angsd \
-minMapQ 30 -minQ 20 -GL 2 \
-doSaf 1 -doDepth 1 -doCounts 1 \
-out ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}

run realSFS for each pair of individuals

This generates a two-dimensional site frequency spectrum (2D SFS) for the pair. The command below runs realSFS for NA19042 and NA19027; repeat it for each chromosome and each pair of individuals.

{realSFS} ./data/1000G_aln/saf/chromosomes/NA19042_chr{CHR}.saf.idx ./data/1000G_aln/saf/chromosomes/NA19027_chr{CHR}.saf.idx -r {CHR} -P 2 -tole 1e-10 > ./data/1000G_aln/saf/chromosomes/NA19042_NA19027_chr{CHR}.2dsfs
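Because realSFS must be run for every chromosome and every pair (n individuals give n(n - 1)/2 pairs, e.g. 10 pairs for 5 individuals), it can help to generate the command lines programmatically. This is a minimal sketch that prints commands in the same form as the realSFS command above; the REALSFS path, the sample IDs and the chromosome name in the example call are hypothetical placeholders.

```python
# Minimal sketch: print one realSFS command per unordered pair of individuals
# and per chromosome, mirroring the realSFS command on this page.
# REALSFS and the example samples/chromosomes are hypothetical placeholders.
from itertools import combinations

REALSFS = "misc/realSFS"  # hypothetical path to the realSFS executable
SAF_DIR = "./data/1000G_aln/saf/chromosomes"

def realsfs_commands(samples, chromosomes, threads=2):
    """Yield realSFS 2D-SFS commands for every pair x chromosome."""
    for chrom in chromosomes:
        for a, b in combinations(samples, 2):
            yield (
                f"{REALSFS} {SAF_DIR}/{a}_chr{chrom}.saf.idx "
                f"{SAF_DIR}/{b}_chr{chrom}.saf.idx "
                f"-r {chrom} -P {threads} -tole 1e-10 "
                f"> {SAF_DIR}/{a}_{b}_chr{chrom}.2dsfs"
            )

if __name__ == "__main__":
    for cmd in realsfs_commands(["NA19042", "NA19027"], ["20"]):
        print(cmd)
```

Redirecting the printed lines to a file gives a shell script with one command per line, which can be run sequentially or submitted to a job scheduler.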

Use the R script read_realSFS.R (linked above) to interpret the output for each pair of individuals.
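Since realSFS is run per chromosome, the per-chromosome spectra for a pair must be combined before computing genome-wide ratios. A minimal sketch, assuming each .2dsfs file holds a single line of nine whitespace-separated values (expected site counts for the 3x3 joint genotype categories, stored row by row) that can be summed directly; load_2dsfs is a hypothetical helper, and read_realSFS.R should be consulted for the exact format. If the values were stored column by column instead, the table would simply be transposed, which leaves R0, R1 and KING-robust kinship unchanged.

```python
# Minimal sketch: sum per-chromosome .2dsfs files for one pair into a single
# genome-wide 3x3 table. Assumption: each file contains one line of 9
# whitespace-separated site counts, stored row by row.
from pathlib import Path

def load_2dsfs(paths):
    """Sum 9-value realSFS outputs across files and return a 3x3 nested list."""
    total = [0.0] * 9
    for path in paths:
        values = [float(x) for x in Path(path).read_text().split()]
        if len(values) != 9:
            raise ValueError(f"{path}: expected 9 values, found {len(values)}")
        total = [t + v for t, v in zip(total, values)]
    # Reshape the flat list into rows for individual 1's three genotypes.
    return [total[0:3], total[3:6], total[6:9]]
```

For example, passing sorted(glob.glob("./data/1000G_aln/saf/chromosomes/NA19042_NA19027_chr*.2dsfs")) returns the genome-wide table for that pair, from which the three ratio statistics are computed.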

IBS method

make a genotype likelihood file

The file bamlist.all.txt contains the paths to the bam files, one per line, with one file per individual. The file GEM_mappability1_75mer.angsd gives the sites passing the GEM mappability filter in a bed-like format, as required by ANGSD (see here: [3]). The output contains genotype likelihoods for each individual at each site (*.glf.gz). Run this for each chromosome.

{ANGSD} -b ./data/1000G_aln/bamlist.all.txt \
-r {CHR} \
-sites ./data/1000G_aln/GEM_mappability1_75mer.angsd \
-minMapQ 30 -minQ 20 -GL 2 \
-doGlf 1 \
-out ./data/1000G_aln/GLF/chromosomes/chr{CHR}
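IBS needs -nInd to match the number of individuals in the .glf.gz file. A minimal sketch for sanity-checking the file, assuming (our understanding of -doGlf 1, to be verified against the ANGSD documentation) that it is gzip-compressed raw 8-byte little-endian doubles, stored site by site with 10 log-likelihoods per individual, one for each unordered genotype from A, C, G and T. The helper names are hypothetical. Note that a wrong -nInd is not always detected: a file for 10 individuals also divides evenly by 5.

```python
# Minimal sketch: inspect a binary genotype-likelihood file from -doGlf 1.
# Assumed layout: gzip-compressed little-endian doubles, site by site,
# 10 log-likelihoods per individual per site.
import gzip
import struct

N_GENOTYPES = 10      # ten unordered genotypes: AA, AC, ..., TT
BYTES_PER_DOUBLE = 8

def glf_site_count(path, n_ind, chunk_bytes=1 << 20):
    """Count sites, checking the file size fits n_ind individuals."""
    bytes_per_site = n_ind * N_GENOTYPES * BYTES_PER_DOUBLE
    total = 0
    with gzip.open(path, "rb") as handle:
        # Stream in chunks so whole-chromosome files are never fully loaded.
        while chunk := handle.read(chunk_bytes):
            total += len(chunk)
    if total % bytes_per_site:
        raise ValueError(f"{total} bytes is not a whole number of sites "
                         f"for {n_ind} individuals")
    return total // bytes_per_site

def first_site(path, n_ind):
    """Return the 10 log-likelihoods of each individual at the first site."""
    n_values = n_ind * N_GENOTYPES
    with gzip.open(path, "rb") as handle:
        data = handle.read(n_values * BYTES_PER_DOUBLE)
    values = struct.unpack(f"<{n_values}d", data)  # '<' = little-endian
    return [list(values[i * N_GENOTYPES:(i + 1) * N_GENOTYPES])
            for i in range(n_ind)]
```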

run IBS

Here there are 5 individuals in the glf file (-nInd 5), and we want to evaluate every pair (-allpairs 1) using IBS model 0 (-model 0).

{IBS} -glf ./data/1000G_aln/GLF/chromosomes/chr{CHR}.glf.gz \
-seed {CHR} -maxSites 300000000 -model 0 \
-nInd 5 -allpairs 1 \
-outFileName ./data/1000G_aln/GLF/chromosomes/chr{CHR}.model0
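Estimating a pair's joint genotype distribution from genotype likelihoods, rather than from called genotypes, can be done with expectation-maximisation (EM). The sketch below is a generic illustration of that idea for one pair at biallelic sites, treating sites as independent; it was written for this page and is not the code or the model used by IBS -model 0. The function name em_joint_genotypes is hypothetical.

```python
# Minimal sketch: EM estimate of the 3x3 joint genotype distribution for one
# pair from per-site genotype likelihoods (biallelic sites, independent sites).
# Generic illustration only; not the IBS implementation.

def em_joint_genotypes(gl1, gl2, n_iter=100, tol=1e-10):
    """gl1[s][g], gl2[s][g]: likelihood of genotype g (0, 1, 2) at site s.

    Returns p[j][k], the estimated fraction of sites where individual 1 has
    genotype j and individual 2 has genotype k.
    """
    if not gl1 or len(gl1) != len(gl2):
        raise ValueError("need equal, non-zero numbers of sites")
    n_sites = len(gl1)
    p = [[1.0 / 9] * 3 for _ in range(3)]  # start from a uniform distribution
    for _ in range(n_iter):
        new = [[0.0] * 3 for _ in range(3)]
        for l1, l2 in zip(gl1, gl2):
            # E-step: posterior over the 9 joint genotypes at this site.
            w = [[p[j][k] * l1[j] * l2[k] for k in range(3)] for j in range(3)]
            total = sum(map(sum, w))
            for j in range(3):
                for k in range(3):
                    new[j][k] += w[j][k] / total
        # M-step: the new estimate is the average posterior over sites.
        new = [[x / n_sites for x in row] for row in new]
        delta = max(abs(new[j][k] - p[j][k]) for j in range(3) for k in range(3))
        p = new
        if delta < tol:
            break
    return p
```

Multiplying the returned fractions by the number of sites gives expected counts, from which R0, R1 and KING-robust kinship can be formed as ratios.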

Use the R script read_IBS.R (linked above) to interpret the output of IBS for each pair of individuals.

Citation

Waples, R. K., Albrechtsen, A. and Moltke, I. (2018), Allele frequency‐free inference of close familial relationships from genotypes or low depth sequencing data. Mol Ecol. doi:10.1111/mec.14954


Bibtex

@article{doi:10.1111/mec.14954,
author = {Waples, Ryan K and Albrechtsen, Anders and Moltke, Ida},
title = {Allele frequency-free inference of close familial relationships from genotypes or low depth sequencing data},
journal = {Molecular Ecology},
volume = {0},
number = {ja},
pages = {},
doi = {10.1111/mec.14954},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.14954},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.14954},
}