ANGSD: Analysis of next generation Sequencing Data

The latest tar.gz version is 0.938/0.939 (on GitHub); see Change_log for changes, and download it here.

Major Minor

Latest revision as of 14:56, 4 December 2015

Many methods assume that polymorphic sites are diallelic. For these methods one needs to define which allele is the major and which is the minor. We allow the major and minor alleles to be determined from the counts of nucleotides or from genotype likelihoods, to be set from the ancestral or reference state, or even to be forced to specific bases, which can be useful when comparing with HapMap data, for example.

Brief Overview

	-> angsd version: 0.910-19-g8b9b43a-dirty (htslib: 1.2.1-251-g2072527) build(Dec  4 2015 11:37:02)
	-> Analysis helpbox/synopsis information:
	-> Command: 
./angsd -domajorminor 	-> Fri Dec  4 13:56:10 2015
-------------------
abcMajorMinor.cpp:
	-doMajorMinor	0
	1: Infer major and minor from GL
	2: Infer major and minor from allele counts
	3: use major and minor from a file (requires -sites file.txt)
	4: Use reference allele as major (requires -ref)
	5: Use ancestral allele as major (requires -anc)
	-rmTrans: remove transitions 0
	-skipTriallelic	0
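The option dependencies listed in the helpbox can be summarised as a small lookup. The Python sketch below is illustrative only (ANGSD performs its own checks); the names `REQUIRED_INPUT`, `missing_input` and `is_transition` are hypothetical. `is_transition` shows what kind of site -rmTrans targets: transitions are the purine (A/G) and pyrimidine (C/T) substitutions.

```python
# Hypothetical helpers mirroring the -doMajorMinor helpbox; not ANGSD code.

# Extra input each -doMajorMinor mode needs, as listed in the helpbox.
REQUIRED_INPUT = {
    0: None,       # default value shown in the helpbox
    1: None,       # infer major and minor from genotype likelihoods
    2: None,       # infer major and minor from allele counts
    3: "-sites",   # major and minor read from a sites file
    4: "-ref",     # reference allele used as major
    5: "-anc",     # ancestral allele used as major
}


def missing_input(mode, supplied):
    """Return the option `mode` requires but `supplied` lacks, else None."""
    if mode not in REQUIRED_INPUT:
        raise ValueError(f"unknown -doMajorMinor mode: {mode}")
    needed = REQUIRED_INPUT[mode]
    if needed is not None and needed not in supplied:
        return needed
    return None


# Transitions: A<->G (purines) and C<->T (pyrimidines).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(major, minor):
    """True if the major/minor pair is a transition (the sites -rmTrans removes)."""
    return frozenset((major.upper(), minor.upper())) in TRANSITIONS
```

For example, `missing_input(4, {"-bam"})` returns `"-ref"`, and `is_transition("A", "G")` is `True`.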

Details

From genotype likelihood data

-doMajorMinor 1

For input consisting of either sequencing data (e.g. BAM files) or genotype likelihood data (e.g. glfv3), the major and minor alleles can be inferred directly from the genotype likelihoods. We use a maximum likelihood approach to choose the major and minor alleles. The method is briefly described here; for citation, use Skotte2012.
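To make the idea of choosing an allele pair by maximum likelihood concrete, here is a minimal Python sketch. It is a simplification and not the Skotte2012 method or ANGSD's code: each candidate pair {a, b} is scored by summing, over individuals, the log of the mean likelihood of the genotypes aa, ab and bb (equal weight on the three genotypes, an assumption), and the winning pair is then oriented so that the allele with more expected copies is the major. All function names are hypothetical.

```python
import math
from itertools import combinations, combinations_with_replacement

BASES = "ACGT"


def gt(a, b):
    """Key for an unordered genotype, e.g. gt("G", "A") -> "AG"."""
    return "".join(sorted(a + b))


def infer_major_minor_gl(gls):
    """Pick (major, minor) from per-individual genotype likelihoods.

    gls: one dict per individual mapping the 10 unordered genotypes
    ("AA", "AC", ..., "TT") to P(data | genotype) on the natural scale.
    """
    best_pair, best_ll = None, -math.inf
    for a, b in combinations(BASES, 2):
        ll = 0.0
        for gl in gls:
            s = gl[gt(a, a)] + gl[gt(a, b)] + gl[gt(b, b)]
            ll += math.log(max(s / 3.0, 1e-300))  # floor avoids log(0)
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    a, b = best_pair
    # Orient the pair: the allele with more expected copies becomes the major.
    exp_a = exp_b = 0.0
    for gl in gls:
        aa, ab, bb = gl[gt(a, a)], gl[gt(a, b)], gl[gt(b, b)]
        s = aa + ab + bb
        if s > 0:
            exp_a += (2 * aa + ab) / s
            exp_b += (2 * bb + ab) / s
    return (a, b) if exp_a >= exp_b else (b, a)


def flat_gls(**peaks):
    """Toy likelihoods: 1e-6 for every genotype except the given peaks."""
    gl = {"".join(p): 1e-6 for p in combinations_with_replacement(BASES, 2)}
    gl.update(peaks)
    return gl


individuals = [flat_gls(AA=0.9), flat_gls(AG=0.8), flat_gls(AA=0.7)]
print(infer_major_minor_gl(individuals))  # ('A', 'G')
```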

From allele counts

-doMajorMinor 2

If you input sequencing data such as BAM files, you can instead infer the major and minor alleles by picking the two most frequently observed bases across individuals. This approach follows Li2010.
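The count-based rule can be sketched in a few lines of Python, assuming per-individual base counts at one site are already available as dictionaries. The function name and the tie-breaking order (ACGT) are hypothetical choices, not documented ANGSD behaviour.

```python
from collections import Counter

BASES = "ACGT"


def infer_major_minor_counts(counts_per_individual):
    """counts_per_individual: list of dicts such as {"A": 3, "G": 1}.

    Returns (major, minor): the two bases with the highest total count
    summed across individuals. Ties keep ACGT order (hypothetical choice).
    """
    total = Counter()
    for counts in counts_per_individual:
        total.update({b: counts.get(b, 0) for b in BASES})
    # sorted() is stable, so bases with equal totals keep ACGT order
    ranked = sorted(BASES, key=lambda b: -total[b])
    return ranked[0], ranked[1]


reads = [{"A": 5, "G": 1}, {"A": 2, "G": 3}, {"C": 1, "A": 4}]
print(infer_major_minor_counts(reads))  # ('A', 'G')
```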


Pre-specified Major and Minor

Using the -sites option, the major and minor alleles can be predefined for the desired sites. This is very useful when comparing with other data sources, e.g. SNP chips, where the major and minor alleles are already known.

-doMajorMinor 3
-sites [filename]
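As an illustration of the information -doMajorMinor 3 takes from the sites file, here is a Python parser assuming a whitespace-separated layout of chromosome, position, major, minor with 1-based positions. This layout is an assumption for the sketch; consult the Sites page for the exact format ANGSD expects. `read_sites` is hypothetical and the example records are toy data.

```python
import io


def read_sites(handle):
    """Parse 'chromosome position major minor' lines into a dict
    keyed by (chromosome, position). Blank lines are skipped."""
    sites = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 4:
            raise ValueError(f"expected 4 columns, got: {line.rstrip()}")
        chrom, pos, major, minor = fields[:4]
        sites[(chrom, int(pos))] = (major.upper(), minor.upper())
    return sites


example = io.StringIO("chr1\t100\tC\tG\nchr1\t250\tG\tT\n")
sites = read_sites(example)
print(sites[("chr1", 100)])  # ('C', 'G')
```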

Pre-specified Major using a reference

You can force the major allele to be the reference state if you have supplied a reference with -ref. The minor allele is then inferred from the genotype likelihoods (see -doMajorMinor 1). This is the approach used by both GATK and Samtools.

-doMajorMinor 4
-ref [fasta.fa]
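With the major fixed to the reference base, only the three remaining bases compete for the minor. The Python sketch below uses a simplified score (an assumption, not ANGSD's implementation): for each candidate minor m, sum over individuals the log of the mean of L(MM), L(Mm) and L(mm), and keep the best m. The same function applies to -doMajorMinor 5 if the ancestral base is passed as the major. All names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def gt(a, b):
    """Key for an unordered genotype, e.g. gt("G", "A") -> "AG"."""
    return "".join(sorted(a + b))


def minor_given_major(major, gls):
    """Return (major, minor) with the major fixed (e.g. the reference base).

    gls: one dict per individual mapping the 10 unordered genotypes to
    P(data | genotype) on the natural scale.
    """
    major = major.upper()
    if major not in BASES:
        raise ValueError(f"major must be one of A, C, G, T; got {major!r}")
    best, best_ll = None, -math.inf
    for m in BASES:
        if m == major:
            continue
        ll = 0.0
        for gl in gls:
            s = gl[gt(major, major)] + gl[gt(major, m)] + gl[gt(m, m)]
            ll += math.log(max(s / 3.0, 1e-300))  # floor avoids log(0)
        if ll > best_ll:
            best, best_ll = m, ll
    return major, best


def flat_gls(**peaks):
    """Toy likelihoods: 1e-6 for every genotype except the given peaks."""
    gl = {"".join(p): 1e-6 for p in combinations_with_replacement(BASES, 2)}
    gl.update(peaks)
    return gl


individuals = [flat_gls(GG=0.9), flat_gls(AG=0.8)]
print(minor_given_major("G", individuals))  # ('G', 'A')
```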

Pre-specified Major using the ancestral state

You can force the major allele to be the ancestral state if you have supplied ancestral states with -anc. The minor allele is then inferred from the genotype likelihoods (see -doMajorMinor 1).

-doMajorMinor 5
-anc [fasta.fa]
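ANGSD reads the -anc FASTA itself; the Python sketch below only illustrates what using the ancestral base as the major means at a given position. `read_fasta` and `ancestral_major` are hypothetical helpers. Upper-casing lowercase (soft-masked) bases and returning None for non-ACGT bases such as N are assumptions of this sketch, not documented ANGSD behaviour.

```python
import io


def read_fasta(handle):
    """Minimal FASTA reader: returns {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def ancestral_major(seqs, chrom, pos):
    """Ancestral base at 1-based `pos`, upper-cased, or None if it is not
    A/C/G/T (assumed here to mean the site cannot be polarized)."""
    base = seqs[chrom][pos - 1].upper()
    return base if base in "ACGT" else None


anc = read_fasta(io.StringIO(">chr1\nACGTN\nacgt\n"))
print(ancestral_major(anc, "chr1", 2))  # C
print(ancestral_major(anc, "chr1", 5))  # None
print(ancestral_major(anc, "chr1", 6))  # A
```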