AsaMap: Difference between revisions

This page contains information about the program '''asaMap''', a tool for ancestry-specific association mapping in large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.
 
=Download=
 
The program can be downloaded from github:


https://github.com/e-jorsboe/asaMap
<pre>
git clone https://github.com/e-jorsboe/asaMap.git;
cd asaMap
make
</pre>
So far it has only been tested on Linux systems. Use curl if you are on a Mac.
=Example=
To be added...
=Input Files=
The input is called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose '''.Q and .P files''', respectively, can be given directly to asaMap.
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:
<pre>
-0.712027291121767
-0.158413122435864
-1.77167888612947
-0.800940619551485
0.3016297021294
...
</pre>
A covariate file can also be provided, where each column is a covariate and each row is an individual - '''it should NOT have a column of 1s for the intercept (the intercept is included automatically)'''. This file must have the same number of rows as the phenotype file and the .fam file.
<pre>
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
...
</pre>
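Because asaMap matches individuals by row order, it can be worth checking the files before a run. The following Python sketch is not part of asaMap; the function names and toy file contents are made up for illustration. It assumes whitespace-separated text files and checks only the two requirements stated above: equal row counts, and no covariate column consisting entirely of 1s.

```python
import os
import tempfile


def read_rows(path):
    """Return the non-empty lines of a text file, split on whitespace."""
    with open(path) as handle:
        return [line.split() for line in handle if line.strip()]


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems; an empty list means the files agree."""
    problems = []
    n_fam = len(read_rows(fam_path))
    if len(read_rows(pheno_path)) != n_fam:
        problems.append("phenotype rows differ from .fam rows")
    if cov_path is not None:
        cov = read_rows(cov_path)
        if len(cov) != n_fam:
            problems.append("covariate rows differ from .fam rows")
        n_cols = min((len(row) for row in cov), default=0)
        for j in range(n_cols):
            # An all-ones column would duplicate the automatic intercept.
            if all(float(row[j]) == 1.0 for row in cov):
                problems.append("covariate column %d is all 1s" % (j + 1))
    return problems


# Tiny made-up files, only to show the call.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "toy.fam": "f1 i1 0 0 1 -9\nf2 i2 0 0 2 -9\n",
        "pheno.txt": "-0.71\n-0.16\n",
        "cov.txt": "1 0.0127\n1 0.0110\n",
    }
    paths = {}
    for name, text in contents.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "w") as handle:
            handle.write(text)
    # Reports the intercept-like first covariate column.
    print(check_inputs(paths["toy.fam"], paths["pheno.txt"], paths["cov.txt"]))
```

The check does not verify that individuals are in the same order across files, since the phenotype and covariate files carry no IDs; that still has to be ensured when the files are created.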
=Running asaMap=
Example command for running asaMap with covariates, after first running ADMIXTURE:
<pre>
#run admixture
admixture plinkFile.bed 2
#run asaMap with admix proportions
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
</pre>
This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).
The full list of options is printed when asaMap is run without any arguments:
<pre>
./asaMap
</pre>
'''Must be specified:'''
; -p <filename>     
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.
; -o <filename>     
Output filename - a .res file will be written with the results and a .log log file.
; -y <filename>     
Phenotype file; has to be a plain text file with as many rows as the .fam file.
; -Q <filename> (either -a or -Q)     
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
; -a <filename> (either -a or -Q)     
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.
; -f <filename>     
Allele frequencies, .P file from ADMIXTURE.
'''Optional:'''
; -c <filename>     
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!
; -m <INT>       
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
; -l <INT>       
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data. (0: linear regression, 1: logistic regression - default: 0)
; -b <filename>     
Text file containing a starting guess of the estimated coefficients.
; -i <INT>     
The maximum number of iterations to run for the EM algorithm (default: 80).
; -t <FLOAT>         
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).
; -r <INT>         
Seed for generating the starting values of the coefficients.
; -P <INT>           
Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by "-o".
; -e <INT>           
Estimate standard error of coefficients (0: no, 1: yes - default: 0).
; -w <INT>           
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)
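When asaMap is launched from a script, it can help to assemble the command line in one place so that the required options are never forgotten. The Python sketch below is only an illustration and not part of asaMap: the function name, its argument names and the file name cov.txt are hypothetical, and it uses only flags described in the list above. It enforces that exactly one of -Q and -a is given.

```python
import shlex


def build_asamap_command(plink_prefix, out_prefix, pheno, freqs,
                         q_file=None, a_file=None, cov=None,
                         model=0, regression=0, threads=None,
                         exe="./asaMap"):
    """Return the asaMap argument list for the given inputs."""
    if (q_file is None) == (a_file is None):
        raise ValueError("give exactly one of q_file (-Q) or a_file (-a)")
    cmd = [exe, "-p", plink_prefix, "-o", out_prefix, "-y", pheno, "-f", freqs]
    cmd += ["-Q", q_file] if q_file is not None else ["-a", a_file]
    if cov is not None:
        cmd += ["-c", cov]
    # 0/1 codes as documented: -m additive/recessive, -l linear/logistic.
    cmd += ["-m", str(model), "-l", str(regression)]
    if threads is not None:
        cmd += ["-P", str(threads)]
    return cmd


cmd = build_asamap_command("plinkFile", "out", "pheno.files", "plinkFile.2.P",
                           q_file="plinkFile.2.Q", cov="cov.txt")
# Print a shell-safe version of the command for logging or copy-paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with unusual file names.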
=Outputs=
A '''.res''' file is produced with the likelihood of each model and the estimated coefficients in each model, shown here for the additive model:
<pre>
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)    b2(M1)    b1(M2)    b2(M3)    b(M4)
1      9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018  -0.166907  -0.053931  0.047357  0.020093
1      10684283  1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084
1      11247763  1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007  0.045277  -0.030582  -0.016838
...
</pre>
For the recessive model it looks like this:
<pre>
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)    b2(R1)    bm(R1)    b1(R2)    b2m(R2)    b1m(R3)    b2(R3)    b1(R4)    b2(R5)    b(R6)
1      9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373  -2.082935  -0.027433  0.016608  -0.582318  0.004700  -2.083112  -0.046849  -2.083275  -0.259338
1      10684283  1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317
1      11247763  1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419  -0.047597  -0.004021  0.068119  -0.019760  0.042905  -0.078669  0.060373  -0.018537  0.029227
...
</pre>
P-values can be obtained with a likelihood ratio test between two nested models.
An R script, '''getPvalues.R''', is provided that makes it easy to obtain P-values from the '''.res''' file:
<pre>
Rscript R/getPvalues.R out.res
</pre>
This produces a file with the suffix '''.Pvalues''':
<pre>
Chromo  Position  nInd  f1        f2        M0vM1                M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5
1      9855422    1237  0.935997  0.537511  0.630338505521655    0.40636967666779  0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713
1      10684283  1217  0.99999  0.509715  NA                    NA                NA                NA                NA                NA                0.163577574260359  0.275437296874114
1      11247763  1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275
...
</pre>
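The P-values in the example above can be reproduced by hand, which also shows what the likelihood ratio test does. The following Python sketch is not part of asaMap and is not the code in getPvalues.R. It rests on two assumptions: that the llh columns hold negative log-likelihoods (the no-effect model M5 has the largest value in every example row), and that the degrees of freedom equal the difference in the number of free parameters, counted from the additive model table. Function names are illustrative, and only the additive models are covered.

```python
import math

# Free parameters per additive model, counted from the Models table
# (M0 adds delta_1 to the two betas of M1).
N_PARAMS = {"M0": 3, "M1": 2, "M2": 1, "M3": 1, "M4": 1, "M5": 0}


def chi2_sf(stat, df):
    """Upper tail of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df 1 and 2 are handled in this sketch")


def lrt_pvalue(row, full, null):
    """LRT P-value for two nested models; None if either fit is nan."""
    nll_full = float(row["llh(%s)" % full])
    nll_null = float(row["llh(%s)" % null])
    if math.isnan(nll_full) or math.isnan(nll_null):
        return None
    # Assumption: llh values are negative log-likelihoods (smaller = better).
    stat = max(0.0, 2.0 * (nll_null - nll_full))
    return chi2_sf(stat, N_PARAMS[full] - N_PARAMS[null])


def read_res(text):
    """Parse whitespace-separated .res content into a list of dicts."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, fields)) for fields in rows]


# First two rows of the additive example output shown earlier.
RES = """\
Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
1 9855422 1237 0.935997 0.537511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
1 10684283 1217 0.999990 0.509715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
"""

rows = read_res(RES)
print(lrt_pvalue(rows[0], "M1", "M5"))  # about 0.406, the M1vM5 column
print(lrt_pvalue(rows[1], "M1", "M5"))  # None, since llh(M1) is nan
```

For the first site the test statistic for M1 against M5 is 2 × (3243.115326 − 3242.214834) ≈ 1.80 on 2 degrees of freedom, which gives the M1vM5 value of about 0.406 listed above.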
=Models=
asaMap implements a range of linear models, making it possible to test specific hypotheses.
For the additive model there are 6 different models:
{| class="wikitable"
|-
! scope="col"| Model
! scope="col"| Parameters
! scope="col"| Notes
! scope="col"| Effect Parameters
|-
| M0
| (beta_1, beta_2, delta_1) in R^3
| effect of non-assumed effect allele
| 1
|-
| M1
| (beta_1, beta_2) in R^2
| population specific effects
| 2
|-
| M2
| beta_1=0, beta_2 in R
| no effect in population 1
| 1
|-
| M3
| beta_1 in R, beta_2=0
| no effect in population 2
| 1
|-
| M4
| beta_1=beta_2 in R
| same effect in both populations
| 1
|-
| M5
| beta_1=beta_2=0
| no effect in any population
| 0
|}
For the recessive model there are 8 different models:
{| class="wikitable"
|-
! scope="col"| Model
! scope="col"| Parameters
! scope="col"| Notes
! scope="col"| Effect Parameters
|-
| R0
| (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5
| recessive effect of non-assumed effect alleles
| 2
|-
| R1
| (beta_1, beta_m, beta_2) in R^3
| population specific effects
| 3
|-
| R2
| beta_1 in R, beta_m=beta_2 in R
| same effect when one or both variant alleles are from pop 2
| 2
|-
| R3
| beta_1=beta_m in R, beta_2 in R
| same effect when one or both variant alleles are from pop 1
| 2
|-
| R4
| beta_1 in R, beta_m=beta_2=0
| only an effect when both variant alleles are from pop 1
| 1
|-
| R5
| beta_1=beta_m=0, beta_2 in R
| only an effect when both variant alleles are from pop 2
| 1
|-
| R6
| beta_1=beta_m=beta_2 in R
| same effect regardless of ancestry
| 1
|-
| R7
| beta_1=beta_m=beta_2=0
| no effect in any population
| 0
|}
'''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.
=Citation=

Latest revision as of 09:32, 24 March 2026

The program is available and described on github:

https://github.com/e-jorsboe/asaMap