AsaMap

Revision as of 10:08, 23 March 2019 by Emil

This page contains information about the program asaMap, a tool for ancestry-specific association mapping in large-scale genetic studies. It works on called genotypes in the binary PLINK format (.bed). The program is written in C++.

Download

The program can be downloaded from github:

https://github.com/e-jorsboe/asaMap

git clone https://github.com/e-jorsboe/asaMap.git;
cd asaMap 
make

So far it has only been tested on Linux systems. Use curl if you are on a Mac.

Example

To be added...

Input Files

The input files are called genotypes in the binary PLINK format (*.bed) [1], together with estimated admixture proportions and population-specific allele frequencies. ADMIXTURE can be used to estimate both; its .Q file (admixture proportions) and .P file (allele frequencies) can be given directly to asaMap.
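An ADMIXTURE .Q file has one row per individual, in .fam order, and one column per ancestral population, and the proportions in each row sum to 1. A quick sanity check before running asaMap can catch truncated or mismatched files. The following is a minimal C++ sketch, not part of asaMap; the function name readQ and the tolerance are hypothetical choices, and whitespace-separated values are assumed:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads an ADMIXTURE .Q matrix (one row per individual,
// k columns of ancestry proportions) and checks that every row has k values
// summing to 1 within tol. Returns an empty matrix if any row fails the check.
std::vector<std::vector<double>> readQ(std::istream& in, std::size_t k, double tol = 1e-3) {
    assert(k > 0);
    std::vector<std::vector<double>> q;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;              // skip blank lines
        std::istringstream fields(line);
        std::vector<double> row;
        double v;
        while (fields >> v) row.push_back(v);
        double sum = 0.0;
        for (double x : row) sum += x;
        if (row.size() != k || std::fabs(sum - 1.0) > tol) return {};
        q.push_back(row);
    }
    return q;
}
```

For asaMap's two-population setting, k would be 2, matching a .Q file from "admixture plinkFile.bed 2". An std::ifstream can be passed in place of a string stream.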


A phenotype file must also be provided. It should be a plain text file with one line per individual in the .fam file, in the same order:

-0.712027291121767
-0.158413122435864
-1.77167888612947
-0.800940619551485
0.3016297021294
...
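The row-count requirement can be checked before running asaMap. A minimal C++ sketch, not taken from asaMap: it counts the individuals in the .fam file and reads the phenotype file, failing if a value is non-numeric or the counts differ. The names countRows and readPheno are hypothetical, and streams are used so the sketch is easy to test:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts non-empty lines, e.g. individuals in a .fam file.
std::size_t countRows(std::istream& in) {
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty()) ++n;
    return n;
}

// Hypothetical helper: reads one phenotype value per line into y. Returns false
// if a line is not numeric or the number of values differs from nFam.
bool readPheno(std::istream& in, std::size_t nFam, std::vector<double>& y) {
    y.clear();
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;              // skip blank lines
        std::istringstream field(line);
        double v;
        if (!(field >> v)) return false;         // non-numeric entry
        y.push_back(v);
    }
    return y.size() == nFam;
}
```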

A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept; the intercept is included automatically. This file must have the same number of rows as the phenotype file and the .fam file.

0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
...
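Because asaMap adds the intercept itself, a covariate column of 1s would duplicate it and make the design matrix rank-deficient. The hypothetical C++ sketch below (designWithIntercept is an illustrative name, not asaMap's code) shows the idea: it prepends an intercept column to each covariate row and rejects ragged rows or a user-supplied all-1s column:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: reads a whitespace-separated covariate file (one row per
// individual, one column per covariate) and prepends a column of 1s as the
// intercept. Returns an empty matrix for ragged rows or an all-1s covariate column.
Matrix designWithIntercept(std::istream& in) {
    Matrix x;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;              // skip blank lines
        std::istringstream fields(line);
        std::vector<double> row{1.0};            // intercept column
        double v;
        while (fields >> v) row.push_back(v);
        if (!x.empty() && row.size() != x.front().size()) return {};
        x.push_back(row);
    }
    // A user-supplied column of 1s would duplicate the intercept.
    for (std::size_t j = 1; !x.empty() && j < x.front().size(); ++j) {
        bool allOnes = true;
        for (const auto& row : x) allOnes = allOnes && row[j] == 1.0;
        if (allOnes) return {};
    }
    return x;
}
```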

Running asaMap

Example of how to run asaMap with covariates, after first running ADMIXTURE with two ancestral populations:

# run ADMIXTURE with K = 2 ancestral populations
admixture plinkFile.bed 2

# run asaMap with the admixture proportions ($COV is the path to the covariate file)
./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P

This produces a log file, out.log, and a results file, out.res, with results for each site (after filtering).


The full list of options is shown by running asaMap without any arguments:

./asaMap


Must be specified:

-p <filename>

Prefix of the binary PLINK files, i.e. the filename without the .bed/.bim/.fam suffixes.

-o <filename>

Output prefix: a .res file with the results and a .log log file will be written.

-y <filename>

Phenotype file; must be a plain text file with as many rows as the .fam file.

-Q <filename> (either -a or -Q)

Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.

-a <filename> (either -a or -Q)

Admixture proportions for source population 1, i.e. the first column of the .Q file from ADMIXTURE. Either specify this or -Q.

-f <filename>

Allele frequencies, .P file from ADMIXTURE.
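When -a is used instead of -Q, the input is the first column of the .Q file. A minimal C++ sketch for extracting it, assuming one value per line is the layout asaMap expects (firstColumn is a hypothetical name, not part of asaMap):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes the first column of an ADMIXTURE .Q file (the
// admixture proportion for source population 1) to `out`, one value per line,
// assuming that is the layout -a expects. Returns the number of rows written.
std::size_t firstColumn(std::istream& q, std::ostream& out) {
    std::size_t n = 0;
    std::string line, first;
    while (std::getline(q, line)) {
        std::istringstream fields(line);
        if (fields >> first) {                   // skips blank lines
            out << first << '\n';
            ++n;
        }
    }
    return n;
}
```

With real files, an std::ifstream opened on plinkFile.2.Q and an std::ofstream for the output file can be passed in; the returned count should equal the number of rows in the .fam file.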


Optional:

-c <filename>

Covariates; plain text file with one column per covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; the intercept is added automatically.

-m <INT>

Genotype model: additive or recessive (0: additive, 1: recessive - default: 0).

-l <INT>

Regression type: linear or logistic. Use logistic regression for binary phenotypes and linear regression for quantitative phenotypes (0: linear regression, 1: logistic regression - default: 0).

-b <filename>

Text file containing a starting guess of the estimated coefficients.

-i <INT>

The maximum number of iterations to run for the EM algorithm (default: 80).

-t <FLOAT>

Convergence tolerance: the EM algorithm stops when the change in likelihood between iterations falls below this value (default: 0.0001).

-r <INT>

Seed for generating the starting values of the coefficients.

-P <INT>

Number of threads to use for the analysis. Each thread writes to a temporary file in the path specified by -o.

-e <INT>

Estimate standard error of coefficients (0: no, 1: yes - default: 0).

-w <INT>

Run the M0/R0 model, which models the effect of the other (non-assumed effect) allele. Analyses are faster when M0/R0 is skipped (0: no, 1: yes - default: 1).
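The -i and -t options control when the EM algorithm stops. The generic C++ sketch below illustrates such a stopping rule; it is not asaMap's implementation, and runEM and its step callback are hypothetical. It assumes convergence is judged on the absolute change in the likelihood value returned by each EM update:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Generic, hypothetical EM driver: `step` performs one EM update and returns the
// new likelihood value. Stops when the absolute change falls below tol (cf. -t)
// or after maxIter iterations (cf. -i). Returns the number of iterations run.
int runEM(const std::function<double()>& step, double startLlh,
          int maxIter = 80, double tol = 1e-4) {
    assert(maxIter > 0 && tol > 0.0);
    double prev = startLlh;
    for (int it = 1; it <= maxIter; ++it) {
        const double cur = step();
        if (std::fabs(cur - prev) < tol) return it;   // converged
        prev = cur;
    }
    return maxIter;                                    // iteration cap reached
}
```

The defaults mirror the documented defaults of -i (80) and -t (0.0001). Raising -i or lowering -t trades run time for tighter convergence.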

Outputs

A .res file is produced with the likelihood of each model and the estimated coefficients of each model. For the additive model it looks like this:


Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084
1       11247763   1237  0.856692  0.78175  3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838
...
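Because the .res file is whitespace-separated with a header line, columns can be looked up by name. A minimal C++ sketch, assuming every field is numeric and the header and data rows have the same number of fields (splitFields, parseResRow and isMissing are hypothetical helpers). Entries written as "nan", as in the second example row, are parsed to NaN because std::stod accepts "nan":

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a whitespace-separated line into fields.
std::vector<std::string> splitFields(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> out;
    std::string f;
    while (in >> f) out.push_back(f);
    return out;
}

// Hypothetical helper: maps each .res header name to the value in a data row.
std::map<std::string, double> parseResRow(const std::string& header, const std::string& row) {
    const std::vector<std::string> names = splitFields(header);
    const std::vector<std::string> values = splitFields(row);
    assert(names.size() == values.size());
    std::map<std::string, double> rec;
    for (std::size_t i = 0; i < names.size(); ++i) rec[names[i]] = std::stod(values[i]);
    return rec;
}

// True for columns reported as "nan" in the .res file.
bool isMissing(double value) { return std::isnan(value); }
```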


For the recessive model it looks like this:


Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317
1       11247763   1237  0.856692  0.78175  3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227
...


P-values can be obtained with a likelihood ratio test between two nested models. The R script getPvalues.R is provided to make it easy to obtain P-values from the .res file:


Rscript R/getPvalues.R out.res

This produces a file with the suffix .Pvalues:


Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114
1       11247763   1237  0.856692  0.78175  6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275
...
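The .Pvalues columns can be reproduced from the .res columns. The llh values behave as negative log-likelihoods: a nested model has the larger value (e.g. llh(M5) > llh(M1) in the first row), so the test statistic is 2 * (llh(nested) - llh(full)), compared against a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. For the first example row this gives 0.4064 for M1vM5 (2 degrees of freedom) and 0.6303 for M0vM1 (1 degree of freedom), matching the output above. A minimal C++ sketch, not getPvalues.R itself (chiSquareSf and lrtPValue are hypothetical names):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: upper tail P(X > stat) of a chi-square distribution with
// integer df, i.e. the regularised upper incomplete gamma Q(df/2, stat/2), using
// Q(a+1, z) = Q(a, z) + z^a e^-z / Gamma(a+1) starting from Q(1/2, z) = erfc(sqrt z)
// (odd df) or Q(1, z) = e^-z (even df).
double chiSquareSf(double stat, int df) {
    assert(df >= 1 && stat >= 0.0);
    const double z = stat / 2.0;
    const bool odd = df % 2 == 1;
    double a = odd ? 0.5 : 1.0;
    double q = odd ? std::erfc(std::sqrt(z)) : std::exp(-z);
    for (; a < df / 2.0; a += 1.0)
        q += std::exp(a * std::log(z) - z - std::lgamma(a + 1.0));
    return q;
}

// Hypothetical helper: LRT p-value from two llh(...) columns of the .res file,
// treated as negative log-likelihoods. NaN inputs give NaN (NA in .Pvalues).
double lrtPValue(double llhNested, double llhFull, int df) {
    const double stat = 2.0 * (llhNested - llhFull);
    if (std::isnan(stat)) return std::numeric_limits<double>::quiet_NaN();
    return chiSquareSf(std::fmax(stat, 0.0), df);   // clamp tiny negative values
}
```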

Models

asaMap implements a range of nested regression models, making it possible to test specific hypotheses. For the additive model there are 6 different models:

Model  Parameters                         Notes                                 Effect Parameters
M0     (beta_1, beta_2, delta_1) in R^3   effect of non-assumed effect allele   1
M1     (beta_1, beta_2) in R^2            population specific effects           2
M2     beta_1=0, beta_2 in R              no effect in population 1             1
M3     beta_1 in R, beta_2=0              no effect in population 2             1
M4     beta_1=beta_2 in R                 same effect in both populations       1
M5     beta_1=beta_2=0                    no effect in any population           0
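To make the constraints concrete, the hypothetical C++ function below computes the genetic contribution to the linear predictor (the mean for linear regression, the log-odds for logistic regression) under M1 to M5, assuming the usual additive coding and given how many copies of the assumed effect allele an individual inherited from each population. In asaMap the ancestry of each allele is not observed and is handled by the EM algorithm; here the counts are passed in directly. M0 is omitted because the way delta_1 enters the model is not spelled out above:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical illustration: genetic contribution under the additive models,
// where g1 and g2 are copies of the assumed effect allele inherited from
// population 1 and population 2 (latent in asaMap, given here).
double additiveEffect(const std::string& model, int g1, int g2, double b1, double b2) {
    assert(g1 >= 0 && g2 >= 0 && g1 + g2 <= 2);
    if (model == "M1") return b1 * g1 + b2 * g2;   // population specific effects
    if (model == "M2") return b2 * g2;             // beta_1 = 0
    if (model == "M3") return b1 * g1;             // beta_2 = 0
    if (model == "M4") return b1 * (g1 + g2);      // beta_1 = beta_2 (b2 ignored)
    if (model == "M5") return 0.0;                 // no effect
    throw std::invalid_argument("unsupported model: " + model);
}
```

Comparing M1 with M4, for example, tests whether the effect differs between the two populations; the difference in the Effect Parameters column (2 - 1) gives the degrees of freedom of that test.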

For the recessive model there are 8 different models:

Model  Parameters                                          Notes                                                         Effect Parameters
R0     (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5   recessive effect of non-assumed effect alleles                2
R1     (beta_1, beta_m, beta_2) in R^3                     population specific effects                                   3
R2     beta_1 in R, beta_m=beta_2 in R                     same effect when one or both variant alleles are from pop 2   2
R3     beta_1=beta_m in R, beta_2 in R                     same effect when one or both variant alleles are from pop 1   2
R4     beta_1 in R, beta_m=beta_2=0                        only an effect when both variant alleles are from pop 1       1
R5     beta_1=beta_m=0, beta_2 in R                        only an effect when both variant alleles are from pop 2       1
R6     beta_1=beta_m=beta_2 in R                           same effect regardless of ancestry                            1
R7     beta_1=beta_m=beta_2=0                              no effect in any population                                   0

beta_1 and beta_2 are the effects of the assumed effect allele in population 1 and 2, respectively. beta_m is the recessive effect of carrying two copies of the assumed effect allele, one inherited from population 1 and one from population 2. delta_1 and delta_2 are the effects of the assumed non-effect allele in population 1 and 2, respectively.
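The hypothetical C++ function below illustrates the R1 parameterisation described above: an individual contributes beta_1, beta_m or beta_2 only when carrying two copies of the assumed effect allele, depending on which populations the two copies came from. The ancestry of each copy is passed in directly here, whereas asaMap treats it as unobserved:

```cpp
#include <cassert>

// Ancestry of one copy of the assumed effect allele.
enum class Pop { One, Two };

// Hypothetical illustration of R1: recessive contribution for an individual with
// `copies` copies of the assumed effect allele whose ancestries are a1 and a2
// (only used when copies == 2).
double recessiveEffectR1(int copies, Pop a1, Pop a2, double b1, double bm, double b2) {
    assert(copies >= 0 && copies <= 2);
    if (copies < 2) return 0.0;                       // recessive: needs two copies
    if (a1 == Pop::One && a2 == Pop::One) return b1;  // both from population 1
    if (a1 == Pop::Two && a2 == Pop::Two) return b2;  // both from population 2
    return bm;                                        // one copy from each
}
```

R2 to R7 follow by constraining the arguments; for example, R6 corresponds to calling the function with b1 == bm == b2, and R4 to bm == b2 == 0.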

Citation