| This page contains information about the program '''asaMap''', a tool for ancestry-specific association mapping in large-scale genetic studies. It works on called genotypes in the binary plink format (.bed). The program is written in C++.
| The program is available and described on GitHub: https://github.com/e-jorsboe/asaMap
| =Download=
| |
| | |
| The program can be downloaded from github:
| |
|
| |
|
| https://github.com/e-jorsboe/asaMap
|
| |
| <pre>
| |
| git clone https://github.com/e-jorsboe/asaMap.git;
| |
| cd asaMap
| |
| make
| |
| </pre>
| |
|
| |
| So far it has only been tested on Linux systems. Use curl if you are on a Mac.
| |
|
| |
| =Example=
| |
|
| |
| To be added...
| |
|
| |
| =Input Files=
| |
| Input files are called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. [http://software.genetics.ucla.edu/admixture/ ADMIXTURE] can be used to estimate both, and its '''.Q and .P files''', respectively, can be given directly to asaMap.
| |
|
| |
|
| |
| A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:
| |
|
| |
| <pre>
| |
| -0.712027291121767
| |
| -0.158413122435864
| |
| -1.77167888612947
| |
| -0.800940619551485
| |
| 0.3016297021294
| |
| ...
| |
| </pre>
| |
|
| |
| A covariate file can also be provided, where each column is a covariate and each row is an individual - '''it should NOT have a column of 1s for the intercept (the intercept will be included automatically)'''. This file has to have the same number of rows as the phenotype file and the .fam file.
| |
|
| |
| <pre>
| |
| 0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
| |
| 0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
| |
| 0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
| |
| 0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
| |
| 0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
| |
| ...
| |
| </pre>
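The row-count and intercept requirements above can be checked before running asaMap. The following is a minimal Python sketch, not part of asaMap itself; the function names <code>count_rows</code> and <code>check_inputs</code> are hypothetical, and it assumes whitespace-separated numeric covariates as in the example.

```python
def count_rows(path):
    """Count non-empty lines in a plain text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Check that phenotype and covariate files match the .fam file.

    Hypothetical helper: raises ValueError on a mismatch and returns
    the number of individuals otherwise.
    """
    n_fam = count_rows(fam_path)
    n_pheno = count_rows(pheno_path)
    if n_pheno != n_fam:
        raise ValueError(f"phenotype file has {n_pheno} rows, .fam has {n_fam}")

    if cov_path is not None:
        with open(cov_path) as handle:
            rows = [line.split() for line in handle if line.strip()]
        if len(rows) != n_fam:
            raise ValueError(f"covariate file has {len(rows)} rows, .fam has {n_fam}")
        n_cols = len(rows[0]) if rows else 0
        if any(len(row) != n_cols for row in rows):
            raise ValueError("covariate rows have differing numbers of columns")
        # A column consisting only of 1s would duplicate the intercept
        # that asaMap adds automatically.
        for j in range(n_cols):
            if all(float(row[j]) == 1.0 for row in rows):
                raise ValueError(f"covariate column {j + 1} is an intercept column")
    return n_fam
```

For example, <code>check_inputs("plinkFile.fam", "pheno.files", "cov.txt")</code> would return the number of individuals if the three files agree; the file name <code>cov.txt</code> is only a placeholder.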
| |
|
| |
|
| |
| =Running asaMap=
| |
|
| |
| Example command for running asaMap with covariates included, after first running ADMIXTURE:
| |
|
| |
| <pre>
| |
| #run admixture
| |
| admixture plinkFile.bed 2
| |
|
| |
| #run asaMap with admix proportions
| |
| ./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
| |
| </pre>
| |
|
| |
| This produces an '''out.log''' log file and an '''out.res''' file with results for each site (after filtering).
| |
|
| |
|
| |
| The full list of options is printed when asaMap is run without any arguments:
| |
|
| |
| <pre>
| |
| ./asaMap
| |
| </pre>
| |
|
| |
|
| |
| '''Must be specified:'''
| |
|
| |
| ; -p <filename>
| |
| Prefix of the binary plink files, i.e. the filename without the .bed/.fam/.bim suffixes.
| |
| ; -o <filename>
| |
| Output filename - a .res file will be written with the results and a .log log file.
| |
| ; -y <filename>
| |
| Phenotype file; has to be a plain text file with as many rows as the .fam file.
| |
| ; -Q <filename> (either -a or -Q)
| |
| Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
| |
| ; -a <filename> (either -a or -Q)
| |
| Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.
| |
| ; -f <filename>
| |
| Allele frequencies, .P file from ADMIXTURE.
| |
|
| |
|
| |
| '''Optional:'''
| |
|
| |
| ; -c <filename>
| |
| Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!
| |
| ; -m <INT>
| |
| Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
| |
| ; -l <INT>
| |
| Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).
| |
| ; -b <filename>
| |
| Text file containing a starting guess of the estimated coefficients.
| |
| ; -i <INT>
| |
| The maximum number of iterations to run for the EM algorithm (default: 80).
| |
| ; -t <FLOAT>
| |
| Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).
| |
| ; -r <INT>
| |
| Seed for generating the starting values of the coefficients.
| |
| ; -P <INT>
| |
| Number of threads to be used for analysis. Each thread will write to temporary file in path specified by "-o".
| |
| ; -e <INT>
| |
| Estimate standard error of coefficients (0: no, 1: yes - default: 0).
| |
| ; -w <INT>
| |
| Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)
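When asaMap is called from a script, the options listed above can be assembled programmatically. The following is a minimal Python sketch under stated assumptions: the helper <code>build_asamap_command</code> and its parameter names are hypothetical, only flags documented on this page are used, and the check that exactly one of -Q and -a is given follows the description above.

```python
def build_asamap_command(plink_prefix, out_prefix, pheno, freqs,
                         q_file=None, a_file=None, cov=None,
                         model=0, logistic=0, threads=None,
                         binary="./asaMap"):
    """Build an argument list for asaMap (hypothetical helper)."""
    # The page says to specify either -Q or -a, so require exactly one.
    if (q_file is None) == (a_file is None):
        raise ValueError("specify exactly one of q_file (-Q) or a_file (-a)")
    if model not in (0, 1):
        raise ValueError("model must be 0 (additive) or 1 (recessive)")
    if logistic not in (0, 1):
        raise ValueError("logistic must be 0 (linear) or 1 (logistic)")

    cmd = [binary, "-p", plink_prefix, "-o", out_prefix,
           "-y", pheno, "-f", freqs]
    cmd += ["-Q", q_file] if q_file is not None else ["-a", a_file]
    if cov is not None:
        cmd += ["-c", cov]  # covariates without an intercept column
    cmd += ["-m", str(model), "-l", str(logistic)]
    if threads is not None:
        cmd += ["-P", str(threads)]
    return cmd
```

The returned list can be passed to <code>subprocess.run(cmd, check=True)</code>. Building a list rather than a single string avoids shell quoting problems with file names.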
| |
|
| |
| =Outputs=
| |
|
| |
| A '''.res''' file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
| |
| 1 9855422 1237 0.935997 0.537511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
| |
| 1 10684283 1217 0.999990 0.509715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
| |
| 1 11247763 1237 0.856692 0.78175 3234.025418 3241.930891 3242.902363 3242.561728 3242.820387 3243.028131 -0.048894 0.108007 0.045277 -0.030582 -0.016838
| |
| ...
| |
| </pre>
| |
|
| |
|
| |
| For the recessive model it looks like this:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 llh(R0) llh(R1) llh(R2) llh(R3) llh(R4) llh(R5) llh(R6) llh(R7) b1(R1) b2(R1) bm(R1) b1(R2) b2m(R2) b1m(R3) b2(R3) b1(R4) b2(R5) b(R6)
| |
| 1 9855422 1237 0.935997 0.537511 3236.442376 3241.191367 3242.235364 3241.191468 3243.112239 3241.188747 3242.691370 3243.115326 0.023373 -2.082935 -0.027433 0.016608 -0.582318 0.004700 -2.083112 -0.046849 -2.083275 -0.259338
| |
| 1 10684283 1217 0.999990 0.509715 nan nan nan nan 3215.162291 3215.133559 3214.502575 3215.569371 nan nan nan nan nan nan nan -0.529999 -0.721649 -0.438317
| |
| 1 11247763 1237 0.856692 0.78175 3235.030514 3242.807127 3242.809076 3242.836233 3242.818987 3243.028431 3242.907072 3243.028131 0.064419 -0.047597 -0.004021 0.068119 -0.019760 0.042905 -0.078669 0.060373 -0.018537 0.029227
| |
| ...
| |
| </pre>
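The '''.res''' layouts shown above can be read with a few lines of Python. This is a minimal sketch, assuming the file is whitespace-delimited with a single header line as in the examples; the function name <code>read_res</code> is hypothetical. Sites where a model could not be fitted are reported as <code>nan</code>, which Python's <code>float</code> parses directly.

```python
def read_res(lines):
    """Parse .res lines (header first) into a list of dicts.

    Hypothetical helper: Chromo stays a string, Position and nInd
    become ints, and all other columns become floats (nan included).
    """
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}")
        row = {}
        for name, value in zip(header, fields):
            if name == "Chromo":
                row[name] = value
            elif name in ("Position", "nInd"):
                row[name] = int(value)
            else:
                row[name] = float(value)
        rows.append(row)
    return rows
```

Because the column names are taken from the header, the same function handles both the additive and the recessive output, for example <code>read_res(open("out.res"))</code>.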
| |
|
| |
|
| |
| P-values can be obtained with a likelihood ratio test between the two models of interest.
| |
| An Rscript '''getPvalues.R''' is provided that makes it easy to obtain P-values from the '''.res''' file:
| |
|
| |
| <pre>
| |
|
| |
| Rscript R/getPvalues.R out.res
| |
|
| |
| </pre>
| |
|
| |
| This produces a file with the suffix '''.Pvalues''':
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 M0vM1 M1vM5 M1vM2 M1vM3 M1vM4 M2vM5 M3vM5 M4vM5
| |
| 1 9855422 1237 0.935997 0.537511 0.630338505521655 0.40636967666779 0.200575362363081 0.274160334109282 0.204476621296224 0.686587953953705 0.436611450245155 0.662188528285713
| |
| 1 10684283 1217 0.99999 0.509715 NA NA NA NA NA NA 0.163577574260359 0.275437296874114
| |
| 1 11247763 1237 0.856692 0.78175 6.99963946833027e-05 0.333791076895669 0.163349235419537 0.261334462945287 0.182273151757048 0.615995603296571 0.334134847663281 0.51919707427275
| |
| ...
| |
| </pre>
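The likelihood ratio test behind these P-values can be sketched in Python. This is an illustration, not the code in '''getPvalues.R'''. It assumes that the <code>llh</code> columns hold negative log-likelihoods (an inference from the example values, where the most constrained model has the largest value) and that the degrees of freedom equal the difference in the number of free parameters between the two models. The function names are hypothetical, and only 1 and 2 degrees of freedom are supported, which covers the additive comparisons listed in the header above.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution (df 1 or 2)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def lrt_pvalue(nll_full, nll_null, df):
    """P-value for a nested model comparison.

    nll_full: negative log-likelihood of the model with more parameters
    nll_null: negative log-likelihood of the constrained model
    (assumed interpretation of the llh columns, see the text above)
    """
    if math.isnan(nll_full) or math.isnan(nll_null):
        return float("nan")
    stat = 2.0 * (nll_null - nll_full)
    # Guard against tiny negative values caused by rounding or EM tolerance.
    return chi2_sf(max(stat, 0.0), df)


# First example site: M0 vs M1 (1 df) and M1 vs M5 (2 df).
p_m0_m1 = lrt_pvalue(3242.099033, 3242.214834, df=1)
p_m1_m5 = lrt_pvalue(3242.214834, 3243.115326, df=2)
```

Under these assumptions the two values agree closely with the M0vM1 and M1vM5 entries for the first site in the example '''.Pvalues''' output, up to the rounding of the printed likelihoods.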
| |
|
| |
| =Models=
| |
|
| |
| asaMap implements a range of linear models, making it possible to test specific hypotheses.
| |
| For the additive model there are 6 different models:
| |
|
| |
| {| class="wikitable"
| |
| |-
| |
| ! scope="col"| Model
| |
| ! scope="col"| Parameters
| |
| ! scope="col"| Notes
| |
| ! scope="col"| Effect Parameters
| |
| |-
| |
| | M0
| |
| | (beta_1, beta_2, delta_1) in R^3
| |
| | effect of non-assumed effect allele
| |
| | 1
| |
| |-
| |
| | M1
| |
| | (beta_1, beta_2) in R^2
| |
| | population specific effects
| |
| | 2
| |
| |-
| |
| | M2
| |
| | beta_1=0, beta_2 in R
| |
| | no effect in population 1
| |
| | 1
| |
| |-
| |
| | M3
| |
| | beta_1 in R, beta_2=0
| |
| | no effect in population 2
| |
| | 1
| |
| |-
| |
| | M4
| |
| | beta_1=beta_2 in R
| |
| | same effect in both populations
| |
| | 1
| |
| |-
| |
| | M5
| |
| | beta_1=beta_2=0
| |
| | no effect in any population
| |
| | 0
| |
| |}
| |
|
| |
| For the recessive model there are 8 different models:
| |
|
| |
| {| class="wikitable"
| |
| |-
| |
| ! scope="col"| Model
| |
| ! scope="col"| Parameters
| |
| ! scope="col"| Notes
| |
| ! scope="col"| Effect Parameters
| |
| |-
| |
| | R0
| |
| | (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5
| |
| | recessive effect of non-assumed effect alleles
| |
| | 2
| |
| |-
| |
| | R1
| |
| | (beta_1, beta_m, beta_2) in R^3
| |
| | population specific effects
| |
| | 3
| |
| |-
| |
| | R2
| |
| | beta_1 in R, beta_m=beta_2 in R
| |
| | same effect when one or both variant alleles are from pop 2
| |
| | 2
| |
| |-
| |
| | R3
| |
| | beta_1=beta_m in R, beta_2 in R
| |
| | same effect when one or both variant alleles are from pop 1
| |
| | 2
| |
| |-
| |
| | R4
| |
| | beta_1 in R, beta_m=beta_2=0
| |
| | only an effect when both variant alleles are from pop 1
| |
| | 1
| |
| |-
| |
| | R5
| |
| | beta_1=beta_m=0, beta_2 in R
| |
| | only an effect when both variant alleles are from pop 2
| |
| | 1
| |
| |-
| |
| | R6
| |
| | beta_1=beta_m=beta_2 in R
| |
| | same effect regardless of ancestry
| |
| | 1
| |
| |-
| |
| | R7
| |
| | beta_1=beta_m=beta_2=0
| |
| | no effect in any population
| |
| | 0
| |
| |}
| |
|
| |
| '''beta_1''' and '''beta_2''' are the effects of the assumed effect allele in population 1 and 2, respectively. '''beta_m''' is the recessive effect of carrying two copies of the effect allele, with one copy from population 1 and one copy from population 2. '''delta_1''' and '''delta_2''' are the effects of the assumed non-effect allele in population 1 and 2, respectively.
| |
|
| |
| =Citation=
| |