| =Download=
| |
| | |
| The program can be downloaded from GitHub:
|
| |
|
| https://github.com/e-jorsboe/asaMap
|
| |
| <pre>
| |
| git clone https://github.com/e-jorsboe/asaMap.git;
| |
| cd asaMap
| |
| make
| |
| </pre>
| |
|
| |
| So far it has only been tested on Linux systems. Use curl if you are on a Mac.
| |
|
| |
| =Example=
| |
|
| |
| This is an example.
| |
|
| |
| =Input Files=
| |
| The input files are called genotypes in the binary PLINK format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. The admixture proportions and population-specific allele frequencies can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.
| |
|
| |
|
| |
| A phenotype file also has to be provided. It should be a plain text file with one line for each individual in the .fam file, sorted in the same order:
| |
|
| |
| <pre>
| |
| -0.712027291121767
| |
| -0.158413122435864
| |
| -1.77167888612947
| |
| -0.800940619551485
| |
| 0.3016297021294
| |
| ...
| |
| </pre>
| |
|
| |
| A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.
| |
|
| |
| <pre>
| |
| 0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
| |
| 0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
| |
| 0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
| |
| 0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
| |
| 0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
| |
| ...
| |
| </pre>
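
The row-count requirements above can be checked before running asaMap. The following is a minimal Python sketch, not part of asaMap: the function names are hypothetical, and it assumes the files are whitespace-separated plain text with no header line.

```python
def count_rows(path):
    """Count non-empty lines in a plain text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def _is_one(token):
    """True if the token parses as the number 1."""
    try:
        return float(token) == 1.0
    except ValueError:
        return False


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems found; an empty list means the files agree."""
    problems = []
    n_fam = count_rows(fam_path)
    n_pheno = count_rows(pheno_path)
    if n_pheno != n_fam:
        problems.append(f"phenotype file has {n_pheno} rows, .fam has {n_fam}")
    if cov_path is not None:
        with open(cov_path) as handle:
            rows = [line.split() for line in handle if line.strip()]
        if len(rows) != n_fam:
            problems.append(f"covariate file has {len(rows)} rows, .fam has {n_fam}")
        # A column made up entirely of 1s would duplicate the intercept,
        # which asaMap adds automatically.
        n_cols = min((len(r) for r in rows), default=0)
        for j in range(n_cols):
            if all(_is_one(r[j]) for r in rows):
                problems.append(f"covariate column {j + 1} is all 1s (intercept)")
    return problems
```

Calling <code>check_inputs("plinkFile.fam", "pheno.files", "covariates.txt")</code> returns an empty list when everything lines up; the covariate file name here is only a placeholder.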
| |
|
| |
| =Running asaMap=
| |
|
| |
| Example commands for first running ADMIXTURE and then running asaMap with covariates included:
| |
|
| |
| <pre>
| |
| #run admixture
| |
| admixture plinkFile.bed 2
| |
|
| |
| #run asaMap with admix proportions
| |
| ./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
| |
| </pre>
| |
|
| |
| This produces an out.log log file and an out.res file with results for each site (after filtering).
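
When asaMap is called from a script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical Python helper, not something shipped with asaMap. It uses only the flags that appear on this page (-p, -o, -y, -Q or -a, -f, -c), and "covariates.txt" is a placeholder for whatever $COV points to.

```python
import shlex


def asamap_command(plink_prefix, out_prefix, pheno, freqs, q_file=None,
                   a_file=None, covariates=None, binary="./asaMap"):
    """Build the asaMap argument list from the flags shown on this page."""
    # Either -Q or -a has to be given, but not both.
    if (q_file is None) == (a_file is None):
        raise ValueError("specify exactly one of q_file (-Q) or a_file (-a)")
    cmd = [binary, "-p", plink_prefix, "-o", out_prefix, "-y", pheno]
    cmd += ["-Q", q_file] if q_file is not None else ["-a", a_file]
    cmd += ["-f", freqs]
    if covariates is not None:
        cmd += ["-c", covariates]
    return cmd


# "covariates.txt" is a placeholder file name.
cmd = asamap_command("plinkFile", "out", "pheno.files", "plinkFile.2.P",
                     q_file="plinkFile.2.Q", covariates="covariates.txt")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids quoting problems when file names contain spaces.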
| |
|
| |
|
| |
| The full list of options is printed when asaMap is run without any arguments:
| |
|
| |
| <pre>
| |
| ./asaMap
| |
| </pre>
| |
|
| |
|
| |
| '''Must be specified:'''
| |
|
| |
| ; -p <filename>
| |
| Prefix of the binary PLINK files, i.e. the filename without the .bed/.fam/.bim suffixes.
| |
| ; -o <filename>
| |
| Output filename. A .res file with the results and a .log log file will be written.
| |
| ; -y <filename>
| |
| Phenotype file. Has to be a plain text file with as many rows as the .fam file.
| |
| ; -Q <filename> (either -a or -Q)
| |
| Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
| |
| ; -a <filename> (either -a or -Q)
| |
| Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.
| |
| ; -f <filename>
| |
| Allele frequencies, .P file from ADMIXTURE.
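
Since the -a file is described as the first column of the ADMIXTURE .Q file, it can be derived from the .Q file directly. This is a small Python sketch under that reading. The function name and the output file name are hypothetical, and the .Q file is assumed to be whitespace-separated with one row per individual.

```python
def first_column(q_path, out_path):
    """Write the first whitespace-separated column of a .Q file to out_path.

    Returns the number of rows written, which should match the .fam file.
    """
    n = 0
    with open(q_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            dst.write(fields[0] + "\n")
            n += 1
    return n
```

For example, <code>first_column("plinkFile.2.Q", "plinkFile.2.a")</code> would write a file that could then be passed with -a instead of -Q.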
| |
|
| |
|
| |
| '''Optional:'''
| |
|
| |
| ; -c <filename>
| |
| Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!
| |
| ; -m <INT>
| |
| Model, whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
| |
| ; -l <INT>
| |
| Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).
| |
| ; -b <filename>
| |
| Text file containing a starting guess of the estimated coefficients.
| |
| ; -i <INT>
| |
| The maximum number of iterations to run for the EM algorithm (default: 80).
| |
| ; -t <FLOAT>
| |
| Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).
| |
| ; -r <INT>
| |
| Seed for generating the starting values of the coefficients.
| |
| ; -P <INT>
| |
| Number of threads to be used for the analysis. Each thread writes to a temporary file in the path specified by "-o".
| |
| ; -e <INT>
| |
| Estimate standard error of coefficients (0: no, 1: yes - default: 0).
| |
| ; -w <INT>
| |
| Run the M0/R0 model, which models the effect of the other allele. Analyses are faster when M0/R0 is not run (0: no, 1: yes - default: 1).
| |
|
| |
| =Outputs=
| |
|
| |
| A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, shown here for the additive model:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
| |
| 1 980552 2737 0.935997 0.937511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
| |
| 1 1068883 2717 0.999990 0.809715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
| |
| 1 1124663 2737 0.886692 0.388175 3234.025418 3241.930891 3242.902363 3242.561728 3242.820387 3243.028131 -0.048894 0.108007 0.045277 -0.030582 -0.016838
| |
| 1 1171417 2736 0.999990 0.445701 nan nan nan 3239.320653 3239.524956 3239.641824 nan nan nan -0.033530 -0.015845
| |
| 1 1366830 2735 0.999990 0.374078 nan nan nan 3241.698019 3241.675158 3241.696793 nan nan nan 0.002135 0.007140
| |
| 1 1450947 2738 0.659605 0.906222 3240.054094 3243.544587 3243.770254 3243.708934 3243.777517 3243.800524 -0.026101 0.044039 0.016671 -0.014242 -0.005544
| |
| 1 1995211 2737 0.856699 0.982350 3235.516404 3242.070487 3242.928680 3242.571223 3242.756177 3242.941750 0.074805 -0.142018 -0.020892 0.039110 0.021462
| |
| 1 2004098 2738 0.443711 0.815725 3241.253250 3242.382033 3243.741660 3242.955646 3243.532476 3243.800524 0.058767 -0.055806 -0.016451 0.041228 0.016158
| |
| 1 2040898 2738 0.676808 0.610463 3242.664546 3243.371593 3243.574375 3243.801527 3243.787426 3243.800524 -0.024109 0.081087 0.047793 -0.001765 0.004108
| |
|
| |
| </pre>
| |
|
| |
|
| |
| For the recessive model it looks like this:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 llh(R0) llh(R1) llh(R2) llh(R3) llh(R4) llh(R5) llh(R6) llh(R7) b1(R1) b2(R1) bm(R1) b1(R2) b2m(R2) b1m(R3) b2(R3) b1(R4) b2(R5) b(R6)
| |
| 1 980552 2737 0.935997 0.937511 3236.442376 3241.191367 3242.235364 3241.191468 3243.112239 3241.188747 3242.691370 3243.115326 0.023373 -2.082935 -0.027433 0.016608 -0.582318 0.004700 -2.083112 -0.046849 -2.083275 -0.259338
| |
| 1 1068883 2717 0.999990 0.809715 nan nan nan nan 3215.162291 3215.133559 3214.502575 3215.569371 nan nan nan nan nan nan nan -0.529999 -0.721649 -0.438317
| |
| 1 1124663 2737 0.886692 0.388175 3235.030514 3242.807127 3242.809076 3242.836233 3242.818987 3243.028431 3242.907072 3243.028131 0.064419 -0.047597 -0.004021 0.068119 -0.019760 0.042905 -0.078669 0.060373 -0.018537 0.029227
| |
| 1 1171417 2736 0.999990 0.445701 nan nan nan nan 3238.750760 3239.274351 3238.288964 3239.641824 nan nan nan nan nan nan nan -0.210643 -0.267111 -0.144645
| |
| 1 1366830 2735 0.999990 0.374078 nan nan nan nan 3241.645871 3241.199416 3241.338290 3241.696793 nan nan nan nan nan nan nan -0.045970 -0.273382 -0.070305
| |
| 1 1450947 2738 0.659605 0.906222 3240.883715 3242.545834 3243.515375 3243.627600 3243.713843 3243.659336 3243.802228 3243.800524 0.047735 0.291966 -0.216232 0.044591 -0.069851 -0.016796 0.170637 0.032325 0.146528 0.002457
| |
| 1 1995211 2737 0.856699 0.982350 3234.731598 3241.839632 3241.919398 3241.997812 3242.204980 3242.750902 3242.000261 3242.941750 0.072845 0.113462 0.601882 0.114683 0.366807 0.175891 0.261334 0.209120 0.516155 0.181162
| |
| 1 2004098 2738 0.443711 0.815725 3238.336234 3238.488951 3241.228881 3243.661958 3242.407555 3243.783839 3243.676693 3243.800524 0.133629 0.236260 -0.298383 0.122912 -0.100454 0.025324 -0.013486 0.097341 0.030391 0.019042
| |
| 1 2040898 2738 0.676808 0.610463 3241.442146 3242.449918 3242.502684 3243.202847 3243.802047 3243.233496 3243.496321 3243.800524 -0.065485 0.095602 0.207722 -0.057787 0.165752 0.014559 0.205258 0.003543 0.221293 0.037588
| |
|
| |
| </pre>
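
For downstream analysis, the .res file can be read into one record per site. The sketch below is an illustrative Python reader, not part of asaMap. It assumes the file is whitespace-separated with the header shown above, and it treats "nan" entries as models that were not fitted for that site. The sample text is copied from the additive example.

```python
import math


def read_res(lines):
    """Parse .res content (an iterable of lines) into a list of dicts."""
    rows = []
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-empty line holds the column names
            continue
        row = dict(zip(header, fields))
        # Chromo stays a string; every column from f1 onwards is numeric.
        for key in header[3:]:
            row[key] = float(row[key])  # float("nan") covers the nan entries
        rows.append(row)
    return rows


def models_not_fitted(row):
    """Names of the llh columns that are nan for this site."""
    return [k for k, v in row.items() if k.startswith("llh(") and math.isnan(v)]


example = """\
Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
1 980552 2737 0.935997 0.937511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
1 1068883 2717 0.999990 0.809715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
"""
rows = read_res(example.splitlines())
for row in rows:
    print(row["Position"], models_not_fitted(row))
```

For a real run, <code>read_res(open("out.res"))</code> would be used in place of the embedded sample.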
| |
|
| |
|
| |
| P-values can be generated by doing a likelihood ratio test between the two desired models.
| |
| An R script, "getPvalues.R", is provided that makes it easy to obtain P-values from the .res file:
| |
|
| |
| <pre>
| |
|
| |
| Rscript R/getPvalues.R out.res
| |
|
| |
| </pre>
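
To illustrate the likelihood ratio test itself, independently of getPvalues.R, here is a minimal Python sketch. It rests on two assumptions that this page does not state. First, the llh columns are taken to be negative log-likelihoods, which the example values suggest because models with more coefficients have smaller values. Second, the degrees of freedom are taken to be the difference in the number of coefficients listed in the header (for example M1 vs. M5: 2, M4 vs. M5: 1). The function name is hypothetical.

```python
import math


def lrt_pvalue(llh_null, llh_alt, df):
    """Likelihood ratio test P-value for df = 1 or df = 2.

    ASSUMPTION: both inputs are negative log-likelihoods, so the nested
    (null) model has the larger value.
    """
    stat = 2.0 * (llh_null - llh_alt)
    stat = max(stat, 0.0)  # guard against tiny negative values from EM tolerance
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
    if df == 2:
        return math.exp(-stat / 2.0)             # chi-square survival function, 2 df
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# Values from the first site of the additive example output.
p_m1_vs_m5 = lrt_pvalue(3243.115326, 3242.214834, df=2)  # M1 (b1, b2) vs. M5 (no effect)
p_m4_vs_m5 = lrt_pvalue(3243.115326, 3243.019888, df=1)  # M4 (b) vs. M5 (no effect)
print(p_m1_vs_m5, p_m4_vs_m5)
```

The closed forms for 1 and 2 degrees of freedom keep the sketch free of external dependencies. For other degrees of freedom, a chi-square survival function from a statistics library would be needed.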
| |
|
| |
| =Models=
| |
|
| |
| =Citation=
| |