| =Download=
| |
| | |
| The program can be downloaded from GitHub:
|
| |
|
| https://github.com/e-jorsboe/asaMap
|
| |
| <pre>
| |
| git clone https://github.com/e-jorsboe/asaMap.git;
| |
| cd asaMap
| |
| make
| |
| </pre>
| |
|
| |
| So far it has only been tested on Linux systems. Use curl if you are on a Mac.
| |
|
| |
| =Example=
| |
|
| |
| This is an example.
| |
|
| |
| =Input Files=
| |
| The input is genotype data in binary PLINK format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. Both can be estimated with [http://software.genetics.ucla.edu/admixture/ ADMIXTURE], whose .Q and .P files, respectively, can be given directly to asaMap.
| |
|
| |
|
| |
| A phenotype file also has to be provided. It should be a plain text file with one line per individual, in the same order as the .fam file:
| |
|
| |
| <pre>
| |
| -0.712027291121767
| |
| -0.158413122435864
| |
| -1.77167888612947
| |
| -0.800940619551485
| |
| 0.3016297021294
| |
| ...
| |
| </pre>
| |
|
| |
| A covariate file can also be provided, in which each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). The file must have the same number of rows as the phenotype file and the .fam file.
| |
|
| |
| <pre>
| |
| 0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
| |
| 0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
| |
| 0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
| |
| 0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
| |
| 0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
| |
| ...
| |
| </pre>
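
The row-count requirement above can be checked before starting a run. The following is a minimal sketch, not part of asaMap: the function name check_rows is hypothetical, and it assumes every input is a plain text file with one row per individual.

```shell
# Hypothetical helper, not part of asaMap.
# Compares the number of rows in each given file with the .fam file.
# Usage: check_rows plinkFile.fam pheno.files [covariate file ...]
check_rows() {
    fam=$1
    shift
    # awk counts the last row even if the file lacks a trailing newline.
    n_fam=$(awk 'END { print NR }' "$fam")
    status=0
    for f in "$@"; do
        n=$(awk 'END { print NR }' "$f")
        if [ "$n" -ne "$n_fam" ]; then
            echo "$f: $n rows, but $fam has $n_fam individuals" >&2
            status=1
        fi
    done
    # Returns 0 only if every file matches the .fam row count.
    return "$status"
}
```

For example, <code>check_rows plinkFile.fam pheno.files "$COV"</code> returns a non-zero status and reports the offending file if any row count differs.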
| |
|
| |
|
| |
| =Running asaMap=
| |
|
| |
| An example of running ADMIXTURE first and then asaMap with covariates included:
| |
|
| |
| <pre>
| |
| #run admixture
| |
| admixture plinkFile.bed 2
| |
|
| |
| #run asaMap with admix proportions
| |
| ./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
| |
| </pre>
| |
|
| |
| This produces a log file, out.log, and a file, out.res, with the results for each site (after filtering).
| |
|
| |
|
| |
| The full list of options is printed when asaMap is run without any arguments:
| |
|
| |
| <pre>
| |
| ./asaMap
| |
| </pre>
| |
|
| |
|
| |
| '''Must be specified:'''
| |
|
| |
| ; -p <filename>
| |
| Prefix of the binary PLINK files, i.e. the filename without the .bed/.fam/.bim suffixes.
| |
| ; -o <filename>
| |
| Output filename. A .res file with the results and a .log log file will be written.
| |
| ; -y <filename>
| |
| Phenotype file. Must be a plain text file with as many rows as the .fam file.
| |
| ; -Q <filename> (either -a or -Q)
| |
| Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
| |
| ; -a <filename> (either -a or -Q)
| |
| Admixture proportions for source pop1, i.e. the first column of the .Q file from ADMIXTURE. Either specify this or -Q.
| |
| ; -f <filename>
| |
| Allele frequencies, .P file from ADMIXTURE.
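
Since -a expects only the first column of the ADMIXTURE .Q file, that column has to be extracted first. A minimal sketch, assuming a whitespace-delimited .Q file; the function name q_first_column and the output filename used below are hypothetical:

```shell
# Hypothetical helper, not part of asaMap.
# Writes the first column of a whitespace-delimited .Q file to a new file.
# Usage: q_first_column <Q file> <output file>
q_first_column() {
    # NF > 0 skips empty lines, so the output keeps one row per individual.
    awk 'NF > 0 { print $1 }' "$1" > "$2"
}
```

For example, <code>q_first_column plinkFile.2.Q plinkFile.2.a</code> writes a file that can then be passed with <code>-a plinkFile.2.a</code> instead of -Q.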
| |
|
| |
|
| |
| '''Optional:'''
| |
|
| |
| ; -c <filename>
| |
| Covariates: a plain text file with one column for each covariate and the same number of rows as the .fam file. It should NOT contain a column of 1s for the intercept; this is added automatically.
| |
| ; -m <INT>
| |
| Model: whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
| |
| ; -l <INT>
| |
| Regression: whether linear or logistic regression should be used. Logistic regression is for binary phenotype data; linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).
| |
| ; -b <filename>
| |
| Text file containing a starting guess of the estimated coefficients.
| |
| ; -i <INT>
| |
| The maximum number of iterations to run for the EM algorithm (default: 80).
| |
| ; -t <FLOAT>
| |
| Tolerance for the change in likelihood between EM iterations; the analysis finishes once the change falls below it (default: 0.0001).
| |
| ; -r <INT>
| |
| Seed for generating the starting values of the coefficients.
| |
| ; -P <INT>
| |
| Number of threads to use for the analysis. Each thread writes to a temporary file in the path specified by "-o".
| |
| ; -e <INT>
| |
| Estimate standard error of coefficients (0: no, 1: yes - default: 0).
| |
| ; -w <INT>
| |
| Run the M0/R0 model, which models the effect of the other allele. Analyses are faster when M0/R0 is not run (0: no, 1: yes - default: 1).
| |
|
| |
| =Outputs=
| |
|
| |
| A .res file is produced with the likelihoods of each model and the estimated coefficients in each model. Here it is shown for the additive model:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
| |
| 1 9855422 1237 0.935997 0.537511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
| |
| 1 10684283 1217 0.999990 0.509715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
| |
| 1 11247763 1237 0.856692 0.78175 3234.025418 3241.930891 3242.902363 3242.561728 3242.820387 3243.028131 -0.048894 0.108007 0.045277 -0.030582 -0.016838
| |
| ...
| |
| </pre>
| |
|
| |
|
| |
| For the recessive model it looks like this:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 llh(R0) llh(R1) llh(R2) llh(R3) llh(R4) llh(R5) llh(R6) llh(R7) b1(R1) b2(R1) bm(R1) b1(R2) b2m(R2) b1m(R3) b2(R3) b1(R4) b2(R5) b(R6)
| |
| 1 9855422 1237 0.935997 0.537511 3236.442376 3241.191367 3242.235364 3241.191468 3243.112239 3241.188747 3242.691370 3243.115326 0.023373 -2.082935 -0.027433 0.016608 -0.582318 0.004700 -2.083112 -0.046849 -2.083275 -0.259338
| |
| 1 10684283 1217 0.999990 0.509715 nan nan nan nan 3215.162291 3215.133559 3214.502575 3215.569371 nan nan nan nan nan nan nan -0.529999 -0.721649 -0.438317
| |
| 1 11247763 1237 0.856692 0.78175 3235.030514 3242.807127 3242.809076 3242.836233 3242.818987 3243.028431 3242.907072 3243.028131 0.064419 -0.047597 -0.004021 0.068119 -0.019760 0.042905 -0.078669 0.060373 -0.018537 0.029227
| |
| ...
| |
| </pre>
| |
|
| |
|
| |
| P-values can be generated with a likelihood ratio test between the two models of interest.
| |
| An R script, "getPvalues.R", is provided that makes it easy to obtain P-values from the .res file:
| |
|
| |
| <pre>
| |
|
| |
| Rscript R/getPvalues.R out.res
| |
|
| |
| </pre>
| |
|
| |
| This produces a file with the suffix .Pvalues:
| |
|
| |
| <pre>
| |
|
| |
| Chromo Position nInd f1 f2 M0vM1 M1vM5 M1vM2 M1vM3 M1vM4 M2vM5 M3vM5 M4vM5
| |
| 1 9855422 1237 0.935997 0.537511 0.630338505521655 0.40636967666779 0.200575362363081 0.274160334109282 0.204476621296224 0.686587953953705 0.436611450245155 0.662188528285713
| |
| 1 10684283 1217 0.99999 0.509715 NA NA NA NA NA NA 0.163577574260359 0.275437296874114
| |
| 1 11247763 1237 0.856692 0.78175 6.99963946833027e-05 0.333791076895669 0.163349235419537 0.261334462945287 0.182273151757048 0.615995603296571 0.334134847663281 0.51919707427275
| |
|
| |
| </pre>
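
The likelihood ratio test behind these P-values can be sketched for a single comparison, M1 against M5. This is an illustration, not the code of getPvalues.R. It rests on two assumptions: that the llh(...) columns hold negative log-likelihoods, and that M1 has two more free parameters than M5 (b1 and b2), so the test statistic is compared with a chi-square distribution with 2 degrees of freedom. The function name lrt_m1_vs_m5 is hypothetical.

```shell
# Hypothetical helper, not part of asaMap: LRT of M1 against M5 from a .res file.
# Assumes llh(...) are negative log-likelihoods and that the test has 2 df.
# Usage: lrt_m1_vs_m5 out.res
lrt_m1_vs_m5() {
    awk '
        NF == 0 { next }
        !seen {
            # Locate the two likelihood columns by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "llh(M1)") m1 = i
                if ($i == "llh(M5)") m5 = i
            }
            if (!m1 || !m5) {
                print "llh(M1) or llh(M5) column not found" > "/dev/stderr"
                exit 1
            }
            print "Chromo", "Position", "M1vM5"
            seen = 1
            next
        }
        # Sites where a model could not be fitted are reported as NA.
        $m1 ~ /nan/ || $m5 ~ /nan/ { print $1, $2, "NA"; next }
        {
            # Test statistic: twice the difference in negative log-likelihood.
            stat = 2 * ($m5 - $m1)
            if (stat < 0) stat = 0
            # Chi-square survival function for 2 df: P(X > stat) = exp(-stat / 2).
            printf "%s %s %.6f\n", $1, $2, exp(-stat / 2)
        }
    ' "$1"
}
```

Applied to the first site of the additive example output, this gives 0.406370, in line with the M1vM5 value listed above. Comparisons with 1 degree of freedom need a different survival function and are not covered by this sketch.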
| |
|
| |
| =Models=
| |
|
| |
| =Citation=
| |