AsaMap: Difference between revisions
Revision as of 18:37, 3 March 2019
Download
The program can be downloaded from github:
https://github.com/e-jorsboe/asaMap
git clone https://github.com/e-jorsboe/asaMap.git; cd asaMap; make
So far it has only been tested on Linux systems. Use curl if you are on a Mac.
Example
This is an example.
Input Files
The input files are genotypes in the binary PLINK (*.bed) format [1], together with estimated admixture proportions and population-specific allele frequencies. ADMIXTURE can be used to estimate the admixture proportions and the population-specific allele frequencies; its .Q and .P files, respectively, can be given directly to asaMap.
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:
-0.712027291121767
-0.158413122435864
-1.77167888612947
-0.800940619551485
0.3016297021294
...
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.
0.0127096117618385   -0.0181281029917176  -0.0616739439849275  -0.0304606694443973
0.0109944672768584   -0.0205785925514037  -0.0547523583405743  -0.0208813157640705
0.0128395346453956   -0.0142116856067135  -0.0471689997039534  -0.0266186436009881
0.00816783754598649  -0.0189271733933446  -0.0302259313905976  -0.0222247658768436
0.00695928218989132  -0.0089960963981644  -0.0384886176827146  -0.012649019770168
...
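The row-count and intercept requirements above can be checked before running asaMap. The following is a minimal sketch, not part of asaMap: the function names and the toy file names and contents are made up for illustration, and the files are assumed to be whitespace-delimited with no header line.

```python
import os
import tempfile


def count_rows(path):
    """Count non-empty lines in a whitespace-delimited text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def constant_one_columns(path):
    """Return indices of columns in which every value equals 1."""
    with open(path) as handle:
        rows = [[float(v) for v in line.split()] for line in handle if line.strip()]
    n_cols = min((len(row) for row in rows), default=0)
    return [j for j in range(n_cols) if all(row[j] == 1.0 for row in rows)]


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems found (empty if the files look consistent)."""
    problems = []
    n_fam = count_rows(fam_path)
    if count_rows(pheno_path) != n_fam:
        problems.append("phenotype file and .fam file differ in number of rows")
    if cov_path is not None:
        if count_rows(cov_path) != n_fam:
            problems.append("covariate file and .fam file differ in number of rows")
        if constant_one_columns(cov_path):
            problems.append("covariate file contains a column of 1s")
    return problems


def _write(directory, name, text):
    """Write a small text file and return its path (helper for the demo)."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    return path


# Tiny made-up files, only to exercise the checks.
with tempfile.TemporaryDirectory() as tmp:
    fam = _write(tmp, "toy.fam", "f1 i1 0 0 1 -9\nf2 i2 0 0 2 -9\n")
    pheno = _write(tmp, "pheno.txt", "-0.71\n-0.16\n")
    cov_ok = _write(tmp, "cov_ok.txt", "0.01 -0.02\n0.01 -0.02\n")
    cov_bad = _write(tmp, "cov_bad.txt", "1 0.01\n1 0.02\n1 0.03\n")
    ok_result = check_inputs(fam, pheno, cov_ok)
    bad_result = check_inputs(fam, pheno, cov_bad)

print(ok_result)
print(bad_result)
```

Here the second covariate file is reported twice: once for having three rows against two individuals, and once for its leading column of 1s.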
Running asaMap
Example commands that first run ADMIXTURE and then run asaMap with covariates included:
# run ADMIXTURE
admixture plinkFile.bed 2
# run asaMap with the admixture proportions
./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
This produces an out.log log file and an out.res file with the results for each site (after filtering).
The full list of options is printed when asaMap is run without any arguments:
./asaMap
Must be specified:
- -p <filename>
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.
- -o <filename>
Output filename - a .res file will be written with the results and a .log log file.
- -y <filename>
Phenotypes file, has to be plain text file - with as many rows as .fam file.
- -Q <filename> (either -a or -Q)
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
- -a <filename> (either -a or -Q)
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.
- -f <filename>
Allele frequencies, .P file from ADMIXTURE.
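For the -a option, the first column of the .Q file has to be written to its own file. A minimal sketch of that extraction follows; it assumes that the -a file holds one value per line in the same order as the .Q file (the exact layout asaMap expects is not specified here), and the function name and toy values are hypothetical.

```python
def first_q_column(q_text):
    """Return the first whitespace-separated value of each non-empty line."""
    return [line.split()[0] for line in q_text.splitlines() if line.strip()]


# Made-up two-population .Q content; each row sums to 1.
toy_q = "0.25 0.75\n0.90 0.10\n0.50 0.50\n"

pop1 = first_q_column(toy_q)
# Assumed layout of the -a file: one proportion per line.
a_file_text = "\n".join(pop1) + "\n"
print(a_file_text, end="")
```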
Optional:
- -c <filename>
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for the intercept); IT WILL BE ADDED AUTOMATICALLY!
- -m <INT>
Model, whether an additive or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
- -l <INT>
Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).
- -b <filename>
Text file containing a starting guess of the estimated coefficients.
- -i <INT>
The maximum number of iterations to run for the EM algorithm (default: 80).
- -t <FLOAT>
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).
- -r <INT>
Give seed, for generation of starting values of coefficients.
- -P <INT>
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by "-o".
- -e <INT>
Estimate standard error of coefficients (0: no, 1: yes - default: 0).
- -w <INT>
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)
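When asaMap is launched from a script, the rule that exactly one of -Q and -a is given can be checked while the argument list is assembled. The sketch below is one possible wrapper, not something shipped with asaMap: the function name and its keyword arguments are hypothetical, only flags listed above are used, and cov.txt is a made-up file name.

```python
def build_asamap_command(plink_prefix, out, pheno, freqs, q_file=None,
                         a_file=None, cov=None, model=0, regression=0,
                         threads=None):
    """Assemble an asaMap argument list from the options described above."""
    # The option list says to give either -Q or -a, so require exactly one.
    if (q_file is None) == (a_file is None):
        raise ValueError("give exactly one of q_file (-Q) or a_file (-a)")
    if model not in (0, 1):
        raise ValueError("model (-m) must be 0 (additive) or 1 (recessive)")
    if regression not in (0, 1):
        raise ValueError("regression (-l) must be 0 (linear) or 1 (logistic)")

    cmd = ["./asaMap", "-p", plink_prefix, "-o", out, "-y", pheno, "-f", freqs]
    cmd += ["-Q", q_file] if q_file is not None else ["-a", a_file]
    if cov is not None:
        cmd += ["-c", cov]
    cmd += ["-m", str(model), "-l", str(regression)]
    if threads is not None:
        cmd += ["-P", str(threads)]
    return cmd


example_cmd = build_asamap_command(
    "plinkFile", "out", "pheno.files", "plinkFile.2.P",
    q_file="plinkFile.2.Q", cov="cov.txt",
)
print(" ".join(example_cmd))
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting issues with file names.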
Outputs
A .res file with the likelihoods of each model and the estimated coefficients of each model is produced, shown here for the additive model:
Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)
1       9855422   1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093
1       10684283  1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084
1       11247763  1237  0.856692  0.78175   3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838
...
For the recessive model it looks like this:
Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)
1       9855422   1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338
1       10684283  1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317
1       11247763  1237  0.856692  0.78175   3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227
...
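For downstream scripting, the .res file can be read as a whitespace-delimited table with one header line. The sketch below is a minimal illustration with hypothetical function names; it uses the header and the first two rows of the additive example above, and lists the models whose likelihood is reported as nan at a site.

```python
import math


def parse_res(text):
    """Parse .res content into a list of dicts keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    sites = []
    for line in lines[1:]:
        fields = line.split()
        site = {"Chromo": fields[0],
                "Position": int(fields[1]),
                "nInd": int(fields[2])}
        for name, value in zip(header[3:], fields[3:]):
            site[name] = float(value)  # float("nan") handles the nan entries
        sites.append(site)
    return sites


def nan_likelihoods(site):
    """Return the llh(...) columns that are nan for this site."""
    return [key for key, value in site.items()
            if key.startswith("llh(") and math.isnan(value)]


res_text = """\
Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
1 9855422 1237 0.935997 0.537511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
1 10684283 1217 0.999990 0.509715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
"""

sites = parse_res(res_text)
for site in sites:
    print(site["Position"], nan_likelihoods(site))
```

The same function works for the recessive output, since it takes the column names from the header rather than hard-coding them.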
P-values can be obtained with a likelihood ratio test between the two models of interest.
An R script, "getPvalues.R", is provided that makes it easy to obtain P-values from the .res file:
Rscript R/getPvalues.R out.res
Which produces a file with the suffix .Pvalues:
Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5
1       9855422   1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713
1       10684283  1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114
1       11247763  1237  0.856692  0.78175   6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275
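The likelihood ratio test itself can be sketched in a few lines. This is an illustration of the calculation, not the code of getPvalues.R, and it rests on two assumptions: the llh columns are treated as negative log-likelihoods (the model without effect parameters, M5, has the largest value in the example), and the degrees of freedom are taken as the number of coefficients by which the two models differ (2 for M1 vs M5, 1 for M1 vs M2 and for M0 vs M1). With these assumptions the values from the additive example agree closely with the M1vM5, M1vM2 and M0vM1 columns shown above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")


def lrt_pvalue(llh_full, llh_nested, df):
    """P-value of a likelihood ratio test between two nested models.

    Both arguments are assumed to be negative log-likelihoods, so the
    larger model has the smaller value.
    """
    stat = 2.0 * (llh_nested - llh_full)
    # Clamp tiny negative statistics that can arise from numerical noise.
    return chi2_sf(max(stat, 0.0), df)


# Site 9855422: llh(M1), llh(M5) and llh(M2) from the additive example.
p_m1_vs_m5 = lrt_pvalue(3242.214834, 3243.115326, df=2)
p_m1_vs_m2 = lrt_pvalue(3242.214834, 3243.033924, df=1)
# Site 11247763: llh(M0) and llh(M1).
p_m0_vs_m1 = lrt_pvalue(3234.025418, 3241.930891, df=1)

print(p_m1_vs_m5, p_m1_vs_m2, p_m0_vs_m1)
```

Only the standard library is used here; for other degrees of freedom a general chi-square survival function (for example pchisq in R with lower.tail = FALSE) would be needed.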