ANGSD: Analysis of next generation Sequencing Data

The latest tar.gz version is 0.938 (0.939 on GitHub); see Change_log for changes, and download it here.

Fasta

Revision as of 22:05, 27 November 2013

Available from version 0.559+.

This option creates a fasta file from a sequencing data file (BAM file). It uses the reference information in the BAM header to determine the chromosome names and lengths. An "N" is written for sites without data.
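The output layout described above can be illustrated with a minimal Python sketch. It is not ANGSD's implementation: the list of `(name, length)` pairs stands in for the chromosome names and lengths read from the BAM header, the per-chromosome dictionaries of called bases and the functions `fasta_record` and `write_fasta` are hypothetical, and the 60-column line width is an arbitrary choice that may differ from what angsd writes.

```python
# Illustrative sketch only, not ANGSD's code.
# Assumptions: chromosome names/lengths come as (name, length) pairs in header
# order; called bases are given per chromosome as {0-based position: base};
# the default line width of 60 is an arbitrary choice.


def fasta_record(name, length, calls, width=60):
    """Return one fasta record; every position without a called base becomes 'N'."""
    seq = "".join(calls.get(pos, "N") for pos in range(length))
    lines = [">" + name]
    # Wrap the sequence into lines of at most `width` characters.
    lines += [seq[i:i + width] for i in range(0, length, width)]
    return "\n".join(lines) + "\n"


def write_fasta(header, calls_by_chrom, width=60):
    """Build the whole fasta text, one record per (name, length) header entry."""
    return "".join(
        fasta_record(name, length, calls_by_chrom.get(name, {}), width)
        for name, length in header
    )


if __name__ == "__main__":
    # Hypothetical header with two short chromosomes and a few called sites.
    header = [("chrA", 10), ("chrB", 4)]
    calls = {"chrA": {0: "A", 1: "C", 5: "T"}}
    print(write_fasta(header, calls, width=8), end="")
```

Because every chromosome listed in the header gets a record of its full length, a chromosome with no data at all (chrB above) is written entirely as "N".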

<classdiagram type="dir:LR">

[Single BAM file{bg:orange}]->[Sequence data|Random base (-doFasta 1);Consensus base (-doFasta 2)]

[Sequence data]->doFasta[fasta file{bg:blue}]

</classdiagram>

Brief Overview

> ./angsd -doFasta
--------------
analysisFasta.cpp:
	-doFasta	0
	1: use a random base
	2: use the most common base (needs -doCounts 1)
	-minQ		13	(remove bases with qscore<minQ)

Options

-doFasta 1
Sample a random base at each position.
-doFasta 2
Use the most common base. In the case of ties, a random base is chosen among the bases with the same maximum count. The allele counts from "-doCounts 1" are needed to determine the most common base.
-minQ [INT]
Minimum base quality score; bases with a quality score below this value are removed (default: 13).
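The per-site choice made by the two -doFasta modes, together with the -minQ filter, can be sketched in Python as follows. This is an illustration under stated assumptions rather than ANGSD's code: the helper names `filter_bases` and `pick_base` are hypothetical, the input is assumed to be the bases (A/C/G/T) and quality scores of the reads covering one site, -doFasta 1 is assumed to draw one read's base uniformly at random, and a site with no bases left after filtering gets an "N".

```python
import random
from collections import Counter


def filter_bases(bases, quals, min_q=13):
    """Keep only bases with quality >= min_q (bases with qscore < minQ are removed)."""
    return [b for b, q in zip(bases, quals) if q >= min_q]


def pick_base(bases, quals, do_fasta, min_q=13, rng=random):
    """Hypothetical per-site base choice for -doFasta 1 or 2."""
    kept = filter_bases(bases, quals, min_q)
    if not kept:
        return "N"  # no usable data at this site
    if do_fasta == 1:
        # Assumption: the base of one read, drawn uniformly at random.
        return rng.choice(kept)
    if do_fasta == 2:
        # Count alleles (the role of -doCounts 1), then break ties at random.
        counts = Counter(kept)
        top = max(counts.values())
        tied = sorted(b for b, c in counts.items() if c == top)
        return rng.choice(tied)
    raise ValueError("do_fasta must be 1 or 2")
```

For example, with bases A, A, C, G and quality scores 30, 30, 20, 5, the default cutoff of 13 removes the G, and `pick_base(list("AACG"), [30, 30, 20, 5], do_fasta=2)` returns "A". Passing a seeded `random.Random` as `rng` makes the random choices reproducible.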


Example

Create a fasta file by sampling a random base at each position.

./angsd -i smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1