RefFinder

A small, fast C program to extract bases from a FASTA file. Download here [1]

The program can be run as a standalone tool, or its API can be used to retrieve reference bases from your own code.

Install

wget http://popgen.dk/software/download/refFinder/refFinder.tar.gz
tar xf refFinder.tar.gz
cd refFinder/
make
cd ..

Stand alone

Example

Generate a file with chromosome, position and reference base columns using samtools:

samtools mpileup -b smallBam.filelist -f /space/genomes/refgenomes/hg19/merged/hg19NoChr.fa |cut -f1-3 >small.sam

Use refFinder to find the bases for each position in small.sam

cut -f1-2 ../angsd/test/small.sam |./refFinder /space/genomes/refgenomes/hg19/merged/hg19NoChr.fa full >tst
cmp tst ../angsd/test/small.sam

Possible options are:

inputIsZero
full

These are flags given after the FASTA file name. Without any flags, only the bases are printed:

cut -f1-2 ../angsd/test/small.sam |./refFinder /space/genomes/refgenomes/hg19/merged/hg19NoChr.fa |head
a
g
c
t
a
c
t
c
g
g

Or, if we also want the chromosome and position:

cut -f1-2 ../angsd/test/small.sam |./refFinder /space/genomes/refgenomes/hg19/merged/hg19NoChr.fa full |head
1	13999902	  a
1	13999903	  g
1	13999904	  c
1	13999905	  t
1	13999906	  a
1	13999907	  c
1	13999908	  t
1	13999909	  c
1	13999910	  g
1	13999911	  g

Or, if the input positions are zero-indexed rather than one-indexed:

cut -f1-2 ../angsd/test/small.sam |./refFinder /space/genomes/refgenomes/hg19/merged/hg19NoChr.fa full inputIsZero |head
1	13999902	 g
1	13999903	 c
1	13999904	 t
1	13999905	 a
1	13999906	 c
1	13999907	 t
1	13999908	 c
1	13999909	 g
1	13999910	 g
1	13999911	 g
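
The two outputs above are consistent with inputIsZero shifting each input position by one before the lookup. A minimal C++ sketch of that interpretation follows. It is not refFinder's code, and the names window, windowStart and baseAt are made up for illustration. The toy sequence is the eleven bases that the outputs above imply for chromosome 1, one-indexed positions 13999902 to 13999912.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a reference sequence (illustrative only, not refFinder data
// structures): bases for chromosome 1, one-indexed positions 13999902..13999912,
// as read off the example outputs on this page.
const long windowStart = 13999902;        // one-indexed position of window[0]
const std::string window = "agctactcggg";

// Return the base at `pos`. When inputIsZero is true, `pos` is treated as
// zero-indexed and converted to one-indexed before the lookup.
char baseAt(long pos, bool inputIsZero) {
    long oneIndexed = inputIsZero ? pos + 1 : pos;
    long offset = oneIndexed - windowStart;
    // The toy window only covers eleven positions.
    assert(offset >= 0 && offset < static_cast<long>(window.size()));
    return window[offset];
}
```

With this sketch, baseAt(13999902, false) gives the first base of the one-indexed output and baseAt(13999902, true) gives the first base of the zero-indexed output.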

API


#include "refFinder.h"
perFasta *pf = init("hg19.fa");
char refbase = getchar("chr20",130224101,pf);
//refbase now contains the reference base for chr20 at position 130,224,101
destroy(pf);

Remember to link with refFinder.o and -lz

g++ sampleProg.cpp refFinder.o -lz

Bugs

  1. No check is done for whether the reference file exists.
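
Until that check exists, code that uses the API can guard the call to init itself. The following is a minimal sketch using only the C++ standard library. fileIsReadable is a hypothetical helper name and is not part of refFinder.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of refFinder): returns true if `path`
// can be opened for reading, false otherwise.
bool fileIsReadable(const std::string &path) {
    std::ifstream in(path);
    return in.good();
}
```

A caller would test fileIsReadable("hg19.fa") first and only call init("hg19.fa") when it returns true, printing an error message otherwise.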