Re: [Seqan-dev] createQGramIndex example

"Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de> · Tue, 27 Apr 2010 15:59:10 +0200

Hi, sorry for taking so long to answer. The delayed answer has the "advantage" that there is more documentation than a month ago :) If you have any more questions, feel free to ask them here. We will try to be more responsive in the future.

>   I would like to use SeqAn to create a qgram/kmer index of the mouse genome and would like to have control over the kmer and step sizes.  I have tried combining bits of example code on the website to do this, but either cannot get it to compile or it segfaults.
> 
>   Is there a clear working example (perhaps I've missed it) of creating a qgram index from a genome StringSet and then using it to seed alignments from say a FASTQ file?

Did you try looking into the source of our read mapper RazerS? It builds a q-gram index of the genome for its verification step.
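
In case it helps to see the idea separately from the library, here is a minimal sketch of a q-gram index with a configurable q-gram length and step size, written in plain standard C++. It is not SeqAn's API and not how RazerS does it: the names buildQGramIndex, findSeeds and Seed are made up for this illustration, and a hash map of strings is only reasonable for small inputs, not for a whole mouse genome.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One occurrence of a q-gram: (sequence number, offset in that sequence).
using Occurrence = std::pair<std::size_t, std::size_t>;
// Hypothetical index type: q-gram text -> list of occurrences.
using QGramIndex = std::unordered_map<std::string, std::vector<Occurrence>>;

// Index the q-grams of length q that start at offsets 0, step, 2*step, ...
// in every sequence of the set.
QGramIndex buildQGramIndex(const std::vector<std::string>& seqs,
                           std::size_t q, std::size_t step)
{
    assert(q > 0 && step > 0);
    QGramIndex index;
    for (std::size_t s = 0; s < seqs.size(); ++s)
        for (std::size_t pos = 0; pos + q <= seqs[s].size(); pos += step)
            index[seqs[s].substr(pos, q)].emplace_back(s, pos);
    return index;
}

// A seed: the q-gram at readPos in the read also occurs at seqPos in
// sequence seqNo.
struct Seed
{
    std::size_t seqNo;
    std::size_t seqPos;
    std::size_t readPos;
};

// Look up every q-gram of the read (step 1 on the read side) in the index.
std::vector<Seed> findSeeds(const QGramIndex& index,
                            const std::string& read, std::size_t q)
{
    std::vector<Seed> seeds;
    for (std::size_t readPos = 0; readPos + q <= read.size(); ++readPos)
    {
        auto it = index.find(read.substr(readPos, q));
        if (it == index.end())
            continue;
        for (const Occurrence& occ : it->second)
            seeds.push_back(Seed{occ.first, occ.second, readPos});
    }
    return seeds;
}
```

With a step larger than 1 only every step-th genome position is stored, so the read has to be queried at every offset (as above) to still hit one of the sampled positions.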

Maybe also have a look at the current "bleeding edge" trunk version of SeqAn. The documentation for the new version (which is more comprehensive than the one in the last release) can be found here:

http://www.seqan.de/dddoc/html_devel/

Also, there are various demos in the projects/library/demos/ folder that you might want to check out. There is also a brand-new tutorial covering various aspects of SeqAn:

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Indices

We'd be happy to get your feedback.

>   That said, as has been noted in an earlier post to this list, I find that the readMeta function does not work with FASTQ.  What idiom should be followed for extracting ids, sequences and qualities from a FASTQ file?

Maybe have a look at:

http://trac.mi.fu-berlin.de/seqan/wiki/HowTo/EfficientImportOfMillionsOfSequences
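
Independent of the approach described on that page, extracting ids, sequences and qualities from a simple FASTQ file can be sketched in plain standard C++ as below. This is only an illustration, not the SeqAn idiom: FastqRecord, readFastq and parseFastq are hypothetical names, and the code assumes the common four-line layout (no wrapped sequence or quality lines) with Unix line endings.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One FASTQ entry; id is the header line without the leading '@'.
struct FastqRecord
{
    std::string id;
    std::string seq;
    std::string qual;
};

// Read four-line FASTQ records: "@id", sequence, "+...", qualities.
// Throws std::runtime_error on a record that does not fit that layout.
std::vector<FastqRecord> readFastq(std::istream& in)
{
    std::vector<FastqRecord> records;
    std::string header, seq, plus, qual;
    while (std::getline(in, header))
    {
        if (header.empty())
            continue;  // tolerate blank lines between records
        const bool ok = header[0] == '@'
                     && std::getline(in, seq)
                     && std::getline(in, plus)
                     && std::getline(in, qual)
                     && !plus.empty() && plus[0] == '+'
                     && seq.size() == qual.size();
        if (!ok)
            throw std::runtime_error("malformed FASTQ record");
        records.push_back(FastqRecord{header.substr(1), seq, qual});
    }
    return records;
}

// Convenience wrapper for parsing FASTQ text that is already in memory.
std::vector<FastqRecord> parseFastq(const std::string& text)
{
    std::istringstream in(text);
    return readFastq(in);
}
```

For a file on disk, the same readFastq function can be handed a std::ifstream.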

Bests,
Manuel