From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Date: Tue, 27 Apr 2010 15:59:10 +0200
Subject: Re: [Seqan-dev] createQGramIndex example

Hi, sorry for taking so long to answer. The delayed answer has the
"advantage" that there is more documentation than a month ago :) If you
have any more questions, feel free to ask them here. We will try to be
more responsive in the future.

> I would like to use SeqAn to create a qgram/kmer index of the mouse
> genome and would like to have control over the kmer and step sizes.
> I have tried combining bits of example code on the website to do
> this, but either cannot get it to compile or cause segfaults.
>
> Is there a clear working example (perhaps I've missed it) of creating
> a qgram index from a genome StringSet and then using it to seed
> alignments from, say, a FASTQ file?

Did you try looking into the source of our read mapper RazerS? It builds
a q-gram index of the genome for its verification step.
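
To make the idea of a q-gram index with a configurable q and step size
concrete, here is a minimal sketch in plain C++ using only the standard
library. It is deliberately NOT SeqAn code and not how RazerS does it:
the names buildQGramIndex, findSeeds, QGramHit and Seed are hypothetical
and only illustrate the technique (store every q-gram that starts at a
multiple of the step size, then look up each q-gram of a read).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// One occurrence of a q-gram: (sequence number, offset in that sequence).
typedef std::pair<std::size_t, std::size_t> QGramHit;
typedef std::unordered_map<std::string, std::vector<QGramHit> > QGramIndex;

// Hypothetical helper (not a SeqAn function): index every q-gram of
// length q that starts at offset 0, step, 2*step, ... of each sequence.
QGramIndex buildQGramIndex(const std::vector<std::string>& genome,
                           std::size_t q, std::size_t step)
{
    assert(q > 0 && step > 0);
    QGramIndex index;
    for (std::size_t s = 0; s < genome.size(); ++s) {
        const std::string& seq = genome[s];
        for (std::size_t pos = 0; pos + q <= seq.size(); pos += step)
            index[seq.substr(pos, q)].push_back(QGramHit(s, pos));
    }
    return index;
}

// A seed: the q-gram at readPos in the read also occurs at genomePos
// in genome sequence seqNo.
struct Seed {
    std::size_t seqNo;
    std::size_t genomePos;
    std::size_t readPos;
};

// Hypothetical helper: look up every q-gram of the read (step 1 on the
// read side, so that sampled genome positions can still be hit).
std::vector<Seed> findSeeds(const QGramIndex& index,
                            const std::string& read, std::size_t q)
{
    assert(q > 0);
    std::vector<Seed> seeds;
    for (std::size_t readPos = 0; readPos + q <= read.size(); ++readPos) {
        QGramIndex::const_iterator it = index.find(read.substr(readPos, q));
        if (it == index.end())
            continue;
        for (std::size_t i = 0; i < it->second.size(); ++i) {
            Seed seed = { it->second[i].first, it->second[i].second, readPos };
            seeds.push_back(seed);
        }
    }
    return seeds;
}
```

A hash map with string keys is only meant to show the logic; for
something the size of the mouse genome it would likely need far too much
memory, which is one reason to use a dedicated index structure instead.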

Maybe also have a look at the current "bleeding edge" trunk version of
SeqAn. The documentation for the new version (which is more
comprehensive than the one in the last release) can be found here:

http://www.seqan.de/dddoc/html_devel/

Also, there are various demos in the projects/library/demos/ folder you
might want to check out. There is also a brand new tutorial covering
various aspects of SeqAn:

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Indices

We'd be happy to get your feedback.

> That said, as has been noted in an earlier post to this list, I find
> that the readMeta function does not work with FASTQ. What idiom
> should be followed for extracting ids, sequences and qualities from a
> FASTQ file?

Maybe have a look at:

http://trac.mi.fu-berlin.de/seqan/wiki/HowTo/EfficientImportOfMillionsOfSequences
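
Independent of SeqAn, the basic shape of pulling ids, sequences and
qualities out of a FASTQ file can be sketched with the C++ standard
library alone. This is only an illustration and not the approach from
the HowTo page: FastqRecord, readFastqRecord and parseFastqString are
hypothetical names, and the sketch assumes the common four-line layout
(@id, sequence, + line, qualities) without wrapped lines and with Unix
line endings.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One FASTQ entry. Hypothetical type for illustration, not a SeqAn class.
struct FastqRecord {
    std::string id;    // header line without the leading '@'
    std::string seq;   // sequence characters
    std::string qual;  // quality characters, same length as seq
};

// Read one four-line record. Returns false at end of input or if the
// record does not have the assumed four-line layout.
bool readFastqRecord(std::istream& in, FastqRecord& rec)
{
    std::string header, plus;
    if (!std::getline(in, header) || header.empty() || header[0] != '@')
        return false;
    if (!std::getline(in, rec.seq))
        return false;
    if (!std::getline(in, plus) || plus.empty() || plus[0] != '+')
        return false;
    if (!std::getline(in, rec.qual))
        return false;
    if (rec.qual.size() != rec.seq.size())
        return false;
    rec.id = header.substr(1);
    return true;
}

// Convenience wrapper: parse all leading well-formed records from a string.
std::vector<FastqRecord> parseFastqString(const std::string& text)
{
    std::istringstream in(text);
    std::vector<FastqRecord> records;
    FastqRecord rec;
    while (readFastqRecord(in, rec))
        records.push_back(rec);
    return records;
}
```

For a real file you would pass a std::ifstream to readFastqRecord in a
loop instead of going through a string.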

Best,
Manuel