From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Date: Tue, 27 Apr 2010 15:59:10 +0200
Subject: Re: [Seqan-dev] createQGramIndex example

Hi,

sorry for taking so long to answer. The delayed answer has the "advantage" that there is more documentation than a month ago :) If you have any more questions, feel free to ask them here. We will try to be more responsive in the future.

> I would like to use SeqAn to create a qgram/kmer index of the mouse genome and would like to have control over the kmer and step sizes. I have tried combining bits of example code on the website to do this, but either cannot get it to compile or cause segfaults.
>
> Is there a clear working example (perhaps I've missed it) of creating a qgram index from a genome StringSet and then using it to seed alignments from, say, a FASTQ file?

Did you try looking into the source of our read mapper RazerS? It builds a q-gram index of the genome for its verification step.

Maybe also have a look at the current "bleeding edge" trunk version of SeqAn. The documentation for the new version (which is more comprehensive than the one in the last release) can be found here:

http://www.seqan.de/dddoc/html_devel/

Also, there are various demos in the projects/library/demos/ folder you might want to check out. There is also a brand new tutorial on various aspects of SeqAn:

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Indices

We'd be happy to get your feedback.

> That said, as has been noted in an earlier post to this list, I find that the readMeta function does not work with FASTQ. What idiom should be followed for extracting ids, sequences and qualities from a FASTQ file?

Maybe have a look at:

http://trac.mi.fu-berlin.de/seqan/wiki/HowTo/EfficientImportOfMillionsOfSequences

Bests,
Manuel
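
To make concrete what a q-gram index with a configurable q and step size stores, and how it yields seeds for a read, here is a minimal sketch in plain C++. It deliberately does not use SeqAn, so it makes no claim about how SeqAn or RazerS implement this. The names QGramIndex, SeedHit, buildQGramIndex and findSeeds are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for this sketch (not SeqAn's API).
// A position is (sequence id, offset within that sequence).
using Position = std::pair<std::size_t, std::size_t>;
// The index maps each q-gram to all positions where it was sampled.
using QGramIndex = std::unordered_map<std::string, std::vector<Position>>;

// Index the q-grams starting at offsets 0, step, 2*step, ... of each sequence.
QGramIndex buildQGramIndex(const std::vector<std::string>& genome,
                           std::size_t q, std::size_t step)
{
    QGramIndex index;
    if (q == 0 || step == 0)
        return index;  // nothing sensible to index
    for (std::size_t seqId = 0; seqId < genome.size(); ++seqId) {
        const std::string& seq = genome[seqId];
        for (std::size_t pos = 0; pos + q <= seq.size(); pos += step)
            index[seq.substr(pos, q)].push_back({seqId, pos});
    }
    return index;
}

// One seed: a q-gram of the read that also occurs in the index.
struct SeedHit {
    std::size_t readOffset;  // where the q-gram starts in the read
    std::size_t seqId;       // which genome sequence it hits
    std::size_t seqOffset;   // where it starts in that sequence
};

// Look up every q-gram of the read (step 1 on the read side) in the index.
std::vector<SeedHit> findSeeds(const QGramIndex& index,
                               const std::string& read, std::size_t q)
{
    std::vector<SeedHit> hits;
    if (q == 0)
        return hits;
    for (std::size_t i = 0; i + q <= read.size(); ++i) {
        auto it = index.find(read.substr(i, q));
        if (it == index.end())
            continue;
        for (const Position& p : it->second)
            hits.push_back({i, p.first, p.second});
    }
    return hits;
}
```

With a step larger than 1, only a sample of the genome's q-grams is stored, which is why the sketch slides over the read with step 1: otherwise a read could miss the sampled offsets entirely. Storing q-grams as strings in a hash map is chosen here for readability only; for something the size of the mouse genome one would want a far more compact encoding.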