From fheeger@mi.fu-berlin.de Sun May 02 11:21:15 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8VMQ-0008F3-DX>; Sun, 02 May 2010 11:21:14 +0200 Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8VMQ-00011E-Bp>; Sun, 02 May 2010 11:21:14 +0200 Received: from dslb-092-078-133-078.pools.arcor-ip.net ([92.78.133.78] helo=[192.168.2.101]) by inpost2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtpsa (envelope-from ) id <1O8VMP-0001cn-Ll>; Sun, 02 May 2010 11:21:14 +0200 From: Felix Heeger To: seqan mailingliste Content-Type: text/plain Date: Sun, 02 May 2010 11:21:10 +0200 Message-Id: <1272792070.4067.9.camel@NX> Mime-Version: 1.0 X-Mailer: Evolution 2.26.1 Content-Transfer-Encoding: 7bit X-Originating-IP: 92.78.133.78 X-purgate: clean X-purgate-ID: 151147::1272792074-00000DDD-5DC6618E/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: [Seqan-dev] getting started X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 May 2010 09:21:15 -0000 hello all, I tried to build seqAn yesterday from the svn and encountered some problems with missing files. After a night of sleep I remembered from my seqAn lessons during the software project last summer that sometimes the initial checkout of seqAn will skip some files (for no reason if I remember correctly). 
So I did a "svn up", got some more files and now I can build seqAn (micro_razers still causes problems). Maybe this issue could be added to the "getting started" page of the trac wiki. Greetings, Felix Heeger From manuel.holtgrewe@fu-berlin.de Sun May 02 14:05:41 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8XvY-0004mz-2K>; Sun, 02 May 2010 14:05:40 +0200 Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1O8XvY-0005tO-0e>; Sun, 02 May 2010 14:05:40 +0200 Received: from g225139025.adsl.alicedsl.de ([92.225.139.25] helo=[192.168.1.101]) by inpost2.zedat.fu-berlin.de (Exim 4.69) with esmtpsa (envelope-from ) id <1O8XvX-0000Qk-Sc>; Sun, 02 May 2010 14:05:40 +0200 Message-Id: From: Manuel Holtgrewe To: SeqAn Development In-Reply-To: <1272792070.4067.9.camel@NX> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v936) Date: Sun, 2 May 2010 14:05:38 +0200 References: <1272792070.4067.9.camel@NX> X-Mailer: Apple Mail (2.936) X-Originating-IP: 92.225.139.25 X-purgate: clean X-purgate-ID: 151147::1272801940-00000DDD-CEB5CD9D/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Botsuana.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] getting started X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 May 2010 12:05:42 -0000 Thanks for reporting this. I just added a note to the getting started page. 
On 02.05.2010 at 11:21, Felix Heeger wrote: > hello all, > > I tried to build seqAn yesterday from the svn and encountered some > problems with missing files. After a night of sleep I remembered > from my > seqAn lessons during the software project last summer that sometimes > the > initial checkout of seqAn will skip some files (for no reason if I > remember correctly). So I did a "svn up", got some more files and > now I > can build seqAn (micro_razers still causes problems). > > Maybe this issue could be added to the "getting started" page of the > trac wiki. > > Greetings, > Felix Heeger > > > _______________________________________________ > seqan-dev mailing list > seqan-dev@lists.fu-berlin.de > https://lists.fu-berlin.de/listinfo/seqan-dev From manuel.holtgrewe@fu-berlin.de Sun May 02 14:42:45 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8YVQ-0005xh-UO>; Sun, 02 May 2010 14:42:45 +0200 Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1O8YVQ-0001QA-Se>; Sun, 02 May 2010 14:42:44 +0200 Received: from g225139025.adsl.alicedsl.de ([92.225.139.25] helo=[192.168.1.101]) by inpost2.zedat.fu-berlin.de (Exim 4.69) with esmtpsa (envelope-from ) id <1O8YVQ-00021N-OW>; Sun, 02 May 2010 14:42:44 +0200 Message-Id: <475830E5-A93F-4D02-8542-DEB464621D3C@fu-berlin.de> From: Manuel Holtgrewe To: SeqAn Development In-Reply-To: <1272792070.4067.9.camel@NX> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v936) Date: Sun, 2 May 2010 14:42:43 +0200 References: <1272792070.4067.9.camel@NX> X-Mailer: Apple Mail (2.936) X-Originating-IP: 92.225.139.25 X-purgate: clean X-purgate-ID: 151147::1272804164-00000DDD-858D1DA4/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000007, 
version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Gabun.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] getting started X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 02 May 2010 12:42:45 -0000 > remember correctly). So I did a "svn up", got some more files and > now I > can build seqAn (micro_razers still causes problems). Yes, the current trunk is broken. We will fix this shortly. *m From manuel.holtgrewe@fu-berlin.de Mon May 03 14:12:46 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8uVx-0002yZ-R6>; Mon, 03 May 2010 14:12:45 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8uVx-0008Jp-PI>; Mon, 03 May 2010 14:12:45 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1O8uVx-0007Is-Ja>; Mon, 03 May 2010 14:12:45 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Mon, 3 May 2010 14:12:45 +0200 From: "Holtgrewe, Manuel" To: "Holtgrewe, Manuel" Date: Mon, 3 May 2010 14:12:44 +0200 Thread-Topic: [Seqan-dev] getting started Thread-Index: Acrque/SAaVioCOyT06fdyFFBk0+yA== Message-ID: <09CC4D92-2FE0-4BA0-9E05-6DCE4B0000AF@fu-berlin.de> References: <1272792070.4067.9.camel@NX> In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" 
Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-ID: 151147::1272888765-00000DDD-B3B99ECA/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.067077, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Cc: SeqAn Development Subject: Re: [Seqan-dev] getting started X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 May 2010 12:12:46 -0000 >> can build seqAn (micro_razers still causes problems). This should be fixed now. From f.buske@uq.edu.au Fri May 14 09:03:36 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OCovm-0003LO-Uo>; Fri, 14 May 2010 09:03:35 +0200 Received: from mailhub4.uq.edu.au ([130.102.149.131]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OCovm-0005zs-55>; Fri, 14 May 2010 09:03:34 +0200 Received: from smtp3.uq.edu.au (smtp3.uq.edu.au [130.102.128.18]) by mailhub4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4E73VFe006550 for ; Fri, 14 May 2010 17:03:31 +1000 Received: from tlb-sumo.imb.uq.edu.au (tlb-sumo.imb.uq.edu.au [130.102.118.111]) (authenticated bits=0) by smtp3.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4E73UHN023334 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Fri, 14 May 2010 17:03:31 +1000 Message-ID: <4BECF5C3.6040600@uq.edu.au> Date: Fri, 14 May 2010 17:03:31 +1000 From: Fabian Buske User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228) MIME-Version: 1.0 To: seqan-dev@lists.fu-berlin.de Content-Type: text/plain; charset=ISO-8859-1; format=flowed 
Content-Transfer-Encoding: 7bit X-UQ-FilterTime: 1273820611 X-Scanned-By: MIMEDefang 2.58 on UQ Mailhub on 130.102.149.131 X-Originating-IP: 130.102.149.131 X-purgate: clean X-purgate-ID: 151147::1273820614-000051C5-184BC750/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.191861, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Burundi.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none Subject: [Seqan-dev] SwiftLocal specialization with Hamming distance X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 May 2010 07:03:36 -0000 Hi, I'm currently converting my project from SeqAn release 1.2 to the svn trunk. Doing this, I realized that there have been quite a few changes. I have a couple of questions that I hope you can comment on. Is there a reason that there is no Hamming specialization of the SwiftLocal Swift algorithm (find_swift)? Since I'm keen on using a local alignment filter with Hamming distance, I would be grateful if that could be added. Secondly, in the same algorithm ndlPos seems to be particularly interesting for the Hamming distance version, since the known offset between needle and haystack could potentially speed up any verification. Which of the available algorithms would be best suited for the verification, i.e. approximate matching without indels that returns all alignments fulfilling a minimum-length and error-rate criterion? Cheers, Fabian -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 
4072 Australia Phone: (61)-(7)-334-62608 From Birte.Kehr@fu-berlin.de Fri May 14 11:39:55 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OCrN3-0008NZ-Ev>; Fri, 14 May 2010 11:39:53 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OCrN3-0002Gs-Ct>; Fri, 14 May 2010 11:39:53 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OCrN3-0005dX-9s>; Fri, 14 May 2010 11:39:53 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Fri, 14 May 2010 11:39:53 +0200 From: "Kehr, Birte" To: "'seqan-dev@lists.fu-berlin.de'" Date: Fri, 14 May 2010 11:39:52 +0200 Thread-Topic: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance Thread-Index: AcrzSWc3kmduHtM0TcqDo4+EDAZoNQ== Message-ID: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/alternative; boundary="_000_DAD226CB6878494EABEFD5215AA10201416C7948A9exchange6fube_" MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: clean X-purgate-ID: 151147::1273829993-000051C5-0E13994E/0-0/0-0 X-Bogosity: Unsure, tests=bogofilter, spamicity=0.454933, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Algerien.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=ALL_TRUSTED,FU_BOGO_UNSURE, HTML_60_70,HTML_MESSAGE Subject: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 May 2010 09:39:55 -0000 --_000_DAD226CB6878494EABEFD5215AA10201416C7948A9exchange6fube_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Fabian, you are right, there is no Hamming specialization for SwiftLocal in SeqAn yet. I am currently working on a verification strategy for the more general edit distance version. In your case, I would suggest using the more general edit distance filter (SwiftLocal). Swift is only a filter algorithm, so all Hamming distance matches will be contained in the results of the edit distance version, and you will have to verify the reported hits in any case. You are also right that the local Swift computes not only the positions in the haystack but also the positions in the needle. The local version is intended as a filter for local alignments between two long sequences. Once you have called the find function on a finder and pattern, e.g. find(finder, pattern, epsilon, minLength), you can obtain the positions of a hit in the haystack and needle with the functions positionRange(finder) and positionRange(pattern), and the corresponding sequences with infix(finder) and infix(pattern). For the verification, you should be aware that Swift only guarantees to report regions that *overlap* with possible epsilon-matches. My suggestion is to use banded local alignment (BandedWatermanEggert) on the Swift hits (parallelograms). The local alignments could be used as seeds for ungapped extension (UngappedXDrop). 
Here are the corresponding links to the SeqAn documentation: http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html You may also find the sections about local alignment and seed extension in the SeqAn tutorial interesting: https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment For the details of your verification step, you will have to see what is appropriate for your particular application. Cheers, Birte
--_000_DAD226CB6878494EABEFD5215AA10201416C7948A9exchange6fube_-- From f.buske@uq.edu.au Thu May 20 04:08:15 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OEvBF-00056w-Rm>; Thu, 20 May 2010 04:08:13 +0200 Received: from mailhub4.uq.edu.au ([130.102.149.131]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OEvBF-0008BT-2D>; Thu, 20 May 2010 04:08:13 +0200 Received: from smtp4.uq.edu.au (smtp4.uq.edu.au [130.102.128.19]) by mailhub4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4K28AaG032456 for ; Thu, 20 May 2010 12:08:10 +1000 Received: from tlb-sumo.imb.uq.edu.au (tlb-sumo.imb.uq.edu.au [130.102.118.111]) (authenticated bits=0) by smtp4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4K2891B031694 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Thu, 20 May 2010 12:08:09 +1000 Message-ID: <4BF49989.4050508@uq.edu.au> Date: Thu, 20 May 2010 12:08:09 +1000 From: Fabian Buske User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228) MIME-Version: 1.0 To: SeqAn Development References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-UQ-FilterTime: 1274321291 X-Scanned-By: MIMEDefang 2.58 on UQ Mailhub on 130.102.149.131 X-Originating-IP: 130.102.149.131 X-purgate: clean X-purgate-ID: 151147::1274321293-000051C5-98AA77F0/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.049240, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Burundi.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none Subject: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 May 2010 02:08:15 -0000 Hi Birte, Thanks for the quick response. Would you mind adding an example of how to use the BandedWatermanEggert? I get smacked when I simply adjust the example in alignment_local under demos to:

    Align< String > ali2;
    appendValue(rows(ali2), "AAAAAAANAAAGGGNGGGGGGGGNGGGGGANAA");
    appendValue(rows(ali2), "GGGGGGCGGGGGGGA");

    LocalAlignmentFinder<> finder(ali2);
    Score scoring(1, -1, -1, -1);
    while (localAlignment(ali2, finder, scoring, 5, BandedWatermanEggert())) {
        ::std::cout << "Score = " << getScore(finder) << ::std::endl;
        ::std::cout << ali2;
        ::std::cout << "Aligns Seq1[" << sourceBeginPosition(row(ali2, 0)) << ":" << (sourceEndPosition(row(ali2, 0))-1) << "]";
        ::std::cout << " and Seq2[" << sourceBeginPosition(row(ali2, 1)) << ":" << (sourceEndPosition(row(ali2, 1))-1) << "]" << ::std::endl << ::std::endl;
    }

The compiler complains:

    alignment_local.cpp: In function 'int main()':
    alignment_local.cpp:28: error: no matching function for call to 'localAlignment(seqan::Align >, seqan::ArrayGaps>&, seqan::LocalAlignmentFinder&, seqan::Score&, int, const seqan::BandedWatermanEggert)'

The (non-banded) WatermanEggert throws a "Bad access memory exception" from time to time (even in the alignment example under demos). It's difficult to narrow down what the problem actually is. I take it that's the reason why the example is commented out in the first place? 
Also this method in find_swift.h template inline typename Infix< typename GetSequenceByNo< TIndex const >::Type >::Type infix(Pattern > const & pattern, TText &text) { __int64 hitBegin = pattern.curBeginPos; __int64 hitEnd = pattern.curEndPos; __int64 textLength = sequenceLength(pattern.curSeqNo, needle(pattern)); if (hitEnd > textLength) hitEnd = textLength; if (hitBegin < 0) hitBegin = 0; return infix(text, hitBegin, hitEnd); } complains about TText &text and would rather prefer a TText const &text (at least for the data structure I input). Cheers, Fabian Kehr, Birte wrote: > > Hi Fabian, > > > > you are right, there is no Hamming specialization for SwiftLocal in > SeqAn yet. > > I am currently working on a verification strategy for the more general > edit distance version. > > > > In your case, I would suggest to use the more general edit distance > filter (SwiftLocal). Swift is only a filter algorithm, so all hamming > distance matches will be contained in the results from the edit > distance version. And you will have to verify the reported hits in any > case. > > > > You are also right that the local Swift computes not only the > positions in the haystack but also the positions in the needle. The > local version is thought to be a filter for local alignments between > two long sequences. > > Once you have called the find function on a finder and pattern, e.g. > find(finder, pattern, epsilon, minLength), you can obtain the > positions of a hit in the haystack and needle with the function > positionRange(finder) and postionRange(pattern), and the corresponding > sequences with infix(finder) and infix(pattern). > > > > For the verification , you should be aware that Swift only guarantees > to report regions that **overlap** with possible epsilon-matches. My > suggestion is to use banded local alignment (BandedWatermanEggert) on > the swift hits (parallelograms). The local alignments could be used as > seeds for ungapped extension (UngappedXDrop). 
Here are the > corresponding links to the SeqAn documentation: > > http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html > > http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html > > > > You may also find the sections about local alignment and seed > extension of the SeqAn tutorial interesting: > > https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local > > https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment > > > > For the details of your verification step you will have to see what is > appropriate for your special application. > > > > Cheers, > > Birte > > ------------------------------------------------------------------------ > > _______________________________________________ > seqan-dev mailing list > seqan-dev@lists.fu-berlin.de > https://lists.fu-berlin.de/listinfo/seqan-dev > -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 4072 Australia Phone: (61)-(7)-334-62608 From f.buske@uq.edu.au Thu May 20 04:20:22 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OEvMy-0005S6-Sj>; Thu, 20 May 2010 04:20:20 +0200 Received: from mailhub4.uq.edu.au ([130.102.149.131]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OEvMx-0001Iw-NV>; Thu, 20 May 2010 04:20:20 +0200 Received: from smtp4.uq.edu.au (smtp4.uq.edu.au [130.102.128.19]) by mailhub4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4K2KHpP022013 for ; Thu, 20 May 2010 12:20:17 +1000 Received: from tlb-sumo.imb.uq.edu.au (tlb-sumo.imb.uq.edu.au [130.102.118.111]) (authenticated bits=0) by smtp4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4K2KH31001093 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Thu, 20 May 2010 12:20:17 +1000 Message-ID: <4BF49C61.7070602@uq.edu.au> Date: Thu, 20 May 2010 12:20:17 
+1000 From: Fabian Buske User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228) MIME-Version: 1.0 To: SeqAn Development References: <4BF49989.4050508@uq.edu.au> In-Reply-To: <4BF49989.4050508@uq.edu.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-UQ-FilterTime: 1274322017 X-Scanned-By: MIMEDefang 2.58 on UQ Mailhub on 130.102.149.131 X-Originating-IP: 130.102.149.131 X-purgate: clean X-purgate-ID: 151147::1274322020-000051C5-917FC41F/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.036985, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Burundi.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none Subject: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 May 2010 02:20:22 -0000 Hi Birte, Another remark: for the sequences

    s1: GGGAGGGGAGGGGGAGGGGGGAGGG
    s2: GGGGGGAGGGGGAGGGGAGGG

using

    localAlignment(align, finder, Score scoring(1, -1, -100, -100), 13, WatermanEggert())

I get:

    Score = 15
    0    .    :    .
    GGGGAGGGGGAGGGG
    |||||||||||||||
    GGGGAGGGGGAGGGG
    Aligns Seq1[4:18] and Seq2[2:16]

    Score = 14
    0    .    :    .
    GGGAGGGGAGGGGGAGGG
    ||||||||  ||||||||
    GGGAGGGGGAGGGGAGGG
    Aligns Seq1[0:17] and Seq2[3:20]

    Score = 13
    0    .    :    .    :
    GGGGAGGGGGAGGGGGGAGGG
    |||| | ||| | ||||||||
    GGGGGGAGGGGGAGGGGAGGG
    Aligns Seq1[4:24] and Seq2[0:20]

I would have thought I'd only get the first, since it gives the best score and the other two alignments clearly overlap with the sequence parts used in the first alignment. I thought the WatermanEggert would filter these alignments out as not feasible(?). Is this a bug or am I mistaken? Cheers, Fabian Fabian Buske wrote: > Hi Birte, > > Thanks for the quick response. 
> > Would you mind to add an example how to use the BandedWatermanEggert. > I get smacked when I simply adjust the example in alignment_local > under demos to : > > Align< String > ali2; > appendValue(rows(ali2), "AAAAAAANAAAGGGNGGGGGGGGNGGGGGANAA"); > appendValue(rows(ali2), "GGGGGGCGGGGGGGA"); > > LocalAlignmentFinder<> finder(ali2); > Score scoring(1, -1, -1, -1); > while (localAlignment(ali2, finder, scoring, 5, > BandedWatermanEggert())) { > ::std::cout << "Score = " << getScore(finder) << ::std::endl; > ::std::cout << ali2; > ::std::cout << "Aligns Seq1[" << sourceBeginPosition(row(ali2, 0)) > << ":" << (sourceEndPosition(row(ali2, 0))-1) << "]"; > ::std::cout << " and Seq2[" << sourceBeginPosition(row(ali2, 1)) << > ":" << (sourceEndPosition(row(ali2, 1))-1) << "]" << ::std::endl << > ::std::endl; > } > > > The compiler complains: > > alignment_local.cpp: In function 'int main()': > alignment_local.cpp:28: error: no matching function for call to > 'localAlignment(seqan::Align seqan::Alloc >, seqan::ArrayGaps>&, > seqan::LocalAlignmentFinder&, seqan::Score&, > int, const seqan::BandedWatermanEggert)' > > The (non-banded) WatermanEggert is throwing a "Bad access memory > exception" from time to time (even in the alignment example under > demos). Its difficult to narrow down what the problem actually is. I > take it thats the reason why the example is commented out in the first > place? 
> > Also this method in find_swift.h > > template > inline typename Infix< typename GetSequenceByNo< TIndex const >::Type > >::Type > infix(Pattern > const & pattern, TText &text) > { > __int64 hitBegin = pattern.curBeginPos; > __int64 hitEnd = pattern.curEndPos; > __int64 textLength = sequenceLength(pattern.curSeqNo, > needle(pattern)); > > if (hitEnd > textLength) hitEnd = textLength; > if (hitBegin < 0) hitBegin = 0; > > return infix(text, hitBegin, hitEnd); > } > > complains about TText &text and would rather prefer a TText const > &text (at least for the data structure I input). > > Cheers, > Fabian > > > Kehr, Birte wrote: >> >> Hi Fabian, >> >> >> >> you are right, there is no Hamming specialization for SwiftLocal in >> SeqAn yet. >> >> I am currently working on a verification strategy for the more >> general edit distance version. >> >> >> >> In your case, I would suggest to use the more general edit distance >> filter (SwiftLocal). Swift is only a filter algorithm, so all hamming >> distance matches will be contained in the results from the edit >> distance version. And you will have to verify the reported hits in >> any case. >> >> >> >> You are also right that the local Swift computes not only the >> positions in the haystack but also the positions in the needle. The >> local version is thought to be a filter for local alignments between >> two long sequences. >> >> Once you have called the find function on a finder and pattern, e.g. >> find(finder, pattern, epsilon, minLength), you can obtain the >> positions of a hit in the haystack and needle with the function >> positionRange(finder) and postionRange(pattern), and the >> corresponding sequences with infix(finder) and infix(pattern). >> >> >> >> For the verification , you should be aware that Swift only guarantees >> to report regions that **overlap** with possible epsilon-matches. My >> suggestion is to use banded local alignment (BandedWatermanEggert) on >> the swift hits (parallelograms). 
The local alignments could be used >> as seeds for ungapped extension (UngappedXDrop). Here are the >> corresponding links to the SeqAn documentation: >> >> http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html >> >> http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html >> >> >> >> You may also find the sections about local alignment and seed >> extension of the SeqAn tutorial interesting: >> >> https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local >> >> https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment >> >> >> >> >> For the details of your verification step you will have to see what >> is appropriate for your special application. >> >> >> >> Cheers, >> >> Birte >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> seqan-dev mailing list >> seqan-dev@lists.fu-berlin.de >> https://lists.fu-berlin.de/listinfo/seqan-dev >> > > -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 
4072 Australia Phone: (61)-(7)-334-62608 From Birte.Kehr@fu-berlin.de Thu May 20 13:35:08 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OF41p-0006zX-TZ>; Thu, 20 May 2010 13:35:05 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OF41p-0007pZ-Rm>; Thu, 20 May 2010 13:35:05 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OF41p-0007xQ-LO>; Thu, 20 May 2010 13:35:05 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Thu, 20 May 2010 13:35:05 +0200 From: "Kehr, Birte" To: 'SeqAn Development' Date: Thu, 20 May 2010 13:35:04 +0200 Thread-Topic: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert Thread-Index: Acr3wVdTb3egiTxGQjCoA2JpKP/4wQARuL7g Message-ID: References: <4BF49989.4050508@uq.edu.au> In-Reply-To: <4BF49989.4050508@uq.edu.au> Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-ID: 151147::1274355305-000051C5-33363064/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.200411, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Algerien.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn 
Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 May 2010 11:35:08 -0000 Hi Fabian, I added an example for the BandedWatermanEggert in demos/alignment_local.cpp. Your problem was that you did not specify the band, i.e. the lowest and the highest diagonal of the alignment matrix that are to be computed. Also make sure that you have updated to the current version of SeqAn. (I have fixed a small bug in the banded local alignment.) > The (non-banded) WatermanEggert is throwing a "Bad access memory exception" from time to time (even in the alignment example under demos). It's difficult to narrow down what the problem actually is. I take it that's the reason why the example is commented out in the first place? Here, I need some more information from you. In the current version no example is commented out in demos/alignment_local.cpp. I also do not get an exception. Could you send me the code that is producing the error message? Again, make sure that you have checked out the current version of SeqAn. Your other remark: > ... the other two alignments clearly overlap with the sequence parts used in the first alignment. I thought the WatermanEggert would filter these alignments as not feasible(?) Is this a bug or am I mistaken? The Waterman-Eggert algorithm computes no alignment in which a character from seq1 is matched to the same character in seq2 as in a previous alignment. Sequence parts are allowed to overlap; only the traces in the alignment matrix are not allowed to overlap. In your example, there are different possibilities to match a G from one sequence to a G in the other sequence. The pairs of characters that are matched to each other are all different. > Also this method in find_swift.h [...] complains about TText &text and would rather prefer a TText const &text (at least for the data structure I input).
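To make the notion of a band (the lowest and highest diagonal of the alignment matrix that are computed) concrete, here is a minimal sketch of a banded Smith-Waterman score computation in plain C++. It is not SeqAn's BandedWatermanEggert implementation: the function name `bandedLocalScore`, the diagonal convention d = j - i (column index in seq2 minus row index in seq1) and the linear scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration only.

```cpp
#include <algorithm>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helper, not SeqAn's API: best Smith-Waterman (local) score of
// seq1 (rows, index i) against seq2 (columns, index j), filling only the cells
// whose diagonal d = j - i lies in [lowerDiag, upperDiag].
// Assumes mismatch < 0 and gap < 0. Cells outside the band are never filled
// and stay 0, so they can only act as a fresh start (score 0) and never carry
// a positive score into the band.
int bandedLocalScore(const std::string& seq1, const std::string& seq2,
                     int lowerDiag, int upperDiag,
                     int match = 1, int mismatch = -1, int gap = -1)
{
    const int n = static_cast<int>(seq1.size());
    const int m = static_cast<int>(seq2.size());
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;

    for (int i = 1; i <= n; ++i) {
        // Column range of row i that lies inside the band.
        const int jBegin = std::max(1, i + lowerDiag);
        const int jEnd = std::min(m, i + upperDiag);
        for (int j = jBegin; j <= jEnd; ++j) {
            const int diag = H[i - 1][j - 1]
                           + (seq1[i - 1] == seq2[j - 1] ? match : mismatch);
            const int up = H[i - 1][j] + gap;    // gap in seq2
            const int left = H[i][j - 1] + gap;  // gap in seq1
            H[i][j] = std::max({0, diag, up, left});
            best = std::max(best, H[i][j]);
        }
    }
    return best;
}
```

For example, `bandedLocalScore("ACGT", "TTACGT", 2, 2)` finds the four-character match on diagonal 2, while the band [0, 0] misses it entirely, which is why the band has to be chosen to cover the region of interest (for Swift hits, the parallelogram). A real banded implementation would also store only the cells inside the band; the full matrix is kept here for readability.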
We have fixed a copy-and-paste bug here; it should work now. Thanks. In my applications I do not specify the text separately. If you have a reason for specifying it, I would be curious about it. Cheers, Birte Kehr, Birte wrote: > > Hi Fabian, > > you are right, there is no Hamming specialization for SwiftLocal in SeqAn yet. > > I am currently working on a verification strategy for the more general edit distance version. > > In your case, I would suggest using the more general edit distance filter (SwiftLocal). Swift is only a filter algorithm, so all Hamming distance matches will be contained in the results from the edit distance version. You will have to verify the reported hits in any case. > > You are also right that the local Swift computes not only the positions in the haystack but also the positions in the needle. The local version is intended as a filter for local alignments between two long sequences. > > Once you have called the find function on a finder and pattern, e.g. find(finder, pattern, epsilon, minLength), you can obtain the positions of a hit in the haystack and needle with the functions positionRange(finder) and positionRange(pattern), and the corresponding sequences with infix(finder) and infix(pattern). > > For the verification, you should be aware that Swift only guarantees to report regions that **overlap** with possible epsilon-matches. My suggestion is to use banded local alignment (BandedWatermanEggert) on the Swift hits (parallelograms). The local alignments could be used as seeds for ungapped extension (UngappedXDrop).
Here are the corresponding links to the SeqAn documentation: > > http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html > > http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html > > You may also find the sections about local alignment and seed extension of the SeqAn tutorial interesting: > > https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local > > https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment > > For the details of your verification step you will have to see what is appropriate for your special application. > > Cheers, > > Birte -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 4072 Australia Phone: (61)-(7)-334-62608 From f.buske@uq.edu.au Tue May 25 03:52:34 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OGjJn-0004OZ-P0>; Tue, 25 May 2010 03:52:31 +0200 Received: from mailhub4.uq.edu.au ([130.102.149.131]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OGjJm-0004wQ-VM>; Tue, 25 May 2010 03:52:31 +0200 Received: from smtp4.uq.edu.au (smtp4.uq.edu.au [130.102.128.19]) by mailhub4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4P1qQbj011053 for ; Tue, 25 May 2010 11:52:27 +1000 Received: from tlb-sumo.imb.uq.edu.au (tlb-sumo.imb.uq.edu.au [130.102.118.111]) (authenticated bits=0) by smtp4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4P1qPel031120
(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Tue, 25 May 2010 11:52:26 +1000 Message-ID: <4BFB2D5A.50200@uq.edu.au> Date: Tue, 25 May 2010 11:52:26 +1000 From: Fabian Buske User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228) MIME-Version: 1.0 To: SeqAn Development References: <4BF49989.4050508@uq.edu.au> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-UQ-FilterTime: 1274752347 X-Scanned-By: MIMEDefang 2.58 on UQ Mailhub on 130.102.149.131 X-Originating-IP: 130.102.149.131 X-purgate: clean X-purgate-ID: 151147::1274752351-000051C5-710931A2/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.162552, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Benin.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none Subject: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 May 2010 01:52:34 -0000 Hi Birte, Kehr, Birte wrote: > Hi Fabian, > > I added an example for the BandedWatermanEggert in demos/alignment_local.cpp. > Your problem was that you did not specify the band - the lowest diagonal and the highest diagonal that is to be computed in the alignment matrix. > Also make sure that you have updated to the current version of SeqAn. (I have fixed a small bug in the banded local alignment.) > >> The (non-banded) WatermanEggert is throwing a "Bad access memory exception" from time to time (even in the alignment example under demos). It's difficult to narrow down what the problem actually is. I take it that's the reason why the example is commented out in the first place?
Apologies for the confusion about the commented-out WatermanEggert example. I commented it out myself and forgot about it. > Here, I need some more information from you. In the current version no example is commented out in demos/alignment_local.cpp. I also do not get an exception. Could you send me the code that is producing the error message? > Again, make sure that you have checked out the current version of SeqAn. I've updated my local version to the current trunk, but I still get a "Bus error" when running the Waterman-Eggert algorithm. In fact, the very first example of the alignment_local demo causes the problem. I'm working with i686-apple-darwin9-gcc-4.0.1 on Mac OS X. Here is the backtrace from the debugger:
#0 0x0000725a in seqan::previousPosition at matrix_base.h:413
#1 0x0000813c in seqan::goPrevious at matrix_base.h:543
#2 0x0001fd90 in seqan::_smith_waterman_declump, seqan::Alloc >, seqan::ArrayGaps> at align_local_dynprog.h:414
#3 0x00020654 in seqan::_smithWatermanGetNext, seqan::Alloc >, seqan::ArrayGaps, int> at align_local_dynprog.h:765
#4 0x00020886 in seqan::localAlignment, seqan::Alloc >, seqan::ArrayGaps, int, int, int> at align_local_dynprog.h:841
#5 0x000036e9 in main at alignment_local.cpp:26
The variables in previousPosition contain position_ = 3221222916 and dimension_ = 1, while me.data_factors is essentially empty (capacity = 0). So I take it that it has not been initialized properly. > Your other remark: >> ... the other two alignments clearly overlap with the sequence parts used in the first alignment. I thought the WatermanEggert would filter these alignments as not feasible(?) Is this a bug or am I mistaken? > The Waterman-Eggert algorithm computes no alignments where a character from seq1 is matched to the same character in seq2 as in a previous alignment. Sequence parts are allowed to overlap; only the traces in the alignment matrix are not allowed to overlap.
In your example, there are different possibilities to match a G from one sequence to a G in the other sequence. The pairs of characters that are matched to each other are all different. Thanks for making that clear. >> Also this method in find_swift.h [...] complains about TText &text and would rather prefer a TText const &text (at least for the data structure I input). > We have fixed a c&p bug here, it should work now. Thanks. > In my applications I do not specify the text separately. If you have a reason for specifying it, I would be curious about it. I'm not doing that either. The compiler just complained about it. Cheers, Fabian > Cheers, > Birte > > Kehr, Birte wrote: >> Hi Fabian, >> you are right, there is no Hamming specialization for SwiftLocal in SeqAn yet. >> I am currently working on a verification strategy for the more general edit distance version. >> In your case, I would suggest using the more general edit distance filter (SwiftLocal). Swift is only a filter algorithm, so all Hamming distance matches will be contained in the results from the edit distance version. You will have to verify the reported hits in any case. >> You are also right that the local Swift computes not only the positions in the haystack but also the positions in the needle. The local version is intended as a filter for local alignments between two long sequences. >> Once you have called the find function on a finder and pattern, e.g. find(finder, pattern, epsilon, minLength), you can obtain the positions of a hit in the haystack and needle with the functions positionRange(finder) and positionRange(pattern), and the corresponding sequences with infix(finder) and infix(pattern). >> For the verification, you should be aware that Swift only guarantees to report regions that **overlap** with possible epsilon-matches.
My suggestion is to use banded local alignment (BandedWatermanEggert) on the Swift hits (parallelograms). The local alignments could be used as seeds for ungapped extension (UngappedXDrop). Here are the corresponding links to the SeqAn documentation: >> http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html >> http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html >> You may also find the sections about local alignment and seed extension of the SeqAn tutorial interesting: >> https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local >> https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment >> For the details of your verification step you will have to see what is appropriate for your special application. >> Cheers, >> Birte -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld.
4072 Australia Phone: (61)-(7)-334-62608 From Birte.Kehr@fu-berlin.de Tue May 25 10:37:10 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OGpdN-0000HG-GJ>; Tue, 25 May 2010 10:37:09 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OGpdN-0003fH-E9>; Tue, 25 May 2010 10:37:09 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OGpdN-0006Nw-8N>; Tue, 25 May 2010 10:37:09 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Tue, 25 May 2010 10:37:09 +0200 From: "Kehr, Birte" To: 'SeqAn Development' Date: Tue, 25 May 2010 10:36:35 +0200 Thread-Topic: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert Thread-Index: Acr7rPcSLk0Of2kERQqPEXpEdXkg8wAN9h5Q Message-ID: References: <4BF49989.4050508@uq.edu.au> <4BFB2D5A.50200@uq.edu.au> In-Reply-To: <4BFB2D5A.50200@uq.edu.au> Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-ID: 151147::1274776629-000051C5-82196327/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.106061, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Botsuana.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list 
Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 May 2010 08:37:10 -0000 Hi Fabian, > I've updated my local version to the current trunk but I still get a "Bus error" when running the Waterman Eggert algorithm. In fact the very first of the examples of alignment_local demo causes the problem. I'm working on i686-apple-darwin9-gcc-4.0.1 on a Mac osx. > [...] > So I take it that it has not been initialized properly. Thanks, there was an initialization problem that did not appear under Linux and Windows. It is fixed now. Cheers, Birte From f.buske@uq.edu.au Wed May 26 04:12:39 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OH66o-0000dh-QO>; Wed, 26 May 2010 04:12:38 +0200 Received: from mailhub3.uq.edu.au ([130.102.148.131]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OH66o-0002tH-0s>; Wed, 26 May 2010 04:12:38 +0200 Received: from smtp4.uq.edu.au (smtp4.uq.edu.au [130.102.128.19]) by mailhub3.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4Q2CYYF004285 for ; Wed, 26 May 2010 12:12:34 +1000 Received: from tlb-sumo.imb.uq.edu.au (tlb-sumo.imb.uq.edu.au [130.102.118.111]) (authenticated bits=0) by smtp4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4Q2CXHK021021 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 26 May 2010 12:12:34 +1000 Message-ID: <4BFC8391.1060706@uq.edu.au> Date: Wed, 26 May 2010 12:12:33 +1000 From: Fabian Buske User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228) MIME-Version: 1.0 To: SeqAn Development References: <4BF49989.4050508@uq.edu.au> <4BFB2D5A.50200@uq.edu.au> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-UQ-FilterTime: 1274839954 X-Scanned-By: MIMEDefang 2.58
on UQ Mailhub on 130.102.148.131 X-Originating-IP: 130.102.148.131 X-purgate: clean X-purgate-ID: 151147::1274839958-000051C5-B449D10E/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000030, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Algerien.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none Subject: Re: [Seqan-dev] SwiftPattern reinit required X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2010 02:12:40 -0000 Hi Birte, thanks for fixing the bug. I made another observation that seems strange. I run Swift on a bunch of sequences (Finder). However, only the first sequence ever gets a match. Looking at your swiftlocal app, I see that you call Pattern > pattern_swift(index_qgram); in the inner loop. This probably resets something in the object that is crucial to the program. I was wondering whether it wouldn't make sense to "reuse" the already instantiated object and "just" reset whatever is required? Or is there a special reason to do it this way? I think it used to work (with SeqAn release 1.2). The swiftlocal app currently doesn't compile, so I couldn't just give it a go and see what happens when the line mentioned above is moved outside the loop. Cheers, Fabian Kehr, Birte wrote: > Hi Fabian, >> I've updated my local version to the current trunk but I still get a "Bus error" when running the Waterman Eggert algorithm. In fact the very first of the examples of alignment_local demo causes the problem. I'm working on i686-apple-darwin9-gcc-4.0.1 on a Mac osx. >> [...] >> So I take it that it has not been initialized properly. > Thanks, there was an initialization problem that did not appear under Linux and Windows. It is fixed now.
> Cheers, Birte -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 4072 Australia Phone: (61)-(7)-334-62608 From eissler@in.tum.de Wed May 26 17:11:28 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHIGU-0002cs-2v>; Wed, 26 May 2010 17:11:26 +0200 Received: from mail-out1.informatik.tu-muenchen.de ([131.159.0.8]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHIGU-0006bM-00>; Wed, 26 May 2010 17:11:26 +0200 Received: from [131.159.35.23] (unknown [131.159.35.23]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.in.tum.de (Postfix) with ESMTP id 13C8CC582 for ; Wed, 26 May 2010 17:11:25 +0200 (CEST) Message-ID: <4BFD3A18.3010609@in.tum.de> Date: Wed, 26 May 2010 17:11:20 +0200 From: Tilo Eißler User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100423 Lightning/1.0b1 Thunderbird/3.0.4 MIME-Version: 1.0 To: Seqan developer list Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-Originating-IP: 131.159.0.8 X-purgate: clean X-purgate-ID: 151147::1274886686-000051C5-82DA57AF/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000073, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=RATWARE_GECKO_BUILD Subject: [Seqan-dev] Approximate Searching in set of strings X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: ,
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2010 15:11:28 -0000 Hi everyone, as the subject implies, I have a question about approximate searching in sets of strings with SeqAn. What I want to do: search for patterns (exactly and approximately) in a set of DNA or RNA sequences. So far I have built an index for the set of strings and searched it for exact patterns, which works fine. I have also created a finder for a single test string and searched it approximately, as shown in the documentation. No problem so far. What I want to know now is whether it is generally possible to create an index for a set of strings and search it approximately using a finder on this index. Maybe there is some documentation I have overlooked so far. I've tried the SeqAn 1.2 stable release as well as the svn trunk. Thank you very much in advance. Best wishes, Tilo P.S. I'm currently using Eclipse under Linux as my IDE for developing C++. Unfortunately, the integrated C++ indexer runs out of heap space on SeqAn. Which IDE do you use? Or do you have any hints on how to change the Eclipse settings to get along with SeqAn properly without losing the benefits of an IDE over a simple text editor, e.g. code completion? Turning off the C++ indexer is not what I want to do ;-) -- Dipl.-Inf. Tilo Eißler Technische Universität München |tel. +49-89-289-19478 Institut für Informatik, I10 | Boltzmannstr. 3 |Office 02.13.059 85748 Garching b.
München |email eissler@in.tum.de From Birte.Kehr@fu-berlin.de Wed May 26 17:44:25 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHImO-0003jV-El>; Wed, 26 May 2010 17:44:24 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHImO-0001T1-D6>; Wed, 26 May 2010 17:44:24 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHImO-0002uC-B7>; Wed, 26 May 2010 17:44:24 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Wed, 26 May 2010 17:44:23 +0200 From: "Kehr, Birte" To: SeqAn Development Date: Wed, 26 May 2010 17:44:22 +0200 Thread-Topic: [Seqan-dev] SwiftPattern reinit required Thread-Index: Acr8ePMGcD3ptZ/9TPmOaFVMi7cLLgAa/esA Message-ID: References: <4BF49989.4050508@uq.edu.au> <4BFB2D5A.50200@uq.edu.au> <4BFC8391.1060706@uq.edu.au> In-Reply-To: <4BFC8391.1060706@uq.edu.au> Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-ID: 151147::1274888664-000051C5-9BC3996F/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000015, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Botsuana.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] SwiftPattern reinit required X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2010 15:44:25 -0000 Hi Fabian, > I made another observation that seems strange. I run swift on a bunch of sequences (Finder). However, only the first sequence will ever get a match. > Looking at your swiftlocal app I see that you call > > Pattern > pattern_swift(index_qgram); > > in the inner loop. This probably resets something in the object that is crucial to the program. I was wondering if it wouldn't make sense to "reuse" the already instantiated object and "just" reset whatever is required? Or is there a special reason to do it this way? I think it used to work (with seqan release 1.2). Yes, until now the pattern had to be re-initialized; reusing the object was only possible with the semi-global version. But as I was planning to look at it soon anyway, I have now fixed it for the local version. > The swiftlocal app currently doesn't compile so I couldn't just give it a go and try what happens when the line mentioned above is initialized outside the loop. Normally, it should compile. I recently moved the file from the demos folder to the apps folder. Have you updated all files to the current version?
Cheers, Birte From manuel.holtgrewe@fu-berlin.de Wed May 26 17:50:54 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHIse-0003uX-6h>; Wed, 26 May 2010 17:50:52 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHIse-0002R4-4o>; Wed, 26 May 2010 17:50:52 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHIse-0003Ll-1F>; Wed, 26 May 2010 17:50:52 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Wed, 26 May 2010 17:50:51 +0200 From: "Holtgrewe, Manuel" To: SeqAn Development Date: Wed, 26 May 2010 17:50:50 +0200 Thread-Topic: [Seqan-dev] Approximate Searching in set of strings Thread-Index: Acr86zd0dq/Z9d0VTXO3TEhSwN7CtQ== Message-ID: <0F13AD37-5FE7-4E5D-B054-FE9925F96199@fu-berlin.de> References: <4BFD3A18.3010609@in.tum.de> In-Reply-To: <4BFD3A18.3010609@in.tum.de> Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-ID: 151147::1274889052-000051C5-86C0747C/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.075524, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] Approximate Searching in set of strings X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2010 15:50:54 -0000 Hi Tilo, as far as I know, there is no way to do non-heuristic approximate string searching using indices, i.e. something like O(1) lookups. What exactly are you looking for and trying to solve? SeqAn has implementations of various online algorithms for approximate string searching, e.g. see [1]. Notably, Myers' bit-vector algorithm is very fast for searching large texts for a single pattern whose length is at most the machine word width. Going beyond this limit makes things a bit slower, since bits have to be carried over between machine words. Pex, AbndmAlgo and DPSearch also allow for edit distance; DPSearch even allows linear gap costs and arbitrary substitution scores. If you are looking for read mapping, i.e. many short reads to be found with Hamming/edit distance in long strings, have a look at RazerS, our read mapping tool. It uses the SWIFT algorithm to first find regions that overlap with epsilon-matches and then uses Myers' bit-vector algorithm or a naive Hamming search to find the exact positions. Another possible approach is building a q-gram index of your long string (haystack) and then searching for the q-grams of your short string (needle) in the haystack. You can look at the LAGAN demo to see an application of q-gram indices and seeds in SeqAn. FASTA, BLAST, bowtie and bwa are programs that do kinds of "approximate string searching" and could be implemented with the infrastructure of the SeqAn library. People around here use Visual Studio, Emacs, XCode and occasionally vim. I have never seen Visual Studio really make sense of SeqAn-style C++; XCode seems to infer symbol names that make sense (which Visual Studio should be able to do, too). I have not tried to get syntax completion running in either Emacs or XCode.
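The machine-word limit mentioned for Myers' bit-vector algorithm comes from packing the whole dynamic-programming column for the pattern into a pair of words of vertical +1/-1 deltas, so each text character costs a constant number of word operations. Below is a minimal sketch of the classic single-word formulation (Myers, 1999) in plain C++. It is not SeqAn's own Myers pattern; the function name `myersSearch` and the 64-character limit (one `uint64_t`) are assumptions for illustration. Longer patterns need a blocked variant that propagates carries between words, which is the slowdown described above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch of Myers' bit-vector algorithm for approximate search under edit
// distance. Returns every 0-based text position j at which some substring of
// `text` ending at j has edit distance <= k to `pattern`.
// Assumption: 1 <= pattern.size() <= 64, so one DP column fits in a uint64_t.
std::vector<std::size_t> myersSearch(const std::string& pattern,
                                     const std::string& text, int k)
{
    const std::size_t m = pattern.size();

    // peq[c] has bit i set iff pattern[i] == c.
    std::array<std::uint64_t, 256> peq{};
    for (std::size_t i = 0; i < m; ++i)
        peq[static_cast<unsigned char>(pattern[i])] |= std::uint64_t(1) << i;

    std::uint64_t pv = ~std::uint64_t(0);  // vertical deltas equal to +1
    std::uint64_t mv = 0;                  // vertical deltas equal to -1
    const std::uint64_t lastRow = std::uint64_t(1) << (m - 1);
    int score = static_cast<int>(m);       // DP value in the last row
    std::vector<std::size_t> ends;

    for (std::size_t j = 0; j < text.size(); ++j) {
        const std::uint64_t eq = peq[static_cast<unsigned char>(text[j])];
        const std::uint64_t xv = eq | mv;
        const std::uint64_t xh = (((eq & pv) + pv) ^ pv) | eq;
        std::uint64_t ph = mv | ~(xh | pv);  // horizontal deltas equal to +1
        std::uint64_t mh = pv & xh;          // horizontal deltas equal to -1

        // Update the last-row value from its horizontal delta.
        if (ph & lastRow)
            ++score;
        else if (mh & lastRow)
            --score;

        // Shift in 0: row 0 of the search matrix is all zeros, so a match
        // may start at any text position.
        ph <<= 1;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;

        if (score <= k)
            ends.push_back(j);
    }
    return ends;
}
```

For example, `myersSearch("ACGT", "TTACGTTT", 0)` reports only end position 5, and with `k = 1` it reports the end positions 4, 5 and 6.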
HTH, Manuel [1] http://www.seqan.de/dddoc/html_devel/CLASS_Pattern.html From Knut.Reinert@fu-berlin.de Wed May 26 20:02:52 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHKwM-0008JB-2q>; Wed, 26 May 2010 20:02:50 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHKwM-0006Hw-10>; Wed, 26 May 2010 20:02:50 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHKwL-0004EW-TU>; Wed, 26 May 2010 20:02:50 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Wed, 26 May 2010 20:02:49 +0200 From: "Reinert, Knut" To: SeqAn Development Date: Wed, 26 May 2010 20:02:51 +0200 Thread-Topic: [Seqan-dev] Approximate Searching in set of strings Thread-Index: Acr8/abv2orLehu8TmuaZZxynm/ggw== Message-ID: References: <4BFD3A18.3010609@in.tum.de> <0F13AD37-5FE7-4E5D-B054-FE9925F96199@fu-berlin.de> In-Reply-To: <0F13AD37-5FE7-4E5D-B054-FE9925F96199@fu-berlin.de> Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: yes X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/signed; boundary="Apple-Mail-3-237028661"; protocol="application/pkcs7-signature"; micalg=sha1 MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: clean X-purgate-ID: 151147::1274896970-000051C5-E18D21E4/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.005767, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Botsuana.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] Approximate Searching in set of strings X-BeenThere: 
seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2010 18:02:52 -0000 How can I see Tilo's question? On May 26, 2010, at 5:50 PM, Holtgrewe, Manuel wrote: > Hi Tilo, > > as far as I know, there is no way to do non-heuristic approximate string searching using indices, i.e. something like O(1) lookups. What exactly are you looking for and trying to solve? > > SeqAn has implementations of various online algorithms for approximate string searching, e.g. see [1]. Notably, Myers' Bitvector algorithm is very fast for searching single strings with a length up to the machine word width in large strings. Going beyond this limit makes things a bit slower since bits have to migrate over machine words. > > Pex, AbndmAlgo and DPSearch also allow for edit distance, DPSearch even allows linear gap and arbitrary substitution scores. > > If you are looking for read mapping, i.e. many short reads to be found with Hamming/edit distance in long strings, have a look at RazerS, our read mapping tool. It uses the SWIFT algorithm to first find regions that overlap with epsilon-matches and then uses Myers' Bitvector algorithm or a naive Hamming search to look for the exact positions. > > Another possible approach is building a q-gram index of your long string (haystack) and then searching for the q-grams of your short string (needle) in the haystack. You can look at the LAGAN demo to see an application of q-gram indices and seeds in SeqAn. FASTA, BLAST, bowtie and bwa are codes that do kinds of "approximate string searching" and could be implemented with the infrastructure of the SeqAn library.
> > > People around here use Visual Studio, Emacs, XCode and occasionally vim. I have never seen Visual Studio really make sense of SeqAn-style C++; XCode seems to infer sensible symbol names (which Visual Studio should be able to do, too). I have not tried to get syntax completion running in either Emacs or XCode. > > HTH, > Manuel > > [1] http://www.seqan.de/dddoc/html_devel/CLASS_Pattern.html > > > _______________________________________________ > seqan-dev mailing list > seqan-dev@lists.fu-berlin.de > https://lists.fu-berlin.de/listinfo/seqan-dev
--Apple-Mail-3-237028661-- From eissler@in.tum.de Thu May 27 10:21:31 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHYLG-00017l-1t>; Thu, 27 May 2010 10:21:26 +0200 Received: from mail-out2.informatik.tu-muenchen.de ([131.159.0.36]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OHYLF-0000Ym-VM>; Thu, 27 May 2010 10:21:26 +0200 Received: from [131.159.35.23] (unknown [131.159.35.23]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.in.tum.de (Postfix) with ESMTP id 6830EDFF3 for ; Thu, 27 May 2010 10:21:25 +0200 (CEST) Message-ID: <4BFE2B80.7060701@in.tum.de> Date: Thu, 27 May 2010 10:21:20 +0200 From: =?ISO-8859-1?Q?Tilo_Ei=DFler?= User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100423 Lightning/1.0b1 Thunderbird/3.0.4 MIME-Version: 1.0 To: seqan-dev@lists.fu-berlin.de References: <4BFD3A18.3010609@in.tum.de> <0F13AD37-5FE7-4E5D-B054-FE9925F96199@fu-berlin.de> In-Reply-To: <0F13AD37-5FE7-4E5D-B054-FE9925F96199@fu-berlin.de> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-Originating-IP: 131.159.0.36 X-purgate: clean X-purgate-ID: 151147::1274948486-000051C5-F51F239A/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.035851, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Benin.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 
tests=RATWARE_GECKO_BUILD Subject: Re: [Seqan-dev] Approximate Searching in set of strings X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 May 2010 08:21:31 -0000 Hi Manuel, thank you very much for your quick reply. I'm looking for a replacement for a proprietary search index (based on suffix tries) that supports approximate searching. As a result, I currently don't know exactly how its approximate searching is implemented, and therefore I don't know its complexity either. I'm not too deep into search index theory, so I asked to make sure I'm not missing something. Your answer confirms what I expected. I will experiment a bit more, taking your suggestions into account, and get back to you if I have further questions. Thanks again and best wishes, Tilo On 26.05.2010 17:50, Holtgrewe, Manuel wrote: > Hi Tilo, > > as far as I know, there is no way to do non-heuristic approximate string searching using indices, i.e. something like O(1) lookups. What exactly are you looking for and trying to solve? > > > SeqAn has implementations of various online algorithms for approximate string searching, e.g. see [1]. Notably, Myers' bit-vector algorithm is very fast for searching large strings for single patterns whose length is at most the machine word width. Going beyond this limit makes things a bit slower, since bits have to migrate across machine words. > > Pex, AbndmAlgo and DPSearch also support edit distance; DPSearch even supports linear gap costs and arbitrary substitution scores. > > > If you are looking for read mapping, i.e. many short reads to be found with Hamming/edit distance in long strings, have a look at RazerS, our read mapping tool. It uses the SWIFT algorithm to first find regions that overlap with epsilon-matches and then uses Myers' bit-vector algorithm or a naive Hamming search to find the exact positions.
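The bit-vector algorithm mentioned above can be sketched as follows, assuming a pattern of at most 64 characters so that one `std::uint64_t` holds a whole dynamic programming column. It follows the common Myers/Hyyrö formulation for searching, where a match may start anywhere in the text, and reports the end positions of occurrences within edit distance k. It is not SeqAn's implementation, and the function name `myersSearch` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch of Myers' bit-parallel approximate search under edit distance.
// Assumes 1 <= |pattern| <= 64. Returns every text position j such that
// some substring ending at j has edit distance <= k to the pattern.
std::vector<std::size_t> myersSearch(const std::string& text, const std::string& pattern,
                                     std::size_t k) {
    const std::size_t m = pattern.size();
    assert(m >= 1 && m <= 64);

    // peq[c] has bit i set iff pattern[i] == c.
    std::uint64_t peq[256] = {};
    for (std::size_t i = 0; i < m; ++i)
        peq[static_cast<unsigned char>(pattern[i])] |= std::uint64_t{1} << i;

    std::uint64_t pv = ~std::uint64_t{0};  // vertical +1 deltas (initial column 0,1,...,m)
    std::uint64_t mv = 0;                  // vertical -1 deltas
    const std::uint64_t lastBit = std::uint64_t{1} << (m - 1);
    std::size_t score = m;                 // DP value in the last row
    std::vector<std::size_t> ends;

    for (std::size_t j = 0; j < text.size(); ++j) {
        const std::uint64_t eq = peq[static_cast<unsigned char>(text[j])];
        const std::uint64_t xv = eq | mv;
        const std::uint64_t xh = (((eq & pv) + pv) ^ pv) | eq;
        std::uint64_t ph = mv | ~(xh | pv);  // horizontal +1 deltas
        std::uint64_t mh = pv & xh;          // horizontal -1 deltas
        if (ph & lastBit) ++score;
        else if (mh & lastBit) --score;
        // Shift in 0: the top DP row is all zeros, so matches may start anywhere.
        ph <<= 1;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;
        if (score <= k) ends.push_back(j);
    }
    return ends;
}
```

For instance, `myersSearch("xxABxx", "AB", 1)` returns the end positions 2, 3 and 4. Patterns longer than 64 characters need a multi-word variant in which bits are passed between words, which is the slowdown Manuel refers to.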
> > > Another possible approach is building a q-gram index of your long string (haystack) and then searching for the q-grams of your short string (needle) in the haystack. You can look at the LAGAN demo to see an application of q-gram indices and seeds in SeqAn. FASTA, BLAST, bowtie and bwa are tools that perform forms of "approximate string searching" and could be implemented with the infrastructure of the SeqAn library. > > > People around here use Visual Studio, Emacs, XCode and occasionally vim. I have never seen Visual Studio really make sense of SeqAn-style C++; XCode seems to infer sensible symbol names (which Visual Studio should be able to do, too). I have not tried to get syntax completion running in either Emacs or XCode. > > HTH, > Manuel > > [1] http://www.seqan.de/dddoc/html_devel/CLASS_Pattern.html > > > _______________________________________________ > seqan-dev mailing list > seqan-dev@lists.fu-berlin.de > https://lists.fu-berlin.de/listinfo/seqan-dev -- Dipl.-Inf. Tilo Eißler Technische Universität München |tel. +49-89-289-19478 Institut für Informatik, I10 | Boltzmannstr. 3 |Office 02.13.059 85748 Garching b.
München |email eissler@in.tum.de From f.buske@uq.edu.au Mon May 31 07:33:40 2010 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OIxd5-0000Ld-1t>; Mon, 31 May 2010 07:33:39 +0200 Received: from mailhub4.uq.edu.au ([130.102.149.131]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OIxd4-0006Py-8M>; Mon, 31 May 2010 07:33:39 +0200 Received: from smtp3.uq.edu.au (smtp3.uq.edu.au [130.102.128.18]) by mailhub4.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4V5XYi8026272 for ; Mon, 31 May 2010 15:33:34 +1000 Received: from tlb-sumo.imb.uq.edu.au (tlb-sumo.imb.uq.edu.au [130.102.118.111]) (authenticated bits=0) by smtp3.uq.edu.au (8.13.8/8.13.8) with ESMTP id o4V5XXwW009328 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Mon, 31 May 2010 15:33:34 +1000 Message-ID: <4C034A2D.60402@uq.edu.au> Date: Mon, 31 May 2010 15:33:33 +1000 From: Fabian Buske User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228) MIME-Version: 1.0 To: "'seqan-dev@lists.fu-berlin.de'" Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-UQ-FilterTime: 1275284015 X-Scanned-By: MIMEDefang 2.58 on UQ Mailhub on 130.102.149.131 X-Originating-IP: 130.102.149.131 X-purgate: clean X-purgate-ID: 151147::1275284019-000051C5-7E49988D/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000231, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Gabun.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none Subject: [Seqan-dev] Merging seeds X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 May 2010 05:33:40 -0000 Hi, I found that the 
seed merging depends on the order in which seeds are merged. For example:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(1, 1, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(0, 0, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

results in:

    1 1 1-4
    0 1 1-4

i.e. the seed starting at the preceding position is neither merged nor added. On the other hand, adding the seeds in reverse order:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(1, 1, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

gives what one expects:

    1 1 0-3
    1 1 0-4

Is this a feature or a bug? If it is the former, it should be documented. It would also be nice to have an example of seed merging in the demos (although I found some in the tests).
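The order-independent behaviour expected here can be sketched with a hypothetical ungapped seed type. `DiagonalSeed` and `mergeSeeds` are illustrative names, not SeqAn's `Seed`/`addSeed` API, and the sketch makes no claim about how SeqAn's `Merge` tag actually decides whether to merge. Two seeds are merged if they lie on the same diagonal and their intervals in dimension 0 overlap or are at most `gap` positions apart; because this test is symmetric, the result does not depend on which seed comes first, and adjacent seeds are concatenated when `gap` is 0.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical ungapped seed (not SeqAn's Seed class). It starts at
// (begin0, begin1) and covers `length` positions along one diagonal,
// i.e. [begin0, begin0 + length) in dimension 0.
struct DiagonalSeed {
    long begin0;
    long begin1;
    long length;
    long diagonal() const { return begin1 - begin0; }
    long end0() const { return begin0 + length; }  // exclusive end
};

// Merges `other` into `seed` if both lie on the same diagonal and their
// dimension-0 intervals overlap or are separated by at most `gap` positions.
// The test is symmetric: merging A into B succeeds exactly when B into A does.
bool mergeSeeds(DiagonalSeed& seed, const DiagonalSeed& other, long gap) {
    assert(seed.length > 0 && other.length > 0 && gap >= 0);
    if (seed.diagonal() != other.diagonal()) return false;
    // Distance between the intervals: negative if they overlap, 0 if adjacent.
    const long distance =
        std::max(seed.begin0, other.begin0) - std::min(seed.end0(), other.end0());
    if (distance > gap) return false;
    const long d = seed.diagonal();
    const long newBegin = std::min(seed.begin0, other.begin0);
    const long newEnd = std::max(seed.end0(), other.end0());
    seed.begin0 = newBegin;
    seed.begin1 = newBegin + d;
    seed.length = newEnd - newBegin;
    return true;
}
```

With the seeds from the examples, merging (0, 0, 4) into (1, 1, 4) or vice versa yields a seed covering 0-4, and (0, 0, 4) with (4, 4, 4) yields 0-7. The sketch's `end0()` is exclusive; `end0() - 1` appears to correspond to the inclusive end printed by `rightDim0` in the output above.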
Secondly, if I use the tag Single() when adding the second seed, I get a compiler error:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(1, 1, 4), 0, Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

Thirdly, two adjacent seeds are currently not concatenated, although I think it would make sense to join them:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(4, 4, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

gives:

    1 1 0-3
    0 1 0-3

rather than:

    1 1 0-3
    1 1 0-7

And finally, it's called SeedSet, but if a merge is not successful, as in the last example, no new member is added to the set (by default). Why so?
Using appendValue() also gives a compiler error:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(4, 4, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    if (not yep) appendValue(seedset, Seed(4, 4, 4));
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

Cheers, Fabian -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 4072 Australia Phone: (61)-(7)-334-62608 From Birte.Kehr@fu-berlin.de Mon May 31 13:23:14 2010 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1OJ35N-0004AY-9O>; Mon, 31 May 2010 13:23:13 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1OJ35N-0003V3-56>; Mon, 31 May 2010 13:23:13 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1OJ35N-0003Dr-1H>; Mon, 31 May 2010 13:23:13 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Mon, 31 May 2010 13:23:13 +0200 From: "Kehr, Birte" To: SeqAn Development Date: Mon, 31 May 2010 13:23:12 +0200 Thread-Topic: [Seqan-dev] Merging seeds Thread-Index: AcsAgtfqndKq1GvNQiylqSMIphrQCwAKq7Kg Message-ID: References: <4C034A2D.60402@uq.edu.au> In-Reply-To: <4C034A2D.60402@uq.edu.au> Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: 
text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-ID: 151147::1275304993-000051C5-4D19ECD9/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.1.6 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Benin.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Cc: Carsten Kemena , "andreas.doering@mdc-berlin.de" Subject: Re: [Seqan-dev] Merging seeds X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 May 2010 11:23:14 -0000 Hi Fabian, I am not currently working on the seeds module in SeqAn, but I will do my best to answer your questions. For more help, we will have to contact the authors of the module. > I found that the seed merging depends on the order in which seeds are merged. The algorithm is a heuristic, so its result may well depend on the order of insertion. However, I am not sure about it in your case, so if you think that you have found a bug, please report it at: http://trac.mi.fu-berlin.de/seqan/newticket > Secondly, if I use the tag Single() when adding the second seed, I get a compiler error. You do not need to specify a gapDistance (third parameter) when adding seeds with the tag Single; it is only needed for merging seeds: addSeed(seedset, Seed(1, 1, 4), Single()); > Thirdly, two adjacent seeds are currently not concatenated, although I think it would make sense to join them. This was probably a design decision. I cannot help you with this; I am cc'ing the authors of the module. > And finally, it's called SeedSet, but if a merge is not successful, as in the last example, no new member is added to the set (by default). 
Why so? It has the advantage that the user can decide whether to add the seed with the tag Single afterwards, or not to add it at all when merging is unsuccessful. If it were added by default, the latter would not be possible. > Using appendValue() also gives a compiler error. Try

    Seed seed(4, 4, 4);
    appendValue(seedset, seed);

Cheers, Birte -----Original Message----- From: Fabian Buske [mailto:f.buske@uq.edu.au] Sent: Monday, 31 May 2010 07:34 To: 'seqan-dev@lists.fu-berlin.de' Subject: [Seqan-dev] Merging seeds Hi, I found that the seed merging depends on the order in which seeds are merged. For example:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(1, 1, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(0, 0, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

results in:

    1 1 1-4
    0 1 1-4

i.e. the seed starting at the preceding position is neither merged nor added. On the other hand, adding the seeds in reverse order:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(1, 1, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

gives what one expects:

    1 1 0-3
    1 1 0-4

Is this a feature or a bug? If it is the former, it should be documented. It would also be nice to have an example of seed merging in the demos (although I found some in the tests).
Secondly, if I use the tag Single() when adding the second seed, I get a compiler error:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(1, 1, 4), 0, Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

Thirdly, two adjacent seeds are currently not concatenated, although I think it would make sense to join them:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(4, 4, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

gives:

    1 1 0-3
    0 1 0-3

rather than:

    1 1 0-3
    1 1 0-7

And finally, it's called SeedSet, but if a merge is not successful, as in the last example, no new member is added to the set (by default). Why so?
Using appendValue() also gives a compiler error:

    SeedSet< int, SimpleSeed, DefaultNoScore , void > seedset(100, 0);
    bool yep = addSeed(seedset, Seed(0, 0, 4), Single());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    yep = addSeed(seedset, Seed(4, 4, 4), 0, Merge());
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;
    if (not yep) appendValue(seedset, Seed(4, 4, 4));
    ::std::cout << yep << " " << length(seedset) << " " << leftDim0(*begin(seedset)) << "-" << rightDim0(*begin(seedset)) << ::std::endl;

Cheers, Fabian -- Fabian Buske Institute for Molecular Bioscience The University of Queensland Brisbane, Qld. 4072 Australia Phone: (61)-(7)-334-62608 _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de https://lists.fu-berlin.de/listinfo/seqan-dev
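Birte's point that an unsuccessful merge leaves the decision to the caller can be sketched as a small policy function. Based on the calls shown in this thread, the SeqAn version would presumably be `if (!addSeed(seedset, seed, 0, Merge())) addSeed(seedset, seed, Single());`, but that has not been checked against the library. The self-contained C++ below instead uses a hypothetical `SeedSketch` type and hypothetical `tryMerge`/`mergeOrAdd` functions on a `std::vector` to show the two choices: drop the unmerged seed, or keep it as a separate member.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal seed (not SeqAn's Seed class): an ungapped match on
// diagonal `diag` covering [begin0, end0) in dimension 0.
struct SeedSketch {
    long diag;    // begin1 - begin0
    long begin0;  // inclusive
    long end0;    // exclusive
};

// Merges `s` into the first seed of `set` on the same diagonal whose interval
// overlaps it or lies within `gap` positions. Returns true on success.
// Note: it does not re-check whether the enlarged seed now touches another one.
bool tryMerge(std::vector<SeedSketch>& set, const SeedSketch& s, long gap) {
    for (SeedSketch& t : set) {
        if (t.diag != s.diag) continue;
        if (std::max(t.begin0, s.begin0) - std::min(t.end0, s.end0) > gap) continue;
        t.begin0 = std::min(t.begin0, s.begin0);
        t.end0 = std::max(t.end0, s.end0);
        return true;
    }
    return false;
}

// Caller-decides policy: merge if possible; otherwise keep the seed as a new,
// separate member only if addIfUnmerged is set. Returns true iff merged.
bool mergeOrAdd(std::vector<SeedSketch>& set, const SeedSketch& s, long gap,
                bool addIfUnmerged) {
    assert(s.begin0 < s.end0 && gap >= 0);
    if (tryMerge(set, s, gap)) return true;
    if (addIfUnmerged) set.push_back(s);
    return false;
}
```

Because the merge step reports failure instead of silently inserting, the same function serves both the "merge only" and the "merge or add" use cases that Birte describes.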