From Enrico.Siragusa@fu-berlin.de Mon Aug 03 16:30:47 2015
From: "Siragusa, Enrico" <Enrico.Siragusa@fu-berlin.de>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Date: Mon, 03 Aug 2015 16:30:39 +0200
Message-ID: <15ECB03B-C076-4CB2-A0D5-DAD4D7505D64@fu-berlin.de>
Subject: Re: [Seqan-dev] Simple radix tree for a set of strings

Dear Hieu,

You can implement a static trie (with edges labeled by single characters) using the IndexSa class. This data structure stores only the trie leaves (i.e. the suffix array), while its iterator derives inner nodes via binary search.

Alternatively, you can implement a static radix tree (with edges labeled by substrings) by means of the IndexWotd class (i.e. a lazy suffix tree). This data structure stores inner nodes explicitly, so it consumes slightly more space than IndexSa.

The construction API is very minimal (see https://github.com/seqan/seqan/issues/1063), but it is not yet documented because it is not standardized. If you need further assistance, we could continue this discussion on GitHub in that same issue.

After construction, both indices can be iterated top-down just like virtual suffix tries/trees (see http://seqan.readthedocs.org/en/master/Tutorial/IndexIterators.html).

Cheers,
Enrico

On 24 Jul 2015, at 08:24, Tran Ngoc Hieu (Dr) wrote:

> Dear All,
>
> Could you please advise if there is any built-in way to construct a simple radix tree of a given set of strings? I don't need random access to prefixes or suffixes of individual strings. I only need a DFS iterator to traverse from the first to the last character of each string.
>
> A simple example: if my set is {"to", "tea", "ten"}, then I need to process in this order: "t", "te", "tea", "ten", "to".
>
> I know that some built-in SeqAn indices, e.g. IndexSa, can do what I want, and even offer more support, but they also require more memory space. So is there any simple way to do that?
>
> Thanks a lot!
>
> Regards,
> Hieu

From NHTran@ntu.edu.sg Tue Aug 04 04:27:18 2015
From: "Tran Ngoc Hieu (Dr)" <NHTran@ntu.edu.sg>
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Date: Tue, 4 Aug 2015 02:27:09 +0000
In-Reply-To: <15ECB03B-C076-4CB2-A0D5-DAD4D7505D64@fu-berlin.de>
Subject: Re: [Seqan-dev] Simple radix tree for a set of strings

Hi Enrico,

Thank you for your advice, I will try to work it out.

Regards,
Hieu
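
The idea from this thread of keeping only the leaves in sorted order and deriving inner trie nodes by binary search can be illustrated with a minimal sketch. It uses only the C++ standard library, not SeqAn; the function names dfsPrefixes and triePrefixesDfs are hypothetical, and a sorted array of whole strings stands in for the suffix array. It is meant as an illustration under those assumptions, not as the IndexSa implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Visit the implicit trie node covering s[lo, hi), whose strings all share
// a common prefix of length `depth`, and append every descendant prefix to
// `out` in DFS (pre-order). `s` must be sorted and free of duplicates.
void dfsPrefixes(const std::vector<std::string>& s, std::size_t lo,
                 std::size_t hi, std::size_t depth,
                 std::vector<std::string>& out)
{
    assert(lo <= hi && hi <= s.size());
    // A string that ends exactly at this node sorts first in the range.
    while (lo < hi && s[lo].size() == depth) ++lo;
    while (lo < hi) {
        const char c = s[lo][depth];
        // Binary search for the end of the block whose character at
        // position `depth` equals c; that block is one child node.
        auto first = s.begin() + static_cast<std::ptrdiff_t>(lo);
        auto last = s.begin() + static_cast<std::ptrdiff_t>(hi);
        auto mid = std::partition_point(first, last,
            [&](const std::string& x) { return x[depth] == c; });
        const std::size_t end = static_cast<std::size_t>(mid - s.begin());
        out.push_back(s[lo].substr(0, depth + 1));  // report the child prefix
        dfsPrefixes(s, lo, end, depth + 1, out);    // descend before siblings
        lo = end;
    }
}

// Sort and deduplicate the input, then list all trie prefixes in DFS order.
std::vector<std::string> triePrefixesDfs(std::vector<std::string> strings)
{
    std::sort(strings.begin(), strings.end());
    strings.erase(std::unique(strings.begin(), strings.end()), strings.end());
    std::vector<std::string> out;
    dfsPrefixes(strings, 0, strings.size(), 0, out);
    return out;
}
```

Calling triePrefixesDfs({"to", "tea", "ten"}) yields the order asked for in the original question: "t", "te", "tea", "ten", "to". No inner node is stored; each one exists only as an index range during the traversal, which is the space trade-off described above.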
From n.ahmed@tudelft.nl Tue Aug 11 19:13:49 2015
From: Nauman Ahmed <n.ahmed@tudelft.nl>
To: seqan-dev@lists.fu-berlin.de
Date: Tue, 11 Aug 2015 19:13:35 +0200
Message-ID: <55CA2D3F.1090204@tudelft.nl>
Subject: [Seqan-dev] alignQualityStore score = 0

Hi,

I am using the Fragment store for the read mapping problem. I append the matches using appendAlignedRead() and then perform global alignment using convertMatchesToGlobalAlignment(), but when I print the scores using fragStore.alignQualityStore[i].score, it shows 0 ("fragStore" is the FragmentStore object and "i" is the alignedRead ID). In the SAM file I generated, the CIGAR strings and the edit distances (NM:i:) are perfectly OK.

-- 
Nauman Ahmed
PhD candidate,
Department of Software and Computer Technology
Computer Engineering Laboratory,
EWI, TU Delft
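
As a cross-check that does not depend on any stored score field, the edit distance reported in NM:i: can be recounted directly from the two gapped rows of a pairwise alignment. The sketch below uses only the C++ standard library, not SeqAn; the function name editDistanceFromRows and the use of '-' as the gap character are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count the edit operations (mismatches, insertions, deletions) between two
// rows of a pairwise alignment. Both rows must have the same length, with
// '-' marking a gap. Columns where both rows hold a gap are ignored.
std::size_t editDistanceFromRows(const std::string& refRow,
                                 const std::string& readRow)
{
    assert(refRow.size() == readRow.size());
    std::size_t edits = 0;
    for (std::size_t i = 0; i < refRow.size(); ++i) {
        if (refRow[i] == '-' && readRow[i] == '-') continue;
        if (refRow[i] != readRow[i]) ++edits;  // mismatch or single-row gap
    }
    return edits;
}
```

For the rows "ACGT-ACGT" and "ACCTAAC-T" this counts one mismatch, one insertion and one deletion, i.e. an edit distance of 3.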