From: Yue Gan
Date: Mon, 13 Jun 2016 23:14:25 -0500
To: seqan-dev@lists.fu-berlin.de
Subject: [Seqan-dev] Annotation and spliced site

Hi all,

I am trying to use the annotation part of SeqAn to solve a problem, but I am a little confused.

What I want to do is:
1. Read in an annotation file (the format can be GFF, GTF, or BED).
2. Given two positions (q1, q2) in the genome, decide whether (q1, q2) is a known junction.

My Solution:
1. Use GffFileIn to load the annotation file (GFF or GTF) into a FragmentStore<>.
2. Extract the exons and build an interval tree of junctions from the FragmentStore<>.
3. Call the interval tree's 'findIntervals' function with q1 and q2 to check whether a matching interval (junction) exists.
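
To make steps 2 and 3 concrete, here is a minimal sketch of the lookup I have in mind. It uses only the C++ standard library, not SeqAn types, so it says nothing about the real SeqAn API. The names Exon, JunctionIndex, buildJunctionIndex, and isKnownJunction are hypothetical. I am assuming half-open exon coordinates [begin, end) and that a junction is the pair (end of the upstream exon, begin of the next exon in the same transcript).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical record: one exon taken from the annotation, half-open [begin, end).
struct Exon {
    std::string transcriptId;
    std::string contig;
    std::int64_t begin;
    std::int64_t end;
};

// A junction is (end of upstream exon, begin of downstream exon).
using Junction = std::pair<std::int64_t, std::int64_t>;
// One sorted set of junctions per contig.
using JunctionIndex = std::map<std::string, std::set<Junction>>;

// Step 2: derive junctions from consecutive exons of the same transcript.
JunctionIndex buildJunctionIndex(std::vector<Exon> exons) {
    std::sort(exons.begin(), exons.end(), [](Exon const & a, Exon const & b) {
        return std::tie(a.contig, a.transcriptId, a.begin) <
               std::tie(b.contig, b.transcriptId, b.begin);
    });
    JunctionIndex index;
    for (std::size_t i = 1; i < exons.size(); ++i) {
        Exon const & prev = exons[i - 1];
        Exon const & cur = exons[i];
        bool sameTranscript = prev.contig == cur.contig &&
                              prev.transcriptId == cur.transcriptId;
        // Only record a junction when there is a gap (intron) between the exons.
        if (sameTranscript && prev.end < cur.begin)
            index[cur.contig].emplace(prev.end, cur.begin);
    }
    return index;
}

// Step 3: exact-match query for the pair (q1, q2) on a given contig.
bool isKnownJunction(JunctionIndex const & index, std::string const & contig,
                     std::int64_t q1, std::int64_t q2) {
    auto it = index.find(contig);
    return it != index.end() && it->second.count(Junction{q1, q2}) > 0;
}
```

Since this sketch only tests for an exact (q1, q2) match, a sorted set is enough here. I assume an interval tree would only become necessary if I also wanted overlap or containment queries.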

My Questions:
1. Is BED better than GFF and GTF in this situation?
2. If BED is better, could anybody give me an example of which SeqAn data structures I should use to store the annotation and do the search? Some pseudocode with SeqAn data structures and functions would be even better.
3. If GFF and GTF are better, is my solution correct and efficient? Is there a better way to solve my problem? Is there anything I need to pay attention to?

Thank you very much!

Yue Gan