From: Ruolin Liu <rliu0606@gmail.com>
Date: Sun, 30 Apr 2017 01:09:11 -0700
To: seqan-dev@lists.fu-berlin.de
Subject: [Seqan-dev] AlignedReadStore

Dear SeqAn developers,

I realized that there might be a bug in the read counting example of the RNA-Seq tutorial. The beginPos and endPos of an AlignedReadStoreElement are in the gapped space of the reference genome, while the beginPos and endPos of a Gff record are supposed to be in the ungapped space of the reference genome. This relates to my observation that the AlignedReadStoreElement does not actually contain any coordinates from the ungapped space. As a result, its beginPos and endPos differ from the beginPos and endPos of the corresponding BamAlignmentRecord, which creates a lot of confusion. I would suggest adding two extra fields to AlignedReadStoreElement, such as beginSourcePos and endSourcePos.

Below is a copy of the code from the example. I commented on some lines.

void countReadsPerGene(String<unsigned> & readsPerGene,
                       String<TIntervalTree> const & intervalTrees,
                       TStore const & store)
{
    resize(readsPerGene, length(store.annotationStore), 0);
    String<TId> result;
    int numAlignments = length(store.alignedReadStore);

    // iterate over aligned reads and get their begin and end positions
    SEQAN_OMP_PRAGMA(parallel for private (result))
    for (int i = 0; i < numAlignments; ++i)
    {
        TAlignedRead const & ar = store.alignedReadStore[i];
        TPos queryBegin = _min(ar.beginPos, ar.endPos);  // In gapped space
        TPos queryEnd = _max(ar.beginPos, ar.endPos);    // In gapped space

        // search read-overlapping genes
        findIntervals(result, intervalTrees[ar.contigId] /*Ungapped space*/, queryBegin, queryEnd);

        // increase read counter for each overlapping annotation given the id in the interval tree
        for (unsigned j = 0; j < length(result); ++j)
        {
            SEQAN_OMP_PRAGMA(atomic)
            readsPerGene[result[j]] += 1;
        }
    }
}

Best,

Ruolin
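
As a side illustration of the coordinate mismatch described in this message, here is a minimal, self-contained sketch of what converting a gapped position to an ungapped one means. It deliberately does not use SeqAn: the helper name gappedToUngapped and the representation of the gapped contig as a std::string with '-' gap characters are assumptions made for illustration only, not the library's API or storage format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SeqAn): map a position in a gapped
// contig to the corresponding position in the ungapped reference by
// counting the non-gap characters that precede it.
// Assumption: gaps are written as '-' in a plain std::string.
std::size_t gappedToUngapped(std::string const & gappedContig, std::size_t gappedPos)
{
    // The position may equal the length (an end position), but not exceed it.
    assert(gappedPos <= gappedContig.size());

    auto const first = gappedContig.begin();
    auto const last = first + static_cast<std::ptrdiff_t>(gappedPos);

    return static_cast<std::size_t>(
        std::count_if(first, last, [](char c) { return c != '-'; }));
}
```

For example, in the gapped contig "ACGT--ACGTAC" a read spanning gapped columns [6, 10) covers ungapped reference positions [4, 8). Querying an interval tree built from Gff coordinates with 6 and 10 instead of 4 and 8 would shift the query by the number of gap columns before the read, which is the discrepancy the comments in the tutorial code point at.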