From merkle@in.tum.de Thu Oct 06 14:43:02 2011 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RBnHw-0000Dn-IU>; Thu, 06 Oct 2011 14:43:00 +0200 Received: from mail-out1.informatik.tu-muenchen.de ([131.159.0.8]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RBnHw-000231-FF>; Thu, 06 Oct 2011 14:43:00 +0200 Received: from [129.187.173.166] (h166.tum.vpn.lrz.de [129.187.173.166]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.in.tum.de (Postfix) with ESMTP id D0215EE47 for ; Thu, 6 Oct 2011 14:42:59 +0200 (CEST) Message-ID: <4E8DA24A.8040009@in.tum.de> Date: Thu, 06 Oct 2011 14:42:50 +0200 From: Johannes Merkle User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: seqan-dev@lists.fu-berlin.de Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: 131.159.0.8 X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1317904980-00005A17-21ADF4EC/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000004, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Gabun.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=RATWARE_GECKO_BUILD Subject: [Seqan-dev] usage of setStepSize X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 06 Oct 2011 12:43:02 -0000 Hi, I'm experiencing some odd behaviour when using the q-gram-index with a different stepsize. 
The attached code (which is very similar to your tutorial, only with the added setStepSize command and the OpenAddressing spec) returns wrong results (0 and 7). Without the OpenAddressing spec it doesn't work at all (see attached error message).

Am I using the setStepSize command incorrectly, or is this a bug?

Thanks,
Johannes

Code:
typedef Index, OpenAddressing > > TIndex;
TIndex index("CATGATTACATA");
setStepSize(index,2);
hash(indexShape(index), "CAT");
for (unsigned i = 0; i < length(getOccurrences(index, indexShape(index))); ++i)
    std::cout << getOccurrences(index, indexShape(index))[i] << std::endl;

Error Message:
...SeqAn\core\include\seqan/sequence/string_base.h:227 Assertion failed : static_cast(pos) < static_cast(length(me)) was: 254 >= 65 (Trying to access an element behind the last one!)

From johnthesaintjohn@gmail.com Sun Oct 09 02:06:47 2011 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RCguj-0001FJ-Cm>; Sun, 09 Oct 2011 02:06:45 +0200 Received: from mail-gy0-f182.google.com ([209.85.160.182]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RCguj-0008RF-6b>; Sun, 09 Oct 2011 02:06:45 +0200 Received: by gyf2 with SMTP id 2so5775374gyf.13 for ; Sat, 08 Oct 2011 17:06:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=from:content-type:subject:date:message-id:to:mime-version:x-mailer; bh=Q3EzL1kZ4DE5p2m241ZzB3MHiusMZU6km131R+GXWtw=; b=O/sQbwXHXYopHRaAHcQxbhS2bCD4Kqrfe7TvQHlW0lzBoh2iwNBUmK1PVz81WVpVTg /Gp6CBuC9U3xSh/MMlqrcogKkDQpWWM6TrXHl7pn4P5gIWAMt5bGY2A0cyzza5gWySQY hMZXxdjO7hT+HHsSZSt1lzwWK09ep8Pls+rPY= Received: by 10.68.27.199 with SMTP id v7mr25306228pbg.2.1318118803729; Sat, 08 Oct 2011 17:06:43 -0700 (PDT) Received: from [192.168.1.120] (c-24-5-146-141.hsd1.ca.comcast.net.
[24.5.146.141]) by mx.google.com with ESMTPS id z1sm48685274pbl.5.2011.10.08.17.06.42 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 08 Oct 2011 17:06:42 -0700 (PDT) From: John St John Content-Type: multipart/alternative; boundary="Apple-Mail=_FE31F8E8-CE1F-4074-969F-FCA8715FFF57" Date: Sat, 8 Oct 2011 17:06:47 -0700 Message-Id: <925CCD60-FDE9-4197-BE49-03AA8EFC5374@gmail.com> To: seqan-dev@lists.fu-berlin.de Mime-Version: 1.0 (Apple Message framework v1244.3) X-Mailer: Apple Mail (2.1244.3) X-Originating-IP: 209.85.160.182 X-ZEDAT-Hint: A X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1318118805-00005A17-9A653956/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000310, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Burundi.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.5 required=5.0 tests=DNS_FROM_RFC_ABUSE, HTML_30_40, HTML_MESSAGE,RCVD_BY_IP,SPF_HELO_PASS,SPF_PASS Subject: [Seqan-dev] Read trimming question X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Oct 2011 00:06:47 -0000 --Apple-Mail=_FE31F8E8-CE1F-4074-969F-FCA8715FFF57 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii

Hello,

I am working on a quick re-write of an alignment-based short-read trimmer I wrote in C, using SeqAn. So far things are going really well. I followed the Alignment tutorial on the trac wiki, and now I have an Alignment Graph of a global alignment where gaps at all ends of the two short sequences aren't penalized and gaps in the middle are treated harshly.
So pictorially I have an alignment like this:

Seq1: ----ACATAG
Seq2: TTAGATA---

I want to output the following trimmed sequences:

Seq1: ACA
Seq2: ATA

However, if the above alignment were reversed:

Seq1: TTAGATA---
Seq2: ----ACATAG

then I want to output the merged and extended consensus, where I call mismatches using a separate quality score string as a tie breaker.

Basically I don't know how to traverse the Alignment Graph to pull out the information I need. I need to keep track of which sequence is which in the alignment graph so that I can deal with the above two cases properly. Any help, or at least a link to a good resource on traversing an alignment graph and doing something similar, would be greatly appreciated. I need to keep track of the indices of the bases I trim so that I can also output the trimmed or merged quality string.

Thanks everyone for your time,
-John
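
The trimming case described in this message can be sketched without any SeqAn types, working only on the two gapped rows as plain strings. This is a minimal illustration under stated assumptions, not code from the thread and not the SeqAn Alignment Graph API: the names ColumnRange, TrimmedRow, coveredColumns, overlapColumns and trimRow are hypothetical, '-' is assumed to be the gap character, and both rows are assumed to have the same length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Half-open column range [begin, end) in alignment coordinates (hypothetical helper type).
struct ColumnRange { std::size_t begin; std::size_t end; };

// Bases kept from one row, plus their half-open index range in the ungapped source sequence.
struct TrimmedRow { std::string bases; std::size_t srcBegin; std::size_t srcEnd; };

// Columns spanned by a row from its first to its last non-gap character (end gaps excluded).
ColumnRange coveredColumns(const std::string& row)
{
    std::size_t first = row.find_first_not_of('-');
    if (first == std::string::npos)
        return {0, 0};  // Row consists of gaps only.
    return {first, row.find_last_not_of('-') + 1};
}

// Columns covered by both rows; an empty range {0, 0} if the rows do not overlap.
ColumnRange overlapColumns(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    ColumnRange ca = coveredColumns(a);
    ColumnRange cb = coveredColumns(b);
    std::size_t begin = std::max(ca.begin, cb.begin);
    std::size_t end = std::min(ca.end, cb.end);
    if (begin >= end)
        return {0, 0};
    return {begin, end};
}

// Keep the non-gap characters of row that fall into the columns of keep and record
// which positions of the ungapped sequence they came from.
TrimmedRow trimRow(const std::string& row, ColumnRange keep)
{
    TrimmedRow out{"", 0, 0};
    std::size_t srcPos = 0;  // Position in the ungapped sequence.
    for (std::size_t col = 0; col < row.size(); ++col)
    {
        if (row[col] == '-')
            continue;  // Gaps do not consume a source position.
        if (col >= keep.begin && col < keep.end)
        {
            if (out.bases.empty())
                out.srcBegin = srcPos;
            out.bases += row[col];
            out.srcEnd = srcPos + 1;
        }
        ++srcPos;
    }
    return out;
}
```

For the rows "----ACATAG" and "TTAGATA---", the shared columns are 4 to 6, which yields "ACA" (source positions 0 to 2) and "ATA" (source positions 4 to 6). The srcBegin/srcEnd values can be used to cut the matching quality string. Comparing the begin values returned by coveredColumns for the two rows is one possible way to tell this case apart from the reversed one, where the consensus should be merged instead.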



--Apple-Mail=_FE31F8E8-CE1F-4074-969F-FCA8715FFF57--

From manuel.holtgrewe@fu-berlin.de Sun Oct 09 20:09:02 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RCxo5-0007n3-O8>; Sun, 09 Oct 2011 20:09:01 +0200 Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RCxo5-0007l0-LO>; Sun, 09 Oct 2011 20:09:01 +0200 Received: from 91-65-212-104-dynip.superkabel.de ([91.65.212.104] helo=[192.168.0.100]) by inpost2.zedat.fu-berlin.de (Exim 4.69) with esmtpsa (envelope-from ) id <1RCxo5-0005oG-GV>; Sun, 09 Oct 2011 20:09:01 +0200 Message-Id: <58675980-E73D-4F2B-BF54-4678043D23A5@fu-berlin.de> From: Manuel Holtgrewe To: SeqAn Development In-Reply-To: <925CCD60-FDE9-4197-BE49-03AA8EFC5374@gmail.com> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Mime-Version: 1.0 (Apple Message framework v936) Date: Sun, 9 Oct 2011 20:09:00 +0200 References: <925CCD60-FDE9-4197-BE49-03AA8EFC5374@gmail.com> X-Mailer: Apple Mail (2.936) X-Originating-IP: 91.65.212.104 X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1318183741-00005A17-3CFA9036/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.067548, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Algerien.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] Read trimming question X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 09 Oct 2011 18:09:02 -0000

John, the following computes a profile from an alignment graph.
First, the connected components are computed, then they are iterated in lexicographical order. This gives you a column-by-column iteration of the matrix-based alignment represented by the alignment graph. Note that the alignment graph can contain ambiguities, and in most cases more than one matrix-based alignment represents the alignment graph.

HTH,
Manuel

template bool computeProfiles(StringSet > > >, TSpec0> & profiles, String & profileSupportInfos, Graph, TCargo, TSpec> > /*const*/ & g, Graph > /*const*/ & distances, bool logging)
{
    using namespace seqan;

    typedef Graph, TCargo, TSpec> > TAlignmentGraph;
    typedef std::map TComponentLength;
    typedef Dna5 TAlphabet;

    // Allocate information for which sequence supports the profile at which position.
    resize(profileSupportInfos, length(stringSet(g)));

    // -----------------------------------------------------------------------
    // Compute connected components and get topological sorting of them.
    // -----------------------------------------------------------------------

    String component;
    String order;
    TComponentLength componentLength;
    if (!convertAlignment(g, component, order, componentLength))
        return false;
    unsigned numComponents = length(order);
    // if (logging)
    //     for (unsigned i = 0; i < numComponents; ++i)
    //         std::cerr << "order[" << i << "] == " << order[i] << std::endl;

    // -----------------------------------------------------------------------
    // Get connected components of distances / read alignment clusters.
    // -----------------------------------------------------------------------

    // Each cluster corresponds to a contig.
    // A cluster is a CC in the graph where each sequence is a vertex and two vertices are connected if they have an
    // overlap alignment.
    String seqToCluster;
    unsigned numClusters = connectedComponents(distances, seqToCluster);
    // if (logging)
    //     for (unsigned i = 0; i < length(seqToCluster); ++i)
    //         std::cerr << "SEQ TO CLUSTER\t" << i << " --> " << seqToCluster[i] << std::endl;
    // std::cerr << distances << std::endl;
    // std::cerr << "numVertices(distances) == " << numVertices(distances) << std::endl;
    if (logging)
        std::cerr << "# clusters: " << numClusters << std::endl
                  << "# components: " << numComponents << std::endl;
    resize(profiles, numClusters);

    // -----------------------------------------------------------------------
    // Visit components in topological order and generate profile sequences.
    // -----------------------------------------------------------------------

    // Get mapping from component to vertices.
    String > componentVertices;
    resize(componentVertices, numComponents);
    typedef typename Iterator::Type TVertexIterator;
    for (TVertexIterator itV(g); !atEnd(itV); goNext(itV))
    {
        // std::cerr << "VERTEX TO COMPONENT\t" << *itV << " --> " << getProperty(component, *itV) << std::endl;
        appendValue(componentVertices[getProperty(component, *itV)], *itV);
    }

    // For each cluster, the number of currently overlapping reads.
    String activeReads;
    resize(activeReads, numClusters, 0);

    // Iterate vertices in topological order.
    unsigned verticesVisited = 0;
    for (typename Iterator, Rooted>::Type it = begin(order, Rooted()); !atEnd(it); goNext(it))
    {
        unsigned c = *it;  // Current component.
        unsigned fLen = fragmentLength(g, front(componentVertices[c]));
        unsigned cl = seqToCluster[sequenceId(g, front(componentVertices[c]))];  // Current cluster/contig.

        // Grow profile string for the current contig/cluster.
        unsigned from = length(profiles[cl]);
        resize(profiles[cl], from + fLen);
        // if (logging)
        //     std::cerr << "seq id == " << sequenceId(g, front(componentVertices[c])) << ", cl == " << cl << std::endl;

        // Make fragments of vertices of current component vote for their character.
        unsigned numNewThisRound = 0;
        unsigned numDoneThisRound = 0;
        typedef typename Iterator, Rooted>::Type TDescIt;
        // std::cerr << "length(componentVertices[" << c << "]) == " << length(componentVertices[c]) << std::endl;
        for (TDescIt itV = begin(componentVertices[c], Rooted()); !atEnd(itV); goNext(itV))
        {
            verticesVisited += 1;
            // std::cerr << "VISITING\t" << *itV << std::endl;
            unsigned idx = idToPosition(stringSet(g), sequenceId(g, *itV));
            // if (logging)
            //     std::cerr << "\t id == " << idToPosition(stringSet(g), sequenceId(g, *itV)) << ", idx == " << idx << std::endl;
            unsigned fBeg = fragmentBegin(g, *itV);
            // Register sequence as supporting in profile cl starting at position from in profile.
            if (fBeg == 0u)
                profileSupportInfos[idx] = ProfileSupportInfo(idx, cl, from, from);
            profileSupportInfos[idx].profileEnd = from + fLen;
            numNewThisRound += (fBeg == 0);
            unsigned fEnd = fBeg + fLen;
            numDoneThisRound += (fEnd == length(stringSet(g)[idx]));
            SEQAN_ASSERT_EQ(fLen, fragmentLength(g, *itV));
            for (unsigned i = 0; i < fLen; ++i)
                profiles[cl][from + i].count[ordValue(stringSet(g)[idx][fBeg + i])] += 1;
        }
        // Some reads became active *in* this round.
        activeReads[cl] += numNewThisRound;

        // Now, make the active reads in the current component vote for "not here"/'-'.
        SEQAN_ASSERT_GEQ(activeReads[cl], length(componentVertices[c]));
        unsigned numGapVotes = activeReads[cl] - length(componentVertices[c]);
        for (unsigned i = from; i < length(profiles[cl]); ++i)
            profiles[cl][i].count[ValueSize::VALUE] += numGapVotes;
        // if (logging)
        //     std::cerr << "NEW THIS ROUND " << numNewThisRound << "\tDONE THIS ROUND " << numDoneThisRound << std::endl
        //               << "\t GAP VOTES " << numGapVotes << "\tNON-GAP VOTES " << length(componentVertices[c]) << std::endl;

        // Some reads become inactive *after* this round.
        SEQAN_ASSERT_GEQ(activeReads[cl], numDoneThisRound);
        activeReads[cl] -= numDoneThisRound;
    }
    SEQAN_ASSERT_EQ(numVertices(g), verticesVisited);
    // if (logging)
    //     for (unsigned i = 0; i < numClusters; ++i)
    //         std::cerr << "len(profiles[" << i << "]) == " << length(profiles[i]) << std::endl;

    return true;
}

From weese@campus.fu-berlin.de Mon Oct 10 21:31:31 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RDLZS-0007ls-L4>; Mon, 10 Oct 2011 21:31:30 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RDLZS-0007lf-Ix>; Mon, 10 Oct 2011 21:31:30 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RDLZS-0003M2-Dj>; Mon, 10 Oct 2011 21:31:30 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Mon, 10 Oct 2011 21:31:30 +0200 From: "Weese, David" To: SeqAn Development Date: Mon, 10 Oct 2011 21:31:26 +0200 Thread-Topic: [Seqan-dev] usage of setStepSize Thread-Index: AcyHgzWvRshUDwLOSUOH7AoQqasp8w== Message-ID: <084713C6-52A6-4010-8B02-8E33CC983B1A@fu-berlin.de> References: <4E8DA24A.8040009@in.tum.de> In-Reply-To: <4E8DA24A.8040009@in.tum.de> Accept-Language: de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1318275090-00005A17-3765B111/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level:
X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] usage of setStepSize X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Oct 2011 19:31:31 -0000

Hi Johannes,

the way you use setStepSize is absolutely right. It was a bug in the q-gram index construction, fixed in [10554].
http://trac.mi.fu-berlin.de/seqan/changeset/10554

Thanks for reporting,
David

On 06.10.2011 at 14:42, Johannes Merkle wrote:

> Hi,
>
> I'm experiencing some odd behaviour when using the q-gram-index with a different stepsize.
> The attached code (which is very similar to your tutorial, only with the added setStepSize command and the OpenAddressing spec) returns wrong results (0 and 7). Without the OpenAddressing spec it doesn't work at all (see attached error message).
>
> Am i using the setStepSize command incorrect or is this a bug?
>
> Thanks,
> Johannes
>
> Code:
> typedef Index, OpenAddressing > > TIndex;
> TIndex index("CATGATTACATA");
> setStepSize(index,2);
> hash(indexShape(index), "CAT");
> for (unsigned i = 0; i < length(getOccurrences(index, indexShape(index))); ++i)
>     std::cout << getOccurrences(index, indexShape(index))[i] << std::endl;
>
> Error Message:
> ...SeqAn\core\include\seqan/sequence/string_base.h:227 Assertion failed : static_cast(pos) < static_cast(length(me)) was: 254 >= 65 (Trying to access an element behind the last one!)
>
> _______________________________________________
> seqan-dev mailing list
> seqan-dev@lists.fu-berlin.de
> https://lists.fu-berlin.de/listinfo/seqan-dev

From earonesty@gmail.com Wed Oct 12 22:03:10 2011 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RE51B-00058B-Uy>; Wed, 12 Oct 2011 22:03:10 +0200 Received: from mail-vx0-f182.google.com ([209.85.220.182]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RE51B-0006r6-Oz>; Wed, 12 Oct 2011 22:03:09 +0200 Received: by vcbf13 with SMTP id f13so1324442vcb.13 for ; Wed, 12 Oct 2011 13:03:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; bh=gegLLgw4hBhi14TfGUS9jw+3zaLvlGvTdAHOeD1hQis=; b=dOcFq8pxB8JwPLpniQ9DZg1sbjP1Sfshz7XrxZ+k57Xv1zv60iL1q8KwYDQpvXQ4de utugLU1zKeuLXlpY1XVUyoffoIsbtcmTQglLn4q7bPS42oU9E6voU9zIUQvfvZaaeBYw jOD+OGF4MXId5j3bg5R0vCbxQM+AWu8XaDGB8= MIME-Version: 1.0 Received: by 10.220.140.146 with SMTP id i18mr45687vcu.75.1318449788714; Wed, 12 Oct 2011 13:03:08 -0700 (PDT) Sender: earonesty@gmail.com Received: by 10.220.188.140 with HTTP; Wed, 12 Oct 2011 13:03:08 -0700 (PDT) Date: Wed, 12 Oct 2011 16:03:08 -0400 X-Google-Sender-Auth: cQD7zwDxqrnmNodEDiC4xTkWcMQ Message-ID: From: Erik Aronesty To: seqan-dev@lists.fu-berlin.de Content-Type: multipart/alternative; boundary=f46d043749f74f7ef204af1f8351 X-Originating-IP: 209.85.220.182 X-ZEDAT-Hint: A X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1318449789-00005A17-21A7F2C7/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000003, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Gabun.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.5 required=5.0 tests=DNS_FROM_RFC_ABUSE, HTML_30_40,
HTML_MESSAGE,RCVD_BY_IP,SPF_HELO_PASS,SPF_PASS Subject: [Seqan-dev] obtain consensus sequence X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Oct 2011 20:03:10 -0000 --f46d043749f74f7ef204af1f8351 Content-Type: text/plain; charset=ISO-8859-1

I have a list of sequences that I ran a global MSA on.

I can output the alignment and see that it worked OK.

I would like to obtain the "consensus sequence" for it.

I was going to walk through each alignment and record the "majority rules" base for each position.

Is this the best way to do it? It seems there is a tool that's already built (consensus.h), but I'm not sure how to use it.

I do this:

.... appendValue(rows(sub),fq.seq+p) ...

Then this:

globalMsaAlignment(sub);

Then I'd like to get the consensus sequence out of "sub" such that each position holds the majority-rules base.
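
The "majority rules" walk described in this message can be sketched on plain strings, independent of SeqAn. This is an illustrative sketch only, under stated assumptions: majorityConsensus is a hypothetical name, the aligned rows are assumed to be equally long std::string values over A, C, G, T, N with '-' as the gap character, and every '-' counts as a gap vote (leading and trailing gaps included, which is a simplification). The minSupport threshold and the 'N' fallback are choices made for this sketch; ties go to the symbol listed first in "ACGTN-".

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Column-wise majority vote over equally long gapped rows.
// A column whose winner is the gap contributes nothing to the consensus;
// a winner with fewer than minSupport votes is reported as 'N'.
std::string majorityConsensus(const std::vector<std::string>& rows, unsigned minSupport)
{
    if (rows.empty())
        return "";
    const std::string symbols = "ACGTN-";
    std::string consensus;
    for (std::size_t col = 0; col < rows[0].size(); ++col)
    {
        std::array<unsigned, 6> count{};  // One zero-initialised counter per symbol.
        for (const std::string& row : rows)
        {
            assert(row.size() == rows[0].size());
            std::size_t idx = symbols.find(row[col]);
            if (idx == std::string::npos)
                idx = 4;  // Anything unexpected is counted as 'N'.
            ++count[idx];
        }
        // Pick the symbol with the most votes; on ties the earlier symbol wins.
        std::size_t best = 0;
        for (std::size_t i = 1; i < count.size(); ++i)
            if (count[i] > count[best])
                best = i;
        if (symbols[best] == '-')
            continue;  // Majority says "gap": emit nothing for this column.
        consensus += (count[best] >= minSupport) ? symbols[best] : 'N';
    }
    return consensus;
}
```

For the three rows "ACGT-A", "ACCT-A" and "A-GTTA", a threshold of 2 gives "ACGTA"; raising the threshold to 3 turns the two columns with only two agreeing votes into 'N', giving "ANNTA".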

--f46d043749f74f7ef204af1f8351-- From manuel.holtgrewe@fu-berlin.de Wed Oct 12 23:09:50 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RE63h-0007Vw-NF>; Wed, 12 Oct 2011 23:09:49 +0200 Received: from inpost2.zedat.fu-berlin.de ([130.133.4.69]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RE63h-00056C-E9>; Wed, 12 Oct 2011 23:09:49 +0200 Received: from 91-65-212-104-dynip.superkabel.de ([91.65.212.104] helo=[192.168.0.100]) by inpost2.zedat.fu-berlin.de (Exim 4.69) with esmtpsa (envelope-from ) id <1RE63h-0004oI-6k>; Wed, 12 Oct 2011 23:09:49 +0200 Message-Id: <62C03EF7-9FAB-4435-8BBA-A3A0F027F7DE@fu-berlin.de> From: Manuel Holtgrewe To: SeqAn Development In-Reply-To: Content-Type: multipart/mixed; boundary=Apple-Mail-4-844172551 Mime-Version: 1.0 (Apple Message framework v936) Date: Wed, 12 Oct 2011 23:09:48 +0200 References: X-Mailer: Apple Mail (2.936) X-Originating-IP: 91.65.212.104 X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1318453789-00005A17-D716AB8B/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.001029, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Benin.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] obtain consensus sequence X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 12 Oct 2011 21:09:50 -0000 --Apple-Mail-4-844172551 Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Content-Transfer-Encoding: 7bit Dear Erik, the consensus module in its current version will not help you. What you have to do is first to compute the profiles for each cluster of sequences in your MSA. 
Then, you can compute the consensus from that. Appended is a header with functions to do just that. Code with similar functionality will be part of a future version of SeqAn.

Don't hesitate to send questions, proposals for improvements etc.

HTH,
Manuel

--Apple-Mail-4-844172551 Content-Disposition: attachment; filename=consensus_calling.h Content-Type: application/octet-stream; x-unix-mode=0644; name="consensus_calling.h" Content-Transfer-Encoding: 7bit

// ==========================================================================
// consensus_calling.h
// ==========================================================================
// Copyright (c) 2006-2010, Knut Reinert, FU Berlin
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
//     * Redistributions of source code must retain the above copyright
//       notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above copyright
//       notice, this list of conditions and the following disclaimer in the
//       documentation and/or other materials provided with the distribution.
//     * Neither the name of Knut Reinert or the FU Berlin nor the names of
//       its contributors may be used to endorse or promote products derived
//       from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
// ARE DISCLAIMED. IN NO EVENT SHALL KNUT REINERT OR THE FU BERLIN BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
// OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
// DAMAGE.
//
// ==========================================================================
// Author: Manuel Holtgrewe
// ==========================================================================

#ifndef SANDBOX_HOLTGREW_APPS_CONSASS_CONSENSUS_CALLING_H_
#define SANDBOX_HOLTGREW_APPS_CONSASS_CONSENSUS_CALLING_H_

#include

namespace seqan {

// ============================================================================
// Forwards
// ============================================================================

// ============================================================================
// Tags, Classes, Enums
// ============================================================================

// Store the information that the sequence with the id sequenceId starts at position profilePos in profile profileId.
struct ProfileSupportInfo
{
    unsigned sequenceId;
    unsigned profileId;
    unsigned profilePos;  // TODO(holtgrew): rename to beginPos?
    unsigned profileEnd;  // TODO(holtgrew): rename to endPos?

    ProfileSupportInfo() :
            sequenceId(MaxValue::VALUE), profileId(MaxValue::VALUE), profilePos(MaxValue::VALUE), profileEnd(MaxValue::VALUE)
    {}

    ProfileSupportInfo(unsigned _sequenceId, unsigned _profileId, unsigned _profilePos, unsigned _profileEnd) :
            sequenceId(_sequenceId), profileId(_profileId), profilePos(_profilePos), profileEnd(_profileEnd)
    {}
};

struct ConsensusStats
{
    unsigned minCoverage;
    unsigned maxCoverage;
    double avgCoverage;
    double avgKnownCoverage;
    unsigned unknownBases;

    ConsensusStats() : minCoverage(0), maxCoverage(0), avgCoverage(0), avgKnownCoverage(0), unknownBases(0)
    {}
};

// ============================================================================
// Metafunctions
// ============================================================================

// ============================================================================
// Functions
// ============================================================================

// This function is given an alignment graph and the distance matrix for the sequences (for components in this graph).
// It computes the profile sequences for each component in the distance graph.
template bool computeProfiles(StringSet > > >, TSpec0> & profiles, String & profileSupportInfos, Graph, TCargo, TSpec> > /*const*/ & g, Graph > /*const*/ & distances, bool logging)
{
    // TODO(holtgrew): This function is very similar to the updateStoreFromAlignmentGraph function. Maybe we can share the commonality?
    using namespace seqan;

    typedef Graph, TCargo, TSpec> > TAlignmentGraph;
    typedef std::map TComponentLength;
    typedef Dna5 TAlphabet;

    // Allocate information for which sequence supports the profile at which position.
    resize(profileSupportInfos, length(stringSet(g)));

    // -----------------------------------------------------------------------
    // Compute connected components and get topological sorting of them.
    // -----------------------------------------------------------------------

    String component;
    String order;
    TComponentLength componentLength;
    if (!convertAlignment(g, component, order, componentLength))
        return false;
    unsigned numComponents = length(order);
    // if (logging)
    //     for (unsigned i = 0; i < numComponents; ++i)
    //         std::cerr << "order[" << i << "] == " << order[i] << std::endl;

    // -----------------------------------------------------------------------
    // Get connected components of distances / read alignment clusters.
    // -----------------------------------------------------------------------

    // Each cluster corresponds to a contig.
    // A cluster is a CC in the graph where each sequence is a vertex and two vertices are connected if they have an
    // overlap alignment.
    String seqToCluster;
    unsigned numClusters = connectedComponents(distances, seqToCluster);
    // if (logging)
    //     for (unsigned i = 0; i < length(seqToCluster); ++i)
    //         std::cerr << "SEQ TO CLUSTER\t" << i << " --> " << seqToCluster[i] << std::endl;
    // std::cerr << distances << std::endl;
    // std::cerr << "numVertices(distances) == " << numVertices(distances) << std::endl;
    if (logging)
        std::cerr << "# clusters: " << numClusters << std::endl
                  << "# components: " << numComponents << std::endl;
    resize(profiles, numClusters);

    // -----------------------------------------------------------------------
    // Visit components in topological order and generate profile sequences.
    // -----------------------------------------------------------------------

    // Get mapping from component to vertices.
    String > componentVertices;
    resize(componentVertices, numComponents);
    typedef typename Iterator::Type TVertexIterator;
    for (TVertexIterator itV(g); !atEnd(itV); goNext(itV))
    {
        // std::cerr << "VERTEX TO COMPONENT\t" << *itV << " --> " << getProperty(component, *itV) << std::endl;
        appendValue(componentVertices[getProperty(component, *itV)], *itV);
    }

    // For each cluster, the number of currently overlapping reads.
    String activeReads;
    resize(activeReads, numClusters, 0);

    // Iterate vertices in topological order.
    unsigned verticesVisited = 0;
    for (typename Iterator, Rooted>::Type it = begin(order, Rooted()); !atEnd(it); goNext(it))
    {
        unsigned c = *it;  // Current component.
        unsigned fLen = fragmentLength(g, front(componentVertices[c]));
        unsigned cl = seqToCluster[sequenceId(g, front(componentVertices[c]))];  // Current cluster/contig.

        // Grow profile string for the current contig/cluster.
        unsigned from = length(profiles[cl]);
        resize(profiles[cl], from + fLen);
        // if (logging)
        //     std::cerr << "seq id == " << sequenceId(g, front(componentVertices[c])) << ", cl == " << cl << std::endl;

        // Make fragments of vertices of current component vote for their character.
        unsigned numNewThisRound = 0;
        unsigned numDoneThisRound = 0;
        typedef typename Iterator, Rooted>::Type TDescIt;
        // std::cerr << "length(componentVertices[" << c << "]) == " << length(componentVertices[c]) << std::endl;
        for (TDescIt itV = begin(componentVertices[c], Rooted()); !atEnd(itV); goNext(itV))
        {
            verticesVisited += 1;
            // std::cerr << "VISITING\t" << *itV << std::endl;
            unsigned idx = idToPosition(stringSet(g), sequenceId(g, *itV));
            // if (logging)
            //     std::cerr << "\t id == " << idToPosition(stringSet(g), sequenceId(g, *itV)) << ", idx == " << idx << std::endl;
            unsigned fBeg = fragmentBegin(g, *itV);
            // Register sequence as supporting in profile cl starting at position from in profile.
            if (fBeg == 0u)
                profileSupportInfos[idx] = ProfileSupportInfo(idx, cl, from, from);
            profileSupportInfos[idx].profileEnd = from + fLen;
            numNewThisRound += (fBeg == 0);
            unsigned fEnd = fBeg + fLen;
            numDoneThisRound += (fEnd == length(stringSet(g)[idx]));
            SEQAN_ASSERT_EQ(fLen, fragmentLength(g, *itV));
            for (unsigned i = 0; i < fLen; ++i)
                profiles[cl][from + i].count[ordValue(stringSet(g)[idx][fBeg + i])] += 1;
        }
        // Some reads became active *in* this round.
        activeReads[cl] += numNewThisRound;

        // Now, make the active reads in the current component vote for "not here"/'-'.
        SEQAN_ASSERT_GEQ(activeReads[cl], length(componentVertices[c]));
        unsigned numGapVotes = activeReads[cl] - length(componentVertices[c]);
        for (unsigned i = from; i < length(profiles[cl]); ++i)
            profiles[cl][i].count[ValueSize::VALUE] += numGapVotes;
        // if (logging)
        //     std::cerr << "NEW THIS ROUND " << numNewThisRound << "\tDONE THIS ROUND " << numDoneThisRound << std::endl
        //               << "\t GAP VOTES " << numGapVotes << "\tNON-GAP VOTES " << length(componentVertices[c]) << std::endl;

        // Some reads become inactive *after* this round.
        SEQAN_ASSERT_GEQ(activeReads[cl], numDoneThisRound);
        activeReads[cl] -= numDoneThisRound;
    }
    SEQAN_ASSERT_EQ(numVertices(g), verticesVisited);
    // if (logging)
    //     for (unsigned i = 0; i < numClusters; ++i)
    //         std::cerr << "len(profiles[" << i << "]) == " << length(profiles[i]) << std::endl;

    return true;
}

// Given a profile sequence, compute majority vote of consensus, remove gaps.
//
// 'N' is called if less than minSupport votes.
template void callConsensus(TTargetSeq & consensus, String > > > & profile, unsigned minSupport, MajorityVote const & /*tag*/)
{
    typedef String > > > TProfileString;
    typedef typename Iterator::Type TProfileStringIter;

    reserve(consensus, length(profile), Exact());

    for (TProfileStringIter it = begin(profile); !atEnd(it); goNext(it))
    {
        unsigned idx = _getMaxIndex(*it);
        if (idx >= ValueSize::VALUE)
            continue;  // Call as gap.
        if (it->count[idx] >= minSupport)
            appendValue(consensus, TAlphabet(idx));
        else
            appendValue(consensus, unknownValue());
    }
}

// Given a profile sequence, compute stats for when calling consensus.
template void computeConsensusStats(ConsensusStats & stats, String > > > & profile, unsigned minSupport, MajorityVote const & /*tag*/)
{
    typedef String > > > TProfileString;
    typedef typename Iterator::Type TProfileStringIter;
    typedef ModifiedAlphabet > TModifiedAlphabet;

    stats.minCoverage = MaxValue::VALUE;
    stats.maxCoverage = 0;
    stats.avgCoverage = 0;
    stats.avgKnownCoverage = 0;
    stats.unknownBases = 0;

    unsigned numGaps = 0;
    __uint64 coverageSum = 0;
    __uint64 knownCoverageSum = 0;
    for (TProfileStringIter it = begin(profile); !atEnd(it); goNext(it))
    {
        unsigned idx = _getMaxIndex(*it);
        if (idx >= ValueSize::VALUE)
        {
            numGaps += 1;
            continue;  // Skip gaps.
        }
        if (it->count[idx] < minSupport)
            stats.unknownBases += 1;
        unsigned coverage = 0;
        for (unsigned i = 0; i < ValueSize::VALUE; ++i)
            coverage += it->count[i];
        stats.minCoverage = std::min(stats.minCoverage, coverage);
        stats.maxCoverage = std::max(stats.maxCoverage, coverage);
        coverageSum += coverage;
        if (it->count[idx] >= minSupport)
            knownCoverageSum += coverage;
    }
    stats.avgCoverage = (length(profile) == numGaps) ? 0.0 : 1.0 * coverageSum / (length(profile) - numGaps);
    stats.avgKnownCoverage = (length(profile) == numGaps + stats.unknownBases) ?
0.0 : 1.0 * knownCoverageSum / (length(profile) - numGaps - stats.unknownBases); } } // namespace seqan #endif // #ifndef SANDBOX_HOLTGREW_APPS_CONSASS_CONSENSUS_CALLING_H_ --Apple-Mail-4-844172551 Content-Type: text/plain; charset=US-ASCII; format=flowed Content-Transfer-Encoding: 7bit --Apple-Mail-4-844172551-- From neilniu.cn@gmail.com Mon Oct 24 23:50:36 2011 Received: from relay1.zedat.fu-berlin.de ([130.133.4.67]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RISPj-0007lU-66>; Mon, 24 Oct 2011 23:50:35 +0200 Received: from mail-qw0-f54.google.com ([209.85.216.54]) by relay1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RISPj-0000ub-0E>; Mon, 24 Oct 2011 23:50:35 +0200 Received: by qadz32 with SMTP id z32so3956471qad.13 for ; Mon, 24 Oct 2011 14:50:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=PzPzW2v5nlyDAG2W644P1/4oIwc+kC3vbB5gahyYGEc=; b=AJGASLAXGn/psuTnU8lxZ87Cc8A5TQu9xelnpncqw2ykzOUKZ/ZvSKE1S7G+r+/PaY ycuDsAjU8oFeeyrEcXB4otf/w04dYiuQXgP6pSGcWbZd/TqOoxzRvPDaFToIB6F5KS58 PuPBIsen9Hn2WW5ozhDNGn0mkJZvD10pnotV4= MIME-Version: 1.0 Received: by 10.224.9.11 with SMTP id j11mr20114589qaj.97.1319493033751; Mon, 24 Oct 2011 14:50:33 -0700 (PDT) Received: by 10.224.86.11 with HTTP; Mon, 24 Oct 2011 14:50:33 -0700 (PDT) Date: Mon, 24 Oct 2011 14:50:33 -0700 Message-ID: From: Beifang Niu To: seqan-dev@lists.fu-berlin.de Content-Type: multipart/alternative; boundary=bcaec51b18ef8f744704b012694a X-Originating-IP: 209.85.216.54 X-ZEDAT-Hint: A X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1319493035-00005A17-9ECAE1D6/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=0.9 required=5.0 
tests=DNS_FROM_RFC_ABUSE, HTML_20_30, HTML_MESSAGE,RCVD_BY_IP,SPF_HELO_PASS,SPF_PASS Subject: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 24 Oct 2011 21:50:36 -0000 --bcaec51b18ef8f744704b012694a Content-Type: text/plain; charset=ISO-8859-1 Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. --bcaec51b18ef8f744704b012694a Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hi,

I am trying to use the Seqan library to do MEM (max exact match) extension.
First, I get the MEMs of the two genome sequences using MUMMER3, and then I want to use extendSeed of Seqan to extend the MEMs.
Are there any examples for the extendSeed function of the seeds class?

thanks,
Beifang.
--bcaec51b18ef8f744704b012694a-- From Birte.Kehr@fu-berlin.de Tue Oct 25 00:59:28 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RITUN-0001wX-C3>; Tue, 25 Oct 2011 00:59:27 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RITUN-00089d-94>; Tue, 25 Oct 2011 00:59:27 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RITUN-0001Gg-2n>; Tue, 25 Oct 2011 00:59:27 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Tue, 25 Oct 2011 00:59:27 +0200 From: "Kehr, Birte" To: SeqAn Development Date: Tue, 25 Oct 2011 00:59:24 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AcySlv5LB2PIRkdsTheDl7nojS3MuwACCIYg Message-ID: References: In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/alternative; boundary="_000_DAD226CB6878494EABEFD5215AA102015A8DE1CD68exchange6fube_" MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1319497167-00005A17-B519BD0E/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Gabun.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED,HTML_MESSAGE Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Mon, 24 Oct 2011 22:59:28 -0000 --_000_DAD226CB6878494EABEFD5215AA102015A8DE1CD68exchange6fube_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. --_000_DAD226CB6878494EABEFD5215AA102015A8DE1CD68exchange6fube_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi Beifang,

you can find an example for seed extension in the SeqAn-Tutorial at
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment.

You might also want to consider using the seeds2 module instead of the seeds module, since we plan to replace the seeds module with the seeds2 module. Unfortunately, there is no example of how to use the seeds2 module yet.

-Birte

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Monday, 24 October 2011 14:51
To: seqan-dev@lists.fu-berlin.de
Subject: [Seqan-dev] about extendSeed of Seqan

Hi,

I am trying to use the Seqan library to do MEM (max exact match) extension.
First, I get the MEMs of the two genome sequences using MUMMER3, and then I want to use extendSeed of Seqan to extend the MEMs.
Are there any examples for the extendSeed function of the seeds class?

thanks,
Beifang.

= --_000_DAD226CB6878494EABEFD5215AA102015A8DE1CD68exchange6fube_-- From Birte.Kehr@fu-berlin.de Thu Oct 27 23:08:11 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RJXBJ-0005k0-4Q>; Thu, 27 Oct 2011 23:08:09 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJXBI-0006xM-VM>; Thu, 27 Oct 2011 23:08:09 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJXBI-0006AO-JL>; Thu, 27 Oct 2011 23:08:08 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Thu, 27 Oct 2011 23:08:08 +0200 From: "Kehr, Birte" To: Beifang Niu , "seqan-dev@lists.fu-berlin.de" Date: Thu, 27 Oct 2011 23:08:07 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AQHMlOyF1Xialy1ask2YlGFCqAybLg== Message-ID: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1319749689-000041E7-88CBF177/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Botsuana.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Oct 2011 21:08:11 -0000 Hi Beifang, the function 
extendSeed does not return the number of matched sequence positions.
I assume you have used ungapped X-drop extension? Then you can count matching positions by simply iterating over the infixes:

typedef typename Infix::Type TInfix;
TInfix infix1 = infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0) + 1);
TInfix infix2 = infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1) + 1);
unsigned count = 0;
for (unsigned i = 0; i < length(seed); ++i) {
    if (value(infix1, i) == value(infix2, i)) ++count;
}

-Birte
________________________________________
From: Beifang Niu [neilniu.cn@gmail.com]
Sent: Thursday, October 27, 2011 9:19 PM
To: Kehr, Birte
Subject: Re: seqan-dev Digest, Vol 25, Issue 6

Hi Birte,

Thank you for your prompt response, but I didn't receive your reply from the seqan development mailing list.

I did see the example for seed extension in the SeqAn-Tutorial. Now I have other questions for you.

How can I get the actual number of aligned bases between two extended seeds after running extendSeeds?

For example, take the sequences ACGTAGTTT and ACGTGGTTT with one seed GTTT. After extending to the left, I got the extended seeds ACGTAGTTT and ACGTGGTTT (there is only one mismatch), so the actual number of aligned bases is 8.
I want to get this number, but I don't know how to get it by only running extendSeeds.

thank you,
Beifang.

On Tue, Oct 25, 2011 at 3:00 AM, > wrote:
Send seqan-dev mailing list submissions to
        seqan-dev@lists.fu-berlin.de

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.fu-berlin.de/listinfo/seqan-dev
or, via email, send a message with subject or body 'help' to
        seqan-dev-request@lists.fu-berlin.de

You can reach the person managing the list at
        seqan-dev-owner@lists.fu-berlin.de

When replying, please edit your Subject line so it is more specific
than "Re: Contents of seqan-dev digest..."

Today's Topics:

   1. about extendSeed of Seqan (Beifang Niu)
   2.
Re: about extendSeed of Seqan (Kehr, Birte) ---------------------------------------------------------------------- Message: 1 Date: Mon, 24 Oct 2011 14:50:33 -0700 From: Beifang Niu > Subject: [Seqan-dev] about extendSeed of Seqan To: seqan-dev@lists.fu-berlin.de Message-ID: > Content-Type: text/plain; charset=3D"iso-8859-1" Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 25 Oct 2011 00:59:24 +0200 From: "Kehr, Birte" > Subject: Re: [Seqan-dev] about extendSeed of Seqan To: SeqAn Development > Message-ID: > Content-Type: text/plain; charset=3D"us-ascii" Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= ] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de https://lists.fu-berlin.de/listinfo/seqan-dev End of seqan-dev Digest, Vol 25, Issue 6 **************************************** From Birte.Kehr@fu-berlin.de Fri Oct 28 00:41:51 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RJYdy-0000mP-SC>; Fri, 28 Oct 2011 00:41:51 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJYdy-0000f1-ON>; Fri, 28 Oct 2011 00:41:50 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJYdy-0003bt-Do>; Fri, 28 Oct 2011 00:41:50 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Fri, 28 Oct 2011 00:41:50 +0200 From: "Kehr, Birte" To: Beifang Niu , "seqan-dev@lists.fu-berlin.de" Date: Fri, 28 Oct 2011 00:41:49 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AcyU7ZAtH6diQRKDSNKMULbktPlhEgACW1jw Message-ID: References: , In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1319755310-000041E7-C22D57B3/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Gabun.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 
Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Oct 2011 22:41:52 -0000 Yes, in the case of gapped X-drop extension you have to do globalAlignment.= The function extendSeed does not do the traceback and does not determine t= he number of matching positions. But it does compute maximal and minimal diagonals, such that you can band t= he global alignment (see the Tutorial example). What kind of score are you using? Would the score of the extensions help yo= u? -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com] Sent: Thursday, October 27, 2011 11:15 PM To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, Unfortunately, I have to use gapped X-drop extension. Do I have to do globalAlignment to get the matched number? I just need the= matched number after gapped seed extension and there will be increase in c= omputation time if I have to do globalAlignment for getting matched number. Any ideas? thanks, Beifang. On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte > wrote: Hi Beifang, the function extendSeed does not return the number of matched sequence posi= tions. I assume you have used ungapped X-drop extension? 
Then you can count matchi= ng positions by simply iterating over the infixes: typedef typename Infix::Type TInfix; TInfix infix1 =3D infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)= +1); TInfix infix2 =3D infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)= +1); unsigned count =3D 0; for(int i =3D 0; i < length(seed); ++i) { if (value(infix1, i) =3D=3D value(infix2, i)) ++count; } -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com] Sent: Thursday, October 27, 2011 9:19 PM To: Kehr, Birte Subject: Re: seqan-dev Digest, Vol 25, Issue 6 Hi Birte, Thank you for your prompt response but I didn't receive your reply from seq= an development mail list. I did see the example for seed extension in the SeqAn-Tutorial. Now, I hav= e other questions for you. How can I get the actual aligned bases number between two extended seeds a= fter running extendSeeds? for example, sequences: ACGTAGTTT and ACGTGGTTT , there is one seed GTTT,= after the extension of left , I got the extension seeds: ACGTAGTTT a= nd ACGTGGTTT ( there is only one mismatch) , the actual aligned bases numbe= r is 8. I want to get this number but I don;t know how to get it only running exte= ndSeeds. thank you, Beifang. On Tue, Oct 25, 2011 at 3:00 AM, >> wrote: Send seqan-dev mailing list submissions to seqan-dev@lists.fu-berlin.de> To subscribe or unsubscribe via the World Wide Web, visit https://lists.fu-berlin.de/listinfo/seqan-dev or, via email, send a message with subject or body 'help' to seqan-dev-request@lists.fu-berlin.de> You can reach the person managing the list at seqan-dev-owner@lists.fu-berlin.de> When replying, please edit your Subject line so it is more specific than "Re: Contents of seqan-dev digest..." Today's Topics: 1. about extendSeed of Seqan (Beifang Niu) 2. 
Re: about extendSeed of Seqan (Kehr, Birte) ---------------------------------------------------------------------- Message: 1 Date: Mon, 24 Oct 2011 14:50:33 -0700 From: Beifang Niu >> Subject: [Seqan-dev] about extendSeed of Seqan To: seqan-dev@lists.fu-berlin.de> Message-ID: = >> Content-Type: text/plain; charset=3D"iso-8859-1" Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 25 Oct 2011 00:59:24 +0200 From: "Kehr, Birte" >> Subject: Re: [Seqan-dev] about extendSeed of Seqan To: SeqAn Development >> Message-ID: >> Content-Type: text/plain; charset=3D"us-ascii" Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= >] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de> Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de> https://lists.fu-berlin.de/listinfo/seqan-dev End of seqan-dev Digest, Vol 25, Issue 6 **************************************** From Birte.Kehr@fu-berlin.de Fri Oct 28 21:21:16 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RJrzO-0007Uu-Ge>; Fri, 28 Oct 2011 21:21:14 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJrzO-0008TL-4H>; Fri, 28 Oct 2011 21:21:14 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJrzN-00068p-I2>; Fri, 28 Oct 2011 21:21:14 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Fri, 28 Oct 2011 21:21:13 +0200 From: "Kehr, Birte" To: Beifang Niu , SeqAn Development Date: Fri, 28 Oct 2011 21:21:10 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AcyVBuHK5RHH8dM3RR+E8Oxkpy65+AAnULAg Message-ID: References: In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/alternative; boundary="_000_DAD226CB6878494EABEFD5215AA102015A9527E3D0exchange6fube_" MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1319829674-000041E7-3D435CA3/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.7 required=5.0 tests=ALL_TRUSTED,HTML_50_60, HTML_MESSAGE Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: 
seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 19:21:16 -0000 --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D0exchange6fube_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Beifang, Apart from modifying the function extendSeed, there is another simple way h= ow you can limit the extension to up to 100bp. You can use infixes of the input sequences that you pass over to the functi= on extendSeed instead of seq0 and seq1, e.g.: typedef typename Infix::Type TInfix; TInfix infix0 =3D infix(seq0, _max(0, leftPosition(seed, 0) - 100), _min(le= ngth(seq0), rightPosition(seed, 0) + 101)); TInfix infix1 =3D infix(seq1, _max(0, leftPosition(seed, 1) - 100), _min(le= ngth(seq1), rightPosition(seed, 1) + 101)); extendSeed(..., infix0, infix1, 2, ...); In order to get the number of matching positions of an alignment you have t= o iterate over the alignment and test at every position if there is a gap i= n any of the sequences, and if not if the characters are equal. Have a look= at the Alignment Tutorial for an introduction to the Align data structure. http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com] Sent: Donnerstag, 27. Oktober 2011 17:17 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, I have other questions for you. One is about the scope of seed. I just want to do seed extension around see= d because there are so many seeds (maximal exact matchs) between two genome= s. There is just one seed extension example for two sequences in the tutorial.= Can I set the scope of seed extension? for example, I just want to do seed= extension within 100bps around the seed in two directions, not the extensi= on on the whole sequence. 
I checked the code of gapped extension and found the prefix() and suffix() = function. I don't know if it is feasible to modify these two functions to g= et the part prefix and suffix of the seed. It will be simple to ungapped extension and I can directly give a threshold= to limit the seed extension within 100bps. another question is : How do i get the match numbers from globalAlignment results of the seed? thank you, Beifang. On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte > wrote: Yes, in the case of gapped X-drop extension you have to do globalAlignment.= The function extendSeed does not do the traceback and does not determine t= he number of matching positions. But it does compute maximal and minimal diagonals, such that you can band t= he global alignment (see the Tutorial example). What kind of score are you using? Would the score of the extensions help yo= u? -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com] Sent: Thursday, October 27, 2011 11:15 PM To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, Unfortunately, I have to use gapped X-drop extension. Do I have to do globalAlignment to get the matched number? I just need the= matched number after gapped seed extension and there will be increase in c= omputation time if I have to do globalAlignment for getting matched number. Any ideas? thanks, Beifang. On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte >> wrote: Hi Beifang, the function extendSeed does not return the number of matched sequence posi= tions. I assume you have used ungapped X-drop extension? 
Then you can count matchi= ng positions by simply iterating over the infixes: typedef typename Infix::Type TInfix; TInfix infix1 =3D infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)= +1); TInfix infix2 =3D infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)= +1); unsigned count =3D 0; for(int i =3D 0; i < length(seed); ++i) { if (value(infix1, i) =3D=3D value(infix2, i)) ++count; } -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com>] Sent: Thursday, October 27, 2011 9:19 PM To: Kehr, Birte Subject: Re: seqan-dev Digest, Vol 25, Issue 6 Hi Birte, Thank you for your prompt response but I didn't receive your reply from seq= an development mail list. I did see the example for seed extension in the SeqAn-Tutorial. Now, I hav= e other questions for you. How can I get the actual aligned bases number between two extended seeds a= fter running extendSeeds? for example, sequences: ACGTAGTTT and ACGTGGTTT , there is one seed GTTT,= after the extension of left , I got the extension seeds: ACGTAGTTT a= nd ACGTGGTTT ( there is only one mismatch) , the actual aligned bases numbe= r is 8. I want to get this number but I don;t know how to get it only running exte= ndSeeds. thank you, Beifang. On Tue, Oct 25, 2011 at 3:00 AM, >>>> wrote: Send seqan-dev mailing list submissions to seqan-dev@lists.fu-berlin.de>>> To subscribe or unsubscribe via the World Wide Web, visit https://lists.fu-berlin.de/listinfo/seqan-dev or, via email, send a message with subject or body 'help' to seqan-dev-request@lists.fu-berlin.de>>> You can reach the person managing the list at seqan-dev-owner@lists.fu-berlin.de>>> When replying, please edit your Subject line so it is more specific than "Re: Contents of seqan-dev digest..." Today's Topics: 1. about extendSeed of Seqan (Beifang Niu) 2. 
Re: about extendSeed of Seqan (Kehr, Birte) ---------------------------------------------------------------------- Message: 1 Date: Mon, 24 Oct 2011 14:50:33 -0700 From: Beifang Niu >>>> Subject: [Seqan-dev] about extendSeed of Seqan To: seqan-dev@lists.fu-berlin.de>>> Message-ID: <= mailto:CABnPkb9P5nAvhQD_4in6mo%2BVtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com= >>>> Content-Type: text/plain; charset=3D"iso-8859-1" Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 25 Oct 2011 00:59:24 +0200 From: "Kehr, Birte" >>>> Subject: Re: [Seqan-dev] about extendSeed of Seqan To: SeqAn Development >>>> Message-ID: >>>> Content-Type: text/plain; charset=3D"us-ascii" Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= >>>] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de>>> Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de>>> https://lists.fu-berlin.de/listinfo/seqan-dev End of seqan-dev Digest, Vol 25, Issue 6 **************************************** --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D0exchange6fube_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi Beifang,

 

Apart from modifying the function extendSeed, there is another simple way to limit the extension to at most 100 bp.

You can pass infixes of the input sequences to the function extendSeed instead of seq0 and seq1, e.g.:

typedef Infix<DnaString>::Type TInfix;

TInfix infix0 = infix(seq0, _max(0, leftPosition(seed, 0) - 100), _min(length(seq0), rightPosition(seed, 0) + 101));

TInfix infix1 = infix(seq1, _max(0, leftPosition(seed, 1) - 100), _min(length(seq1), rightPosition(seed, 1) + 101));

extendSeed(..., infix0, infix1, 2, ...);

 

In order to get the number of matching positions of an alignment you have to iterate over the alignment and test at every position if there is a gap in any of the sequences, and if not, if the characters are equal. Have a look at the Alignment Tutorial for an introduction to the Align data structure.

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments

 

-Birte

 

 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Thursday, October 27, 2011 17:17
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

<= o:p> 

Hi Birte,

 

I have other questions for you.

One is about the scope of the seed. I just want to do seed extension around the seed because there are so many seeds (maximal exact matches) between two genomes.

There is just one seed extension example for two sequences in the tutorial. Can I set the scope of seed extension? For example, I just want to do seed extension within 100 bp around the seed in both directions, not the extension on the whole sequence.

I checked the code of gapped extension and found the prefix() and suffix() functions. I don't know if it is feasible to modify these two functions to get only part of the prefix and suffix of the seed.

Ungapped extension would be simple: I can directly give a threshold to limit the seed extension to within 100 bp.

 

 

Another question:

How do I get the number of matches from the globalAlignment result for the seed?

 <= /p>

thank you,

Beifang.

 

On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Yes, in the case of gapped X-drop extension you have to do globalAlignment. The function extendSeed does not do the traceback and does not determine the number of matching positions.
But it does compute maximal and minimal diagonals, such that you can band the global alignment (see the Tutorial example).

What kind of score are you using? Would the score of the extensions help you?


-Birte

________________________________________
From: Beifang Niu [neilniu.cn@gmail.com]

Sent: Thursday, October 27, 2011 11:15 PM
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan


Hi Birte,

Unfortunately, I have to use gapped X-drop extension.
Do I have to do globalAlignment to get the number of matches? I just need the number of matches after gapped seed extension, and computation time will increase if I have to run globalAlignment just to get it.
Any ideas?

thanks,
Beifang.

On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:
Hi Beifang,

the function extendSeed does not return the number of matched sequence positions.

I assume you have used ungapped X-drop extension? Then you can count matching positions by simply iterating over the infixes:

typedef typename Infix<TSeq>::Type TInfix;
// rightPosition is the last seed position, hence the +1 for the infix end
TInfix infix1 = infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)+1);
TInfix infix2 = infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)+1);

unsigned count = 0;
for (unsigned i = 0; i < length(infix1); ++i)
{
    if (value(infix1, i) == value(infix2, i))
        ++count;
}

-Birte

________________________________________

From: Beifang Niu [neilniu.cn@gmail.com]

Sent: Thursday, October 27, 2011 9:19 PM
To: Kehr, Birte
Subject: Re: seqan-dev Digest, Vol 25, Issue 6

Hi Birte,

Thank you for your prompt response, but I didn't receive your reply from the seqan development mailing list.
I did see the example for seed extension in the SeqAn-Tutorial. Now I have other questions for you.
How can I get the actual number of aligned bases between two extended seeds after running extendSeed?
For example, take the sequences ACGTAGTTT and ACGTGGTTT with one seed GTTT. After extension to the left I got the extended seeds ACGTAGTTT and ACGTGGTTT (there is only one mismatch), so the actual number of aligned bases is 8.
I want to get this number, but I don't know how to get it by only running extendSeed.



thank you,
Beifang.

Today's Topics:

 1. about extendSeed of Seqan (Beifang Niu)
 2. Re: about extendSeed of Seqan (Kehr, Birte)

<= br>----------------------------------------------------------------------
Message: 1
Date: Mon, 24 Oct 2011 14:50:33 -0700

From: Beifang Niu <neilniu.cn@gmail.com>

Subject: [Seqan-dev] about extendSeed of Seqan

To: seqan-dev@lists.fu-berlin.de
Message-ID:
     <CABnPkb9P5nAvhQD_4in6mo+Vtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com>

From: "Kehr, Birte" <Birte.Kehr@fu-berlin.de>

Subject: Re: [Seqan-dev] about extendSeed of Seqan

To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Message-ID:
     <DAD226CB6878494EABEFD5215AA102015A8DE1CD68@exchange6.fu-berlin.de>

Content-Type: text/plain; charset="us-ascii"

Hi Beifang,

you can find an example for seed extension in the SeqAn-Tutorial at
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment.

You might also want to consider using the seeds2 module instead of the seeds module since we plan to replace the seeds module by the seeds2 module. Unfortunately, there is no example on how to use the seeds2 module, yet.

-Birte

From: Beifang Niu [mailto:neilniu.cn@gmail.com]

Sent: Monday, October 24, 2011 14:51

To: seqan-dev@lists.fu-berlin.de

Subject: [Seqan-dev] about extendSeed of Seqan

Hi,

I am trying to use the Seqan library to do MEM (maximal exact match) extension.
First, I get the MEMs of the two genome sequences using MUMMER3, and then I want to use extendSeed of Seqan to extend the MEMs.
Are there any examples for the extendSeed function of the seeds class?

thanks,
Beifang.

 

= --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D0exchange6fube_-- From Birte.Kehr@fu-berlin.de Fri Oct 28 21:32:43 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RJsAS-0007sv-Ah>; Fri, 28 Oct 2011 21:32:40 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJsAR-0001Dt-Ub>; Fri, 28 Oct 2011 21:32:40 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJsAR-0006pk-CM>; Fri, 28 Oct 2011 21:32:39 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Fri, 28 Oct 2011 21:32:39 +0200 From: "Kehr, Birte" To: Beifang Niu , SeqAn Development Date: Fri, 28 Oct 2011 21:32:36 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AcyU/26eCqFnC06LSLqAf8rES9QevAAowLuA Message-ID: References: In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/alternative; boundary="_000_DAD226CB6878494EABEFD5215AA102015A9527E3D1exchange6fube_" MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: suspect X-purgate-type: suspect X-purgate-ID: 151147::1319830360-000041E7-5D190AB5/3450669284-0/0-1 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Burundi.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=ALL_TRUSTED, FU_XPURGATE_SUSP, HTML_50_60,HTML_MESSAGE Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 19:32:43 -0000 --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D1exchange6fube_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Beifang, right now, we do not plan to implement traceback for extendSeed, though we = might in the future. If you do not fear to modify code, the most efficient way to get the *numbe= r* of matching positions is without traceback at all, i.e. also without glo= bal alignment: While extending the seed, parts of a DP matrix is filled with scores. In ad= dition to that score, you could store the number of matching positions for = each matrix entry. In the code this would mean to replace occurrences of TS= coreValue by Pair (for the antidiagonals and tmp var= iables), and to increase the unsigned value if the max for one matrix entry= is a match. The current seeds-module uses end positions that are not consistent with th= e rest of SeqAn. For the rest of SeqAn, the position behind the last positi= on is the end position, i.e. end - start =3D length. Also, diagonals are co= unted from left/bottom to right/top and not the other way around. Unfortuna= tely, the seeds-module does it differently. That is why you have to take th= e negative of the diagonals. This is also one of the reasons why the seeds2-module is already there (fix= ing these inconsistencies). As soon as all functionality is implemented and= tested in the seeds2-module we will replace the seeds-module by it. The im= plementation of seed extension in the seeds2-module is already stable. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com] Sent: Donnerstag, 27. Oktober 2011 16:23 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, Do you have plan to add traceback implement in extendSeed function to get m= atch numbers or seed identity? maybe, it will consume less CPU time than th= e process extendSeed+globalAlignment. 
I am confusing that why the minus dignonals "-leftDiagonal(seed) - 2, -righ= tDiagonal(seed) + 2" were used in tutorial example. Can I use the same sty= le in my real seed extension? I used the simple score "Score scoreMatrix(1, -2, -1, -6);"= and I think this kind of score is not helpful to get match numbers base= d on the return score from extendSeed. thanks, Beifang. On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte > wrote: Yes, in the case of gapped X-drop extension you have to do globalAlignment.= The function extendSeed does not do the traceback and does not determine t= he number of matching positions. But it does compute maximal and minimal diagonals, such that you can band t= he global alignment (see the Tutorial example). What kind of score are you using? Would the score of the extensions help yo= u? -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com] Sent: Thursday, October 27, 2011 11:15 PM To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, Unfortunately, I have to use gapped X-drop extension. Do I have to do globalAlignment to get the matched number? I just need the= matched number after gapped seed extension and there will be increase in c= omputation time if I have to do globalAlignment for getting matched number. Any ideas? thanks, Beifang. On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte >> wrote: Hi Beifang, the function extendSeed does not return the number of matched sequence posi= tions. I assume you have used ungapped X-drop extension? 
Then you can count matchi= ng positions by simply iterating over the infixes: typedef typename Infix::Type TInfix; TInfix infix1 =3D infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)= +1); TInfix infix2 =3D infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)= +1); unsigned count =3D 0; for(int i =3D 0; i < length(seed); ++i) { if (value(infix1, i) =3D=3D value(infix2, i)) ++count; } -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com>] Sent: Thursday, October 27, 2011 9:19 PM To: Kehr, Birte Subject: Re: seqan-dev Digest, Vol 25, Issue 6 Hi Birte, Thank you for your prompt response but I didn't receive your reply from seq= an development mail list. I did see the example for seed extension in the SeqAn-Tutorial. Now, I hav= e other questions for you. How can I get the actual aligned bases number between two extended seeds a= fter running extendSeeds? for example, sequences: ACGTAGTTT and ACGTGGTTT , there is one seed GTTT,= after the extension of left , I got the extension seeds: ACGTAGTTT a= nd ACGTGGTTT ( there is only one mismatch) , the actual aligned bases numbe= r is 8. I want to get this number but I don;t know how to get it only running exte= ndSeeds. thank you, Beifang. On Tue, Oct 25, 2011 at 3:00 AM, >>>> wrote: Send seqan-dev mailing list submissions to seqan-dev@lists.fu-berlin.de>>> To subscribe or unsubscribe via the World Wide Web, visit https://lists.fu-berlin.de/listinfo/seqan-dev or, via email, send a message with subject or body 'help' to seqan-dev-request@lists.fu-berlin.de>>> You can reach the person managing the list at seqan-dev-owner@lists.fu-berlin.de>>> When replying, please edit your Subject line so it is more specific than "Re: Contents of seqan-dev digest..." Today's Topics: 1. about extendSeed of Seqan (Beifang Niu) 2. 
Re: about extendSeed of Seqan (Kehr, Birte) ---------------------------------------------------------------------- Message: 1 Date: Mon, 24 Oct 2011 14:50:33 -0700 From: Beifang Niu >>>> Subject: [Seqan-dev] about extendSeed of Seqan To: seqan-dev@lists.fu-berlin.de>>> Message-ID: <= mailto:CABnPkb9P5nAvhQD_4in6mo%2BVtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com= >>>> Content-Type: text/plain; charset=3D"iso-8859-1" Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 25 Oct 2011 00:59:24 +0200 From: "Kehr, Birte" >>>> Subject: Re: [Seqan-dev] about extendSeed of Seqan To: SeqAn Development >>>> Message-ID: >>>> Content-Type: text/plain; charset=3D"us-ascii" Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= >>>] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de>>> Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: ------------------------------ _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de>>> https://lists.fu-berlin.de/listinfo/seqan-dev End of seqan-dev Digest, Vol 25, Issue 6 **************************************** --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D1exchange6fube_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi Beifang,

 

right now, we do not plan to implement traceback for extendSeed, though we might in the future.

If you are not afraid to modify code, the most efficient way to get the *number* of matching positions is without traceback at all, i.e. also without global alignment:

While extending the seed, parts of a DP matrix are filled with scores. In addition to that score, you could store the number of matching positions for each matrix entry. In the code this would mean replacing occurrences of TScoreValue by Pair<TScoreValue, unsigned> (for the antidiagonals and tmp variables), and increasing the unsigned value if the maximum for a matrix entry comes from a match.

 

 

The current seeds-module uses end positions that are not consistent with the rest of SeqAn. For the rest of SeqAn, the position behind the last position is the end position, i.e. end - start = length. Also, diagonals are counted from left/bottom to right/top and not the other way around. Unfortunately, the seeds-module does it differently. That is why you have to take the negative of the diagonals.

 

This is also one of the reasons why the seeds2-module is already there (fixing these inconsistencies). As soon as all functionality is implemented and tested in the seeds2-module we will replace the seeds-module by it. The implementation of seed extension in the seeds2-module is already stable.

 

-Birte

 

 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Thursday, October 27, 2011 16:23
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

=  

Hi Birte,

 

Do you plan to add traceback to the extendSeed function to get the number of matches or the seed identity? It might consume less CPU time than running extendSeed followed by globalAlignment.

I am confused about why the negated diagonals "-leftDiagonal(seed) - 2, -rightDiagonal(seed) + 2" were used in the tutorial example. Can I use the same style in my real seed extension?

I used the simple score "Score<TScore, Simple> scoreMatrix(1, -2, -1, -6);" and I think this kind of score is not helpful for getting the number of matches from the score returned by extendSeed.

 

thanks,

Beifang.

 

<= div>

 

On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Yes, in the case of gapped X-drop extension you have to do globalAlignment. The function extendSeed does not do the traceback and does not determine the number of matching positions.
But it does compute maximal and minimal diagonals, such that you can band the global alignment (see the Tutorial example).

What kind of score are you using? Would the score of the extensions help you?


-Birte

________________________________________
From: Beifang Niu [neilniu.cn@gmail.com]

Sent: Thursday, October 27, 2011 11:15 PM
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:
Hi Beifang,

the function extendSeed does not return the number of matched sequence positions.

I assume you have used ungapped X-drop extension? Then you can count matching positions by simply iterating over the infixes:

typedef typename Infix<TSeq>::Type TInfix;
// rightPosition is the last seed position, hence the +1 for the infix end
TInfix infix1 = infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)+1);
TInfix infix2 = infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)+1);

unsigned count = 0;
for (unsigned i = 0; i < length(infix1); ++i)
{
    if (value(infix1, i) == value(infix2, i))
        ++count;
}

-Birte

________________________________________

From: Beifang Niu [neilniu.cn@gmail.com]

Sent: Thursday, October 27, 2011 9:19 PM
To: Kehr, Birte
Subject: Re: seqan-dev Digest, Vol 25, Issue 6

Hi Birte,

Thank you for your prompt response, but I didn't receive your reply from the seqan development mailing list.
I did see the example for seed extension in the SeqAn-Tutorial. Now I have other questions for you.
How can I get the actual number of aligned bases between two extended seeds after running extendSeed?
For example, take the sequences ACGTAGTTT and ACGTGGTTT with one seed GTTT. After extension to the left I got the extended seeds ACGTAGTTT and ACGTGGTTT (there is only one mismatch), so the actual number of aligned bases is 8.
I want to get this number, but I don't know how to get it by only running extendSeed.



thank you,
Beifang.

On Tue, Oct 25, 2011 at 3:00 AM, <seqan-dev-request@lists.fu-berlin.de> wrote:

Today's Topics:

 1. about extendSeed of Seqan (Beifang Niu)
 2. Re: about extendSeed of Seqan (Kehr, Birte)


----------------------------------------------------------------------

Message: 1
Date: Mon, 24 Oct 2011 14:50:33 -0700

From: Beifang Niu <neilniu.cn@gmail.com>

Subject: [Seqan-dev] about extendSeed of Seqan

To: seqan-dev@lists.fu-berlin.de
Message-ID:
     <CABnPkb9P5nAvhQD_4in6mo+Vtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com>

Content-Type: text/plain; charset="iso-8859-1"
Hi,

I am trying to use the Seqan library to do MEM (maximal exact match) extension.
First, I get the MEMs of the two genome sequences using MUMMER3, and then I
want to use extendSeed of Seqan to extend the MEMs.
Are there any examples for the extendSeed function of the seeds class?

thanks,
Beifang.

------------------------------
=
Message: 2
Date: Tue, 25 Oct 2011 00:59:24 +0200

From: "Kehr, Birte" <Birte.Kehr@fu-berlin.de>

Subject: Re: [Seqan-dev] about extendSeed of Seqan

To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Message-ID:
     <DAD226CB6878494EABEFD5215AA102015A8DE1CD68@exchange6.fu-berlin.de>

Content-Type: text/plain; charset="us-ascii"

Hi Beifang,

you can find an example for seed extension in the SeqAn-Tutorial at
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment.

You might also want to consider using the seeds2 module instead of the seeds module since we plan to replace the seeds module by the seeds2 module. Unfortunately, there is no example on how to use the seeds2 module, yet.

-Birte

=

From: Beifang Niu [mailto:neilniu.cn@gmail.com]

Sent: Monday, October 24, 2011 14:51

To: seqan-dev@lists.fu-berlin.de

Subject: [Seqan-dev] about extendSeed of Seqan

Hi,

I am trying to use the Seqan library to do MEM (maximal exact match) extension.
First, I get the MEMs of the two genome sequences using MUMMER3, and then I want to use extendSeed of Seqan to extend the MEMs.
Are there any examples for the extendSeed function of the seeds class?

thanks,
Beifang.

&= nbsp;

= --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D1exchange6fube_-- From Birte.Kehr@fu-berlin.de Fri Oct 28 21:40:07 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RJsHd-0008Al-3r>; Fri, 28 Oct 2011 21:40:05 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJsHc-0001xF-U6>; Fri, 28 Oct 2011 21:40:05 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJsHc-0007I8-H0>; Fri, 28 Oct 2011 21:40:04 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Fri, 28 Oct 2011 21:40:04 +0200 From: "Kehr, Birte" To: Beifang Niu , SeqAn Development Date: Fri, 28 Oct 2011 21:40:02 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AcyVqGoVOkKIac/gQtC2LtccdhaBdAAAF/Mg Message-ID: References: In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/alternative; boundary="_000_DAD226CB6878494EABEFD5215AA102015A9527E3D2exchange6fube_" MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: suspect X-purgate-type: suspect X-purgate-ID: 151147::1319830805-000041E7-003C1655/3450669284-0/0-1 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Dschibuti.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=ALL_TRUSTED, FU_XPURGATE_SUSP, HTML_60_70,HTML_MESSAGE Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Oct 2011 19:40:07 -0000 --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D2exchange6fube_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Beifang, unfortunately, there is no tutorial example of how to limit seed extension = locally. But SeqAn uses templates wherever possible, such that you always can replac= e any sequence type by an infix. A short description of different kinds of = sequence segments can be found in the sequences Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Sequences#Segments -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com] Sent: Freitag, 28. Oktober 2011 12:33 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, That's a very convenient way to limit the extension. Thanks. Is there any tutorial example for localAlignment of extended seed? Beifang. On Fri, Oct 28, 2011 at 12:21 PM, Kehr, Birte > wrote: Hi Beifang, Apart from modifying the function extendSeed, there is another simple way h= ow you can limit the extension to up to 100bp. You can use infixes of the input sequences that you pass over to the functi= on extendSeed instead of seq0 and seq1, e.g.: typedef typename Infix::Type TInfix; TInfix infix0 =3D infix(seq0, _max(0, leftPosition(seed, 0) - 100), _min(le= ngth(seq0), rightPosition(seed, 0) + 101)); TInfix infix1 =3D infix(seq1, _max(0, leftPosition(seed, 1) - 100), _min(le= ngth(seq1), rightPosition(seed, 1) + 101)); extendSeed(..., infix0, infix1, 2, ...); In order to get the number of matching positions of an alignment you have t= o iterate over the alignment and test at every position if there is a gap i= n any of the sequences, and if not if the characters are equal. Have a look= at the Alignment Tutorial for an introduction to the Align data structure. http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= ] Sent: Donnerstag, 27. 
Oktober 2011 17:17 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, I have other questions for you. One is about the scope of seed. I just want to do seed extension around see= d because there are so many seeds (maximal exact matchs) between two genome= s. There is just one seed extension example for two sequences in the tutorial.= Can I set the scope of seed extension? for example, I just want to do seed= extension within 100bps around the seed in two directions, not the extensi= on on the whole sequence. I checked the code of gapped extension and found the prefix() and suffix() = function. I don't know if it is feasible to modify these two functions to g= et the part prefix and suffix of the seed. It will be simple to ungapped extension and I can directly give a threshold= to limit the seed extension within 100bps. another question is : How do i get the match numbers from globalAlignment results of the seed? thank you, Beifang. On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte > wrote: Yes, in the case of gapped X-drop extension you have to do globalAlignment.= The function extendSeed does not do the traceback and does not determine t= he number of matching positions. But it does compute maximal and minimal diagonals, such that you can band t= he global alignment (see the Tutorial example). What kind of score are you using? Would the score of the extensions help yo= u? -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com] Sent: Thursday, October 27, 2011 11:15 PM To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, Unfortunately, I have to use gapped X-drop extension. Do I have to do globalAlignment to get the matched number? I just need the= matched number after gapped seed extension and there will be increase in c= omputation time if I have to do globalAlignment for getting matched number. Any ideas? thanks, Beifang. 
On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte >> wrote: Hi Beifang, the function extendSeed does not return the number of matched sequence posi= tions. I assume you have used ungapped X-drop extension? Then you can count matchi= ng positions by simply iterating over the infixes: typedef typename Infix::Type TInfix; TInfix infix1 =3D infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)= +1); TInfix infix2 =3D infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)= +1); unsigned count =3D 0; for(int i =3D 0; i < length(seed); ++i) { if (value(infix1, i) =3D=3D value(infix2, i)) ++count; } -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com>] Sent: Thursday, October 27, 2011 9:19 PM To: Kehr, Birte Subject: Re: seqan-dev Digest, Vol 25, Issue 6 Hi Birte, Thank you for your prompt response but I didn't receive your reply from seq= an development mail list. I did see the example for seed extension in the SeqAn-Tutorial. Now, I hav= e other questions for you. How can I get the actual aligned bases number between two extended seeds a= fter running extendSeeds? for example, sequences: ACGTAGTTT and ACGTGGTTT , there is one seed GTTT,= after the extension of left , I got the extension seeds: ACGTAGTTT a= nd ACGTGGTTT ( there is only one mismatch) , the actual aligned bases numbe= r is 8. I want to get this number but I don;t know how to get it only running exte= ndSeeds. thank you, Beifang. On Tue, Oct 25, 2011 at 3:00 AM, >>>> wrote: Send seqan-dev mailing list submissions to seqan-dev@lists.fu-berlin.de>>> To subscribe or unsubscribe via the World Wide Web, visit https://lists.fu-berlin.de/listinfo/seqan-dev or, via email, send a message with subject or body 'help' to seqan-dev-request@lists.fu-berlin.de>>> You can reach the person managing the list at seqan-dev-owner@lists.fu-berlin.de>>> When replying, please edit your Subject line so it is more specific than "Re: Contents of seqan-dev digest..." Today's Topics: 1. 
about extendSeed of Seqan (Beifang Niu) 2. Re: about extendSeed of Seqan (Kehr, Birte) ---------------------------------------------------------------------- Message: 1 Date: Mon, 24 Oct 2011 14:50:33 -0700 From: Beifang Niu >>>> Subject: [Seqan-dev] about extendSeed of Seqan To: seqan-dev@lists.fu-berlin.de>>> Message-ID: <= mailto:CABnPkb9P5nAvhQD_4in6mo%2BVtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com= >>>> Content-Type: text/plain; charset=3D"iso-8859-1" Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 25 Oct 2011 00:59:24 +0200 From: "Kehr, Birte" >>>> Subject: Re: [Seqan-dev] about extendSeed of Seqan To: SeqAn Development >>>> Message-ID: >>>> Content-Type: text/plain; charset=3D"us-ascii" Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= >>>] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de>>> Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de>>> https://lists.fu-berlin.de/listinfo/seqan-dev End of seqan-dev Digest, Vol 25, Issue 6 **************************************** --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D2exchange6fube_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi Beifang,

unfortunately, there is no tutorial example of how to limit seed extension locally.

But SeqAn uses templates wherever possible, so you can always replace any sequence type with an infix. A short description of the different kinds of sequence segments can be found in the Sequences tutorial at

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Sequences#Segments

 

-Birte

 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Friday, 28 October 2011 12:33
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

 

Hi Birte,

 

That's a very convenient way to limit the extension. Thanks.

Is there any tutorial example for localAlignment of an extended seed?

Beifang.

 

On Fri, Oct 28, 2011 at 12:21 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Hi Beifang,

Apart from modifying the function extendSeed, there is another simple way to limit the extension to at most 100 bp.

You can pass infixes of the input sequences to the function extendSeed instead of seq0 and seq1, e.g.:

typedef typename Infix<DnaString>::Type TInfix;

TInfix infix0 = infix(seq0, _max(0, leftPosition(seed, 0) - 100), _min(length(seq0), rightPosition(seed, 0) + 101));

TInfix infix1 = infix(seq1, _max(0, leftPosition(seed, 1) - 100), _min(length(seq1), rightPosition(seed, 1) + 101));

extendSeed(..., infix0, infix1, 2, ...);
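The windowing idea in the snippet above can be sketched without SeqAn. The following is only an illustration on std::string: the names Window, clampWindow and windowOf are hypothetical and not part of the SeqAn API. It assumes the seed is given as a half-open range [seedBegin, seedEnd), so seedEnd plays the role of rightPosition(seed, i) + 1, and it compares before subtracting so that an unsigned position cannot wrap around near the start of the sequence.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type (not SeqAn): a half-open window [begin, end).
struct Window
{
    std::size_t begin;
    std::size_t end;
};

// Clamp a window of up to `flank` positions on each side of the seed
// [seedBegin, seedEnd) to the bounds of a sequence of length `seqLength`.
inline Window clampWindow(std::size_t seedBegin, std::size_t seedEnd,
                          std::size_t flank, std::size_t seqLength)
{
    assert(seedBegin <= seedEnd && seedEnd <= seqLength);
    Window w;
    // Compare before subtracting: with unsigned types, seedBegin - flank
    // would wrap around instead of becoming negative.
    w.begin = (seedBegin > flank) ? seedBegin - flank : 0;
    w.end = std::min(seqLength, seedEnd + flank);
    return w;
}

// Return the windowed substring, the std::string analogue of an infix.
inline std::string windowOf(std::string const & seq, Window const & w)
{
    return seq.substr(w.begin, w.end - w.begin);
}
```

With a flank of 100, a seed covering [150, 170) in a sequence of length 1000 yields the window [50, 270), while a seed covering [30, 50) in a sequence of length 120 is clamped to [0, 120).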

 <= /span>

To get the number of matching positions of an alignment, you have to iterate over the alignment and test at every position whether there is a gap in either of the sequences and, if not, whether the characters are equal. Have a look at the Alignment Tutorial for an introduction to the Align data structure.

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments
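The column-by-column test described above might look as follows. This is a sketch under stated assumptions, not the Align interface: the two alignment rows are assumed to be available as equally long strings in which '-' marks a gap, and countMatches is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count alignment columns where neither row has a gap and the characters
// are equal. Both rows are assumed to use '-' as the gap character.
inline std::size_t countMatches(std::string const & rowA, std::string const & rowB)
{
    assert(rowA.size() == rowB.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < rowA.size(); ++i)
    {
        if (rowA[i] == '-' || rowB[i] == '-')
            continue;               // a gap column is never a match
        if (rowA[i] == rowB[i])
            ++matches;
    }
    return matches;
}
```

For the rows "ACG-TAG" and "ACGATTG" this counts 5 matching columns: the gap column and the A/T mismatch are skipped.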

 

-Birte=

 

 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Thursday, 27 October 2011 17:17

To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

 

Hi Birte,

I have some other questions for you.

One is about the scope of seed extension. I only want to extend locally around each seed, because there are so many seeds (maximal exact matches) between the two genomes.

There is just one seed extension example for two sequences in the tutorial. Can I set the scope of the seed extension? For example, I just want to extend within 100 bp around the seed in both directions, not over the whole sequence.

I checked the code of the gapped extension and found the prefix() and suffix() functions. I don't know whether it is feasible to modify these two functions to get only part of the prefix and suffix of the seed.

For ungapped extension it would be simple: I can directly give a threshold to limit the seed extension to within 100 bp.

Another question:

How do I get the number of matches from the globalAlignment results of the seed?

 <= /p>

thank you,

Beifang.

 

On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Yes, in the case of gapped X-drop extension you have to do globalAlignment. The function extendSeed does not do the traceback and does not determine the number of matching positions.
But it does compute the maximal and minimal diagonals, so you can band the global alignment (see the Tutorial example).

What kind of score are you using? Would the score of the extensions help you?


-Birte
________________________________________
From: Beifang Niu [neilniu.cn@gmail.com]

Sent: Thursday, October 27, 2011 11:15 PM
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

Hi Birte,

Unfortunately, I have to use gapped X-drop extension.
Do I have to do globalAlignment to get the number of matches? I only need the number of matches after gapped seed extension, and computation time will increase if I have to run globalAlignment just to get it.

Any ideas?

thanks,
Beifang.

On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:
Hi Beifang,
the function extendSeed does not return the number of matched sequence positions.

I assume you have used ungapped X-drop extension? Then you can count matching positions by simply iterating over the infixes:

typedef typename Infix<TSeq>::Type TInfix;
TInfix infix1 = infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)+1);
TInfix infix2 = infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)+1);

unsigned count = 0;
for (unsigned i = 0; i < length(seed); ++i)
{
    if (value(infix1, i) == value(infix2, i))
        ++count;
}
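As a worked check of this loop, here is a SeqAn-free sketch on std::string. The function countUngappedMatches is a hypothetical name; it assumes an ungapped seed, so both segments have the same length and position i in one sequence is compared with position i in the other.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count equal characters along an ungapped seed that starts at begin1 in
// seq1 and begin2 in seq2 and covers `length` positions in both sequences.
inline std::size_t countUngappedMatches(std::string const & seq1, std::size_t begin1,
                                        std::string const & seq2, std::size_t begin2,
                                        std::size_t length)
{
    assert(begin1 + length <= seq1.size());
    assert(begin2 + length <= seq2.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < length; ++i)
    {
        if (seq1[begin1 + i] == seq2[begin2 + i])
            ++matches;
    }
    return matches;
}
```

For the sequences ACGTAGTTT and ACGTGGTTT from the question quoted further down, the unextended seed GTTT (start 5, length 4) gives 4 matches, and the fully extended seed (start 0, length 9) gives 8, with the single mismatch at position 4.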

-Birte

________________________________________

From: Beifang Niu [neilniu.cn@gmail.com]

Sent: Thursday, October 27, 2011 9:19 PM
To: Kehr, Birte
Subject: Re: seqan-dev Digest, Vol 25, Issue 6

Hi Birte,

Thank you for your prompt response, but I didn't receive your reply from the seqan development mailing list.
I did see the example for seed extension in the SeqAn tutorial. Now I have other questions for you.
How can I get the actual number of aligned bases between two extended seeds after running extendSeed?
For example, take the sequences ACGTAGTTT and ACGTGGTTT with one seed, GTTT. After extending to the left, I get the extended seeds ACGTAGTTT and ACGTGGTTT (there is only one mismatch), so the actual number of aligned bases is 8.
I want to get this number, but I don't know how to get it by running extendSeed alone.



thank you,
Beifang.

On Tue, Oct 25, 2011 at 3:00 AM, <seqan-dev-request@lists.fu-berlin.de> wrote:
Send seqan-dev mailing list submissions to

     seqan-dev@lists.fu-berlin.de

To subscribe or unsubscribe via the World Wide Web, visit
     https://lists.fu-berlin.de/listinfo/seqan-dev
or, via email, send a message with subject or body 'help' to

     seqan-dev-request@lists.fu-berlin.de

You can reach the person managing the list at

     seqan-dev-owner@lists.fu-berlin.de

When replying, please edit your Subject line so it is more specific
than "Re: Contents of seqan-dev digest..."


Today's Topics:

 1. about extendSeed of Seqan (Beifang Niu)
 2. Re: about extendSeed of Seqan (Kehr, Birte)

----------------------------------------------------------------------
Message: 1
Date: Mon, 24 Oct 2011 14:50:33 -0700

From: Beifang Niu <neilniu.cn@gmail.com>

Subject: [Seqan-dev] about extendSeed of Seqan

To: seqan-dev@lists.fu-berlin.de
Message-ID:
     <CABnPkb9P5nAvhQD_4in6mo+Vtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com>

Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am trying to use Seqan library to do the MEM (max exact match) extension.
Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs.
Is there any examples for extendSeed function of seeds class?

thanks,
Beifang.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.fu-berlin.de/pipermail/seqan-dev/attachments/20111024/bca37918/attachment.htm>

------------------------------

Message: 2
Date: Tue, 25 Oct 2011 00:59:24 +0200

From: "Kehr, Birte" <Birte.Kehr@fu-berlin.de>

Subject: Re: [Seqan-dev] about extendSeed of Seqan

To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Message-ID:
     <DAD226CB6878494EABEFD5215AA102015A8DE1CD68@exchange6.fu-berlin.de>

Content-Type: text/plain; charset="us-ascii"

Hi Beifang,

you can find an example for seed extension in the SeqAn tutorial at
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment.

You might also want to consider using the seeds2 module instead of the seeds module, since we plan to replace the seeds module with the seeds2 module. Unfortunately, there is no example of how to use the seeds2 module yet.

-Birte

From: Beifang Niu [mailto:neilniu.cn@gmail.com]

Sent: Monday, 24 October 2011 14:51

To: seqan-dev@lists.fu-berlin.de
Subject: [Seqan-dev] about extendSeed of Seqan

Hi,

I am trying to use Seqan library to do the MEM (max exact match) extension.
Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs.
Is there any examples for extendSeed function of seeds class?

thanks,
Beifang.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

------------------------------

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de

https://lists.fu-berlin.de/listinfo/seqan-dev


End of seqan-dev Digest, Vol 25, Issue 6
****************************************

 


= --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D2exchange6fube_-- From Birte.Kehr@fu-berlin.de Sat Oct 29 01:20:29 2011 Received: from outpost1.zedat.fu-berlin.de ([130.133.4.66]) by list1.zedat.fu-berlin.de (Exim 4.69) for seqan-dev@lists.fu-berlin.de with esmtp (envelope-from ) id <1RJvis-0007h3-9Q>; Sat, 29 Oct 2011 01:20:26 +0200 Received: from relay2.zedat.fu-berlin.de ([130.133.4.80]) by outpost1.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJvis-0006lY-2W>; Sat, 29 Oct 2011 01:20:26 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by relay2.zedat.fu-berlin.de (Exim 4.69) with esmtp (envelope-from ) id <1RJvir-0003mH-LI>; Sat, 29 Oct 2011 01:20:26 +0200 Received: from exchange6.fu-berlin.de ([160.45.9.133]) by exchange6.fu-berlin.de ([160.45.9.133]) with mapi; Sat, 29 Oct 2011 01:20:25 +0200 From: "Kehr, Birte" To: Beifang Niu , SeqAn Development Date: Sat, 29 Oct 2011 01:20:23 +0200 Thread-Topic: [Seqan-dev] about extendSeed of Seqan Thread-Index: AcyVxQYXyEcTLxY5TVKbIewjuqLuXAAAh6dw Message-ID: References: In-Reply-To: Accept-Language: en-US, de-DE Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US, de-DE Content-Type: multipart/alternative; boundary="_000_DAD226CB6878494EABEFD5215AA102015A9527E3D6exchange6fube_" MIME-Version: 1.0 X-Originating-IP: 160.45.9.133 X-ZEDAT-Hint: A X-purgate: clean X-purgate-type: clean X-purgate-ID: 151147::1319844026-000041E7-51340DCD/0-0/0-0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-Spam-Flag: NO X-Spam-Checker-Version: SpamAssassin 3.0.4 on Algerien.ZEDAT.FU-Berlin.DE X-Spam-Level: X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED,HTML_60_70, HTML_MESSAGE Subject: Re: [Seqan-dev] about extendSeed of Seqan X-BeenThere: seqan-dev@lists.fu-berlin.de X-Mailman-Version: 2.1.11 Precedence: list Reply-To: SeqAn Development List-Id: SeqAn Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 28 Oct 2011 23:20:29 -0000 --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D6exchange6fube_ Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Hi Beifang, the problem here is that the seed is defined on the original sequences, but= the extension is conducted on the infixes. So, we got some confusion about= sequence positions. If you specify the seed positions relative to the infi= x start positions, it will work. In your example this would be: typedef Infix::Type TInfix; TInfix infix0 =3D infix(seq0, 9 - 4, 9 + 7 + 4); TInfix infix1 =3D infix(seq1, 10 - 4, 10 + 7 + 4); TSeed seed(4, 4, 7); writeSeed(seed, infix0, infix1); extendSeed(seed, scoreDropOff, scoreMatrix, infix0, infix1, 0, GappedXDrop(= )); Sorry about that. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com] Sent: Freitag, 28. Oktober 2011 15:58 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, I test the extension limit but it doesn't work. It still exceed the extensi= on limit. (attachment is source code) Is the extendSeed seed extension globally? I got one alignment with score -1 from 20 exact match seed extension and th= e match number is less than 20. It is not what I want and actually I just w= ant a best local extension around the seed and this extension includes the = seed part with score >=3D20 at least. There is also no extension score cutoff for globally extension. I tried use= extendSeedScore to do extension and set the score=3D20 (A reference to the= score of the seed. This will be increased by the score of the extension.) = . I still got the extension with score -1. It is so lower score extension. thanks, Beifang. On Fri, Oct 28, 2011 at 2:25 PM, Kehr, Birte > wrote: Hi Beifang, in non-template functions like main you do not need the keyword "typename". But keep the "::Type": typedef Infix::Type TInfix; -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= ] Sent: Freitag, 28. 
Oktober 2011 14:21 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, There is error when I try to compile the limit extension code you provided. " typedef typename Infix::Type TInfix; TInfix infix0 =3D infix(seq0, _max(0,leftPosition(seed, 0)-100), _min(lengt= h(seq0),rightPosition(seed, 0)+ 101) ); " error message: ./seedextend.cpp: In function 'int main()': ./seedextend.cpp:48: error: using 'typename' outside of template when I changed the code like this: " typedef Infix TInfix; TInfix infix0 =3D infix(seq0, _max(0,leftPosition(seed, 0)-100), _min(lengt= h(seq0),rightPosition(seed, 0)+ 101) ); " error message: ./seedextend.cpp: In function 'int main()': ./seedextend.cpp:50: error: conversion from 'seqan::Segment, seqan::Alloc >, seqan::= InfixSegment>' to non-scalar type 'seqan::Infix, seqan::Alloc > >' requested any ideas? thanks, Beifang. On Fri, Oct 28, 2011 at 12:21 PM, Kehr, Birte > wrote: Hi Beifang, Apart from modifying the function extendSeed, there is another simple way h= ow you can limit the extension to up to 100bp. You can use infixes of the input sequences that you pass over to the functi= on extendSeed instead of seq0 and seq1, e.g.: typedef typename Infix::Type TInfix; TInfix infix0 =3D infix(seq0, _max(0, leftPosition(seed, 0) - 100), _min(le= ngth(seq0), rightPosition(seed, 0) + 101)); TInfix infix1 =3D infix(seq1, _max(0, leftPosition(seed, 1) - 100), _min(le= ngth(seq1), rightPosition(seed, 1) + 101)); extendSeed(..., infix0, infix1, 2, ...); In order to get the number of matching positions of an alignment you have t= o iterate over the alignment and test at every position if there is a gap i= n any of the sequences, and if not if the characters are equal. Have a look= at the Alignment Tutorial for an introduction to the Align data structure. http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= ] Sent: Donnerstag, 27. 
Oktober 2011 17:17 To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, I have other questions for you. One is about the scope of seed. I just want to do seed extension around see= d because there are so many seeds (maximal exact matchs) between two genome= s. There is just one seed extension example for two sequences in the tutorial.= Can I set the scope of seed extension? for example, I just want to do seed= extension within 100bps around the seed in two directions, not the extensi= on on the whole sequence. I checked the code of gapped extension and found the prefix() and suffix() = function. I don't know if it is feasible to modify these two functions to g= et the part prefix and suffix of the seed. It will be simple to ungapped extension and I can directly give a threshold= to limit the seed extension within 100bps. another question is : How do i get the match numbers from globalAlignment results of the seed? thank you, Beifang. On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte > wrote: Yes, in the case of gapped X-drop extension you have to do globalAlignment.= The function extendSeed does not do the traceback and does not determine t= he number of matching positions. But it does compute maximal and minimal diagonals, such that you can band t= he global alignment (see the Tutorial example). What kind of score are you using? Would the score of the extensions help yo= u? -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com] Sent: Thursday, October 27, 2011 11:15 PM To: Kehr, Birte Subject: Re: [Seqan-dev] about extendSeed of Seqan Hi Birte, Unfortunately, I have to use gapped X-drop extension. Do I have to do globalAlignment to get the matched number? I just need the= matched number after gapped seed extension and there will be increase in c= omputation time if I have to do globalAlignment for getting matched number. Any ideas? thanks, Beifang. 
On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte >> wrote: Hi Beifang, the function extendSeed does not return the number of matched sequence posi= tions. I assume you have used ungapped X-drop extension? Then you can count matchi= ng positions by simply iterating over the infixes: typedef typename Infix::Type TInfix; TInfix infix1 =3D infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)= +1); TInfix infix2 =3D infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)= +1); unsigned count =3D 0; for(int i =3D 0; i < length(seed); ++i) { if (value(infix1, i) =3D=3D value(infix2, i)) ++count; } -Birte ________________________________________ From: Beifang Niu [neilniu.cn@gmail.com>] Sent: Thursday, October 27, 2011 9:19 PM To: Kehr, Birte Subject: Re: seqan-dev Digest, Vol 25, Issue 6 Hi Birte, Thank you for your prompt response but I didn't receive your reply from seq= an development mail list. I did see the example for seed extension in the SeqAn-Tutorial. Now, I hav= e other questions for you. How can I get the actual aligned bases number between two extended seeds a= fter running extendSeeds? for example, sequences: ACGTAGTTT and ACGTGGTTT , there is one seed GTTT,= after the extension of left , I got the extension seeds: ACGTAGTTT a= nd ACGTGGTTT ( there is only one mismatch) , the actual aligned bases numbe= r is 8. I want to get this number but I don;t know how to get it only running exte= ndSeeds. thank you, Beifang. On Tue, Oct 25, 2011 at 3:00 AM, >>>> wrote: Send seqan-dev mailing list submissions to seqan-dev@lists.fu-berlin.de>>> To subscribe or unsubscribe via the World Wide Web, visit https://lists.fu-berlin.de/listinfo/seqan-dev or, via email, send a message with subject or body 'help' to seqan-dev-request@lists.fu-berlin.de>>> You can reach the person managing the list at seqan-dev-owner@lists.fu-berlin.de>>> When replying, please edit your Subject line so it is more specific than "Re: Contents of seqan-dev digest..." Today's Topics: 1. 
about extendSeed of Seqan (Beifang Niu) 2. Re: about extendSeed of Seqan (Kehr, Birte) ---------------------------------------------------------------------- Message: 1 Date: Mon, 24 Oct 2011 14:50:33 -0700 From: Beifang Niu >>>> Subject: [Seqan-dev] about extendSeed of Seqan To: seqan-dev@lists.fu-berlin.de>>> Message-ID: <= mailto:CABnPkb9P5nAvhQD_4in6mo%2BVtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com= >>>> Content-Type: text/plain; charset=3D"iso-8859-1" Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 2 Date: Tue, 25 Oct 2011 00:59:24 +0200 From: "Kehr, Birte" >>>> Subject: Re: [Seqan-dev] about extendSeed of Seqan To: SeqAn Development >>>> Message-ID: >>>> Content-Type: text/plain; charset=3D"us-ascii" Hi Beifang, you can find an example for seed extension in the SeqAn-Tutorial at http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensi= onAndBandedAlignment. You might also want to consider to use the seeds2 module instead of the see= ds module since we plan to replace the seeds module by the seeds2 module. U= nfortunately, there is no example on how to use the seeds2 module, yet. -Birte From: Beifang Niu [mailto:neilniu.cn@gmail.com= >>>] Sent: Montag, 24. Oktober 2011 14:51 To: seqan-dev@lists.fu-berlin.de>>> Subject: [Seqan-dev] about extendSeed of Seqan Hi, I am trying to use Seqan library to do the MEM (max exact match) extension. Firstly, I get the MEM of the two genome sequences using MUMMER3 and then I= want to use extendSeed of Seqan to do extension of MEMs. Is there any examples for extendSeed function of seeds class? thanks, Beifang. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ seqan-dev mailing list seqan-dev@lists.fu-berlin.de>>> https://lists.fu-berlin.de/listinfo/seqan-dev End of seqan-dev Digest, Vol 25, Issue 6 **************************************** --_000_DAD226CB6878494EABEFD5215AA102015A9527E3D6exchange6fube_ Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable

Hi Beifang,

the problem here is that the seed is defined on the original sequences, but the extension is conducted on the infixes, so the sequence positions got mixed up. If you specify the seed positions relative to the infix start positions, it will work.

In your example this would be:

typedef Infix<DnaString>::Type TInfix;

// Infixes start 4 positions before the seed and end 4 positions after it.
TInfix infix0 = infix(seq0, 9 - 4, 9 + 7 + 4);
TInfix infix1 = infix(seq1, 10 - 4, 10 + 7 + 4);

// Seed begin positions are relative to the infix starts: 9 - 5 = 4 and 10 - 6 = 4.
TSeed seed(4, 4, 7);
writeSeed(seed, infix0, infix1);

extendSeed(seed, scoreDropOff, scoreMatrix, infix0, infix1, 0, GappedXDrop());

Sorry about that.
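The coordinate shift described above can be written down as a small helper. This is a hedged sketch, not SeqAn code: WindowedSeed, toInfixCoordinates and toSequenceCoordinate are hypothetical names, the seed is assumed to be given by its begin position and length in the original sequence, and the sequence length of 100 used in the usage note is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not SeqAn): where the window lies in the original
// sequence, and where the seed starts inside that window.
struct WindowedSeed
{
    std::size_t infixBegin;   // window start in the original sequence
    std::size_t infixEnd;     // window end (exclusive) in the original sequence
    std::size_t seedBegin;    // seed start relative to infixBegin
};

// Build a window of up to `flank` positions around the seed and express the
// seed start relative to the window start.
inline WindowedSeed toInfixCoordinates(std::size_t seedBegin, std::size_t seedLength,
                                       std::size_t flank, std::size_t seqLength)
{
    assert(seedBegin + seedLength <= seqLength);
    WindowedSeed r;
    r.infixBegin = (seedBegin > flank) ? seedBegin - flank : 0;
    r.infixEnd = std::min(seqLength, seedBegin + seedLength + flank);
    r.seedBegin = seedBegin - r.infixBegin;
    return r;
}

// Map a position reported relative to the infix back to the original sequence.
inline std::size_t toSequenceCoordinate(WindowedSeed const & w, std::size_t posInInfix)
{
    return w.infixBegin + posInInfix;
}
```

For the numbers in this message, a seed starting at 9 with length 7 and a flank of 4 gives the window [5, 20) and a relative seed start of 4; a seed starting at 10 gives [6, 21) and again 4. Positions reported after extension would then need infixBegin added back to refer to the original sequence.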

 

-Birte

 

 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Friday, 28 October 2011 15:58
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

&= nbsp;

Hi Birte,

 

I tested the extension limit, but it doesn't work. The extension still exceeds the limit. (The source code is attached.)

Does extendSeed extend the seed globally?

I got one alignment with score -1 from extending a seed of 20 exact matches, and the number of matches is less than 20. That is not what I want. I just want the best local extension around the seed, one that includes the seed and has a score of at least 20.

There is also no extension score cutoff for the global extension. I tried using extendSeedScore for the extension and set the score=20 (a reference to the score of the seed; this will be increased by the score of the extension).

I still got an extension with score -1, which is a very low score for an extension.

 

thanks,

Beifang.

 =

 

On Fri, Oct 28, 2011 at 2:25 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Hi Beifang,

in non-template functions like main you do not need the keyword "typename".

But keep the "::Type":

typedef Infix<DnaString>::Type TInfix;
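The rule can be illustrated with a toy metafunction. This is a sketch only: InfixOf stands in for the Infix<T>::Type pattern and is not SeqAn code, and copyInfix and firstThree are hypothetical names. The "infix" here is simply a copied substring.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy metafunction in the style of Infix<T>::Type (hypothetical, not SeqAn).
template <typename TSequence>
struct InfixOf
{
    typedef TSequence Type;   // a real infix would be a view; a copy suffices here
};

// Inside a template, InfixOf<TSequence>::Type depends on TSequence, so the
// compiler needs "typename" to know that it names a type.
template <typename TSequence>
typename InfixOf<TSequence>::Type
copyInfix(TSequence const & seq, std::size_t begin, std::size_t end)
{
    typedef typename InfixOf<TSequence>::Type TInfix;
    assert(begin <= end && end <= seq.size());
    return TInfix(seq, begin, end - begin);
}

// Outside a template nothing is dependent, so "typename" is omitted,
// but "::Type" is still required to name the type itself.
inline std::string firstThree(std::string const & seq)
{
    typedef InfixOf<std::string>::Type TInfix;
    TInfix result = copyInfix(seq, 0, 3);
    return result;
}
```

Writing `typedef InfixOf<std::string> TInfix;` instead would name the metafunction struct rather than the segment type, which matches the kind of conversion error reported in the message quoted below.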


-Birte


 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Friday, 28 October 2011 14:21

To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

 

Hi Birte,

 

There is an error when I try to compile the limit extension code you provided.

"

typedef typename Infix<DnaString>::Type TInfix;

TInfix infix0 = infix(seq0, _max(0,leftPosition(seed, 0)-100), _min(length(seq0),rightPosition(seed, 0)+ 101) );

"

error message:

./seedextend.cpp: In function 'int main()':

./seedextend.cpp:48: error: using 'typename' outside of template

when I changed the code like this:

"

typedef Infix<DnaString> TInfix;

TInfix infix0 = infix(seq0, _max(0,leftPosition(seed, 0)-100), _min(length(seq0),rightPosition(seed, 0)+ 101) );

"

error message:

./seedextend.cpp: In function 'int main()':

./seedextend.cpp:50: error: conversion from 'seqan::Segment<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna_>, seqan::Alloc<void> >, seqan::InfixSegment>' to non-scalar type 'seqan::Infix<seqan::String<seqan::SimpleType<unsigned char, seqan::Dna_>, seqan::Alloc<void> > >' requested

 

any ideas?

 

=

thanks,

Beifang.

=

 

On Fri, Oct 28, 2011 at 12:21 = PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Hi Beifang,

 

Apart from modifying the function extendSeed, there is another simple way to limit the extension to at most 100 bp.

Instead of seq0 and seq1, you can pass infixes of the input sequences to the function extendSeed, e.g.:

typedef typename Infix<DnaString>::Type TInfix;

TInfix infix0 = infix(seq0, _max(0, leftPosition(seed, 0) - 100), _min(length(seq0), rightPosition(seed, 0) + 101));

TInfix infix1 = infix(seq1, _max(0, leftPosition(seed, 1) - 100), _min(length(seq1), rightPosition(seed, 1) + 101));

extendSeed(…, infix0, infix1, 2, …);
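
The window arithmetic in the two infix lines can be sketched in plain C++, independent of SeqAn. This is only an illustration: clampedWindow and Window are hypothetical names, not SeqAn functions, and the sketch assumes positions are unsigned, in which case subtracting 100 from a left position smaller than 100 would wrap around instead of going negative. Comparing before subtracting avoids that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open window [begin, end) of a sequence around a seed.
struct Window {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical helper (not part of SeqAn): pads the closed seed interval
// [left, right] by `flank` positions on each side and clamps the result
// to the sequence bounds [0, seqLength).
Window clampedWindow(std::size_t left, std::size_t right,
                     std::size_t seqLength, std::size_t flank)
{
    assert(left <= right && right < seqLength);
    // Compare before subtracting: with unsigned positions, left - flank
    // would wrap around to a huge value whenever left < flank.
    std::size_t begin = (left > flank) ? left - flank : 0;
    // right + 1 is the exclusive end of the seed; `room` is what remains
    // between that point and the end of the sequence.
    std::size_t room = seqLength - (right + 1);
    std::size_t end = (right + 1) + std::min(flank, room);
    return Window{begin, end};
}
```

With a flank of 100, a seed spanning positions 150 to 200 in a sequence of length 1000 yields the window [50, 301), which corresponds to the "- 100" and "+ 101" bounds in the lines above.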

 

To get the number of matching positions of an alignment, you have to iterate over the alignment and test at every position whether there is a gap in either sequence and, if not, whether the characters are equal. Have a look at the Alignment Tutorial for an introduction to the Align data structure.

http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments
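
As a rough sketch of the iteration described above, the following plain C++ counts matching columns of an alignment whose two rows are given as equal-length strings with '-' marking gaps. It deliberately does not use the Align data structure; countMatchingColumns and the string representation of the rows are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts alignment columns in which neither row has
// a gap and both rows carry the same character.
std::size_t countMatchingColumns(const std::string& row0,
                                 const std::string& row1,
                                 char gap = '-')
{
    // Both rows of an alignment have the same number of columns.
    assert(row0.size() == row1.size());
    std::size_t matches = 0;
    for (std::size_t i = 0; i < row0.size(); ++i) {
        // A column with a gap in either row can never be a match.
        if (row0[i] == gap || row1[i] == gap)
            continue;
        if (row0[i] == row1[i])
            ++matches;
    }
    return matches;
}
```

With an Align object, the same loop would run over the two rows and query each column for a gap before comparing characters; the Alignment Tutorial linked above is the place to look for the actual interface.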

 

-Birte


 

From: Beifang Niu [mailto:neilniu.cn@gmail.com]
Sent: Thursday, October 27, 2011 17:17
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan

 

Hi Birte,

 

I have some other questions for you.

One is about the scope of the seed extension. I only want to extend in the neighborhood of each seed because there are so many seeds (maximal exact matches) between two genomes.

There is just one seed extension example for two sequences in the tutorial. Can I set the scope of the seed extension? For example, I just want to extend within 100 bp around the seed in both directions, not over the whole sequence.

I checked the code of the gapped extension and found the prefix() and suffix() functions. I don't know if it is feasible to modify these two functions to get only part of the prefix and suffix of the seed.

This would be simple for ungapped extension, where I can directly give a threshold to limit the seed extension to within 100 bp.

 

Another question is:

How do I get the number of matches from the globalAlignment results of the seed?

thank you,

Beifang.

On Thu, Oct 27, 2011 at 3:41 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:

Yes, in the case of gapped X-drop extension you have to do globalAlignment. The function extendSeed does not do the traceback and does not determine the number of matching positions.
But it does compute maximal and minimal diagonals, such that you can band the global alignment (see the Tutorial example).

What kind of score are you using? Would the score of the extensions help you?


-Birte

________________________________________
From: Beifang Niu [neilniu.cn@gmail.com]
Sent: Thursday, October 27, 2011 11:15 PM
To: Kehr, Birte
Subject: Re: [Seqan-dev] about extendSeed of Seqan


Hi Birte,

Unfortunately, I have to use gapped X-drop extension.
Do I have to do globalAlignment to get the number of matches? I only need the number of matches after gapped seed extension, and computation time will increase if I have to run globalAlignment just to get it. Any ideas?

thanks,
Beifang.

On Thu, Oct 27, 2011 at 2:08 PM, Kehr, Birte <Birte.Kehr@fu-berlin.de> wrote:
Hi Beifang,

the function extendSeed does not return the number of matched sequence positions.

I assume you have used ungapped X-drop extension? Then you can count matching positions by simply iterating over the infixes:

typedef typename Infix<TSeq>::Type TInfix;
TInfix infix1 = infix(seq1, leftPosition(seed, 0), rightPosition(seed, 0)+1);
TInfix infix2 = infix(seq2, leftPosition(seed, 1), rightPosition(seed, 1)+1);

unsigned count = 0;
for(int i = 0; i < length(seed); ++i)
{
    if (value(infix1, i) == value(infix2, i))
        ++count;
}
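
For readers without a SeqAn build at hand, the same ungapped count can be sketched in plain C++ on std::string. The helper name countDiagonalMatches and its parameters (left positions in each sequence plus a shared length) are hypothetical and only mirror what the infixes above select; this is not SeqAn code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts equal characters along an ungapped seed
// that starts at left0 in seq0 and at left1 in seq1 and spans `len`
// positions in both sequences.
std::size_t countDiagonalMatches(const std::string& seq0, std::size_t left0,
                                 const std::string& seq1, std::size_t left1,
                                 std::size_t len)
{
    // The seed must lie completely inside both sequences.
    assert(left0 + len <= seq0.size());
    assert(left1 + len <= seq1.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (seq0[left0 + i] == seq1[left1 + i])
            ++count;
    }
    return count;
}
```

Applied to the example from this thread, the extended seeds ACGTAGTTT and ACGTGGTTT span 9 positions with one mismatch, so the count is 8.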

-Birte

________________________________________
From: Beifang Niu [neilniu.cn@gmail.com]
Sent: Thursday, October 27, 2011 9:19 PM
To: Kehr, Birte
Subject: Re: seqan-dev Digest, Vol 25, Issue 6

Hi Birte,

Thank you for your prompt response, but I didn't receive your reply from the seqan development mailing list.
I did see the example for seed extension in the SeqAn-Tutorial. Now I have other questions for you.
How can I get the actual number of aligned bases between two extended seeds after running extendSeeds?
For example, take the sequences ACGTAGTTT and ACGTGGTTT with one seed GTTT. After extending to the left, I get the extended seeds ACGTAGTTT and ACGTGGTTT (there is only one mismatch), so the actual number of aligned bases is 8.
I want to get this number, but I don't know how to get it by running extendSeeds alone.



thank you,
Beifang.

On Tue, Oct 25, 2011 at 3:00 AM, <seqan-dev-request@lists.fu-berlin.de> wrote:

Today's Topics:

 1. about extendSeed of Seqan (Beifang Niu)
 2. Re: about extendSeed of Seqan (Kehr, Birte)


----------------------------------------------------------------------

Message: 1
Date: Mon, 24 Oct 2011 14:50:33 -0700
From: Beifang Niu <neilniu.cn@gmail.com>
Subject: [Seqan-dev] about extendSeed of Seqan
To: seqan-dev@lists.fu-berlin.de
Message-ID:
     <CABnPkb9P5nAvhQD_4in6mo+Vtim6uhjWEFgzGMhxfb5XWuLQ2g@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am trying to use the SeqAn library to do MEM (maximal exact match) extension.
First, I get the MEMs of the two genome sequences using MUMMER3, and then I
want to use extendSeed of SeqAn to extend the MEMs.
Are there any examples for the extendSeed function of the seeds class?

thanks,
Beifang.
------------------------------

Message: 2
Date: Tue, 25 Oct 2011 00:59:24 +0200
From: "Kehr, Birte" <Birte.Kehr@fu-berlin.de>
Subject: Re: [Seqan-dev] about extendSeed of Seqan
To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
Message-ID:
     <DAD226CB6878494EABEFD5215AA102015A8DE1CD68@exchange6.fu-berlin.de>
Content-Type: text/plain; charset="us-ascii"

Hi Beifang,

you can find an example for seed extension in the SeqAn-Tutorial at
http://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment.

You might also want to consider using the seeds2 module instead of the seeds module, since we plan to replace the seeds module with the seeds2 module. Unfortunately, there is no example of how to use the seeds2 module yet.

-Birte


------------------------------

_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de