[Seqan-dev] Wildcard characters in the haystack?

  • From: Johannes Junker <dr.kugelmehl@googlemail.com>
  • To: seqan-dev@lists.fu-berlin.de
  • Date: Wed, 20 Oct 2010 09:39:26 +0200
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: [Seqan-dev] Wildcard characters in the haystack?

Hi,

I was wondering whether SeqAn can handle wildcard characters in the
haystack. As far as I understand from the documentation, wildcard
search is only supported when the needle contains the wildcard
characters. In the case below, however, the haystack
all_protein_sequences may contain ambiguous characters (e.g. an X
should match every amino acid letter in the needle, a J should match
only I and L, and so on), whereas the needles themselves contain no
ambiguous characters. With the current implementation, protein
sequences containing these wildcard characters are not matched by
their corresponding needles. Is there some clever way to do this?

    // Search all needles at once with Aho-Corasick.
    seqan::Finder<seqan::String<char> > finder(all_protein_sequences);
    seqan::Pattern<seqan::StringSet<seqan::String<char> >, seqan::AhoCorasick> pattern(needle);

    // One (needle index, haystack position) pair per hit.
    seqan::String<seqan::Pair<Size, Size> > pat_hits;
    Map<Size, vector<Size> > peptide_to_indices;
    writeDebug_("Finding peptide/protein matches...", 1);
    while (find(finder, pattern))
    {
        seqan::appendValue(pat_hits, seqan::Pair<Size, Size>(position(pattern), position(finder)));
        peptide_to_indices[position(pattern)].push_back(position(finder));
    }
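
The matching semantics described above can be sketched in plain C++,
independent of SeqAn. This is only an illustration of the desired
behaviour, not SeqAn API and not an Aho-Corasick search: it is a naive
scan, and the names residueMatches and findWithHaystackWildcards are
hypothetical. X and J come from the description above; B (D or N) and
Z (E or Q) are the other standard IUPAC ambiguity codes and are added
here as an assumption about what "and so on" covers. Needles are
assumed to contain no ambiguity codes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if the (possibly ambiguous) haystack residue h is compatible with
// the unambiguous needle residue n.
// Assumed ambiguity codes: X = any, J = I/L, B = D/N, Z = E/Q.
bool residueMatches(char h, char n)
{
    switch (h)
    {
        case 'X': return true;
        case 'J': return n == 'I' || n == 'L';
        case 'B': return n == 'D' || n == 'N';
        case 'Z': return n == 'E' || n == 'Q';
        default:  return h == n;
    }
}

// Naive scan: returns every start position in haystack at which needle
// matches under residueMatches. Runs in O(|haystack| * |needle|).
std::vector<std::size_t> findWithHaystackWildcards(const std::string& haystack,
                                                   const std::string& needle)
{
    std::vector<std::size_t> hits;
    if (needle.empty() || needle.size() > haystack.size())
        return hits;

    for (std::size_t i = 0; i + needle.size() <= haystack.size(); ++i)
    {
        std::size_t j = 0;
        while (j < needle.size() && residueMatches(haystack[i + j], needle[j]))
            ++j;
        if (j == needle.size())
            hits.push_back(i);
    }
    return hits;
}
```

With this comparison, the needle KAL would be reported at position 1
of the haystack MKXLJ, because the X in the haystack is compatible
with the A in the needle.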

Thanks in advance!

Best,
Johannes



  • Follow-Ups:
    • Re: [Seqan-dev] Wildcard characters in the haystack?
      • From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>