
Re: [Seqan-dev] Wildcard characters in the haystack?

  • From: "Holtgrewe, Manuel" <manuel.holtgrewe@fu-berlin.de>
  • To: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Date: Fri, 22 Oct 2010 13:40:54 +0200
  • Acceptlanguage: en-US, de-DE
  • Reply-to: SeqAn Development <seqan-dev@lists.fu-berlin.de>
  • Subject: Re: [Seqan-dev] Wildcard characters in the haystack?

To quote David:

> The only searches that support wildcards are Shift-And for exact pattern matching and Myers for approximate matching. Both are single-pattern searches. To get a multi-pattern Aho-Corasick with wildcards, all bases would have to be enumerated at the X positions, which would blow up the string trie. To resolve this, identical paths could be merged; that sounds like a BSc thesis.
> 
> David

On 20.10.2010 at 09:39, Johannes Junker wrote:

> Hi,
> 
> I was just wondering whether it is possible in SeqAn to use wildcard
> characters within the haystack. As far as I understood from the
> documentation, a wildcard search is only possible for a needle
> containing wildcard characters against some haystack. However, in the
> case below, the haystack all_protein_sequences may contain ambiguous
> characters (e.g. an X should match all possible amino acid letters in
> the needle, a J should match only I and L, and so on), whereas the
> needles themselves do not contain any ambiguous characters. In the
> current implementation, the protein sequences containing these
> wildcard characters are not matched with their corresponding needles.
> Is there some clever way to do this?
> 
>   // Haystack: the protein sequences; needle: a StringSet of peptides.
>   seqan::Finder<seqan::String<char> > finder(all_protein_sequences);
>   seqan::Pattern<seqan::StringSet<seqan::String<char> >, seqan::AhoCorasick> pattern(needle);
>
>   // One (peptide index, protein position) pair per hit.
>   seqan::String<seqan::Pair<Size, Size> > pat_hits;
>   Map<Size, vector<Size> > peptide_to_indices;
>   writeDebug_("Finding peptide/protein matches...", 1);
>   while (find(finder, pattern))
>   {
>       seqan::appendValue(pat_hits, seqan::Pair<Size, Size>(position(pattern), position(finder)));
>       peptide_to_indices[position(pattern)].push_back(position(finder));
>   }
> 
> Thanks in advance!
> 
> Best,
> Johannes
> 
> _______________________________________________
> seqan-dev mailing list
> seqan-dev@lists.fu-berlin.de
> https://lists.fu-berlin.de/listinfo/seqan-dev




  • References:
    • [Seqan-dev] Wildcard characters in the haystack?
      • From: Johannes Junker <dr.kugelmehl@googlemail.com>
