To cite David:

> The only searches that support wildcards are Shift-And (exact pattern
> matching) and Myers (approximate matching). Both are single-pattern
> searches. To have a multi-pattern Aho-Corasick with wildcards, all bases
> would have to be enumerated at the X positions, which would blow up the
> string trie. To resolve this, identical paths could be merged; sounds
> like a BSc thesis.
>
> David

On 20.10.2010 at 09:39, Johannes Junker wrote:

> Hi,
>
> I was just wondering whether it is possible in SeqAn to use wildcard
> characters within the haystack. As far as I understood from the
> documentation, a wildcard search is only possible for a needle
> containing wildcard characters against some haystack. However, in the
> case below, the haystack all_protein_sequences may contain ambiguous
> characters (e.g. an X should match all possible amino acid letters in
> the needle, a J should match only I and L, and so on), whereas the
> needles themselves do not contain any ambiguous characters. In the
> current implementation, the protein sequences containing these
> wildcard characters are not matched with their corresponding needles.
> Is there some clever way to do this?
>
> seqan::Finder<seqan::String<char> > finder(all_protein_sequences);
> seqan::Pattern<seqan::StringSet<seqan::String<char> >, seqan::AhoCorasick> pattern(needle);
>
> seqan::String<seqan::Pair<Size, Size> > pat_hits;
> Map<Size, vector<Size> > peptide_to_indices;
> writeDebug_("Finding peptide/protein matches...", 1);
> while (find(finder, pattern))
> {
>     seqan::appendValue(pat_hits, seqan::Pair<Size, Size>(position(pattern), position(finder)));
>     peptide_to_indices[position(pattern)].push_back(position(finder));
> }
>
> Thanks in advance!
>
> Best,
> Johannes
>
> _______________________________________________
> seqan-dev mailing list
> seqan-dev@lists.fu-berlin.de
> https://lists.fu-berlin.de/listinfo/seqan-dev
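
For illustration only, here is a minimal sketch of the matching rule Johannes describes, written against the C++ standard library rather than SeqAn. It is a naive scan, not Aho-Corasick and not the merged-trie idea David mentions, so it only shows the intended semantics and would be slow on large protein databases. The function names haystackCharMatches and findWithAmbiguousHaystack are hypothetical, and the ambiguity table (X = any residue, J = I or L, B = D or N, Z = E or Q) is an assumption based on the common amino-acid ambiguity codes; only X and J are named in the mail above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns true if haystack character h may stand for needle character n.
// Assumed ambiguity codes: X = any residue, J = I/L, B = D/N, Z = E/Q.
bool haystackCharMatches(char h, char n)
{
    if (h == n)
        return true;
    switch (h)
    {
        case 'X': return true;
        case 'J': return n == 'I' || n == 'L';
        case 'B': return n == 'D' || n == 'N';
        case 'Z': return n == 'E' || n == 'Q';
        default:  return false;
    }
}

// Naive scan over every needle and every start position.
// Reports (needle index, begin position in haystack) pairs.
std::vector<std::pair<std::size_t, std::size_t>>
findWithAmbiguousHaystack(const std::string& haystack,
                          const std::vector<std::string>& needles)
{
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::size_t n = 0; n < needles.size(); ++n)
    {
        const std::string& needle = needles[n];
        if (needle.empty() || needle.size() > haystack.size())
            continue;
        for (std::size_t pos = 0; pos + needle.size() <= haystack.size(); ++pos)
        {
            std::size_t i = 0;
            while (i < needle.size() &&
                   haystackCharMatches(haystack[pos + i], needle[i]))
                ++i;
            if (i == needle.size())
                hits.emplace_back(n, pos);
        }
    }
    return hits;
}
```

Note that this sketch reports the begin position of each hit; whether that agrees with what position(finder) returns in the loop above is not something the thread states, so the two should not be assumed interchangeable.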