Re: [Seqan-dev] SwiftLocal specialization with Hamming distance


Hi Fabian,

 

you are right, there is no Hamming specialization for SwiftLocal in SeqAn yet.

I am currently working on a verification strategy for the more general edit distance version.

 

In your case, I would suggest using the more general edit distance filter (SwiftLocal). Swift is only a filter algorithm, so all Hamming distance matches will be contained in the results from the edit distance version. And you will have to verify the reported hits in any case.
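
To make the Hamming check on a reported hit concrete, here is a minimal sketch in plain C++ (standard library only, no SeqAn calls). The names hammingDistance and isHammingEpsilonMatch are made up for this illustration, and the acceptance rule (at least minLength columns, at most epsilon * length mismatches) is my assumption of what an epsilon-match means under Hamming distance. It compares two equal-length segments on a single diagonal; in practice you would have to try the diagonals covered by a hit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count mismatching positions between two equal-length segments.
std::size_t hammingDistance(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i])
            ++mismatches;
    return mismatches;
}

// Hypothetical acceptance rule (an assumption, not SeqAn's definition):
// the pair is accepted if it is at least minLength long and the number of
// mismatches does not exceed epsilon * length.
bool isHammingEpsilonMatch(const std::string& a, const std::string& b,
                           double epsilon, std::size_t minLength)
{
    if (a.size() != b.size() || a.size() < minLength)
        return false;
    return static_cast<double>(hammingDistance(a, b))
           <= epsilon * static_cast<double>(a.size());
}
```

You would call isHammingEpsilonMatch on candidate segments cut out of the haystack and needle regions of each hit and keep only those that pass.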

 

You are also right that the local Swift computes not only the positions in the haystack but also the positions in the needle. The local version is intended as a filter for local alignments between two long sequences.

Once you have called the find function on a finder and pattern, e.g. find(finder, pattern, epsilon, minLength), you can obtain the positions of a hit in the haystack and needle with the functions positionRange(finder) and positionRange(pattern), and the corresponding sequences with infix(finder) and infix(pattern).

 

For the verification, you should be aware that Swift only guarantees to report regions that *overlap* with possible epsilon-matches. My suggestion is to use banded local alignment (BandedWatermanEggert) on the Swift hits (parallelograms). The local alignments could then be used as seeds for ungapped extension (UngappedXDrop). Here are the corresponding links to the SeqAn documentation:

http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html

http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html
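
To illustrate the idea behind ungapped x-drop extension, here is a rough sketch in plain C++. It is not SeqAn's UngappedXDrop implementation; the function name extendRightUngapped and the scoring parameters are assumptions chosen for the example. Starting at the right end of a seed, it walks along the diagonal, keeps the best-scoring prefix, and stops once the running score has fallen more than xDrop below the best score seen so far.

```cpp
#include <cstddef>
#include <string>

// Extend a seed to the right without gaps.
// hEnd / nEnd are the (exclusive) end positions of the seed in the haystack
// and the needle. Returns the number of columns by which the seed can be
// extended so that the extension score is maximal.
std::size_t extendRightUngapped(const std::string& haystack, std::size_t hEnd,
                                const std::string& needle, std::size_t nEnd,
                                int matchScore, int mismatchScore, int xDrop)
{
    int score = 0;            // score of the extension examined so far
    int best = 0;             // best extension score seen so far
    std::size_t bestLen = 0;  // extension length that achieved 'best'

    for (std::size_t k = 0;
         hEnd + k < haystack.size() && nEnd + k < needle.size(); ++k)
    {
        score += (haystack[hEnd + k] == needle[nEnd + k]) ? matchScore
                                                          : mismatchScore;
        if (score > best)
        {
            best = score;
            bestLen = k + 1;
        }
        else if (best - score > xDrop)
        {
            break;  // dropped too far below the best score: give up
        }
    }
    return bestLen;
}
```

Extension to the left works the same way with decreasing positions. The local alignments from the banded step would supply the start coordinates.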

 

You may also find the sections about local alignment and seed extension of the SeqAn tutorial interesting:

https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local

https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment

 

For the details of your verification step you will have to see what is appropriate for your special application.

 

Cheers,

Birte