Re: [Seqan-dev] SwiftLocal specialization with Hamming distance - problems with WatermanEggert
Hi Birte,
Kehr, Birte wrote:
Hi Fabian,
I added an example for the BandedWatermanEggert in demos/alignment_local.cpp.
Your problem was that you did not specify the band - the lowest diagonal and the highest diagonal that is to be computed in the alignment matrix.
Also make sure that you have updated to the current version of SeqAn. (I have fixed a small bug in the banded local alignment.)
The (non-banded) WatermanEggert is throwing a "Bad access memory
exception" from time to time (even in the alignment example under
demos). Its difficult to narrow down what the problem actually is. I
take it thats the reason why the example is commented out in the first
place?
Apologies for the confusion about the commented WatermanEggert example.
I did that myself and forgot about it.
Here, I need some more information from you. In the current version no example is commented out in demos/alignment_local.cpp. I also do not get an exception. Could you send me the code that is producing the error message?
Again, make sure that you have checked out the current version of SeqAn.
I've updated my local version to the current trunk but I still get a
"Bus error" when running the Waterman Eggert algorithm. In fact the very
first of the examples of alignment_local demo causes the problem. I'm
working on i686-apple-darwin9-gcc-4.0.1 on a Mac osx.
I dump the debugger tree below:
#0 0x0000725a in seqan::previousPosition<int, 0u, unsigned long> at
matrix_base.h:413
#1 0x0000813c in seqan::goPrevious<int, 0u> at matrix_base.h:543
#2 0x0001fd90 in seqan::_smith_waterman_declump<int,
seqan::String<seqan::SimpleType<unsigned char, seqan::_Dna>,
seqan::Alloc<void> >, seqan::ArrayGaps> at align_local_dynprog.h:414
#3 0x00020654 in
seqan::_smithWatermanGetNext<seqan::String<seqan::SimpleType<unsigned
char, seqan::_Dna>, seqan::Alloc<void> >, seqan::ArrayGaps, int> at
align_local_dynprog.h:765
#4 0x00020886 in
seqan::localAlignment<seqan::String<seqan::SimpleType<unsigned char,
seqan::_Dna>, seqan::Alloc<void> >, seqan::ArrayGaps, int, int, int> at
align_local_dynprog.h:841
#5 0x000036e9 in main at alignment_local.cpp:26
The variables in previousPosition contain:
position_ = 3221222916
dimension_ = 1
while
me.data_factors is pretty much empty (capacity=0)
So I take it that is has not been initialized properly.
Your other remark:
... the other two alignments clearly overlap with the sequence parts used in the first alignment. I thought the WatermanEggert would filter these alignments as not feasible(?) Is this a bug or am I mistaken?
The Waterman-Eggert algorithm computes no alignments where a character from seq1 is matched to the same character in seq2 as in a previous alignment. Sequence parts are allowed to overlap, only the traces in the alignment matrix are not allowed to overlap. In your example, there are different possibilities to match a G from one sequence to a G in the other sequence. The pairs of characters that are matched to each other are all different.
Thanks for making that clear.
Also this method in find_swift.h [...] complains about TText &text and would rather prefer a TText const &text
(at least for the data structure I input).
We have fixed a c&p bug here, it should work now. Thanks.
In my applications I do not specify the text separately. If you have a reason for specifying it, I would be curious about it.
I'm not doing that either. The compiler just complained about it.
Cheers,
Fabian
Cheers,
Birte
Kehr, Birte wrote:
Hi Fabian,
you are right, there is no Hamming specialization for SwiftLocal in
SeqAn yet.
I am currently working on a verification strategy for the more general
edit distance version.
In your case, I would suggest to use the more general edit distance
filter (SwiftLocal). Swift is only a filter algorithm, so all hamming
distance matches will be contained in the results from the edit
distance version. And you will have to verify the reported hits in any
case.
You are also right that the local Swift computes not only the
positions in the haystack but also the positions in the needle. The
local version is thought to be a filter for local alignments between
two long sequences.
Once you have called the find function on a finder and pattern, e.g.
find(finder, pattern, epsilon, minLength), you can obtain the
positions of a hit in the haystack and needle with the function
positionRange(finder) and postionRange(pattern), and the corresponding
sequences with infix(finder) and infix(pattern).
For the verification , you should be aware that Swift only guarantees
to report regions that **overlap** with possible epsilon-matches. My
suggestion is to use banded local alignment (BandedWatermanEggert) on
the swift hits (parallelograms). The local alignments could be used as
seeds for ungapped extension (UngappedXDrop). Here are the
corresponding links to the SeqAn documentation:
http://www.seqan.de/dddoc/html_devel/FUNCTION.local_Alignment.html
http://www.seqan.de/dddoc/html_devel/FUNCTION.extend_Seed.html
You may also find the sections about local alignment and seed
extension of the SeqAn tutorial interesting:
https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Alignments#Local
https://trac.mi.fu-berlin.de/seqan/wiki/Tutorial/Seed-and-Extend#SeedExtensionAndBandedAlignment
For the details of your verification step you will have to see what is
appropriate for your special application.
Cheers,
Birte
------------------------------------------------------------------------
_______________________________________________
seqan-dev mailing list
seqan-dev@lists.fu-berlin.de
https://lists.fu-berlin.de/listinfo/seqan-dev
--
Fabian Buske
Institute for Molecular Bioscience
The University of Queensland
Brisbane, Qld. 4072 Australia
Phone: (61)-(7)-334-62608